Testing Basic IRT Assumptions on Students’ Assessment Data: An Application to the Italian Context

Author(s):

Michela Gnaldi (presenting / submitting), Francesco Bartolucci

Conference:

ECER 2011

Network:

9. Assessment, Evaluation, Testing and Measurement

Format:

Paper

Session Information

09 SES 08 B, Theoretical and Methodological Issues in Testing and Measurement (Part 1)

Paper Session

Time:

2011-09-15

08:30-10:00

Room:

KL 23/138,G, 21

Chair:

Pasi Reinikainen

Contribution

In the Italian system, assessment data on primary, lower middle, and high-school students are collected yearly by the National Institute for the Evaluation of the Educational System (INVALSI). Before these data are collected, the INVALSI questionnaires are calibrated using the outcomes of pretesting sessions. These preliminary data are analysed with standard Item Response Theory (IRT) models, such as the Rasch (1961) model.

In this paper, we focus on data collected on middle school students; these data are becoming increasingly relevant in the Italian educational context, and their collection will become compulsory in the near future. We aim to study whether the assumptions of the IRT models used by INVALSI are met for the “live” data collected by this Institute. In particular, we focus on the assumption of unidimensionality, which characterizes the most widely used IRT models. The data come from a nationally representative sample of 27,592 students within 1,305 schools (one class is sampled in each school) and refer to the students’ performance on the national reading comprehension test administered in June 2009. The methodology is based on a sequence of likelihood ratio tests between pairs of models belonging to a class of multidimensional latent class IRT models studied by Bartolucci (2007); see also Goodman (1974), Lazarsfeld and Henry (1968), Martin-Löf (1973), Verhelst (2001), and Bartolucci and Forcina (2005).

According to the assumption of unidimensionality, the difference between two subjects in responding to a set of items depends on a single latent trait, which corresponds to the ability measured by the items. If unidimensionality does not hold, the conclusions reached on the basis of a unidimensional IRT model may be misleading, and summarizing the test performance of a subject through a single score is no longer sensible. Several authors have dealt with testing unidimensionality in connection with the Rasch model (Glas and Verhelst, 1995; Verhelst, 2001). One of the main contributions is due to Martin-Löf (1973), who developed a likelihood ratio test of the hypothesis that the Rasch model holds for the whole set of items against the hypothesis that this model holds separately for two disjoint subsets of items defined in advance. A major problem of this test is that it implicitly assumes that the items discriminate equally well between subjects. The test may therefore lead to wrong conclusions when items have different discriminating power, as often happens. In this contribution, we address these issues through multidimensional IRT models (Bartolucci, 2007) in which (i) a two-parameter logistic parametrization (Birnbaum, 1968) may also be used for the probability of success in responding to an item, given the ability; and (ii) the latent traits are represented through a random vector with a discrete distribution, each level of which identifies a different latent class in the population of students.
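The two ingredients (i) and (ii) can be illustrated with a minimal Python sketch. It assumes the common two-parameter logistic form, in which the log-odds of success on item j equal gamma_j * (theta - beta_j), and it assumes that each item measures exactly one dimension. The function names, argument names, and the numerical values in the usage note are hypothetical and are not taken from the INVALSI data or from the authors' software.

```python
import math


def success_prob(theta_c, dims, gamma, beta):
    """2PL success probabilities for every item, given one latent class.

    theta_c : ability vector of the class (one entry per dimension)
    dims    : dims[j] is the index of the dimension measured by item j
    gamma   : discriminating power of each item (assumed notation)
    beta    : difficulty of each item (assumed notation)
    """
    return [
        1.0 / (1.0 + math.exp(-gamma[j] * (theta_c[dims[j]] - beta[j])))
        for j in range(len(beta))
    ]


def pattern_prob(y, weights, thetas, dims, gamma, beta):
    """Manifest probability of a 0/1 response pattern y.

    The ability distribution is discrete: class c has weight weights[c]
    and ability vector thetas[c]. Responses are assumed independent
    given the class (local independence).
    """
    total = 0.0
    for pi_c, theta_c in zip(weights, thetas):
        lam = success_prob(theta_c, dims, gamma, beta)
        cond = 1.0
        for y_j, p in zip(y, lam):
            cond *= p if y_j == 1 else 1.0 - p
        total += pi_c * cond
    return total
```

As a usage illustration with made-up values, `pattern_prob((1, 0), [0.4, 0.6], [[-1.0, 0.0], [1.0, 0.5]], [0, 1], [1.0, 1.3], [0.0, 0.2])` gives the probability of answering the first item correctly and the second incorrectly under two latent classes and two dimensions. Summing the logarithm of such probabilities over the observed response patterns yields the log-likelihood of the model.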

Method

The methodology is based on the comparison between different multidimensional IRT models. These models (Bartolucci, 2007) are of latent class type, as they rely on the assumption that the population under study is made up of a finite number of classes, with subjects in the same class having the same ability level. This way of representing the ability distribution is more flexible than one based on a continuous distribution, such as the Normal distribution, and is compatible with the assumption of multidimensionality. The latter means that the adopted questionnaire actually measures more than one type of ability (or dimension); we formulate this assumption by relying on a two-parameter logistic parametrization, which is well known in IRT. The comparison between models belonging to this family is based on a likelihood ratio test. To fix ideas, consider the case of testing whether two different groups of items measure only one dimension instead of two. This hypothesis is tested by comparing a bidimensional model, in which the two groups of items are kept distinct, with its unidimensional counterpart on the basis of the difference between their maximum log-likelihoods. We compute these log-likelihoods through an EM algorithm.
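The comparison described above can be sketched as follows. This is only an illustration, assuming that the two maximum log-likelihoods have already been obtained (for instance by an EM algorithm) and that the usual chi-square approximation applies, with degrees of freedom equal to the difference in the number of free parameters between the two models. The function names are hypothetical, and the degrees of freedom must be supplied by the user.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def lr_test(loglik_uni, loglik_bi, df):
    """Likelihood ratio test of the unidimensional model against the
    bidimensional one. Returns the deviance and its approximate p-value."""
    deviance = 2.0 * (loglik_bi - loglik_uni)
    return deviance, chi2_sf(deviance, df)
```

For example, `lr_test(-100.0, -98.0, 2)` compares two hypothetical fits whose log-likelihoods differ by 2; a small p-value would lead to rejecting unidimensionality in favour of the bidimensional alternative.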

Expected Outcomes

Through the application to the INVALSI “live” data, it will be shown how the above-mentioned method can be used to test the hypothesis of unidimensionality against a specific multidimensional alternative, even when the test items are not assumed to share the same discriminating power. Since we rely on a two-parameter logistic parametrization, the items are allowed to discriminate differently between subjects; this is an advantage over more traditional methods, such as that of Martin-Löf (1973). Given that the national performance test used for this application was developed to assess only one ability (i.e. students’ ability in reading comprehension), it is expected that the hypothesis of unidimensionality is met. Otherwise, the present method will allow us to identify different groups of items, corresponding to different, although correlated, abilities. In fact, by performing a sequence of likelihood ratio tests between nested multidimensional IRT models of the type considered here, it is possible to cluster items according to the dimension they actually measure. The results of this clustering procedure may be effectively illustrated by a dendrogram.

References

Bartolucci, F., & Forcina, A. (2005). Likelihood inference on the underlying structure of IRT models. Psychometrika, 70, 31–43.
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141–157.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F.M. Lord & M.R. Novick (eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.
Goodman, L.A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Lazarsfeld, P.F., & Henry, N.W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Martin-Löf, P. (1973). Statistiska modeller. Anteckningar från seminarier läsåret 1969–1970, utarbetade av Rolf Sundberg. Obetydligt ändrat nytryck, October 1973. Stockholm: Institutet för Försäkringsmatematik och Matematisk Statistik vid Stockholms Universitet.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability, 4, 321–333.
Verhelst, N.D. (2001). Testing the unidimensionality assumption of the Rasch model. Methods of Psychological Research Online, 6, 231–271.

Author Information

Michela Gnaldi (presenting / submitting)

University of Perugia

Economics, Finance and Statistics

Perugia

Francesco Bartolucci

University of Perugia, Italy
