Session Information
09 SES 12 A, Findings from International Comparative Achievement Studies: Methodological Challenges
Paper Session
Contribution
International large-scale assessments such as TIMSS, PIRLS, and PISA are charged with monitoring educational achievement around the world in a number of learning areas, including mathematics, science, and reading. A particularly important part of these studies is to track the educational achievement, across time and geographic region, of different policy-relevant populations, including males and females and students from immigrant backgrounds. The scale and scope of such studies necessitate sophisticated data collection designs whereby each individual student is administered only a small number of the total possible items, yet all items are administered within each of the reporting groups. This approach to item administration is often referred to as item-sampling (Lord, 1962) or, more commonly in the current LSA literature, as multiple-matrix sampling (Shoemaker, 1973). Although this method of item delivery is efficient from an administration perspective, it poses currently intractable challenges for precisely estimating individual student achievement. Because only a fraction of the students in the population take any one item, and any selected student takes only a fraction of the total available items, the actual distribution of student ability cannot be approximated by its empirical estimate (Mislevy, Johnson, & Muraki, 1992).
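The rotated-booklet logic behind multiple-matrix sampling can be illustrated with a small sketch. This is not the design of any particular operational study; the pool size, number of blocks, and cyclic block rotation are illustrative assumptions, but they show the essential property: each student sees only a fraction of the item pool, while every item is administered, and adjacent blocks co-occur in booklets to provide linking.

```python
import random

def assign_booklets(item_ids, blocks_per_booklet, n_blocks, seed=0):
    """Split an item pool into blocks and rotate blocks into booklets
    (multiple-matrix sampling): every item is administered to some
    students, but each student sees only a fraction of the pool."""
    rng = random.Random(seed)
    items = list(item_ids)
    rng.shuffle(items)
    # Partition the shuffled pool into equally sized blocks.
    blocks = [items[i::n_blocks] for i in range(n_blocks)]
    # Each booklet carries a cyclic run of adjacent blocks, so that
    # neighbouring blocks co-occur in a booklet (needed for linking).
    booklets = [
        [blocks[(b + k) % n_blocks] for k in range(blocks_per_booklet)]
        for b in range(n_blocks)
    ]
    return blocks, booklets

blocks, booklets = assign_booklets(range(60), blocks_per_booklet=2, n_blocks=6)
# Each of the 6 booklets covers 2 of 6 blocks (one third of the
# 60-item pool), and every item appears in exactly 2 booklets.
```

A student assigned booklet `b` would respond to only those 20 items, which is why individual ability estimates are so imprecise under this design.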
To overcome the methodological challenges associated with multiple-matrix sampling, LSA programs adopted a population or latent regression modeling approach that uses marginal estimation techniques to generate population- and subpopulation-level achievement estimates (Mislevy, 1991; Mislevy, Beaton, Kaplan, & Sheehan, 1992; Mislevy, Johnson, & Muraki, 1992). Under the population modeling approach, consistent population- and subpopulation-level ability estimates are obtained by treating achievement as missing (latent) data. These data points are missing for all examinees and are 'filled in' using an approach analogous to multiple imputation (Rubin, 1976, 1987). As in multiple imputation methods, an imputation model (called a "conditioning model") is developed to predict individual student achievement values from the posterior population model. This model uses all available student data (cognitive responses as well as background information) to generate a conditional proficiency distribution for each student, from which a number of plausible values (usually five) are drawn for each student on each latent trait (e.g., mathematics, science, and associated subdomains). Although these methods are well established theoretically and empirically, little is known regarding the influence of less-than-optimal quality background data on subpopulation estimates.
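The plausible-value step can be sketched in simplified form. The sketch below assumes a normal posterior whose mean is the conditioning-model (latent regression) prediction; the operational procedure additionally folds the student's item responses into the posterior via the IRT likelihood, which is omitted here. The covariates, coefficients, and spread are hypothetical values chosen only for illustration.

```python
import random

def draw_plausible_values(x, beta, sigma, n_pv=5, seed=1):
    """Draw plausible values for one student from a normal posterior.

    Simplified sketch: the posterior mean is the latent-regression
    (conditioning-model) prediction x'beta, and sigma stands in for
    the posterior spread. The operational posterior also conditions
    on the student's observed item responses (not shown)."""
    rng = random.Random(seed)
    mu = sum(xi * bi for xi, bi in zip(x, beta))
    return [rng.gauss(mu, sigma) for _ in range(n_pv)]

# Hypothetical student: intercept plus one background dummy
# (e.g., group membership), on a reporting-style score metric.
pvs = draw_plausible_values(x=[1.0, 1.0], beta=[500.0, 25.0], sigma=30.0)
```

Population and subpopulation statistics are then computed on each of the five plausible values and combined across draws in the usual multiple-imputation fashion (Rubin, 1987), which is why errors in the background covariates feed directly into group-level estimates.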
This paper focuses on the degree to which subpopulation estimates may be biased as a result of systematically misclassified group membership. Specifically, a Monte Carlo approach, with known item and examinee characteristics, is used to investigate the behavior of group differences when varying proportions of examinees are misclassified on a selection of background variables that are used in the latent regression model. Initial findings suggest that poor quality background data can lead to under- or over-estimates of group differences, particularly across countries and over time. The potential policy implications of this research, particularly as they relate to disadvantaged populations, and likely areas for improvement are also discussed.
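The core of the simulation design can be sketched as follows. This is a deliberately minimal stand-in for the paper's Monte Carlo study, not its actual implementation: it generates two groups with a known mean difference, flips group labels for a chosen fraction of examinees, and recomputes the gap from the misclassified labels. All parameter values are illustrative assumptions.

```python
import random
import statistics

def simulated_gap(n, true_gap, misclass_rate, seed=2):
    """Monte Carlo sketch: generate two groups with a known mean
    difference on a standardized metric, randomly flip the recorded
    group label for a fraction of examinees, and return the gap
    estimated from the (mis)classified labels."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        g = rng.random() < 0.5                 # true group membership
        y = rng.gauss(true_gap if g else 0.0, 1.0)
        if rng.random() < misclass_rate:       # background-data coding error
            g = not g
        records.append((g, y))
    m1 = statistics.fmean(y for g, y in records if g)
    m0 = statistics.fmean(y for g, y in records if not g)
    return m1 - m0

# With clean labels the estimate tracks the true gap; flipping 20% of
# labels attenuates the estimated gap toward zero.
clean = simulated_gap(20000, true_gap=0.5, misclass_rate=0.0)
noisy = simulated_gap(20000, true_gap=0.5, misclass_rate=0.2)
```

With symmetric random flipping the expected attenuation is a factor of (1 − 2r); asymmetric or systematic misclassification, as studied in the paper, can bias group differences in either direction, which is what makes cross-country and over-time comparisons vulnerable.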
Method
Expected Outcomes
References
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Hodges, S. D., & Moore, P. G. (1972). Data uncertainties and least squares regression. Journal of the Royal Statistical Society: Series C (Applied Statistics), 21(2), 185–195.
Lord, F. (1962). Estimating norms by item-sampling. Educational and Psychological Measurement, 22(2), 259–267.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177–196.
Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29(2), 133–161.
Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of Educational Statistics, 17(2), 131–154.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in sample surveys. New York: Wiley.
Shoemaker, D. M. (1973). Principles and procedures of multiple matrix sampling. Cambridge, MA: Ballinger Publishing Company.