Session Information
09 SES 12 A, Findings from International Comparative Achievement Studies: Methodological Challenges
Paper Session
Contribution
International large-scale assessments such as TIMSS, PIRLS, and PISA are charged with monitoring educational achievement around the world in a number of learning areas, including mathematics, science, and reading. A particularly important part of these studies is to track the educational achievement, across time and geographic region, of different policy-relevant populations, including males and females and students from immigrant backgrounds. The scale and scope of such studies necessitate sophisticated data collection designs whereby each individual student is administered only a small number of the total possible items, yet all items are administered within each of the reporting groups. This approach to item administration is often referred to as item-sampling (Lord, 1962) or, more commonly in the current LSA literature, as multiple-matrix sampling (Shoemaker, 1973). Although this method of item delivery is efficient from an administration perspective, it poses currently intractable challenges for precisely estimating individual student achievement. Because only a fraction of the students in the population take any one item, and any selected student takes only a fraction of the total available items, the actual distribution of student ability cannot be approximated by its empirical estimate (Mislevy, Johnson, & Muraki, 1992).
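The rotated-booklet logic behind multiple-matrix sampling can be illustrated with a small sketch. This is not the design of any particular operational study; the pool size, number of blocks, and cyclic block rotation are illustrative assumptions, but they show the essential property: each student sees only a fraction of the item pool, while every item is administered, and adjacent blocks co-occur in booklets to provide linking.

```python
import random

def assign_booklets(item_ids, blocks_per_booklet, n_blocks, seed=0):
    """Split an item pool into blocks and rotate blocks into booklets
    (multiple-matrix sampling): every item is administered to some
    students, but each student sees only a fraction of the pool."""
    rng = random.Random(seed)
    items = list(item_ids)
    rng.shuffle(items)
    # Partition the shuffled pool into equally sized blocks.
    blocks = [items[i::n_blocks] for i in range(n_blocks)]
    # Each booklet carries a cyclic run of adjacent blocks, so that
    # neighbouring blocks co-occur in a booklet (needed for linking).
    booklets = [
        [blocks[(b + k) % n_blocks] for k in range(blocks_per_booklet)]
        for b in range(n_blocks)
    ]
    return blocks, booklets

blocks, booklets = assign_booklets(range(60), blocks_per_booklet=2, n_blocks=6)
# Each of the 6 booklets covers 2 of 6 blocks (one third of the
# 60-item pool), and every item appears in exactly 2 booklets.
```

A student assigned booklet `b` would respond to only those 20 items, which is why individual ability estimates are so imprecise under this design.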
To overcome the methodological challenges associated with multiple-matrix sampling, LSA programs adopted a population or latent regression modeling approach that uses marginal estimation techniques to generate population- and subpopulation-level achievement estimates (Mislevy, 1991; Mislevy, Beaton, Kaplan, & Sheehan, 1992; Mislevy, Johnson, & Muraki, 1992). Under the population modeling approach, consistent population- and subpopulation-level ability estimates are obtained by treating achievement as missing (latent) data. These data points are missing for all examinees and are 'filled in' using an approach analogous to multiple imputation (Rubin, 1976, 1987). As in multiple imputation methods, an imputation model (called a "conditioning model") is developed to predict individual student achievement values from the posterior population model. This model uses all available student data (cognitive responses as well as background information) to generate a conditional proficiency distribution for each student, from which a number of plausible values (usually five) are drawn for each student on each latent trait (e.g., mathematics, science, and associated subdomains). Although these methods are well established theoretically and empirically, little is known regarding the influence of less-than-optimal quality background data on subpopulation estimates.
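The plausible-value step can be sketched in simplified form. The sketch below assumes a normal posterior whose mean is the conditioning-model (latent regression) prediction; the operational procedure additionally folds the student's item responses into the posterior via the IRT likelihood, which is omitted here. The covariates, coefficients, and spread are hypothetical values chosen only for illustration.

```python
import random

def draw_plausible_values(x, beta, sigma, n_pv=5, seed=1):
    """Draw plausible values for one student from a normal posterior.

    Simplified sketch: the posterior mean is the latent-regression
    (conditioning-model) prediction x'beta, and sigma stands in for
    the posterior spread. The operational posterior also conditions
    on the student's observed item responses (not shown)."""
    rng = random.Random(seed)
    mu = sum(xi * bi for xi, bi in zip(x, beta))
    return [rng.gauss(mu, sigma) for _ in range(n_pv)]

# Hypothetical student: intercept plus one background dummy
# (e.g., group membership), on a reporting-style score metric.
pvs = draw_plausible_values(x=[1.0, 1.0], beta=[500.0, 25.0], sigma=30.0)
```

Population and subpopulation statistics are then computed on each of the five plausible values and combined across draws in the usual multiple-imputation fashion (Rubin, 1987), which is why errors in the background covariates feed directly into group-level estimates.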
This paper focuses on the degree to which subpopulation estimates may be biased as a result of systematically misclassified group membership. Specifically, a Monte Carlo approach, with known item and examinee characteristics, is used to investigate the behavior of group differences when varying proportions of examinees are misclassified on a selection of background variables that are used in the latent regression model. Initial findings suggest that poor quality background data can lead to under- or over-estimates of group differences, particularly across countries and over time. The potential policy implications of this research, particularly as they relate to disadvantaged populations, and likely areas for improvement are also discussed.
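The core of the simulation design can be sketched as follows. This is a deliberately minimal stand-in for the paper's Monte Carlo study, not its actual implementation: it generates two groups with a known mean difference, flips group labels for a chosen fraction of examinees, and recomputes the gap from the misclassified labels. All parameter values are illustrative assumptions.

```python
import random
import statistics

def simulated_gap(n, true_gap, misclass_rate, seed=2):
    """Monte Carlo sketch: generate two groups with a known mean
    difference on a standardized metric, randomly flip the recorded
    group label for a fraction of examinees, and return the gap
    estimated from the (mis)classified labels."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        g = rng.random() < 0.5                 # true group membership
        y = rng.gauss(true_gap if g else 0.0, 1.0)
        if rng.random() < misclass_rate:       # background-data coding error
            g = not g
        records.append((g, y))
    m1 = statistics.fmean(y for g, y in records if g)
    m0 = statistics.fmean(y for g, y in records if not g)
    return m1 - m0

# With clean labels the estimate tracks the true gap; flipping 20% of
# labels attenuates the estimated gap toward zero.
clean = simulated_gap(20000, true_gap=0.5, misclass_rate=0.0)
noisy = simulated_gap(20000, true_gap=0.5, misclass_rate=0.2)
```

With symmetric random flipping the expected attenuation is a factor of (1 − 2r); asymmetric or systematic misclassification, as studied in the paper, can bias group differences in either direction, which is what makes cross-country and over-time comparisons vulnerable.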
Method
Expected Outcomes
References
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Hodges, S. D., & Moore, P. G. (1972). Data uncertainties and least squares regression. Journal of the Royal Statistical Society: Series C (Applied Statistics), 21(2), 185–195.
Lord, F. (1962). Estimating norms by item-sampling. Educational and Psychological Measurement, 22(2), 259–267.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177–196.
Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29(2), 133–161.
Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of Educational Statistics, 17(2), 131–154.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in sample surveys. New York: Wiley.
Shoemaker, D. M. (1973). Principles and procedures of multiple matrix sampling. Cambridge, MA: Ballinger Publishing Company.