Session Information
09 SES 08 C, Assessments, Examinations and Grades
Paper Session
Contribution
The reliability of marking (scoring) in high-stakes assessment is arguably of more concern to the examinees than other aspects of reliability. That is, they are more willing to accept that they might have scored better or worse with different questions, or on a different day, than they are to accept that the performance (script) they produced might have received a different score if it had been marked by a different person. The research reported here aimed to find the most appropriate way to conceptualise and quantify marker reliability in externally-marked school examinations of the kind taken in England at age 16 (GCSEs) and age 18 (A levels). These examinations contain a wide range of question (item) types from highly constrained objective questions to open-ended essays. Although the data came from a single Awarding Body in England, the issues involved in monitoring marker reliability affect all Awarding Bodies and should be relevant to assessment systems in other European countries.
The appropriateness of different frameworks for conceptualising reliability (Classical Test Theory and Item Response Theory) is considered in the light of the operational procedures used in practice for monitoring the quality of marking. Some examinations are marked on-screen, and monitoring is achieved by ‘seeding’ scripts for which the ‘correct’ or ‘definitive’ marks are known into each examiner’s allocation of scripts to be marked. The ultimate aim of these monitoring processes is to ensure that the final grades awarded to examinees reflect their performance in the examination as accurately as possible. The purpose of this research was to find ways of presenting information about marking reliability, based on the data collected in ‘live’ examination processing (as opposed to a research exercise), that are clear, informative, and allow fair and relevant comparisons to be made between examinations in different subjects.
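The seeding logic described above can be illustrated with a minimal sketch. This is not any Awarding Body's actual procedure; the function name, data, and statistics are hypothetical, chosen only to show how an examiner's marks on seeded scripts might be summarised against the known definitive marks:

```python
# Illustrative sketch only (hypothetical data and statistics, not an
# operational monitoring procedure): compare one examiner's marks on
# 'seeded' scripts with the known definitive marks.

def seed_summary(examiner_marks, definitive_marks):
    """Return simple agreement statistics for one examiner's seed scripts."""
    if len(examiner_marks) != len(definitive_marks):
        raise ValueError("mark lists must be the same length")
    diffs = [e - d for e, d in zip(examiner_marks, definitive_marks)]
    n = len(diffs)
    return {
        "mean_diff": sum(diffs) / n,                       # signed bias (severity/leniency)
        "mean_abs_diff": sum(abs(x) for x in diffs) / n,   # average size of disagreement
        "exact_agreement": sum(x == 0 for x in diffs) / n, # proportion of marks matching exactly
    }

# Hypothetical seed data: an examiner's marks vs. definitive marks for 5 scripts.
stats = seed_summary([14, 12, 18, 9, 16], [15, 12, 17, 9, 16])
print(stats)  # {'mean_diff': 0.0, 'mean_abs_diff': 0.4, 'exact_agreement': 0.6}
```

Separating the signed mean difference (which captures systematic severity or leniency) from the mean absolute difference (which captures overall disagreement) matters here: an examiner can show zero bias on average while still disagreeing with the definitive marks on individual scripts, as in the example above.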
Method
Expected Outcomes
References
Baird, J., Greatorex, J., & Bell, J.F. (2004). What makes marking reliable? Experiments with UK examinations. Assessment in Education: Principles, Policy & Practice, 11(3), 331-348.
Black, B., Suto, W.M.I., & Bramley, T. (in press). The interrelations of features of questions, mark schemes and examinee responses and their impact upon marker agreement. Assessment in Education: Principles, Policy & Practice.
Bland, J.M., & Altman, D.G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, (i), 307-310.
Bramley, T. (2007). Quantifying marker agreement: terminology, statistics and issues. Research Matters: A Cambridge Assessment Publication, 4, 22-28.
Johnson, S., & Johnson, R. (2009). Conceptualising and interpreting reliability. Assessment Europe. Ofqual/10/4709.
Linacre, J.M. (1994). Many-Facet Rasch Measurement (2nd ed.). Chicago: MESA Press.
Meadows, M., & Billington, L. (2005). A review of the literature on marking reliability. Manchester: AQA.
Murphy, R.J.L. (1978). Reliability of marking in eight GCE examinations. British Journal of Educational Psychology, 48, 196-200.
Murphy, R.J.L. (1982). A further report of investigations into the reliability of marking of GCE examinations. British Journal of Educational Psychology, 52, 58-63.
Newton, P.E. (1996). The reliability of marking of General Certificate of Secondary Education scripts: Mathematics and English. British Educational Research Journal, 22, 405-420.
Newton, P.E. (2005a). The public understanding of measurement inaccuracy. British Educational Research Journal, 31(4), 419-442.
Vidal Rodeiro, C. (2007). Agreement between outcomes from different double-marking models. Research Matters: A Cambridge Assessment Publication, 4, 28-34.