Standardized testing and school accountability

By Dylan Wiliam


This article explores the use of standardized tests to hold schools accountable. The history of testing for accountability is reviewed, and it is shown that between-school differences currently account for less than ten percent of the variance in student scores. This is partly because the progress of individuals is small compared to the spread of achievement within an age cohort, and possibly because of a lack of alignment between instruction and assessment. A review of the literature on the effects of introducing such tests in high-stakes accountability regimes suggests that the effects can be positive and that their size is substantial. Therefore, while the validity of such tests may be problematic in terms of the intended inferences, their introduction may nevertheless be justified by their impact. The paper concludes with a number of suggestions for improving tests for high-stakes accountability.
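To make the variance claim concrete, the share of score variance lying between schools can be computed from the sum-of-squares decomposition SS_total = SS_between + SS_within. The sketch below is illustrative only: the function name `between_group_share` and the three-school data are hypothetical, and this raw decomposition is not claimed to be the method used in the article (published estimates often rely on multilevel, variance-components models instead).

```python
from statistics import fmean


def between_group_share(groups: dict[str, list[float]]) -> float:
    """Return the fraction of total score variance that lies between group means.

    Hypothetical helper for illustration. Uses the population-form decomposition
    SS_total = SS_between + SS_within, with no degrees-of-freedom correction.
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = fmean(all_scores)
    # Total spread of every student's score around the grand mean.
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    # Spread of each school's mean around the grand mean, weighted by school size.
    ss_between = sum(
        len(scores) * (fmean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total


# Hypothetical data: school means differ by only 2 points,
# while scores within each school range over 40 points.
schools = {
    "A": [40, 50, 60, 70, 80],
    "B": [42, 52, 62, 72, 82],
    "C": [38, 48, 58, 68, 78],
}

share = between_group_share(schools)
print(f"Between-school share of variance: {share:.1%}")
```

With these made-up numbers the between-school share is about 1.3%, because the small differences between school means are swamped by the wide spread of scores within each school, which is the pattern the abstract describes.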

Year: 2010
DOI identifier: 10.1080/00461521003703060

