71 research outputs found
How Do American Students Measure Up? Making Sense of International Comparisons
Self-Monitoring Assessments for Educational Accountability Systems
Test-based accountability is now the cornerstone of U.S. education policy, and it is becoming more important in many other nations as well. Educators sometimes respond to test-based accountability in ways that produce score inflation. In the past, score inflation has usually been evaluated by comparing trends in scores on a high-stakes test to trends on a lower-stakes audit test. However, separate audit tests are often unavailable, and their use has several important drawbacks, such as potential bias from motivational differences. As an alternative, we propose self-monitoring assessments (SMAs) that incorporate audit components into operational high-stakes assessments. This paper provides a framework for designing SMAs. It describes five specific SMA designs that could be incorporated into the non-equivalent groups anchor test linking approaches used by most large-scale assessments and discusses analytical issues that would arise in their use.
Sensitivity of School-Performance Ratings to Scaling Decisions
Policymakers usually leave decisions about scaling the scores used for accountability to their appointed technical advisory committees and the testing contractors. However, scaling decisions can have an appreciable impact on school ratings (Briggs & Weeks, 2009). Using middle-school data from New York State, we examined the consistency of school ratings based on two scaling approaches that differed in scaling decisions that are important in high-stakes testing contexts. We found that, depending on subject, grade, and year, a switch in scaling approach led to (1) average absolute shifts in ranks of between 50 and 132 positions (median = 69), which are appreciable shifts for a listing of 1,243 schools; and (2) between 7% and 45% (average = 20%) of schools experiencing shifts in assigned performance bands, depending on the classification scheme. Further, the effect of scaling approach was larger when the raw-score distribution had a more severe ceiling effect, and in these cases it was driven primarily by the difference between the two scaling approaches in the location of the highest obtainable scale score.
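The two summary statistics in this abstract (average absolute rank shift and the share of schools changing performance band) can be illustrated with a minimal sketch. The function names, the tie-breaking rule, and the equal-sized rank-based bands are assumptions made for illustration only; the abstract does not specify how the authors ranked schools or defined bands.

```python
def ranks(scores):
    """Rank schools 1..n by descending score (ties broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result


def mean_abs_rank_shift(scores_a, scores_b):
    """Average absolute change in rank when switching from scaling A to scaling B."""
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    return sum(abs(a - b) for a, b in zip(ranks_a, ranks_b)) / len(ranks_a)


def share_changing_band(scores_a, scores_b, n_bands):
    """Fraction of schools whose band differs between the two scalings.

    Bands are hypothetical equal-sized groups formed from ranks.
    """
    n = len(scores_a)
    bands_a = [(r - 1) * n_bands // n for r in ranks(scores_a)]
    bands_b = [(r - 1) * n_bands // n for r in ranks(scores_b)]
    return sum(a != b for a, b in zip(bands_a, bands_b)) / n


# Toy data: four schools scored under two hypothetical scalings.
scaling_a = [10, 20, 30, 40]
scaling_b = [10, 40, 30, 20]
print(mean_abs_rank_shift(scaling_a, scaling_b))
print(share_changing_band(scaling_a, scaling_b, n_bands=2))
```

In this toy example two of the four schools swap places, so both the rank shift and the band-change share are nonzero even though the underlying raw ordering changed for only two schools.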
Adapting Educational Measurement to the Demands of Test-Based Accountability
Accountability has become a primary function of large-scale testing in the U.S. The pressure on educators to raise scores is vastly greater than it was several decades ago. Research has shown that high-stakes testing can generate behavioral responses that inflate scores, often severely. I argue that because of these responses, using tests for accountability necessitates major changes in the practices of educational measurement. The needed changes span the entire testing endeavor. This paper addresses implications for design, linking, and validation. It offers suggestions about possible new approaches and calls for research evaluating them.
The roots of score inflation: An examination of opportunities in two states' tests
The Effects of High-Stakes Testing On Achievement: Preliminary Findings About Generalization Across Tests
Testing and Diversity in Postsecondary Education: The Case of California
The past several years have seen numerous efforts to scale back or eliminate affirmative action in postsecondary admissions. In response, policymakers and postsecondary institutions in many states are searching for ways to maintain the diversity of student populations without resorting to a prohibited focus on race. Against this backdrop, this study used data from California and a simplified model of the University of California admissions process to explore how various approaches to admissions affect the diversity of the admitted student population. "Race-neutral" admissions based solely on test scores and grades were compared with the results of actual admissions before and after the elimination of affirmative action. A final set of analyses explored the effects on diversity of alternative approaches that take into account factors other than grades and scores, but not race or ethnicity. Replacing the former admissions process that included preferences with a race-neutral model based solely on GPA and SAT-I scores substantially reduced minority representation at the two most selective UC campuses but had much smaller effects at the other six, less selective campuses. SAT-I scores contributed to but were not the sole cause of the underrepresentation of African American and Hispanic students. A race-neutral model based solely on GPA also produced an underrepresentation of minorities, albeit a less severe one. None of the alternative admissions models analyzed could replicate the composition of the student population that was in place before the termination of affirmative action in California. The only approach that substantially increased the representation of minority students was accepting most students on the basis of within-school rather than statewide rankings, and this approach caused a sizable drop in both the average SAT scores and the average GPA of admitted applicants, particularly among African American and Hispanic students.
Although admissions systems differ, the basic findings of this study are likely to apply at a general level to many universities and underscore the difficulty of providing proportional representation for underserved minority students at highly selective institutions without explicit preferences.
Auditing for Score Inflation Using Self-Monitoring Assessments: Findings from Three Pilot Studies
Research has shown that test-based accountability programs often produce score inflation. Most studies have evaluated inflation by comparing trends on a high-stakes test and a lower-stakes audit test. However, Koretz and Béguin (2010) noted the weaknesses of using external audit tests and suggested instead using self-monitoring assessments (SMAs), which incorporate into high-stakes tests audit items that are not susceptible to test preparation aimed at more predictable items. This paper reports the results of the first three trials of the SMA approach, evaluating whether SMAs can detect inflation in a context in which it has been demonstrated to exist. The studies were conducted with the New York State mathematics tests in grades 4, 7, and 8 in 2011 and 2012. Despite a severe conservative bias created by numerous aspects of the study designs, we found that the audit component functioned as expected in many of the trials. The difference in performance between nonaudit and audit items was associated with factors that earlier research showed to be related to test preparation and score inflation, such as "bubble-student" status (scoring just below the Proficient cut in the previous year) and school poverty. However, a number of trials yielded null findings. These findings underscore the need for additional research investigating the optimal characteristics of audit items.
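The core quantity in this abstract, the difference in performance between nonaudit and audit items, can be sketched as follows. This is only an illustration under stated assumptions: the data layout, the item identifiers, the group labels, and the use of simple proportion correct are all hypothetical, and the abstract does not describe the statistical models the authors actually used. A raw gap also reflects differences in item difficulty, so the comparison of interest is how the gap varies across groups, not its absolute size.

```python
def proportion_correct(responses, item_ids):
    """Share of the listed items answered correctly (responses map item id -> 0 or 1)."""
    return sum(responses[item] for item in item_ids) / len(item_ids)


def nonaudit_minus_audit(responses, nonaudit_items, audit_items):
    """Per-student gap: performance on nonaudit items minus performance on audit items."""
    return (proportion_correct(responses, nonaudit_items)
            - proportion_correct(responses, audit_items))


def mean_gap_by_group(students, nonaudit_items, audit_items):
    """Average gap per group label; students is a list of (group, responses) pairs."""
    gaps = {}
    for group, responses in students:
        gap = nonaudit_minus_audit(responses, nonaudit_items, audit_items)
        gaps.setdefault(group, []).append(gap)
    return {group: sum(values) / len(values) for group, values in gaps.items()}


# Toy data: hypothetical item ids and group labels.
nonaudit = ["n1", "n2"]
audit = ["a1", "a2"]
students = [
    ("bubble", {"n1": 1, "n2": 1, "a1": 0, "a2": 1}),
    ("other", {"n1": 1, "n2": 0, "a1": 1, "a2": 0}),
]
print(mean_gap_by_group(students, nonaudit, audit))
```

A larger average gap for one group than another is the kind of pattern the abstract describes as consistent with test preparation, though any real analysis would need to account for sampling error and item characteristics.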
- …