
    State-Of-The-Art Automated Essay Scoring: Competition, Results, and Future Directions from a United States Demonstration

    This article summarizes the highlights of two studies: a national demonstration that contrasted commercial vendors' performance on automated essay scoring (AES) with that of human raters, and an international competition to match or exceed commercial vendor performance benchmarks. In these studies, the automated essay scoring engines performed well on five of seven measures and approximated human rater performance on the other two. With additional validity studies, automated essay scoring appears to hold the potential to play a viable role in high-stakes writing assessments.

    Use of microcomputers in gathering educational research data


    Contrasting state-of-the-art automated scoring of essays: analysis

    This study compared the results from nine automated essay scoring engines on eight essay prompts drawn from six states that annually administer high-stakes writing assessments. Student essays from each state were randomly divided into three sets: a training set (used for modeling the essay prompt responses and consisting of text and ratings from two human raters along with a final, or resolved, score), a second set used for a blind test of the vendor-developed models (consisting of text responses only), and a validation set that was not employed in this study. The essays encompassed writing assessment items from three grade levels (7, 8, 10) and were evenly divided between source-based prompts (i.e., essay prompts developed on the basis of provided source material) and those drawn from traditional writing genres (i.e., narrative, descriptive, persuasive). The total sample size was N = 22,029. Six of the eight essay sets were transcribed from their original handwritten responses by two transcription vendors; transcription accuracy was computed at 98.70% across 17,502 essays. The remaining essays were typed by students during the actual assessment and provided in ASCII form. Seven of the eight essays were holistically scored and one employed score assignments for two traits. Scale ranges, rubrics, and scoring adjudications for the essay sets were quite variable. Results were presented on distributional properties of the data (mean and standard deviation) along with traditional measures used in automated essay scoring: exact agreement, exact + adjacent agreement, kappa, quadratic-weighted kappa, and the Pearson r. Overall, the results demonstrated that automated essay scoring was capable of producing scores similar to human scores for extended-response writing items, with equal performance for source-based and traditional writing genres.