    Maintaining and monitoring quality of a continuously administered digital assessment

    Digital-first assessments are a new generation of high-stakes assessments that can be taken anytime and anywhere in the world. The flexibility, complexity, and high-stakes nature of these assessments pose quality assurance challenges and require continuous data monitoring and the ability to promptly identify, interpret, and correct anomalous results. In this manuscript, we illustrate the development of a quality assurance system for anomaly detection for a new high-stakes digital-first assessment, for which the population of test takers is still in flux. Various control charts and models are applied to detect and flag abnormal changes in the assessment statistics, which are then reviewed by experts. The procedure for determining the causes of a score anomaly is demonstrated with a real-world example. Several categories of statistics, including scores, test taker profiles, repeaters, item analysis, and item exposure, are monitored to provide context and evidence for evaluating score anomalies as well as to assure the quality of the assessment. The monitoring results and alerts are updated automatically and delivered daily via an interactive dashboard.
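
    As a minimal, hypothetical sketch of the kind of control-chart check described above (not the authors' actual system), daily mean scores can be monitored with an individuals Shewhart chart; the baseline window, three-sigma rule, and data below are illustrative assumptions.

        import numpy as np

        def shewhart_individuals(daily_means, baseline_days=30, k=3.0):
            """Flag days whose mean score falls outside k-sigma control limits.

            Limits are estimated from an initial baseline window, a common
            choice for individuals charts; a production system would be
            more elaborate than this sketch.
            """
            baseline = np.asarray(daily_means[:baseline_days], dtype=float)
            center = baseline.mean()
            # Moving-range estimate of sigma, standard for individuals charts
            sigma = np.mean(np.abs(np.diff(baseline))) / 1.128
            ucl, lcl = center + k * sigma, center - k * sigma
            flags = [(day, m) for day, m in enumerate(daily_means)
                     if m > ucl or m < lcl]
            return center, (lcl, ucl), flags

        # Hypothetical daily mean scores with an anomalous spike on the last day
        rng = np.random.default_rng(0)
        scores = list(rng.normal(110, 2, 60)) + [125.0]
        center, limits, flags = shewhart_individuals(scores)
        print(f"center={center:.1f}, limits={limits[0]:.1f}..{limits[1]:.1f}, flagged={flags}")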

    The Expanded Evidence-Centered Design (e-ECD) for Learning and Assessment Systems: A Framework for Incorporating Learning Goals and Processes Within Assessment Design

    Evidence-centered design (ECD) is a framework for the design and development of assessments that ensures consideration and collection of validity evidence from the outset of test design. Blending learning and assessment requires integrating aspects of learning at the same level of rigor as aspects of testing. In this paper, we describe an expansion of the ECD framework (termed e-ECD) that includes specifications of the relevant aspects of learning at each of the three core models in the ECD, as well as making room for specifying the relationship between learning and assessment within the system. The framework proposed here does not assume a specific learning theory or particular learning goals; rather, it allows for their inclusion within an assessment framework, such that they can be articulated by researchers or assessment developers who wish to focus on learning.
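
    Purely as an illustration of the framework's structure (not an implementation from the paper), the three core ECD models, student, evidence, and task, each extended with a learning-side specification, could be represented as simple data classes; all field names below are hypothetical.

        from dataclasses import dataclass, field

        @dataclass
        class StudentModel:
            proficiencies: list[str]        # constructs measured by the assessment
            learning_goals: list[str] = field(default_factory=list)  # e-ECD addition

        @dataclass
        class EvidenceModel:
            scoring_rules: dict[str, str]   # observable -> proficiency links
            learning_evidence: dict[str, str] = field(default_factory=dict)  # e-ECD addition

        @dataclass
        class TaskModel:
            task_features: dict[str, str]   # characteristics of the task situation
            learning_supports: list[str] = field(default_factory=list)  # e-ECD addition

        # Hypothetical e-ECD specification for a collaborative problem solving module
        student = StudentModel(["CPS_social", "CPS_cognitive"], ["improve turn-taking"])
        task = TaskModel({"environment": "maze", "agent": "virtual"}, ["hint after 2 failures"])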

    Innovative assessment of collaboration

    ix, 330 pages: illustrations; 24 cm

    Examining the Impact of Covariates on Anchor Tests to Ascertain Quality Over Time in a College Admissions Test

    We propose a comprehensive procedure for implementing a quality control process for anchor tests in a college admissions test with multiple consecutive administrations. We propose to examine the anchor tests and their items in connection with covariates to investigate whether there was any unusual behavior in the anchor test results over time and whether the test results differed for different groups of test takers. Descriptive statistics, ANOVA, and linear mixed-effects models were used to examine the impact of covariates on the anchor test over time. Descriptive statistics and logistic regression were used to examine the quality of each anchor item over time. It is concluded that examining the impact of test takers' covariates at different administrations can help in providing fair tests for all test takers. The results are discussed with recommendations for how the procedure can be used in other large testing programs.
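
    As a hedged illustration of the kinds of analyses named above (not the authors' code), a linear mixed-effects model of anchor scores and an item-level logistic regression could be fit with statsmodels; the synthetic data, column names, and the random-effects grouping are all assumptions.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Synthetic long-format data standing in for real anchor-test records:
        # one row per test taker, with administration and a covariate
        rng = np.random.default_rng(0)
        n = 600
        df = pd.DataFrame({
            "admin": rng.choice(["2019A", "2019B", "2020A"], n),
            "gender": rng.choice(["F", "M"], n),
        })
        df["score"] = 20 + (df["gender"] == "F") * 0.5 + rng.normal(0, 3, n)

        # Linear mixed-effects model: covariate fixed effects with a random
        # intercept per administration (this grouping choice is an assumption;
        # the paper's actual random-effects structure may differ)
        mixed = smf.mixedlm("score ~ gender", df, groups=df["admin"]).fit()
        print(mixed.summary())

        # Anchor-item quality over time: logistic regression of a 0/1 item
        # response on administration and covariates, one model per anchor item
        df["correct"] = (rng.random(n) < 0.7).astype(int)
        item_fit = smf.logit("correct ~ C(admin) + gender", df).fit()
        print(item_fit.params)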

    Computerized adaptive and multistage testing with R: using packages catR and mstR

    The goal of this guide and manual is to provide a practical, brief overview of the theory of computerized adaptive testing (CAT) and multistage testing (MST) and to illustrate the methodologies and applications using the open-source R language and several data examples. Implementation relies on the R packages catR and mstR, which have already been, or are still being, developed by the first author and his team, and which include some of the newest research algorithms on the topic. The book covers many topics along with the R code: the basics of R, a theoretical overview of CAT and MST, CAT designs, CAT assembly methodologies, CAT simulations, the catR package, CAT applications, MST designs, IRT-based MST methodologies, tree-based MST methodologies, the mstR package, and MST applications.

    CAT has been used in many large-scale assessments over recent decades, and MST has become very popular in recent years. R has likewise become one of the most useful tools for applications in almost all fields, including business and education, yet it is a difficult language to master, with a steep learning curve. Given the clear need for CAT and MST and the complexity of implementing them, it has been very difficult for users to simulate or implement either design. Until this manual, there has been no book that lets users design and run CAT and MST easily and without expense, that is, using free R software. All examples and illustrations are generated with predefined R scripts, available for free download from the book's website. The book provides exhaustive descriptions of CAT and MST processes in an R environment, guides users through simulating and implementing CAT and MST for their own applications, and summarizes the latest developments and challenges of the catR and mstR packages.

    David Magis, PhD, is Research Associate of the "Fonds de la Recherche Scientifique – FNRS" at the Department of Education, University of Liège, Belgium. His specialization is statistical methods in psychometrics, with special interests in item response theory, differential item functioning, and computerized adaptive testing. His research interests include theoretical and methodological development as well as open-source implementation and dissemination in R. He is the main developer and maintainer of the catR and mstR packages, among others.

    Duanli Yan, PhD, is Manager of Data Analysis and Computational Research for the Automated Scoring group in the Research and Development division at Educational Testing Service (ETS). She is also an Adjunct Professor at Rutgers University. Dr. Yan has been the statistical coordinator for the EXADEP™ test and the TOEIC® Institutional programs, a Development Scientist for innovative research applications, and a Psychometrician for several operational programs. She has received many awards, including the 2011 ETS Presidential Award, the 2013 NCME Brenda Loyd Award, and the 2015 IACAT Early Career Award. She is a co-editor of Computerized Multistage Testing: Theory and Applications and a co-author of Bayesian Networks in Educational Assessment.

    Alina A. von Davier, PhD, is Senior Research Director of the Computational Psychometrics Research Center at Educational Testing Service (ETS) and an Adjunct Professor at Fordham University. At ETS she leads the Computational Psychometrics Research Center, where she is responsible for developing a team of experts and a psychometric research agenda in support of next-generation assessments. Computational psychometrics, which includes machine learning and data mining techniques, Bayesian inference methods, stochastic processes, and psychometric models, is the main set of tools employed in her current work. She also works with psychometric models applied to educational testing: test score equating methods, item response theory models, and adaptive testing.
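
    The book's simulations run through catR and mstR in R; as a language-neutral sketch of the core CAT loop it describes (select the most informative item, score it, update ability), here is a minimal 2PL example in Python. The item bank, starting values, and fixed-length stopping rule are made up for illustration.

        import numpy as np

        def p_2pl(theta, a, b):
            """2PL probability of a correct response."""
            return 1.0 / (1.0 + np.exp(-a * (theta - b)))

        def next_item(theta, a, b, used):
            """Pick the unused item with maximum Fisher information at theta:
            I(theta) = a^2 * P * (1 - P) under the 2PL model."""
            p = p_2pl(theta, a, b)
            info = a**2 * p * (1.0 - p)
            if used:
                info[used] = -np.inf
            return int(np.argmax(info))

        def eap_theta(responses, items, a, b, grid=np.linspace(-4, 4, 161)):
            """EAP ability estimate with a standard-normal prior."""
            like = np.ones_like(grid)
            for u, j in zip(responses, items):
                p = p_2pl(grid, a[j], b[j])
                like *= p if u == 1 else (1.0 - p)
            post = like * np.exp(-grid**2 / 2)
            return float(np.sum(grid * post) / np.sum(post))

        # Hypothetical 200-item 2PL bank and a simulated examinee at theta = 1.0
        rng = np.random.default_rng(1)
        a, b = rng.uniform(0.8, 2.0, 200), rng.normal(0, 1, 200)
        theta_hat, used, resp = 0.0, [], []
        for _ in range(20):                                  # fixed test length
            j = next_item(theta_hat, a, b, used)
            u = int(rng.random() < p_2pl(1.0, a[j], b[j]))   # simulate response
            used.append(j); resp.append(u)
            theta_hat = eap_theta(resp, used, a, b)
        print(f"final EAP estimate: {theta_hat:.2f}")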

    A note on the Poisson's binomial distribution in Item Response Theory

    The Poisson's binomial (PB) is the probability distribution of the number of successes in independent but not necessarily identically distributed binary trials. The independent non-identically distributed case emerges naturally in the field of item response theory, where answers to a set of binary items are conditionally independent given the level of ability, but with different probabilities of success. In many applications, the number of successes represents the score obtained by individuals, and the compound binomial (CB) distribution has been used to obtain score probabilities. It is shown here that the PB and the CB distributions lead to equivalent probabilities. Furthermore, one of the proposed algorithms for calculating the PB probabilities coincides exactly with the well-known Lord and Wingersky (LW) algorithm for CBs. Surprisingly, we could not find any reference in the psychometric literature pointing to this equivalence. In a simulation study, different methods to calculate the PB distribution are compared with the LW algorithm. Providing an exact alternative to the traditional LW approximation for obtaining score distributions is a contribution to the field.
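
    The Lord and Wingersky recursion mentioned above is short enough to state directly: given each item's conditional probability of success, it builds the score distribution one item at a time. A minimal version, with made-up item probabilities:

        import numpy as np

        def lord_wingersky(p):
            """Score distribution for independent items with success probs p.

            After processing all items, probs[k] is the probability of exactly
            k successes; this is equally the Poisson binomial pmf, which is
            the equivalence the paper highlights.
            """
            probs = np.array([1.0])
            for pi in p:
                probs = (np.append(probs, 0.0) * (1.0 - pi)
                         + np.append(0.0, probs) * pi)
            return probs

        # Illustrative conditional success probabilities for five items
        p = [0.9, 0.7, 0.6, 0.4, 0.2]
        pmf = lord_wingersky(p)
        print(pmf, pmf.sum())   # pmf over scores 0..5, sums to 1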

    Computational Psychometrics for the Measurement of Collaborative Problem Solving Skills

    This paper describes a psychometrically based approach to the measurement of collaborative problem solving skills, by mining and classifying behavioral data both in real time and in post-game analyses. The data were collected from a sample of middle school children who interacted with a game-like, online simulation of collaborative problem solving tasks. In this simulation, a user is required to collaborate with a virtual agent to solve a series of tasks within a first-person maze environment. The tasks were developed following the psychometric principles of Evidence Centered Design (ECD) and are aligned with the Holistic Framework developed by ACT. The analyses presented in this paper are an application of an emerging discipline called computational psychometrics, which is growing out of traditional psychometrics and incorporates techniques from educational data mining, machine learning, and other computer/cognitive science fields. In the real-time analysis, our aim was to start with limited knowledge of skill mastery and then demonstrate a form of continuous Bayesian evidence tracing that updates sub-skill level probabilities as new conversation flow event evidence is presented. This is performed using Bayes' rule and conversation item conditional probability tables. The items are polytomous, and each response option has been tagged with a skill at a performance level. In our post-game analysis, our goal was to discover unique gameplay profiles by performing a cluster analysis of users' sub-skill performance scores based on their patterns of selected dialog responses.
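
    A hedged sketch of the Bayes-rule update the abstract describes: a prior over mastery of one sub-skill is updated from a conditional probability table linking mastery to the selected dialog option. The table values and option indices below are invented for illustration.

        import numpy as np

        # Hypothetical CPT: P(selected option | mastery state) for one polytomous
        # conversation item; rows = not-mastered / mastered, columns = options A-C
        cpt = np.array([[0.6, 0.3, 0.1],    # not mastered
                        [0.1, 0.2, 0.7]])   # mastered

        def update_mastery(prior, option, cpt):
            """One step of Bayesian evidence tracing via Bayes' rule:
            posterior(state) is proportional to prior(state) * P(option | state)."""
            post = prior * cpt[:, option]
            return post / post.sum()

        # Start with limited knowledge (uniform prior), observe options C then B
        belief = np.array([0.5, 0.5])
        for option in (2, 1):
            belief = update_mastery(belief, option, cpt)
            print(f"P(mastered) = {belief[1]:.3f}")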

    Gamified performance assessment of collaborative problem solving skills

    In this paper we introduce a game-based approach for Collaborative Problem Solving (CPS) skills assessment and provide preliminary evidence from a validation pilot study. To date, educational assessments have focused more heavily on the concrete and accessible aspects of CPS, with diminished representation of its social aspects. We addressed this issue through the integration of our CPS construct into the game-based assessment "Circuit Runner," in which participants interact with a virtual agent to solve a series of challenges in a first-person maze environment (von Davier, 2017). Circuit Runner provides an environment that allows for controlled interdependence between a user and a virtual agent, facilitating the demonstration of the broad range of cognitive and social skills required for effective CPS. Tasks are designed to incorporate telemetry-based data (e.g., log file, clickstream, interaction-based) and item response data to provide a more comprehensive measure of CPS skills. Our study included 500 participants on Amazon Mechanical Turk, who completed Circuit Runner, pre- and post-game surveys, and a CPS situational judgment test (CPS-SJT). These elements, in conjunction with the game play, allowed for an expanded exploration of CPS skills with different modalities and types of instruments. The findings support and extend efforts to provide a stronger theoretical and empirical foundation for insights regarding CPS as a skillset, as well as the design of scalable game-based CPS assessments.