
    Psychometrics in Practice at RCEC

    This volume covers a broad range of topics: from combining generalizability theory and item response theory to ideas for an integrated formative use of data-driven decision making, assessment for learning, and diagnostic testing. Several chapters focus on computerized (adaptive) testing and classification testing. Other chapters treat the quality of testing in a general sense, while for topics such as maintaining standards or the testing of writing ability, test quality is addressed more specifically. All authors are connected to RCEC as researchers. Each presents one of their current research topics, providing insight into the focus of RCEC. The topics were selected and edited so that the book should be of special interest to educational researchers, psychometricians, and practitioners in educational assessment.

    Modeling of Responses and Response Times with the Package cirt

    In computerized testing, the test takers' responses as well as their response times on the items are recorded. The relationship between response times and response accuracies is complex and varies over levels of observation. For example, it takes the form of a tradeoff between speed and accuracy at the level of a fixed person but may become a positive correlation for a population of test takers. In order to explore such relationships and test hypotheses about them, a conjoint model is proposed. Item responses are modeled by a two-parameter normal-ogive IRT model and response times by a lognormal model. The two models are combined using a hierarchical framework based on the fact that response times and responses are nested within individuals. All parameters can be estimated simultaneously using an MCMC estimation approach. An R package implementing the MCMC algorithm is presented and explained.
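
    As a reference point, a minimal sketch of the conjoint model's structure is given below. The notation (a_j, b_j for item discrimination and difficulty, τ_i for person speed, β_j, α_j for item time intensity and precision) follows the standard hierarchical response-time literature and is an assumption here, not a quotation of the package's documentation.

    ```latex
    % Level-1 measurement models for person i and item j (notation assumed):
    \begin{align}
      \Pr(U_{ij} = 1 \mid \theta_i) &= \Phi\bigl(a_j(\theta_i - b_j)\bigr)
          && \text{(two-parameter normal-ogive model)} \\
      \ln T_{ij} \mid \tau_i &\sim \mathcal{N}\bigl(\beta_j - \tau_i,\; \alpha_j^{-2}\bigr)
          && \text{(lognormal response-time model)}
    \end{align}
    % Level 2: because responses and response times are nested within
    % individuals, the person parameters are linked by a joint distribution,
    % e.g. a bivariate normal that carries the population-level correlation
    % between speed and accuracy.
    \begin{equation}
      (\theta_i, \tau_i) \sim \mathcal{N}_2(\boldsymbol{\mu}_P, \boldsymbol{\Sigma}_P)
    \end{equation}
    ```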

    A Two-Level Adaptive Test Battery

    A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests, extended with a second level for the joint distribution of their abilities. The presentation of the model is followed by an optimized MCMC algorithm to update the posterior distribution of each of its ability parameters, select the items to Bayesian optimality, and adaptively move from one subtest to the next. Thanks to the extremely rapid convergence of the Markov chain and simple posterior calculations, the algorithm can be used in real-world applications without any noticeable latency. Finally, an empirical study with a battery of short diagnostic subtests is shown to yield score accuracies close to those of traditional one-level adaptive testing with subtests of double length.
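
    To make the item-selection step concrete, here is a minimal sketch of posterior-based item selection: the next item maximizes the Fisher information averaged over the current MCMC draws of ability. The two-parameter logistic form and the function names are illustrative assumptions; the battery described above uses its own response model and an optimized MCMC updating scheme.

    ```python
    import numpy as np

    def fisher_info_2pl(theta, a, b):
        """Fisher information of a 2PL item at ability theta."""
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return a**2 * p * (1.0 - p)

    def select_next_item(posterior_draws, a, b, administered):
        """Pick the unadministered item with the largest posterior-expected information.

        posterior_draws : 1-D array of MCMC draws of the current ability.
        a, b            : arrays of item discriminations and difficulties.
        administered    : set of item indices already used.
        """
        expected_info = np.array([
            fisher_info_2pl(posterior_draws, a[j], b[j]).mean()
            for j in range(len(a))
        ])
        expected_info[list(administered)] = -np.inf   # exclude used items
        return int(np.argmax(expected_info))

    # Toy usage: 20-item pool, 500 posterior draws of ability after a few items.
    rng = np.random.default_rng(1)
    a_pool = rng.uniform(0.8, 2.0, 20)
    b_pool = rng.normal(0.0, 1.0, 20)
    draws = rng.normal(0.3, 0.4, 500)          # stand-in for MCMC output
    print(select_next_item(draws, a_pool, b_pool, administered={2, 7}))
    ```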

    Bayesian psychometric scaling

    In educational and psychological studies, psychometric methods are involved in the measurement of constructs and in constructing and validating measurement instruments. Assessment results are typically used to measure student proficiency levels and test characteristics. Recently, Bayesian item response models have received considerable attention as a way to analyze test data and measure latent variables. Bayesian psychometric modeling makes it possible to include prior information about the assessment in addition to the information available in the observed response data. An introduction is given to Bayesian psychometric modeling, and it is shown that this approach is very flexible, provides direct estimates of student proficiencies, and depends less on asymptotic results. Various Bayesian item response models are discussed to provide insight into Bayesian psychometric scaling and the Bayesian way of making psychometric inferences. This is done according to a general multilevel modeling approach, where observations are nested within students and items, and students are nested within schools. Different examples are given to illustrate the influence of prior information, the effects of clustered response data following a PISA study, and Bayesian methods for scale construction.
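
    As an illustration of how prior information enters the estimation of a proficiency, here is a minimal sketch: a random-walk Metropolis sampler for a single student's ability under a Rasch likelihood with a normal prior. The Rasch form, the toy data, and the prior settings are assumptions made for the example; the chapter itself works with a broader family of multilevel Bayesian item response models.

    ```python
    import numpy as np

    def rasch_loglik(theta, responses, difficulties):
        """Log-likelihood of a 0/1 response vector under the Rasch model."""
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
        return np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

    def sample_theta(responses, difficulties, prior_mean, prior_sd,
                     n_iter=5000, step=0.5, seed=0):
        """Random-walk Metropolis sampler for one student's proficiency."""
        rng = np.random.default_rng(seed)
        theta, draws = prior_mean, []
        for _ in range(n_iter):
            prop = theta + step * rng.standard_normal()
            log_ratio = (rasch_loglik(prop, responses, difficulties)
                         - rasch_loglik(theta, responses, difficulties)
                         - 0.5 * ((prop - prior_mean) / prior_sd) ** 2
                         + 0.5 * ((theta - prior_mean) / prior_sd) ** 2)
            if np.log(rng.uniform()) < log_ratio:
                theta = prop
            draws.append(theta)
        return np.array(draws[1000:])          # discard burn-in

    x = np.array([1, 1, 0, 1, 0, 1, 1, 0])     # toy responses to 8 items
    b = np.linspace(-1.5, 1.5, 8)              # assumed item difficulties
    print("diffuse prior    :", sample_theta(x, b, 0.0, 10.0).mean().round(2))
    print("informative prior:", sample_theta(x, b, -1.0, 0.3).mean().round(2))
    ```

    Comparing the two runs shows the pull of an informative prior toward its mean when the response data are sparse, which is the influence the examples above are meant to illustrate.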

    Comparisons of subscoring methods in computerized adaptive testing: a simulation study

    Given the increasing demand for subscore reports, various subscoring methods and augmentation techniques have been developed with the aim of improving subscore estimates, but few studies have systematically compared these methods within the framework of computerized adaptive testing (CAT). This research conducts a simulation study to compare five subscoring methods on score estimation under varied simulated CAT conditions. Among the five methods, IND-UCAT scoring ignores the correlations among subtests, whereas the other four correlation-based scoring methods (SEQ-CAT, PC-MCAT, reSEQ-CAT, and AUG-CAT) capitalize on the correlation information in the scoring procedure. By manipulating the subtest lengths, the correlation structures, and the item selection algorithms, more comparable, pragmatic, and systematic testing scenarios are created for comparison purposes. Also, to make the best use of the information underlying the assessments, the study proposes a successive scoring procedure based on the structure of the higher-order IRT model, in which an examinee's test total score is calculated after the subscore estimation procedure has been conducted; through this successive scoring procedure, the subscores and the total score of an examinee can be derived sequentially from one test. The results indicate that in the low correlation structure, the original IND-UCAT is suggested for subscore estimation considering its ease of implementation in practice, while the proposed total score estimation procedure is not recommended given the large divergences from the true total scores. For the mixed correlation structure with two moderate correlations and one strong correlation, the original SEQ-CAT or the combination of SEQ-CAT item selection and PC-MCAT scoring should be considered, not only for subscore estimation but also for total score estimation. If the post-hoc estimation procedure is allowed, the original SEQ-CAT and the reSEQ-CAT scoring can be conducted jointly for the best score estimates. In the high correlation structure, the original PC-MCAT and the combination of PC-MCAT scoring and SEQ-CAT item selection are suggested for both subscore estimation and total score estimation. In terms of post-hoc score estimation, reSEQ-CAT scoring in conjunction with the original SEQ-CAT is strongly recommended. If the complexity of implementation is an issue in practice, reSEQ-CAT scoring jointly conducted with the original IND-UCAT can be considered for reasonable score estimates. Additionally, to compensate for the constrained use of item pools in PC-MCAT, PC-MCAT with adaptively sequenced subtests (SEQ-MCAT) is proposed for future investigation. Simplifications of the item and/or subtest selection criteria in simple-structure MCAT, PC-MCAT, and SEQ-MCAT are also pointed out for the convenience of their application in practice. Finally, the limitations of the study are discussed and directions for future research are provided.
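
    To give a sense of how a total score could be derived after the subscores have been estimated (the successive scoring idea mentioned above), here is a minimal sketch under an assumed higher-order structure in which each subtest ability loads on a general ability. The loadings, standard errors, and the GLS-style weighting are illustrative assumptions, not the dissertation's exact procedure.

    ```python
    import numpy as np

    def higher_order_total(subscores, loadings, se):
        """Combine subscore estimates into a general-ability estimate.

        Assumes a higher-order structure theta_k = lambda_k * theta_g + error
        and weights each subscore by its loading and measurement precision
        (a generalized-least-squares style estimate of theta_g).
        """
        subscores, loadings, se = map(np.asarray, (subscores, loadings, se))
        w = loadings / se**2                       # precision-scaled loadings
        return float(np.sum(w * subscores) / np.sum(w * loadings))

    # Toy example: three subtest ability estimates with their standard errors.
    theta_hat = [0.40, -0.10, 0.85]                # subscore estimates
    lam = [0.80, 0.70, 0.90]                       # assumed higher-order loadings
    se = [0.30, 0.35, 0.25]                        # subscore standard errors
    print(round(higher_order_total(theta_hat, lam, se), 3))
    ```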

    Detecting test cheating using a deterministic, gated item response theory model

    High-stakes tests are widely used as measurement tools to make inferences about test takers' proficiency, achievement, competence, or knowledge. The stakes may be directly related to test performance, such as obtaining a high-school diploma or being granted a professional license or certificate. Indirect stakes may include state accountability, where test results are partially included in course grades and also tied to resource allocations for schools and school districts. Whether direct or indirect, high stakes can create an incentive for test cheating, which, in turn, severely jeopardizes the accuracy and validity of the inferences being made. Testing agencies and other stakeholders therefore endeavor to prevent, or at least minimize, the opportunities for test cheating by using multiple spiraled test forms, minimizing item exposure, proctoring, and a variety of other preventive methods. However, even the best prevention methods cannot totally eliminate cheating. For example, even if exposure is minimized, there is still some chance for a highly motivated group of examinees to collaborate to gain prior access to the exposed test items. Cheating detection methods have therefore been developed as a complement, to monitor and identify test cheating after the fact. There is a fairly strong research base of statistical cheating detection methods; however, many existing methods are limited in applied settings. This dissertation proposes a novel statistical cheating detection model, called the Deterministic, Gated Item Response Theory Model (DGIRTM). As its name implies, the DGIRTM uses a statistical gating mechanism to decompose observed item performance into a gated mixture of a true-proficiency function and a response function due to cheating. The gating mechanism and the specific choice of parameters in the model further allow estimation of a statistical cheating effect at the level of individual examinees or groups (e.g., individuals suspected of collaborating). Extensive simulation research was carried out to demonstrate the DGIRTM's characteristics and power to detect cheating. These studies rather clearly show that this new model may significantly improve our capability to sensitively detect and proactively respond to instances of test cheating.
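
    To illustrate the gating idea, here is a minimal simulation sketch: a response is governed by true proficiency unless the gate (cheater x compromised item) is on, in which case a higher "cheating" ability takes over. The Rasch-type response function, the variable names, and the toy data are assumptions for illustration, not the DGIRTM's exact parameterization.

    ```python
    import numpy as np

    def simulate_gated_responses(theta_true, theta_cheat, b, compromised,
                                 cheater, seed=0):
        """Simulate 0/1 responses from a gated mixture (a generic sketch).

        For a cheater answering a compromised item, the success probability is
        driven by the (higher) cheating ability; otherwise by true proficiency.
        A Rasch-type response function is assumed for simplicity.
        """
        rng = np.random.default_rng(seed)
        n_persons, n_items = len(theta_true), len(b)
        # gate[i, j] = 1 when person i is a cheater AND item j is compromised
        gate = np.outer(cheater, compromised)
        theta_eff = np.where(gate == 1,
                             theta_cheat[:, None],     # gated: cheating ability
                             theta_true[:, None])      # otherwise: true ability
        p = 1.0 / (1.0 + np.exp(-(theta_eff - b[None, :])))
        return (rng.uniform(size=(n_persons, n_items)) < p).astype(int)

    # Toy data: 3 examinees (the last flagged), 5 items (the last two compromised).
    theta_t = np.array([0.0, 0.5, -1.0])
    theta_c = np.array([0.0, 0.5, 1.5])         # cheating ability >= true ability
    b_items = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
    print(simulate_gated_responses(theta_t, theta_c, b_items,
                                   compromised=np.array([0, 0, 0, 1, 1]),
                                   cheater=np.array([0, 0, 1])))
    ```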

    A hierarchical framework for modeling speed and accuracy on test items

    Current modeling of response times on test items has been influenced by the experimental paradigm of reaction-time research in psychology. For instance, some of the models have a parameter structure that was chosen to represent a speed-accuracy tradeoff, while others equate speed directly with response time. Other response-time models seem to be unclear as to the level of parameterization they represent. A hierarchical framework of modeling is proposed to better represent the nature of speed and accuracy on test items as well as the different levels of dependency between them. The framework allows a “plug-and-play approach” with alternative choices of response and response-time models to deal with different types of test items, as well as population and item-domain models to represent key relations between their parameters. Bayesian treatment of the framework with Markov chain Monte Carlo (MCMC) computation facilitates the approach. Use of the framework is illustrated for the choice of a normal-ogive response model, a lognormal model for the response times, and multivariate normal models for the population and item domain, with Gibbs sampling from the joint posterior distribution.
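
    To make the framework concrete, below is a minimal simulation sketch under assumed parameter values: person parameters are drawn from a bivariate normal second-level model, responses from a normal-ogive model, and response times from a lognormal model. The parameter ranges and variable names are illustrative assumptions, not values from the article.

    ```python
    import numpy as np
    from scipy.stats import norm

    def simulate_speed_accuracy(n_persons=1000, n_items=20, rho=-0.4, seed=0):
        """Simulate responses and response times from a hierarchical sketch.

        Person parameters (theta, tau) are drawn from a bivariate normal whose
        correlation rho carries the population-level speed-accuracy relation.
        Responses follow a normal-ogive model; log response times are normal
        with mean beta_j - tau_i. All parameter values are illustrative.
        """
        rng = np.random.default_rng(seed)
        cov = np.array([[1.0, rho], [rho, 1.0]])
        theta, tau = rng.multivariate_normal([0.0, 0.0], cov, n_persons).T
        a = rng.uniform(0.8, 2.0, n_items)          # discriminations
        b = rng.normal(0.0, 1.0, n_items)           # difficulties
        beta = rng.normal(4.0, 0.5, n_items)        # time intensities
        alpha = rng.uniform(1.5, 2.5, n_items)      # time precisions
        p = norm.cdf(a * (theta[:, None] - b))      # normal-ogive success prob.
        responses = (rng.uniform(size=p.shape) < p).astype(int)
        log_t = rng.normal(beta - tau[:, None], 1.0 / alpha)
        return responses, np.exp(log_t)

    u, t = simulate_speed_accuracy()
    print(u.mean().round(3), np.median(t).round(1))
    ```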