Components of the item selection algorithm in computerized adaptive testing
Computerized adaptive testing (CAT) greatly improves measurement efficiency in high-stakes testing operations through the selection and administration of test items with the difficulty level that is most relevant to each individual test taker. This paper explains the three components of a conventional CAT item selection algorithm: test content balancing, the item selection criterion, and item exposure control. Several noteworthy methodologies underlie each component. The test script method and the constrained CAT method are used for test content balancing. Item selection criteria include the maximized Fisher information criterion, the b-matching method, the a-stratification method, the weighted likelihood information criterion, the efficiency balanced information criterion, and the Kullback-Leibler information criterion. The randomesque method, the Sympson-Hetter method, the unconditional and conditional multinomial methods, and the fade-away method are used for item exposure control. Several holistic approaches to CAT use automated test assembly methods, such as the shadow test approach and the weighted deviation model. Item usage and exposure counts vary depending on the item selection criterion and the exposure control method. Finally, other important factors to consider when determining an appropriate CAT design are the computer resource requirements, the size of the item pool, and the test length. The logic of CAT is now being adopted in the field of adaptive learning, which integrates the learning aspect and the (formative) assessment aspect of education into a continuous, individualized learning experience. Therefore, the algorithms and technologies described in this review may help medical health educators and high-stakes test developers to adopt CAT more actively and efficiently.
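A minimal sketch may make the interplay of two of these components concrete. The snippet below, assuming a 2PL item bank with illustrative function names (none of this comes from the paper itself), combines the maximized Fisher information criterion with randomesque exposure control: rather than always administering the single most informative item, one item is drawn at random from the k most informative candidates.

```python
# A minimal sketch of one CAT item-selection step, assuming a 2PL item bank.
# Function names and the bank layout are illustrative, not from the paper.
import numpy as np

def fisher_info_2pl(theta, a, b):
    """Fisher information of each 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_item(theta_hat, a, b, administered, k=5, rng=None):
    """Maximized Fisher information criterion with randomesque exposure
    control: pick one item at random from the k most informative
    not-yet-administered items."""
    rng = rng or np.random.default_rng()
    info = fisher_info_2pl(theta_hat, a, b)
    info[list(administered)] = -np.inf   # mask items already given
    top_k = np.argsort(info)[-k:]        # k most informative candidates
    return rng.choice(top_k)

# Example: a 100-item bank, interim ability estimate of 0.3
rng = np.random.default_rng(1)
a, b = rng.uniform(0.5, 2.0, 100), rng.normal(0.0, 1.0, 100)
print(select_item(0.3, a, b, administered={4, 17}, rng=rng))
```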
Extension of the item pocket method allowing for response review and revision to a computerized adaptive test using the generalized partial credit model
The use of Computerized Adaptive Testing (CAT) has increased over the last few decades, due in part to the increased availability of personal computers, but also to the benefits CATs offer. CATs provide increased measurement precision of ability estimates while placing less demand on examinees through shorter tests. This is accomplished by tailoring the test to each examinee, selecting items that are neither too difficult nor too easy based on the examinee's interim ability estimate and responses to previous items. These benefits come at the cost of the flexibility to move through the test as an examinee would with a paper-and-pencil (P&P) test. The algorithms used in CATs for item selection and ability estimation require restrictions on response review and revision; however, a large portion of examinees desire options for reviewing and revising responses (Vispoel, Clough, Bleiler, Hendrickson, and Ihrig, 2002). Previous research on response review and revision in CATs has examined only limited options, typically restricted to review after all items had been administered. The Item Pocket (IP) method (Han, 2013) allows for response review and revision during the test, relaxing these restrictions while maintaining an acceptable level of measurement precision. This is achieved by creating an item pocket into which items are placed; pocketed items are excluded from the interim ability estimation and item selection procedures. The initial simulation study by Han (2013) investigated the IP method using a dichotomously scored, fixed-length test. The findings indicated that the IP method does not substantially decrease measurement precision, and bias in the ability estimates was within acceptable ranges for operational tests.
This simulation study extended the IP method to a CAT with polytomously scored items under the Generalized Partial Credit model, with exposure control and content balancing. The IP method was implemented in tests with three IP sizes (2, 3, and 4), two termination criteria (fixed and variable), two test lengths (15 and 20), and two item completion conditions (forced to answer and ignored) for items remaining in the IP at the end of the test. Additionally, four traditional CAT conditions, without the IP method, were included in the design. Results showed that the longer, 20-item IP conditions using the forced-answer method had higher measurement precision, with higher mean correlations between known and estimated theta and lower mean bias and RMSE, and measurement precision increased as IP size increased. The two item completion conditions (forced to answer and ignored) resulted in similar measurement precision. The variable-length IP conditions resulted in measurement precision comparable to the corresponding fixed-length IP conditions. The implications of the findings and the limitations, with suggestions for future research, are also discussed.
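To make the mechanism concrete, here is a minimal sketch of the IP idea for the simpler dichotomous 2PL case (the study itself uses the Generalized Partial Credit model); the key point, that pocketed items are excluded from the interim ability estimate, carries over. All names and values are illustrative assumptions.

```python
# A minimal sketch of the Item Pocket idea, assuming dichotomous 2PL items:
# responses to pocketed items are excluded from the interim ability estimate,
# so the examinee may revisit and revise them later.
import numpy as np

def interim_theta(responses, a, b, pocket, grid=np.linspace(-4, 4, 161)):
    """Grid-search ML estimate of theta, ignoring items in the pocket."""
    keep = [i for i in responses if i not in pocket]
    if not keep:
        return 0.0                          # fall back to the prior mean
    aa = np.array([a[i] for i in keep])
    bb = np.array([b[i] for i in keep])
    uu = np.array([responses[i] for i in keep])
    p = 1.0 / (1.0 + np.exp(-aa * (grid[:, None] - bb)))
    loglik = (uu * np.log(p) + (1 - uu) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

# Items 3 and 8 sit in the pocket, so they do not drive item selection yet.
a = {1: 1.2, 3: 0.9, 5: 1.5, 8: 1.1}
b = {1: -0.5, 3: 0.2, 5: 0.8, 8: 0.0}
responses = {1: 1, 3: 0, 5: 1, 8: 1}
print(interim_theta(responses, a, b, pocket={3, 8}))
```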
Introduction to the LIVECAT web-based computerized adaptive testing platform
This study introduces LIVECAT, a web-based computerized adaptive testing platform. The platform provides many functions, including writing item content, managing an item bank, creating and administering a test, reporting test results, and providing information about a test and its examinees. LIVECAT gives examination administrators an easy and flexible environment for composing and managing examinations. It is available at http://www.thecatkorea.com/. Several tools were used to program LIVECAT: operating system, Amazon Linux; web server, nginx 1.18; WAS, Apache Tomcat 8.5; database, Amazon RDS (MariaDB); and languages, Java 8, HTML5/CSS, JavaScript, and jQuery. The LIVECAT platform can implement several item response theory (IRT) models, such as the Rasch and 1-, 2-, and 3-parameter logistic models, and the administrator can choose a specific model for test construction. Multimedia data such as images, audio files, and movies can be uploaded to items. Two scoring methods (maximum likelihood estimation and expected a posteriori) are available, and the maximum Fisher information item selection method is applied to every IRT model in LIVECAT. The LIVECAT platform showed equal or better performance compared with a conventional test platform. It enables users without psychometric expertise to easily implement and perform computerized adaptive testing at their institutions. The most recent LIVECAT version provides only a dichotomous item response model and the basic components of CAT; in the near future, LIVECAT will add advanced functions such as polytomous item response models, the weighted likelihood estimation method, and content balancing methods.
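As a rough illustration of one of the two scoring methods named above, the sketch below computes an expected a posteriori (EAP) estimate under an assumed 2PL model with a standard normal prior; it is a generic textbook formulation, not LIVECAT's internal code.

```python
# A minimal sketch of expected a posteriori (EAP) scoring, assuming a 2PL
# model and a standard normal prior. Grid size and bank are illustrative.
import numpy as np

def eap(responses, a, b, n_quad=61):
    """EAP ability estimate: posterior mean over a quadrature grid."""
    theta = np.linspace(-4, 4, n_quad)
    prior = np.exp(-0.5 * theta**2)                    # N(0,1), unnormalized
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    lik = np.prod(np.where(responses == 1, p, 1 - p), axis=1)
    post = prior * lik
    return np.sum(theta * post) / np.sum(post)

a = np.array([1.0, 1.4, 0.8])
b = np.array([-0.3, 0.5, 1.1])
print(eap(np.array([1, 1, 0]), a, b))   # posterior mean of theta
```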
Nonlinear sequential designs for logistic item response theory models with applications to computerized adaptive tests
Computerized adaptive testing is becoming increasingly popular due to
advancement of modern computer technology. It differs from the conventional
standardized testing in that the selection of test items is tailored to
individual examinee's ability level. Arising from this selection strategy is a
nonlinear sequential design problem. We study, in this paper, the sequential
design problem in the context of the logistic item response theory models. We
show that the adaptive design obtained by maximizing the item information leads
to a consistent and asymptotically normal ability estimator in the case of the
Rasch model. Modifications to the maximum information approach are proposed for
the two- and three-parameter logistic models. Similar asymptotic properties are
established for the modified designs and the resulting estimator. Examples are
also given in the case of the two-parameter logistic model to show that without
such modifications, the maximum likelihood estimator of the ability parameter
may not be consistent.Comment: Published in at http://dx.doi.org/10.1214/08-AOS614 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
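For reference, the information quantity being maximized has a simple closed form under these models. The display below gives the standard 2PL item response function and item information in our own notation, not the paper's (the Rasch model is the special case a_j = 1); the paper's modifications for the two- and three-parameter models alter the selection rule, not this formula.

```latex
% Standard 2PL item response function and item information (our notation).
% Maximum-information selection at step k administers the unadministered
% item maximizing I_j at the interim ability estimate.
\[
  P_j(\theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}}, \qquad
  I_j(\theta) = a_j^2 \, P_j(\theta)\bigl(1 - P_j(\theta)\bigr),
\]
\[
  j_k = \arg\max_{j \notin S_{k-1}} I_j\bigl(\hat{\theta}_{k-1}\bigr).
\]
```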
Development and Simulation Testing of a Computerized Adaptive Measure of Communicative Functioning in Aphasia
Computerized adaptive testing (CAT), based on the mathematical framework of item response theory (IRT), has increasingly been implemented in patient-reported outcome measures over the past decade (Fries, Bruce, & Cella, 2005). Given a calibrated item pool fit by an appropriate IRT measurement model, a CAT can produce reliable ability estimates more efficiently than traditional paper-and-pencil tests by administering items that are most informative given the examinee's estimated ability level (Wainer, 2000). As conventional measures employed in the measurement of aphasia were developed under traditional measurement theory, many of these measures are long and inefficient, and are consequently unsuitable for regular clinical care. In addition, these conventional measures often fail to meet the needs of many community-dwelling stroke survivors whose impairments fall outside the range reliably measured by these tests (Doyle et al., 2012). IRT-based patient-reported outcome measures, and CAT-based measures in particular, offer the possibility of substantial improvements in measurement technology for persons with aphasia.
A comparison of three statistical testing procedures for computerized classification testing with multiple cutscores and item selection methods
Computerized classification tests (CCT) have been used in high-stakes assessment settings where the express purpose of the testing is to assign a classification decision (e.g., pass/fail). One key feature of sequential probability ratio test (SPRT)-type procedures is that items are selected to maximize information around the cutscore region of the examinee ability distribution, as opposed to common CAT designs in which items are selected to maximize information at examinees' interim estimates. Previous research has examined the effectiveness of computerized adaptive tests (CAT) utilizing classification testing procedures with a single cutscore as well as with multiple cutscores (e.g., below basic/proficient/advanced). Several variations of the SPRT procedure have been advanced recently, including a generalized likelihood ratio (GLR). While the GLR procedure has shown evidence of improved average test length while reasonably maintaining classification accuracy, it also introduces unnecessary error. The purpose of this dissertation was to propose and investigate the functionality of a modified GLR procedure that does not incorporate the unnecessary error inherent in the GLR procedure. Additionally, this dissertation explored the use of multiple cutscores and of ability-based item selection. This dissertation investigated the performance of three classification procedures (SPRT, GLR, and modified GLR), multiple cutscores, and two test lengths. An additional set of conditions was developed in which an ability-based item selection method was used with the modified GLR. A simulation study was performed to gather evidence of the effectiveness and efficiency of the modified GLR procedure by comparing it to the SPRT and GLR procedures. The study found that the GLR and mGLR procedures were able to yield shorter test lengths, as anticipated. Additionally, the mGLR procedure using ability-based item selection produced even shorter test lengths than the cutscore-based mGLR method. Overall, the classification accuracies of the procedures were reasonably close. Examination of conditional classification accuracy in the multiple-cutscore conditions showed unexpectedly low values for each of the procedures. Implications and future research are discussed herein.
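A minimal sketch of the basic SPRT decision rule around a single cutscore may help fix ideas; the GLR variants discussed in the dissertation replace the two fixed evaluation points with maxima over likelihood regions, which this sketch does not attempt. The model (2PL), names, and error rates below are illustrative assumptions.

```python
# A minimal sketch of the SPRT classification rule around one cutscore,
# assuming a 2PL bank; theta_c is the cutscore and delta the half-width
# of the indifference region. All names and values are illustrative.
import numpy as np

def sprt_decision(responses, a, b, theta_c, delta=0.2, alpha=0.05, beta=0.05):
    """Compare the likelihood at theta_c + delta vs. theta_c - delta."""
    def loglik(theta):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    llr = loglik(theta_c + delta) - loglik(theta_c - delta)
    upper = np.log((1 - beta) / alpha)   # accept "above the cutscore"
    lower = np.log(beta / (1 - alpha))   # accept "below the cutscore"
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "fail"
    return "continue testing"

a = np.array([1.1, 0.9, 1.3, 1.0])
b = np.array([0.0, 0.4, -0.2, 0.1])
print(sprt_decision(np.array([1, 1, 1, 0]), a, b, theta_c=0.0))
```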
Optimal test designs with content balancing and variable target information functions as constraints.
Optimal test design involves the application of an item selection heuristic to construct a test that fits a target information function, so that the standard error of the test can be controlled at different regions of the ability continuum. This real-data simulation study assessed the efficiency of binary programming in optimal item selection by comparing, against a manual heuristic, the degree to which the obtained test information approximated different target information functions. The effects of imposing a content balancing constraint were studied in conventional, two-stage, and adaptive tests designed using the automated procedure. Results showed that the automated procedure improved upon the manual procedure significantly when a uniform target information function was used. However, when a peaked target information function was used, the improvement over the manual procedure was marginal. Both procedures were affected by the distribution of the item parameters in the item pool. The degree to which the examinees' empirical scores were recovered was lower when a content balancing constraint was imposed in the conventional test designs. The effect of an uneven item parameter distribution in the item pool was shown by the poorer recovery of the empirical scores at the higher regions of the ability continuum. Two-stage tests were shown to limit the effects of content balancing. Content-balanced adaptive tests using optimal item selection were shown to be efficient in empirical score recovery, especially in maintaining equiprecision in measurement over a wide ability range despite the imposition of a content balancing constraint in the test design. The study has implications for implementing automated test designs in school systems supported by hardware and expertise in measurement theory, and it addresses the issue of content balancing using optimal test designs within an adaptive testing framework.
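Since a full binary-programming solver is beyond a short example, the following greedy heuristic is a hedged stand-in that captures the same objective: add items one at a time to close the gap to a target information function at a few ability checkpoints, while honoring a content-balancing quota. All names and numbers are illustrative assumptions, not the study's procedure.

```python
# A minimal greedy stand-in for binary-programming test assembly: items are
# added to close the shortfall to a target information function at a few
# theta checkpoints, subject to a content-balancing quota.
import numpy as np

def assemble(a, b, content, target, theta, length, quota):
    """Greedy assembly toward a target information function.
    quota maps content area -> maximum number of items from that area."""
    p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta - b[:, None])))
    info = a[:, None] ** 2 * p * (1 - p)    # item-by-checkpoint information
    chosen, counts = [], {c: 0 for c in quota}
    attained = np.zeros_like(theta)
    while len(chosen) < length:
        shortfall = np.maximum(target - attained, 0.0)
        gain = np.minimum(info, shortfall).sum(axis=1)  # count usable info only
        for j in range(len(a)):             # block used items and full areas
            if j in chosen or counts[content[j]] >= quota[content[j]]:
                gain[j] = -np.inf
        best = int(np.argmax(gain))
        chosen.append(best)
        counts[content[best]] += 1
        attained = attained + info[best]
    return chosen

rng = np.random.default_rng(0)
a, b = rng.uniform(0.6, 1.8, 40), rng.normal(0.0, 1.0, 40)
content = rng.choice(["algebra", "geometry"], 40)
theta = np.array([-1.0, 0.0, 1.0])          # ability checkpoints
print(assemble(a, b, content, target=np.full(3, 4.0), theta=theta,
               length=10, quota={"algebra": 6, "geometry": 6}))
```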
Generating Adaptive and Non-Adaptive Test Interfaces for Multidimensional Item Response Theory Applications
Computerized adaptive testing (CAT) is a powerful technique for improving measurement precision and reducing the total number of items required in educational, psychological, and medical tests. In CATs, tailored test forms are progressively constructed by capitalizing on information available from responses to previous items. CAT applications have primarily relied on unidimensional item response theory (IRT) to select which items should be administered during the session. However, multidimensional CATs may be constructed to improve measurement precision and further reduce the number of items required to measure multiple traits simultaneously. A small selection of CAT simulation packages exists for the R environment; namely, catR (Magis and Raîche, 2012), catIrt (Nydick, 2014), and MAT (Choi and King, 2014). However, the ability to generate graphical user interfaces for administering CATs in real time has not previously been implemented in R, support for multidimensional CATs has been limited to the multidimensional three-parameter logistic model, and CAT designs were required to contain IRT models from the same modeling family. This article describes a new R package for implementing unidimensional and multidimensional CATs using a wide variety of IRT models, which can be unique for each respective test item, and demonstrates how graphical user interfaces and Monte Carlo simulation designs can be constructed with the mirtCAT package.
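As a generic illustration of how multidimensional item selection can work, the sketch below selects items by D-optimality, maximizing the determinant of the accumulated Fisher information matrix under an assumed multidimensional 2PL. This shows the general idea only; it does not reproduce the mirtCAT API, and all names are ours.

```python
# A minimal sketch of multidimensional item selection by D-optimality,
# assuming a multidimensional 2PL (M2PL). Illustrative only; this is not
# the mirtCAT implementation.
import numpy as np

def m2pl_info(theta, a_j, d_j):
    """Fisher information matrix of one M2PL item at ability vector theta."""
    p = 1.0 / (1.0 + np.exp(-(a_j @ theta + d_j)))
    return p * (1 - p) * np.outer(a_j, a_j)

def select_d_optimal(theta_hat, A, d, administered, acc_info):
    """Pick the unadministered item maximizing det(acc_info + item info)."""
    best, best_det = None, -np.inf
    for j in range(len(d)):
        if j in administered:
            continue
        det = np.linalg.det(acc_info + m2pl_info(theta_hat, A[j], d[j]))
        if det > best_det:
            best, best_det = j, det
    return best

rng = np.random.default_rng(2)
A = rng.uniform(0.5, 1.5, (30, 2))      # two-dimensional discriminations
d = rng.normal(0.0, 1.0, 30)            # intercepts
acc = 0.1 * np.eye(2)                   # small ridge so det is defined early
print(select_d_optimal(np.zeros(2), A, d, administered={0, 3}, acc_info=acc))
```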