    Speeding up without loss of accuracy: Item position effects on performance in university exams

    The quality of exams drives the test-taking behavior of examinees and is a proxy for the quality of teaching. As most university exams have strict time limits, and speededness is an important measure of the cognitive state of examinees, speededness can be used to assess the connection between exam quality and examinee performance. The practice of randomizing item order within university exams enables the analysis of item position effects within individual exams as a measure of speededness, and thereby the construction of a measure of exam quality. In this research, we use generalized linear mixed models to evaluate item position effects on response accuracy and response time in a large dataset of randomized exams from Utrecht University. We find an effect of item position on response time for most exams, but not on response accuracy, which might be a starting point for identifying factors that influence speededness and can affect the mental state of examinees.
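
    The method described above can be sketched with standard mixed-model tooling. The sketch below is a minimal illustration under assumed names, not the authors' code: it presumes a hypothetical long-format file responses.csv with columns student, position, log_rt, and correct, and fits a linear mixed model for log response time and a mixed-effects logistic model for accuracy, each with item position as predictor and a random intercept per student.

        # Minimal sketch of an item position analysis (hypothetical data layout).
        import pandas as pd
        import statsmodels.formula.api as smf
        from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

        df = pd.read_csv("responses.csv")  # one row per student-by-item response

        # Linear mixed model: does a later position change (log) response time?
        rt_fit = smf.mixedlm("log_rt ~ position", df, groups=df["student"]).fit()
        print(rt_fit.summary())

        # Mixed-effects logistic model: does a later position change accuracy?
        acc_fit = BinomialBayesMixedGLM.from_formula(
            "correct ~ position", {"student": "0 + C(student)"}, df
        ).fit_vb()
        print(acc_fit.summary())

    In the paper's terms, a negative position effect on response time together with a null position effect on accuracy is the "speeding up without loss of accuracy" pattern.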

    Urnings: A new method for tracking dynamically changing parameters in paired comparison systems

    We introduce a new rating system for tracking the development of parameters based on a stream of observations that can be viewed as paired comparisons. Rating systems are applied in competitive games, adaptive learning systems and platforms for product and service reviews. We model each observation as an outcome of a game of chance that depends on the parameters of interest (e.g. the outcome of a chess game depends on the abilities of the two players). Determining the probabilities of the different game outcomes is conceptualized as an urn problem, where a rating is represented by a probability (i.e. the proportion of balls in the urn). This setup allows for evaluating the standard errors of the ratings and performing statistical inferences about the development of, and relations between, parameters. Theoretical properties of the system in terms of the invariant distributions of the ratings and their convergence are derived. The properties of the rating system are illustrated with simulated examples and its potential for answering research questions is illustrated using data from competitive chess, a movie review system, and an adaptive learning system for math.
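
    As a rough illustration of the urn metaphor, the toy sketch below implements a one-dimensional update for a single paired comparison. It is a simplification under stated assumptions, not the published algorithm: among other things, it omits the Metropolis-Hastings correction for adaptive pairing described in the paper, and all names are illustrative.

        # Toy sketch of an urn-based rating update in the spirit of Urnings.
        import random

        def urn_update(u1, u2, n, outcome):
            """u1, u2: green balls in each player's urn (the ratings);
            n: urn size; outcome: 1 if player 1 won, else 0."""
            assert 0 < u1 < n and 0 < u2 < n  # keeps the toy loop well defined
            # Simulate a game from the current ratings: each player draws a
            # ball, a green draw beats a red one, and ties are redrawn.
            while True:
                d1 = random.random() < u1 / n
                d2 = random.random() < u2 / n
                if d1 != d2:
                    simulated = 1 if d1 else 0
                    break
            # Change the urns only when the observed and simulated outcomes
            # disagree; urn sizes stay fixed, so each rating remains a
            # proportion whose uncertainty is easy to quantify.
            if outcome != simulated:
                step = 1 if outcome == 1 else -1
                u1, u2 = u1 + step, u2 - step
            return u1, u2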

    Tracking a multitude of abilities as they develop

    The recently proposed Urnings algorithm (Bolsinova et al., 2022, J. R. Stat. Soc. Ser. C Appl. Statistics, 71, 91) allows for tracking the development of the abilities of learners and the difficulties of items in adaptive learning systems. It is a simple and scalable algorithm suited for large-scale applications in which large streams of data come into the system and on-the-fly updating is needed. Compared to alternatives like the Elo rating system and its extensions, the Urnings rating system allows the uncertainty of the ratings to be evaluated and accounts for adaptive item selection which, if not corrected for, may distort the ratings. In this paper we extend the Urnings algorithm to allow for both between-item and within-item multidimensionality. This allows for tracking the development of interrelated abilities both at the individual and the population level. We present formal derivations of the multidimensional Urnings algorithm, illustrate its properties in simulations, and present an application to data from an adaptive learning system for primary school mathematics called Math Garden.

    Urnings: A new method for tracking dynamically changing parameters in paired comparison systems

    We introduce a new rating system for tracking the development of parameters based on a stream of observations. Rating systems are applied in competitive games, adaptive learning systems, and platforms for product and service ratings. We model each observation as an outcome of a game of chance that depends on the parameters of interest (e.g., the outcome of a chess game depends on the abilities of the two players). Determining the probabilities of the different game outcomes is conceptualized as an urn problem, where a rating is represented by a proportion of colored balls in an urn. This setup allows for evaluating the standard errors of the ratings and performing statistical inferences about the development of and relations between parameters. Theoretical properties of the system in terms of the invariant distributions of the ratings and their convergence are derived. The properties of the rating system are illustrated with simulated examples and its potential for answering research questions is illustrated using data from competitive chess.

    Speed-accuracy tradeoff? Not so fast: Marginal changes in speed have inconsistent relationships with accuracy in real-world settings

    The speed-accuracy tradeoff suggests that responses generated under time constraints will be less accurate. While it has undergone extensive experimental verification, it is less clear whether it applies in settings where time pressures are not experimentally manipulated (but where respondents still vary in their utilization of time). Using a large corpus of 29 response time datasets from cognitive tasks without experimental manipulation of time pressure, we probe whether the speed-accuracy tradeoff holds across a variety of tasks using idiosyncratic within-person variation in speed. We find inconsistent relationships between marginal increases in time spent responding and accuracy; in many cases, marginal increases in time do not predict increases in accuracy. However, we do find that time pressure (in the form of time limits) consistently reduces accuracy and that rapid responses typically show the anticipated relationship (i.e., they are more accurate when slower). We also consider item- and person-level analyses. We find substantial variation in the item-level associations between speed and accuracy. On the person side, respondents who exhibit more within-person variation in response speed are typically of lower ability. Finally, we consider the predictive power of a person's response time for out-of-sample responses; it is generally a weak predictor. Collectively, our findings suggest that the speed-accuracy tradeoff may be limited as a conceptual model in non-experimental settings and, more generally, offer empirical results and an analytic approach that will be useful as more response time data are collected.
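
    A minimal version of the within-person analysis can be sketched as follows. The input file and column names are hypothetical, and the paper's models are considerably richer than this illustration.

        # Sketch: within-person speed variation as a predictor of accuracy.
        import pandas as pd
        import statsmodels.formula.api as smf

        df = pd.read_csv("task_responses.csv")  # person, item, log_rt, correct

        # Person-mean centering separates idiosyncratic within-person speed
        # variation from stable between-person differences in speed.
        df["log_rt_within"] = (
            df["log_rt"] - df.groupby("person")["log_rt"].transform("mean")
        )

        # Under a strict speed-accuracy tradeoff, responding slower than one's
        # own average should predict a correct response (positive coefficient).
        fit = smf.logit("correct ~ log_rt_within", df).fit()
        print(fit.params["log_rt_within"])

    The paper's central finding is that the sign and size of this marginal association are inconsistent across the 29 datasets.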

    Speeding up without loss of accuracy: Item position effects on performance in university exams

    The quality of exams drives the test-taking behavior of examinees and is a proxy for the quality of teaching. As most university exams have strict time limits, and speededness is an important measure of the cognitive state of examinees, speededness can be used to assess the connection between exam quality and examinee performance. The practice of randomizing item order within university exams enables the analysis of item position effects within individual exams as a measure of speededness, and thereby the construction of a measure of exam quality. In this research, we use generalized linear mixed models to evaluate item position effects on response accuracy and response time in a large dataset of randomized exams from an international research university. We find an effect of item position on response time for most exams, but the same does not hold for response accuracy, which might be a starting point for identifying factors that influence speededness and can affect the mental state of examinees.

    Data_Sheet_2_Properties and performance of the one-parameter log-linear cognitive diagnosis model.pdf

    Diagnostic classification models (DCMs) are psychometric models that yield probabilistic classifications of respondents according to a set of discrete latent variables. The current study examines the recently introduced one-parameter log-linear cognitive diagnosis model (1-PLCDM), which has increased interpretability compared with general DCMs due to useful measurement properties like sum score sufficiency and invariance properties. We demonstrate its equivalence with the Latent Class/Rasch Model and discuss interpretational consequences. The model is further examined in a DCM framework. We demonstrate the sum score sufficiency property and we derive an expression for the cut score for mastery classification. It is shown by means of a simulation study that the 1-PLCDM is fairly robust to model constraint violations in terms of classification accuracy and reliability. This robustness in combination with useful measurement properties and ease of interpretation can make the model attractive for stakeholders to apply in various assessment settings.
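
    The sum score sufficiency and cut score results can be illustrated numerically. The parameter values in the sketch below are invented for illustration, not the paper's estimates: because every item shares one main effect, the log likelihood ratio of mastery versus non-mastery is linear in the sum score, so the posterior probability of mastery, and hence the mastery cut score, depends on the responses only through that sum.

        # Sketch: sum score sufficiency and the mastery cut score in a
        # 1-PLCDM-style model (invented parameter values).
        import numpy as np

        J = 10
        lam0 = np.linspace(-2.0, 0.5, J)  # item intercepts (vary freely)
        lam1 = 2.0                        # common main effect: the "one parameter"
        pi = 0.5                          # prior probability of mastery

        p0 = 1 / (1 + np.exp(-lam0))           # P(correct | non-master)
        p1 = 1 / (1 + np.exp(-(lam0 + lam1)))  # P(correct | master)

        # With a common slope, each item's log odds ratio of mastery equals
        # lam1, so the log likelihood ratio is lam1 * sumscore + a constant.
        const = np.sum(np.log((1 - p1) / (1 - p0)))

        def posterior_mastery(s):
            odds = pi / (1 - pi) * np.exp(lam1 * s + const)
            return odds / (1 + odds)

        # Cut score: smallest sum score that classifies a respondent as master.
        cut = next(s for s in range(J + 1) if posterior_mastery(s) >= 0.5)
        print(cut, posterior_mastery(cut))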

    Data_Sheet_1_Properties and performance of the one-parameter log-linear cognitive diagnosis model.zip

    Diagnostic classification models (DCMs) are psychometric models that yield probabilistic classifications of respondents according to a set of discrete latent variables. The current study examines the recently introduced one-parameter log-linear cognitive diagnosis model (1-PLCDM), which has increased interpretability compared with general DCMs due to useful measurement properties like sum score sufficiency and invariance properties. We demonstrate its equivalence with the Latent Class/Rasch Model and discuss interpretational consequences. The model is further examined in a DCM framework. We demonstrate the sum score sufficiency property and we derive an expression for the cut score for mastery classification. It is shown by means of a simulation study that the 1-PLCDM is fairly robust to model constraint violations in terms of classification accuracy and reliability. This robustness in combination with useful measurement properties and ease of interpretation can make the model attractive for stakeholders to apply in various assessment settings.

    Constructing and Predicting School Advice for Academic Achievement: A Comparison of Item Response Theory and Machine Learning Techniques

    Educational tests can be used to estimate pupils’ abilities and thereby give an indication of whether their school type is suitable for them. However, tests in education are usually conducted for each content area separately, which makes it difficult to combine the results into a single school advice. To this end, we provide a comparison between domain-specific and domain-agnostic methods for predicting school advice. Both use data from a pupil monitoring system in the Netherlands, which keeps track of pupils’ educational progress over several years via a series of tests measuring multiple skills. First, an IRT model is calibrated, from which an ability score is extracted and subsequently plugged into a multinomial log-linear regression model. Second, we train a random forest (RF) and a shallow neural network (NN) and apply case weighting to give extra attention to pupils who switched between school types. When considering the performance of all pupils, RFs provided the most accurate predictions, followed by NNs and IRT respectively. When only looking at the performance of pupils who switched school type, IRT performed best, followed by NNs and RFs. Case weighting proved to provide a major improvement for this group. Lastly, IRT was found to be much easier to explain than the other models. Thus, while ML provided more accurate results, this comes at the cost of lower explainability in comparison to IRT.
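
    The case-weighting step can be sketched with scikit-learn. The file, column names, and weight value below are hypothetical, and the paper's feature construction and tuning are not reproduced.

        # Sketch: upweighting pupils who switched school type in a random forest.
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        df = pd.read_csv("pupil_monitoring.csv")  # test scores + advice label
        X = df.drop(columns=["advice", "switched"])
        y = df["advice"]

        # Case weights: extra attention to the pupils who switched school type.
        weights = df["switched"].map({True: 3.0, False: 1.0})

        X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(
            X, y, weights, test_size=0.2, random_state=0
        )
        rf = RandomForestClassifier(n_estimators=500, random_state=0)
        rf.fit(X_tr, y_tr, sample_weight=w_tr)
        print(rf.score(X_te, y_te))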