Automatic coding of short text responses via clustering in educational assessment
Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated using data collected in the Programme for International Student Assessment (PISA) 2012 in Germany. Free-text responses to 10 items with Formula responses in total were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on the performance of the implemented system. The system reached fair to good, up to excellent, agreement with human codings Formula. In particular, items that are solved by naming specific semantic concepts were coded properly. The system performed equally well with Formula and somewhat poorer, but still acceptably, down to Formula. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses. (DIPF/Orig.)
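The clustering-style coding idea the abstract describes (map responses into a vector space, compare them by similarity, assign codes) can be sketched minimally as nearest-neighbour code assignment over bag-of-words vectors. This is a toy illustration under assumed design choices, not the paper's implementation; the tokenizer, the similarity rule, and the example responses are all assumptions.

```python
from collections import Counter
import math

def vectorize(text):
    # Bag-of-words term frequencies over lowercased whitespace tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def code_response(response, labeled):
    # Assign the code of the most similar already-coded response.
    vec = vectorize(response)
    best_code, best_sim = None, -1.0
    for text, code in labeled:
        sim = cosine(vec, vectorize(text))
        if sim > best_sim:
            best_code, best_sim = code, sim
    return best_code

# Hypothetical coded responses: code 1 = names the target concept.
labeled = [("the answer is gravity", 1), ("i do not know", 0)]
print(code_response("gravity pulls it down", labeled))  # → 1
```

A production system would replace the bag-of-words representation with a proper semantic space and the single nearest neighbour with clustering, but the assignment logic is of this shape.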
Validating Test Score Interpretations Using Time Information
A validity approach is proposed that uses processing times to collect validity evidence for the construct interpretation of test scores. The rationale of the approach is based on current research on processing times and on classical validity approaches that provide validity evidence based on relationships with other variables. Within the new approach, convergent validity evidence is obtained if a component skill that is expected to underlie the task solution process in the target construct positively moderates the relationship between effective speed and effective ability in the corresponding target construct. Discriminant validity evidence is provided if a component skill that is not expected to underlie the task solution process in the target construct does not moderate the speed-ability relation in this target construct. Using data from a study that follows up the German PIAAC sample, this approach was applied to reading competence, assessed with PIAAC literacy items, and to quantitative reasoning, assessed with Number Series. As expected from theory, the effect of speed on ability in the target construct was moderated only by the respective underlying component skill, that is, word meaning activation skill as an underlying component skill of reading competence, and perceptual speed as an underlying component skill of reasoning. Accordingly, no positive interactions were found for the component skills that should not underlie the task solution process, that is, word meaning activation for reasoning and perceptual speed for reading. Furthermore, the study shows the suitability of the proposed validation approach. The use of time information in association with task results brings construct validation closer to the actual response process than the widely used correlations of test scores.
Structural analysis of attention components: the development of an integrating attention model
The dissertation addresses the structure of a variety of attention components postulated from different theoretical perspectives.
In particular, it is investigated whether conceptually distinct attention components correspond to empirically distinguishable cognitive abilities and whether they can be integrated into a general attention model. The first goal was to investigate whether five attention abilities related to Posner’s attention components (see Posner & Boies, 1971; Posner & Rafal, 1987) – alertness, spatial attention, focused attention, attentional switching, and divided attention – represent empirically distinguishable cognitive mechanisms from the perspective of individual differences. The second goal was to assess the extent to which attention abilities related to Posner’s attention components contribute to conceptually distinct attention abilities related to action theory (Neumann, 1992), working memory (Baddeley, 1986), and psychometric assessment (“concentration”, e.g., Brickenkamp, 1994; Moosbrugger & Goldhammer, 2006). The third goal was to identify major latent factors accounting for variance in attention measures. Following theoretical considerations that relate attention to perception and to the executive control of performance in complex tasks (Bundesen, 1990; Logan & Gordon, 2001), two latent factors underlying individual differences in attention measures were assumed, namely perceptual attention and executive attention. Finally, the fourth goal was to explain “concentration” theoretically and empirically by a variety of attention components and related abilities. A sample of 232 students completed a test battery of 13 attention and concentration tests. Confirmatory factor analyses revealed that the five attention abilities based on Posner’s work are moderately related, but clearly distinguishable. The proposed confirmatory factor model consisted of one common and five specific attention ability factors.
Structural equation modeling indicated that these five specific attention abilities contribute differentially to attention abilities associated with working memory, action theory, and psychometric assessment, whereas the common factor contributes significantly to all of them. In particular, the results suggested that both divided attention and attentional switching are involved in action-oriented attention abilities as well as in attention abilities associated with psychometric assessment (“concentration”). With respect to the integrative attention model, the assumed 2-factor structure was tested by confirmatory factor models including “perceptual attention” and “executive attention” as latent factors. Results supported the 2-factor structure and, thereby, the hypothesis that perceptual and executive attention are major factors underlying individual differences in attention measures. The concept of “concentration”, which is often considered to be unrelated to attention, could be explained by attention components at the conceptual level. Moreover, results demonstrated that attention abilities based on Posner’s work as well as the two proposed major latent factors, i.e., “perceptual attention” and “executive attention”, account for variance in concentration performance.
Semi-automatic coding of open-ended text responses in large-scale assessments
Background: In the context of large-scale educational assessments, the effort required to code open-ended text responses is considerably more expensive and time-consuming than the evaluation of multiple-choice responses, because it requires trained personnel and long manual coding sessions. Aim: Our semi-supervised coding method eco (exploring coding assistant) dynamically supports human raters by automatically coding a subset of the responses. Method: We map normalized response texts into a semantic space and cluster response vectors based on their semantic similarity. Assuming that similar codes represent semantically similar responses, we propagate codes to responses in optimally homogeneous clusters. Cluster homogeneity is assessed by strategically querying informative responses and presenting them to a human rater. Following each manual coding, the method estimates the code distribution respecting a certainty interval and assumes a homogeneous distribution if certainty exceeds a predefined threshold. If a cluster is determined to certainly comprise homogeneous responses, all remaining responses are automatically coded accordingly. Results: With an average miscoding of about 3%, the method reduced the manual coding effort by an average of about 52%. Conclusion: Combining the advantages of automatic and manual coding produces considerable coding accuracy and reduces the required manual effort. (DIPF/Orig.)
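The propagation step described in the abstract can be sketched minimally, assuming a drastically simplified homogeneity check (exact agreement among a fixed-size sample shown to the rater) in place of eco's certainty-interval estimate. The function names and parameters are illustrative, not from the paper.

```python
def eco_propagate(clusters, human_code, sample_size=3, threshold=1.0):
    """Toy sketch of cluster-based semi-automatic coding.

    clusters: lists of semantically similar response strings.
    human_code: callable simulating a human rater, response -> code.
    """
    coded = {}
    for cluster in clusters:
        # Query a few responses from the cluster and show them to the rater.
        sample = cluster[:sample_size]
        codes = [human_code(r) for r in sample]
        majority = max(set(codes), key=codes.count)
        agreement = codes.count(majority) / len(codes)
        if agreement >= threshold:
            # Cluster judged homogeneous: propagate the code automatically.
            for r in cluster:
                coded[r] = majority
        else:
            # Too heterogeneous: fall back to manual coding for every response.
            for r in cluster:
                coded[r] = human_code(r)
    return coded

# Hypothetical rater: code 1 if the response names the target concept.
human = lambda r: 1 if "gravity" in r else 0
clusters = [["gravity a", "gravity b", "gravity c", "gravity d"],
            ["idk", "no idea", "dunno", "pass"]]
result = eco_propagate(clusters, human)
```

In this toy run, each cluster's three sampled codes agree, so the fourth response of each cluster is coded automatically, which is the source of the effort reduction the abstract reports.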
Conditioning factors of test-taking engagement in PIAAC: an exploratory IRT modelling approach considering person and item characteristics
Background: A potential problem of low-stakes large-scale assessments such as the Programme for the International Assessment of Adult Competencies (PIAAC) is low test-taking engagement. The present study pursued two goals in order to better understand conditioning factors of test-taking disengagement: First, a model-based approach was used to investigate whether item indicators of disengagement constitute a continuous latent person variable by domain. Second, the effects of person and item characteristics were jointly tested using explanatory item response models. Methods: Analyses were based on the Canadian sample of Round 1 of PIAAC, with N = 26,683 participants completing test items in the domains of literacy, numeracy, and problem solving. Binary item disengagement indicators were created by means of item response time thresholds. Results: The results showed that disengagement indicators define a latent dimension by domain. Disengagement increased with lower educational attainment, lower cognitive skills, and when the test language was not the participant’s native language. Gender did not exert any effect on disengagement, while age had a positive effect for problem solving only. An item’s location in the second of two assessment modules was positively related to disengagement, as was item difficulty. The latter effect was negatively moderated by cognitive skill, suggesting that poor test-takers are especially likely to disengage with more difficult items. Conclusions: The negative effect of cognitive skill, the positive effect of item difficulty, and their negative interaction effect support the assumption that disengagement is the outcome of individual expectations about success (informed disengagement).
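The construction of binary disengagement indicators from item response time thresholds, as described in the Methods above, can be sketched as follows. The item names and threshold values are hypothetical; in practice, thresholds are derived per item, for example from the item's response-time distribution.

```python
def disengagement_indicators(response_times, thresholds):
    """Flag rapid responses as disengaged (1) per item.

    response_times: seconds spent per item, e.g. {"item1": 2.0}.
    thresholds: per-item time thresholds below which a response is
    treated as a rapid, disengaged attempt.
    """
    return {item: int(rt < thresholds[item])
            for item, rt in response_times.items()}

# Hypothetical test-taker: 2 s on item1 (rapid), 40 s on item2 (engaged).
flags = disengagement_indicators({"item1": 2.0, "item2": 40.0},
                                 {"item1": 5.0, "item2": 5.0})
```

These binary flags are the kind of item-level indicators that the study then models as a continuous latent disengagement dimension per domain.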
Invariance of the Response Processes Between Gender and Modes in an Assessment of Reading
In this paper, we developed a method to extract item-level response times from log data that are available in computer-based assessments (CBA) and paper-based assessments (PBA) with digital pens. Based on response times that were extracted using only time differences between responses, we used the bivariate generalized linear IRT model framework (B-GLIRT, [1]) to investigate response times as indicators of response processes. A parameterization that includes an interaction between the latent speed factor and the latent ability factor in the cross-relation function was found to fit the data best in both CBA and PBA. Data were collected with a within-subject design in a national add-on study to PISA 2012 administering two clusters of PISA 2009 reading units. After investigating the invariance of the measurement models for ability and speed between boys and girls, we found the expected gender effect in reading ability to coincide with a gender effect in speed in CBA. Taking this result as an indication of the validity of the time measures extracted from time differences between responses, we analyzed the PBA data and found the same gender effects for ability and speed. Analyzing PBA and CBA data together, we identified the ability mode effect as the latent difference between reading measured in CBA and PBA. Similar to the gender effect, the mode effect in ability was observed together with a difference in latent speed between modes. However, while the relationship between speed and ability is identical for boys and girls, we found hints of mode differences in the estimated parameters of the cross-relation function used in the B-GLIRT model.
Controlling the Speededness of Assembled Test Forms: A Generalization to the Three-Parameter Lognormal Response Time Model
When designing or modifying a test, an important challenge is controlling its speededness. To achieve this, van der Linden (2011a, 2011b) proposed using a lognormal response time model, more specifically the two-parameter lognormal model, and automated test assembly (ATA) via mixed integer linear programming. However, this approach has a severe limitation, in that the two-parameter lognormal model lacks a slope parameter. This means that the model assumes that all items are equally speed sensitive. From a conceptual perspective, this assumption seems very restrictive. Furthermore, various other empirical studies and new data analyses performed by us show that this assumption almost never holds in practice. To overcome this shortcoming, we bring together the three-parameter lognormal model for response times, which is already frequently used and contains a slope parameter, and van der Linden's ATA approach for controlling speededness. The proposed extension is demonstrated with multiple empirically based illustrations, including complete and documented R code. Both the original van der Linden approach and our newly proposed approach are available to practitioners in the freely available R package eatATA.
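In the parameterization commonly used for this model family, the two-parameter lognormal model sets E[ln T_ij] = beta_i - tau_j (item time intensity minus person speed), and the three-parameter extension adds an item slope phi_i, giving E[ln T_ij] = beta_i - phi_i * tau_j. A minimal sketch under that assumed parameterization follows; the symbols and function names are ours, not from the paper or the eatATA package.

```python
import random

def expected_log_time(beta, phi, tau):
    # Three-parameter lognormal response time model:
    #   E[ln T_ij] = beta_i - phi_i * tau_j
    # beta: item time intensity; phi: item slope (speed sensitivity);
    # tau: person speed. Fixing phi = 1 for all items recovers the
    # two-parameter model, i.e., the restrictive assumption that all
    # items are equally speed sensitive.
    return beta - phi * tau

def simulate_log_time(beta, phi, tau, alpha, rng):
    # alpha: item time discrimination; residual standard deviation 1/alpha.
    return expected_log_time(beta, phi, tau) + rng.gauss(0.0, 1.0 / alpha)

# A speed-insensitive item (phi = 0) yields the same expected log time for
# every person speed, a pattern the two-parameter model cannot express.
rng = random.Random(1)
slow, fast = expected_log_time(4.0, 0.0, -1.0), expected_log_time(4.0, 0.0, 2.0)
```

With phi free per item, ATA constraints on test speededness can weight items by how strongly their times respond to person speed, which is the point of the proposed extension.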
From byproduct to design factor: on validating the interpretation of process indicators based on log data
International large-scale assessments such as PISA or PIAAC have started to provide public or scientific use files for log data; that is, events, event-related attributes and timestamps of test-takers’ interactions with the assessment system. Log data and the process indicators derived from them can be used for many purposes. However, the intended uses and interpretations of process indicators require validation, which here means a theoretical and/or empirical justification that inferences about (latent) attributes of the test-taker’s work process are valid. This article reviews and synthesizes measurement concepts from various areas, including the standard assessment paradigm, the continuous assessment approach, the evidence-centered design (ECD) framework, and test validation. Based on this synthesis, we address the questions of how to ensure the valid interpretation of process indicators by means of an evidence-centered design of the task situation, and how to empirically challenge the intended interpretation of process indicators by developing and implementing correlational and/or experimental validation strategies. For this purpose, we explicate the process of reasoning from log data to low-level features and process indicators as the outcome of evidence identification. In this process, contextualizing information from log data is essential in order to reduce interpretative ambiguities regarding the derived process indicators. Finally, we show that empirical validation strategies can be adapted from classical approaches investigating the nomothetic span and construct representation. Two worked examples illustrate possible validation strategies for the design phase of measurements and their empirical evaluation.