Using Connectionist Models to Evaluate Examinees’ Response Patterns to Achievement Tests
The attribute hierarchy method (AHM) applied to assessment engineering is described. It is a psychometric method for classifying examinees’ test item responses into a set of attribute mastery patterns associated with different components in a cognitive model of task performance. Attribute probabilities, computed using a neural network, can be estimated for each examinee, thereby providing specific information about the examinee’s attribute-mastery level. The pattern recognition approach described in this study relies on an explicit cognitive model to produce the expected response patterns, which serve as the input to the neural network. The model also yields the cognitive test specifications, which identify the examinees’ attribute patterns used as output for the neural network. The purpose of the statistical pattern recognition analysis is to estimate the probability that an examinee possesses specific attribute combinations based on the examinee’s observed item response patterns. Two examples using student response data from a sample of algebra items on the SAT illustrate our pattern recognition approach.
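As a minimal sketch of the pattern-recognition step, the Python example below trains a single-layer logistic network that maps expected response patterns to attribute-mastery patterns, then estimates attribute probabilities for an observed response pattern. The three-item, two-attribute cognitive model here is hypothetical (not the SAT data used in the study), and the network is deliberately simpler than the one the method describes.

```python
import math
import random

# Hypothetical cognitive model: 3 items measuring 2 hierarchically ordered
# attributes. Each expected response pattern is paired with the
# attribute-mastery pattern that produces it.
expected_patterns = [
    [0, 0, 0],  # no attributes mastered
    [1, 0, 0],  # attribute 1 only
    [1, 1, 1],  # attributes 1 and 2
]
attribute_patterns = [
    [0, 0],
    [1, 0],
    [1, 1],
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
n_items, n_attrs = 3, 2
W = [[random.uniform(-0.5, 0.5) for _ in range(n_items)] for _ in range(n_attrs)]
b = [0.0] * n_attrs

# Train one logistic unit per attribute with gradient descent
# on the cross-entropy loss.
for _ in range(2000):
    for x, y in zip(expected_patterns, attribute_patterns):
        for a in range(n_attrs):
            p = sigmoid(sum(W[a][i] * x[i] for i in range(n_items)) + b[a])
            grad = p - y[a]
            for i in range(n_items):
                W[a][i] -= 0.5 * grad * x[i]
            b[a] -= 0.5 * grad

def attribute_probabilities(response_pattern):
    """Estimated probability that an examinee has mastered each attribute."""
    return [sigmoid(sum(W[a][i] * response_pattern[i] for i in range(n_items)) + b[a])
            for a in range(n_attrs)]

# An examinee who answers only item 1 correctly should show high probability
# of mastering attribute 1 and low probability of mastering attribute 2.
probs = attribute_probabilities([1, 0, 0])
```

The trained unit per attribute plays the role of the neural network's output layer: expected response patterns in, attribute probabilities out.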
Three Applications of Automated Test Assembly within a User-Friendly Modeling Environment
While linear programming is a common tool in business and industry, it has seen few applications in educational assessment, and only a handful of individuals have been actively involved in conducting psychometric research in this area. Perhaps this is due, at least in part, to the complexity of existing software packages. This article presents three applications of linear programming to automate test assembly using an add-in to Microsoft Excel 2007. These increasingly complex examples permit the reader to readily see and manipulate the programming objectives and constraints within a familiar modeling environment. A spreadsheet used in this demonstration is available for downloading. Accessed 12,243 times on https://pareonline.net from June 21, 2009 to December 31, 2019.
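The article solves these problems with linear programming inside Excel; as a stdlib-only sketch of the same objective-and-constraint structure, the following Python example assembles a fixed-length test form from a hypothetical item bank by brute-force enumeration (feasible for toy banks, not a replacement for an LP solver):

```python
from itertools import combinations

# Hypothetical item bank: (item id, content area, item information).
bank = [
    ("i1", "algebra",  0.42), ("i2", "algebra",  0.35),
    ("i3", "geometry", 0.51), ("i4", "geometry", 0.28),
    ("i5", "number",   0.44), ("i6", "number",   0.31),
]

TEST_LENGTH = 4      # constraint: exactly four items on the form
MIN_PER_AREA = 1     # constraint: at least one item per content area

def feasible(form):
    areas = [area for _, area, _ in form]
    return all(areas.count(a) >= MIN_PER_AREA
               for a in {"algebra", "geometry", "number"})

# Objective: maximize total information subject to the constraints.
best = max(
    (form for form in combinations(bank, TEST_LENGTH) if feasible(form)),
    key=lambda form: sum(info for _, _, info in form),
)
best_ids = sorted(item_id for item_id, _, _ in best)
```

An LP formulation would express the same thing with 0/1 decision variables per item, a linear objective over item information, and linear constraints on test length and content coverage.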
Differential Validity and Utility of Successive and Simultaneous Approaches to the Development of Equivalent Achievement Tests in French and English
Described in this article are the first three activities of a research program designed to assess the differential validity and utility of successive and simultaneous approaches to developing equivalent achievement tests in French and English. Two teams of multilingual/multicultural French-English teachers used the simultaneous approach to develop 70 items each for mathematics and social studies at the grade 9 level. The evidence gained from the pilot study suggests that the issue of differential item performance attributable to translation differences is confounded by the presence of socioeconomic differences between the two groups of students. Consequently, the next activities of this research program will be directed toward disentangling these two issues to obtain a clearer view of the efficacy of the simultaneous method in reducing differential group performance and enhancing linguistic and cultural decentering.
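One common index for the differential item performance mentioned above is the Mantel-Haenszel common odds ratio (named here as a standard technique, not necessarily the statistic this study used). A minimal Python sketch, with hypothetical counts for one translated item stratified by total-score level:

```python
# Each stratum (a total-score level) holds a 2x2 table for one item:
# (right_ref, wrong_ref, right_focal, wrong_focal), where "ref" is the
# reference language group and "focal" is the comparison group.
strata = [
    (30, 20, 22, 28),   # low scorers
    (45, 15, 38, 22),   # middle scorers
    (50,  5, 44, 11),   # high scorers
]

# Mantel-Haenszel common odds ratio: sum of (right_ref * wrong_focal / n)
# over strata, divided by the sum of (wrong_ref * right_focal / n).
num = sum(r_ref * w_foc / (r_ref + w_ref + r_foc + w_foc)
          for r_ref, w_ref, r_foc, w_foc in strata)
den = sum(w_ref * r_foc / (r_ref + w_ref + r_foc + w_foc)
          for r_ref, w_ref, r_foc, w_foc in strata)
alpha_mh = num / den   # 1.0 means no DIF after matching on total score
```

Because the statistic conditions on total score, it separates item-level language effects from overall ability differences; disentangling those from socioeconomic differences, as the abstract notes, requires additional design work.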
Multiple-Choice Item Distractor Development Using Topic Modeling Approaches
Writing a high-quality multiple-choice test item is a complex process. Creating plausible but incorrect options for each item poses significant challenges for the content specialist because this task is often undertaken without a systematic method. In the current study, we describe and demonstrate a systematic method for creating plausible but incorrect options, also called distractors, based on students’ misconceptions. These misconceptions are extracted from labeled written responses. One thousand five hundred and fifteen written responses to an existing constructed-response item in Biology from Grade 10 students were used to demonstrate the method. Using latent Dirichlet allocation, a topic modeling procedure commonly used in machine learning and natural language processing, 22 plausible misconceptions were identified in students’ written responses and used to produce a list of plausible distractors. These distractors, in turn, were used as part of new multiple-choice items. Implications for item development are discussed.
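To make the topic-modeling step concrete, here is a minimal collapsed Gibbs sampler for latent Dirichlet allocation in pure Python, run on four toy "written responses" (hypothetical stand-ins for the 1,515 real answers; the study would have used far more data and a full implementation). The top words per topic are the raw material a content specialist would read as candidate misconception themes:

```python
import random
from collections import defaultdict

# Hypothetical tokenized student responses.
docs = [
    "plants breathe oxygen at night".split(),
    "plants use oxygen like animals".split(),
    "sunlight turns into food directly".split(),
    "sunlight becomes food without chlorophyll".split(),
]
K = 2                      # number of topics (misconception clusters)
ALPHA, BETA = 0.1, 0.01    # symmetric Dirichlet priors
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

random.seed(1)
z = [[random.randrange(K) for _ in d] for d in docs]   # topic assignments
ndk = [[0] * K for _ in docs]                          # doc-topic counts
nkw = [defaultdict(int) for _ in range(K)]             # topic-word counts
nk = [0] * K
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Collapsed Gibbs sampling: resample each token's topic from
# p(z=k) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta).
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            weights = [(ndk[d][k] + ALPHA) * (nkw[k][w] + BETA) / (nk[k] + V * BETA)
                       for k in range(K)]
            t = random.choices(range(K), weights)[0]
            z[d][i] = t
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Top words per topic: candidate misconception themes / distractor seeds.
top_words = [sorted(vocab, key=lambda w: -nkw[k][w])[:3] for k in range(K)]
```

In practice one would use an established LDA implementation; the point of the sketch is the mapping from topics over wrong answers to distractor candidates.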
Validation of the conceptual research utilization scale: an application of the standards for educational and psychological testing in healthcare
Background: There is a lack of acceptable, reliable, and valid survey instruments to measure conceptual research utilization (CRU). In this study, we investigated the psychometric properties of a newly developed scale (the CRU Scale).
Methods: We used the Standards for Educational and Psychological Testing as a validation framework to assess four sources of validity evidence: content, response processes, internal structure, and relations to other variables. A panel of nine international research utilization experts performed a formal content validity assessment. To determine response process validity, we conducted a series of one-on-one scale administration sessions with 10 healthcare aides. Internal structure and relations-to-other-variables validity were examined using CRU Scale response data from a sample of 707 healthcare aides working in 30 urban Canadian nursing homes. Principal components analysis and confirmatory factor analyses were conducted to determine internal structure. Relations to other variables were examined using: (1) bivariate correlations; (2) change in mean values of CRU with increasing levels of other kinds of research utilization; and (3) multivariate linear regression.
Results: Content validity index scores for the five items ranged from 0.55 to 1.00. The principal components analysis predicted a 5-item, 1-factor model. This was inconsistent with the findings from the confirmatory factor analysis, which showed best fit for a 4-item, 1-factor model. Bivariate associations between CRU and other kinds of research utilization were statistically significant (p < 0.01) for the latent CRU Scale score and all five CRU items. The CRU Scale score was also a significant predictor of overall research utilization in multivariate linear regression.
Conclusions: The CRU Scale showed acceptable initial psychometric properties with respect to responses from healthcare aides in nursing homes. Based on our validity, reliability, and acceptability analyses, we recommend using a reduced (four-item) version of the CRU Scale to yield sound assessments of CRU by healthcare aides. Refinement to the wording of one item is also needed. Planned future research will include: latent scale scoring, identification of variables that predict and are outcomes to conceptual research use, and longitudinal work to determine CRU Scale sensitivity to change.
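The item-level content validity index reported above is conventionally computed as the proportion of expert panelists rating an item relevant (3 or 4 on a 4-point scale). A small Python sketch with hypothetical ratings from a nine-member panel (the study's actual ratings are not reproduced here):

```python
# Hypothetical relevance ratings (1-4 scale) for five scale items
# from nine expert panelists.
ratings = {
    "item1": [4, 4, 3, 4, 4, 3, 4, 4, 4],
    "item2": [3, 4, 4, 4, 3, 4, 4, 4, 4],
    "item3": [2, 3, 4, 2, 3, 2, 4, 3, 2],
    "item4": [4, 3, 4, 4, 4, 4, 3, 4, 4],
    "item5": [4, 4, 4, 4, 4, 4, 4, 4, 4],
}

def item_cvi(scores):
    """Item-level CVI: proportion of experts rating the item 3 or 4."""
    return sum(s >= 3 for s in scores) / len(scores)

cvi = {item: round(item_cvi(scores), 2) for item, scores in ratings.items()}
```

An item with a low CVI (like the hypothetical item3 here) is a candidate for rewording or removal, which parallels the reduced four-item scale the authors recommend.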
A Methodology for Multilingual Automatic Item Generation
Testing agencies require large numbers of high-quality items that are produced in a cost-effective and timely manner. Increasingly, these agencies also require items in different languages. In this paper we present a methodology for multilingual automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer technology. We describe a three-step AIG approach where, first, test development specialists identify the content that will be used for item generation. Next, the specialists create item models to specify the content in the assessment task that must be manipulated to produce new items. Finally, elements in the item model are manipulated with computer algorithms to produce new items. Language is added in the item model step to permit multilingual AIG. We illustrate our method by generating 360 English and 360 French medical education items. The importance of item banking in multilingual test development is also discussed.
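The three-step workflow can be sketched in Python with a deliberately tiny, hypothetical medical item model (the template text, elements, and counts here are illustrative, not the 360-item models from the paper): step 2 is the per-language template plus its manipulable elements, and step 3 is the algorithmic manipulation of those elements to generate items.

```python
from itertools import product

# Step 2: item model. One stem template per language, plus the elements
# that will be manipulated to produce new items.
templates = {
    "en": "A patient presents with {symptom}. What is the most likely diagnosis?",
    "fr": "Un patient présente {symptom}. Quel est le diagnostic le plus probable ?",
}
elements = {
    "en": {"symptom": ["chest pain", "shortness of breath"]},
    "fr": {"symptom": ["une douleur thoracique", "un essoufflement"]},
}

def generate_items(lang):
    """Step 3: manipulate the model's elements to produce new items."""
    tmpl, elems = templates[lang], elements[lang]
    return [tmpl.format(**dict(zip(elems, combo)))
            for combo in product(*elems.values())]

items = {lang: generate_items(lang) for lang in templates}
```

With more elements the item counts multiply combinatorially, which is how a single model yields hundreds of parallel items per language; adding a language means adding one template and one element set, as the abstract describes.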