
    Using Connectionist Models to Evaluate Examinees’ Response Patterns to Achievement Tests

    The attribute hierarchy method (AHM) applied to assessment engineering is described. It is a psychometric method for classifying examinees’ test item responses into a set of attribute mastery patterns associated with different components in a cognitive model of task performance. Attribute probabilities, computed using a neural network, can be estimated for each examinee, thereby providing specific information about the examinee’s attribute-mastery level. The pattern recognition approach described in this study relies on an explicit cognitive model to produce the expected response patterns. The expected response patterns serve as the input to the neural network. The model also yields the cognitive test specifications. These specifications identify the examinees’ attribute patterns, which are used as output for the neural network. The purpose of the statistical pattern recognition analysis is to estimate the probability that an examinee possesses specific attribute combinations based on the examinee’s observed item response patterns. Two examples using student response data from a sample of algebra items on the SAT illustrate our pattern recognition approach.
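    For readers who want a concrete picture of the pattern recognition step, the sketch below trains a small feed-forward network on expected response patterns (inputs) paired with attribute-mastery patterns (outputs) and then estimates attribute probabilities for an observed response pattern. The cognitive model, items, and network settings are invented for illustration and are not the configuration used in the study.

```python
# Sketch: estimate attribute-mastery probabilities from item-response patterns
# with a small feed-forward network (illustrative; not the study's exact setup).
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical cognitive model with 3 hierarchically ordered attributes and 4 items.
# Rows: expected response patterns implied by the model (network inputs).
expected_responses = np.array([
    [0, 0, 0, 0],   # no attributes mastered
    [1, 0, 0, 0],   # attribute 1 only
    [1, 1, 0, 0],   # attributes 1 and 2
    [1, 1, 1, 1],   # attributes 1, 2, and 3
])
# Corresponding attribute-mastery patterns (network outputs).
attribute_patterns = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
])

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(expected_responses, attribute_patterns)

# An observed (possibly noisy) examinee response pattern.
observed = np.array([[1, 1, 0, 1]])
print(net.predict_proba(observed))  # estimated probability of mastering each attribute
```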

    Differential Validity and Utility of Successive and Simultaneous Approaches to the Development of Equivalent Achievement Tests in French and English

    Described in this article are the first three activities of a research program designed to assess the differential validity and utility of successive and simultaneous approaches to the development of equivalent achievement tests in the French and English languages. Two teams of multilingual/multicultural French-English teachers used the simultaneous approach to develop 70 items each, one team for grade 9 mathematics and the other for grade 9 social studies. The evidence gained from the pilot study suggests that differential item performance attributable to translation differences is confounded with socioeconomic differences between the two groups of students. Consequently, the next activities of this research will be directed toward disentangling these two issues to obtain a clearer view of the efficacy of the simultaneous method in reducing differential group performance and enhancing linguistic and cultural decentering.
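    As an illustration of how differential item performance can be examined while accounting for a socioeconomic covariate, the sketch below fits a logistic-regression DIF model to simulated data. The variables, data, and model are assumptions for demonstration only and do not reproduce the study’s analyses.

```python
# Sketch: logistic-regression DIF check for one item, adjusting for a
# socioeconomic covariate (illustrative; not the study's actual analysis).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)            # total-score proxy for ability
language = rng.integers(0, 2, size=n)   # 0 = English form, 1 = French form
ses = rng.normal(size=n)                # socioeconomic-status covariate

# Simulated responses to one item, with no true language effect built in.
p = 1 / (1 + np.exp(-(0.2 + 1.0 * ability + 0.4 * ses)))
response = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([ability, language, ses]))
fit = sm.Logit(response, X).fit(disp=False)
print(fit.summary(xname=["const", "ability", "language", "ses"]))
# A non-negligible 'language' coefficient after conditioning on ability and SES
# would point to item-level differences not explained by socioeconomic status.
```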

    Multiple-Choice Item Distractor Development Using Topic Modeling Approaches

    Writing a high-quality, multiple-choice test item is a complex process. Creating plausible but incorrect options for each item poses significant challenges for the content specialist because this task is often undertaken without a systematic method. In the current study, we describe and demonstrate a systematic method for creating plausible but incorrect options, also called distractors, based on students’ misconceptions. These misconceptions are extracted from labeled written responses. A total of 1,515 written responses from Grade 10 students to an existing constructed-response Biology item were used to demonstrate the method. Using latent Dirichlet allocation, a topic modeling procedure commonly used in machine learning and natural language processing, 22 plausible misconceptions were identified from students’ written responses and used to produce a list of plausible distractors. These distractors, in turn, were used as part of new multiple-choice items. Implications for item development are discussed.
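    The sketch below shows the general form of the topic modeling step with latent Dirichlet allocation in scikit-learn: written responses are converted to word counts, topics are fit, and the top terms per topic are inspected as candidate misconceptions. The toy corpus, number of topics, and preprocessing are placeholders, not the study’s pipeline.

```python
# Sketch: extract candidate "misconception" topics from written responses with
# latent Dirichlet allocation (illustrative corpus and settings).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

responses = [
    "plants get food from the soil through their roots",
    "photosynthesis happens at night when plants rest",
    "plants breathe in oxygen like animals do",
    # ... one string per student written response
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(responses)

lda = LatentDirichletAllocation(n_components=3, random_state=0)  # the study reported 22 topics
lda.fit(counts)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {k}: {', '.join(top_terms)}")  # candidate misconception wording
```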

    Validation of the conceptual research utilization scale: an application of the standards for educational and psychological testing in healthcare

    Background: There is a lack of acceptable, reliable, and valid survey instruments to measure conceptual research utilization (CRU). In this study, we investigated the psychometric properties of a newly developed scale (the CRU Scale). Methods: We used the Standards for Educational and Psychological Testing as a validation framework to assess four sources of validity evidence: content, response processes, internal structure, and relations to other variables. A panel of nine international research utilization experts performed a formal content validity assessment. To determine response process validity, we conducted a series of one-on-one scale administration sessions with 10 healthcare aides. Internal structure and relations to other variables were examined using CRU Scale response data from a sample of 707 healthcare aides working in 30 urban Canadian nursing homes. Principal components analysis and confirmatory factor analyses were conducted to determine internal structure. Relations to other variables were examined using: (1) bivariate correlations; (2) change in mean values of CRU with increasing levels of other kinds of research utilization; and (3) multivariate linear regression. Results: Content validity index scores for the five items ranged from 0.55 to 1.00. The principal components analysis predicted a 5-item, 1-factor model. This was inconsistent with the findings from the confirmatory factor analysis, which showed best fit for a 4-item, 1-factor model. Bivariate associations between CRU and other kinds of research utilization were statistically significant (p < 0.01) for the latent CRU scale score and all five CRU items. The CRU scale score was also a significant predictor of overall research utilization in multivariate linear regression. Conclusions: The CRU Scale showed acceptable initial psychometric properties with respect to responses from healthcare aides in nursing homes. Based on our validity, reliability, and acceptability analyses, we recommend using a reduced (four-item) version of the CRU Scale to yield sound assessments of CRU by healthcare aides. Refinement to the wording of one item is also needed. Planned future research will include: latent scale scoring, identification of variables that predict and are outcomes of conceptual research use, and longitudinal work to determine CRU Scale sensitivity to change.
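    To make two of the reported analyses concrete, the sketch below runs a principal components analysis on five simulated Likert items and regresses a simulated overall research utilization score on the CRU scale score. All variable names and data are stand-ins for illustration; they are not the study’s data or exact models.

```python
# Sketch: principal components on five CRU-like items and a regression of overall
# research utilization on the CRU scale score (simulated stand-in data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 707
latent_cru = rng.normal(size=n)

# Five 5-point Likert items loading on one factor, plus noise.
items = np.clip(np.round(3 + latent_cru[:, None] + rng.normal(scale=0.8, size=(n, 5))), 1, 5)

pca = PCA().fit(items)
print("Variance explained per component:", pca.explained_variance_ratio_.round(2))
# A dominant first component is consistent with a one-factor structure.

cru_score = items.mean(axis=1)
overall_ru = 0.6 * latent_cru + rng.normal(scale=0.7, size=n)  # stand-in outcome
reg = LinearRegression().fit(cru_score.reshape(-1, 1), overall_ru)
print("CRU -> overall RU slope:", round(reg.coef_[0], 2))
```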

    A Methodology for Multilingual Automatic Item Generation

    Testing agencies require large numbers of high-quality items that are produced in a cost-effective and timely manner. Increasingly, these agencies also require items in different languages. In this paper we present a methodology for multilingual automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer technology. We describe a three-step AIG approach where, first, test development specialists identify the content that will be used for item generation. Next, the specialists create item models to specify the content in the assessment task that must be manipulated to produce new items. Finally, elements in the item model are manipulated with computer algorithms to produce new items. Language is added in the item model step to permit multilingual AIG. We illustrate our method by generating 360 English and 360 French medical education items. The importance of item banking in multilingual test development is also discussed.
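    A minimal sketch of the third step is given below: elements of a simple item model are substituted by an algorithm to generate parallel English and French items. The item model, elements, and wording are invented placeholders rather than the authors’ medical education item models.

```python
# Sketch: generating parallel English/French items from a simple item model
# by substituting element values (placeholder content, not the authors' models).
from itertools import product

item_model = {
    "en": "A patient presents with {symptom}. Which {category} is most likely?",
    "fr": "Un patient se présente avec {symptom}. Quel {category} est le plus probable ?",
}
elements = {
    "symptom": {"en": ["chest pain", "shortness of breath"],
                "fr": ["une douleur thoracique", "un essoufflement"]},
    "category": {"en": ["diagnosis", "next test"],
                 "fr": ["diagnostic", "prochain examen"]},
}

for lang in ("en", "fr"):
    for symptom, category in product(elements["symptom"][lang], elements["category"][lang]):
        print(item_model[lang].format(symptom=symptom, category=category))
```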

    The Learning Sciences in Educational Assessment: The Role of Cognitive Models

    270 pages; 23 cm