108 research outputs found

    Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing examinations.

    Abstract: Computer-based testing has become a common occurrence among credentialing examination agencies. At the same time, selecting a test design is difficult because several are available: parallel forms, computer-adaptive testing (CAT), and multi-stage testing (MST). The merits of these designs interact with exam conditions, which include item quality, bank size, candidate score distribution, placement of the passing score, exam length, and more. In this study, three popular computer-based test designs were investigated under some common examination conditions using computer simulation techniques. Item quality and bank size were varied. The results from the study were clear: both item bank size and item quality had a practically significant impact on decision consistency and accuracy. Interestingly, even in nearly ideal situations, the choice of test design was not a factor in the results. Two conclusions seem to follow from the findings: (1) more time and resources should be committed to expanding both the size and quality of item banks, and (2) designs that individualize an exam administration, such as MST and CAT, may not be especially helpful when the primary purpose of an examination is to make pass-fail decisions and conditions are present for using parallel forms of examinations with a target information function that can be centered at the passing score. Obviously, the validity of these conclusions needs to be thoroughly checked with additional simulations and real data.
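
To make the two outcome measures in the abstract above concrete, here is a minimal simulation sketch of decision accuracy and decision consistency for a pass-fail exam. It is not the study's simulation design. The function `simulate_decisions`, the standard-normal proficiency distribution, the cut score of 0, and the use of a single measurement-error standard deviation `se` as a stand-in for item quality and bank size are all assumptions made for illustration.

```python
import random

def simulate_decisions(n_candidates, se, cut=0.0, seed=0):
    """Return (accuracy, consistency) for simulated pass-fail decisions.

    accuracy:    share of candidates whose observed decision matches
                 the decision implied by their true proficiency.
    consistency: share of candidates classified the same way on two
                 independent administrations.
    """
    rng = random.Random(seed)
    accurate = 0
    consistent = 0
    for _ in range(n_candidates):
        theta = rng.gauss(0.0, 1.0)          # true proficiency (assumed N(0, 1))
        first = theta + rng.gauss(0.0, se)   # observed score, administration 1
        second = theta + rng.gauss(0.0, se)  # observed score, administration 2
        true_pass = theta >= cut
        accurate += (first >= cut) == true_pass
        consistent += (first >= cut) == (second >= cut)
    return accurate / n_candidates, consistent / n_candidates

# Hypothetical conditions: smaller se stands in for a larger, better item bank.
for label, se in (("strong bank", 0.25), ("weak bank", 0.50)):
    acc, con = simulate_decisions(20000, se)
    print(f"{label}: accuracy={acc:.3f} consistency={con:.3f}")
```

In this sketch both measures fall as measurement error near the passing score grows, which is the mechanism through which item quality and bank size would be expected to matter.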

    NAEP State Reports in Mathematics: Valuable Information for Monitoring Education Reform

    The National Assessment of Educational Progress (NAEP), a congressionally mandated program, can provide valuable data to educational policymakers in Massachusetts and other New England states about the status of their educational reform initiatives and their performance standards. The three purposes of this article are to describe NAEP and its goals and structure, to present some of the results of the 1992 Mathematics NAEP Assessment as an example of the utility of this national assessment program, and to highlight ways in which background data collected by NAEP can be helpful in interpreting assessment results and monitoring educational reform. The six New England states aspire to performance standards that approximate national and international standards of excellence. NAEP provides an excellent database to inform the standard-setting process and should therefore be of considerable interest to policymakers who are serious about setting meaningful performance standards and monitoring the quality of educational progress.

    Advances in item response theory and applications: an introduction

    Test theories can be divided roughly into two categories. The first is classical test theory, which dates back to Spearman’s conception of the observed test score as a composite of true and error components, and which was introduced to psychologists at the beginning of this century. Important milestones in its long and venerable tradition are Gulliksen’s Theory of Mental Tests (1950) and Lord and Novick’s Statistical Theories of Mental Test Scores (1968). The second is item response theory, or latent trait theory, as it has been called until recently. At the present time, item response theory (IRT) is having a major impact on the field of testing. Models derived from IRT are being used to develop tests, to equate scores from nonparallel tests, to investigate item bias, and to report scores, as well as to address many other pressing measurement problems (see, e.g., Hambleton, 1983; Lord, 1980). IRT differs from classical test theory in that it assumes a different relation of the test score to the variable measured by the test. Although there are parallels between models from IRT and psychophysical models formulated around the turn of the century, only in the last 10 years has IRT had any impact on psychometricians and test users. Work by Rasch (1980/1960), Fischer (1974), Birnbaum (1968), Wright and Panchapakesan (1969), Bock (1972), and Lord (1974) has been especially influential in this turnabout; and Lazarsfeld’s pioneering work on latent structure analysis in sociology (Lazarsfeld, 1950; Lazarsfeld & Henry, 1968) has also provided impetus. One objective of this introduction is to review the conceptual differences between classical test theory and IRT. A second objective is to introduce the goals of this special issue on item response theory and the seven papers. Some basic problems with classical test theory are reviewed in the next section. Then, IRT approaches to educational and psychological measurement are presented and compared to classical test theory. The final two sections present the goals for this special issue and an outline of the seven invited papers.
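
The contrast drawn in the abstract above can be written compactly. The notation below is the conventional textbook one and is not taken from the abstract: $X$ is the observed score, $T$ the true score, $E$ the error (assumed uncorrelated with $T$), $\theta$ the latent trait, and $b_i$ the difficulty of item $i$. The IRT line shows the one-parameter (Rasch) model as one example of an item response function; other IRT models add further item parameters.

```latex
% Classical test theory: the score-level decomposition and the reliability it implies
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}

% Item response theory (Rasch model as an example): an item-level probability model
P(U_i = 1 \mid \theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)}
```

The classical model is stated at the level of the total score, whereas the IRT model specifies how the probability of a correct response to each item depends on the trait being measured.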

    An Application of Item Response Theory to Psychological Test Development

    Item response theory (IRT) has become a popular methodological framework for modeling response data from assessments in education and health; however, its use is not widespread among psychologists. This paper aims to provide a didactic application of IRT and to highlight some of its advantages for psychological test development. IRT was applied to two scales (a positive and a negative affect scale) of a self-report test. Respondents were 853 university students (57% women) between the ages of 17 and 35. IRT analyses revealed that the positive affect scale has items with moderate discrimination that measure respondents below the average score more effectively. The negative affect scale also has items with moderate discrimination; these items measure respondents across the trait continuum, but with much less precision. Some features of IRT are used to show how such results can improve the measurement of the scales. The authors illustrate and emphasize how knowledge of the features of IRT may allow test makers to refine other psychological measures and increase their validity and reliability.

    International Test Commission guidelines for test adaptation: A criterion checklist

    Background: To improve the quality of test translation and adaptation, and hence the comparability of scores across cultures, the International Test Commission (ITC) proposed a number of guidelines for the adaptation process. Although these guidelines are well known, they are not implemented as often as they should be. One possible reason for this is the broad scope of the guidelines, which makes them difficult to apply in practice. The goal of this study was therefore to draw up an evaluative criterion checklist that would help test adapters to implement the ITC recommendations and would serve as a model for assessing the quality of test adaptations. Method: Each ITC guideline was operationalized through a number of criteria. For each criterion, acceptable and excellent levels of accomplishment were proposed. The initial checklist was then reviewed by a panel of 12 experts in testing and test adaptation. The resulting checklist was applied to two different tests by two pairs of independent reviewers. Results: The final evaluative checklist consisted of 29 criteria covering all phases of test adaptation: planning, development, confirmation, administration, score interpretation, and documentation. Conclusions: We believe that the proposed evaluative checklist will help to improve the quality of test adaptation.

    Profiles of Mathematics Anxiety Among 15-Year-Old Students: A Cross-Cultural Study Using Multi-Group Latent Profile Analysis

    Using PISA 2012 data, the present study explored profiles of mathematics anxiety (MA) among 15-year-old students from Finland, Korea, and the United States to determine the similarities and differences of MA across the three national samples by applying a multi-group latent profile analysis (LPA). The major findings were that (a) three MA profiles were found in all three national samples, i.e., Low MA, Mid MA, and High MA profiles, and (b) the percentages of students classified into each of the three MA profiles differed across the Finnish, Korean, and American samples, with the United States having the highest prevalence of High MA, and Finland the lowest. Multi-group LPA also provided clear and useful latent profile separation. The High MA profile demonstrated significantly poorer mathematics performance and lower mathematics interest, self-efficacy, and self-concept than the Mid and Low MA profiles. The same differences appeared between the Mid and Low MA profiles. The implications of the findings seem clear: (1) it is possible that there is some relative level of universality in MA among 15-year-old students which is independent of cultural context; and (2) multi-group LPA could be a useful analytic tool for research on the classification of MA and on cultural differences in MA.

    Envelope Determinants of Equine Lentiviral Vaccine Protection

    Lentiviral envelope (Env) antigenic variation and associated immune evasion present major obstacles to vaccine development. The concept that Env is a critical determinant for vaccine efficacy is well accepted; however, defined correlates of protection associated with Env variation have yet to be determined. We reported an attenuated equine infectious anemia virus (EIAV) vaccine study that directly examined the effect of lentiviral Env sequence variation on vaccine efficacy. The study identified a significant, inverse, linear correlation between vaccine efficacy and increasing divergence of the challenge virus Env gp90 protein compared to the vaccine virus gp90. The report demonstrated approximately 100% protection of immunized ponies from disease after challenge by virus with a homologous gp90 (EV0), and roughly 40% protection against challenge by virus (EV13) with a gp90 13% divergent from the vaccine strain. In the current study we examine whether the protection observed when challenging with the EV0 strain could be conferred to animals via chimeric challenge viruses between the EV0 and EV13 strains, allowing for mapping of protection to specific Env sequences. Viruses containing the EV13 proviral backbone and selected domains of the EV0 gp90 were constructed, and their in vitro and in vivo infectivity was examined. Vaccine efficacy studies indicated that homology between the vaccine strain gp90 and the N-terminus of the challenge strain gp90 was capable of inducing immunity that resulted in significantly lower levels of post-challenge virus and significantly delayed the onset of disease. However, a homologous N-terminal region alone inserted in the EV13 backbone could not impart the 100% protection observed with the EV0 strain. Data presented here denote the complicated and potentially contradictory relationship between in vitro virulence and in vivo pathogenicity. The study highlights the importance of structural conformation for immunogens and emphasizes the need for antibody binding, not neutralizing, assays that correlate with vaccine protection. © 2013 Craigo et al.

    Contributions of mean and shape of blood pressure distribution to worldwide trends and variations in raised blood pressure: A pooled analysis of 1018 population-based measurement studies with 88.6 million participants

    © The Author(s) 2018. Background: Change in the prevalence of raised blood pressure could be due to both shifts in the entire distribution of blood pressure (representing the combined effects of public health interventions and secular trends) and changes in its high-blood-pressure tail (representing successful clinical interventions to control blood pressure in the hypertensive population). Our aim was to quantify the contributions of these two phenomena to the worldwide trends in the prevalence of raised blood pressure. Methods: We pooled 1018 population-based studies with blood pressure measurements on 88.6 million participants from 1985 to 2016. We first calculated mean systolic blood pressure (SBP), mean diastolic blood pressure (DBP) and prevalence of raised blood pressure by sex and 10-year age group from 20-29 years to 70-79 years in each study, taking into account complex survey design and survey sample weights, where relevant. We used a linear mixed-effects model to quantify the association between (probit-transformed) prevalence of raised blood pressure and age-group- and sex-specific mean blood pressure. We calculated the contributions of change in mean SBP and DBP, and of change in the prevalence-mean association, to the change in prevalence of raised blood pressure. Results: In 2005-16, at the same level of population mean SBP and DBP, men and women in South Asia and in Central Asia, the Middle East and North Africa would have the highest prevalence of raised blood pressure, and men and women in the high-income Asia Pacific and high-income Western regions would have the lowest. In most region-sex-age groups where the prevalence of raised blood pressure declined, one half or more of the decline was due to the decline in mean blood pressure. Where prevalence of raised blood pressure has increased, the change was entirely driven by increasing mean blood pressure, offset partly by the change in the prevalence-mean association. Conclusions: Change in mean blood pressure is the main driver of the worldwide change in the prevalence of raised blood pressure, but change in the high-blood-pressure tail of the distribution has also contributed to the change in prevalence, especially in older age groups.
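
The idea of splitting a change in prevalence into a part due to the mean and a part due to the prevalence-mean association can be sketched as follows. This is a simplified illustration, not the authors' model: it uses a single mean (SBP only), a fixed-effects probit-linear association, and a sequential decomposition, and the function names and all numeric values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used as the inverse of the probit link

def prevalence(mean_sbp, intercept, slope):
    """Prevalence implied by a probit-linear prevalence-mean association."""
    return _STD_NORMAL.cdf(intercept + slope * mean_sbp)

def decompose(mean_start, mean_end, assoc_start, assoc_end):
    """Split a change in prevalence into two sequential contributions.

    assoc_start and assoc_end are (intercept, slope) pairs on the probit scale.
    Step 1 moves the mean while holding the starting association fixed.
    Step 2 then moves the association at the final mean.
    """
    p_start = prevalence(mean_start, *assoc_start)
    p_mean_only = prevalence(mean_end, *assoc_start)
    p_end = prevalence(mean_end, *assoc_end)
    return {
        "total": p_end - p_start,
        "due_to_mean": p_mean_only - p_start,
        "due_to_association": p_end - p_mean_only,
    }

# Hypothetical example: mean SBP falls from 130 to 126 mmHg and the
# probit intercept falls slightly, as if the upper tail had been trimmed.
result = decompose(130.0, 126.0, (-7.0, 0.05), (-7.1, 0.05))
for name, value in result.items():
    print(f"{name}: {value:+.4f}")
```

By construction the two contributions add up to the total change. The order of the steps is a design choice, and reversing it would generally give somewhat different shares.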