
    A multiple objective test assembly approach for exposure control problems in computerized adaptive testing

    Overexposure and underexposure of items in the bank are serious problems in operational computerized adaptive testing (CAT) systems. These exposure problems might result in item compromise or point to a waste of investments. The exposure control problem can be viewed as a test assembly problem with multiple objectives: information in the test has to be maximized, item compromise has to be minimized, and pool usage has to be optimized. In this paper, a multiple-objective method is developed to deal with both types of exposure problems. In this method, exposure control parameters based on observed exposure rates are implemented as weights for the information in the item selection procedure. The method does not need time-consuming simulation studies, and it can be implemented conditional on ability level. The method is compared with the Sympson-Hetter method for exposure control, with the Progressive method, and with alpha-stratified testing. The results show that the method is successful in dealing with both kinds of exposure problems.
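    The weighting idea described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' exact procedure: the 2PL information function, the simple min(1, r_max/rate) weight, and all names are assumptions introduced here.

```python
import numpy as np

def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def select_item(theta, a, b, exposure_rates, r_max=0.25, administered=()):
    """Pick the next item by exposure-weighted maximum information.

    Items whose observed exposure rate exceeds r_max are down-weighted
    in proportion to the excess, so overexposed items lose priority
    while rarely used items compete on their full information.
    """
    info = fisher_information(theta, a, b)
    weights = np.minimum(1.0, r_max / np.maximum(exposure_rates, 1e-9))
    score = info * weights
    score[list(administered)] = -np.inf  # never re-administer an item
    return int(np.argmax(score))
```

    In this sketch an overexposed item (rate 0.9) is passed over in favor of an equally informative but underexposed item, which is the intended trade-off between information and pool usage.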

    Many Labs 2: Investigating Variation in Replicability Across Samples and Settings

    We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples that comprised 15,305 participants from 36 countries and territories. Using the conventional criterion of statistical significance (p < .05), we found that 15 (54%) of the replications provided evidence of a statistically significant effect in the same direction as the original finding. With a strict significance criterion (p < .0001), 14 (50%) of the replications still provided such evidence, a reflection of the extremely high-powered design. Seven (25%) of the replications yielded effect sizes larger than the original ones, and 21 (75%) yielded effect sizes smaller than the original ones. The median comparable Cohen’s ds were 0.60 for the original findings and 0.15 for the replications. The effect sizes were small (< 0.20) in 16 of the replications (57%), and 9 effects (32%) were in the direction opposite the direction of the original effect. Across settings, the Q statistic indicated significant heterogeneity in 11 (39%) of the replication effects, and most of those were among the findings with the largest overall effect sizes; only 1 effect that was near zero in the aggregate showed significant heterogeneity according to this measure. Only 1 effect had a tau value greater than .20, an indication of moderate heterogeneity. Eight others had tau values near or slightly above .10, an indication of slight heterogeneity. Moderation tests indicated that very little heterogeneity was attributable to the order in which the tasks were performed or whether the tasks were administered in lab versus online. Exploratory comparisons revealed little heterogeneity between Western, educated, industrialized, rich, and democratic (WEIRD) cultures and less WEIRD cultures (i.e., cultures with relatively high and low WEIRDness scores, respectively). Cumulatively, variability in the observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
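    The Q statistic and tau reported above are standard meta-analytic heterogeneity measures. As a hedged illustration (the specific estimator and function name are assumptions introduced here, not taken from the paper), Cochran's Q and a DerSimonian-Laird tau can be computed from per-site effect sizes and their sampling variances:

```python
import math

def heterogeneity(effects, variances):
    """Cochran's Q and the DerSimonian-Laird tau estimate for a set of
    replication effect sizes with known sampling variances."""
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)         # truncate at zero
    return q, math.sqrt(tau2)
```

    Under this convention, tau is on the same scale as the effect sizes, which is why values near .10 are read as slight and values above .20 as moderate heterogeneity.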

    Improvement of Measurement Efficiency in Multistage Tests by Targeted Assignment

    A good match between item difficulty and student ability ensures efficient measurement and prevents students from becoming discouraged or bored by test items that are too easy or too difficult. Targeted test designs consider ability-related background variables to assign students to matching test forms. However, these designs do not consider that students might significantly differ in ability within the resulting groups. In contrast, multistage test designs consider students' performance during test taking to route them to the most informative modules. Yet, multistage test designs usually include one starting module of moderate difficulty in the first stage, which does not account for differences in ability. In this paper, we investigated whether measurement efficiency can be improved by targeted multistage test designs that consider ability-related background information for a targeted assignment at the beginning of the test and performance during test taking for selecting matching test modules. By means of simulations, we compared the efficiency of the traditional targeted test design, the multistage test (MST) design, and the targeted multistage test (TMST) design for estimating student ability. Furthermore, we analyzed the extent to which the efficiency of the different designs depends on the correlation between the ability-related background variable and the true ability, students' ability level and their categorization into an ability group, and the length of the starting module. The results indicated that TMST designs were generally more efficient for estimating student ability than targeted test designs and MST designs, especially if the ability-related background variable correlated highly with, and thus was a good indicator of, students' true ability. Furthermore, TMST designs were particularly efficient in estimating abilities for low- and high-ability students within a given population. Finally, very long starting modules resulted in less efficient estimation of low and high abilities than shorter starting modules. However, this finding was more prominent for MST than for TMST designs. In conclusion, TMST designs are recommended for assessing students from a wide ability distribution if a reliable ability-related background variable is available.
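    The two routing ideas combined in a TMST design can be sketched as below. This is a toy simulation under stated assumptions, not the authors' study design: the Rasch response model, the majority-score routing rule, and all names are illustrative.

```python
import math
import random

def p_correct(theta, b):
    """Rasch-model probability that a student of ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def administer(theta, difficulties, rng):
    """Number-correct score on one module."""
    return sum(rng.random() < p_correct(theta, b) for b in difficulties)

def run_tmst(theta, background_group, modules, rng=None):
    """Targeted multistage test sketch: the ability-related background
    variable picks the starting module (targeted assignment), and the
    stage-1 score routes the student to a second-stage module."""
    rng = rng or random.Random(0)
    start = modules["stage1"][background_group]  # targeted, not one-size-fits-all
    score = administer(theta, start, rng)
    route = "hard" if score > len(start) // 2 else "easy"
    return score + administer(theta, modules["stage2"][route], rng)
```

    The contrast with a plain MST design is confined to the `modules["stage1"]` lookup: an MST would hand every student the same moderate starting module, whereas here the background variable selects among easy, moderate, and hard starts.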

    Infeasibility in automatic test assembly models: a comparison study of different methods

    Several techniques exist to automatically put together a test meeting a number of specifications. In an item bank, the items are stored with their characteristics. A test is constructed by selecting a set of items that fulfills the specifications set by the test assembler. Test assembly problems are often formulated in terms of a model consisting of restrictions and an objective to be maximized or minimized. A problem arises when it is impossible to construct a test from the item pool that meets all specifications, that is, when the model is not feasible. Several methods exist to handle these infeasibility problems. In this article, test assembly models resulting from two practical testing programs were reconstructed to be infeasible. These models were analyzed using methods that forced a solution (Goal Programming, Multiple-Goal Programming, Greedy Heuristic), methods that analyzed the causes (the Relaxed and Ordered Deletion Algorithm (RODA), the Integer Randomized Deletion Algorithm (IRDA), Set Covering (SC), and Item Sampling), or methods that analyzed the causes and used this information to force a solution (the Irreducible Infeasible Set-Solver). Specialized methods such as the IRDA and the Irreducible Infeasible Set-Solver performed best. Recommendations about the use of different methods are given.
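    The notions of an infeasible assembly model and of deletion-style diagnosis can be illustrated with a toy example. This brute-force sketch is not RODA or IRDA (those operate on integer programming models via a solver); the helpers, constraint encoding, and last-in-first-out deletion order are assumptions introduced here for illustration on a pool small enough to enumerate.

```python
from itertools import combinations

def feasible_tests(items, length, constraints):
    """Enumerate every length-item subset of a (small) pool and keep
    the ones that satisfy all constraints."""
    return [sel for sel in combinations(range(len(items)), length)
            if all(check(items, sel) for check in constraints)]

def ordered_deletion(items, length, constraints):
    """Toy deletion diagnosis: drop constraints from the end of the
    list until some test becomes feasible, reporting what was dropped."""
    active, dropped = list(constraints), []
    while not feasible_tests(items, length, active) and active:
        dropped.append(active.pop())
    return active, dropped
```

    For example, a constraint demanding two algebra items and a time limit the two algebra items jointly exceed make the model infeasible; deleting the time constraint restores feasibility and identifies a cause of the conflict.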

    Microneutralization (MN) and hemagglutination inhibition (HAI) titres are highly correlated for seasonal influenza subtypes H1N1 and H3N2.

    Log-transformed MN and HAI titres for subtypes A) H1N1 (A/Brisbane/59/2007) and B) H3N2 (A/Brisbane/10/2007) are presented, along with the significance of correlation as determined by linear regression.