
    Item-Score Reliability as a Selection Tool in Test Construction

    This study investigates the usefulness of item-score reliability as a criterion for item selection in test construction. Methods MS, λ6, and CA were investigated as item-assessment methods in item selection and compared to the corrected item-total correlation, which was used as a benchmark. An ideal ordering in which to add items to the test (bottom-up procedure) or omit items from the test (top-down procedure) was defined based on the population test-score reliability. The orderings the four item-assessment methods produced in samples were compared to the ideal ordering, and the degree of resemblance was expressed by means of Kendall's τ. To investigate the concordance of the orderings across 1,000 replicated samples, Kendall's W was computed for each item-assessment method. The results showed that for both the bottom-up and the top-down procedures, item-assessment method CA and the corrected item-total correlation most closely resembled the ideal ordering. In general, all item-assessment methods resembled the ideal ordering more closely, and concordance of the orderings was greater, for larger sample sizes and for greater variance of the item discrimination parameters.
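    The comparison described above can be sketched in a few lines: rank the items by their corrected item-total correlation in a sample, and measure how closely that ordering resembles an ideal ordering with Kendall's τ. The sketch below is a minimal illustration on simulated 2PL-style data, where the "ideal" ordering is taken to be the ordering by hypothetical item discrimination parameters; the data-generating setup is an assumption for illustration, not the study's simulation design.

    ```python
    import numpy as np

    def corrected_item_total(x):
        """Corrected item-total correlation: each item vs. the rest score."""
        total = x.sum(axis=1)
        return np.array([np.corrcoef(x[:, j], total - x[:, j])[0, 1]
                         for j in range(x.shape[1])])

    def kendall_tau(a, b):
        """Kendall's tau between two value vectors (assumes no ties)."""
        n = len(a)
        s = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
        return 2.0 * s / (n * (n - 1))

    # Hypothetical discrimination parameters define the "ideal" ordering.
    rng = np.random.default_rng(42)
    discr = np.array([0.4, 0.8, 1.2, 1.6, 2.0])
    theta = rng.normal(size=(1000, 1))
    probs = 1.0 / (1.0 + np.exp(-discr * theta))       # 2PL-style, difficulty 0
    data = (rng.uniform(size=probs.shape) < probs).astype(int)

    # Agreement between the sample-based ordering and the ideal one.
    tau = kendall_tau(corrected_item_total(data), discr)
    print(tau)
    ```

    In the study, this resemblance is averaged over many replicated samples, and Kendall's W summarizes how consistently a method reproduces the same ordering across replications.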

    Methods for Estimating Item-Score Reliability

    Reliability is usually estimated for a test score, but it can also be estimated for item scores. Item-score reliability can be useful for assessing an item's contribution to the test score's reliability, for identifying unreliable scores in aberrant item-score patterns in person-fit analysis, and for selecting the most reliable item from a test to use as a single-item measure. Four methods were discussed for estimating item-score reliability: the Molenaar–Sijtsma method (method MS), Guttman's method λ6, the latent class reliability coefficient (method LCRC), and the correction for attenuation (method CA). A simulation study was used to compare the methods with respect to median bias, variability (interquartile range [IQR]), and percentage of outliers. The simulation study consisted of six conditions: standard, polytomous items, unequal α parameters, two-dimensional data, long test, and small sample size. Methods MS and CA were the most accurate. Method LCRC showed almost unbiased results, but large variability. Method λ6 consistently underestimated item-score reliability, but showed a smaller IQR than the other methods.
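    Of the four methods, Guttman's λ6 is the easiest to sketch at the test-score level: it estimates each item's error variance from the squared multiple correlation of that item on the remaining items (obtainable from the diagonal of the inverse covariance matrix) and subtracts the summed error variances from the total-score variance. The sketch below shows this test-score version on simulated one-factor data; the item-score variants compared in the study are more involved, and the simulated data are an assumption for illustration.

    ```python
    import numpy as np

    def guttman_lambda6(scores):
        """Guttman's lambda-6 for a test score.

        The error variance of item j is 1 / (S^-1)_jj, where S is the
        item covariance matrix; this equals s_j^2 * (1 - SMC_j).
        """
        S = np.cov(scores, rowvar=False)
        err_var = 1.0 / np.diag(np.linalg.inv(S))
        total_var = S.sum()                      # variance of the total score
        return 1.0 - err_var.sum() / total_var

    # Simulated one-factor data: 6 items = common factor + unique noise.
    rng = np.random.default_rng(1)
    theta = rng.normal(size=(500, 1))
    items = theta + rng.normal(size=(500, 6))
    lam6 = guttman_lambda6(items)
    print(round(lam6, 3))
    ```

    Because the squared multiple correlation is a lower bound on an item's true-score variance proportion, λ6 tends to underestimate reliability, which is consistent with the downward bias the abstract reports for the item-score version.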

    Item-score reliability in empirical-data sets and its relationship with other item indices

    Reliability is usually estimated for a total score, but it can also be estimated for item scores. Item-score reliability can be useful for assessing the repeatability of an individual item score in a group. Three methods to estimate item-score reliability are discussed, known as method MS, method λ6, and method CA. The item-score reliability methods are compared with four well-known and widely accepted item indices: the item-rest correlation, the item-factor loading, the item scalability, and the item discrimination. Realistic values for item-score reliability in empirical data sets are monitored to obtain an impression of the values to be expected in other empirical data sets. The relations between the three item-score reliability methods and the four well-known item indices are investigated. Tentatively, a minimum value for the item-score reliability methods to be used in item analysis is recommended. Keywords: coefficient λ6, correction for attenuation, item discrimination, item-factor loading, item-rest correlation, item scalability, item-score reliability.
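    Among the benchmark indices listed above, the item scalability (Mokken's H_j) is straightforward to compute for dichotomous items: it divides the sum of an item's covariances with the other items by the maximum those covariances could attain given the item marginals, which for 0/1 items is min(p_j, p_k) − p_j p_k. A minimal sketch on simulated Rasch-style data follows; the data-generating setup is an assumption for illustration, not taken from the paper.

    ```python
    import numpy as np

    def item_scalability(x):
        """Mokken item scalability H_j for dichotomous (0/1) item scores."""
        p = x.mean(axis=0)
        S = np.cov(x, rowvar=False)
        n_items = x.shape[1]
        h = np.empty(n_items)
        for j in range(n_items):
            num, den = 0.0, 0.0
            for k in range(n_items):
                if k == j:
                    continue
                num += S[j, k]                        # observed covariance
                den += min(p[j], p[k]) - p[j] * p[k]  # maximum possible covariance
            h[j] = num / den
        return h

    # Simulated Rasch-style data: 6 items with spread-out difficulties.
    rng = np.random.default_rng(7)
    theta = rng.normal(size=(1000, 1))
    diffs = np.linspace(-1.0, 1.0, 6)
    probs = 1.0 / (1.0 + np.exp(-(theta - diffs)))
    data = (rng.uniform(size=probs.shape) < probs).astype(int)
    h = item_scalability(data)
    print(np.round(h, 2))
    ```

    Indices like H_j capture an item's association with the other items, which is why it is natural to ask, as this paper does, how closely they track item-score reliability.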