Weighting Test Samples in IRT Linking and Equating: Toward an Improved Sampling Design for Complex Equating

ETS Research Report Series

Abstract

Several factors can introduce variability into item response theory (IRT) linking and equating procedures, including variability across examinee samples and/or test items, seasonality, regional differences, native-language diversity, gender, and other demographic variables. Hence, the following question arises: Is it possible to select optimal samples of examinees so that IRT linking and equating can be made more precise at the level of a single administration as well as across a large number of administrations? This is a question of optimal sampling design in linking and equating. To obtain an improved sampling design for invariant linking and equating across testing administrations, we applied weighting techniques to yield a weighted sample distribution that is consistent with the target population distribution. The goal is a stable Stocking-Lord test characteristic curve (TCC) linking and a true-score equating that is invariant across administrations. To study the weighting effects on linking, we first selected multiple subsamples from a data set.
We then compared the linking parameters from the subsamples with those from the full data set and examined whether the linking parameters from the weighted samples yielded smaller mean squared errors (MSEs) than those from the unweighted subsamples. To study the weighting effects on true-score equating, we also compared the distributions of the equated scores. Overall, the findings indicated that the weighting produced favorable results.
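The weighting approach the abstract describes can be illustrated with a minimal post-stratification sketch: each examinee in a subsample receives a weight equal to the target population proportion of that examinee's demographic stratum divided by the stratum's observed proportion in the subsample, so that the weighted subsample reproduces the target distribution. This is an assumption-laden illustration, not the report's actual procedure; the stratum labels and proportions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
strata = np.array(["A", "B", "C"])               # hypothetical demographic groups
target_props = {"A": 0.5, "B": 0.3, "C": 0.2}    # hypothetical target population distribution

# A subsample whose composition has drifted away from the target.
sample = rng.choice(strata, size=1000, p=[0.3, 0.4, 0.3])

# Post-stratification weight for each examinee:
# target proportion of their stratum / observed proportion in the subsample.
obs_props = {g: np.mean(sample == g) for g in strata}
weights = np.array([target_props[g] / obs_props[g] for g in sample])

# The weighted subsample now matches the target distribution exactly.
for g in strata:
    weighted_prop = weights[sample == g].sum() / weights.sum()
    print(g, round(weighted_prop, 3))
```

In the study design the abstract sketches, weights of this kind would be carried into the Stocking-Lord criterion (weighting each examinee's contribution to the TCC difference) before comparing the MSEs of the resulting linking parameters against those from the unweighted subsamples.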
