The current study investigated multistage testing (MST) as an alternative to classical linear testing (CLT) for the General Aptitude Test (GAT). The aim was to assess the effects of two assembly methods (narrow range, NR, vs. wide range, WR), two routing methods (Defined Population Intervals, DPI, and Approximate Maximum Information, AMI), and two panel structures (two-stage and three-stage) on the precision of ability estimates and the accuracy of classification for both sections of the GAT (Verbal and Mathematics). Thus, eight conditions were examined and compared: 2 (assembly methods) × 2 (panel structures) × 2 (routing methods).
The dataset, comprising 9,108 examinees, was obtained from the National Center of Assessment, Saudi Arabia. The MST designs were evaluated against two criteria: the more accurate condition was the one with the smallest mean standard error of the ability estimates and the highest percentage of classification agreement between CLT and MST.
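As a rough illustration of the two evaluation criteria, the sketch below computes a mean standard error and a CLT-MST classification agreement percentage. It is not the study's actual procedure: the function names, the cut score of 0.0, and all data values are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def mean_standard_error(standard_errors):
    """Average standard error of the ability estimates in one condition."""
    return mean(standard_errors)


def classification_agreement(clt_thetas, mst_thetas, cut_score):
    """Percentage of examinees placed on the same side of the cut score
    by both the CLT and the MST ability estimates."""
    if len(clt_thetas) != len(mst_thetas):
        raise ValueError("CLT and MST estimates must cover the same examinees")
    matches = sum(
        (clt >= cut_score) == (mst >= cut_score)
        for clt, mst in zip(clt_thetas, mst_thetas)
    )
    return 100.0 * matches / len(clt_thetas)


# Hypothetical values for five examinees (not taken from the study).
clt = [-1.2, -0.3, 0.1, 0.8, 1.5]      # CLT ability estimates
mst = [-1.0, 0.2, 0.3, 0.7, 1.4]       # MST ability estimates
se = [0.30, 0.28, 0.27, 0.29, 0.31]    # MST standard errors
cut = 0.0                              # assumed cut score

print(mean_standard_error(se))
print(classification_agreement(clt, mst, cut))
```

Under these criteria, the preferred condition would be the one with the lower first value and the higher second value.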
Findings revealed trivial differences in the mean ability estimates and mean standard errors across all conditions, but the design influenced the correlations between MST and CLT ability estimates. The NR and WR conditions performed equally well in the accuracy of ability estimates and classification. DPI and AMI performed similarly in the precision of ability estimates, but DPI outperformed AMI in classification accuracy in all conditions. The number of stages also mattered: the correlation coefficients between examinees' scores on the MST-3Stage conditions and CLT were higher than those between examinees' scores on the MST-2Stage conditions and CLT.
Overall, MST can be an appropriate alternative to CLT and computerized adaptive testing (CAT) when the MST designs are well structured and draw on an optimal item pool. The assembly and routing methods did not have a substantial impact on the accuracy of ability estimates, which leaves flexibility in choosing between them: a simpler method would be as effective as a complex one. The number of stages had some impact on the precision of estimation; however, increasing the number of items in the second stage of the MST-2Stage design may compensate for the differences. Two main recommendations from this study were: (a) the item pool for MST should provide satisfactory coverage of content and range of item difficulty, and (b) because an MST design with simpler methods and a simpler panel structure can perform as well as a complex design, the simpler approach is advisable to reduce effort and cost.
The main limitation of the current study was the small size of the item pool and its lack of hard and easy items. Future studies that compare the current combinations of factors under different MST conditions, using an optimal item pool, are needed to strengthen these results. The influence of other factors, such as different MST panel structures (1-2-3, 1-2-3-4), different routing methods, and other cut scores, can be examined to identify the optimal condition for MST and for the GAT.