International surveys of educational achievement and functional literacy are increasingly common. We consider two aspects of the robustness of their results. First, we compare results from four surveys: TIMSS, PISA, PIRLS and IALS. This contrasts with the standard approach, which is to analyse just one survey in isolation. Second, we investigate whether results are sensitive to the choice of item response model used by survey organisers to aggregate respondents’ answers into a single score. In both cases we focus on countries’ average scores, the within-country differences in scores, and the association between the two.