Valid comparisons of group scores on additive measures such as political knowledge scales require that individuals' conditional response probabilities on the observed items be invariant across groups after controlling for their overall level of the latent trait of interest. Using a multi-group confirmatory factor analysis of knowledge items drawn from the American National Election Studies, we find that the scales used in recent research are not sufficiently invariant to support valid comparisons across a host of theoretically important grouping variables. We demonstrate that valid, invariant scales can be constructed from a subset of the items, and we show the impact of invariance by comparing results from the valid and invalid scales. We provide an analysis of differential item functioning (DIF) based on grouping variables commonly used in political science research to explore the utility of each item in the construction of valid knowledge scales. An application of the vanishing tetrad test suggests that it is more appropriate to conceive of these items as effects of a latent variable than as cause or formative indicators. These results suggest that models attempting to explain apparent knowledge gaps between subgroups have been unsuccessful because previously constructed scales were validated by fiat.
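The logic behind the vanishing tetrad test can be illustrated with a minimal sketch. If four items are effects of a single latent variable with uncorrelated errors, each off-diagonal covariance is the product of two loadings and the factor variance, so every tetrad difference (for example, σ12σ34 − σ13σ24) is zero in the population. The loadings, variances, and the function name `tetrad_differences` below are hypothetical values chosen for illustration, not estimates from the knowledge items analyzed here. The sketch works on a model-implied covariance matrix only; an actual test uses sample covariances and a test statistic, which is omitted.

```python
import numpy as np


def tetrad_differences(cov):
    """Return the three tetrad differences for a 4x4 covariance matrix."""
    s = np.asarray(cov, dtype=float)
    if s.shape != (4, 4):
        raise ValueError("expected a 4x4 covariance matrix")
    # Indices are zero-based: s[0, 1] is the covariance of items 1 and 2.
    t1 = s[0, 1] * s[2, 3] - s[0, 2] * s[1, 3]
    t2 = s[0, 2] * s[3, 1] - s[0, 3] * s[2, 1]
    t3 = s[0, 3] * s[1, 2] - s[0, 1] * s[3, 2]
    return np.array([t1, t2, t3])


# Hypothetical one-factor (effect-indicator) model: illustrative values only.
loadings = np.array([0.8, 0.7, 0.6, 0.5])
factor_variance = 1.0
unique_variances = np.array([0.36, 0.51, 0.64, 0.75])

# Model-implied covariance: loadings * factor variance * loadings' + unique part.
implied_cov = factor_variance * np.outer(loadings, loadings) + np.diag(unique_variances)

tetrads = tetrad_differences(implied_cov)
print(tetrads)
```

Under these assumptions all three differences vanish up to floating-point error. Tetrads that depart substantially from zero in the data would count against treating the items as effects of one latent variable.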