Outcome tests are a popular method for detecting bias in lending, hiring, and
policing decisions. These tests operate by comparing the success rate of
decisions across groups. For example, if loans made to minority applicants are
observed to be repaid more often than loans made to white applicants, it suggests that
only exceptionally qualified minorities are granted loans, indicating
discrimination. Outcome tests, however, are known to suffer from the problem of
infra-marginality: even absent discrimination, the repayment rates for minority
and white loan recipients might differ if the two groups have different risk
distributions. Thus, at least in theory, outcome tests can fail to accurately
detect discrimination. We develop a new statistical test of
discrimination---the threshold test---that mitigates the problem of
infra-marginality by jointly estimating decision thresholds and risk
distributions via a hierarchical Bayesian latent variable model. Applying our
test to a dataset of 4.5 million police stops in North Carolina, we find that
the problem of infra-marginality is more than a theoretical possibility, and
can cause the outcome test to yield misleading results in practice.

Comment: To appear in The Annals of Applied Statistics, 201
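
The infra-marginality problem described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's hierarchical Bayesian model: the Beta risk distributions, the shared threshold of 0.3, and the function name `hit_rate` are all hypothetical choices made for illustration. Both groups face the same decision threshold, so there is no discrimination by construction, yet their success rates differ because their risk distributions differ.

```python
import random


def hit_rate(alpha, beta, threshold, n=100_000, seed=0):
    """Expected success rate among individuals selected by a threshold rule.

    Each individual's risk is drawn from Beta(alpha, beta) (an assumed,
    illustrative distribution). An individual is selected when their risk
    is at least `threshold`. The expected success rate among those selected
    is the mean risk of the selected individuals.
    """
    rng = random.Random(seed)
    risks = [rng.betavariate(alpha, beta) for _ in range(n)]
    selected = [p for p in risks if p >= threshold]
    return sum(selected) / len(selected)


# The same threshold is applied to both groups: no discrimination.
THRESHOLD = 0.3

# Hypothetical risk distributions: group B's risk is shifted higher.
hit_a = hit_rate(alpha=2, beta=8, threshold=THRESHOLD, seed=1)
hit_b = hit_rate(alpha=4, beta=6, threshold=THRESHOLD, seed=2)

print(f"group A hit rate: {hit_a:.3f}")
print(f"group B hit rate: {hit_b:.3f}")
```

Under these assumptions, group B's hit rate comes out higher than group A's even though the threshold is identical. An outcome test that compares only these two rates would read the gap as evidence of a stricter standard for group B, which is the kind of misleading conclusion that estimating thresholds and risk distributions jointly is meant to avoid.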