Correlation metric
Abstract
The logit model is a widely used regression technique in social research, but the use and interpretation of its coefficients have proven contentious. Problems arise because the mean and the variance of discrete variables cannot be separated. Logit coefficients are therefore identified only relative to an arbitrary scale, which makes them difficult both to interpret and to compare across groups or samples: do differences in coefficients reflect true differences or differences in scale? This cross-sample comparison problem raises concerns for comparative research. We propose a new correlation metric, derived from logit models, that gives a new interpretation to their estimates (log odds-ratios). The metric points toward a reorientation of how logit models are used, because it clarifies what logit coefficients are and how and when they can (or cannot) be used in comparative research. It recovers the correlation between a predictor variable x and a continuous latent outcome variable y* assumed to underlie a binary observed outcome y. The metric is invariant to differences in the marginal distributions of x and y* across groups or samples, which makes it suitable for the situations encountered in applied comparative research. Our derivations also extend to the probit and to ordered and multinomial models. The new metric is implemented in the Stata command nlcorr.
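
To make the idea of a correlation between x and the latent y* concrete, the following is a minimal sketch in Python. It assumes a single predictor and the standard latent-variable formulation y* = a + b·x + e, where the error e has variance π²/3 under the logit and 1 under the probit. Under those assumptions the implied correlation is b·sd(x) / sqrt(b²·var(x) + var(e)). The function name `latent_correlation` is hypothetical; this is not the nlcorr implementation, and the paper's own derivation may differ in its details.

```python
import math


def latent_correlation(beta, sd_x, link="logit"):
    """Correlation between x and the latent outcome y* implied by a slope.

    Assumes y* = a + beta * x + e with a single predictor x (an
    illustrative assumption, not the nlcorr implementation).
    """
    # The error variance is fixed by the link function, which is what
    # makes the raw coefficient scale arbitrary.
    if link == "logit":
        resid_var = math.pi ** 2 / 3  # variance of the standard logistic
    elif link == "probit":
        resid_var = 1.0               # variance of the standard normal
    else:
        raise ValueError("link must be 'logit' or 'probit'")

    explained_var = (beta * sd_x) ** 2
    return beta * sd_x / math.sqrt(explained_var + resid_var)


# Two hypothetical groups: the logit slopes differ by a factor of two,
# but so does the spread of x, so the implied correlation is the same.
r_group_a = latent_correlation(beta=1.0, sd_x=1.0)
r_group_b = latent_correlation(beta=0.5, sd_x=2.0)
print(r_group_a, r_group_b)
```

In this sketch the two groups have different logit coefficients yet the same implied correlation, which illustrates why comparing raw coefficients across groups can mislead when the marginal distribution of x differs.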