Building a Better Model: Variable Selection to Predict Poverty in Pakistan and Sri Lanka

Abstract

Numerous studies have developed models to predict poverty, but surprisingly few have rigorously examined different approaches to developing prediction models. This paper applies out-of-sample validation techniques to household data from Pakistan and Sri Lanka to compare the accuracy of regional poverty predictions from models derived using manual (discretionary) selection, stepwise regression, and Lasso-based procedures. It also examines how much incorporating publicly available satellite data into the model improves its accuracy. The five main findings are: 1) Lasso tends to outperform both discretionary and stepwise models in Pakistan, where the set of potential predictors is large; 2) Lasso and stepwise models give comparable results in Sri Lanka, where the set of predictors is smaller; 3) the accuracy of the prediction model depends considerably on the poverty threshold; 4) including publicly available satellite data makes poverty predictions more accurate in Sri Lanka, where predictors are scarce, but slightly less accurate in Pakistan; and 5) including the satellite data increases the benefit of using Lasso in Sri Lanka. We conclude that, among the three model selection methods considered, Lasso-based models are preferred for generating poverty predictions, especially when the pool of candidate variables is large. Furthermore, when the pool of candidate variables available from household surveys is smaller, incorporating publicly available satellite data can considerably improve the accuracy of regional poverty predictions.
