Estimating county health statistics with Twitter

Abstract

Understanding the relationships among environment, behavior, and health is a core concern of public health researchers. While a number of recent studies have investigated the use of social media to track infectious diseases such as influenza, little work has been done to determine if other health concerns can be inferred. In this paper, we present a large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates. We perform a linguistic analysis of the Twitter activity in the top 100 most populous counties in the U.S., and find a significant correlation with 6 of the 27 health statistics. When compared to traditional models based on demographic variables alone, we find that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.
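
The comparison described above, a demographics-only model against one augmented with Twitter-derived features, can be illustrated with a minimal sketch. The abstract does not name the regression model, the feature set, or the evaluation protocol, so everything here is an assumption made for illustration: ridge regression, leave-one-out error, and the hypothetical helpers `ridge_fit`, `loo_mse`, and `make_synthetic_counties`. The data are randomly generated stand-ins, not the study's data, and the printed numbers say nothing about the paper's results.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is left unpenalized by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def loo_mse(X, y, alpha=1.0):
    """Leave-one-out mean squared error: each county is predicted by a model fit on the rest."""
    n = len(y)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i
        w, b = ridge_fit(X[mask], y[mask], alpha)
        errors.append((X[i] @ w + b - y[i]) ** 2)
    return float(np.mean(errors))


def make_synthetic_counties(n=100, seed=0):
    """Synthetic stand-in for 100 counties (assumption: not real data)."""
    rng = np.random.default_rng(seed)
    demo = rng.normal(size=(n, 3))    # hypothetical demographic variables
    tweets = rng.normal(size=(n, 5))  # hypothetical Twitter-derived linguistic features
    # By construction, the health statistic depends on both feature groups.
    y = (demo @ np.array([1.0, -0.5, 0.3])
         + tweets @ np.array([0.8, 0.0, -0.6, 0.0, 0.4])
         + 0.1 * rng.normal(size=n))
    return demo, tweets, y


demo, tweets, y = make_synthetic_counties()
mse_demo = loo_mse(demo, y)                       # demographics alone
mse_aug = loo_mse(np.hstack([demo, tweets]), y)   # demographics + Twitter features
print(f"demographics only: {mse_demo:.3f}")
print(f"augmented:         {mse_aug:.3f}")
```

Because the synthetic target is built to depend on the Twitter-like features, the augmented model has lower held-out error here. With real data, whether augmentation helps is an empirical question answered statistic by statistic, as in the 20-of-27 result reported above.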
