Estimating county health statistics with twitter
Understanding the relationships among environment, behavior, and health is a core concern of public health researchers. While a number of recent studies have investigated the use of social media to track infectious diseases such as influenza, little work has been done to determine if other health concerns can be inferred. In this paper, we present a large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates. We perform a linguistic analysis of the Twitter activity in the top 100 most populous counties in the U.S., and find a significant correlation with 6 of the 27 health statistics. When compared to traditional models based on demographic variables alone, we find that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.
Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions
Opinion mining and demographic attribute inference have many applications in
social science. In this paper, we propose models to infer daily joint
probabilities of multiple latent attributes from Twitter data, such as
political sentiment and demographic attributes. Since it is costly and
time-consuming to annotate data for traditional supervised classification, we
instead propose scalable Learning from Label Proportions (LLP) models for
demographic and opinion inference using U.S. Census, national and state
political polls, and Cook partisan voting index as population level data. In
LLP classification settings, the training data is divided into a set of
unlabeled bags, where only the label distribution of each bag is known,
removing the requirement of instance-level annotations. Our proposed LLP model,
Weighted Label Regularization (WLR), provides a scalable generalization of
prior work on label regularization to support weights for samples inside bags,
which is applicable in this setting where bags are arranged hierarchically
(e.g., county-level bags are nested inside of state-level bags). We apply our
model to Twitter data collected in the year leading up to the 2016 U.S.
presidential election, producing estimates of the relationships among political
sentiment and demographics over time and place. We find that our approach
closely tracks traditional polling data stratified by demographic category,
resulting in error reductions of 28-44% over baseline approaches. We also
provide descriptive evaluations showing how the model may be used to estimate
interactions among many variables and to identify linguistic temporal
variation, capabilities which are typically not feasible using traditional
polling methods.
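The bag-level training signal described in this abstract can be illustrated with a minimal sketch. It is not the paper's Weighted Label Regularization model; it only shows the general label-regularization idea under stated assumptions: a bag's known label proportions are compared, via KL divergence, with the weighted mean of a classifier's per-instance predicted probabilities. The function name `bag_proportion_loss` and its parameters are hypothetical.

```python
import numpy as np

def bag_proportion_loss(probs, weights, target, eps=1e-12):
    """KL divergence between a bag's known label proportions (target)
    and the weighted mean of per-instance predicted probabilities.

    Illustrative only: names and the exact loss form are assumptions,
    not the formulation used in the paper.
    """
    probs = np.asarray(probs, dtype=float)      # shape (n_instances, n_classes)
    weights = np.asarray(weights, dtype=float)  # shape (n_instances,)
    target = np.asarray(target, dtype=float)    # shape (n_classes,)
    # Weighted average of the instance posteriors inside the bag.
    predicted = weights @ probs / weights.sum()
    # KL(target || predicted); eps guards against log(0).
    return float(np.sum(target * (np.log(target + eps) - np.log(predicted + eps))))
```

With equal instance weights, a bag whose averaged predictions already match the known proportions gives a loss of zero; changing the weights shifts the averaged prediction and raises the loss. Minimizing this quantity over all bags is what lets a model train without instance-level labels.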
Understanding and Measuring Psychological Stress using Social Media
A body of literature has demonstrated that users' mental health conditions,
such as depression and anxiety, can be predicted from their social media
language. There is still a gap in the scientific understanding of how
psychological stress is expressed on social media. Stress is one of the primary
underlying causes and correlates of chronic physical illnesses and mental
health conditions. In this paper, we explore the language of psychological
stress with a dataset of 601 social media users, who answered the Perceived
Stress Scale questionnaire and also consented to share their Facebook and
Twitter data. Firstly, we find that stressed users post about exhaustion,
losing control, increased self-focus and physical pain as compared to posts
about breakfast, family-time, and travel by users who are not stressed.
Secondly, we find that Facebook language is more predictive of stress than
Twitter language. Thirdly, we demonstrate how the language-based models thus
developed can be adapted and scaled to measure county-level trends. Since
county-level language is easily available on Twitter using the Streaming API,
we explore multiple domain adaptation algorithms to adapt user-level Facebook
models to Twitter language. We find that domain-adapted and scaled social
media-based measurements of stress outperform sociodemographic variables (age,
gender, race, education, and income), against ground-truth survey-based stress
measurements, both at the user- and the county-level in the U.S. Twitter
language that scores higher in stress is also predictive of poorer health, less
access to facilities and lower socioeconomic status in counties. We conclude
with a discussion of the implications of using social media as a new tool for
monitoring stress levels of both individuals and counties.
Comment: Accepted for publication in the proceedings of ICWSM 201
Correcting Sociodemographic Selection Biases for Population Prediction from Social Media
Social media is increasingly used for large-scale population predictions,
such as estimating community health statistics. However, social media users are
not typically a representative sample of the intended population -- a
"selection bias". Within the social sciences, such a bias is typically
addressed with restratification techniques, where observations are reweighted
according to how under- or over-sampled their socio-demographic groups are.
Yet, restratification is rarely evaluated for improving prediction. Across four
tasks of predicting U.S. county population health statistics from Twitter, we
find standard restratification techniques provide no improvement and often
degrade prediction accuracies. The core reasons for this seem to be both
shrunken estimates (reduced variance of model predicted values) and sparse
estimates of each population's socio-demographics. We thus develop and evaluate
three methods to address these problems: estimator redistribution to account
for shrinking, and adaptive binning and informed smoothing to handle sparse
socio-demographic estimates. We show that each of these methods significantly
outperforms the standard restratification approaches. Combining approaches, we
find substantial improvements over non-restratified models, yielding a 53.0%
increase in predictive accuracy (R^2) in the case of surveyed life
satisfaction, and a 17.8% average increase across all tasks.
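The standard restratification step that this abstract takes as its baseline can be sketched as follows. This is a minimal illustration, assuming a single categorical socio-demographic variable and known population shares; it does not implement the paper's estimator redistribution, adaptive binning, or informed smoothing. The function names and the example groups are hypothetical.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_share):
    """Weight each observation by population share / sample share of its group.

    sample_groups: group label for each sampled user (hypothetical input).
    population_share: mapping from group label to its share of the population.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_share[g] / (counts[g] / n) for g in sample_groups]

def weighted_mean(values, weights):
    """Reweighted community-level estimate of a per-user value."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For example, if three of four sampled users fall in a group that makes up only half of the population, each of them is down-weighted and the remaining user is up-weighted, so the reweighted mean reflects the population mix rather than the sample mix. A group with very few sampled users receives a very large weight, which is one way to see why sparse socio-demographic estimates can hurt prediction.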
Effectiveness of a Faith-placed Cardiovascular Health Promotion Intervention for Rural Adults
Introduction: Cardiovascular disease (CVD) is the leading cause of mortality in the US. Further, rural US adults experience disproportionately high CVD prevalence and mortality compared to non-rural. Cardiovascular risk-reduction interventions for rural adults have shown short-term effectiveness, but long-term maintenance of outcomes remains a challenge. Faith organizations offer promise as collaborative partners for translating evidence-based interventions to reduce CVD.
Methods: We adapted and implemented a collaborative, faith-placed, CVD risk-reduction intervention in rural Illinois. We used a quasi-experimental, pre-post design to compare changes in dietary and physical activity among participants. Intervention components included Heart Smart for Women (HSFW), an evidence-based program implemented weekly for 12 weeks followed by Heart Smart Maintenance (HSM), implemented monthly for two years. Participants engaged in HSFW only, HSM only, or both. We used regression and generalized estimating equations models to examine changes in outcomes after one year.
Results: Among participants who completed both baseline and one-year surveys (n = 131), HSFW+HSM participants had significantly higher vegetable consumption (p = .007) and combined fruit/vegetable consumption (p = .01) compared to the HSM-only group at one year. We found no differences in physical activity.
Conclusion: Improving and maintaining CVD-risk behaviors is a persistent challenge in rural populations. Advancing research to improve our understanding of effective translation of CVD risk-reduction interventions in rural populations is critical.
360 Quantified Self
Wearable devices with a wide range of sensors have contributed to the rise of
the Quantified Self movement, where individuals log everything ranging from the
number of steps they have taken, to their heart rate, to their sleeping
patterns. Sensors do not, however, typically sense the social and ambient
environment of the users, such as general lifestyle attributes or information
about their social network. This means that the users themselves, and the
medical practitioners, privy to the wearable sensor data, only have a narrow
view of the individual, limited mainly to certain aspects of their physical
condition.
In this paper we describe a number of use cases for how social media can be
used to complement the check-up data and those from sensors to gain a more
holistic view of individuals' health, a perspective we call the 360 Quantified
Self. Health-related information can be obtained from sources as diverse as
food photo sharing, location check-ins, or profile pictures. Additionally,
information from a person's ego network can shed light on the social dimension
of wellbeing, which is widely acknowledged to be of utmost importance, even
though it is currently rarely used for medical diagnosis. We articulate a
long-term vision describing the desirable list of technical advances and
variety of data to achieve an integrated system encompassing Electronic Health
Records (EHR), data from wearable devices, alongside information derived from
social media data.
Comment: QCRI Technical Report
Deer Herd Management Using the Internet: A Comparative Study of California Targeted By Data Mining the Internet
An ongoing project to investigate the use of the internet as an information source for decision support identified the decline of the California deer population as a significant issue. Using Google Alerts, an automated keyword search tool, text and numerical data were collected from a daily internet search and categorized by region and topic to allow for identification of information trends. This simple data mining approach determined that California is one of only four states that do not currently report total, finalized deer harvest (kill) data online and that it is the only state that has reduced the amount of information made available over the internet in recent years. Contradictory information identified by the internet data mining prompted the analysis described in this paper, which indicates that the graphical information presented on the California Fish and Wildlife website significantly understates the severity of the deer population decline over the past 50 years. This paper presents a survey of how states use the internet in their deer management programs and an estimate of the California deer population over the last 100 years. It demonstrates how any organization can use the internet for data collection and discovery.