23 research outputs found
On the Ground Validation of Online Diagnosis with Twitter and Medical Records
Social media has been considered as a data source for tracking disease.
However, most analyses are based on models that prioritize strong correlation
with population-level disease rates over determining whether or not specific
individual users are actually sick. Taking a different approach, we develop a
novel system for social-media based disease detection at the individual level
using a sample of professionally diagnosed individuals. Specifically, we
develop a system for making an accurate influenza diagnosis based on an
individual's publicly available Twitter data. We find that about half (17/35 =
48.57%) of the users in our sample that were sick explicitly discuss their
disease on Twitter. By developing a meta classifier that combines text
analysis, anomaly detection, and social network analysis, we are able to
diagnose an individual with greater than 99% accuracy even if she does not
discuss her health.
Comment: Presented at WWW2014. WWW'14 Companion, April 7-11, 2014, Seoul, Korea
Social media use among American Indians in South Dakota: Preferences and perceptions
Social media usage data are widely used in health, psychology, and
marketing research to analyze human behavior. However, very little is known
about social media use among American Indians. In this context, this
study was designed to assess preferences and perceptions of social media use
among American Indians during COVID-19. We collected data from American Indians
in South Dakota using an online survey. Results show that Facebook, YouTube,
TikTok, Instagram and Snapchat are the most preferred social media platforms.
Most participants reported that their use of social media increased
tremendously during COVID-19 and perceived more negative effects than
positive effects. Hate/harassment/extremism, misinformation/made up news, and
people getting one point of view were the top reasons for negative effects.
Comment: 20 pages, 6 figures, 2 Tables, Appendix Tables (7
Big data values deliverance: OSS model
Open source software (OSS) repositories, like GitHub, jointly build numerous big data projects. GitHub developers and/or responders extend or enhance a project’s software capabilities. Over time, GitHub’s repositories are mined for new knowledge and capabilities. This study’s values-deliverance staging system data mines, isolates, collates and incorporates relevant GitHub text into values deliverance model constructs. The results suggest that differential construct effects influence a project’s activity levels. The study suggests OSS big data platforms can be data mined to isolate and assess the values embedded in them. This also elucidates pathways where behavioral values deliverance improvements to GitHub are likely to be most beneficial.
Estimating influenza incidence using search query deceptiveness and generalized ridge regression
Seasonal influenza is a sometimes surprisingly impactful disease, causing
thousands of deaths per year along with much additional morbidity. Timely
knowledge of the outbreak state is valuable for managing an effective response.
The current state of the art is to gather this knowledge using in-person
patient contact. While accurate, this is time-consuming and expensive. This has
motivated inquiry into new approaches using internet activity traces, based on
the theory that lay observations of health status lead to informative features
in internet data.
These approaches risk being deceived by activity traces having a
coincidental, rather than informative, relationship to disease incidence; to
our knowledge, this risk has not yet been quantitatively explored. We evaluated
both simulated and real activity traces of varying deceptiveness for influenza
incidence estimation using linear regression.
We found that deceptiveness knowledge does reduce error in such estimates,
that it may help automatically-selected features perform as well as or better than
features that require human curation, and that a semantic distance measure
derived from the Wikipedia article category tree serves as a useful proxy for
deceptiveness. This suggests that disease incidence estimation models should
incorporate not only data about how internet features map to incidence but also
additional data to estimate feature deceptiveness. By doing so, we may gain one
more step along the path to accurate, reliable disease incidence estimation
using internet data. This capability would improve public health by decreasing
the cost and increasing the timeliness of such estimates.
Comment: 27 pages, 8 figures
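
The abstract above names generalized ridge regression but does not give its formulation. As a minimal sketch, assuming the common form in which each feature j carries its own penalty (minimize ||y - Xb||^2 + sum_j penalties[j] * b[j]^2), deceptiveness could be used to penalize untrustworthy features more heavily. The function name, the synthetic data, the deceptiveness scores, and the penalty scaling below are all hypothetical illustrations and are not taken from the paper.

```python
import numpy as np

def generalized_ridge(X, y, penalties):
    """Closed-form solution of min ||y - X b||^2 + sum_j penalties[j] * b[j]^2.

    Assumption: one non-negative penalty per feature (a diagonal penalty matrix).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalty_matrix = np.diag(np.asarray(penalties, dtype=float))
    return np.linalg.solve(X.T @ X + penalty_matrix, X.T @ y)

# Synthetic stand-in data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n_weeks = 200
incidence = rng.random(n_weeks)                                 # target: weekly incidence
informative = incidence + 0.05 * rng.standard_normal(n_weeks)   # trace that tracks incidence
deceptive = rng.standard_normal(n_weeks)                        # trace unrelated to incidence
X = np.column_stack([informative, deceptive])

# Hypothetical deceptiveness scores in [0, 1]; higher means less trustworthy.
deceptiveness = np.array([0.0, 1.0])
penalties = 1e-3 + 1e3 * deceptiveness   # illustrative scaling only

coef = generalized_ridge(X, incidence, penalties)
```

With all penalties set to zero, this reduces to ordinary least squares; raising the penalty on a feature shrinks its coefficient toward zero, which is one way deceptiveness knowledge could be made to influence the fit.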