23 research outputs found

    On the Ground Validation of Online Diagnosis with Twitter and Medical Records

    Social media has been considered as a data source for tracking disease. However, most analyses are based on models that prioritize strong correlation with population-level disease rates over determining whether or not specific individual users are actually sick. Taking a different approach, we develop a novel system for social-media-based disease detection at the individual level using a sample of professionally diagnosed individuals. Specifically, we develop a system for making an accurate influenza diagnosis based on an individual's publicly available Twitter data. We find that about half (17/35 = 48.57%) of the users in our sample who were sick explicitly discuss their disease on Twitter. By developing a meta classifier that combines text analysis, anomaly detection, and social network analysis, we are able to diagnose an individual with greater than 99% accuracy even if she does not discuss her health. Comment: Presented at WWW2014. WWW'14 Companion, April 7-11, 2014, Seoul, Korea

    Social media use among American Indians in South Dakota: Preferences and perceptions

    Social media use data is widely used in health, psychology, and marketing research to analyze human behavior. However, we have very limited knowledge of social media use among American Indians. In this context, this study was designed to assess preferences and perceptions of social media use among American Indians during COVID-19. We collected data from American Indians in South Dakota using an online survey. Results show that Facebook, YouTube, TikTok, Instagram, and Snapchat are the most preferred social media platforms. Most of the participants reported that their use of social media increased tremendously during COVID-19 and perceived more negative effects than positive effects. Hate/harassment/extremism, misinformation/made-up news, and people getting one point of view were the top reasons for negative effects. Comment: 20 pages, 6 figures, 2 Tables, Appendix Tables (7

    Big data values deliverance: OSS model

    Open source software (OSS) repositories, like GitHub, conjointly build numerous big data projects. GitHub developers and/or its responders extend/enhance a project’s software capabilities. Over time, GitHub’s repositories are mined for new knowledge and capabilities. This study’s values-deliverance staging system data mines, isolates, collates and incorporates relevant GitHub text into values deliverance model constructs. This suggests differential construct effects influence a project’s activity levels. The study suggests OSS big data platforms can be software data mined to isolate and assess the values embedded. This also elucidates pathways where behavioral values deliverance improvements to GitHub can likely be most beneficial.

    Estimating influenza incidence using search query deceptiveness and generalized ridge regression

    Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates. Comment: 27 pages, 8 figures