7,472 research outputs found
Can Twitter be a source of information on allergy? Correlation of pollen counts with tweets reporting symptoms of allergic rhinoconjunctivitis and names of antihistamine drugs
Pollen forecasts are in use everywhere to inform therapeutic decisions for patients with allergic rhinoconjunctivitis (ARC). We exploited data derived from Twitter in order to identify tweets reporting a combination of symptoms consistent with a case definition of ARC and those reporting the name of an antihistamine drug. In order to increase the sensitivity of the system, we applied an algorithm aimed at automatically identifying jargon expressions related to medical terms. We compared weekly Twitter trends with National Allergy Bureau weekly pollen counts derived from US stations, and found a high correlation of the sum of the total pollen counts from each station with tweets reporting ARC symptoms (Pearson's correlation coefficient: 0.95) and with tweets reporting antihistamine drug names (Pearson's correlation coefficient: 0.93). Longitude and latitude of the pollen stations affected the strength of the correlation. Twitter and other social networks may play a role in allergic disease surveillance and in signaling drug consumption trends.
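The comparison described in this abstract rests on Pearson's correlation coefficient between two weekly series. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `pearson` and the two sample series are hypothetical, and the numbers are made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly values, invented for illustration only.
weekly_pollen_total = [120.0, 340.0, 560.0, 410.0, 150.0]
weekly_symptom_tweets = [30.0, 80.0, 140.0, 100.0, 40.0]

r = pearson(weekly_pollen_total, weekly_symptom_tweets)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 means the two weekly series rise and fall together; it says nothing about causation, and it is sensitive to how the counts are aggregated across stations.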
Inferring individual attributes from search engine queries and auxiliary information
Internet data has surfaced as a primary source for investigation of different
aspects of human behavior. A crucial step in such studies is finding a suitable
cohort (i.e., a set of users) that shares a common trait of interest to
researchers. However, direct identification of users sharing this trait is
often impossible, as the data available to researchers is usually anonymized to
preserve user privacy. To facilitate research on specific topics of interest,
especially in medicine, we introduce an algorithm for identifying a trait of
interest in anonymous users. We illustrate how a small set of labeled examples,
together with statistical information about the entire population, can be
aggregated to obtain labels on unseen examples. We validate our approach using
labeled data from the political domain.
We provide two applications of the proposed algorithm to the medical domain.
In the first, we demonstrate how to identify users whose search patterns
indicate they might be suffering from certain types of cancer. In the second,
we detail an algorithm to predict the distribution of diseases given their
incidence in a subset of the population at study, making it possible to predict
disease spread from partial epidemiological data.
Towards Automatic Evaluation of Health-Related CQA Data
The paper reports on the evaluation of Russian community question answering (CQA) data in the health domain. About 1,500 question-answer pairs were manually evaluated by medical professionals; in addition, an automatic evaluation based on reference disease-medicine pairs was performed. Although the results of the manual and automatic evaluation do not fully match, we find the method still promising and propose several improvements. Automatic processing can be used to dynamically monitor the quality of the CQA content and to compare different data sources. Moreover, the approach can be useful for symptomatic surveillance and health education campaigns. This work is partially supported by the Russian Foundation for Basic Research, project #14-07-00589 “Data Analysis and User Modelling in Narrow-Domain Social Media”. We also thank the assessors who volunteered for the evaluation and Mail.Ru for granting us access to the data.
Characterizing Transgender Health Issues in Twitter
Although there are millions of transgender people in the world, a lack of
information exists about their health issues. This issue has consequences for
the medical field, which only has a nascent understanding of how to identify
and meet this population's health-related needs. Social media sites like
Twitter provide new opportunities for transgender people to overcome these
barriers by sharing their personal health experiences. Our research employs a
computational framework to collect tweets from self-identified transgender
users, detect those that are health-related, and identify their information
needs. This framework is significant because it provides a macro-scale
perspective on an issue that lacks investigation at national or demographic
levels. Our findings identified 54 distinct health-related topics that we
grouped into 7 broader categories. Further, we found both linguistic and
topical differences in the health-related information shared by transgender men
(TM) as compared to transgender women (TW). These findings can help inform
medical and policy-based strategies for health interventions within transgender
communities. Also, our proposed approach can inform the development of
computational strategies to identify the health-related information needs of
other marginalized populations.
Topical Mining of malaria Using Social Media. A Text Mining Approach
Malaria is a life-threatening parasitic disease, transmitted by mosquitoes and common in subtropical and tropical climates. Each year, several hundred thousand people die from malaria infections. With the rapid growth, popularity and global reach of social media usage, a myriad of opportunities arises for extracting opinions and discourses on various topics and issues. This research examines the public discourse, trends and emergent themes surrounding malaria discussion. We query a Twitter corpus, leveraging text mining algorithms to extract and analyze topical themes. Further, to investigate these dynamics, we use Crimson social media analytics software to analyze emergent topical themes and monitor malaria trends. The findings reveal pertinent topics and themes in malaria discourse. The implications include giving public health officials insight into the sentiments and opinions shaping public discourse on the malaria epidemic. The multi-dimensional analysis of data provides directions for future research and informs public policy decisions.
Effective Feature Representation for Clinical Text Concept Extraction
Crucial information about the practice of healthcare is recorded only in
free-form text, which creates an enormous opportunity for high-impact NLP.
However, annotated healthcare datasets tend to be small and expensive to
obtain, which raises the question of how to make maximally efficient uses of
the available data. To this end, we develop an LSTM-CRF model for combining
unsupervised word representations and hand-built feature representations
derived from publicly available healthcare ontologies. We show that this
combined model yields superior performance on five datasets of diverse kinds of
healthcare text (clinical, social, scientific, commercial). Each involves the
labeling of complex, multi-word spans that pick out different healthcare
concepts. We also introduce a new labeled dataset for identifying the treatment
relations between drugs and diseases.
- …