52 research outputs found
Social media mining for toxicovigilance of prescription medications: End-to-end pipeline, challenges and future work
Substance use, substance use disorder, and overdoses related to substance use
are major public health problems globally and in the United States. A key
aspect of addressing these problems from a public health standpoint is improved
surveillance. Traditional surveillance systems are laggy, and social media are
potentially useful sources of timely data. However, mining knowledge from
social media is challenging, and requires the development of advanced
artificial intelligence, specifically natural language processing (NLP) and
machine learning methods. We developed a sophisticated end-to-end pipeline for
mining information about nonmedical prescription medication use from social
media, namely Twitter and Reddit. Our pipeline employs supervised machine
learning and NLP for filtering out noise and characterizing the chatter. In
this paper, we describe our end-to-end pipeline developed over four years. In
addition to describing our data mining infrastructure, we discuss existing
challenges in social media mining for toxicovigilance, and possible future
research directions.
Social media mining for identification and exploration of health-related information from pregnant women
Widespread use of social media has led to the generation of substantial
amounts of information about individuals, including health-related information.
Social media provides the opportunity to study health-related information about
selected population groups who may be of interest for a particular study. In
this paper, we explore the possibility of utilizing social media to perform
targeted data collection and analysis from a particular population group --
pregnant women. We hypothesize that we can use social media to identify cohorts
of pregnant women and follow them over time to analyze crucial health-related
information. To identify potentially pregnant women, we employ simple
rule-based searches that attempt to detect pregnancy announcements with
moderate precision. To further filter out false positives and noise, we employ
a supervised classifier trained on a small amount of hand-annotated data. We then
collect their posts over time to create longitudinal health timelines and
attempt to divide the timelines into different pregnancy trimesters. Finally,
we assess the usefulness of the timelines by performing a preliminary analysis
to estimate drug intake patterns of our cohort at different trimesters. Our
rule-based cohort identification technique collected 53,820 users over thirty
months from Twitter. Our pregnancy announcement classification technique
achieved an F-measure of 0.81 for the pregnancy class, resulting in 34,895 user
timelines. Analysis of the timelines revealed that pertinent health-related
information, such as drug intake and adverse reactions, can be mined from the
data. Our approach to using user timelines in this fashion has produced very
encouraging results and can be employed for other important tasks where
cohorts, for which health-related information may not be available from other
sources, are required to be followed over time to derive population-based
estimates.
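The cohort identification and trimester steps described in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' actual rules, so the regular expressions, the function names `find_announcement` and `trimester`, and the idea of estimating the pregnancy start date from a "N weeks pregnant" mention are all assumptions made for illustration. The week boundaries (first trimester up to week 13, second up to week 27) follow a common clinical convention, not a detail from the paper.

```python
import re
from datetime import date, timedelta

# Hypothetical announcement patterns; the rules used in the paper are not given.
ANNOUNCEMENT_PATTERNS = [
    re.compile(r"\bi(?:'m| am)\s+(\d{1,2})\s+weeks\s+pregnant\b", re.IGNORECASE),
    re.compile(r"\bi(?:'m| am)\s+pregnant\b", re.IGNORECASE),
    re.compile(r"\bwe(?:'re| are)\s+expecting\b", re.IGNORECASE),
]


def find_announcement(text):
    """Return (matched, weeks); weeks is an int only if the post states it."""
    for pattern in ANNOUNCEMENT_PATTERNS:
        match = pattern.search(text)
        if match:
            # Only the first pattern has a capturing group for the week count.
            weeks = int(match.group(1)) if pattern.groups else None
            return True, weeks
    return False, None


def trimester(post_date, pregnancy_start):
    """Map a post date to trimester 1, 2 or 3, or None if out of range."""
    days = (post_date - pregnancy_start).days
    if days < 0:
        return None
    week = days // 7  # completed gestational weeks
    if week < 14:
        return 1
    if week < 28:
        return 2
    if week <= 42:
        return 3
    return None


# Usage: estimate the start date from an announcement, then bucket later posts.
matched, weeks = find_announcement("Can't believe I'm 12 weeks pregnant today!")
announced_on = date(2015, 3, 1)  # illustrative date, not from the study
start = announced_on - timedelta(weeks=weeks)
later_post = start + timedelta(weeks=20)
print(matched, weeks, trimester(later_post, start))
```

Rules of this kind favour recall over precision, which is consistent with the abstract's note that a supervised classifier is applied afterwards to filter out false positives.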
Extractive Summarisation of Medical Documents
Background: Evidence Based Medicine (EBM) practice requires practitioners to extract evidence from published medical research when answering clinical queries. Due to the time-consuming nature of this practice, there is a strong motivation for systems that can automatically summarise medical documents and help practitioners find relevant information. Aim: The aim of this work is to propose an automatic query-focused, extractive summarisation approach that selects informative sentences from medical documents. Method: We use a corpus that is specifically designed for summarisation in the EBM domain. We use approximately half the corpus for deriving important statistics associated with the best possible extractive summaries. We take into account factors such as sentence position, length, sentence content, and the type of the query posed. Using the statistics from the first set, we evaluate our approach on a separate set. Evaluation of the quality of the generated summaries is performed automatically using ROUGE, which is a popular tool for evaluating automatic summaries. Results: Our summarisation approach outperforms all baselines (best baseline score: 0.1594; our score: 0.1653). Further improvements are achieved when query types are taken into account. Conclusion: The quality of extractive summarisation in the medical domain can be significantly improved by incorporating domain knowledge and statistics derived from a specialised corpus. Such techniques can therefore be applied for content selection in end-to-end summarisation systems.
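For readers unfamiliar with ROUGE, the following is a minimal sketch of an n-gram overlap score of the ROUGE-N kind. The abstract does not say which ROUGE variant or settings produced the reported scores, so this is not a reproduction of the authors' evaluation. The function names `tokens` and `rouge_n`, the simple lowercase tokenisation, and the example sentences are assumptions for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f1) from clipped n-gram overlap."""
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand = ngrams(tokens(candidate))
    ref = ngrams(tokens(reference))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return recall, precision, f1


# Usage with made-up sentences: 5 of the 8 reference unigrams are recovered.
recall, precision, f1 = rouge_n(
    "Aspirin reduces the risk of stroke",
    "Aspirin lowers the risk of stroke in adults",
)
print(round(recall, 3), round(precision, 3), round(f1, 3))
```

Recall is measured against the reference summary, so a candidate that copies more of the reference's wording scores higher regardless of fluency. This is why such scores suit extractive sentence selection, where the candidate text is taken verbatim from the source documents.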
Public and Political Opinion on Medicaid
Medicaid has long been a political litmus test and a target for substantial programmatic changes. But how does the public feel about Medicaid, especially during a pandemic? In this study, the authors analyze more than one million Medicaid-related tweets from December 1, 2018 to September 30, 2020. They found a high volume of political posts on Twitter around Medicaid topics, peaking in January 2020 in the context of news about Medicaid expansion and the prior administration’s Medicaid block grant proposal. As the pandemic hit, the number of Twitter posts about Medicaid and the pandemic increased, and the volume of political tweets on other Medicaid topics dropped. The posts themselves also appeared to be less polarized. These patterns suggest that when the public sees Medicaid operate as a safety net, the program is far less polarizing than partisan politics might indicate. Highlighting Medicaid’s role during the pandemic could help strengthen public support for the program in non-crisis times and better position it to respond to future economic downturns.