71 research outputs found

Social media mining for toxicovigilance of prescription medications: End-to-end pipeline, challenges and future work

Author: Sarker Abeed
Publication date: 02/09/2023

Substance use, substance use disorder, and overdoses related to substance use are major public health problems globally and in the United States. A key aspect of addressing these problems from a public health standpoint is improved surveillance. Traditional surveillance systems are laggy, and social media are potentially useful sources of timely data. However, mining knowledge from social media is challenging, and requires the development of advanced artificial intelligence, specifically natural language processing (NLP) and machine learning methods. We developed a sophisticated end-to-end pipeline for mining information about nonmedical prescription medication use from social media, namely Twitter and Reddit. Our pipeline employs supervised machine learning and NLP for filtering out noise and characterizing the chatter. In this paper, we describe our end-to-end pipeline developed over four years. In addition to describing our data mining infrastructure, we discuss existing challenges in social media mining for toxicovigilance, and possible future research directions.

arXiv.org e-Print Archive

Enhancing Drug Overdose Mortality Surveillance through Natural Language Processing and Machine Learning

Author: Ward Patrick J.
Publication venue: UKnowledge
Publication date: 01/01/2021

Epidemiological surveillance is key to monitoring and assessing the health of populations. Drug overdose surveillance has become an increasingly important part of public health practice as overdose morbidity and mortality have increased due in large part to the opioid crisis. Monitoring drug overdose mortality relies on death certificate data, which has several limitations including timeliness and the coding structure used to identify specific substances that caused death. These limitations stem from the need to analyze the free-text cause-of-death sections of the death certificate that are completed by the medical certifier during death investigation. Other fields, including clinical sciences, have utilized natural language processing (NLP) methods to gain insight from free-text data, but thus far, adoption of NLP methods in epidemiological surveillance has been limited. Through a narrative review of NLP methods currently used in public health surveillance and the integration of two NLP tasks, classification and named entity recognition, this dissertation enhances the capabilities of public health practitioners and researchers to perform drug overdose mortality surveillance. This dissertation advances both surveillance science and public health practice by integrating methods from bioinformatics into the surveillance pipeline, which provides more timely and higher-quality overdose mortality surveillance, which is essential to guiding effective public health response to the continuing drug overdose epidemic.

University of Kentucky

Systematic review on the prevalence, frequency and comparative value of adverse events data in social media

Author: Abou Taam
Abou Taam
Ahlwardt
Allen
Belaise
Benton
Beusterien
Beusterien
Bosley
Broniatowski
Brownstein
Butt
Butt
Carneiro
Chew
Chunara
Cox
Dasgupta
Denecke
Dreyfus
Duh
Freifeld
Freifeld
Freifeld
Frost
Gesualdo
Goodman
Hanson
Heaivilin
Hogan
Huesch
Hughes
IMS Health
Jashinsky
Karimi
Kheloufi
Knezevic
Macias
Mao
Medawar
Mishra
Moncrieff
Nicholson
Pages
Pages
Paul
Pestello
Rizo
Sampathkumar
Sampathkumar
Sarker
Sarrazin
Scanfeld
Schroder
Signorini
Sillence
Skea
St Louis
Street
Tan
Visible Technologies
Wicks
Wu
Yakushev
Yanagisawa
Yanagisawa
Yang
Yeleswarapu
Publication venue: Wiley
Publication date: 01/08/2015

Aim: The aim of this review was to summarize the prevalence, frequency and comparative value of information on the adverse events of healthcare interventions from user comments and videos in social media. Methods: A systematic review of assessments of the prevalence or type of information on adverse events in social media was undertaken. Sixteen databases and two internet search engines were searched in addition to handsearching, reference checking and contacting experts. The results were sifted independently by two researchers. Data extraction and quality assessment were carried out by one researcher and checked by a second. The quality assessment tool was devised in-house and a narrative synthesis of the results followed. Results: From 3064 records, 51 studies met the inclusion criteria. The studies assessed over 174 social media sites with discussion forums (71%) being the most popular. The overall prevalence of adverse events reports in social media varied from 0.2% to 8% of posts. Twenty-nine studies compared the results from searching social media with using other data sources to identify adverse events. There was general agreement that a higher frequency of adverse events was found in social media and that this was particularly true for ‘symptom’ related and ‘mild’ adverse events. Those adverse events that were under-represented in social media were laboratory-based and serious adverse events. Conclusions: Reports of adverse events are identifiable within social media. However, there is considerable heterogeneity in the frequency and type of events reported, and the reliability or validity of the data has not been thoroughly evaluated.

Crossref

PubMed Central

The University of Manchester - Institutional Repository

University of East Anglia digital repository

Identification and characterization of diseases on social web

Author: Sofean Mustafa
Publication venue: Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2017

[no abstract]

Institutionelles Repositorium der Leibniz Universität Hannover

Characterizing Information Seeking Events in Health-Related Social Discourse

Author: Basak Madhusudan
Borodovsky Jacob T.
Bradham Alphonso
Lord Sarah E.
Parvin Tanzia
Preum Sarah Masud
Scharfstein Ava
Sharif Omar
Publication date: 17/08/2023

Social media sites have become a popular platform for individuals to seek and share health information. Despite the progress in natural language processing for social media mining, a gap remains in analyzing health-related texts on social discourse in the context of events. Event-driven analysis can offer insights into different facets of healthcare at an individual and collective level, including treatment options, misconceptions, knowledge gaps, etc. This paper presents a paradigm to characterize health-related information-seeking in social discourse through the lens of events. Events here are broad categories defined with domain experts that capture the trajectory of the treatment/medication. To illustrate the value of this approach, we analyze Reddit posts regarding medications for Opioid Use Disorder (OUD), a critical global health concern. To the best of our knowledge, this is the first attempt to define event categories for characterizing information-seeking in OUD social discourse. Guided by domain experts, we develop TREAT-ISE, a novel multilabel treatment information-seeking event dataset to analyze online discourse on an event-based framework. This dataset contains Reddit posts on information-seeking events related to recovery from OUD, where each post is annotated based on the type of events. We also establish a strong performance benchmark (77.4% F1 score) for the task by employing several machine learning and deep learning classifiers. Finally, we thoroughly investigate the performance and errors of ChatGPT on this task, providing valuable insights into the LLM's capabilities and ongoing characterization efforts. Comment: Under review AAAI-2024. 10 pages, 6 tables, 2 figures.

arXiv.org e-Print Archive

Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task

Author: Abeed Sarker
Anthony Rios
Berry de Bruijn
Debanjan Mahata
Farrokh Mehryary
Filip Ginter
Goran Nenadic
Graciela Gonzalez-Hernandez
Jasper Friedrichs
Kai Hakala
Maksim Belousov
Ramakanth Kavuluru
Saif M. Mohammad
Sifei Han
Svetlana Kiritchenko
Tung Tran
Publication venue: Oxford University Press (OUP)
Publication date: 28/10/2022

Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).

UTUPub

Front-Line Physicians' Satisfaction with Information Systems in Hospitals

Author: Junttila Kristiina
Peltonen Laura-Maria
Salanterä Sanna
Publication venue: IOS Press
Publication date: 01/01/2018

Day-to-day operations management in hospital units is difficult due to continuously varying situations, several actors involved and a vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with existing information systems needed to support the day-to-day operations management in hospitals. A cross-sectional survey was used and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer one information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision making process. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

Biomedical Information Extraction Pipelines for Public Health in the Age of Deep Learning

Publication date: 01/01/2019

Unstructured texts containing biomedical information from sources such as electronic health records, scientific literature, discussion forums, and social media offer an opportunity to extract information for a wide range of applications in biomedical informatics. Building scalable and efficient pipelines for natural language processing and extraction of biomedical information plays an important role in the implementation and adoption of applications in areas such as public health. Advancements in machine learning and deep learning techniques have enabled rapid development of such pipelines. This dissertation presents entity extraction pipelines for two public health applications: virus phylogeography and pharmacovigilance. For virus phylogeography, geographical locations are extracted from biomedical scientific texts for metadata enrichment in the GenBank database containing 2.9 million virus nucleotide sequences. For pharmacovigilance, tools are developed to extract adverse drug reactions from social media posts to open avenues for post-market drug surveillance from non-traditional sources. Across these pipelines, high variance is observed in extraction performance among the entities of interest while using state-of-the-art neural network architectures. To explain the variation, linguistic measures are proposed to serve as indicators for entity extraction performance and to provide deeper insight into the domain complexity and the challenges associated with entity extraction. For both the phylogeography and pharmacovigilance pipelines presented in this work, the annotated datasets and applications are open source and freely available to the public to foster further research in public health. Dissertation/Thesis: Doctoral Dissertation, Biomedical Informatics.

ASU Digital Repository

Deep learning for pollen allergy surveillance from twitter in Australia

Author: Du Jiahua
Michalska Sandra
Rong Jia
Subramani Sudha
Wang Hua
Publication venue: Springer Science and Business Media LLC
Publication date: 08/11/2019

Victoria University Eprints Repository