
    Social media mining for identification and exploration of health-related information from pregnant women

    Widespread use of social media has led to the generation of substantial amounts of information about individuals, including health-related information. Social media provides the opportunity to study health-related information about selected population groups that may be of interest for a particular study. In this paper, we explore the possibility of utilizing social media to perform targeted data collection and analysis from a particular population group: pregnant women. We hypothesize that we can use social media to identify cohorts of pregnant women and follow them over time to analyze crucial health-related information. To identify potentially pregnant women, we employ simple rule-based searches that attempt to detect pregnancy announcements with moderate precision. To further filter out false positives and noise, we employ a supervised classifier trained on a small set of hand-annotated data. We then collect their posts over time to create longitudinal health timelines and attempt to divide the timelines into pregnancy trimesters. Finally, we assess the usefulness of the timelines by performing a preliminary analysis to estimate drug intake patterns of our cohort in different trimesters. Our rule-based cohort identification technique collected 53,820 users over thirty months from Twitter. Our pregnancy announcement classification technique achieved an F-measure of 0.81 for the pregnancy class, resulting in 34,895 user timelines. Analysis of the timelines revealed that pertinent health-related information, such as drug intake and adverse reactions, can be mined from the data. Our approach has produced very encouraging results and can be employed for other important tasks that require following cohorts over time to derive population-based estimates when health-related information is not available from other sources.
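    As an illustration of the rule-based first pass described above, the sketch below filters tweets with hand-written pregnancy-announcement patterns and leaves false-positive filtering to a downstream classifier. The patterns and function names are hypothetical, since the abstract does not list the paper's actual rules.

```python
import re

# Illustrative, hypothetical patterns -- the paper's actual rules are not
# published in this abstract. Each pattern targets a common announcement phrasing.
ANNOUNCEMENT_PATTERNS = [
    re.compile(r"\bi(?:'m| am)\s+(\d+)\s+weeks?\s+pregnant\b", re.I),
    re.compile(r"\bwe(?:'re| are)\s+(?:expecting|having)\s+a\s+baby\b", re.I),
    re.compile(r"\bmy\s+due\s+date\s+is\b", re.I),
]

def looks_like_announcement(tweet_text: str) -> bool:
    """Return True if the tweet matches any hand-written announcement rule.

    Rule-based matching has only moderate precision, so hits are meant to be
    passed to a supervised classifier that filters out false positives.
    """
    return any(p.search(tweet_text) for p in ANNOUNCEMENT_PATTERNS)

# Example: keep candidate users whose tweets trigger a rule.
tweets = [
    "So excited... I'm 12 weeks pregnant!",
    "This pregnant pause in the meeting was awkward.",
]
candidates = [t for t in tweets if looks_like_announcement(t)]
print(candidates)  # only the first tweet matches
```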

    Identification and analysis of health states in Twitter messages

    Social media has become widely used all over the world for its ability to connect people from different countries and create global communities. One of the most prominent social media platforms is Twitter, where users can share text segments with a maximum length of 280 characters. Due to the nature of the platform, it generates very large amounts of text data about its users’ lives, which can be used to extract health information about a segment of the population for the purpose of public health surveillance. The Social Media Mining for Health (#SMM4H) Shared Task is a challenge that encompasses many Natural Language Processing tasks related to the use of social media data for health research purposes. This dissertation describes my participation in Task 1 of the Shared Task, which was divided into three subtasks. Subtask 1a consisted of the classification of tweets regarding the presence of Adverse Drug Events (ADEs). Subtask 1b was a Named Entity Recognition task that aimed at detecting ADE spans in tweets. Subtask 1c was a normalization task that sought to match an ADE mention to a Medical Dictionary for Regulatory Activities (MedDRA) preferred term ID. To discover the best approach for each subtask, I ran many experiments with different models and techniques to distinguish those best suited to it. The best-performing approach for subtask 1a was a BERTweet large model trained with an augmented training set. For subtask 1b, the best results were obtained with a RoBERTa large model trained on oversampled data. For subtask 1c, I used a RoBERTa base model trained with data from an additional dataset beyond the one made available by the shared task organizers. The systems used for subtasks 1a and 1b both achieved state-of-the-art performance; however, the approach for the third subtask was not able to achieve favorable results. The system used in subtask 1a achieved an F1 score of 0.698, the one used in subtask 1b achieved a relaxed F1 score of 0.661, and the one used in the final subtask achieved a relaxed F1 score of 0.116.
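    A minimal sketch of the subtask 1a setup described above: binary ADE classification with a BERTweet large model via the Hugging Face transformers library. The example tweets and settings are illustrative; the dissertation's data-augmentation and training recipe is not reproduced here.

```python
# Minimal sketch, assuming the Hugging Face transformers and torch packages.
# The classification head below is freshly initialized and must be fine-tuned
# on labeled ADE data before its predictions mean anything.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-large", num_labels=2  # 0 = no ADE, 1 = ADE present
)

tweets = [
    "this new med gave me a horrible headache all week",
    "picked up my prescription, pharmacy line was endless",
]
batch = tokenizer(tweets, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits
pred = logits.argmax(dim=-1)  # untrained head: fine-tune before trusting these
print(pred.tolist())
```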

    Harnessing Machine Learning to Improve Healthcare Monitoring with FAERS

    This research study investigates the potential of machine learning techniques to improve healthcare monitoring through the use of data from the FDA Adverse Event Reporting System (FAERS). The objective is to explore specific applications of machine learning in healthcare monitoring with FAERS and highlight the resulting findings. The study reveals several significant ways in which machine learning can contribute.

    Machine learning algorithms can detect potential safety signals at an early stage by analyzing FAERS data. By employing anomaly detection and temporal pattern analysis techniques, these models can identify emerging safety concerns that were previously unknown or underreported. This early detection enables timely action to mitigate risks associated with medications or medical products.

    Machine learning models can assist in pharmacovigilance triage, addressing the challenge posed by the large number of adverse event reports within FAERS. By developing ranking and classification models, adverse events can be prioritized based on severity, novelty, or potential impact. This automation of the triage process enables pharmacovigilance teams to efficiently identify and investigate critical safety concerns.

    Machine learning models can automate the classification and coding of adverse events, which are often described in unstructured text within FAERS reports. Through the application of Natural Language Processing (NLP) techniques, such as named entity recognition and text classification, relevant information can be extracted, enhancing the efficiency and accuracy of adverse event coding.

    Machine learning algorithms can refine and validate signals generated from FAERS data by incorporating additional data sources, such as electronic health records, social media, or clinical trial data. This integration provides a more comprehensive understanding of potential risks and helps filter out false positives, facilitating the identification of signals requiring further investigation.

    Machine learning enables real-time surveillance of FAERS data, allowing safety concerns to be identified as they occur. Continuous monitoring and real-time analysis of incoming reports enable machine learning models to trigger alerts or notifications to relevant stakeholders, promoting timely intervention to minimize patient harm.

    The study also demonstrates the use of machine learning models to conduct comparative safety analyses by combining FAERS data with other healthcare databases. These models assist in identifying safety differences between medications, patient populations, or dosing regimens, enabling healthcare providers and regulators to make informed decisions regarding treatment choices.

    While machine learning is a powerful tool in healthcare monitoring, its implementation should be complemented by human expertise and domain knowledge. The interpretation and validation of results generated by machine learning models necessitate the involvement of healthcare professionals and pharmacovigilance experts to ensure accurate and meaningful insights.

    In summary, this research study illustrates the diverse applications of machine learning in improving healthcare monitoring using FAERS data: early safety signal detection, pharmacovigilance triage, adverse event classification and coding, signal refinement and validation, real-time surveillance and alerting, and comparative safety analysis. The study emphasizes the importance of combining machine learning with human expertise to achieve effective and reliable healthcare monitoring.
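    To make the signal-detection idea concrete, the sketch below queries FAERS through the public openFDA drug event API and computes a proportional reporting ratio (PRR), a standard disproportionality measure. The drug and reaction names are illustrative examples, not findings from the study, and the helper function is an assumption of this sketch.

```python
# A minimal sketch of disproportionality analysis on FAERS via the openFDA
# drug event API (https://api.fda.gov/drug/event.json). Requires `requests`.
import requests

API = "https://api.fda.gov/drug/event.json"

def count(query: str) -> int:
    """Total FAERS reports matching an openFDA search query."""
    r = requests.get(API, params={"search": query, "limit": 1}, timeout=30)
    r.raise_for_status()
    return r.json()["meta"]["results"]["total"]

# Illustrative drug/event pair; substitute any terms of interest.
drug, event = "aspirin", "gastrointestinal haemorrhage"
a = count(f'patient.drug.medicinalproduct:"{drug}" AND '
          f'patient.reaction.reactionmeddrapt:"{event}"')
drug_total = count(f'patient.drug.medicinalproduct:"{drug}"')
event_total = count(f'patient.reaction.reactionmeddrapt:"{event}"')
all_total = count("_exists_:patient.reaction.reactionmeddrapt")

b = drug_total - a                   # drug reports without the event
c = event_total - a                  # event reports without the drug
d = all_total - a - b - c            # all other reports
prr = (a / (a + b)) / (c / (c + d))  # PRR > 2 is a common screening threshold
print(f"PRR({drug}, {event}) = {prr:.2f}")
```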

    Automated data analysis of unstructured grey literature in health research: A mapping review

    The amount of grey literature and ‘softer’ intelligence from social media or websites is vast. Given the long lead times of producing high-quality peer-reviewed health information, this creates demand for new ways to provide prompt input for secondary research. To our knowledge, this is the first review of automated data extraction methods or tools for health-related grey literature and soft data, with a focus on (semi)automating horizon scans, health technology assessments (HTA), evidence maps, or other literature reviews. We searched six databases to cover both the health- and computer-science literature. After deduplication, 10% of the search results were screened by two reviewers; the remainder was single-screened up to an estimated 95% sensitivity, and screening was stopped early after an additional 1000 results yielded no new includes. All full texts were retrieved, screened, and extracted by a single reviewer, and 10% were checked in duplicate. We included 84 papers covering automation for health-related social media, internet fora, news, patents, government agencies and charities, or trial registers. From each paper, we extracted data about important functionalities for users of the tool or method, information about the level of support and reliability, and practical challenges and research gaps. Poor availability of code, data, and usable tools leads to low transparency regarding performance and to duplication of work. Financial implications, scalability, integration into downstream workflows, and meaningful evaluations should be carefully planned before starting to develop a tool, given the vast amounts of data and the opportunities such tools offer to expedite research.
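    The early-stopping rule described above can be made concrete with a short sketch. The function name, the `is_relevant` callback, and the `patience` parameter are illustrative assumptions; only the "stop after 1000 consecutive results with no new includes" criterion comes from the review.

```python
# Illustrative sketch of the review's early-stopping rule for single screening:
# stop once `patience` consecutive records yield no new includes. The
# 95%-sensitivity estimation itself is out of scope here.
from typing import Callable, Iterable, List

def screen_with_stopping(records: Iterable[str],
                         is_relevant: Callable[[str], bool],
                         patience: int = 1000) -> List[str]:
    """Single-screen records in order; stop after `patience` consecutive excludes."""
    included, dry_run = [], 0
    for record in records:
        if is_relevant(record):
            included.append(record)
            dry_run = 0  # a new include resets the counter
        else:
            dry_run += 1
            if dry_run >= patience:
                break  # stopping rule reached
    return included
```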