873 research outputs found

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Full text link
    Machine Learning has been a big success story during the AI resurgence. One particular stand out success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition for utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex, (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770

    Novel Natural Language Processing Models for Medical Terms and Symptoms Detection in Twitter

    Get PDF
    This dissertation focuses on disambiguation of language use on Twitter about drug use, consumption types of drugs, drug legalization, ontology-enhanced approaches, and prediction analysis of data-driven by developing novel NLP models. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improve knowledge extraction in the form of drug-to-symptom and drug-to-side effect relations; and (c) modeling the prediction of public perception of the drug’s legalization and the sentiment analysis of drug consumption on Twitter. We collected 7.5 million data from August 2015 to March 2016. This work leveraged a longstanding, multidisciplinary collaboration between researchers at the Population & Center for Interventions, Treatment, and Addictions Research (CITAR) in the Boonshoft School of Medicine and the Department of Computer Science and Engineering. In addition, we aimed to develop and deploy an innovative prediction analysis algorithm for eDrugTrends, capable of semi-automated processing of Twitter data to identify emerging trends in cannabis and synthetic cannabinoid use in the U.S. In addition, the study included aim four, a use case study defined by tweets content analyzing PLWH, medication patterns, and identifying keyword trends via Twitter-based, user-generated content. This case study leveraged a multidisciplinary collaboration between researchers at the Departments of Family Medicine and Population and Public Health Sciences at Wright State University’s Boonshoft School of Medicine and the Department of Computer Science and Engineering. We collected 65K data from February 2022 to July 2022 with the U.S.-based HIV knowledge domain recruited via the Twitter API streaming platform. For knowledge discovery, domain knowledge plays a significant role in powering many intelligent frameworks, such as data analysis, information retrieval, and pattern recognition. Recent NLP and semantic web advances have contributed to extending the domain knowledge of medical terms. These techniques required a bag of seeds for medical knowledge discovery. Various initiate seeds create irrelevant data to the noise and negatively impact the prediction analysis performance. The methodology of aim one, PatRDis classifier, applied for noisy and ambiguous issues, and aim two, DsOn Ontology model, applied for semantic parsing and enriching the online medical to classify the data for HIV care medications engagement and symptom detection from Twitter. By applying the methodology of aims 2 and 3, we solved the challenges of ambiguity and explored more than 1500 cannabis and cannabinoid slang terms. Sentiments measured preceding the election, such as states with high levels of positive sentiment preceding the election who were engaged in enhancing their legalization status. we also used the same dataset for prediction analysis for marijuana legalization and consumption trend analysis (Ohio public polling data). In Aim 4, we applied three experiments, ensemble-learning, the RNN-LSM, the NNBERT-CNN models, and five techniques to determine the tweets associated with medication adherence and HIV symptoms. The long short-term memory (LSTM) model and the CNN for sentence classification produce accurate results and have been recently used in NLP tasks. CNN models use convolutional layers and maximum pooling or max-overtime pooling layers to extract higher-level features, while LSTM models can capture long-term dependencies between word sequences hence are better used for text classification. We propose attention-based RNN, MLP, and CNN deep learning models that capitalize on the advantages of LSTM and BERT techniques with an additional attention mechanism. We trained the model using NNBERT to evaluate the proposed model\u27s performance. The test results showed that the proposed models produce more accurate classification results, and BERT obtained higher recall and F1 scores than MLP or LSTM models. In addition, We developed an intelligent tool capable of automated processing of Twitter data to identify emerging trends in HIV disease, HIV symptoms, and medication adherence

    Enhancing Drug Overdose Mortality Surveillance through Natural Language Processing and Machine Learning

    Get PDF
    Epidemiological surveillance is key to monitoring and assessing the health of populations. Drug overdose surveillance has become an increasingly important part of public health practice as overdose morbidity and mortality has increased due in large part to the opioid crisis. Monitoring drug overdose mortality relies on death certificate data, which has several limitations including timeliness and the coding structure used to identify specific substances that caused death. These limitations stem from the need to analyze the free-text cause-of-death sections of the death certificate that are completed by the medical certifier during death investigation. Other fields, including clinical sciences, have utilized natural language processing (NLP) methods to gain insight from free-text data, but thus far, adoption of NLP methods in epidemiological surveillance has been limited. Through a narrative review of NLP methods currently used in public health surveillance and the integration of two NLP tasks, classification and named entity recognition, this dissertation enhances the capabilities of public health practitioners and researchers to perform drug overdose mortality surveillance. This dissertation advances both surveillance science and public health practice by integrating methods from bioinformatics into the surveillance pipeline which provides more timely and increased quality overdose mortality surveillance, which is essential to guiding effective public health response to the continuing drug overdose epidemic

    A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs

    Get PDF
    While contemporary semantic search systems offer to improve classical keyword-based search, they are not always adequate for complex domain specific information needs. The domain of prescription drug abuse, for example, requires knowledge of both ontological concepts and 'intelligible constructs' not typically modeled in ontologies. These intelligible constructs convey essential information that include notions of intensity, frequency, interval, dosage and sentiments, which could be important to the holistic needs of the information seeker. We present a hybrid approach to domain specific information retrieval (or knowledge-aware search) that integrates ontology-driven query interpretation with synonym-based query expansion and domain specific rules, to facilitate search in social media. Our framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of diverse textual patterns, which belong to broad templates and 2) a low-level CFG that enables interpretation of certain specific expressions that belong to such patterns. These low-level expressions occur as concepts from four different categories of data: 1) ontological concepts, 2) concepts in lexicons (such as emotions and sentiments), 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes of administration (ROA)), and 4) domain specific expressions (such as date, time, interval, frequency and dosage) derived solely through rules. Our approach is embodied in a novel Semantic Web platform called PREDOSE developed for prescription drug abuse epidemiology. Keywords: Knowledge-Aware Search, Ontology, Semantic Search, Background Knowledge, Context-Free GrammarComment: Accepted for publication: Journal of Web Semantics, Elsevie

    Mining social media data for biomedical signals and health-related behavior

    Full text link
    Social media data has been increasingly used to study biomedical and health-related phenomena. From cohort level discussions of a condition to planetary level analyses of sentiment, social media has provided scientists with unprecedented amounts of data to study human behavior and response associated with a variety of health conditions and medical treatments. Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health. We pay particular attention to topics where social media data analysis has shown the most progress, including pharmacovigilance, sentiment analysis especially for mental health, and other areas. We also discuss a variety of innovative uses of social media data for health-related applications and important limitations in social media data access and use.Comment: To appear in the Annual Review of Biomedical Data Scienc

    Early-stage pregnancy recognition on microblogs: Machine learning and lexicon-based approaches

    Get PDF
    Pregnancy carries high medical and psychosocial risks that could lead pregnant women to experience serious health consequences. Providing protective measures for pregnant women is one of the critical tasks during the pregnancy period. This study proposes an emotion-based mechanism to detect the early stage of pregnancy using real-time data from Twitter. Pregnancy-related emotions (e.g., anger, fear, sadness, joy, and surprise) and polarity (positive and negative) were extracted from users' tweets using NRC Affect Intensity Lexicon and SentiStrength techniques. Then, pregnancy-related terms were extracted and mapped with pregnancy-related sentiments using part-of-speech tagging and association rules mining techniques. The results showed that pregnancy tweets contained high positivity, as well as significant amounts of joy, sadness, and fear. The classification results demonstrated the possibility of using users’ sentiments for early-stage pregnancy recognition on microblogs. The proposed mechanism offers valuable insights to healthcare decision-makers, allowing them to develop a comprehensive understanding of users' health status based on social media posts

    Utilizing Consumer Health Posts for Pharmacovigilance: Identifying Underlying Factors Associated with Patients’ Attitudes Towards Antidepressants

    Get PDF
    Non-adherence to antidepressants is a major obstacle to antidepressants therapeutic benefits, resulting in increased risk of relapse, emergency visits, and significant burden on individuals and the healthcare system. Several studies showed that non-adherence is weakly associated with personal and clinical variables, but strongly associated with patients’ beliefs and attitudes towards medications. The traditional methods for identifying the key dimensions of patients’ attitudes towards antidepressants are associated with some methodological limitations, such as concern about confidentiality of personal information. In this study, attempts have been made to address the limitations by utilizing patients’ self report experiences in online healthcare forums to identify underlying factors affecting patients attitudes towards antidepressants. The data source of the study was a healthcare forum called “askapatients.com”. 892 patients’ reviews were randomly collected from the forum for the four most commonly prescribed antidepressants including Sertraline (Zoloft) and Escitalopram (Lexapro) from SSRI class, and Venlafaxine (Effexor) and duloxetine (Cymbalta) from SNRI class. Methodology of this study is composed of two main phases: I) generating structured data from unstructured patients’ drug reviews and testing hypotheses concerning attitude, II) identification and normalization of Adverse Drug Reactions (ADRs), Withdrawal Symptoms (WDs) and Drug Indications (DIs) from the posts, and mapping them to both The UMLS and SNOMED CT concepts. Phase II also includes testing the association between ADRs and attitude. The result of the first phase of this study showed that “experience of adverse drug reactions”, “perceived distress received from ADRs”, “lack of knowledge about medication’s mechanism”, “withdrawal experience”, “duration of usage”, and “drug effectiveness” are strongly associated with patients attitudes. However, demographic variables including “age” and “gender” are not associated with attitude. Analysis of the data in second phase of the study showed that from 6,534 identified entities, 73% are ADRs, 12% are WDs, and 15 % are drug indications. In addition, psychological and cognitive expressions have higher variability than physiological expressions. All three types of entities were mapped to 811 UMLS and SNOMED CT concepts. Testing the association between ADRs and attitude showed that from twenty-one physiological ADRs specified in the ASEC questionnaire, “dry mouth”, “increased appetite”, “disorientation”, “yawning”, “weight gain”, and “problem with sexual dysfunction” are associated with attitude. A set of psychological and cognitive ADRs, such as “emotional indifference” and “memory problem were also tested that showed significance association between these types of ADRs and attitude. The findings of this study have important implications for designing clinical interventions aiming to improve patients\u27 adherence towards antidepressants. In addition, the dataset generated in this study has significant implications for improving performance of text-mining algorithms aiming to identify health related information from consumer health posts. Moreover, the dataset can be used for generating and testing hypotheses related to ADRs associated with psychiatric mediations, and identifying factors associated with discontinuation of antidepressants. The dataset and guidelines of this study are available at https://sites.google.com/view/pharmacovigilanceinpsychiatry/hom
    • …
    corecore