
    Novel Natural Language Processing Models for Medical Terms and Symptoms Detection in Twitter

    This dissertation develops novel NLP models to disambiguate language used on Twitter about drug use, types of drug consumption, and drug legalization, combining ontology-enhanced approaches with data-driven prediction analysis. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side-effect relations; and (c) modeling the prediction of public perception of drug legalization and the sentiment analysis of drug consumption on Twitter. We collected 7.5 million data points from August 2015 to March 2016. This work leveraged a longstanding, multidisciplinary collaboration between researchers at the Population & Center for Interventions, Treatment, and Addictions Research (CITAR) in the Boonshoft School of Medicine and the Department of Computer Science and Engineering. In addition, we aimed to develop and deploy an innovative prediction analysis algorithm for eDrugTrends, capable of semi-automated processing of Twitter data to identify emerging trends in cannabis and synthetic cannabinoid use in the U.S. The study also included a fourth aim: a use case study that analyzes tweet content about people living with HIV (PLWH), examines medication patterns, and identifies keyword trends in Twitter-based, user-generated content. This case study leveraged a multidisciplinary collaboration between researchers at the Departments of Family Medicine and Population and Public Health Sciences at Wright State University’s Boonshoft School of Medicine and the Department of Computer Science and Engineering. We collected 65K data points in the U.S.-based HIV knowledge domain from February 2022 to July 2022 via the Twitter streaming API.
For knowledge discovery, domain knowledge plays a significant role in powering many intelligent frameworks, such as data analysis, information retrieval, and pattern recognition. Recent NLP and semantic web advances have contributed to extending the domain knowledge of medical terms. These techniques require a bag of seed terms for medical knowledge discovery. However, varied initial seeds introduce irrelevant, noisy data and degrade prediction analysis performance. The aim one methodology, the PatRDis classifier, addresses noise and ambiguity; the aim two methodology, the DsOn ontology model, performs semantic parsing and enriches online medical terminology in order to classify data for HIV care medication engagement and symptom detection from Twitter. By applying the methodology of aims 2 and 3, we solved the challenges of ambiguity and explored more than 1,500 cannabis and cannabinoid slang terms. Sentiment was measured preceding the election; for example, states with high levels of positive sentiment preceding the election were engaged in enhancing their legalization status. We also used the same dataset for prediction analysis of marijuana legalization and for consumption trend analysis (Ohio public polling data). In aim four, we ran three experiments (ensemble learning, the RNN-LSTM model, and the NNBERT-CNN model) and applied five techniques to identify tweets associated with medication adherence and HIV symptoms. The long short-term memory (LSTM) model and the CNN for sentence classification produce accurate results and have recently been used in NLP tasks. CNN models use convolutional layers and max pooling or max-over-time pooling layers to extract higher-level features, while LSTM models can capture long-term dependencies between word sequences and are therefore well suited to text classification. We propose attention-based RNN, MLP, and CNN deep learning models that capitalize on the advantages of LSTM and BERT techniques with an additional attention mechanism.
We trained the model using NNBERT to evaluate the proposed model's performance. The test results showed that the proposed models produce more accurate classification results, and BERT obtained higher recall and F1 scores than MLP or LSTM models. In addition, we developed an intelligent tool capable of automated processing of Twitter data to identify emerging trends in HIV disease, HIV symptoms, and medication adherence.
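
The abstract above mentions that CNN sentence classifiers use convolutional layers followed by max-over-time pooling. As a rough illustration only, and not the dissertation's implementation, the following pure-Python sketch applies one convolution filter to a short sequence of word vectors and keeps the strongest activation. The function names, the ReLU nonlinearity, and the toy two-dimensional embeddings are all assumptions made for this example.

```python
from typing import List

Vector = List[float]


def conv1d_feature_map(embeddings: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide a filter spanning len(kernel) consecutive words over the sentence.

    Both arguments are hypothetical toy inputs: `embeddings` holds one vector
    per word, `kernel` holds one weight vector per position in the window.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = sum(
            w * x
            for kernel_row, word_vec in zip(kernel, window)
            for w, x in zip(kernel_row, word_vec)
        )
        feature_map.append(max(0.0, activation))  # ReLU, assumed here
    return feature_map


def max_over_time(feature_map: List[float]) -> float:
    """Keep only the largest activation, regardless of where it occurred."""
    if not feature_map:
        raise ValueError("sentence is shorter than the filter width")
    return max(feature_map)


# Toy 4-word sentence with 2-dimensional embeddings (illustrative values only).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

features = conv1d_feature_map(sentence, bigram_filter)
pooled = max_over_time(features)
```

Because pooling keeps a single value per filter, sentences of different lengths map to a fixed-size feature vector once several filters are stacked, which is what lets a standard classifier layer sit on top.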

    Nowcasting user behaviour with social media and smart devices on a longitudinal basis: from macro- to micro-level modelling

    The adoption of social media and smart devices by millions of users worldwide over the last decade has resulted in an unprecedented opportunity for NLP and social sciences. Users publish their thoughts and opinions on everyday issues through social media platforms, while they record their digital traces through their smart devices. Mining these rich resources offers new opportunities in sensing real-world events and indices (e.g., political preference, mental health indices) in a longitudinal fashion, at either the macro (population) level or the micro (user) level. The current project aims at developing approaches to “nowcast” (predict the current state of) such indices at both levels of granularity. First, we build natural language resources for the static tasks of sentiment analysis, emotion disclosure and sarcasm detection over user-generated content. These are important for opinion monitoring on a large scale. Second, we propose a general approach that leverages textual data derived from generic social media streams to nowcast political indices at the macro level. Third, we leverage temporally sensitive and asynchronous information to nowcast the political stance of social media users at the micro level using multiple kernel learning. We then focus further on micro-level modelling, to account for heterogeneous data sources, such as information derived from users' smartphones, SMS and social media messages, to nowcast time-varying mental health indices of a small cohort of users on a longitudinal basis. Finally, we present the challenges faced when applying such micro-level approaches in a real-world setting and propose directions for future research.

    Real-time context-based sound and color extraction from text

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (page 69). Narratarium is a system that uses English text or voice input, provided either real-time or off-line, to generate context-specific colors and sound effects. It accomplishes this by employing a variety of machine learning approaches, including commonsense reasoning and natural language processing. It can be highly customized to prioritize different performance metrics, most importantly accuracy and latency, and can be used with any tagged sound corpus. The final product allows users to tell a story in an immersive environment that augments the story-telling experience with thematic colors and background sounds. In this thesis, we present the back-end logic that generates best guesses for contextual colors and sound using text input. We evaluate the performance of these algorithms under different configurations, and demonstrate that performance is acceptable for realistic user scenarios. We also discuss Narratarium's overall design. By Timothy Peng. M. Eng.

    Sharing is Caring: Using Open Data To Improve Targeting Policies

    When it comes to predictive power, companies in a variety of sectors depend on having sufficient data to develop and deploy business analytics applications, for example, to acquire new customers. While there is a vast literature on enriching internal data sets with external data sources, it is still largely unclear whether and how open data can be used to enrich internal data sets to improve business analytics. We choose a particular business analytics problem – designing targeting policies to acquire new customers – to investigate how an internal data set of a German grocery supplier can be enriched with open data to improve targeting policies. Using the enriched data set, we can improve the response rate of several well-established targeting policies by more than 30% in back-testing. Based on these results, we encourage firms and researchers to use, leverage, and share open data to enhance business analytics.

    INVESTIGATING CRIME-TO-TWITTER RELATIONSHIPS IN URBAN ENVIRONMENTS - FACILITATING A VIRTUAL NEIGHBORHOOD WATCH

    Social networks offer vast potential for marketing agencies, as members freely provide private information, for instance on their current situation, opinions, tastes, and feelings. The use of social networks to feed into crime platforms has been acknowledged as a way to build a kind of virtual neighborhood watch. Previous attempts to automatically connect news from social networks with crime platforms have concentrated on documentation of past events, but have neglected the opportunity to use Twitter data as a decision support system to detect future crimes. In this work, we attempt to unleash the wisdom of crowds materialized in tweets from Twitter. This requires looking at tweets that have been sent within the vicinity of each other. We aggregate the tweet traffic and correlate it with crime types. Apparently, crimes such as disturbing the peace or homicide exhibit different tweet patterns before the crime has been committed. We show that these tweet patterns can strengthen the explanation of criminal activity in urban areas. On top of that, we go beyond pure explanatory approaches and use predictive analytics to provide evidence that Twitter data can improve the prediction of crimes.
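
The abstract above describes correlating aggregated tweet traffic with crime types. The abstract does not say which statistic the authors used, so the sketch below simply assumes a Pearson correlation between two equally long count series; the function name and the toy counts are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up counts per time window for one neighborhood (illustration only).
tweets_per_window = [1, 2, 3]
incidents_per_window = [2, 4, 6]

r = pearson(tweets_per_window, incidents_per_window)
```

A coefficient near 1 or -1 would indicate a strong linear relationship for that crime type, but it says nothing about causation or about whether tweets precede the incidents; testing that would require lagging one series against the other.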