5 research outputs found

    Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark

    Despite clinical trials by pharmaceutical companies and current FDA reporting systems, some drug side effects still go undetected. One way to obtain a larger sample of reports is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports of drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines for sentiment analysis-based mining, processing, and extraction of tweets with drug-caused side effects. We have also added a new ensemble classifier using a combination of sentiment analysis features to increase the accuracy of identifying drug-caused side effects. In addition, the frequency count for the side effects is provided. Furthermore, we have implemented the same pipeline in Apache Spark to speed up tweet processing by 2.5 times and to support the processing of large tweet datasets. As the frequency count of drug side effects opens a wide door for further analysis, we present a preliminary study on this issue, including the side effects of simultaneously using two drugs and the potential danger of using less-common combinations of drugs. We believe the pipeline design and the results presented in this work have significant implications for studying drug side effects and for big data analysis in general.

    Leveraging graph-based semantic annotation for the identification of cause-effect relations

    This research concerns Indonesian-language articles that discuss causal relationships, for use in a public health surveillance information monitoring system. The approach relies on suitable feature selection, phrase annotation, paragraph annotation, medical element annotation, and graph-based semantic annotation. System performance is evaluated intrinsically using the Multinomial Naive Bayes method. The resulting recall, precision, and F-measure are 0.924, 0.905, and 0.910, respectively.

    The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis

    We conduct an inquiry into the sociotechnical aspects of sentiment analysis (SA) by critically examining 189 peer-reviewed papers on their applications, models, and datasets. Our investigation stems from the recognition that SA has become an integral component of diverse sociotechnical systems, exerting influence on both social and technical users. By delving into sociological and technological literature on sentiment, we unveil distinct conceptualizations of this term in domains such as finance, government, and medicine. Our study exposes a lack of explicit definitions and frameworks for characterizing sentiment, resulting in potential challenges and biases. To tackle this issue, we propose an ethics sheet encompassing critical inquiries to guide practitioners in ensuring equitable utilization of SA. Our findings underscore the significance of adopting an interdisciplinary approach to defining sentiment in SA and offer a pragmatic solution for its implementation. Comment: This paper has been accepted and will appear at the EMNLP 2023 Main Conference

    Blockchain in Healthcare: a New Perspective from Social Media Data

    Blockchain as a technology has brought with it a wave of promises and expectations. After its successes in the financial sector, many potential new applications of the technology have been theorized across a variety of sectors. Blockchain’s application to healthcare stands out among these theories. Healthcare is a sector that views technological innovation with more scrutiny, so the introduction of blockchain into healthcare is a particularly distinctive implementation of the technology. Understanding how blockchain is accepted in the healthcare industry is a difficult problem due to the nature of the data associated with the sector. One avenue for understanding how blockchain is viewed by this sector is the analysis of social media micro-blogging on the Twitter platform. By archiving a time series of tweets, important questions about how blockchain is viewed in healthcare can be addressed with the natural language processing technique of sentiment analysis. An ensemble of BERT models is identified as the best classifier with the given training data and is further applied to a time series of tweets about blockchain in healthcare. This study analyzes healthcare perceptions of blockchain based on these results and finds that the distribution of sentiment is largely positive. Examining the volume of tweets over time also indicates a massive increase in interest in the topic in 2018. Finally, when exploring how company accounts tweet compared to personal accounts, it is found that personal accounts produce slightly more positive tweets relative to company accounts. Thus, it is understood that healthcare perception of blockchain became consistently positive following 2017.