10,379 research outputs found
Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark
Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports for drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines in sentimental analysis-based mining, processing, and extracting tweets with drug-caused side effects. We have also added a new ensemble classifier using a combination of sentiment analysis features to increase the accuracy of identifying drug-caused side effects. In addition, the frequency count for the side effects is also provided. Furthermore, we have also implemented the same pipeline in Apache Spark to improve the speed of processing of tweets by 2.5 times, as well as to support the process of large tweet datasets. As the frequency count of drug side effects opens a wide door for further analysis, we present a preliminary study on this issue, including the side effects of simultaneously using two drugs, and the potential danger of using less-common combination of drugs. We believe the pipeline design and the results present in this work would have great implication on studying drug side effects and on big data analysis in general
Recommended from our members
Adverse Drug Reaction Classification With Deep Neural Networks
We study the problem of detecting sentences describing adverse drug reactions (ADRs) and frame the problem as binary classification. We investigate different neural network (NN) architectures for ADR classification. In particular, we propose two new neural network models, Convolutional Recurrent Neural Network (CRNN) by concatenating convolutional neural networks with recurrent neural networks, and Convolutional Neural Network with Attention (CNNA) by adding attention weights into convolutional neural networks. We evaluate various NN architectures on a Twitter dataset containing informal language and an Adverse Drug Effects (ADE) dataset constructed by sampling from MEDLINE case reports. Experimental results show that all the NN architectures outperform the traditional maximum entropy classifiers trained from n-grams with different weighting strategies considerably on both datasets. On the Twitter dataset, all the NN architectures perform similarly. But on the ADE dataset, CNN performs better than other more complex CNN variants. Nevertheless, CNNA allows the visualisation of attention weights of words when making classification decisions and hence is more appropriate for the extraction of word subsequences describing ADRs
A Large-Scale CNN Ensemble for Medication Safety Analysis
Revealing Adverse Drug Reactions (ADR) is an essential part of post-marketing
drug surveillance, and data from health-related forums and medical communities
can be of a great significance for estimating such effects. In this paper, we
propose an end-to-end CNN-based method for predicting drug safety on user
comments from healthcare discussion forums. We present an architecture that is
based on a vast ensemble of CNNs with varied structural parameters, where the
prediction is determined by the majority vote. To evaluate the performance of
the proposed solution, we present a large-scale dataset collected from a
medical website that consists of over 50 thousand reviews for more than 4000
drugs. The results demonstrate that our model significantly outperforms
conventional approaches and predicts medicine safety with an accuracy of 87.17%
for binary and 62.88% for multi-classification tasks
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular stand out success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Predicting the Effects of News Sentiments on the Stock Market
Stock market forecasting is very important in the planning of business
activities. Stock price prediction has attracted many researchers in multiple
disciplines including computer science, statistics, economics, finance, and
operations research. Recent studies have shown that the vast amount of online
information in the public domain such as Wikipedia usage pattern, news stories
from the mainstream media, and social media discussions can have an observable
effect on investors opinions towards financial markets. The reliability of the
computational models on stock market prediction is important as it is very
sensitive to the economy and can directly lead to financial loss. In this
paper, we retrieved, extracted, and analyzed the effects of news sentiments on
the stock market. Our main contributions include the development of a sentiment
analysis dictionary for the financial sector, the development of a
dictionary-based sentiment analysis model, and the evaluation of the model for
gauging the effects of news sentiments on stocks for the pharmaceutical market.
Using only news sentiments, we achieved a directional accuracy of 70.59% in
predicting the trends in short-term stock price movement.Comment: 4 page
- …