Anomaly Detection on Graph Time Series
In this paper, we use a variational recurrent neural network to investigate the
anomaly detection problem on graph time series. The temporal correlation is
modeled by combining a recurrent neural network (RNN) with variational
inference (VI), while the spatial information is captured by a graph
convolutional network. To incorporate external factors, we use a feature
extractor to augment the transition of the latent variables, which allows the
model to learn the influence of those factors. Because the objective function
is the accumulated ELBO, the model is easy to extend to an online method. An
experimental study on traffic flow data shows the detection capability of the
proposed method.
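For orientation, the accumulated ELBO mentioned above can be written in the standard variational RNN form. The abstract does not state the formula, so this is a sketch under assumptions: $x_t$ is the graph signal at time $t$, $z_t$ the latent variable, $q$ the approximate posterior, and $p$ the generative model. The external-factor features and the graph convolution would enter through the parameterization of these distributions and are not shown.

```latex
\mathcal{L}
  = \mathbb{E}_{q(z_{\le T} \mid x_{\le T})}
    \left[ \sum_{t=1}^{T} \Big(
      \log p(x_t \mid z_{\le t}, x_{<t})
      - \mathrm{KL}\big( q(z_t \mid x_{\le t}, z_{<t}) \,\|\, p(z_t \mid x_{<t}, z_{<t}) \big)
    \Big) \right]
```

Because the objective is a sum of per-step terms, each new observation contributes one additional term, which is consistent with the claim that the model extends naturally to online use.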
Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark
Despite clinical trials by pharmaceutical companies and current FDA reporting systems, some drug side effects still go undetected. One way to obtain a larger sample of reports is to mine online social media. Widely used platforms such as Twitter generate massive amounts of data that can serve as reports of drug side effects. Apache Spark has become a popular choice for fast, distributed batch processing of such large datasets. In this work, we improve on previous sentiment analysis-based pipelines for mining, processing, and extracting tweets that report drug-caused side effects. We also add a new ensemble classifier that combines sentiment analysis features to increase the accuracy of identifying drug-caused side effects, and we provide frequency counts for the side effects. Furthermore, we implement the same pipeline in Apache Spark, which speeds up tweet processing by a factor of 2.5 and supports the processing of large tweet datasets. Because the frequency counts of drug side effects open the door to further analysis, we present a preliminary study that covers the side effects of taking two drugs simultaneously and the potential danger of less common drug combinations. We believe the pipeline design and the results presented in this work have significant implications for the study of drug side effects and for big data analysis in general.
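The frequency-count step, including counts for pairs of drugs mentioned together, can be illustrated with a minimal sketch. This is not the authors' pipeline: the lexicons `DRUGS` and `SIDE_EFFECTS`, the helper names, and the sample tweets are all hypothetical, the classifier that filters for drug-caused side effects is omitted, and plain Python stands in for Spark so the example runs anywhere.

```python
from collections import Counter
from itertools import combinations

# Hypothetical lexicons; the paper's actual drug and side-effect lists are not given.
DRUGS = {"ibuprofen", "aspirin", "metformin"}
SIDE_EFFECTS = {"nausea", "headache", "dizziness"}


def extract_keys(tweet):
    """Return (drug_tuple, side_effect) keys for every drug or drug pair in a tweet."""
    # Crude tokenization: lowercase, strip commas and periods, split on whitespace.
    tokens = set(tweet.lower().replace(",", " ").replace(".", " ").split())
    drugs = sorted(tokens & DRUGS)
    effects = sorted(tokens & SIDE_EFFECTS)
    keys = []
    for effect in effects:
        # Single-drug keys.
        for drug in drugs:
            keys.append(((drug,), effect))
        # Keys for two drugs mentioned in the same tweet.
        for pair in combinations(drugs, 2):
            keys.append((pair, effect))
    return keys


def count_side_effects(tweets):
    """Accumulate how often each (drug or drug pair, side effect) key occurs."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_keys(tweet))
    return counts


# Made-up sample tweets for illustration only.
tweets = [
    "Took ibuprofen and now I have nausea",
    "aspirin with ibuprofen gave me a headache.",
    "Ibuprofen again, nausea again",
]
counts = count_side_effects(tweets)
```

In a Spark version, the same shape would typically map onto `flatMap` over tweets followed by `reduceByKey` to sum the counts. Note that co-mention in a tweet does not establish that a drug caused the side effect, which is the gap the classifier stage described above is meant to address.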
