Analyzing stock market movements using Twitter sentiment analysis
In this paper we investigate the complex relationship between tweet board literature (such as bullishness, volume, and agreement) and financial market instruments (such as volatility, trading volume, and stock prices). We analyzed sentiment for more than 4 million tweets posted between June 2010 and July 2011 about the DJIA, NASDAQ-100, and 13 other big-cap technology stocks. Our results show high correlation (up to 0.88 for returns) between stock prices and Twitter sentiment. Further, using Granger's Causality Analysis, we validated that movements of stock prices and indices are greatly affected in the short term by Twitter discussions. Finally, we implemented the Expert Model Mining System (EMMS) to demonstrate that our forecasted returns give a high R-square value (0.952) with a low Maximum Absolute Percentage Error (MaxAPE) of 1.76% for the Dow Jones Industrial Average (DJIA).
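The correlation figure quoted in this abstract can be illustrated with a minimal sketch of a Pearson correlation between a sentiment series and a return series. The abstract does not say which correlation estimator the authors used, so Pearson is an assumption here, and the `sentiment` and `returns` values are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values (not data from the paper).
sentiment = [0.10, 0.35, -0.20, 0.05, 0.40]
returns = [0.002, 0.006, -0.004, 0.001, 0.007]
r = pearson(sentiment, returns)
print(f"correlation between sentiment and returns: {r:.3f}")
```

A coefficient near 1 means the two series tend to move together. It says nothing about direction of influence, which is why the abstract turns to Granger causality as a separate step.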
Network Analysis with the Enron Email Corpus
We use the Enron email corpus to study relationships in a network by applying
six different measures of centrality. Our results came out of an in-semester
undergraduate research seminar. The Enron corpus is well suited to statistical
analyses at all levels of undergraduate education. Through this note's focus on
centrality, students can explore the dependence of statistical models on
initial assumptions and the interplay between centrality measures and
hierarchical ranking, and they can use completed studies as springboards for
future research. The Enron corpus also presents opportunities for research into
many other areas of analysis, including social networks, clustering, and
natural language processing.
Comment: in Journal of Statistics Education, Volume 23, Number 2, 201
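As a rough illustration of what a centrality measure computes on an email network, the sketch below implements normalized degree centrality. The abstract does not name its six measures, so degree centrality is shown only as one common example, and the sender/recipient pairs are hypothetical rather than taken from the Enron corpus.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph of (u, v) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-addressed messages
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Divide each node's neighbor count by the maximum possible, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical sender/recipient pairs (not from the Enron corpus).
emails = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
]
centrality = degree_centrality(emails)
for person, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.2f}")
```

Treating the graph as undirected and ignoring message counts are modeling choices. Changing either one changes the ranking, which is the kind of dependence on initial assumptions the abstract highlights.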
Graph-based Features for Automatic Online Abuse Detection
While online communities have become increasingly important over the years,
the moderation of user-generated content is still performed mostly manually.
Automating this task is an important step in reducing the financial cost
associated with moderation, but the majority of automated approaches strictly
based on message content are highly vulnerable to intentional obfuscation. In
this paper, we discuss methods for extracting conversational networks based on
raw multi-participant chat logs, and we study the contribution of graph
features to a classification system that aims to determine if a given message
is abusive. The conversational graph-based system yields unexpectedly high
performance, with results comparable to those previously obtained with a
content-based approach.