Web search queries can predict stock market volumes
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights into a wide range of disciplines, from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spread of epidemics. A few recent works have applied this approach to stock prices and market sentiment. However, it remains unclear whether trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in the NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes in many cases anticipate peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which also enables us to investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www. © 2012 Bordino et al.
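The claim that query volumes anticipate trading peaks by a day or more can be illustrated with a lagged correlation. The following is a minimal sketch, not the authors' method: the function names and the toy series are hypothetical, and a plain Pearson coefficient is assumed as the measure of association.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(queries, trades, lag):
    """Correlate query volume on day t with trading volume on day t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(queries, trades)
    return pearson(queries[:-lag], trades[lag:])

# Toy data (invented for illustration): trading peaks follow query peaks by one day.
queries = [10, 12, 30, 11, 9, 25, 10, 8, 28, 11]
trades = [100, 105, 110, 300, 108, 95, 260, 102, 90, 280]

# Lag (in days) at which the two series are most strongly correlated.
best_lag = max(range(4), key=lambda k: lagged_correlation(queries, trades, k))
```

In this toy series the correlation is highest at a positive lag, which is the pattern one would expect if queries lead trading.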
Coupling news sentiment with web browsing data improves prediction of intra-day price dynamics
The new digital revolution of big data is deeply changing our capability of understanding society and forecasting the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, severely affecting the predictive power of semantic and statistical methods. Here we show that the aggregated behavior of web users can be used to overcome this problem in a hard-to-predict complex system, namely the financial market. Specifically, our in-sample analysis shows that the combined use of sentiment analysis of news and browsing activity of users of Yahoo! Finance greatly helps in forecasting intra-day and daily price changes of a set of 100 highly capitalized US stocks traded in the period 2012-2013. Sentiment analysis and browsing activity, when taken alone, have very small or no predictive power. Conversely, when considering a news signal where in a given time interval we compute the average sentiment of the clicked news, weighted by the number of clicks, we show that for nearly 50% of the companies this signal Granger-causes hourly price returns. Our result indicates a "wisdom-of-the-crowd" effect that makes it possible to exploit users' activity to identify and properly weigh the relevant and surprising news, considerably enhancing the forecasting power of the news sentiment.
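The click-weighted news signal described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the input layout (hour, sentiment score, click count) and the function name are hypothetical, and the Granger-causality test applied to the resulting signal is not shown.

```python
from collections import defaultdict

def click_weighted_sentiment(news_items):
    """Average sentiment of clicked news per interval, weighted by click counts.

    news_items: iterable of (interval, sentiment, clicks) tuples (assumed layout).
    Intervals with zero total clicks are omitted from the result.
    """
    weighted_sum = defaultdict(float)
    total_clicks = defaultdict(int)
    for interval, sentiment, clicks in news_items:
        weighted_sum[interval] += sentiment * clicks
        total_clicks[interval] += clicks
    return {
        interval: weighted_sum[interval] / total_clicks[interval]
        for interval in weighted_sum
        if total_clicks[interval] > 0
    }

# Toy data (invented): hour of day, sentiment in [-1, 1], number of clicks.
items = [(9, 1.0, 30), (9, -1.0, 10), (10, -0.5, 4), (10, 0.5, 4), (11, 0.8, 0)]
signal = click_weighted_sentiment(items)
```

A heavily clicked positive story dominates the 9 o'clock value, while the unclicked story at 11 contributes nothing, which is the weighting effect the abstract attributes to users' activity.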
Provably and Efficiently Approximating Near-cliques using the Turán Shadow: PEANUTS
Clique and near-clique counts are important graph properties with
applications in graph generation, graph modeling, graph analytics, community
detection among others. They are the archetypal examples of dense subgraphs.
While there are several different definitions of near-cliques, most of them
share the attribute that they are cliques that are missing a small number of
edges. Clique counting is itself considered a challenging problem. Counting
near-cliques is significantly harder, all the more so since the search space for
near-cliques is orders of magnitude larger than that of cliques.
We give a formulation of a near-clique as a clique that is missing a constant
number of edges. We exploit the fact that a near-clique contains a smaller
clique, and use techniques for clique sampling to count near-cliques. This
method allows us to count near-cliques with 1 or 2 missing edges, in graphs
with tens of millions of edges. To the best of our knowledge, there was no
known efficient method for this problem, and we obtain a 10x - 100x speedup
over existing algorithms for counting near-cliques.
Our main technique is a space-efficient adaptation of the Turán Shadow
sampling approach, recently introduced by Jain and Seshadhri (WWW 2017). This
approach constructs a large recursion tree (called the Turán Shadow) that
represents cliques in a graph. We design a novel algorithm that builds an
estimator for near-cliques, using an online, compact construction of the
Turán Shadow. Comment: The Web Conference, 2020 (WWW
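To make the definition concrete, the sketch below counts k-vertex sets that are exactly a fixed number of edges short of a clique. It is a brute-force illustration of the definition only, not the sampling algorithm of the paper: it enumerates every vertex subset, so it is practical only for very small graphs. The function name and the toy graph are hypothetical.

```python
from itertools import combinations

def count_near_cliques(edges, k, missing):
    """Count k-vertex subsets with exactly `missing` absent edges (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edges for v in e})
    pairs_in_clique = k * (k - 1) // 2  # number of edges in a full k-clique
    count = 0
    for subset in combinations(vertices, k):
        present = sum(
            1 for pair in combinations(subset, 2) if frozenset(pair) in edge_set
        )
        if pairs_in_clique - present == missing:
            count += 1
    return count

# Toy graph (invented): a 4-clique on {0, 1, 2, 3} with edge (2, 3) removed,
# plus a vertex 4 attached to vertices 0 and 1.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (0, 4), (1, 4)]

triangles = count_near_cliques(toy_edges, 3, 0)
four_cliques = count_near_cliques(toy_edges, 4, 0)
four_near_cliques = count_near_cliques(toy_edges, 4, 1)
```

The number of subsets grows combinatorially with k and the graph size, which is why exact enumeration does not scale and sampling-based estimators are of interest.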
Quantifying trading behavior in financial markets using Google Trends
Crises in financial markets affect humans worldwide. Detailed market data on trading decisions reflect some of the complex human behavior that has led to these crises. We suggest that massive new data sources resulting from human interaction with the Internet may offer a new perspective on the behavior of market participants in periods of large market movements. By analyzing changes in Google query volumes for search terms related to finance, we find patterns that may be interpreted as “early warning signs” of stock market moves. Our results illustrate the potential that combining extensive behavioral data sets offers for a better understanding of collective human behavior
Twitter-based analysis of the dynamics of collective attention to political parties
Large-scale data from social media have a significant potential to describe
complex phenomena in the real world and to anticipate collective behaviors such as
information spreading and social trends. One specific case of study is
represented by the collective attention to the action of political parties. Not
surprisingly, researchers and stakeholders tried to correlate parties' presence
on social media with their performances in elections. Despite the many efforts,
results are still inconclusive since this kind of data is often very noisy and
significant signals could be covered by (largely unknown) statistical
fluctuations. In this paper we consider the number of tweets (tweet volume) of
a party as a proxy of collective attention to the party, identify the dynamics
of the volume, and show that this quantity carries some information about the
election outcome. We find that the distribution of the tweet volume for each
party follows a log-normal distribution with a positive autocorrelation of the
volume over short terms, which indicates that the volume shows the large
fluctuations of a log-normal distribution, yet with a short-term tendency. Furthermore, by
measuring the ratio of two consecutive daily tweet volumes, we find that the
evolution of the daily volume of a party can be described by means of a
geometric Brownian motion (i.e., the logarithm of the volume moves randomly
with a trend). Finally, we determine the optimal period of averaging tweet
volume for reducing fluctuations and extracting short-term tendencies. We
conclude that the tweet volume is a good indicator of parties' success in the
elections when considered over an optimal time window. Our study identifies the
statistical nature of collective attention to political issues and sheds light
on how to model the dynamics of collective attention in social media. Comment: 16 pages, 7 figures, 3 tables. Published in PLoS ON
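The ratio of consecutive daily volumes mentioned in the abstract connects directly to geometric Brownian motion: under that model the logarithms of the ratios are independent draws from a normal distribution, so their mean and standard deviation summarize the trend and the size of the fluctuations. The sketch below is a minimal illustration with hypothetical function names and invented data, not the estimation procedure of the paper.

```python
from math import log, sqrt

def log_ratios(volumes):
    """Logarithm of the ratio of each day's volume to the previous day's."""
    return [log(today / yesterday) for yesterday, today in zip(volumes, volumes[1:])]

def gbm_parameters(volumes):
    """Sample mean and standard deviation of the daily log-ratios.

    Under geometric Brownian motion these estimate the per-day trend of the
    log-volume and the scale of its random fluctuations.
    """
    ratios = log_ratios(volumes)
    n = len(ratios)
    mean = sum(ratios) / n
    variance = sum((r - mean) ** 2 for r in ratios) / (n - 1)
    return mean, sqrt(variance)

# Toy data (invented): a volume that doubles every day has a constant
# log-ratio, so the trend is log(2) and the fluctuation is zero.
daily_volumes = [100, 200, 400, 800]
trend, fluctuation = gbm_parameters(daily_volumes)
```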
The Effects of Twitter Sentiment on Stock Price Returns
Social media are increasingly reflecting and influencing behavior of other
complex systems. In this paper we investigate the relations between a well-known
micro-blogging platform, Twitter, and financial markets. In particular, we
consider, in a period of 15 months, the Twitter volume and sentiment about the
30 stock companies that form the Dow Jones Industrial Average (DJIA) index. We
find a relatively low Pearson correlation and Granger causality between the
corresponding time series over the entire time period. However, we find a
significant dependence between the Twitter sentiment and abnormal returns
during the peaks of Twitter volume. This is valid not only for the expected
Twitter volume peaks (e.g., quarterly announcements), but also for peaks
corresponding to less obvious events. We formalize the procedure by adapting
the well-known "event study" from economics and finance to the analysis of
Twitter data. The procedure allows one to automatically identify events as Twitter
volume peaks, to compute the prevailing sentiment (positive or negative)
expressed in tweets at these peaks, and finally to apply the "event study"
methodology to relate them to stock returns. We show that sentiment polarity of
Twitter peaks implies the direction of cumulative abnormal returns. The amount
of cumulative abnormal returns is relatively low (about 1-2%), but the
dependence is statistically significant for several days after the events
- …
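The cumulative abnormal returns used in an event study can be sketched as below. This is a simplified illustration, not the procedure of the paper: it assumes a market-adjusted model in which the abnormal return is the stock return minus the market return on the same day, and the function name and data are hypothetical.

```python
def cumulative_abnormal_returns(stock_returns, market_returns, event_day, window):
    """Running sum of abnormal returns from event_day through event_day + window.

    Assumption: abnormal return = stock return - market return (market-adjusted
    model). Other expected-return models are common in event studies.
    """
    last_day = min(event_day + window, len(stock_returns) - 1)
    running_total = 0.0
    path = []
    for day in range(event_day, last_day + 1):
        running_total += stock_returns[day] - market_returns[day]
        path.append(running_total)
    return path

# Toy data (invented): daily returns around an event on day 2.
stock = [0.00, 0.01, 0.03, 0.02, 0.00]
market = [0.00, 0.01, 0.01, 0.01, 0.00]
car_path = cumulative_abnormal_returns(stock, market, event_day=2, window=2)
```

The sign of the last value in the path gives the direction of the cumulative abnormal return after the event, which is the quantity the abstract relates to the sentiment polarity of a Twitter peak.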