Stock Forecasts with LSTM and Web Sentiment
Traditional time-series techniques, such as autoregressive and moving-average models, can struggle with stock data because of the randomness inherent in the markets. In this study, Long Short-Term Memory (LSTM) recurrent neural networks were applied to pricing data together with sentiment scores derived from web sources such as Twitter and other financial media outlets. The project team used this approach to complement the technical indicators observed at the end of each trading day for three stocks from the NASDAQ stock exchange over a 12-year span. A common benchmark for assessing model performance on time-series data is to use the prior day's closing price of a given stock to predict the next day's closing value, a naive but surprisingly accurate method when measured by mean absolute error. The main objective of the paper is to take the predictions from the various models assembled for the research and determine whether the next day's closing price will rise or fall relative to the last predicted value. On average, all models showed a roughly 2% accuracy improvement over the largely balanced up and down movements of the tickers used in the study.
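The naive benchmark and the rise-or-fall evaluation described in this abstract can be illustrated with a minimal sketch. The function names and the sample prices are hypothetical, and the direction rule (comparing each prediction with the previous prediction) is one reading of the abstract, not the authors' confirmed procedure.

```python
from typing import Sequence


def naive_baseline_mae(closes: Sequence[float]) -> float:
    """Mean absolute error of predicting each close with the prior day's close."""
    errors = [abs(closes[t] - closes[t - 1]) for t in range(1, len(closes))]
    return sum(errors) / len(errors)


def directional_accuracy(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Share of days on which the predicted move (up or not) matches the actual move.

    Assumption: the predicted direction for day t is taken from
    predicted[t] versus predicted[t - 1], and the actual direction
    from actual[t] versus actual[t - 1].
    """
    hits = 0
    for t in range(1, len(actual)):
        predicted_up = predicted[t] > predicted[t - 1]
        actual_up = actual[t] > actual[t - 1]
        hits += predicted_up == actual_up
    return hits / (len(actual) - 1)


# Made-up illustrative prices, not data from the study.
closes = [10.0, 11.0, 10.5, 12.0]
model_predictions = [10.0, 10.8, 11.0, 11.5]

baseline_mae = naive_baseline_mae(closes)
accuracy = directional_accuracy(model_predictions, closes)
```

With roughly balanced up and down days, a directional accuracy near 0.5 is what chance would give, which is why the abstract reports improvement relative to that balance.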
Purchase Intentions on Social Media as Predictors of Consumer Spending
The paper addresses the problem of forecasting consumer expenditure from social media data. Previous research on the topic exploited the intuition that search engine traffic reflects purchase intentions and constructed predictive models of consumer behaviour from search query volumes. In contrast, we derive predictors from explicit expressions of purchase intentions found in social media posts. Two types of predictors created from these expressions are explored: those based on word embeddings and those based on topical word clusters. We introduce a new clustering method, which takes into account temporal co-occurrence of words, in addition to their semantic similarity, in order to create predictors relevant to the forecasting problem. The predictors are evaluated against baselines that use only macroeconomic variables, and against models trained on search traffic data. Conducting experiments with three different regression methods on Facebook and Twitter data, we find that both word embeddings and word clusters help to reduce forecasting errors in comparison to purely macroeconomic models. In most experimental settings, the error reduction is statistically significant and is comparable to the error reduction achieved with search traffic variables.
Nowcasting user behaviour with social media and smart devices on a longitudinal basis: from macro- to micro-level modelling
The adoption of social media and smart devices by millions of users worldwide over the last decade has resulted in an unprecedented opportunity for NLP and the social sciences. Users publish their thoughts and opinions on everyday issues through social media platforms, while they record their digital traces through their smart devices. Mining these rich resources offers new opportunities in sensing real-world events and indices (e.g., political preference, mental health indices) in a longitudinal fashion, at either the macro (population) level or the micro (user) level.
The current project aims at developing approaches to "nowcast" (predict the current state of) such indices at both levels of granularity. First, we build natural language resources for the static tasks of sentiment analysis, emotion disclosure and sarcasm detection over user-generated content. These are important for opinion monitoring on a large scale. Second, we propose a general approach that leverages textual data derived from generic social media streams to nowcast political indices at the macro-level. Third, we leverage temporally sensitive and asynchronous information to nowcast the political stance of social media users at the micro-level, using multiple kernel learning. We then focus further on the micro-level modelling, to account for heterogeneous data sources, such as information derived from users' smart phones, SMS and social media messages, to nowcast time-varying mental health indices of a small cohort of users on a longitudinal basis. Finally, we present the challenges faced when applying such micro-level approaches in a real-world setting and propose directions for future research.
Analyzing Tweets For Predicting Mental Health States Using Data Mining And Machine Learning Algorithms
Tweets are usually the outcome of people's feelings on various topics. Twitter allows users to post casual and emotional thoughts to share in real time. Around 20% of U.S. adults use Twitter. Using the word-frequency and singular value decomposition methods, we identified the behavior of individuals through their tweets. We graded depressive and anti-depressive keywords using the tweet time-series, time-window, and time-stamp methods. We have collected around four million tweets since 2018. A parameter (Depressive Index) is computed using the F1 score and Matthews correlation coefficient (MCC) to indicate the depressive level. A framework showing the Depressive Index and the Happiness Index is prepared with the time, location, and keywords, and delivers F1 score, MCC, and CI values.
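For readers unfamiliar with the two metrics named above, the following minimal sketch computes the F1 score and the Matthews correlation coefficient from binary confusion-matrix counts using their standard definitions. The function names and the counts are hypothetical, and the abstract does not say how the authors combine these metrics into the Depressive Index, so that step is not shown.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal count is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Made-up counts for a depressive / non-depressive tweet classifier.
tp, tn, fp, fn = 40, 45, 5, 10

f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

Unlike F1, MCC also uses the true negatives, so it stays informative when the two classes are imbalanced.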
COVID-19 changed the routines of most people's lives and affected mental health. We studied the tweets and compared them with the growth of COVID-19. We compare the Happiness Index from our work with the World Happiness Report for Georgia, New York, and Sri Lanka. An interactive framework is prepared to analyze the tweets, depict the Happiness Index, and compare it. Bad words in tweets are analyzed, and a map showing the Happiness Index is computed for all the US states and compared with WalletHub data. We add tweets continuously, and the framework delivers an atlas of maps based on the Happiness Index; we make these maps available for further study.
We forecasted tweets with real-time data. Our tweet-based results and the COVID-19 reports (WHO) follow a similar pattern. A new moving average method was presented; this unique process gave perfect results at peaks of the function and improved the error percentage.
An interactive GUI portal computes the Happiness Index, Depression Index, feel-good factors, and keyword predictions, and prepares a Happiness Index map. We plan to create a public web portal to help users obtain these results. Once the proposed GUI application is complete, users will be able to get the Happiness Index, Depression Index values, Happiness map, and keyword predictions for the desired dates and geographical locations instantaneously.
Projects in Applied Data Science: Fall 2019
This document contains semester projects for students in CSCI 4381/7000 Data Science Projects. This course explores concepts and techniques for design, formulation and execution of practical, applied data science. Topics covered include experimental design, statistical analysis and predictive modeling, machine learning, data visualization, scientific writing and presentation. During the class, students selected a semester-long project to acquire, analyze, and understand data in support of a research question. In addition to traditional lectures, students read and discussed published papers on data science topics, practiced skills in recitation sessions, and entertained guest lectures from expert data scientists in the field. Outside of these readings and recitations, students were allowed to work on their projects exclusively and were supported with meetings, peer-discussion and copyediting.
In terms of the scope of the final product, undergraduate students were asked to perform a research or engineering task of some complexity while graduate students were additionally required to perform a survey of related work, demonstrate some novelty in their approach, and describe the position of their contribution within the broader literature. All students who performed at or above these expectations were offered the opportunity to contribute their paper for publication in this technical summary.
The diversity of the papers herein is representative of the diversity of interests of the students in the class. There is no common trend among the papers submitted and each takes on a different topic. Students made use of open data or worked with organizations to acquire data. Several students pivoted their projects early on due to limitations and difficulties in data access, a real-world challenge in practical data science. The projects herein range from analyzing traffic in cities, restaurant trends and Facebook responses to smartphone accelerometer data, scaling laws in higher education, and bicycle trends in Boulder, Colorado. Analysis approaches are similarly varied: visualization, statistical analysis and modeling, machine learning, reinforcement learning, etc. Most papers can be understood as exploratory data analysis, although some emphasize interactive visualization and others emphasize statistical modeling and prediction aimed at testing a well-defined research question. To inform the style of their approach, students read papers from a broad sampling of original research. They used these readings to build an understanding of approaches to presentation and analysis in the modern scientific literature. One paper was held out from this compendium so that it could be submitted for publication to a peer-reviewed venue.
Please direct questions/comments on individual papers to the student authors when contact information has been made available.
Proceedings of MathSport International 2017 Conference
Proceedings of MathSport International 2017 Conference, held in the Botanical Garden of the University of Padua, June 26-28, 2017.
MathSport International organizes biennial conferences dedicated to all topics where mathematics and sport meet.
Topics include: performance measures, optimization of sports performance, statistics and probability models, mathematical and physical models in sports, competitive strategies, statistics and probability match outcome models, optimal tournament design and scheduling, decision support systems, analysis of rules and adjudication, econometrics in sport, analysis of sporting technologies, financial valuation in sport, e-sports (gaming), betting and sports