Are Black Friday deals worth it? Mining Twitter users' sentiment and behavior response
The Black Friday event has become a global opportunity for marketing and for companies'
strategies aimed at increasing sales. The present study aims to understand consumer behavior
through the analysis of user-generated content (UGC) on social media with respect to the Black Friday
2018 offers published by the 23 largest technology companies in Spain. To this end, we analyzed
Twitter-based UGC about companies’ offers using a three-step data text mining process. First, a Latent
Dirichlet Allocation Model (LDA) was used to divide the sample into topics related to Black Friday.
In the next step, sentiment analysis (SA) using Python was carried out to determine the feelings
towards the identified topics and offers published by the companies on Twitter. Thirdly and finally,
a data-text mining process called textual analysis (TA) was performed to identify insights that could
help companies to improve their promotion and marketing strategies as well as to better understand
the customer behavior on social media. The results show that consumers had positive perceptions of
such topics as exclusive promotions (EP) and smartphones (SM); by contrast, topics such as fraud (FA),
insults and noise (IN), and customer support (CS) were negatively perceived by customers. Based on
these results, we offer guidelines to practitioners to improve their social media communication.
Our results also have theoretical implications that can promote further research in this area.
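The second step of the pipeline above, scoring sentiment per LDA topic, can be sketched as follows. This is a minimal illustration assuming a simple polarity lexicon; the study used Python-based sentiment-analysis tooling, and the lexicon, topic labels, and example tweets here are illustrative, not taken from the paper.

```python
# Toy polarity lexicon (illustrative; the study's SA tooling is more elaborate).
POSITIVE = {"great", "love", "deal", "discount", "exclusive"}
NEGATIVE = {"fraud", "scam", "spam", "slow", "broken"}

def tweet_polarity(text: str) -> int:
    """Score a tweet as +1 (positive), -1 (negative), or 0 (neutral)."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return (score > 0) - (score < 0)

def topic_sentiment(tweets_by_topic: dict) -> dict:
    """Average polarity per topic, mirroring step two of the pipeline."""
    return {
        topic: sum(tweet_polarity(t) for t in tweets) / len(tweets)
        for topic, tweets in tweets_by_topic.items()
    }

tweets = {
    "EP": ["love this exclusive discount", "great deal on laptops"],
    "FA": ["this offer is a scam", "fraud alert on this promo"],
}
print(topic_sentiment(tweets))  # EP scores positive, FA negative
```

In the study, the topic labels (EP, FA, etc.) come from the LDA step, and the per-topic averages are what separate positively perceived topics from negatively perceived ones.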
Applying text mining techniques to forecast the stock market fluctuations of large IT companies with Twitter data: descriptive and predictive approaches to enhance the research of stock market predictions with textual and semantic data
Project work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management. This research project applies advanced text mining techniques to predict stock market fluctuations by merging published tweets with daily stock market prices for a set of American Information Technology companies. Using mainly R code, it takes a systematic approach to two main objectives: i) identifying which descriptive criteria, patterns, and variables are correlated with stock fluctuations, and ii) determining whether tweets alone carry enough signal to predict stock market fluctuations with high accuracy. The expected output of the research is to establish the significance and predictive power of Twitter text, and thereby the importance of social media content for stock market fluctuations, using descriptive and predictive data mining approaches such as natural language processing, topic modelling, sentiment analysis, and binary classification with neural networks
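The core data-merging step described above, aligning each day's tweets with the next trading day's price direction to form a binary label, can be sketched as follows. The project itself used R; the tickers, field layout, and toy values here are illustrative assumptions.

```python
from datetime import date, timedelta

# Toy closing prices and tweets per day (illustrative data, not from the project).
prices = {
    date(2019, 1, 7): 100.0,
    date(2019, 1, 8): 102.5,
    date(2019, 1, 9): 101.0,
}
tweets = {
    date(2019, 1, 7): ["IT stocks look strong today"],
    date(2019, 1, 8): ["disappointing earnings rumours"],
}

def label_days(prices, tweets):
    """Pair each day's tweet text with 1 if the next close is higher, else 0."""
    rows = []
    for day, texts in sorted(tweets.items()):
        nxt = day + timedelta(days=1)
        if day in prices and nxt in prices:
            rows.append((" ".join(texts), int(prices[nxt] > prices[day])))
    return rows

print(label_days(prices, tweets))
# [('IT stocks look strong today', 1), ('disappointing earnings rumours', 0)]
```

The resulting (text, label) pairs are what a downstream binary classifier, such as the neural network mentioned in the abstract, would be trained on.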
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, high accuracy supervisory signal and multilanguage
sentiment prediction while respecting every privacy request applicable. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages. Comment: AffCon@AAAI-19 Best Paper Award; presented at AAAI-19 W1: Affective Content Analysis
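The ranking setup above can be illustrated as follows: score each tweet from early-available features only, then compare the induced ordering with the ordering by observed retweets. The real model is a gradient-boosted regression tree; the linear scorer, feature choice, and weights below are purely illustrative.

```python
def early_features(tweet):
    """Features available immediately at posting time (illustrative subset)."""
    return [tweet["followers"], len(tweet["text"]), tweet["text"].count("#")]

def predicted_score(tweet, weights=(0.001, 0.01, 0.5)):
    """Toy linear stand-in for the paper's gradient-boosted ranking model."""
    return sum(w * f for w, f in zip(weights, early_features(tweet)))

def rank(tweets, key):
    """Return tweet ids ordered from most to least viral under `key`."""
    return [t["id"] for t in sorted(tweets, key=key, reverse=True)]

tweets = [
    {"id": "a", "text": "big #launch news", "followers": 50_000, "retweets": 900},
    {"id": "b", "text": "lunch", "followers": 300, "retweets": 2},
    {"id": "c", "text": "thread on #ml #ai", "followers": 9_000, "retweets": 120},
]
predicted = rank(tweets, key=predicted_score)
observed = rank(tweets, key=lambda t: t["retweets"])
print(predicted, observed)
```

Ranking quality is then a matter of how well `predicted` agrees with `observed`; because only early features are used, the same scoring applies to tweets as they arrive.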
MetaLDA: a Topic Model that Efficiently Incorporates Meta information
Besides the text content, documents and their associated words usually come
with rich sets of meta information, such as categories of documents and
semantic/syntactic features of words, like those encoded in word embeddings.
Incorporating such meta information directly into the generative process of
topic models can improve modelling accuracy and topic quality, especially in
the case where the word-occurrence information in the training data is
insufficient. In this paper, we present a topic model, called MetaLDA, which is
able to leverage either document or word meta information, or both of them
jointly. With two data augmentation techniques, we can derive an efficient
Gibbs sampling algorithm, which benefits from the fully local conjugacy of the
model. Moreover, the algorithm is favoured by the sparsity of the meta
information. Extensive experiments on several real world datasets demonstrate
that our model achieves comparable or improved performance in terms of both
perplexity and topic quality, particularly in handling sparse texts. In
addition, compared with other models using meta information, our model runs
significantly faster. Comment: To appear in ICDM 2017
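For context, the kind of sampler the paper derives can be illustrated with a minimal collapsed Gibbs sampler for plain LDA. MetaLDA additionally conditions the Dirichlet priors on document and word meta information and exploits its sparsity; that extension is omitted here, so this is a baseline sketch, not the paper's algorithm.

```python
import random

def lda_gibbs(docs, K, V, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for vanilla LDA over integer-coded docs."""
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total words per topic
    z = []                             # topic assignment per token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
            zd.append(k)
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the token's current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
                z[d][i] = k
    return ndk, nkw

# Two tiny "documents" over a 4-word vocabulary; words 0-1 and 2-3 co-occur.
docs = [[0, 1, 0, 1, 0, 1], [2, 3, 2, 3, 2, 3]]
ndk, nkw = lda_gibbs(docs, K=2, V=4)
```

MetaLDA's contribution is that the `alpha`/`beta` priors become functions of the meta information while keeping this same fully local conjugate structure, which is what makes the derived sampler efficient.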
RATE: Overcoming Noise and Sparsity of Textual Features in Real-Time Location Estimation
Real-time location inference of social media users is fundamental to spatial
spatial applications such as localized search and event detection. While tweet
text is the most commonly used feature in location estimation, most of the
prior works suffer from either the noise or the sparsity of textual features.
In this paper, we aim to tackle these two problems. We use topic modeling as a
building block to characterize the geographic topic variation and lexical
variation so that "one-hot" encoding vectors will no longer be directly used.
We also incorporate other features which can be extracted through the Twitter
streaming API to overcome the noise problem. Experimental results show that our
RATE algorithm outperforms several benchmark methods, both in the precision of
region classification and the mean distance error of latitude and longitude
regression. Comment: 4 pages; accepted to CIKM 2017; some typos fixed
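The core idea of moving away from "one-hot" encodings can be sketched as follows: represent a tweet by the average topic distribution of its words, yielding a dense, low-dimensional feature for region classification. The word-topic table below is a toy assumption, not a distribution learned as in RATE.

```python
# Toy p(topic | word) table over 3 geographic topics (illustrative values).
WORD_TOPICS = {
    "beach":  [0.8, 0.1, 0.1],
    "subway": [0.1, 0.8, 0.1],
    "ranch":  [0.1, 0.1, 0.8],
}
UNIFORM = [1 / 3] * 3  # back-off for out-of-vocabulary words

def tweet_features(text):
    """Dense topic features: average the topic distributions of the words."""
    dists = [WORD_TOPICS.get(w, UNIFORM) for w in text.lower().split()]
    return [sum(col) / len(dists) for col in zip(*dists)]

feats = tweet_features("beach day near the beach")
print(feats)  # dominated by the first (coastal) topic
```

Compared with a sparse one-hot vector over the full vocabulary, this representation stays informative even for short, noisy tweets, which is the sparsity problem the abstract targets.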