Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on microblogs dealing
with politics has recently attracted researchers in several fields, including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of available data, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper aims to develop a
so-called active learning process for automatically annotating French-language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinions about two French politicians over time. We
therefore review state-of-the-art NLP-based ML algorithms for automatically
annotating tweets, using a manual initiation step as a bootstrap. This paper
focuses on key issues in active learning when building a large annotated dataset
from noisy data: noise introduced by human annotators, the abundance of data, and
the label distribution across data and entities. In turn, we show that Twitter
features such as the author's name or hashtags can serve as anchor points not
only to improve automatic systems for Opinion Mining (OM) and Topic
Classification but also to reduce noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
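An active-learning loop of this kind repeatedly trains a model on the labeled pool, scores the unlabeled tweets, and sends the most uncertain ones to human annotators. The selection step can be sketched as follows, assuming entropy-based uncertainty sampling (one common strategy; the abstract does not say which strategy the authors use) and a classifier that outputs a probability distribution over labels. The names `entropy` and `select_for_annotation` are hypothetical.

```python
import math
from typing import Sequence


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted label distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def select_for_annotation(predictions: Sequence[Sequence[float]], k: int) -> list[int]:
    """Hypothetical uncertainty-sampling step: return the indices of the k
    unlabeled tweets whose predicted label distributions have the highest
    entropy, i.e. the ones the current model is least sure about."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]
```

For example, with predicted (negative, neutral, positive) distributions `[[0.9, 0.05, 0.05], [0.34, 0.33, 0.33], [0.5, 0.1, 0.4], [0.1, 0.1, 0.8]]` and `k=2`, the near-uniform second and third tweets are the ones routed to annotators; their new labels are added to the training pool before the next round.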
The Effects of Twitter Sentiment on Stock Price Returns
Social media are increasingly reflecting and influencing the behavior of other
complex systems. In this paper we investigate the relations between the
well-known microblogging platform Twitter and financial markets. In particular,
we consider, over a period of 15 months, the Twitter volume and sentiment about
the 30 companies that form the Dow Jones Industrial Average (DJIA) index. We
find a relatively low Pearson correlation and Granger causality between the
corresponding time series over the entire time period. However, we find a
significant dependence between Twitter sentiment and abnormal returns
during peaks of Twitter volume. This holds not only for the expected
Twitter volume peaks (e.g., quarterly announcements) but also for peaks
corresponding to less obvious events. We formalize the procedure by adapting
the well-known "event study" from economics and finance to the analysis of
Twitter data. The procedure makes it possible to automatically identify events
as Twitter volume peaks, to compute the prevailing sentiment (positive or
negative) expressed in tweets at these peaks, and finally to apply the "event
study" methodology to relate them to stock returns. We show that the sentiment
polarity of Twitter peaks implies the direction of cumulative abnormal returns.
The magnitude of cumulative abnormal returns is relatively low (about 1-2%), but
the dependence is statistically significant for several days after the events.
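The three steps of the adapted event study (detect volume peaks, score sentiment at each peak, measure cumulative abnormal returns around it) can be sketched as below. This is a minimal illustration under our own assumptions, not the authors' implementation: the peak rule (volume above the mean plus `z` standard deviations), the polarity formula, the market model fitted on a 20-day estimation window, the event window from one day before to five days after the peak, and all function names are hypothetical choices.

```python
from statistics import mean, pstdev


def detect_peaks(volume: list[float], z: float = 2.0) -> list[int]:
    """Hypothetical peak rule: days whose tweet volume exceeds mean + z * std."""
    mu, sd = mean(volume), pstdev(volume)
    return [t for t, v in enumerate(volume) if v > mu + z * sd]


def polarity(pos: int, neg: int, neu: int) -> float:
    """Assumed polarity score in [-1, 1]: (positive - negative) / all tweets at the peak."""
    total = pos + neg + neu
    return (pos - neg) / total if total else 0.0


def fit_market_model(stock: list[float], market: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of stock_t = alpha + beta * market_t."""
    mx, my = mean(market), mean(stock)
    cov = sum((x - mx) * (y - my) for x, y in zip(market, stock))
    var = sum((x - mx) ** 2 for x in market)
    beta = cov / var
    return my - beta * mx, beta


def cumulative_abnormal_return(stock: list[float], market: list[float], event_day: int,
                               est_len: int = 20, window: tuple[int, int] = (-1, 5)) -> float:
    """Sum of abnormal returns (actual minus market-model prediction) over the
    event window, with the model fitted on the est_len days just before it."""
    start, end = event_day + window[0], event_day + window[1]
    if start - est_len < 0 or end >= len(stock):
        raise ValueError("not enough data around the event day")
    est = range(start - est_len, start)
    alpha, beta = fit_market_model([stock[t] for t in est], [market[t] for t in est])
    return sum(stock[t] - (alpha + beta * market[t]) for t in range(start, end + 1))
```

Under the finding reported above, a peak with positive `polarity` would tend to be followed by a positive cumulative abnormal return; the sketch only computes the quantities and does not test statistical significance.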
Multilingual Twitter Sentiment Classification: The Role of Human Annotators
What are the limits of automated Twitter sentiment classification? We analyze
a large set of manually labeled tweets in different languages, use them as
training data, and construct automated classification models. It turns out that
the quality of classification models depends much more on the quality and size
of training data than on the type of the model trained. Experimental results
indicate that there is no statistically significant difference between the
performance of the top classification models. We quantify the quality of
training data by applying various annotator agreement measures, and identify
the weakest points of different datasets. We show that the model performance
approaches the inter-annotator agreement when the size of the training set is
sufficiently large. However, it is crucial to regularly monitor the self- and
inter-annotator agreements since this improves the training datasets and
consequently the model performance. Finally, we show that there is strong
evidence that humans perceive the sentiment classes (negative, neutral, and
positive) as ordered.
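One way to quantify annotator agreement is Cohen's kappa, sketched below for two annotators (or for one annotator who labeled the same tweets twice, giving self-agreement). This is an illustrative choice, not necessarily one of the measures used in the paper. The optional linear weighting treats the classes as ordered (negative < neutral < positive), in line with the finding above; the label names and the function name `kappa` are assumptions.

```python
from collections import Counter
from typing import Sequence

LABELS = ("negative", "neutral", "positive")  # assumed order: negative < neutral < positive


def kappa(a: Sequence[str], b: Sequence[str],
          labels: Sequence[str] = LABELS, weighted: bool = False) -> float:
    """Cohen's kappa computed as 1 - observed / expected disagreement.

    Unweighted: every disagreement costs 1. Weighted (linear): a disagreement
    between labels i and j costs |i - j| / (len(labels) - 1), so confusing
    negative with positive costs twice as much as negative with neutral.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    unknown = (set(a) | set(b)) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    idx = {lab: i for i, lab in enumerate(labels)}
    span = len(labels) - 1

    def cost(x: str, y: str) -> float:
        if weighted:
            return abs(idx[x] - idx[y]) / span
        return 0.0 if x == y else 1.0

    n = len(a)
    # Observed disagreement: average cost over the items both annotators labeled.
    observed = sum(cost(x, y) for x, y in zip(a, b)) / n
    # Expected disagreement: cost if labels were paired at random from each annotator's marginals.
    ca, cb = Counter(a), Counter(b)
    expected = sum(cost(x, y) * ca[x] * cb[y] for x in labels for y in labels) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return 1.0 - observed / expected
```

With `a = ["negative", "neutral", "positive", "positive"]` and `b = ["negative", "positive", "positive", "neutral"]`, the unweighted score is 0.2, while the weighted score is higher (3/7) because both disagreements are between adjacent classes.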
The linguistic structure of tweets during a presidential campaign
The aim of this study is to carry out a systematic analysis of tweets published by Emmanuel Macron and Marine Le Pen during the 2022 presidential campaign. The analysis is performed at the textual, syntactic, enunciative and thematic levels. The results show slight differences at the textual and syntactic levels, and salient differences at the enunciative level (value judgments, modal verbs, emphasis) and the thematic level. The proposed methodology allows linguists without computational skills to obtain results that are quantifiable and, at the same time, interpretable through traditional linguistic categories.
Parallel data processing, analysis and visualization using high scalability mechanisms
In this work we present a conceptual and implementation model for the scalable, distributed and balanced execution of a large number of compute operations running on multiple processing units in the cloud.
We provide system development methods for large-scale processing with minimal time constraints and minimal limitations on increasing scale-out parallelism in the cloud. We also discuss implementation details of elastically adjusting the number of processing units to the processing power required in a cloud environment.
The work also provides approaches for filtering useful data in the described problem domain. We present options for advanced multi-stage data filtering, matched to the requirements of the analyses.
At the end of this work we present ways of visualizing the results of advanced analyses of the gathered data in the form of intuitive, interactive UI components, graphs, word clouds and other user-friendly views.
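The combination of chunked parallel execution and multi-stage filtering can be illustrated with a minimal sketch: records are split into chunks, each chunk passes through an ordered list of filtering stages on a worker, and the survivors are concatenated in input order. The stage predicates, chunk size, worker count and function names are hypothetical, and a local thread pool stands in for the cloud processing units; CPU-bound stages in CPython would need a process pool or a distributed framework to actually scale out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Stage = Callable[[str], bool]  # one filtering stage: keep the record if it returns True


def run_stages(chunk: list[str], stages: list[Stage]) -> list[str]:
    """Apply the stages in order; each stage only sees survivors of the previous one."""
    for keep in stages:
        chunk = [rec for rec in chunk if keep(rec)]
    return chunk


def parallel_filter(records: list[str], stages: list[Stage],
                    workers: int = 4, chunk_size: int = 1000) -> list[str]:
    """Split records into chunks, filter each chunk on a worker thread,
    and concatenate the results in the original order."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so output order is preserved.
        results = list(pool.map(lambda c: run_stages(c, stages), chunks))
    return [rec for part in results for rec in part]
```

Ordering cheap stages (such as dropping empty records) before expensive ones reduces the work done by later stages, which is one reason to filter in several passes rather than with a single combined predicate.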