9 research outputs found

    Active learning in annotating micro-blogs dealing with e-reputation

    Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of available data, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper develops an active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinion from two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms to automatically annotate tweets using a manual initiation step as bootstrap. This paper focuses on key issues in active learning while building a large annotated dataset from noisy data. The noise is introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as a bearing point not only to improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information. Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201
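
    The abstract above does not say which query strategy the active learning process uses. As a minimal sketch, assuming least-confidence uncertainty sampling (a common choice, not confirmed by the source), the step that selects which tweets go back to the human annotators might look like this. The function name `least_confident` and the probability values are hypothetical.

```python
def least_confident(probabilities, batch_size=2):
    """Return indices of the items whose top class probability is lowest.

    `probabilities` holds one predicted class distribution per unlabeled item.
    Least-confidence sampling is an assumed strategy, chosen for illustration.
    """
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:batch_size]


# Hypothetical model outputs over (negative, neutral, positive) for four tweets.
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
]
query = least_confident(probs)  # indices of tweets to send for manual annotation
```

    The selected tweets would be labeled by hand, added to the training set, and the model retrained before the next round.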

    The Effects of Twitter Sentiment on Stock Price Returns

    Social media are increasingly reflecting and influencing the behavior of other complex systems. In this paper we investigate the relations between the well-known micro-blogging platform Twitter and financial markets. In particular, we consider, over a period of 15 months, the Twitter volume and sentiment about the 30 stock companies that form the Dow Jones Industrial Average (DJIA) index. We find a relatively low Pearson correlation and Granger causality between the corresponding time series over the entire time period. However, we find a significant dependence between the Twitter sentiment and abnormal returns during the peaks of Twitter volume. This is valid not only for the expected Twitter volume peaks (e.g., quarterly announcements), but also for peaks corresponding to less obvious events. We formalize the procedure by adapting the well-known "event study" from economics and finance to the analysis of Twitter data. The procedure makes it possible to automatically identify events as Twitter volume peaks, to compute the prevailing sentiment (positive or negative) expressed in tweets at these peaks, and finally to apply the "event study" methodology to relate them to stock returns. We show that the sentiment polarity of Twitter peaks implies the direction of cumulative abnormal returns. The amount of cumulative abnormal returns is relatively low (about 1-2%), but the dependence is statistically significant for several days after the events.
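
    The procedure described above (detect volume peaks, then measure cumulative abnormal returns after each peak) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the peak rule (volume above a trailing mean plus `k` standard deviations), the market-adjusted definition of abnormal return (stock return minus market return), the function names, and all numbers are hypothetical.

```python
from statistics import mean, stdev


def volume_peaks(volume, window=5, k=2.0):
    """Indices where volume exceeds the trailing mean + k * trailing std.

    The thresholding rule is an assumption made for this sketch.
    """
    peaks = []
    for i in range(window, len(volume)):
        past = volume[i - window:i]
        if volume[i] > mean(past) + k * stdev(past):
            peaks.append(i)
    return peaks


def cumulative_abnormal_returns(stock, market, event, horizon=3):
    """Running sum of (stock - market) daily returns starting at the event day.

    Uses a simple market-adjusted model, assumed here for brevity.
    """
    car, total = [], 0.0
    for t in range(event, min(event + horizon, len(stock))):
        total += stock[t] - market[t]
        car.append(total)
    return car


# Hypothetical daily tweet counts and daily returns for one stock and the index.
volume = [100, 110, 95, 105, 100, 400, 120, 100]
stock = [0.001, 0.002, -0.001, 0.000, 0.001, 0.012, 0.006, 0.003]
market = [0.001, 0.001, 0.000, 0.000, 0.001, 0.002, 0.001, 0.001]

peaks = volume_peaks(volume)
car = cumulative_abnormal_returns(stock, market, peaks[0])
```

    In a full event study the sign of the cumulative abnormal return would then be compared with the prevailing sentiment at each peak and tested for significance across many events.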

    Multilingual Twitter Sentiment Classification: The Role of Human Annotators

    What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of the training data than on the type of model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements, since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
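
    The abstract above refers to "various annotator agreement measures" without naming them. As one illustrative example, Cohen's kappa (a standard chance-corrected agreement measure, assumed here and not necessarily among those used in the paper) can be computed for two annotators as sketched below. The function name and the labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment labels from two annotators on six tweets.
annotator_1 = ["neg", "neu", "pos", "pos", "neu", "neg"]
annotator_2 = ["neg", "neu", "pos", "neu", "neu", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

    Note that kappa treats the classes as unordered; since the abstract reports that annotators perceive the sentiment classes as ordered, an ordinal-aware measure may be more appropriate in practice.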

    La structure linguistique de tweets en campagne présidentielle (The linguistic structure of tweets in a presidential campaign)

    The aim of this study is to carry out a systematic analysis of tweets published by Emmanuel Macron and Marine Le Pen during the 2022 presidential campaign. The analysis is carried out at the textual, syntactic, enunciative, and thematic levels. The results show slight differences at the textual and syntactic levels and salient differences at the enunciative level (value judgments, modal verbs, emphasis) and the thematic level. The proposed methodology allows linguists without computational skills to obtain results that are quantifiable and at the same time interpretable through traditional linguistic categories.

    Parallel data processing, analysis and visualization using high scalability mechanisms

    In this work we present a conceptual and implementation model for scalable, distributed, and balanced execution of a large number of compute operations running on multiple processing units in the cloud. We provide system development methods for large-scale processing with minimal time constraints and limitations with regard to increasing scale-out parallelism in the cloud. Implementation details regarding elastic adjustment of processing units are discussed in connection with the processing power needed in a cloud environment. The work provides approaches for filtering useful data in the described problem domain. We present options for advanced data filtering in multiple stages, which correspond to the requirements of the analyses. At the end of this work we present ways of visualizing advanced analyses of the gathered data in the form of intuitive and interactive UI components, graphs, word clouds, and other user-friendly views.
