635 research outputs found

    Tweet Contextualization Based on Wikipedia and Dbpedia

    Bound to 140 characters, tweets are short and are not written with formal grammar or proper spelling. These spelling variations increase the likelihood of vocabulary mismatch and make tweets difficult to understand without context. This paper falls under the tweet contextualization task, which aims to automatically provide a summary that explains a given tweet, allowing a reader to understand it. We propose different tweet expansion approaches based on Wikipedia and Dbpedia as external knowledge sources. These approaches are divided into two steps. The first step generates the candidate terms for a given tweet, while the second ranks and selects these candidate terms using a similarity measure. The effectiveness of our methods is demonstrated through an experimental study conducted on the INEX 2014 collection.
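The second step described above (ranking and selecting candidate terms with a similarity measure) can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so this example assumes bag-of-words cosine similarity. The function names `cosine` and `rank_candidates`, the `top_k` parameter, and the toy candidate descriptions are all hypothetical and stand in for terms that would come from Wikipedia or Dbpedia.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(tweet, candidates, top_k=3):
    """Rank candidate terms by similarity between the tweet and each
    candidate's descriptive text, and keep the top_k with a non-zero score.

    `candidates` maps a candidate term to a short description (assumed
    here to be something like an encyclopedia abstract).
    """
    tweet_vec = Counter(tweet.lower().split())
    scored = [
        (cosine(tweet_vec, Counter(text.lower().split())), term)
        for term, text in candidates.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for score, term in scored[:top_k] if score > 0]


# Toy data, invented for illustration only.
toy_candidates = {
    "Barack Obama": "obama president united states speech",
    "Climate change": "climate change global warming",
    "Football": "sport played with a ball",
}
ranked = rank_candidates("obama speech on climate change", toy_candidates)
print(ranked)
```

In this toy run, the candidate with no word overlap with the tweet is dropped, which is the selection behaviour the two-step description implies. A real system would likely use a richer text representation than raw word counts.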

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of using knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques. Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770

    Active learning in annotating micro-blogs dealing with e-reputation

    Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper intends to develop a so-called active learning process for automatically annotating French language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinion from two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms to automatically annotate tweets using a manual initiation step as bootstrap. This paper focuses on key issues in active learning when building a large annotated dataset from noisy data, where the noise is introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can be considered as the bearing point not only to improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information. Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201

    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing a short explanatory summary (500 words). This summary should be built automatically using resources like Wikipedia and generated by extracting relevant passages and aggregating them into a coherent summary. After four years of the task, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, text part-of-speech (POS) analysis, anaphora detection, a content diversity measure, and sentence reordering. This paper provides a full summary report on the four-year task. While yearly overviews focused on system results, in this paper we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important result of the four-year competition, we also describe the open access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we also mention the lessons we learned, which are worth considering when designing a task.

    Corporate image or social engagement: Twitter discourse on corporate social responsibility (CSR) in public relations strategies in the energy sector

    Social media have opened up new opportunities for the creation of innovative public relations strategies focused on establishing and cultivating relationships with stakeholders on the basis of meaningful dialogue. Consideration of the interrelation between corporate social responsibility (CSR) and public relations highlights new areas for exploration and engagement. Both the dialogical and semantic perspectives reveal the performative and conversational aspects of social media. In general, both the linguistic panorama of CSR and digital media as part of a PR strategy open new possibilities for a dialogical, interactive, meaningful relationship strategy for corporate image management. Based on the linguistic approach to CSR and the Communication Management Approach, this paper explores the linguistic use of Twitter as a primary dialogical strategy to effectively enhance interactive dialogue-based relationships with the stakeholders of the top 50 companies in the energy sector based on tweet data from 2016. Semantic analysis was conducted by advanced text mining and clustering techniques on 3042 tweets monitored in 2017 that contained the leading CSR-related hashtags and keywords. The results demonstrated that the top energy companies apply a defensive and symbolic perspective, mainly for branding purposes. The corporate discourse dominates over a meaningful conversational strategy to foster interaction with stakeholders around sustainability issues on Twitter. The study reveals a homogenized interrelation between CSR, social media, and public relations. The results reveal a tendency for isomorphy in the communication models applied by the companies in the energy sector. Furthermore, similarities in semantics and thus strong tendencies to mutually mimic dialogical strategies are also observed. The semantic narrative built around the brand indicates a limited orientation towards CSR and sustainability. As such, it does not contribute to the creation of a dialogical interaction and meaningful relationships with multiple stakeholders on Twitter, in the high-risk sector represented by the energy industry.

    LIA@CLEF 2018: Mining events opinion argumentation from raw unlabeled Twitter data using convolutional neural network

    Social networks on the Internet are becoming increasingly important in our society. In recent years, this type of media, through communication platforms such as Twitter, has raised new research issues due to the massive size of the data exchanged and the large, ever-increasing number of users. In this context, the CLEF 2018 Mining opinion argumentation task aims to retrieve, for a specific event (festival name or topic), the most diverse argumentative microblogs from a large collection of tweets about festivals in different languages. In this paper, we propose a four-step approach for extracting argumentative microblogs related to a specific query (or event) when no reference data is provided.

    Social Media Operationalized for GIS: The Prequel

    With social media a de facto global communication channel used to disseminate news, entertainment, and one’s self-revelations, the latter contains double-talk, peculiar insight, and contextual observation about real-world events. The primary objective is to propose a novel pipeline that classifies a tweet as either “useful” or “not useful” using widely accepted Natural Language Processing (NLP) techniques, and to measure the effect of this method based on the change in performance of a Geographical Information System (GIS) artifact. A 1,000-tweet sample is manually tagged and compared to an innovative social media grammar applied by a rule-based social media NLP pipeline. The evaluation addresses the question: prior to content analysis of a tweet, does a method exist to support identifying a tweet as “useful” for subsequent processing? Indeed, “useful” tweet identification via NLP returned a precision of 0.9256, a recall of 0.6590, and an F-measure of 0.7699; consequently, GIS social media processing increased by 0.2194 over baseline.
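As a quick check of the figures reported above, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the abstract's F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` and its `beta` parameter are illustrative choices.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the balanced F1 score,
    i.e. the harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values as reported in the abstract.
precision = 0.9256
recall = 0.6590

f1 = f_measure(precision, recall)
print(round(f1, 4))  # 0.7699
```

Under the F1 assumption, the recomputed value rounds to the reported 0.7699, so the three figures are mutually consistent.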

    Dissecting Deep Language Models: The Explainability and Bias Perspective

    The abstract is in the attachment.

    Analysing and evaluating the task of automatic tweet generation: Knowledge to business

    This paper presents a study on the evaluation and analysis of natural language tweets. Based on our experience in text summarisation, we carry out a deep analysis of users' perception through the evaluation of tweets manually and automatically generated from news. Specifically, we consider two key issues of a tweet: its informativeness and its interestingness. Therefore, we analyse: (1) do users perceive manual and automatic tweets equally?; (2) what linguistic features should a good tweet have to be interesting as well as informative? The main challenge of this proposal is the analysis of tweets to help companies with their positioning and reputation on the Web. Our results show that: (1) informative and interesting natural language tweets can be generated automatically as a result of summarisation approaches; and (2) we can characterise good and bad tweets based on specific linguistic features not present in other types of tweets. This research work has been partially funded by the University of Alicante, Generalitat Valenciana, Spanish Government and the European Commission through the projects “Tratamiento inteligente de la información para la ayuda a la toma de decisiones” (GRE12-44), “Explotación y tratamiento de la información disponible en Internet para la anotación y generación de textos adaptados al usuario” (GRE13-15), DIIM2.0 (PROMETEOII/2014/001), ATTOS (TIN2012-38536-C03-03), LEGOLANGUAGE (TIN2012-31224), and SAM (FP7-611312).