7 research outputs found

    Overview of INEX Tweet Contextualization 2013 track

    Twitter is increasingly used for on-line client and audience fishing; this motivated the tweet contextualization task at INEX. The objective is to help a user understand a tweet by providing them with a short summary (500 words). This summary should be built automatically using local resources such as Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. The task is evaluated on informativeness, computed using a variant of Kullback-Leibler divergence together with passage pooling; meanwhile, the readability of summaries in context is checked using binary questionnaires on small samples of results. The task has run since 2010, and results show that only systems that efficiently combine passage retrieval, sentence segmentation and scoring, named entity recognition, text POS analysis, anaphora detection, a diversity content measure, and sentence reordering are effective.
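    The informativeness computation can be illustrated with a short sketch. Below is a minimal, hypothetical Python example of a smoothed Kullback-Leibler-style divergence between the term distribution of a pooled set of relevant passages and that of a candidate summary; the tokenization, the smoothing constant, and the function names are illustrative assumptions, not the exact INEX implementation.

```python
from collections import Counter
import math

def term_dist(text):
    """Unigram distribution over whitespace tokens (illustrative tokenization)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl_informativeness(reference_pool, summary, eps=1e-9):
    """Smoothed KL divergence D(reference || summary).

    Lower divergence means the summary's vocabulary better covers the
    pooled relevant passages, i.e. the summary is more informative.
    The `eps` floor stands in for whatever smoothing INEX actually used.
    """
    p = term_dist(reference_pool)
    q = term_dist(summary)
    return sum(p_t * math.log(p_t / q.get(t, eps)) for t, p_t in p.items())

pool = "airline strike grounds flights across europe on monday"
summary = "a strike by airline staff grounded flights in europe"
print(f"KL divergence: {kl_informativeness(pool, summary):.3f}")
```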

    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing them with a short explanatory summary (500 words). This summary should be built automatically using resources like Wikipedia and generated by extracting relevant passages and aggregating them into a coherent summary. Running for four years, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, text part-of-speech (POS) analysis, anaphora detection, a diversity content measure, and sentence reordering. This paper provides a full summary report on the four-year-long task. While the yearly overviews focused on system results, in this paper we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important outcome of the four-year competition, we also describe the open access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we mention the lessons we learned, which are worth considering when designing such a task.
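    The paper depicts LogSim in detail; the sketch below shows one plausible log-based formulation, in which log-scaled term probabilities in the reference pool and in the summary are compared through a min/max ratio, weighted by each term's share of the reference. The exact scaling and normalization of the official measure may differ; all names here are illustrative assumptions.

```python
from collections import Counter
import math

def log_probs(tokens):
    """log(1 + relative frequency) per term, a common log-scaling choice."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: math.log(1 + c / total) for t, c in counts.items()}

def logsim(reference_tokens, summary_tokens):
    """Hypothetical LogSim-style informativeness in [0, 1].

    For each reference term, the log-scaled probabilities in the reference
    and the summary are compared via a min/max ratio; terms missing from
    the summary contribute 0. The official LogSim definition in the paper
    may normalize differently.
    """
    ref = log_probs(reference_tokens)
    summ = log_probs(summary_tokens)
    counts = Counter(reference_tokens)
    total = sum(counts.values())
    score = 0.0
    for t, lp in ref.items():
        lq = summ.get(t, 0.0)
        score += (counts[t] / total) * (min(lp, lq) / max(lp, lq))
    return score

ref = "volcanic ash cloud closes airports across northern europe".split()
summ = "airports in northern europe closed by volcanic ash".split()
print(f"LogSim-style informativeness: {logsim(ref, summ):.3f}")
```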

    Tweet Contextualization Based on Wikipedia and DBpedia

    Bound to 140 characters, tweets are short and often written without formal grammar or proper spelling. These spelling variations increase the likelihood of vocabulary mismatch and make tweets difficult to understand without context. This paper falls under the tweet contextualization task, which aims at automatically providing a summary that explains a given tweet, allowing a reader to understand it. We propose different tweet expansion approaches based on Wikipedia and DBpedia as external knowledge sources. These approaches are divided into two steps: the first generates candidate terms for a given tweet, while the second ranks and selects these candidate terms using a similarity measure. The effectiveness of our methods is demonstrated through an experimental study conducted on the INEX 2014 collection.
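    The two-step expansion can be sketched as follows: a candidate-generation step that looks up tweet tokens in an external knowledge source, and a ranking step based on a similarity measure. In this toy example the knowledge source is a plain dictionary standing in for real Wikipedia/DBpedia lookups, and cosine similarity over bags of words is one possible similarity choice; none of this reflects the authors' exact implementation.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def candidate_terms(tweet_tokens, knowledge_source):
    """Step 1 (stubbed): collect candidate expansion terms, e.g. labels or
    abstract terms retrieved from Wikipedia/DBpedia entries matching tweet
    tokens. Here `knowledge_source` is a toy dict, not a real lookup."""
    candidates = []
    for tok in tweet_tokens:
        candidates.extend(knowledge_source.get(tok, []))
    return candidates

def expand_tweet(tweet, knowledge_source, k=3):
    """Step 2: rank candidates by similarity to the tweet and keep top-k."""
    tokens = tweet.lower().split()
    tweet_vec = Counter(tokens)
    scored = []
    for cand in set(candidate_terms(tokens, knowledge_source)):
        # Score each candidate by cosine similarity between the tweet's
        # token vector and the candidate's own token vector.
        scored.append((cosine(tweet_vec, Counter(cand.lower().split())), cand))
    return [c for _, c in sorted(scored, reverse=True)[:k]]

toy_kb = {"nba": ["national basketball association", "basketball league nba"],
          "finals": ["championship series finals"]}
print(expand_tweet("nba finals tonight", toy_kb))
```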

    Évaluation de la contextualisation de tweets

    This article addresses the evaluation of tweet contextualization. Contextualization is defined as a summary that puts a text back into context when, owing to its length, the text does not contain all the elements that allow a reader to understand all or part of its content. We define an evaluation framework for tweet contextualization that can be generalized to other short texts. We propose a reference collection as well as ad hoc evaluation measures. This evaluation framework was successfully applied in the context of the INEX Tweet Contextualization campaign. In light of the results obtained during this campaign, we discuss the measures used in relation to other measures from the literature.

    Social Media Operationalized for GIS: The Prequel

    With social media serving as a de facto global communication channel for disseminating news, entertainment, and personal self-revelations, the latter contain double-talk, peculiar insights, and contextual observations about real-world events. The primary objective is to propose a novel pipeline that classifies a tweet as either “useful” or “not useful” using widely accepted Natural Language Processing (NLP) techniques, and to measure the effect of this method through the change in performance of a Geographical Information System (GIS) artifact. A sample of 1,000 tweets is manually tagged and compared against an innovative social media grammar applied by a rule-based social media NLP pipeline. The evaluation addresses the question: prior to content analysis of a tweet, does a method exist to identify a tweet as “useful” for subsequent processing? Indeed, “useful” tweet identification via NLP returned a precision of 0.9256, a recall of 0.6590, and an F-measure of 0.7699; consequently, GIS social media processing increased by 0.2194 over the baseline.
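    The reported F-measure is consistent with the reported precision and recall, since the balanced F-measure is their harmonic mean; the snippet below verifies the arithmetic (the function name is illustrative).

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the balanced harmonic mean (F1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Reported figures from the abstract: P = 0.9256, R = 0.6590.
p, r = 0.9256, 0.6590
print(f"F-measure: {f_measure(p, r):.4f}")  # ~0.7699, matching the reported value
```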

    The challenging task of summary evaluation: an overview

    Evaluation is crucial in the research and development of automatic summarization applications, in order to determine the appropriateness of a summary based on different criteria, such as the content it contains, and the way it is presented. To perform an adequate evaluation is of great relevance to ensure that automatic summaries can be useful for the context and/or application they are generated for. To this end, researchers must be aware of the evaluation metrics, approaches, and datasets that are available, in order to decide which of them would be the most suitable to use, or to be able to propose new ones, overcoming the possible limitations that existing methods may present. In this article, a critical and historical analysis of evaluation metrics, methods, and datasets for automatic summarization systems is presented, where the strengths and weaknesses of evaluation efforts are discussed and the major challenges to solve are identified. Therefore, a clear up-to-date overview of the evolution and progress of summarization evaluation is provided, giving the reader useful insights into the past, present and latest trends in the automatic evaluation of summaries.

    This research is partially funded by the European Commission under the Seventh Framework Programme for Research and Technological Development (FP7, 2007-2013) through the SAM project (FP7-611312); by the Spanish Government through the projects VoxPopuli (TIN2013-47090-C3-1-P) and Vemodalen (TIN2015-71785-R); by the Generalitat Valenciana through the project DIIM2.0 (PROMETEOII/2014/001); and by the Universidad Nacional de Educación a Distancia through the project “Modelado y síntesis automática de opiniones de usuario en redes sociales” (2014-001-UNED-PROY).