7 research outputs found

    Overview of INEX Tweet Contextualization 2013 track

    Twitter is increasingly used for on-line client and audience fishing; this motivated the tweet contextualization task at INEX. The objective is to help a user understand a tweet by providing them with a short summary (500 words). This summary should be built automatically using local resources such as Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. The task is evaluated on informativeness, computed using a variant of Kullback-Leibler divergence together with passage pooling; meanwhile, the readability of summaries in context is checked using binary questionnaires on small samples of results. The task has run since 2010, and results show that only systems that efficiently combine passage retrieval, sentence segmentation and scoring, named entity recognition, text POS analysis, anaphora detection, a diversity content measure, and sentence reordering are effective.
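    The informativeness computation can be illustrated with a short sketch. Below is a minimal, hypothetical Python example of a smoothed Kullback-Leibler-style divergence between the term distribution of a pooled set of relevant passages and that of a candidate summary; the tokenization, the smoothing constant, and the function names are illustrative assumptions, not the exact INEX implementation.

```python
from collections import Counter
import math

def term_dist(text):
    """Unigram distribution over whitespace tokens (illustrative tokenization)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl_informativeness(reference_pool, summary, eps=1e-9):
    """Smoothed KL divergence D(reference || summary).

    Lower divergence means the summary's vocabulary better covers the
    pooled relevant passages, i.e. the summary is more informative.
    The `eps` floor stands in for whatever smoothing INEX actually used.
    """
    p = term_dist(reference_pool)
    q = term_dist(summary)
    return sum(p_t * math.log(p_t / q.get(t, eps)) for t, p_t in p.items())

pool = "airline strike grounds flights across europe on monday"
summary = "a strike by airline staff grounded flights in europe"
print(f"KL divergence: {kl_informativeness(pool, summary):.3f}")
```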

    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing them with a short explanatory summary (500 words). This summary should be built automatically using resources like Wikipedia and generated by extracting relevant passages and aggregating them into a coherent summary. Running for four years, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, text part-of-speech (POS) analysis, anaphora detection, a diversity content measure, and sentence reordering. This paper provides a full summary report on the four-year-long task. While the yearly overviews focused on system results, in this paper we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important outcome of the four-year competition, we also describe the open access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we mention the lessons we learned, which are worth considering when designing such a task.
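    The paper depicts LogSim in detail; the sketch below shows one plausible log-based formulation, in which log-scaled term probabilities in the reference pool and in the summary are compared through a min/max ratio, weighted by each term's share of the reference. The exact scaling and normalization of the official measure may differ; all names here are illustrative assumptions.

```python
from collections import Counter
import math

def log_probs(tokens):
    """log(1 + relative frequency) per term, a common log-scaling choice."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: math.log(1 + c / total) for t, c in counts.items()}

def logsim(reference_tokens, summary_tokens):
    """Hypothetical LogSim-style informativeness in [0, 1].

    For each reference term, the log-scaled probabilities in the reference
    and the summary are compared via a min/max ratio; terms missing from
    the summary contribute 0. The official LogSim definition in the paper
    may normalize differently.
    """
    ref = log_probs(reference_tokens)
    summ = log_probs(summary_tokens)
    counts = Counter(reference_tokens)
    total = sum(counts.values())
    score = 0.0
    for t, lp in ref.items():
        lq = summ.get(t, 0.0)
        score += (counts[t] / total) * (min(lp, lq) / max(lp, lq))
    return score

ref = "volcanic ash cloud closes airports across northern europe".split()
summ = "airports in northern europe closed by volcanic ash".split()
print(f"LogSim-style informativeness: {logsim(ref, summ):.3f}")
```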

    Tweet Contextualization Based on Wikipedia and DBpedia

    Bound to 140 characters, tweets are short and often written without formal grammar or proper spelling. These spelling variations increase the likelihood of vocabulary mismatch and make tweets difficult to understand without context. This paper falls under the tweet contextualization task, which aims at automatically providing a summary that explains a given tweet, allowing a reader to understand it. We propose different tweet expansion approaches based on Wikipedia and DBpedia as external knowledge sources. These approaches are divided into two steps: the first generates candidate terms for a given tweet, while the second ranks and selects these candidate terms using a similarity measure. The effectiveness of our methods is demonstrated through an experimental study conducted on the INEX 2014 collection.
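    The two-step expansion can be sketched as follows: a candidate-generation step that looks up tweet tokens in an external knowledge source, and a ranking step based on a similarity measure. In this toy example the knowledge source is a plain dictionary standing in for real Wikipedia/DBpedia lookups, and cosine similarity over bags of words is one possible similarity choice; none of this reflects the authors' exact implementation.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def candidate_terms(tweet_tokens, knowledge_source):
    """Step 1 (stubbed): collect candidate expansion terms, e.g. labels or
    abstract terms retrieved from Wikipedia/DBpedia entries matching tweet
    tokens. Here `knowledge_source` is a toy dict, not a real lookup."""
    candidates = []
    for tok in tweet_tokens:
        candidates.extend(knowledge_source.get(tok, []))
    return candidates

def expand_tweet(tweet, knowledge_source, k=3):
    """Step 2: rank candidates by similarity to the tweet and keep top-k."""
    tokens = tweet.lower().split()
    tweet_vec = Counter(tokens)
    scored = []
    for cand in set(candidate_terms(tokens, knowledge_source)):
        # Score each candidate by cosine similarity between the tweet's
        # token vector and the candidate's own token vector.
        scored.append((cosine(tweet_vec, Counter(cand.lower().split())), cand))
    return [c for _, c in sorted(scored, reverse=True)[:k]]

toy_kb = {"nba": ["national basketball association", "basketball league nba"],
          "finals": ["championship series finals"]}
print(expand_tweet("nba finals tonight", toy_kb))
```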

    Évaluation de la contextualisation de tweets

    This article addresses the evaluation of tweet contextualization. Contextualization is defined as a summary that puts a text back into context when, owing to its length, the text does not contain all the elements that allow a reader to understand all or part of its content. We define an evaluation framework for tweet contextualization that can be generalized to other short texts. We propose a reference collection as well as ad hoc evaluation measures. This evaluation framework was successfully applied in the context of the INEX Tweet Contextualization campaign. In light of the results obtained during this campaign, we discuss the measures used in relation to other measures from the literature.

    Social Media Operationalized for GIS: The Prequel

    With social media serving as a de facto global communication channel for disseminating news, entertainment, and personal self-revelations, the latter contain double-talk, peculiar insights, and contextual observations about real-world events. The primary objective is to propose a novel pipeline that classifies a tweet as either “useful” or “not useful” using widely accepted Natural Language Processing (NLP) techniques, and to measure the effect of this method through the change in performance of a Geographical Information System (GIS) artifact. A sample of 1,000 tweets is manually tagged and compared against an innovative social media grammar applied by a rule-based social media NLP pipeline. The evaluation addresses the question: prior to content analysis of a tweet, does a method exist to identify a tweet as “useful” for subsequent processing? Indeed, “useful” tweet identification via NLP returned a precision of 0.9256, a recall of 0.6590, and an F-measure of 0.7699; consequently, GIS social media processing increased by 0.2194 over the baseline.
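    The reported F-measure is consistent with the reported precision and recall, since the balanced F-measure is their harmonic mean; the snippet below verifies the arithmetic (the function name is illustrative).

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the balanced harmonic mean (F1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Reported figures from the abstract: P = 0.9256, R = 0.6590.
p, r = 0.9256, 0.6590
print(f"F-measure: {f_measure(p, r):.4f}")  # ~0.7699, matching the reported value
```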

    The challenging task of summary evaluation: an overview

    Evaluation is crucial in the research and development of automatic summarization applications, in order to determine the appropriateness of a summary based on different criteria, such as the content it contains, and the way it is presented. To perform an adequate evaluation is of great relevance to ensure that automatic summaries can be useful for the context and/or application they are generated for. To this end, researchers must be aware of the evaluation metrics, approaches, and datasets that are available, in order to decide which of them would be the most suitable to use, or to be able to propose new ones, overcoming the possible limitations that existing methods may present. In this article, a critical and historical analysis of evaluation metrics, methods, and datasets for automatic summarization systems is presented, where the strengths and weaknesses of evaluation efforts are discussed and the major challenges to solve are identified. Therefore, a clear up-to-date overview of the evolution and progress of summarization evaluation is provided, giving the reader useful insights into the past, present and latest trends in the automatic evaluation of summaries.

    This research is partially funded by the European Commission under the Seventh Framework Programme for Research and Technological Development (FP7, 2007-2013) through the SAM project (FP7-611312); by the Spanish Government through the projects VoxPopuli (TIN2013-47090-C3-1-P) and Vemodalen (TIN2015-71785-R); by the Generalitat Valenciana through the project DIIM2.0 (PROMETEOII/2014/001); and by the Universidad Nacional de Educación a Distancia through the project “Modelado y síntesis automática de opiniones de usuario en redes sociales” (2014-001-UNED-PROY).