Tweet Contextualization Based on Wikipedia and DBpedia
Bound to 140 characters, tweets are short and not written with formal grammar and proper spelling. These spelling variations increase the likelihood of vocabulary mismatch and make tweets difficult to understand without context. This paper falls under the tweet contextualization task, which aims at automatically providing a summary that explains a given tweet, allowing a reader to understand it. We propose different tweet expansion approaches based on Wikipedia and DBpedia as external knowledge sources. These approaches are divided into two steps. The first step generates candidate terms for a given tweet, while the second ranks and selects these candidate terms using a similarity measure. The effectiveness of our methods is demonstrated through an experimental study conducted on the INEX 2014 collection.
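The two-step expansion described above can be sketched as follows. The abstract does not specify which similarity measure is used, so cosine similarity over bags of words is an illustrative stand-in, and the candidate contexts (e.g. DBpedia abstracts) and function names are hypothetical:

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists treated as bags of words."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_candidates(tweet_tokens, candidate_contexts):
    """Step 2: rank candidate expansion terms by the similarity of their
    knowledge-source context (e.g. a DBpedia abstract) to the tweet."""
    scored = [(term, cosine_similarity(tweet_tokens, context))
              for term, context in candidate_contexts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this ranking in hand, selection reduces to keeping the top-k terms or thresholding on the score.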
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular stand out success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.
Comment: Pre-print of the paper accepted at the 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770
Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches are limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinion from two French politicians over time. We
therefore review state of the art NLP-based ML algorithms to automatically
annotate tweets, using a manual initiation step as a bootstrap. This paper focuses
on key issues in active learning when building a large annotated data set in
the presence of noise, which is introduced by human annotators, the abundance of
data, and the label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can be considered as the
bearing point to not only improve automatic systems for Opinion Mining (OM) and
Topic Classification but also to reduce noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201
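The active learning process outlined above can be sketched as a pool-based loop with uncertainty sampling. The paper does not give its exact selection strategy, so the toy classifier and the uncertainty criterion below are illustrative assumptions:

```python
class ToyModel:
    """Stand-in classifier for illustration only: scores a tweet by
    keyword overlap with positively/negatively labeled examples."""
    def fit(self, labeled):
        self.pos = {w for text, y in labeled if y == 1 for w in text.split()}
        self.neg = {w for text, y in labeled if y == 0 for w in text.split()}
    def predict_proba(self, text):
        words = set(text.split())
        p, n = len(words & self.pos), len(words & self.neg)
        return p / (p + n) if p + n else 0.5

def active_learning_round(model, labeled, unlabeled, annotate, k=1):
    """One round of pool-based active learning: train on the labeled pool,
    then send the k tweets the model is least sure about (probability
    closest to 0.5) to a human annotator."""
    model.fit(labeled)
    ranked = sorted(unlabeled, key=lambda t: abs(model.predict_proba(t) - 0.5))
    for tweet in ranked[:k]:
        labeled.append((tweet, annotate(tweet)))
        unlabeled.remove(tweet)
    return labeled, unlabeled
```

Repeating such rounds concentrates the (expensive) expert annotations on the tweets where they are most informative, which is the premise of propagating expert labels cheaply.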
INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned
Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing a short explanatory summary (500 words). This summary should be built automatically using resources like Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. Running for four years, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, a diversity content measure, and sentence reordering. This paper provides a full summary report on the four-year-long task. While yearly overviews focused on system results, here we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important outcome of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why, and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we also mention the lessons learned that are worth considering when designing a task.
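Informativeness measures of this kind compare the terms of a produced summary against a pooled reference. The sketch below is NOT the official LogSim definition (which is given in the paper); it only illustrates the general shape of such a measure, using log-damped term frequencies so that very frequent terms do not dominate:

```python
import math
from collections import Counter

def log_weights(tokens):
    """Log-damped term frequencies."""
    return {t: 1.0 + math.log(c) for t, c in Counter(tokens).items()}

def informativeness(summary_tokens, reference_tokens):
    """Fraction of the reference's log-weighted mass covered by the summary.
    Illustrative stand-in only, not the official LogSim formula."""
    s, r = log_weights(summary_tokens), log_weights(reference_tokens)
    covered = sum(min(s[t], r[t]) for t in r if t in s)
    total = sum(r.values())
    return covered / total if total else 0.0
```

A summary identical to the reference scores 1.0; one sharing no vocabulary scores 0.0, matching the intuition that informativeness rewards lexical coverage of the reference pool.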
Corporate image or social engagement: Twitter discourse on corporate social responsibility (CSR) in public relations strategies in the energy sector
Social media have opened up new opportunities for the creation of innovative public relations strategies focused on establishing and cultivating relationships with stakeholders on the basis of meaningful dialogue. Consideration of the interrelation between corporate social responsibility (CSR) and public relations highlights new areas for exploration and engagement. Both the dialogical and semantic perspectives reveal the performative and conversational aspects of social media. In general, both the linguistic panorama of CSR and digital media as part of a PR strategy open new possibilities for a dialogical, interactive, meaningful relationship strategy for corporate image management. Based on the linguistic approach to CSR and the Communication Management Approach, this paper explores the linguistic use of Twitter as a primary dialogical strategy to effectively enhance interactive dialogue-based relationships with the stakeholders of the top 50 companies in the energy sector based on tweet data from 2016. Semantic analysis was conducted by advanced text mining and clustering techniques on 3042 tweets monitored in 2017 that contained the leading CSR-related hashtags and keywords. The results demonstrated that the top energy companies apply a defensive and symbolic perspective, mainly for branding purposes. The corporate discourse dominates over a meaningful conversational strategy to foster interaction with stakeholders around sustainability issues on Twitter. The study reveals a homogenized interrelation between CSR, social media, and public relations. The results reveal a tendency for isomorphy in the communication models applied by the companies in the energy sector. Furthermore, similarities in semantics and thus strong tendencies to mutually mimic dialogical strategies are also observed. The semantic narrative built around the brand indicates a limited orientation towards CSR and sustainability. 
As such, it does not contribute to the creation of a dialogical interaction and meaningful relationships with multiple stakeholders on Twitter, in the high-risk sector represented by the energy industry.
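The corpus for such a study is typically collected by filtering tweets on CSR-related hashtags and keywords, as the abstract describes. A minimal sketch of that filtering step follows; the tag list is an illustrative assumption, not the paper's actual hashtag/keyword set:

```python
import re

# Illustrative tags only; the study's actual CSR hashtag/keyword set is not given here.
CSR_TAGS = {"#csr", "#sustainability", "#responsibility"}

def is_csr_tweet(tweet):
    """True if the tweet carries at least one CSR-related hashtag."""
    tags = set(re.findall(r"#\w+", tweet.lower()))
    return bool(tags & CSR_TAGS)
```

Downstream semantic analysis (text mining, clustering) would then operate only on the tweets this filter retains.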
LIA@CLEF 2018: Mining events opinion argumentation from raw unlabeled Twitter data using convolutional neural network
Social networks on the Internet are becoming increasingly important in our society. In recent years, this type of media, through communication platforms such as Twitter, has raised new research issues due to the massive size of the data exchanged and the ever-increasing number of users. In this context, the CLEF 2018 Mining Opinion Argumentation task aims to retrieve, for a specific event (festival name or topic), the most diverse argumentative microblogs from a large collection of tweets about festivals in different languages. In this paper, we propose a four-step approach for extracting argumentative microblogs related to a specific query (or event) when no reference data is provided.
GIS Investigation of Crime Prediction with an Operationalized Tweet Corpus
Social media, as the de facto communication channel, is used to disseminate one’s diurnal self-revelations. These disclosures often contain double-talk, peculiar insights, or contextual information about real-world events. Natural language processing is regularly used to uncover both obvious and latent knowledge claims within disclosures published in this complex environment. For example, a perpetrator with first-hand knowledge of a criminal incident may use social media to post critical information about it. A geographic information system (GIS) is capable of large-scale point-data analysis and provides methods for dataset processing, evaluation, and automatic spatial visualization. Such an artifact, fused with traditional environmental criminology theory and social media, provides guidelines, tools, and models for the substantive construction and evaluation of GIS crime analysis solutions. Provided the social media stream is timely and correctly processed, corrective action can be taken. The construction of a natural language processing social media annotation pipeline identifies latent indicators extracted from a social media corpus and is an integral part of societal mishap prediction. Spatial visualizations and regression analyses were used to describe and evaluate the project artifacts. As a result, a social media corpus was operationalized and subsequently used as a proxy for a traditional environmental criminology risk layer in the construction of a social media GIS crime analysis artifact. Through such multi-domain collaboration, the artifact improved crime incident prediction, with an overall R-squared increase of 21.94%. This result establishes the state of the art; no prior results exist for comparison.
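The reported R-squared increase refers to the coefficient of determination from the regression analyses. As a reminder of what is being compared, a minimal implementation of that statistic:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot, where SS_res is
    the residual sum of squares and SS_tot the total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A model that predicts every observation exactly scores 1.0; one no better than predicting the mean scores 0.0, so a 21.94% increase means the social-media risk layer explained that much more of the variance than the baseline model.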
Social Media Operationalized for GIS: The Prequel
With social media a de facto global communication channel used to disseminate news, entertainment, and one’s self-revelations, the latter contain double-talk, peculiar insight, and contextual observation about real-world events. The primary objective is to propose a novel pipeline that classifies a tweet as either “useful” or “not useful” using widely accepted Natural Language Processing (NLP) techniques, and to measure the effect of this method by the change in performance of a Geographical Information System (GIS) artifact. A 1,000-tweet sample is manually tagged and compared against an innovative social media grammar applied by a rule-based social media NLP pipeline. The evaluation addresses the question: prior to content analysis of a tweet, does a method exist to identify it as “useful” for subsequent processing? Indeed, “useful” tweet identification via NLP returned a precision of 0.9256, a recall of 0.6590, and an F-measure of 0.7699; consequently, GIS social media processing increased by 0.2194 over the baseline.
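The reported F-measure is consistent with the stated precision and recall; it is the harmonic mean of the two:

```python
def f_measure(precision, recall, beta=1.0):
    """F_beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Plugging in the paper's figures, f_measure(0.9256, 0.6590) rounds to 0.7699, matching the abstract.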
Dissecting Deep Language Models: The Explainability and Bias Perspective
The abstract is provided in the attachment.
Analysing and evaluating the task of automatic tweet generation: Knowledge to business
In this paper, a study concerning the evaluation and analysis of natural language tweets is presented. Based on our experience in text summarisation, we carry out a deep analysis of users' perception through the evaluation of tweets generated manually and automatically from news. Specifically, we consider two key issues of a tweet: its informativeness and its interestingness. Therefore, we analyse: (1) do users perceive manual and automatic tweets equally?; (2) what linguistic features may a good tweet have to be interesting as well as informative? The main challenge of this proposal is the analysis of tweets to help companies with their positioning and reputation on the Web. Our results show that: (1) informative and interesting natural language tweets can be generated automatically as a result of summarisation approaches; and (2) we can characterise good and bad tweets based on specific linguistic features not present in other types of tweets.
This research work has been partially funded by the University of Alicante, Generalitat Valenciana, the Spanish Government, and the European Commission through the projects “Tratamiento inteligente de la información para la ayuda a la toma de decisiones” (GRE12-44), “Explotación y tratamiento de la información disponible en Internet para la anotación y generación de textos adaptados al usuario” (GRE13-15), DIIM2.0 (PROMETEOII/2014/001), ATTOS (TIN2012-38536-C03-03), LEGOLANG-UAGE (TIN2012-31224), and SAM (FP7-611312).
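Characterising good and bad tweets by linguistic features, as described above, starts with surface feature extraction. The features below are illustrative assumptions of the kind such a study might use; the paper's exact feature set may differ:

```python
import re

def tweet_features(tweet):
    """Illustrative surface features for judging a tweet's
    informativeness/interestingness (hypothetical feature set)."""
    return {
        "length": len(tweet),
        "n_hashtags": len(re.findall(r"#\w+", tweet)),
        "n_mentions": len(re.findall(r"@\w+", tweet)),
        "n_urls": len(re.findall(r"https?://\S+", tweet)),
        "has_exclamation": "!" in tweet,
    }
```

Feature vectors like these can then be compared between the manually written and automatically generated tweet populations.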