15 research outputs found

    Detecting Real-World Influence Through Twitter

    In this paper, we investigate the issue of detecting the real-life influence of people based on their Twitter accounts. We give an overview of the Twitter features commonly used to characterize such accounts and their activity, and show that these are inefficient in this context. In particular, retweet counts, follower counts and the Klout score are not relevant to our analysis. We therefore propose several Machine Learning approaches, based on Natural Language Processing and Social Network Analysis, to label Twitter users as influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated on the CLEF RepLab 2014 dataset and outmatch state-of-the-art ranking methods.
    Comment: 2nd European Network Intelligence Conference (ENIC), Sep 2015, Karlskrona, Sweden
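The labelling-plus-ranking setup described above can be sketched as a scored classifier over per-user features. The feature names and weights below are illustrative assumptions, not the paper's actual model, which is trained on CLEF RepLab 2014 data.

```python
# Toy influencer labelling/ranking: a weighted linear score over
# per-user features, squashed through a sigmoid into [0, 1].
import math

def influence_score(user, weights):
    """Weighted sum of features passed through a sigmoid -> [0, 1]."""
    z = sum(weights[f] * user.get(f, 0.0) for f in weights)
    return 1.0 / (1.0 + math.exp(-z))

WEIGHTS = {                      # hypothetical learned weights
    "mentions_received": 0.8,    # social-network-analysis style features
    "distinct_repliers": 1.2,
    "vocab_richness":    0.5,    # NLP-style feature
    "bias":             -2.0,
}

def rank_users(users, weights=WEIGHTS, threshold=0.5):
    """Return (name, score, is_influencer) tuples, best first."""
    scored = [(influence_score({**u, "bias": 1.0}, weights), name)
              for name, u in users.items()]
    scored.sort(reverse=True)
    return [(name, s, s >= threshold) for s, name in scored]

users = {
    "alice": {"mentions_received": 2.0, "distinct_repliers": 1.5,
              "vocab_richness": 0.9},
    "bob":   {"mentions_received": 0.1, "distinct_repliers": 0.2,
              "vocab_richness": 0.3},
}
for name, score, is_influencer in rank_users(users):
    print(name, round(score, 3), is_influencer)
```

Thresholding the score yields the binary influencer label, while the raw score gives the predicted influence level used for ranking.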

    Overview of INEX Tweet Contextualization 2014 track

    Messages of 140 characters are rarely self-contained. The Tweet Contextualization task aims at automatically providing information - a summary - that explains a tweet. This requires combining multiple types of processing, from information retrieval to multi-document summarization, including entity linking. Running since 2010, the 2014 task was a slight variant of previous ones, considering more complex queries from RepLab 2013. Given a tweet and a related entity, systems had to provide some context about the subject of the tweet from the perspective of the entity, in order to help the reader understand it.

    Analyse de l’image de marque sur le Web 2.0

    Analysis of the representation of entities on the Web 2.0. Every day, millions of people publish their views on the Web 2.0 (social networks, blogs, etc.). These comments cover subjects as diverse as news, politics, sports results, cultural goods, consumer products, etc. The accumulation and aggregation of the opinions published about an entity (be it a product, a company or a public figure) give birth to the brand image of that entity. In recent years the Internet has become a privileged place for the emergence and dissemination of opinions, making the Web 2.0 a prime observatory of opinion and a means of accessing what the world's population thinks.
    The image of an entity is understood here as the idea that a person or a group of people forms of that entity. This idea bears a priori on a particular subject and is only valid in a given context, at a given time. This perceived image naturally differs from the one the entity initially wanted to broadcast (e.g. via a communication campaign). Moreover, in reality several images end up coexisting on the network, each specific to a community and all evolving differently over time (imagine how a rapprochement between two politicians from opposite sides would be perceived in each camp). Finally, beyond the controversies deliberately provoked by some entities to attract attention (think of shocking outfits or declarations), the dissemination of an image sometimes exceeds the frame that governed it and can even turn against the entity (for example, « marriage for all » became « the demonstration for all »). The opinions expressed are then so many clues for understanding how these images are built and how they evolve. Until now this analysis has been entrusted to e-communication specialists, who charge for their subjective judgement, can only consider a limited volume of information, and rarely agree with one another.
    In this thesis we propose to use several simple, supervised, low-complexity statistical methods to monitor an entity's online reputation from the textual content mentioning it. More precisely, we look for the content (and authors) most damaging to the entity's brand image, from a reputation manager's point of view. We introduce an optimization process that enriches the data using simulated relevance feedback, without any human involvement. We also compare short-message contextualization approaches based on information retrieval and on automatic summarization. Finally, we propose a new way to model online reputation, and to improve and evaluate reputation-monitoring methods, using Partial Least Squares Path Modelling (PLS-PM). In designing the system we wanted to address both the local and the global context of reputation, that is, features that can explain a decision as well as the correlations between topics and reputation. We experiment with using and combining different general sources of information, representing the main types of content found on the Internet: long, objective, informative documents on one side, and brief, opinionated user-generated content on the other. The goal of our work was to combine usual methods and features in a different way, so as to make reputation-monitoring systems more accurate than existing ones. We evaluate and compare our systems on two reference collections: the one built within the Imagiweb project, and the reference collection on the subject, CLEF RepLab. The performance of our proposals is comparable to the state of the art, and the fact that we provide reputation models makes our methods all the more attractive to reputation managers and to scientists from various fields.

    Active learning in annotating micro-blogs dealing with e-reputation

    Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on microblogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of available data, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper develops a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinions about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms that automatically annotate tweets, using a manual initiation step as bootstrap. The paper focuses on the key issues of active learning when building a large annotated dataset from noise: the noise introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as bearing points not only to improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a later thorough analysis shows that reducing noise may induce the loss of crucial information.
    Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201
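The core of such an active learning loop is uncertainty sampling: train on the expert-labelled seed, then route the tweets the model is least sure about back to the human annotator. The word-frequency classifier below is a toy stand-in for the NLP-based ML models the paper reviews; the label names and example tweets are made up for illustration.

```python
# Minimal uncertainty-sampling loop for annotation propagation.
from collections import Counter

def train(labelled):
    """Count word frequencies per label from (text, label) pairs."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labelled:
        counts[label].update(text.split())
    return counts

def p_pos(text, counts):
    """Crude probability that `text` is 'pos' (add-one smoothing)."""
    score = {lab: sum(counts[lab][w] + 1 for w in text.split())
             for lab in counts}
    return score["pos"] / (score["pos"] + score["neg"])

def most_uncertain(unlabelled, counts, k=1):
    """Pick the k tweets whose prediction is closest to 0.5 --
    these are sent to the human annotator next; confident
    predictions can instead be propagated automatically."""
    return sorted(unlabelled,
                  key=lambda t: abs(p_pos(t, counts) - 0.5))[:k]

seed = [("great speech inspiring", "pos"), ("awful corrupt lies", "neg")]
pool = ["inspiring reform", "corrupt deal", "meeting today"]
model = train(seed)
print(most_uncertain(pool, model))
```

Each round, the newly obtained human labels are appended to the seed and the model is retrained, which is how the manual initiation step bootstraps the large annotated set.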

    GeneraciĂłn y estudio de modelos de clasificaciĂłn de perfiles de usuario de Twitter

    Master's thesis, Master's Degree in Intelligent Systems (2013 syllabus). Code: SIE043. Academic year 2017-2018. This work focuses on Author Profiling, where the aim is to analyse the text content produced by a user, treating their texts as unstructured data. Such texts are usually generated in virtual venues such as emails, reviews of a service, instant-messaging conversations, etc.

    DESIGN OF PEOPLE PROFILING AND MODELING REPUTATION COMPUTATION BASED ON SENTIMENT ANALYSIS

    The number of popular people keeps growing because of how easy it is to access information technology. People continually upload things, let others watch them, and receive likes and comments. People who can impress others grow in popularity and fame. Some famous people become positive influences, using their position to help the poor, while others cause trouble. Communities nowadays drive people's perspectives by sharing their thoughts on social media: they spread information and make others want to see the things they talk about. Troublesome popular people are defended by their fan base and attacked by other communities. Given these cases, this research gathers information from social media and uses it for computation and profiling. The proposed method relies on sentiment analysis to look up a person's record, listing people in a top-10 ranking drawn from DBpedia. The system shows the list of people together with the important records about each person, which can be used as decision support for a policy or for rewarding people. The results successfully visualise the output as a list of people, with further details available by clicking their names.
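A minimal sketch of the reputation computation described above: average a lexicon-based sentiment score over the posts mentioning each person, then rank. The lexicon, names and posts are illustrative assumptions; the described system draws its candidate list from DBpedia and uses a full sentiment analyser rather than a word list.

```python
# Toy reputation ranking from sentiment of mentions.
LEXICON = {"helps": 1, "donates": 1, "scandal": -1, "trouble": -1}

def sentiment(post):
    """Sum of lexicon polarities for the words in a post."""
    return sum(LEXICON.get(w, 0) for w in post.lower().split())

def reputation_ranking(mentions, top_n=10):
    """mentions: {person: [post, ...]} -> [(person, avg_score)], best first."""
    scores = {p: sum(map(sentiment, posts)) / len(posts)
              for p, posts in mentions.items() if posts}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

mentions = {
    "A. Star":  ["she donates and helps a lot", "helps local schools"],
    "B. Celeb": ["new scandal again", "trouble with the press"],
}
print(reputation_ranking(mentions))
```

Truncating the sorted list to `top_n` gives the top-10 view the system presents, with the per-person posts kept around as the "record" shown on click.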

    Tracking public opinion on social media

    The increasing popularity of social media has changed the web from a static repository of information into a dynamic forum with continuously changing information. Social media platforms have given people the capability to express and share their thoughts and opinions on the web in a very simple way. This so-called User Generated Content is a good source of users' opinions, and mining it can be very useful for a wide variety of applications that require understanding public opinion about a concept. For example, enterprises can capture customers' negative or positive opinions about their services or products and improve their quality accordingly. The dynamic nature of social media, with its constantly changing vocabulary, makes developing tools that can automatically track public opinion a challenge. To help users better understand public opinion towards an entity or a topic, it is important to: a) find the related documents and the sentiment polarity expressed in them; b) identify the important time intervals where there is a change in the opinion; c) identify the causes of the opinion change; d) estimate the number of people that hold a certain opinion about the entity; and e) measure the impact of public opinion on the entity. In this thesis we focus on the problem of tracking public opinion on social media, and we propose and develop methods to address the different subproblems. First, we analyse the topical distribution of tweets to determine the number of topics discussed in a single tweet. Next, we propose a topic-specific stylistic method to retrieve tweets that are relevant to a topic and also express opinion about it. Then, we explore the effectiveness of time-series methodologies to track and forecast the evolution of sentiment towards a specific topic over time. In addition, we propose an LDA & KL-divergence approach to extract and rank the likely causes of sentiment spikes.
    We create a test collection that can be used to evaluate methodologies for ranking the likely reasons of sentiment spikes. To estimate the number of people that hold a certain opinion about an entity, we propose an approach that uses pre-publication and post-publication features extracted from news posts and users' comments respectively. Finally, we propose an approach that propagates sentiment signals to measure the impact of public opinion on the entity's reputation. We evaluate our proposed methods on standard evaluation collections and provide evidence that they improve on the performance of state-of-the-art approaches to tracking public opinion on social media.
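The KL-divergence half of the spike-explanation idea can be sketched by comparing the term distribution inside the spike window against the background period and ranking terms by their pointwise KL contribution, p_spike(w) * log(p_spike(w) / p_background(w)). The thesis pairs this with LDA topics; plain terms and invented example texts are used here to keep the sketch short.

```python
# Rank candidate spike-cause terms by pointwise KL-divergence
# between the spike-window and background term distributions.
import math
from collections import Counter

def term_dist(texts):
    """Relative frequency of each whitespace token over `texts`."""
    counts = Counter(w for t in texts for w in t.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_ranked_terms(spike_texts, background_texts, eps=1e-6):
    """Terms sorted by p_spike * log(p_spike / p_background),
    with `eps` standing in for unseen background terms."""
    p = term_dist(spike_texts)
    q = term_dist(background_texts)
    contrib = {w: pw * math.log(pw / q.get(w, eps)) for w, pw in p.items()}
    return sorted(contrib, key=contrib.get, reverse=True)

spike = ["recall announced", "battery recall", "recall refund"]
background = ["new model", "battery life", "store opening"]
print(kl_ranked_terms(spike, background)[:3])
```

Terms that are frequent in the spike but rare in the background dominate the ranking, which is the intuition behind surfacing them as likely causes.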

    Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

    In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained from historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as tweet content alone -- the most widely used feature in previous work -- leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful to determine the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that choosing a combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English- and Spanish-speaking countries.
    Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)
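Combining content with metadata can be sketched by concatenating the three features the paper finds most useful (tweet content, self-reported location, user name) into one bag of tokens per tweet. The per-country token-overlap vote below is a toy stand-in for the trained classifier used in the paper, and the example tweets are invented.

```python
# Toy country-level classification from combined tweet-inherent features.
from collections import Counter

def tokens(tweet):
    """Concatenate content, self-reported location and user name."""
    parts = (tweet["content"], tweet["user_location"], tweet["user_name"])
    return [w.lower() for p in parts for w in p.split()]

def build_profiles(training):
    """One token-frequency profile per country from (tweet, country) pairs."""
    profiles = {}
    for tweet, country in training:
        profiles.setdefault(country, Counter()).update(tokens(tweet))
    return profiles

def classify(tweet, profiles):
    """Vote for the country whose profile best covers the tweet's tokens."""
    votes = {c: sum(prof[w] for w in tokens(tweet))
             for c, prof in profiles.items()}
    return max(votes, key=votes.get)

train = [
    ({"content": "lovely rain in manchester", "user_location": "Manchester UK",
      "user_name": "Jane Smith"}, "GB"),
    ({"content": "asado con amigos", "user_location": "Buenos Aires",
      "user_name": "Juan Perez"}, "AR"),
]
profiles = build_profiles(train)
test_tweet = {"content": "rain again", "user_location": "Manchester",
              "user_name": "John"}
print(classify(test_tweet, profiles))
```

All three fields ship inside the tweet object itself, which is what makes this combination usable in the real-time scenario the paper targets.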

    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for online client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing a short explanatory summary (500 words). This summary should be built automatically, using resources like Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. Running for four years, the results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, a diversity content measure, and sentence reordering. This paper provides a full summary report on the four-year-long task. While the yearly overviews focused on system results, here we report in detail on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important outcome of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why, and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we mention the lessons we learned, which are worth considering when designing such a task.
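The extract-and-aggregate step common to these systems can be sketched as scoring candidate Wikipedia sentences against the tweet's terms and greedily assembling a summary under the word budget. Plain term overlap is an illustrative stand-in for the retrieval, POS and anaphora machinery the best systems combine, and the example sentences are invented.

```python
# Toy tweet contextualization: rank candidate sentences by term
# overlap with the tweet, then fill a word budget greedily.
def overlap_score(sentence, query_terms):
    """Number of tweet terms appearing in the sentence."""
    return len(set(sentence.lower().split()) & query_terms)

def contextualize(tweet, candidate_sentences, budget=500):
    terms = set(tweet.lower().split())
    ranked = sorted(candidate_sentences,
                    key=lambda s: overlap_score(s, terms), reverse=True)
    summary, used = [], 0
    for s in ranked:
        n = len(s.split())
        if used + n > budget:      # enforce the 500-word limit
            break
        summary.append(s)
        used += n
    return " ".join(summary)

tweet = "louvre opens new wing"
candidates = [
    "The Louvre is a museum in Paris.",
    "A new wing opens to the public this year.",
    "Bananas are rich in potassium.",
]
print(contextualize(tweet, candidates, budget=20))
```

A real system would add sentence reordering and a diversity measure on top of this skeleton so the aggregated passages read as one coherent summary.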