15 research outputs found

    Detecting Real-World Influence Through Twitter

    In this paper, we investigate the issue of detecting the real-life influence of people based on their Twitter accounts. We give an overview of the Twitter features commonly used to characterize such accounts and their activity, and show that these are inefficient in this context. In particular, retweet counts, follower counts and the Klout score are not relevant to our analysis. We therefore propose several Machine Learning approaches, based on Natural Language Processing and Social Network Analysis, to label Twitter users as influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated on the CLEF RepLab 2014 dataset and outmatch state-of-the-art ranking methods.
    Comment: 2nd European Network Intelligence Conference (ENIC), Sep 2015, Karlskrona, Sweden
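The labelling-plus-ranking setup described above can be sketched as a scored classifier over per-user features. The feature names and weights below are illustrative assumptions, not the paper's actual model, which is trained on CLEF RepLab 2014 data.

```python
# Toy influencer labelling/ranking: a weighted linear score over
# per-user features, squashed through a sigmoid into [0, 1].
import math

def influence_score(user, weights):
    """Weighted sum of features passed through a sigmoid -> [0, 1]."""
    z = sum(weights[f] * user.get(f, 0.0) for f in weights)
    return 1.0 / (1.0 + math.exp(-z))

WEIGHTS = {                      # hypothetical learned weights
    "mentions_received": 0.8,    # social-network-analysis style features
    "distinct_repliers": 1.2,
    "vocab_richness":    0.5,    # NLP-style feature
    "bias":             -2.0,
}

def rank_users(users, weights=WEIGHTS, threshold=0.5):
    """Return (name, score, is_influencer) tuples, best first."""
    scored = [(influence_score({**u, "bias": 1.0}, weights), name)
              for name, u in users.items()]
    scored.sort(reverse=True)
    return [(name, s, s >= threshold) for s, name in scored]

users = {
    "alice": {"mentions_received": 2.0, "distinct_repliers": 1.5,
              "vocab_richness": 0.9},
    "bob":   {"mentions_received": 0.1, "distinct_repliers": 0.2,
              "vocab_richness": 0.3},
}
for name, score, is_influencer in rank_users(users):
    print(name, round(score, 3), is_influencer)
```

Thresholding the score yields the binary influencer label, while the raw score gives the predicted influence level used for ranking.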

    Overview of INEX Tweet Contextualization 2014 track

    Messages of 140 characters are rarely self-contained. The Tweet Contextualization task aims at automatically providing information - a summary - that explains a tweet. This requires combining multiple types of processing, from information retrieval to multi-document summarization, including entity linking. Running since 2010, the 2014 task was a slight variant of previous ones, considering more complex queries from RepLab 2013. Given a tweet and a related entity, systems had to provide some context about the subject of the tweet from the perspective of the entity, in order to help the reader understand it.

    Analyse de l’image de marque sur le Web 2.0

    Analysis of the representation of entities on the Web 2.0. Every day, millions of people publish their views on the Web 2.0 (social networks, blogs, etc.). These comments cover subjects as diverse as news, politics, sports results, cultural goods, consumer products, etc. The accumulation and aggregation of the opinions published about an entity (be it a product, a company or a public figure) give birth to the brand image of that entity. In recent years the Internet has become a privileged place for the emergence and dissemination of opinions, making the Web 2.0 a prime observatory of opinion and a means of accessing what the world's population thinks.
    The image of an entity is understood here as the idea that a person or a group of people forms of that entity. This idea bears a priori on a particular subject and is only valid in a given context, at a given time. This perceived image naturally differs from the one the entity initially wanted to broadcast (e.g. via a communication campaign). Moreover, in reality several images end up coexisting on the network, each specific to a community and all evolving differently over time (imagine how a rapprochement between two politicians from opposite sides would be perceived in each camp). Finally, beyond the controversies deliberately provoked by some entities to attract attention (think of shocking outfits or declarations), the dissemination of an image sometimes exceeds the frame that governed it and can even turn against the entity (for example, « marriage for all » became « the demonstration for all »). The opinions expressed are then so many clues for understanding how these images are built and how they evolve. Until now this analysis has been entrusted to e-communication specialists, who charge for their subjective judgement, can only consider a limited volume of information, and rarely agree with one another.
    In this thesis we propose to use several simple, supervised, low-complexity statistical methods to monitor an entity's online reputation from the textual content mentioning it. More precisely, we look for the content (and authors) most damaging to the entity's brand image, from a reputation manager's point of view. We introduce an optimization process that enriches the data using simulated relevance feedback, without any human involvement. We also compare short-message contextualization approaches based on information retrieval and on automatic summarization. Finally, we propose a new way to model online reputation, and to improve and evaluate reputation-monitoring methods, using Partial Least Squares Path Modelling (PLS-PM). In designing the system we wanted to address both the local and the global context of reputation, that is, features that can explain a decision as well as the correlations between topics and reputation. We experiment with using and combining different general sources of information, representing the main types of content found on the Internet: long, objective, informative documents on one side, and brief, opinionated user-generated content on the other. The goal of our work was to combine usual methods and features in a different way, so as to make reputation-monitoring systems more accurate than existing ones. We evaluate and compare our systems on two reference collections: the one built within the Imagiweb project, and the reference collection on the subject, CLEF RepLab. The performance of our proposals is comparable to the state of the art, and the fact that we provide reputation models makes our methods all the more attractive to reputation managers and to scientists from various fields.

    Active learning in annotating micro-blogs dealing with e-reputation

    Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on microblogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of available data, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper develops a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinions about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms that automatically annotate tweets, using a manual initiation step as bootstrap. The paper focuses on the key issues of active learning when building a large annotated dataset from noise: the noise introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as bearing points not only to improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a later thorough analysis shows that reducing noise may induce the loss of crucial information.
    Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201
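The core of such an active learning loop is uncertainty sampling: train on the expert-labelled seed, then route the tweets the model is least sure about back to the human annotator. The word-frequency classifier below is a toy stand-in for the NLP-based ML models the paper reviews; the label names and example tweets are made up for illustration.

```python
# Minimal uncertainty-sampling loop for annotation propagation.
from collections import Counter

def train(labelled):
    """Count word frequencies per label from (text, label) pairs."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labelled:
        counts[label].update(text.split())
    return counts

def p_pos(text, counts):
    """Crude probability that `text` is 'pos' (add-one smoothing)."""
    score = {lab: sum(counts[lab][w] + 1 for w in text.split())
             for lab in counts}
    return score["pos"] / (score["pos"] + score["neg"])

def most_uncertain(unlabelled, counts, k=1):
    """Pick the k tweets whose prediction is closest to 0.5 --
    these are sent to the human annotator next; confident
    predictions can instead be propagated automatically."""
    return sorted(unlabelled,
                  key=lambda t: abs(p_pos(t, counts) - 0.5))[:k]

seed = [("great speech inspiring", "pos"), ("awful corrupt lies", "neg")]
pool = ["inspiring reform", "corrupt deal", "meeting today"]
model = train(seed)
print(most_uncertain(pool, model))
```

Each round, the newly obtained human labels are appended to the seed and the model is retrained, which is how the manual initiation step bootstraps the large annotated set.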

    GeneraciĂłn y estudio de modelos de clasificaciĂłn de perfiles de usuario de Twitter

    Master's thesis, Master's Degree in Intelligent Systems (2013 syllabus). Code: SIE043. Academic year 2017-2018. This work focuses on Author Profiling, where the aim is to analyse the text content produced by a user, treating their texts as unstructured data. Such texts are usually generated in virtual venues such as emails, reviews of a service, instant-messaging conversations, etc.

    DESIGN OF PEOPLE PROFILING AND MODELING REPUTATION COMPUTATION BASED ON SENTIMENT ANALYSIS

    The number of popular people keeps growing because of how easy it is to access information technology. People continually upload things, let others watch them, and receive likes and comments. People who can impress others grow in popularity and fame. Some famous people become positive influences, using their position to help the poor, while others cause trouble. Communities nowadays drive people's perspectives by sharing their thoughts on social media: they spread information and make others want to see the things they talk about. Troublesome popular people are defended by their fan base and attacked by other communities. Given these cases, this research gathers information from social media and uses it for computation and profiling. The proposed method relies on sentiment analysis to look up a person's record, listing people in a top-10 ranking drawn from DBpedia. The system shows the list of people together with the important records about each person, which can be used as decision support for a policy or for rewarding people. The results successfully visualise the output as a list of people, with further details available by clicking their names.
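A minimal sketch of the reputation computation described above: average a lexicon-based sentiment score over the posts mentioning each person, then rank. The lexicon, names and posts are illustrative assumptions; the described system draws its candidate list from DBpedia and uses a full sentiment analyser rather than a word list.

```python
# Toy reputation ranking from sentiment of mentions.
LEXICON = {"helps": 1, "donates": 1, "scandal": -1, "trouble": -1}

def sentiment(post):
    """Sum of lexicon polarities for the words in a post."""
    return sum(LEXICON.get(w, 0) for w in post.lower().split())

def reputation_ranking(mentions, top_n=10):
    """mentions: {person: [post, ...]} -> [(person, avg_score)], best first."""
    scores = {p: sum(map(sentiment, posts)) / len(posts)
              for p, posts in mentions.items() if posts}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

mentions = {
    "A. Star":  ["she donates and helps a lot", "helps local schools"],
    "B. Celeb": ["new scandal again", "trouble with the press"],
}
print(reputation_ranking(mentions))
```

Truncating the sorted list to `top_n` gives the top-10 view the system presents, with the per-person posts kept around as the "record" shown on click.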

    Tracking public opinion on social media

    The increasing popularity of social media has changed the web from a static repository of information into a dynamic forum with continuously changing information. Social media platforms have given people the capability to express and share their thoughts and opinions on the web in a very simple way. This so-called User Generated Content is a good source of users' opinions, and mining it can be very useful for a wide variety of applications that require understanding public opinion about a concept. For example, enterprises can capture customers' negative or positive opinions about their services or products and improve their quality accordingly. The dynamic nature of social media, with its constantly changing vocabulary, makes developing tools that can automatically track public opinion a challenge. To help users better understand public opinion towards an entity or a topic, it is important to: a) find the related documents and the sentiment polarity expressed in them; b) identify the important time intervals where there is a change in the opinion; c) identify the causes of the opinion change; d) estimate the number of people that hold a certain opinion about the entity; and e) measure the impact of public opinion on the entity. In this thesis we focus on the problem of tracking public opinion on social media, and we propose and develop methods to address the different subproblems. First, we analyse the topical distribution of tweets to determine the number of topics discussed in a single tweet. Next, we propose a topic-specific stylistic method to retrieve tweets that are relevant to a topic and also express opinion about it. Then, we explore the effectiveness of time-series methodologies to track and forecast the evolution of sentiment towards a specific topic over time. In addition, we propose an LDA & KL-divergence approach to extract and rank the likely causes of sentiment spikes.
    We create a test collection that can be used to evaluate methodologies for ranking the likely reasons of sentiment spikes. To estimate the number of people that hold a certain opinion about an entity, we propose an approach that uses pre-publication and post-publication features extracted from news posts and users' comments respectively. Finally, we propose an approach that propagates sentiment signals to measure the impact of public opinion on the entity's reputation. We evaluate our proposed methods on standard evaluation collections and provide evidence that they improve on the performance of state-of-the-art approaches to tracking public opinion on social media.
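The KL-divergence half of the spike-explanation idea can be sketched by comparing the term distribution inside the spike window against the background period and ranking terms by their pointwise KL contribution, p_spike(w) * log(p_spike(w) / p_background(w)). The thesis pairs this with LDA topics; plain terms and invented example texts are used here to keep the sketch short.

```python
# Rank candidate spike-cause terms by pointwise KL-divergence
# between the spike-window and background term distributions.
import math
from collections import Counter

def term_dist(texts):
    """Relative frequency of each whitespace token over `texts`."""
    counts = Counter(w for t in texts for w in t.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_ranked_terms(spike_texts, background_texts, eps=1e-6):
    """Terms sorted by p_spike * log(p_spike / p_background),
    with `eps` standing in for unseen background terms."""
    p = term_dist(spike_texts)
    q = term_dist(background_texts)
    contrib = {w: pw * math.log(pw / q.get(w, eps)) for w, pw in p.items()}
    return sorted(contrib, key=contrib.get, reverse=True)

spike = ["recall announced", "battery recall", "recall refund"]
background = ["new model", "battery life", "store opening"]
print(kl_ranked_terms(spike, background)[:3])
```

Terms that are frequent in the spike but rare in the background dominate the ranking, which is the intuition behind surfacing them as likely causes.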

    Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

    In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained from historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as tweet content alone -- the most widely used feature in previous work -- leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful to determine the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that choosing a combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English- and Spanish-speaking countries.
    Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)
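Combining content with metadata can be sketched by concatenating the three features the paper finds most useful (tweet content, self-reported location, user name) into one bag of tokens per tweet. The per-country token-overlap vote below is a toy stand-in for the trained classifier used in the paper, and the example tweets are invented.

```python
# Toy country-level classification from combined tweet-inherent features.
from collections import Counter

def tokens(tweet):
    """Concatenate content, self-reported location and user name."""
    parts = (tweet["content"], tweet["user_location"], tweet["user_name"])
    return [w.lower() for p in parts for w in p.split()]

def build_profiles(training):
    """One token-frequency profile per country from (tweet, country) pairs."""
    profiles = {}
    for tweet, country in training:
        profiles.setdefault(country, Counter()).update(tokens(tweet))
    return profiles

def classify(tweet, profiles):
    """Vote for the country whose profile best covers the tweet's tokens."""
    votes = {c: sum(prof[w] for w in tokens(tweet))
             for c, prof in profiles.items()}
    return max(votes, key=votes.get)

train = [
    ({"content": "lovely rain in manchester", "user_location": "Manchester UK",
      "user_name": "Jane Smith"}, "GB"),
    ({"content": "asado con amigos", "user_location": "Buenos Aires",
      "user_name": "Juan Perez"}, "AR"),
]
profiles = build_profiles(train)
test_tweet = {"content": "rain again", "user_location": "Manchester",
              "user_name": "John"}
print(classify(test_tweet, profiles))
```

All three fields ship inside the tweet object itself, which is what makes this combination usable in the real-time scenario the paper targets.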

    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for online client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing a short explanatory summary (500 words). This summary should be built automatically, using resources like Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. Running for four years, the results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, a diversity content measure, and sentence reordering. This paper provides a full summary report on the four-year-long task. While the yearly overviews focused on system results, here we report in detail on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important outcome of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why, and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we mention the lessons we learned, which are worth considering when designing such a task.
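The extract-and-aggregate step common to these systems can be sketched as scoring candidate Wikipedia sentences against the tweet's terms and greedily assembling a summary under the word budget. Plain term overlap is an illustrative stand-in for the retrieval, POS and anaphora machinery the best systems combine, and the example sentences are invented.

```python
# Toy tweet contextualization: rank candidate sentences by term
# overlap with the tweet, then fill a word budget greedily.
def overlap_score(sentence, query_terms):
    """Number of tweet terms appearing in the sentence."""
    return len(set(sentence.lower().split()) & query_terms)

def contextualize(tweet, candidate_sentences, budget=500):
    terms = set(tweet.lower().split())
    ranked = sorted(candidate_sentences,
                    key=lambda s: overlap_score(s, terms), reverse=True)
    summary, used = [], 0
    for s in ranked:
        n = len(s.split())
        if used + n > budget:      # enforce the 500-word limit
            break
        summary.append(s)
        used += n
    return " ".join(summary)

tweet = "louvre opens new wing"
candidates = [
    "The Louvre is a museum in Paris.",
    "A new wing opens to the public this year.",
    "Bananas are rich in potassium.",
]
print(contextualize(tweet, candidates, budget=20))
```

A real system would add sentence reordering and a diversity measure on top of this skeleton so the aggregated passages read as one coherent summary.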