52 research outputs found

Editorial : special issue on social networking for web-based communities

Author: Kommers Petrus A.M.
Simmerling Margriet
Publication date: 01/01/2013

University of Twente Research Information

REINA at RepLab2013 Topic Detection Task: Community Detection

Author: Alonso Berrocal José Luis
Figuerola Carlos G.
Zazo Rodríguez Ángel Francisco
Publication date: 23/09/2013

Social networks have become a large repository of comments from which a wide range of information can be extracted. Twitter is one of the largest and most widespread social networks and is therefore an important source for detecting states of opinion, events and happenings, even before the mainstream media report them. Topic detection is important for discovering areas of interest that arise in tweets. We used classical systems to build a similarity matrix and then applied community detection techniques. The results were good and allow us to study new possibilities.
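
The pipeline this abstract describes (a similarity matrix followed by community detection) can be illustrated with a minimal sketch. The abstract does not say which similarity measure or community detection algorithm was used, so everything here is an assumption: bag-of-words cosine similarity, a hypothetical threshold of 0.3, and connected components of the thresholded graph as a deliberately crude stand-in for a real community detection method. The function names and toy tweets are made up for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(texts):
    """Pairwise cosine similarity over whitespace-tokenised texts."""
    vecs = [Counter(t.lower().split()) for t in texts]
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


def components(sim, threshold):
    """Group items connected by similarity >= threshold.

    Connected components are a simplification, not a full
    community detection algorithm.
    """
    n = len(sim)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Toy data, invented for this example.
tweets = [
    "new phone battery is great",
    "battery life on the new phone",
    "bank fees are too high",
    "high fees at the bank again",
]
sim = similarity_matrix(tweets)
topics = components(sim, threshold=0.3)
```

With these four toy tweets, the two phone-related messages end up in one group and the two bank-related messages in another. A modularity-based or label-propagation method would replace `components` in a more realistic setting.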

Gestion del Repositorio Documental de la Universidad de Salamanca

Towards an On-Line Analysis of Tweets Processing

Author: Bouillot Flavien
Bringay Sandra
Béchet Nicolas
Poncelet Pascal
Roche Mathieu
Teisseire Maguelonne
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Tweets exchanged over the Internet represent an important source of information, even if their characteristics make them difficult to analyze (a maximum of 140 characters, etc.). In this paper, we define a data warehouse model to analyze large volumes of tweets by proposing measures relevant in the context of knowledge discovery. The use of data warehouses as a tool for the storage and analysis of textual documents is not new, but current measures are not well suited to the specificities of the manipulated data. We also propose a new way of extracting the context of a concept in a hierarchy. Experiments carried out on real data underline the relevance of our proposal.

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

HAL-CIRAD

Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach

Author: Favre Cecile
Guille Adrien
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/05/2015

The ever-growing number of people using Twitter makes it a valuable source of timely information. However, detecting events in Twitter is a difficult task, because tweets that report interesting events are overwhelmed by a large volume of tweets on unrelated topics. Existing methods focus on the textual content of tweets and ignore the social aspect of Twitter. In this paper we propose MABED (i.e. mention-anomaly-based event detection), a novel statistical method that relies solely on tweets and leverages the creation frequency of dynamic links (i.e. mentions) that users insert in tweets to detect significant events and estimate the magnitude of their impact over the crowd. MABED also differs from the literature in that it dynamically estimates the period of time during which each event is discussed, rather than assuming a predefined fixed duration for all events. The experiments we conducted on both English and French Twitter data show that the mention-anomaly-based approach leads to more accurate event detection and improved robustness in the presence of noisy Twitter content. Qualitatively speaking, we find that MABED helps with the interpretation of detected events by providing clear textual descriptions and precise temporal descriptions. We also show how MABED can help in understanding users' interests. Furthermore, we describe three visualizations designed to favor an efficient exploration of the detected events.

arXiv.org e-Print Archive

Crossref

HAL

Hal-Diderot

Text stream to temporal network - A dynamic heartbeat graph to detect emerging events on twitter

Author: Abbasi RA
Razzak MI
Sadaf A
Saeed Z
Xu G
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Huge mounds of data are generated every second on the Internet. People around the globe publish and share information related to real-world events they experience every day. This provides a valuable opportunity to analyze the content of this information to detect real-world happenings. However, this is quite a challenging task. In this work, we propose a novel graph-based approach named the Dynamic Heartbeat Graph (DHG) that not only detects events at an early stage, but also suppresses them in the upcoming adjacent data stream in order to highlight newly emerging events. This characteristic makes the proposed method interesting and efficient in finding emerging events and related topics. The experimental results on real-world datasets (i.e. FA Cup Final and Super Tuesday 2012) show a considerable improvement in most cases, while time complexity remains very attractive.

Deakin Research Online

OPUS - University of Technology Sydney

Mining Newsworthy Topics from Social Media

Author: Corney D.
Goker A. S.
MacFarlane A.
Martin C.
Publication date: 01/01/2013

Newsworthy stories are increasingly being shared through social networking platforms such as Twitter and Reddit, and journalists now use them to rapidly discover stories and eye-witness accounts. We present a technique that detects “bursts” of phrases on Twitter that is designed for a real-time topic-detection system. We describe a time-dependent variant of the classic tf-idf approach and group together bursty phrases that often appear in the same messages in order to identify emerging topics. We demonstrate our methods by analysing tweets corresponding to events drawn from the worlds of politics and sport. We created a user-centred “ground truth” to evaluate our methods, based on mainstream media accounts of the events. This helps ensure our methods remain practical. We compare several clustering and topic ranking methods to discover the characteristics of news-related collections, and show that different strategies are needed to detect emerging topics within them. We show that our methods successfully detect a range of different topics for each event and can retrieve messages (for example, tweets) that represent each topic for the user.
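
The idea of a time-dependent tf-idf variant can be sketched as follows. The abstract does not give the formula, so the scoring rule below is an assumption: each time slot is treated as a document collection, and a term scores highly when it appears in many messages of the current slot but in few messages of earlier slots. The names `slot_counts` and `burst_scores`, and the toy messages, are hypothetical. The grouping of co-occurring bursty phrases mentioned in the abstract is not covered here.

```python
import math
from collections import Counter


def slot_counts(messages):
    """Count, per term, how many messages in a time slot contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts


def burst_scores(current, history):
    """Score terms in the current slot against earlier slots.

    Assumed scoring rule (not taken from the abstract):
        (df_now + 1) / (log(mean_past_df + 1) + 1)
    """
    cur = slot_counts(current)
    past = [slot_counts(slot) for slot in history]
    scores = {}
    for term, df in cur.items():
        mean_past = sum(p[term] for p in past) / len(past) if past else 0.0
        scores[term] = (df + 1) / (math.log(mean_past + 1) + 1)
    return scores


# Toy time slots, invented for this example.
history = [
    ["match starts soon", "the match is on"],
    ["the match is slow", "waiting for the match"],
]
current = ["goal in the match", "what a goal", "goal goal goal"]

scores = burst_scores(current, history)
top_term = max(scores, key=scores.get)
```

In this toy stream, "goal" is new in the current slot and appears in every message, so it ranks above "match", which was already common in earlier slots.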

City Research Online

Analyse de gazouillis en ligne

Author: Bouillot Flavien
Bringay Sandra
Béchet Nicolas
Poncelet Pascal
Roche Mathieu
Teisseire Maguelonne
Publication venue: RNTI
Publication date: 08/06/2011

Tweets exchanged on the Internet are an important source of information, even if their characteristics make them difficult to analyze (a maximum of 140 characters, shorthand notations, ...). In this paper, we define a data warehouse model to exploit and analyze large volumes of tweets by proposing relevant measures in a knowledge discovery context. Using data warehouses to store and analyze textual documents is not new, but the classical measures traditionally used are not well adapted to the specificities of the data. Furthermore, we propose that, if a hierarchy is available, the context can be detected automatically. Experiments conducted on real data show the relevance of our approach.

HAL - Normandie Université

HAL-CIRAD

EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

Author: Hind Almerekhi
Maram Hasanain
Mucahid Kutlu
Reem Suwaileh
Tamer Elsayed
Publication date: 21/08/2017

This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology to Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.
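
For readers unfamiliar with agreement statistics such as the Kappa value quoted above, here is a minimal sketch of one common variant. The abstract does not say which kappa was computed, so Cohen's kappa for two annotators is an assumption, and the relevance labels below are invented for illustration rather than taken from EveTAR.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Invented relevance judgments (1 = relevant, 0 = not relevant).
annotator_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa(annotator_a, annotator_b)
```

Here the annotators agree on 8 of 10 items, chance agreement is 0.5, and kappa comes out at about 0.6. Under the widely used Landis and Koch reading, values between 0.61 and 0.80 are described as "substantial", which matches the wording used for the 0.71 figure.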

arXiv.org e-Print Archive

Qatar University Institutional Repository

Crossref