
Document Clustering with Bursty Information

Author: Chaoji Vineet
Hoonlor Apirak
Szymanski Bolesław K.
Zaki Mohamed J.
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2013

Nowadays, almost all text corpora, such as blogs, emails and RSS feeds, are collections of text streams. The traditional vector space model (VSM), or bag-of-words representation, cannot capture the temporal aspect of these text streams. So far, only a few bursty features have been proposed to create text representations with temporal modeling for text streams. We propose bursty feature representations that perform better than VSM on various text mining tasks, such as document retrieval, topic modeling and text categorization. For text clustering, we propose a novel framework to generate a bursty distance measure, which we evaluated with the UPGMA, Star and K-Medoids clustering algorithms. The bursty distance measure not only performed equally well on various text collections, but also clustered news articles related to specific events much better than other models.
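
To make the idea of a bursty feature representation concrete, here is a minimal sketch. It is not the authors' method: it assumes that burstiness is a per-window z-score of a term's count, and that a document's bag-of-words counts are scaled up for terms bursting in the document's own time window. The function names `burst_scores` and `bursty_vector` and the `boost` parameter are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def burst_scores(docs_by_window):
    """Return one {term: z-score} dict per time window.

    docs_by_window: list of windows, each a list of tokenized documents.
    Assumption: burstiness = z-score of the term's count across windows.
    """
    counts = [Counter(tok for doc in window for tok in doc)
              for window in docs_by_window]
    vocab = set().union(*counts) if counts else set()
    scores = [dict() for _ in counts]
    for term in vocab:
        series = [c[term] for c in counts]
        mu, sigma = mean(series), pstdev(series)
        for i, x in enumerate(series):
            # A term whose frequency never varies gets a score of zero.
            scores[i][term] = (x - mu) / sigma if sigma > 0 else 0.0
    return scores


def bursty_vector(doc, window_scores, boost=1.0):
    """Bag-of-words counts, boosted for terms bursting in the doc's window."""
    tf = Counter(doc)
    return {t: n * (1.0 + boost * max(0.0, window_scores.get(t, 0.0)))
            for t, n in tf.items()}


if __name__ == "__main__":
    windows = [
        [["quake", "news"]],
        [["news"]],
        [["quake", "quake", "news"], ["quake"]],
    ]
    scores = burst_scores(windows)
    print(bursty_vector(["quake", "news"], scores[2]))
```

Under these assumptions, a term that appears at a constant rate keeps its plain count, while a term that spikes in the document's window receives a larger weight. Any standard distance (for example cosine) can then be computed over the resulting vectors.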

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Service quality monitoring in confined spaces through mining Twitter data

Author: Naghizade Elham
Rahimi Mohammad Masoud
Stevenson Mark
Winter Stephan
Publication venue: DigitalCommons@UMaine
Publication date: 14/07/2021

Promoting public transport depends on adapting effective tools for concurrent monitoring of perceived service quality. Social media feeds, in general, provide an opportunity to ubiquitously look for service quality events, but when applied to a confined geographic area such as a transport node, the sparsity of concurrent social media data leads to two major challenges. Both the limited number of social media messages (leading to biased machine learning) and the capturing of bursty events in the study period considerably reduce the effectiveness of general event detection methods. In contrast to previous work and to face these challenges, this paper presents a hybrid solution based on a novel fine-tuned BERT language model and aspect-based sentiment analysis. BERT enables extracting aspects from a limited context, where traditional methods such as topic modeling and word embedding fail. Moreover, leveraging aspect-based sentiment analysis improves the sensitivity of event detection. Finally, the efficacy of event detection is further improved by proposing a statistical approach to combine frequency-based and sentiment-based solutions. Experiments on a real-world case study demonstrate that the proposed solution improves the effectiveness of event detection compared to state-of-the-art approaches.

University of Maine

Real-time bursty topic detection and virality forecasting in microblogs

Author: XIE Wei
Publication venue: Singapore Management University
Publication date: 01/07/2017

Institutional Knowledge at Singapore Management University

Enhanced Heartbeat Graph for emerging event detection on Twitter using time series networks

Author: Abbasi RA
Maqbool O
Razzak I
Sadaf A
Saeed Z
Xu G
Publication venue: Elsevier BV
Publication date: 01/12/2019

With the increasing popularity of social media, Twitter has become one of the leading platforms to report events in real-time. Detecting events from the Twitter stream requires complex techniques. Event-related trending topics consist of a group of words which successfully detect and identify events. Event detection techniques must be scalable and robust, so that they can deal with the huge volume and noise associated with social media. Existing event detection methods mostly rely on burstiness, mainly the frequency of words and their co-occurrences. However, burstiness sometimes dominates other relevant details in the data which could be equally significant. Besides, the topological and temporal relationships in the data are often ignored. In this work, we propose a novel graph-based approach, called the Enhanced Heartbeat Graph (EHG), which detects events efficiently. EHG suppresses dominating topics in the subsequent data stream, after their first detection. Experimental results on three real-world datasets (i.e., Football Association Challenge Cup Final, Super Tuesday, and the US Election 2012) show superior performance of the proposed approach in comparison to the state-of-the-art techniques.
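
The idea of suppressing a dominating topic after its first detection can be illustrated with a much simpler frequency-based stand-in. The sketch below is not EHG, which is graph-based. It assumes a term is bursty when its count in a window is at least `min_count` and exceeds `ratio` times its average count in earlier windows, and it reports each term only once. The function name and both thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def detect_with_suppression(windows, ratio=3.0, min_count=3):
    """Return (window_index, term) pairs for the first burst of each term.

    windows: list of token lists, one per time window, in stream order.
    Later bursts of an already reported term are suppressed.
    """
    history = defaultdict(int)  # total count of each term in earlier windows
    reported = set()
    events = []
    for i, tokens in enumerate(windows):
        counts = Counter(tokens)
        for term, n in counts.items():
            # Average count per earlier window; floor of 1.0 avoids tiny baselines.
            baseline = history[term] / i if i > 0 else 0.0
            bursty = n >= min_count and n > ratio * max(baseline, 1.0)
            if bursty and term not in reported:
                events.append((i, term))
                reported.add(term)
        # Update history only after the whole window has been scored.
        for term, n in counts.items():
            history[term] += n
    return events


if __name__ == "__main__":
    stream = [
        ["a", "b"],
        ["a", "goal", "goal", "goal", "goal"],
        ["goal"] * 10 + ["a"],
    ]
    print(detect_with_suppression(stream))
```

In this toy stream the term "goal" bursts in the second window and again in the third, but only the first occurrence is reported, which leaves room for less dominant terms to surface later.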

Deakin Research Online

OPUS - University of Technology Sydney