11 research outputs found

    Social event detection with retweeting behavior correlation

    Event detection over microblogs has attracted great research interest due to its wide application in areas such as crisis management and decision making. In natural disasters, complex events are reported in real time on social media sites, but these reports are invisible to crisis coordinators. Detecting these crisis events helps watchers make the right decisions rapidly, reducing injuries, deaths and economic loss. In sporting activities, detecting events helps audiences make better and more timely game viewing plans. However, existing event detection techniques are not effective at handling complex social events that evolve over time. In this paper, we propose an event detection method that takes advantage of retweeting behavior to handle event evolution. Specifically, we first propose a topic model called RL-LDA to capture social media information across hashtags, location, text and retweeting behavior. Using RL-LDA, a complex event can be handled well by exploring the correlation between retweeting behavior and the event. Then, to maintain RL-LDA in a dynamic environment, we propose a dynamic update algorithm, which incrementally updates events over real-time streams. Experiments over real-world datasets show that RL-LDA detects the temporal evolution of complex events effectively and efficiently.

    A probabilistic method for emerging topic tracking in Microblog stream


    MANTRA: A Topic Modeling-Based Tool to Support Automated Trend Analysis on Unstructured Social Media Data

    The early identification of new and auspicious ideas leads to competitive advantages for companies. Topic modeling can serve as an effective analytical approach for the automated investigation of trends from unstructured social media data. However, existing trend analysis tools do not meet the requirements regarding (a) Product Development, (b) Customer Behavior Analysis, and (c) Market-/Brand-Monitoring as reflected within extant literature. Thus, based on the requirements for each of these common marketing-related use cases, we derived design principles following design science research and instantiated the artifact “MANTRA” (MArketiNg TRend Analysis). We demonstrated MANTRA on a real-world data set (~1.03 million Yelp reviews) and could thereby confirm remarkable trends of vegan and global cuisine. In particular, the demonstration exemplifies the importance of meeting all specific requirements of the respective use cases, and especially of flexibly incorporating several external parameters into the trend analysis.



    An enhanced binary bat and Markov clustering algorithms to improve event detection for heterogeneous news text documents

    Event Detection (ED) works on identifying events from various types of data. Building an ED model for news text documents greatly helps decision-makers in various disciplines improve their strategies. However, identifying and summarizing events from such data is a non-trivial task due to the large volume of published heterogeneous news text documents. Such documents create a high-dimensional feature space that influences the overall performance of the baseline methods in an ED model. To address this problem, this research presents an enhanced ED model that includes improved methods for the crucial phases of the ED model, namely Feature Selection (FS), ED, and summarization. This work focuses on the FS problem by automatically detecting events through a novel wrapper FS method based on the Adapted Binary Bat Algorithm (ABBA) and the Adapted Markov Clustering Algorithm (AMCL), termed ABBA-AMCL. These adaptive techniques were developed to overcome the premature convergence in BBA and the fast convergence rate in MCL. Furthermore, this study proposes four summarizing methods to generate informative summaries. The enhanced ED model was tested on 10 benchmark datasets and 2 Facebook news datasets. The effectiveness of ABBA-AMCL was compared to 8 FS methods based on meta-heuristic algorithms and 6 graph-based ED methods. The empirical and statistical results showed that ABBA-AMCL surpassed other methods on most datasets. The key representative features demonstrated that the ABBA-AMCL method successfully detects real-world events from Facebook news datasets with 0.96 Precision and 1 Recall for dataset 11, while for dataset 12, the Precision is 1 and Recall is 0.76. To conclude, the novel ABBA-AMCL presented in this research has successfully bridged the research gap and resolved the curse of high-dimensional feature space for heterogeneous news text documents. Hence, the enhanced ED model can organize news documents into distinct events and provide policymakers with valuable information for decision making.

    Topic models for short text data

    Topic models are known to suffer from sparsity when applied to short text data. The problem is caused by the reduced number of observations (i.e., the words in a document) available for reliable inference. A popular heuristic used to overcome this problem is to perform some form of document aggregation by context (e.g., author, hashtag) before training. We dedicated one part of this dissertation to explicitly modeling the implicit assumptions of the document aggregation heuristic and applying it to two well-known model architectures: a mixture and an admixture. Our findings indicate that an admixture model benefits more from aggregation than a mixture model, which rarely improved over its baseline (the standard mixture). We also find that the state of the art in short text data can be surpassed as long as every context is shared by a small number of documents. In the second part of the dissertation we develop a more general-purpose topic model which can also be used when contextual information is not available. The proposed model is formulated around the observation that in normal text data, a classic topic model like an admixture works well because patterns of word co-occurrences arise across the documents. However, such patterns are less likely to arise in a short text dataset. The model assumes every document is a bag of word co-occurrences, where each co-occurrence belongs to a latent topic. The documents are enhanced a priori with related co-occurrences from the other documents, such that the collection will have a greater chance of exhibiting word patterns. The proposed model performs well, surpassing the state of the art and popular topic model baselines.
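The document aggregation heuristic described in this abstract can be sketched as follows. This is only an illustration, not the dissertation's model: the function name `aggregate_by_context`, the `(context, tokens)` input layout, and the sample posts are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_by_context(docs):
    """Pool short documents that share a context key (e.g. a hashtag or
    an author) into one longer pseudo-document per context.

    `docs` is an iterable of (context, tokens) pairs; this layout is a
    hypothetical choice for the sketch.
    """
    pooled = defaultdict(list)
    for context, tokens in docs:
        # Concatenate tokens so each context yields more observations
        # than any single short document would.
        pooled[context].extend(tokens)
    return dict(pooled)


# Made-up example posts, keyed by hashtag.
posts = [
    ("#quake", ["tremor", "felt", "downtown"]),
    ("#quake", ["aftershock", "downtown"]),
    ("#final", ["goal", "extra", "time"]),
]
pseudo_docs = aggregate_by_context(posts)
```

The pooled pseudo-documents would then be passed to a standard topic model in place of the original short posts.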

    A Probabilistic Model for Bursty Topic Discovery in Microblogs

    Bursty topic discovery in microblogs is important for helping people grasp essential and valuable information. However, the task is challenging since microblog posts are particularly short and noisy. This work develops a novel probabilistic model, namely the Bursty Biterm Topic Model (BBTM), to deal with the task. BBTM extends the Biterm Topic Model (BTM) by incorporating the burstiness of biterms as prior knowledge for bursty topic modeling, which has the following merits: 1) like BTM, it handles the data sparsity problem in topic modeling over short texts well; 2) it can automatically discover high-quality bursty topics in microblogs in a principled and efficient way. Extensive experiments on a standard Twitter dataset show that our approach outperforms the state-of-the-art baselines significantly.
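A biterm is an unordered pair of words that co-occur in the same short text. The sketch below shows only how biterms might be extracted and counted for one time slice; it does not implement BBTM's burstiness prior or its inference, and the function names and sample posts are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs from one short post.

    Each pair is sorted so that (a, b) and (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(posts):
    """Count biterm occurrences over a collection of tokenized posts."""
    counts = Counter()
    for tokens in posts:
        counts.update(extract_biterms(tokens))
    return counts


# Made-up posts standing in for a single time slice of a microblog stream.
today = [["flood", "river", "alert"], ["flood", "alert"]]
counts = count_biterms(today)
```

Comparing such per-slice counts against counts from earlier slices is one simple way to see which biterms are rising in frequency.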