922 research outputs found

    Event detection and user interest discovering in social media data streams

    Social media plays an increasingly important role in people’s lives. Microblogging is a form of social media that allows people to share and disseminate real-life events. Broadcasting events in microblogging networks can be an effective way to create awareness and divulge important information. However, many existing approaches to dissecting the information content primarily discuss the event detection model and ignore the user interest that can be discovered during event evolution. This makes it difficult to track the most important events as they evolve, including identifying the influential spreaders. A further complication is that the influential spreaders’ interests also change during event evolution. The influential spreaders play a key role in event evolution, and this has been largely ignored in traditional event detection methods. To this end, we propose an event evolution model based on a user-interest model, named the HEE (Hot Event Evolution) model. This model not only considers the user interest distribution, but also uses the short text data in the social network to model the posts and uses recommendation methods to discover user interests. This can resolve the problem of data sparsity, which affects many existing event detection methods, and improve the accuracy of event detection. A hot event automatic filtering algorithm is first applied to remove the influence of general events, improving the quality and efficiency of event mining. Then an automatic topic clustering algorithm is applied to arrange the short texts into clusters with similar topics. An improved user-interest model is proposed to combine the short texts of each cluster into a long text document, simplifying the determination of the overall topic in relation to the interest distribution of each user during the evolution of important events. Finally, a novel cosine-measure-based event similarity detection method is used to assess the correlation between events, thereby detecting the process of event evolution. The experimental results on a real Twitter dataset demonstrate the efficiency and accuracy of our proposed model for both event detection and user interest discovery during the evolution of hot events.
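
The cosine-based event similarity step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how HEE represents an event, so the code assumes each event is the merged text of its cluster, reduced to whitespace-token term counts; `term_vector`, `cosine_similarity`, and the sample texts are hypothetical names and data, not taken from the paper.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count lowercased whitespace tokens as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty event has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical merged texts of event clusters (assumption, not paper data).
event_t1 = term_vector("earthquake nepal rescue teams arrive")
event_t2 = term_vector("nepal earthquake rescue donations rise")
unrelated = term_vector("football final tonight")

print(cosine_similarity(event_t1, event_t2))   # three of five terms shared
print(cosine_similarity(event_t1, unrelated))  # no shared terms
```

Under this assumption, a high score between clusters from consecutive time windows would suggest the same event continuing, while a score near zero would suggest unrelated events. Any threshold separating the two would need to be chosen empirically.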

    Time Aware Knowledge Extraction for Microblog Summarization on Twitter

    Microblogging services like Twitter and Facebook collect millions of pieces of user-generated content every moment about trending news, ongoing events, and so on. Nevertheless, it is very difficult to find information of interest in the huge number of available posts, which are often noisy and redundant. In general, social media analytics services have attracted increasing attention from both research and industry. Specifically, the dynamic context of microblogging requires managing not only the meaning of information but also the evolution of knowledge over the timeline. This work defines the Time Aware Knowledge Extraction (TAKE) methodology, which relies on a temporal extension of Fuzzy Formal Concept Analysis. In particular, a microblog summarization algorithm has been defined that filters the concepts organized by TAKE in a time-dependent hierarchy. The algorithm addresses topic-based summarization on Twitter. Besides considering the timing of the concepts, another distinguishing feature of the proposed microblog summarization framework is the possibility of obtaining a more or less detailed summary, according to the user's needs, with good levels of quality and completeness, as highlighted in the experimental results. Comment: 33 pages, 10 figures

    Can we predict a riot? Disruptive event detection using Twitter

    In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task but would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller-scale “disruptive events”: smaller incidents that threaten social safety and security or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases.

    Social Media and Information Overload: Survey Results

    A UK-based online questionnaire investigating aspects of usage of user-generated media (UGM), such as Facebook, LinkedIn and Twitter, attracted 587 participants. Results show a high degree of engagement with social networking media such as Facebook, and a significant engagement with other media such as professional media, microblogs and blogs. Participants who experience information overload are those who engage less frequently with the media, rather than those who have fewer posts to read. Professional users show different behaviours to social users. Microbloggers complain of information overload to the greatest extent. Two thirds of Twitter users have felt that they receive too many posts, and over half of Twitter users have felt the need for a tool to filter out the irrelevant posts. Generally speaking, participants express satisfaction with the media, though a significant minority express a range of concerns including information overload and privacy.

    Comparison of Clustering Algorithms for the Identification of Topics on Twitter

    Topic Identification in Social Networks has become an important task when dealing with event detection, particularly when global communities are affected. To attack this problem, text processing techniques and machine learning algorithms have been extensively used. In this paper we compare four clustering algorithms – k-means, k-medoids, DBSCAN and NMF (Non-negative Matrix Factorization) – in order to detect topics related to textual messages obtained from Twitter. The algorithms were applied to a database initially composed of tweets having hashtags related to the recent Nepal earthquake as initial context. The results suggest that the NMF clustering algorithm performs best, providing simpler clusters that are also easier to interpret.
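
To make the clustering task concrete, here is a minimal sketch of one of the four compared algorithms, k-means, on toy data. The abstract does not describe the paper's preprocessing or features, so raw term counts, seeding from the first k points, and the sample tweets are all assumptions made for illustration. The other three algorithms (k-medoids, DBSCAN, NMF) are not shown.

```python
from collections import Counter


def vectorize(docs):
    """Turn each text into a dense term-count vector over a shared vocabulary."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[float(c[t]) for t in vocab] for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical tweets, not drawn from the paper's dataset.
tweets = [
    "nepal earthquake rescue",
    "football final goal",
    "earthquake nepal aid rescue",
    "goal football match final",
]
print(kmeans(vectorize(tweets), k=2))  # → [0, 1, 0, 1]
```

Seeding from the first k points keeps the toy run deterministic. In practice a randomized or k-means++ style initialization is more common, and results on real tweets depend heavily on tokenization and term weighting.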

    A Semantic Similarity Approach for Linking Tweet Messages to Library of Congress Subject Headings using Linked Resources: A Pilot Study

    The objective of this study is to propose, implement, and test a framework for assigning relevant Library of Congress (LC) subject headings to tweet messages. In this study, the task of assigning LC headings is treated as an automatic classification task that identifies relevant LC subject headings for given tweets. The classification task is conducted in two stages. In the first stage, tweets are clustered so that similar tweets are grouped together. In the second stage, the degree of similarity between a cluster of tweets and LC subject headings is measured by a popular similarity metric, the Jaccard Coefficient (JC). In this pilot study, five tweet clusters and nine LC subject headings were carefully chosen and used. This pilot study demonstrates a positive result for the proposed approach of identifying subject headings for tweets. In three of the five cluster cases, JC assigned the largest degree of similarity to the most relevant headings. For the other two cases, JC was not successful in ranking the most relevant heading within the top three. In the next step, a more sophisticated clustering method will be explored and applied. Also, all possible LC subject headings will be employed to identify LC subjects for tweets in the next steps of this study.
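
The second stage described in this abstract, scoring a tweet cluster against candidate headings with the Jaccard Coefficient, can be sketched as follows. The abstract does not say how tweets or headings are tokenized, so lowercased whitespace tokens are an assumption. The function names and heading strings are illustrative and are not verified LC subject headings.

```python
def tokens(text: str) -> set:
    """Lowercased whitespace tokens of a text, as a set."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_headings(cluster_tweets, headings):
    """Sort candidate headings by similarity to the cluster's pooled vocabulary."""
    cluster_terms = set().union(*(tokens(t) for t in cluster_tweets))
    scored = [(jaccard(cluster_terms, tokens(h)), h) for h in headings]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical cluster and candidate headings (illustrative strings only).
cluster = ["earthquake relief in nepal", "nepal earthquake victims"]
candidates = ["Earthquake relief", "Disaster victims", "Football"]

for score, heading in rank_headings(cluster, candidates):
    print(f"{score:.3f}  {heading}")
```

Because the score depends on exact token overlap, a heading such as a plural or inflected form would score zero against its singular form in a tweet. Stemming or a linked vocabulary would be one way to soften that, though whether the study does so is not stated here.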