10,781 research outputs found

    Extracting News Events from Microblogs

    Full text link
    Twitter stream has become a large source of information for many people, but the magnitude of tweets and the noisy nature of its content have made harvesting the knowledge from Twitter a challenging task for researchers for a long time. Aiming at overcoming some of the main challenges of extracting the hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning to detect news-relevant tweets from the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach on detecting current (real) news events delivers a state-of-the-art performance

    Discovering Power Laws in Entity Length

    Full text link
    This paper presents a discovery that the length of the entities in various datasets follows a family of scale-free power law distributions. The concept of entity here broadly includes the named entity, entity mention, time expression, aspect term, and domain-specific entity that are well investigated in natural language processing and related areas. The entity length denotes the number of words in an entity. The power law distributions in entity length possess the scale-free property and have well-defined means and finite variances. We explain the phenomenon of power laws in entity length by the principle of least effort in communication and the preferential mechanism

    Extracting semantic entities and events from sports tweets

    Get PDF
    Large volumes of user-generated content on practically every major issue and event are being created on the microblogging site Twitter. This content can be combined and processed to detect events, entities and popular moods to feed various knowledge-intensive practical applications. On the downside, these content items are very noisy and highly informal, making it difficult to extract sense out of the stream. In this paper, we exploit various approaches to detect the named entities and significant micro-events from users’ tweets during a live sports event. Here we describe how combining linguistic features with background knowledge and the use of Twitter-specific features can achieve high, precise detection results (f-measure = 87%) in different datasets. A study was conducted on tweets from cricket matches in the ICC World Cup in order to augment the event-related non-textual media with collective intelligence

    Discovering conversational topics and emotions associated with Demonetization tweets in India

    Full text link
    Social media platforms contain great wealth of information which provides us opportunities explore hidden patterns or unknown correlations, and understand people's satisfaction with what they are discussing. As one showcase, in this paper, we summarize the data set of Twitter messages related to recent demonetization of all Rs. 500 and Rs. 1000 notes in India and explore insights from Twitter's data. Our proposed system automatically extracts the popular latent topics in conversations regarding demonetization discussed in Twitter via the Latent Dirichlet Allocation (LDA) based topic model and also identifies the correlated topics across different categories. Additionally, it also discovers people's opinions expressed through their tweets related to the event under consideration via the emotion analyzer. The system also employs an intuitive and informative visualization to show the uncovered insight. Furthermore, we use an evaluation measure, Normalized Mutual Information (NMI), to select the best LDA models. The obtained LDA results show that the tool can be effectively used to extract discussion topics and summarize them for further manual analysis.Comment: 6 pages, 11 figures. arXiv admin note: substantial text overlap with arXiv:1608.02519 by other authors; text overlap with arXiv:1705.08094 by other author
    • …
    corecore