12,631 research outputs found

    Extracting News Events from Microblogs

    Full text link
    Twitter stream has become a large source of information for many people, but the magnitude of tweets and the noisy nature of its content have made harvesting the knowledge from Twitter a challenging task for researchers for a long time. Aiming at overcoming some of the main challenges of extracting the hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning to detect news-relevant tweets from the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach on detecting current (real) news events delivers a state-of-the-art performance

    Analysis and Extraction of Tempo-Spatial Events in an Efficient Archival CDN with Emphasis on Telegram

    Full text link
    This paper presents an efficient archival framework for exploring and tracking cyberspace large-scale data called Tempo-Spatial Content Delivery Network (TS-CDN). Social media data streams are renewing in time and spatial dimensions. Various types of websites and social networks (i.e., channels, groups, pages, etc.) are considered spatial in cyberspace. Accurate analysis entails encompassing the bulk of data. In TS-CDN by applying the hash function on big data an efficient content delivery network is created. Using hash function rebuffs data redundancy and leads to conclude unique data archive in large-scale. This framework based on entered query allows for apparent monitoring and exploring data in tempo-spatial dimension based on TF-IDF score. Also by conformance from i18n standard, the Unicode problem has been dissolved. For evaluation of TS-CDN framework, a dataset from Telegram news channels from March 23, 2020 (1399-01-01), to September 21, 2020 (1399-06-31) on topics including Coronavirus (COVID-19), vaccine, school reopening, flood, earthquake, justice shares, petroleum, and quarantine exploited. By applying hash on Telegram dataset in the mentioned time interval, a significant reduction in media files such as 39.8% for videos (from 79.5 GB to 47.8 GB), and 10% for images (from 4 GB to 3.6 GB) occurred. TS-CDN infrastructure in a web-based approach has been presented as a service-oriented system. Experiments conducted on enormous time series data, including different spatial dimensions (i.e., Khabare Fouri, Khabarhaye Fouri, Akhbare Rouze Iran, and Akhbare Rasmi Telegram news channels), demonstrate the efficiency and applicability of the implemented TS-CDN framework
    • …
    corecore