Extracting News Events from Microblogs
The Twitter stream has become a major source of information for many people,
but the sheer volume of tweets and the noisy nature of their content have long
made harvesting knowledge from Twitter a challenging task for researchers.
Aiming at overcoming some of the main challenges of extracting the
hidden information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. We divide our
approach into three steps. The first step is to use a neural network (deep
learning) to detect news-relevant tweets from the stream. The second step is to
apply a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step is to rank the detected events
based on the size of the event clusters and growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach to detecting
current (real) news events delivers state-of-the-art performance.
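
The second and third steps can be illustrated with a minimal sketch. The
abstract does not describe the clustering algorithm or the ranking formula, so
everything below is an assumption: a generic single-pass clustering on token
overlap stands in for the authors' novel algorithm, and the names `Cluster`,
`assign`, `score`, and `rank`, the Jaccard threshold, and the time window are
hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    tokens: set                                     # union of tokens seen so far
    timestamps: list = field(default_factory=list)  # arrival times in seconds


def jaccard(a, b):
    """Token-set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(clusters, text, ts, threshold=0.3):
    """Single-pass clustering: join the most similar cluster or open a new one.

    Assumption: whitespace tokens and a fixed Jaccard threshold; the paper's
    own similarity measure is not given in the abstract.
    """
    tokens = set(text.lower().split())
    best = max(clusters, key=lambda c: jaccard(c.tokens, tokens), default=None)
    if best is not None and jaccard(best.tokens, tokens) >= threshold:
        best.tokens |= tokens
        best.timestamps.append(ts)
        return best
    new = Cluster(tokens=tokens, timestamps=[ts])
    clusters.append(new)
    return new


def score(cluster, now, window=60.0):
    """Hypothetical ranking score: cluster size plus recent growth.

    Growth is the tweet count in the latest window minus the count in the
    window before it.
    """
    recent = sum(1 for t in cluster.timestamps if now - window < t <= now)
    previous = sum(
        1 for t in cluster.timestamps if now - 2 * window < t <= now - window
    )
    return len(cluster.timestamps) + (recent - previous)


def rank(clusters, now, window=60.0):
    """Return clusters ordered from highest to lowest score."""
    return sorted(clusters, key=lambda c: score(c, now, window), reverse=True)


if __name__ == "__main__":
    events = []
    assign(events, "earthquake hits city centre", 0)
    assign(events, "earthquake hits city hard", 10)
    assign(events, "new phone launch today", 20)
    assign(events, "big earthquake hits city", 100)
    for c in rank(events, now=120):
        print(len(c.timestamps), sorted(c.tokens))
```

Keeping only a token set and a list of timestamps per cluster means each
incoming tweet is processed once, which suits a stream. A real system would
also need to expire old clusters and would receive only the tweets that the
first step classified as news-relevant.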
Analysis and Extraction of Tempo-Spatial Events in an Efficient Archival CDN with Emphasis on Telegram
This paper presents the Tempo-Spatial Content Delivery Network (TS-CDN), an
efficient archival framework for exploring and tracking large-scale cyberspace
data. Social media data streams are continually renewed along temporal and
spatial dimensions, where the various types of websites and social network
entities (e.g., channels, groups, and pages) are treated as the spatial
dimension of cyberspace. Accurate analysis requires covering the bulk of this
data. TS-CDN creates an efficient content delivery network by applying a hash
function to big data. Hashing eliminates data redundancy and yields an archive
of unique data at large scale. Given a user query, the framework supports clear
monitoring and exploration of data along the tempo-spatial dimensions based on
TF-IDF scores. In addition, conforming to the i18n standard resolves the
Unicode problem.
To evaluate the TS-CDN framework, a dataset from Telegram news channels
covering March 23, 2020 (1399-01-01) to September 21, 2020 (1399-06-31) was
used, on topics including Coronavirus (COVID-19), vaccine, school reopening,
flood, earthquake, justice shares, petroleum, and quarantine. Applying the hash
to the Telegram dataset over this interval significantly reduced the volume of
media files: by 39.8% for videos (from 79.5 GB to 47.8 GB) and by 10% for
images (from 4 GB to 3.6 GB). The TS-CDN infrastructure is presented as a
web-based, service-oriented system. Experiments conducted on very large time
series data spanning different spatial dimensions (i.e., the Khabare Fouri,
Khabarhaye Fouri, Akhbare Rouze Iran, and Akhbare Rasmi Telegram news
channels) demonstrate the efficiency and applicability of the implemented
TS-CDN framework.
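
The hash-based removal of redundant media can be sketched as follows. The
abstract does not say which hash function TS-CDN uses or how files are keyed,
so SHA-256 over the raw file bytes is an assumption, and the function names
`deduplicate` and `reduction_percent` are hypothetical.

```python
import hashlib


def deduplicate(items):
    """Keep the first item seen for each distinct content digest.

    `items` is an iterable of (name, content_bytes) pairs.
    Returns (archive, bytes_before, bytes_after), where `archive` maps a
    hex digest to the (name, content_bytes) pair that was kept.
    Assumption: SHA-256 of the full content identifies a file.
    """
    archive = {}
    bytes_before = 0
    for name, content in items:
        bytes_before += len(content)
        digest = hashlib.sha256(content).hexdigest()
        archive.setdefault(digest, (name, content))  # later duplicates dropped
    bytes_after = sum(len(content) for _, content in archive.values())
    return archive, bytes_before, bytes_after


def reduction_percent(before, after):
    """Size reduction as a percentage of the original size."""
    return 100.0 * (before - after) / before if before else 0.0


if __name__ == "__main__":
    media = [
        ("a.mp4", b"x" * 10),
        ("b.mp4", b"x" * 10),  # same bytes as a.mp4, reposted under a new name
        ("c.jpg", b"y" * 5),
    ]
    archive, before, after = deduplicate(media)
    print(len(archive), before, after, reduction_percent(before, after))
```

Because the digest depends only on content, the same video reposted across
several channels is stored once regardless of its file name. Files that differ
by even one byte, such as re-encoded copies, would not be detected as
duplicates under this assumption.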