Extracting News Events from Microblogs
The Twitter stream has become a major source of information for many people,
but the volume of tweets and the noisy nature of their content have long made
harvesting knowledge from Twitter a challenging task for researchers. To
overcome some of the main challenges of extracting hidden information from
tweet streams, this work proposes a new approach for real-time detection of
news events from the Twitter stream. The approach has three steps. First, a
deep-learning (neural network) model detects news-relevant tweets in the
stream. Second, a novel streaming data clustering algorithm groups the detected
news tweets into news events. Third, the detected events are ranked by the size
of their clusters and the growth rate of their tweet frequencies. We evaluate
the proposed system on a large, publicly available corpus of annotated news
events from Twitter and, as part of the evaluation, compare our approach with a
related state-of-the-art solution. Overall, our experiments and a user-based
evaluation show that our approach to detecting current (real) news events
delivers state-of-the-art performance.
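The abstract does not describe the novel clustering algorithm or the exact ranking formula. As a rough illustration of steps two and three only, the following Python sketch swaps in a simple single-pass (leader-style) clustering over bag-of-words vectors with a cosine-similarity threshold, then ranks clusters by their size plus their growth between two consecutive time windows. The function names, the `threshold` and `window` parameters, and the scoring formula are hypothetical assumptions, not the authors' method; step one (the neural news-tweet classifier) is assumed to have already filtered the input.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> Counter:
    """Lower-cased bag-of-words term counts (a deliberately simple representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class EventCluster:
    # Summed term counts; cosine is scale-invariant, so this acts as a centroid.
    centroid: Counter = field(default_factory=Counter)
    timestamps: list = field(default_factory=list)

    def add(self, vector: Counter, ts: float) -> None:
        self.centroid.update(vector)
        self.timestamps.append(ts)


def cluster_stream(tweets, threshold=0.5):
    """Hypothetical single-pass clustering of (timestamp, text) pairs:
    each tweet joins its most similar cluster if similarity >= threshold,
    otherwise it starts a new cluster."""
    clusters = []
    for ts, text in tweets:
        vec = tokenize(text)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c.centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best.add(vec, ts)
        else:
            new_cluster = EventCluster()
            new_cluster.add(vec, ts)
            clusters.append(new_cluster)
    return clusters


def rank_events(clusters, now, window=60.0):
    """Hypothetical score: cluster size plus growth, where growth is the
    tweet count in the latest window minus the count in the window before."""
    def score(c):
        recent = sum(1 for t in c.timestamps if now - window < t <= now)
        previous = sum(1 for t in c.timestamps if now - 2 * window < t <= now - window)
        return len(c.timestamps) + (recent - previous)
    return sorted(clusters, key=score, reverse=True)
```

Calling `rank_events(cluster_stream(tweets), now=...)` on a list of `(timestamp, text)` pairs returns clusters ordered from most to least prominent. Comparing each tweet against every cluster costs time proportional to the number of clusters, so a real-time system would likely need an index or cluster expiry on top of a sketch like this.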
Performance Comparison of Turkish Web Pages Classification
Nowadays, web page classification is essential for efficient and fast search engines, and there is an ever-increasing need for automatic classification techniques with higher accuracy. In this article, the performance of existing Turkish-language CNN models for web page classification is compared. In more detail, the content of the web pages is extracted first; then preprocessing steps that aim to detect the important parts and eliminate useless content are applied. Next, BERT word embeddings are used to represent the texts as efficient numerical vectors. Finally, three state-of-the-art CNN models that fully support the Turkish language are investigated to find the best classifier. Overall, all three models achieved acceptable performance in classifying Turkish web pages, although the third model performed slightly better than the other two. © 2021 IEEE
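The abstract does not say how the important parts of a page are detected. A minimal sketch of the extraction and preprocessing steps, assuming that text inside `script`, `style`, `noscript`, `nav`, and `footer` tags counts as useless content, could use Python's standard-library `html.parser`. The tag list and the helper names are hypothetical, and the BERT embedding and CNN classification steps are not shown. Because Python's default `str.lower()` does not follow Turkish casing rules (it maps `I` to `i` rather than `ı`), the sketch also includes a Turkish-aware lowercasing helper.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text outside tags assumed (hypothetically) to hold no useful content."""

    SKIP_TAGS = {"script", "style", "noscript", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of a page as one space-joined string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flushes any buffered text
    return " ".join(parser.chunks)


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish dotted/dotless i rules applied before str.lower()."""
    return text.replace("I", "ı").replace("İ", "i").lower()
```

For example, `turkish_lower(extract_text(page_html))` would yield normalized text ready for tokenization and embedding; a production pipeline would likely replace the fixed tag list with a learned or heuristic boilerplate detector.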