Search CORE

357 research outputs found

Extracting News Events from Microblogs

Author: Ramampiaro Heri
Repp Øystein
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018
Field of study

Twitter stream has become a large source of information for many people, but the magnitude of tweets and the noisy nature of its content have made harvesting the knowledge from Twitter a challenging task for researchers for a long time. Aiming at overcoming some of the main challenges of extracting the hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning to detect news-relevant tweets from the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach on detecting current (real) news events delivers a state-of-the-art performance

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives

Temporal Information Models for Real-Time Microblog Search

Author: Martins Flávio Nuno Fernandes
Publication venue
Publication date: 01/01/2018
Field of study

Real-time search in Twitter and other social media services is often biased towards the most recent results due to the “in the moment” nature of topic trends and their ephemeral relevance to users and media in general. However, “in the moment”, it is often difficult to look at all emerging topics and single-out the important ones from the rest of the social media chatter. This thesis proposes to leverage on external sources to estimate the duration and burstiness of live Twitter topics. It extends preliminary research where itwas shown that temporal re-ranking using external sources could indeed improve the accuracy of results. To further explore this topic we pursued three significant novel approaches: (1) multi-source information analysis that explores behavioral dynamics of users, such as Wikipedia live edits and page view streams, to detect topic trends and estimate the topic interest over time; (2) efficient methods for federated query expansion towards the improvement of query meaning; and (3) exploiting multiple sources towards the detection of temporal query intent. It differs from past approaches in the sense that it will work over real-time queries, leveraging on live user-generated content. This approach contrasts with previous methods that require an offline preprocessing step

Repositório da Universidade Nova de Lisboa

Sentiment analysis in microblogging: a practical implementation

Author: Cohen Mauro
Damiani Pablo
Durandeu Sebastián
Fernández Enrique
Merlino Hernán
Navas Renzo
Publication venue
Publication date: 01/10/2011
Field of study

This paper presents a system that can take short messages relevant to a particular topic from a microblogging service such as Twitter or Facebook, analyze the messages for the sentiments they carry on, and classify them. In particular, the system addresses this problem by retrieving raw data from Twitter - one of the most popular microblogging platforms - pre-processing on that raw data, and finally analyzing it using machine learning techniques to classify them by sentiment as either positive or negativePresentado en el XII Workshop Agentes y Sistemas Inteligentes (WASI)Red de Universidades con Carreras en Informática (RedUNCI

Servicio de Difusión de la Creación Intelectual

A Time-Aware Approach to Improving Ad-hoc Information Retrieval from Microblogs

Author: Amin Nayeri Zahra
Publication venue
Publication date: 09/07/2014
Field of study

There is an immense number of short-text documents produced as the result of microblogging. The content produced is growing as the number of microbloggers grows, and as active microbloggers continue to post millions of updates. The range of topics discussed is so vast, that microblogs provide an abundance of useful information. In this work, the problem of retrieving the most relevant information in microblogs is addressed. Interesting temporal patterns were found in the initial analysis of the study. Therefore the focus of the current work is to first exploit a temporal variable in order to see how effectively it can be used to predict the relevance of the tweets and, then, to include it in a retrieval weighting model along with other tweet-specific features. Generalized Linear Mixed-effect Models (GLMMs) are used to analyze the features and to propose two re-ranking models. These two models were developed through an exploratory process on a training set and then were evaluated on a test set

YorkSpace

Twitmo: A Twitter Data Topic Modeling and Visualization Package for R

Author: Buchmüller Andreas
Kant Gillian
Kis-Katos Krisztina
Kneib Thomas
Säfken Benjamin
Weisser Christoph
Publication venue
Publication date: 08/07/2022
Field of study

We present Twitmo, a package that provides a broad range of methods to collect, pre-process, analyze and visualize geo-tagged Twitter data. Twitmo enables the user to collect geo-tagged Tweets from Twitter and and provides a comprehensive and user-friendly toolbox to generate topic distributions from Latent Dirichlet Allocations (LDA), correlated topic models (CTM) and structural topic models (STM). Functions are included for pre-processing of text, model building and prediction. In addition, one of the innovations of the package is the automatic pooling of Tweets into longer pseudo-documents using hashtags and cosine similarities for better topic coherence. The package additionally comes with functionality to visualize collected data sets and fitted models in static as well as interactive ways and offers built-in support for model visualizations via LDAvis providing great convenience for researchers in this area. The Twitmo package is an innovative toolbox that can be used to analyze public discourse of various topics, political parties or persons of interest in space and time.Comment: 16 pages, 4 figure

arXiv.org e-Print Archive

Microblog retrieval challenges and opportunities

Author: Rodriguez Perez Jesus Alberto
Publication venue
Publication date: 01/01/2018
Field of study

In recent years microblogging services have changed the way we communicate. Microblogs are a reduced version of web-blogs which are characterised by being just a few characters long. In the case of Twitter, messages known as \textit{tweets} are only 140 characters long, and are broadcasted from followees to followers organised as a social network. Microblogs such as tweets, are used to communicate up to the second information about any topic. Traffic updates, natural disaster reports, self-promotion, or product marketing are only a small portion of the type of information we can find across microblogging services. Most importantly, it has become a platform that has democratised the communication channels and empowered people into voicing their opinions. In fact, it is a very well known fact that the use Twitter amongst other social media services tilted the balance in favour of ex-president Obama when he was elected president of the USA in 2012. However, whilst the widespread use of microblogs has undoubtedly changed and shaped our current society, it is still very hard to effectively perform simple searches on such datasets due to the particular morphology of its documents. The limited character count and the ineffectiveness of state of the art retrieval models in producing relevant documents for queries, thus prompted TREC organisers to unite the research community into addressing these issues in 2011 during the first Microblog 2011 Track. This doctoral work is one of such efforts, and its focused on improving the access to microblog documents through ad-hoc searches. The first part of our work individually studies the behaviour of the state of the art retrieval models when utilised for microblog ad-hoc retrieval. First we contribute with the best configurations for each of the models studied. But more importantly, we discover how query term frequency and document length relates to the relevance of microblogs. As a result, we propose a microblog specific retrieval model, namely MBRM, which significantly outperforms the state of the art retrieval models described in this work. Furthermore we define an informativeness hypothesis in order to better understand the relevance of microblogs in terms of the presence of their inherent features or dimensions. We significantly improve the behaviour of a state of the art retrieval model by taking into consideration these dimensions as features into a linear combination re-ranking approach. Additionally we investigate the role that structure plays in determining the relevance of a microblog, by encoding the structure of relevant and non-relevant documents into two separate state machines. We then devise an approach to measure the similarity of an unobserved document towards each of these state machines, to then produce a score which is utilised for ranking. Our evaluation results demonstrate how the structure of microblogs plays a role in further differentiating relevant and non-relevant documents when ranking, by showing significantly improved results over a state of the art baseline. Subsequently we study the query performance prediction (QPP) task in terms of microblog ad-hoc retrieval. QPP represents the prediction of how well a query will be satisfied by a particular retrieval system. We study the performance of predictors in the context of microblogs and propose a number of microblog specific predictors. Finally our experimental evaluation demonstrates how our predictors outperform those in the literature in the microblog context. Finally, we address the ``vocabulary mismatch'' problem by studying the effect of utilising scores produced retrieval models as an ingredient in automatic query expansion (AQE) approaches based on pseudo relevance feedback . To this end we propose alternative approaches which do not rely directly on such scores and demonstrate higher stability when determining the most optimal terms for query expansion. In addition we propose an approach to estimate the quality of a term for query expansion. To this end we employ a classifier to determine whether a prospective query expansion term falls into a low, medium or high value category. The predictions performed by the classifier are then utilised to determine a boosting factor for such terms within an AQE approach. Then we conclude by proving that it is possible to predict the quality of terms by providing statistically enhanced results over an AQE baseline

Glasgow Theses Service