#Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds
Compounding of natural language units is a very common phenomenon. In this
paper, we show, for the first time, that Twitter hashtags, which could be
considered correlates of such linguistic units, undergo compounding. We
identify reasons for this compounding and propose a prediction model that can
identify with 77.07% accuracy whether a pair of hashtags that compounds in the
near future (i.e., T = 2 months after compounding) will become popular. For
longer horizons of T = 6 and 10 months, the accuracies are 77.52% and 79.13%,
respectively. This technique has strong implications for trending hashtag
recommendation, since newly formed hashtag compounds can be recommended early,
even before the compounding has taken place. Further, humans can predict
compounds with an overall accuracy of only 48.7% (treated as the baseline).
Notably, while humans can discriminate the relatively easier cases, the
automatic framework is successful in classifying the relatively harder cases.
Comment: 14 pages, 4 figures, 9 tables; published in CSCW (Computer-Supported
Cooperative Work and Social Computing) 2016, in Proceedings of the 19th ACM
Conference on Computer-Supported Cooperative Work and Social Computing (CSCW
2016).
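The compounding pattern named in the title (#Bieber + #Blast = #BieberBlast) can be illustrated with a small detection step. The sketch below uses a hypothetical helper, `split_compound`, that is not taken from the paper: it checks whether a hashtag is the case-insensitive concatenation of two hashtags already in a known vocabulary. It only recognizes compounds after the fact and does not implement the paper's popularity prediction model.

```python
def split_compound(tag, vocabulary):
    """Return (left, right) if `tag` is the concatenation of two known hashtags.

    Matching is case-insensitive and ignores the leading '#'. Returns the
    first split found scanning left to right, or None if there is none.
    """
    word = tag.lstrip("#").lower()
    known = {h.lstrip("#").lower() for h in vocabulary}
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in known and right in known:
            return left, right
    return None


vocab = ["#Bieber", "#Blast", "#Beliebers"]
print(split_compound("#BieberBlast", vocab))  # ('bieber', 'blast')
print(split_compound("#Beliebers", vocab))    # None
```

A known hashtag on its own is not reported as a compound, because the split must produce two non-empty known parts.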
A study on Analysis and Utilization of Crowd-sourced Spatio-temporal Contexts from Social Media
Graduate School, University of Hyogo, 201
Forecasting User Interests Through Topic Tag Predictions in Online Health Communities
The increasing reliance of patients and caregivers on online communities for
healthcare information has led to an increase in the spread of misinformation,
that is, subjective, anecdotal, inaccurate, or non-specific recommendations
that, if acted on, could cause serious harm to patients. Hence, there is an
urgent need to connect users with accurate and tailored health information in a
timely manner to prevent such harm. This paper proposes an innovative approach
to suggesting reliable information to participants in online communities as
they move through different stages of their disease or treatment. We
hypothesize that patients with similar histories of disease progression or
course of treatment have similar information needs at comparable stages.
Specifically, we pose the problem of predicting topic tags or keywords that
describe the future information needs of users based on their profiles, the
traces of their online interactions within the community (past posts and
replies), and the profiles and interaction traces of other users whose profiles
and past interactions are similar to those of the target users. The result is a
variant of a collaborative information filtering (recommendation) system
tailored to the needs of users of online health communities. We report results
of our experiments on an expert-curated data set, which demonstrate the
superiority of the proposed approach over state-of-the-art baselines with
respect to accurate and timely prediction of topic tags (and hence of
information sources of interest).
Comment: Healthcare Informatics and NL
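The collaborative filtering idea described above (recommend to a user the tags that similar users went on to need) can be sketched as plain user-based collaborative filtering. This is a minimal illustration under stated assumptions, not the paper's model: users are represented by hypothetical tag-count vectors, similarity is cosine, and the helper names `cosine` and `predict_tags` and the toy data are invented for the example. The paper also aligns users by disease or treatment stage, which this sketch ignores.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two tag-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_tags(target, past, future, k=2, top_n=3):
    """Rank candidate future tags for `target` (hypothetical helper).

    past:   user -> Counter of tags the user has used so far
    future: user -> Counter of tags the user used later; known only for
            other users, not for the target
    The k users most similar to the target (by past tags) vote for their
    later tags, each vote weighted by similarity.
    """
    neighbours = sorted(
        ((cosine(past[target], past[u]), u) for u in past if u != target and u in future),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, user in neighbours:
        if sim <= 0:
            continue  # dissimilar users contribute nothing
        for tag, count in future[user].items():
            scores[tag] += sim * count
    return [tag for tag, _ in scores.most_common(top_n)]


# Hypothetical toy data, not from the paper's data set.
past = {
    "alice": Counter({"chemo": 3, "nausea": 1}),
    "bob": Counter({"chemo": 2, "nausea": 2}),
    "carol": Counter({"knee-surgery": 3}),
}
future = {
    "bob": Counter({"hair-loss": 2, "fatigue": 1}),
    "carol": Counter({"physiotherapy": 4}),
}
print(predict_tags("alice", past, future))  # ['hair-loss', 'fatigue']
```

Alice shares tags only with Bob, so only Bob's later tags are recommended; Carol's similarity is zero and her vote is skipped.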
Modeling user influence and expertise for news sources in social media
Internship carried out at SAPO Lab, supervised by Prof. Luís Sarmento. Integrated master's thesis, Informatics and Computing Engineering, Faculty of Engineering, University of Porto, 201
Methods for improving entity linking and exploiting social media messages across crises
Entity Linking (EL) is the task of automatically identifying entity mentions in texts and resolving them to the corresponding entities in a reference knowledge base (KB). A large number of tools is available for different types of documents and domains; however, the entity linking literature has shown that the quality of a tool varies across corpora and depends on specific characteristics of the corpus it is applied to. Moreover, the lack of precision on particularly ambiguous mentions often spoils the usefulness of automated disambiguation results in real-world applications.
In the first part of this thesis, I explore how to approximate the difficulty of linking entity mentions and frame it as a supervised classification task. Classifying entity mentions that are difficult to disambiguate can help identify critical cases in a semi-automated system, while also revealing latent corpus characteristics that affect entity linking performance. Moreover, despite the large number of entity linking tools proposed over the past years, some tools work better on short mentions while others perform better when more contextual information is available. To this end, I proposed a solution that combines the results of distinct entity linking tools on the same corpus, leveraging their individual strengths on a per-mention basis. The proposed solution proved effective and outperformed each of the individual entity linking systems in a series of experiments.
An important component of most entity linking tools is the prior probability that a mention links to a given entity in a reference knowledge base, and this probability is usually computed over a static snapshot of the reference KB. However, an entity's popularity is temporally sensitive and may change due to short-term events. These changes may then be reflected in the KB, so EL tools can produce different results for the same mention at different times. I investigated how the prior probability changes over time and how overall disambiguation performance varies when using KB snapshots from different time periods.
The second part of this thesis is mainly concerned with short texts. Social media has become an integral part of modern society. Twitter, for instance, is one of the most popular social media platforms in the world, enabling people to share their opinions and post short messages about any subject on a daily basis. First, I presented an
approach to identifying informative messages during catastrophic events using deep learning techniques. Automatically detecting informative messages posted by users during major events can enable professionals involved in crisis management to estimate damage better, using only the relevant information posted on social media channels, and to act immediately. I have also analyzed Twitter messages posted during the Covid-19 pandemic. I collected 4 million tweets posted in Portuguese since the beginning of the pandemic and analyzed the debate around it, using topic modeling, sentiment analysis, and hashtag recommendation techniques to provide insights into the online discussion of the Covid-19 pandemic.
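The temporal sensitivity of the mention-to-entity prior probability discussed in this thesis abstract can be made concrete. A common way for EL tools to estimate this prior is from link (anchor) statistics in the KB, P(e | m) = count(m, e) / Σ_{e'} count(m, e'). The sketch below assumes that estimator. The helper names `prior` and `prior_shift` and the link counts for the mention "corona" are hypothetical, chosen only to show how a short-term event can change which entity a mention most likely denotes between two KB snapshots.

```python
from collections import Counter


def prior(anchor_counts, mention):
    """P(e | m) = count(m, e) / sum over e' of count(m, e') in one KB snapshot."""
    counts = anchor_counts.get(mention, Counter())
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()} if total else {}


def prior_shift(old, new, mention):
    """Change in P(e | m) per entity between two snapshots (new minus old)."""
    p_old, p_new = prior(old, mention), prior(new, mention)
    return {e: p_new.get(e, 0.0) - p_old.get(e, 0.0) for e in p_old.keys() | p_new.keys()}


# Hypothetical link counts for the mention "corona" in two KB snapshots.
old = {"corona": Counter({"Corona_(beer)": 60, "Corona_(astronomy)": 40})}
new = {"corona": Counter({"Corona_(beer)": 60, "Corona_(astronomy)": 40,
                          "Coronavirus": 100})}

p_old, p_new = prior(old, "corona"), prior(new, "corona")
print(max(p_old, key=p_old.get))  # Corona_(beer)
print(max(p_new, key=p_new.get))  # Coronavirus
```

With a static snapshot, a tool that relies heavily on this prior would keep choosing the older dominant sense; recomputing it on a newer snapshot shifts probability mass toward the entity that became popular.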
- …