Keyphrase Extraction from Disaster-related Tweets
While keyphrase extraction has received considerable attention in recent
years, relatively few studies exist on extracting keyphrases from social media
platforms such as Twitter, and even fewer for extracting disaster-related
keyphrases from such sources. During a disaster, keyphrases can be extremely
useful for filtering relevant tweets that can enhance situational awareness.
Previously, joint training of two different layers of a stacked Recurrent
Neural Network for keyword discovery and keyphrase extraction had been shown to
be effective in extracting keyphrases from general Twitter data. We improve the
model's performance on both general Twitter data and disaster-related Twitter
data by incorporating contextual word embeddings, POS-tags, phonetics, and
phonological features. Moreover, we discuss the shortcomings of the often used
F1-measure for evaluating the quality of predicted keyphrases with respect to
the ground truth annotations. Instead of the F1-measure, we propose the use of
embedding-based metrics to better capture the correctness of the predicted
keyphrases. In addition, we present a novel extension of an
embedding-based metric that allows finer control over the penalty
for the difference in the number of ground-truth and predicted keyphrases.
Comment: 12 pages, 7 figures
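The embedding-based evaluation the abstract argues for can be illustrated with a minimal sketch: each keyphrase is embedded (here by averaging toy word vectors; in practice a pretrained contextual model would be used), and each ground-truth phrase is credited with its best cosine match among the predictions. The vectors, vocabulary, and the recall-style averaging below are illustrative assumptions, not the paper's exact metric.

```python
import math

# Toy word vectors standing in for real contextual embeddings (assumption:
# in practice these come from a pretrained model, and every word is covered).
VECS = {
    "flood":   [0.9, 0.1, 0.0],
    "water":   [0.8, 0.2, 0.1],
    "rescue":  [0.1, 0.9, 0.2],
    "help":    [0.2, 0.8, 0.3],
    "traffic": [0.1, 0.1, 0.9],
}

def phrase_vec(phrase):
    """Embed a phrase as the average of its word vectors."""
    words = [VECS[w] for w in phrase.split()]
    dim = len(words[0])
    return [sum(v[i] for v in words) / len(words) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def embedding_score(predicted, gold):
    """For each gold keyphrase, take its best cosine match among the
    predictions, then average: a simple recall-flavoured embedding metric."""
    scores = [max(cosine(phrase_vec(g), phrase_vec(p)) for p in predicted)
              for g in gold]
    return sum(scores) / len(scores)
```

Unlike exact-match F1, which scores zero when predicted and gold phrases merely paraphrase each other, `embedding_score(["water", "help"], ["flood", "rescue"])` stays high because the phrases are semantically close.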
On Identifying Hashtags in Disaster Twitter Data
Tweet hashtags have the potential to improve the search for information
during disaster events. However, there is a large number of disaster-related
tweets that do not have any user-provided hashtags. Moreover, only a small
number of tweets that contain actionable hashtags are useful for disaster
response. To facilitate progress on automatic identification (or extraction) of
disaster hashtags for Twitter data, we construct a unique dataset of
disaster-related tweets annotated with hashtags useful for filtering actionable
information. Using this dataset, we further investigate Long Short Term
Memory-based models within a Multi-Task Learning framework. The best performing
model achieves an F1-score as high as 92.22%. The dataset, code, and other
resources are available on GitHub.
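The F1-score reported above can be computed, for set-valued hashtag predictions, with a short micro-averaged routine. This is a generic sketch of the metric; the paper's exact evaluation protocol (e.g. token-level vs. set-level matching) may differ.

```python
def micro_f1(predicted, gold):
    """Micro-averaged F1 over per-tweet hashtag sets: pool true/false
    positives and false negatives across all tweets before averaging."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)   # hashtags predicted and annotated
        fp += len(pred - ref)   # predicted but not annotated
        fn += len(ref - pred)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with predictions `[{"#harvey", "#rescue"}, {"#flood"}]` against gold `[{"#harvey"}, {"#flood", "#help"}]`, both precision and recall are 2/3, so the F1-score is 2/3.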
Extracting keywords from tweets
In recent years, an enormous amount of information has become available on the Internet. Social networks are among the largest contributors to this growth in data volume. Twitter in particular has paved the way, as a social platform, for people and organisations to interact with each other, generating large volumes of data from which useful information can be extracted. Such a quantity of data can prove important, for example, if and when several individuals report symptoms of illness at the same time and in the same place. Automatically processing such a volume of information and deriving useful knowledge from it is, however, an impossible task for any human. Keyword extractors emerge in this context as a valuable tool that aims to ease this work by providing quick access to a set of terms that characterise a document.
In this work, we try to contribute to a better understanding of this problem by evaluating the effectiveness of YAKE (an unsupervised keyword extraction algorithm) on a collection of tweets, a type of text characterised not only by its short length but also by its unstructured nature. Although keyword extractors have been widely applied to generic texts, such as reports and articles, their application to tweets is scarce, and so far no dataset has been formally made available. To address this gap, we chose to develop and release a new data collection, an important contribution for the scientific community to foster new solutions in this domain. KWTweet was annotated by 15 annotators, resulting in 7736 annotated tweets. Based on this data, we then evaluated the effectiveness of YAKE! against 9 unsupervised keyword extraction baselines (TextRank, KP-Miner, SingleRank, PositionRank, TopicPageRank, MultipartiteRank, TopicRank, Rake, and TF.IDF). The results show that YAKE! outperforms its competitors, proving its effectiveness on this type of text. Finally, we provide a demo that showcases how YAKE! works: on this web platform, users can search by user or hashtag and obtain the most relevant keywords as a word cloud.
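One of the baselines named above, TF.IDF, is simple enough to sketch end to end: score each word in a tweet by its in-tweet frequency times the log-inverse of how many tweets in the collection contain it. The tiny collection and the exact weighting below are illustrative choices, not the evaluated implementation.

```python
import math
from collections import Counter

def tfidf_keywords(tweets, index, top_k=3):
    """Rank words of tweets[index] by TF * IDF over the collection:
    a simplified version of the TF.IDF baseline mentioned above."""
    docs = [t.lower().split() for t in tweets]
    n = len(docs)
    # Document frequency: in how many tweets does each word occur?
    df = Counter(w for d in docs for w in set(d))
    tf = Counter(docs[index])
    scores = {w: (c / len(docs[index])) * math.log(n / df[w])
              for w, c in tf.items()}
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_k]]
```

Words occurring in many tweets ("the", "downtown") get a low IDF and drop out, leaving the tweet-specific terms as keywords.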
Abstractive Opinion Tagging
In e-commerce, opinion tags refer to a ranked list of tags provided by the
e-commerce platform that reflect characteristics of reviews of an item. To
assist consumers to quickly grasp a large number of reviews about an item,
opinion tags are increasingly being applied by e-commerce platforms. Current
mechanisms for generating opinion tags rely on either manual labelling or
heuristic methods, which are time-consuming and ineffective. In this paper, we
propose the abstractive opinion tagging task, where systems have to
automatically generate a ranked list of opinion tags that are based on, but
need not occur in, a given set of user-generated reviews.
The abstractive opinion tagging task comes with three main challenges: (1)
the noisy nature of reviews; (2) the formal nature of opinion tags vs. the
colloquial language usage in reviews; and (3) the need to distinguish between
different items with very similar aspects. To address these challenges, we
propose an abstractive opinion tagging framework, named AOT-Net, to generate a
ranked list of opinion tags given a large number of reviews. First, a
sentence-level salience estimation component estimates each review's salience
score. Next, a review clustering and ranking component ranks reviews in two
steps: first, reviews are grouped into clusters and ranked by cluster size;
then, reviews within each cluster are ranked by their distance to the cluster
center. Finally, given the ranked reviews, a rank-aware opinion tagging
component incorporates an alignment feature and alignment loss to generate a
ranked list of opinion tags. To facilitate the study of this task, we create
and release a large-scale dataset, called eComTag, crawled from real-world
e-commerce websites. Extensive experiments conducted on the eComTag dataset
verify the effectiveness of the proposed AOT-Net in terms of various evaluation
metrics.
Comment: Accepted by WSDM 202
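The review clustering and ranking component described above admits a compact sketch: given review embeddings and precomputed cluster labels (e.g. from k-means; the clustering algorithm is an assumption here), clusters are ordered by size, larger first, and reviews within each cluster by distance to the cluster centre, closer first.

```python
import math

def rank_reviews(vectors, labels):
    """Two-step ranking: clusters ordered by size (larger first), reviews
    inside each cluster ordered by distance to the cluster centre (closer
    first). Returns review indices in ranked order."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    def centroid(members):
        dim = len(vectors[0])
        return [sum(vectors[i][d] for i in members) / len(members)
                for d in range(dim)]

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = []
    # Larger clusters (more widely shared opinions) come first.
    for members in sorted(clusters.values(), key=len, reverse=True):
        c = centroid(members)
        # Within a cluster, the most central review is the most representative.
        ranked.extend(sorted(members, key=lambda i: dist(vectors[i], c)))
    return ranked
```

The ranked review list then feeds the rank-aware tagging component, which is not sketched here.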
Knowledge Extraction from Open Data Repository
The explosion of affluent social networks, online communities, and jointly generated information resources has accelerated the convergence of technological and social networks, producing environments that reveal both the framework of the underlying information arrangements and the collective formation of their members. In studying the consequences of these developments, we have the opportunity to analyze the POD repository at an unprecedented scale and extract useful information from query log data. This chapter aims to improve the performance of a POD repository from a different point of view. We propose a novel query recommender system to help users shorten their query sessions. The idea is to find shortcuts that speed up the user's interaction with the open data repository and decrease the number of queries submitted. The proposed model, based on pseudo-relevance feedback, formalizes how knowledge mined from query logs can be exploited to help users rapidly satisfy their information needs.
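The session-shortcut idea can be illustrated with a deliberately simple frequency model built from logged sessions: for each query, record which queries followed it and recommend the most frequent follow-ups. This only approximates the chapter's pseudo-relevance-feedback model; the session data below is invented for illustration.

```python
from collections import Counter, defaultdict

def build_recommender(sessions):
    """Build a query -> follow-up-query table from logged sessions
    (each session is an ordered list of query strings)."""
    follow = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive (query, next query) pair in the log.
        for q, nxt in zip(session, session[1:]):
            follow[q][nxt] += 1

    def recommend(query, k=2):
        """Suggest the k most frequent follow-ups as session shortcuts."""
        return [q for q, _ in follow[query].most_common(k)]

    return recommend
```

A user who types a common starting query is offered the refinements that other users most often reached next, skipping intermediate steps.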
Unsupervised Detection of Sub-events in Large Scale Disasters
Social media plays a major role during and after major natural disasters
(e.g., hurricanes and large-scale fires), as people "on the ground" post
useful information on what is actually happening. Given the large amounts of
posts, a major challenge is identifying the information that is useful and
actionable. Emergency responders are largely interested in finding out what
events are taking place so they can properly plan and deploy resources. In this
paper we address the problem of automatically identifying important sub-events
(within a large-scale emergency "event", such as a hurricane). In particular,
we present a novel, unsupervised learning framework to detect sub-events in
Tweets for retrospective crisis analysis. We first extract noun-verb pairs and
phrases from raw tweets as sub-event candidates. Then, we learn a semantic
embedding of extracted noun-verb pairs and phrases, and rank them against a
crisis-specific ontology. We filter out noisy and irrelevant information then
cluster the noun-verb pairs and phrases so that the top-ranked ones describe
the most important sub-events. Through quantitative experiments on two large
crisis data sets (Hurricane Harvey and the 2015 Nepal Earthquake), we
demonstrate the effectiveness of our approach over the state-of-the-art. Our
qualitative evaluation shows better performance compared to our baseline.
Comment: AAAI-20 Social Impact Track
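The ranking step, scoring sub-event candidates against a crisis-specific ontology, can be sketched as a best-match cosine similarity over embeddings. The toy phrase embeddings and ontology terms below are stand-ins; the paper learns its semantic embeddings and uses a real crisis ontology.

```python
import math

# Toy phrase embeddings (assumption: the paper learns these from data).
EMB = {
    "bridge collapse":   [0.9, 0.1],
    "water rising":      [0.8, 0.3],
    "concert cancelled": [0.1, 0.9],
    "flooding":          [0.9, 0.2],
    "evacuation":        [0.7, 0.4],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_candidates(candidates, ontology):
    """Score each candidate phrase by its best cosine similarity to any
    ontology concept, then sort best-first; low scorers are the noisy,
    crisis-irrelevant candidates to filter out."""
    scored = [(max(cosine(EMB[c], EMB[t]) for t in ontology), c)
              for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]
```

Crisis-relevant candidates ("bridge collapse", "water rising") rank above off-topic ones ("concert cancelled"), which mirrors the filtering behaviour described above.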
National security and social media monitoring: a presentation of the emotive and related systems
Today social media streams, such as Twitter, represent vast amounts of 'real-time' daily streaming data. Topics on these streams cover the full range of human communication, from banal banter to serious reactions to events and information sharing about any imaginable product, item or entity. It has now become the norm for publicly visible events to break over social media streams first, with mainstream media only then picking up on the news. It has been suggested in the literature that social media is a valid, valuable and effective real-time tool for gauging public subjective reactions to events and entities. Given the vast amounts of data generated daily on social media streams, monitoring and gauging public reactions has to be automated and, above all, scalable, since expert human monitoring is generally infeasible. This paper presents the EMOTIVE system, a project funded jointly by the DSTL (Defence Science and Technology Laboratory) and the EPSRC, which focuses on monitoring fine-grained emotional responses to events of national security importance. Similar systems for monitoring national security events are also presented, and the primary traits of such national security social media monitoring systems are introduced and discussed.
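At its simplest, monitoring fine-grained emotional responses reduces to tallying emotion-bearing words over a tweet stream. The lexicon and emotion categories below are illustrative assumptions only, not the EMOTIVE system's actual resources or method.

```python
# Minimal lexicon mapping words to fine-grained emotion categories
# (assumption: a real system uses far richer lexicons and classifiers).
LEXICON = {
    "furious": "anger", "outraged": "anger",
    "terrified": "fear", "scared": "fear",
    "relieved": "relief",
}

def emotion_counts(tweets):
    """Count emotion-category hits across a batch of tweets."""
    counts = {}
    for tweet in tweets:
        for word in tweet.lower().split():
            # Strip common trailing punctuation before the lexicon lookup.
            emo = LEXICON.get(word.strip("#!,."))
            if emo:
                counts[emo] = counts.get(emo, 0) + 1
    return counts
```

Aggregating such counts per time window gives the kind of scalable, automated reaction gauge the paragraph argues for, without any human in the loop.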