Towards Deep Semantic Analysis Of Hashtags
Hashtags are semantico-syntactic constructs used across various social
networking and microblogging platforms to enable users to start a topic
specific discussion or classify a post into a desired category. Segmenting and
linking the entities present within hashtags could therefore help in better
understanding and extracting the information shared across social media.
However, because hashtags lack space delimiters (e.g., #nsavssnowden),
the segmentation of hashtags into constituent entities ("NSA" and "Edward
Snowden" in this case) is not a trivial task. Most of the current
state-of-the-art social media analytics systems like Sentiment Analysis and
Entity Linking tend to either ignore hashtags, or treat them as a single word.
In this paper, we present a context-aware approach to segment and link entities
in the hashtags to a knowledge base (KB) entry, based on the context within the
tweet. Our approach segments and links the entities in hashtags such that the
coherence between hashtag semantics and the tweet is maximized. To the best of
our knowledge, no existing study addresses the issue of linking entities in
hashtags for extracting semantic information. We evaluate our method on two
different datasets, and demonstrate the effectiveness of our technique in
improving the overall entity linking in tweets via additional semantic
information provided by segmenting and linking entities in a hashtag.
Comment: To appear in the 37th European Conference on Information Retrieval
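The segmentation difficulty described in this abstract can be illustrated with a minimal sketch. It is not the authors' context-aware method: it is a plain dictionary-based word-break with dynamic programming that ignores the tweet context entirely. The function name `segment_hashtag` and the vocabulary `TOY_VOCAB` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Set


def segment_hashtag(hashtag: str, vocab: Set[str]) -> Optional[List[str]]:
    """Split a hashtag into vocabulary words, preferring the fewest tokens.

    Assumption: a simple dictionary lookup stands in for the knowledge-base
    and context signals that the abstract describes.
    """
    text = hashtag.lstrip("#").lower()
    n = len(text)
    # best[i] is the fewest-token segmentation of text[:i], or None if none exists.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = text[start:end]
            if prefix is not None and piece in vocab:
                candidate = prefix + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


# Hypothetical toy vocabulary; "snow" and "den" create a competing split.
TOY_VOCAB = {"nsa", "vs", "snowden", "snow", "den"}

print(segment_hashtag("#nsavssnowden", TOY_VOCAB))  # ['nsa', 'vs', 'snowden']
```

Even in this toy case the vocabulary admits two splits ("snowden" versus "snow" + "den"). The fewest-token rule happens to resolve it here, but in general such ambiguity is where context from the tweet, as proposed in the abstract, would be needed.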
Summarizing Indian Languages using Multilingual Transformers based Models
With the advent of multilingual models such as mBART, mT5, and IndicBART,
summarization in low-resource Indian languages is now receiving considerable
attention. However, the number of available datasets remains low. In this work,
we (Team HakunaMatata) study how these multilingual models perform on
summarization datasets whose source and target texts are in Indian languages.
We experiment with the IndicBART and mT5 models and report ROUGE-1, ROUGE-2,
ROUGE-3 and ROUGE-4 scores as performance metrics.
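For readers unfamiliar with the ROUGE-N scores mentioned in this abstract, a minimal sketch of the computation follows. It assumes whitespace tokenization and a single reference summary; the abstract does not state which tokenizer, ROUGE implementation, or variant (recall, precision, or F1) the authors used, so the helper names `ngrams` and `rouge_n` are illustrative only.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int) -> Dict[str, float]:
    """ROUGE-N from clipped n-gram overlap between candidate and reference.

    Assumption: plain whitespace tokenization, no stemming or stopword removal.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy sentences, not taken from any dataset in the paper.
generated = "the cat sat on the mat"
gold = "the cat lay on the mat"
for n in range(1, 5):
    print(n, rouge_n(generated, gold, n))
```

Higher-order scores such as ROUGE-3 and ROUGE-4 fall quickly on short summaries, because a single differing word breaks every n-gram that spans it.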
Passage Retrieval Using Answer Type Profiles in Question Answering
PACLIC 23 / City University of Hong Kong / 3-5 December 200
Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction Mention Extraction
Social media is a useful platform to share health-related information due to
its vast reach. This makes it a good candidate for public-health monitoring
tasks, specifically for pharmacovigilance. We study the problem of extraction
of Adverse-Drug-Reaction (ADR) mentions from social media, particularly from
Twitter. Medical information extraction from social media is challenging,
mainly due to the short and highly informal nature of the text, as compared to
more technical and formal medical reports.
Current methods in ADR mention extraction rely on supervised learning
methods, which suffer from the scarcity of labeled data. The state-of-the-art
method uses deep neural networks, specifically a class of Recurrent Neural
Networks (RNNs) known as Long Short-Term Memory networks (LSTMs)
\cite{hochreiter1997long}. Deep neural networks, due to their large number of
free parameters, rely heavily on large annotated corpora for learning the end
task. But in the real world, it is hard to get large labeled datasets, mainly
due to the heavy cost associated with manual annotation. Towards this end, we
propose a
novel semi-supervised learning-based RNN model, which can also leverage the
unlabeled data present in abundance on social media. Through experiments we
demonstrate the effectiveness of our method, achieving state-of-the-art
performance in ADR mention extraction.
Comment: Accepted at DTMBIO workshop, CIKM 2017. To appear in BMC
Bioinformatics. Please cite that version.