Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
Despite substantial interest in applications of neural networks to
information retrieval, neural ranking models have only been applied to standard
ad hoc retrieval tasks over web pages and newswire documents. This paper
proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network),
a novel neural ranking model specifically designed for ranking short social
media posts. We identify document length, informal language, and heterogeneous
relevance signals as features that distinguish documents in our domain, and
present a model specifically designed with these characteristics in mind. Our
model uses hierarchical convolutional layers to learn latent semantic
soft-match relevance signals at the character, word, and phrase levels. A
pooling-based similarity measurement layer integrates evidence from multiple
types of matches among the query, the social media post, and the URLs
contained in the post. Extensive experiments using Twitter data from the TREC
Microblog Tracks 2011--2014 show that our model significantly outperforms prior
feature-based as well as existing neural ranking models. To the best of our
knowledge, this paper presents the first substantial work tackling search over
social media posts using neural ranking models.
Comment: AAAI 2019, 10 pages
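The pooling idea in the abstract above can be illustrated with a small sketch. This is not MP-HCNN: it has no convolutional layers and no learned parameters. It only shows, assuming toy embedding vectors and hypothetical helper names (`cosine`, `pooled_similarity`), how a query-by-post similarity matrix might be reduced to max-pooled and mean-pooled soft-match features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pooled_similarity(query_vecs, post_vecs):
    """For each query vector, return (max, mean) similarity over all post vectors."""
    features = []
    for q in query_vecs:
        sims = [cosine(q, p) for p in post_vecs]
        if not sims:
            # An empty post yields no evidence for this query term.
            features.append((0.0, 0.0))
            continue
        features.append((max(sims), sum(sims) / len(sims)))
    return features


# Toy 2-dimensional "embeddings" (assumed values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
post = [[1.0, 0.0], [1.0, 1.0]]
features = pooled_similarity(query, post)
```

In a full model, the same pooling would presumably be applied to representations at several granularities and to URL text as well as post text; here a single level stands in for all of them.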
A Continuously Growing Dataset of Sentential Paraphrases
A major challenge in paraphrase research is the lack of parallel corpora. In
this paper, we present a new method to collect large-scale sentential
paraphrases from Twitter by linking tweets through shared URLs. The main
advantage of our method is its simplicity: it removes the classifier or
human in the loop that previous work needed to select data before annotation
and before the subsequent application of paraphrase identification algorithms. We
present the largest human-labeled paraphrase corpus to date of 51,524 sentence
pairs and the first cross-domain benchmarking for automatic paraphrase
identification. In addition, we show that more than 30,000 new sentential
paraphrases can be easily and continuously captured every month at ~70%
precision, and demonstrate their utility for downstream NLP tasks through
phrasal paraphrase extraction. We make our code and data freely available.
Comment: 11 pages, accepted to EMNLP 201
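The URL-linking step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' released code: the function name `candidate_pairs`, the URL regular expression, and the sample tweets are all assumptions. The pairs it produces are only candidates, which the abstract says still need labeling.

```python
import re
from collections import defaultdict
from itertools import combinations

# Simplified URL matcher (an assumption; real tweets need more careful handling).
URL_PATTERN = re.compile(r"https?://\S+")


def candidate_pairs(tweets):
    """Group tweet texts by shared URL and return all within-group text pairs."""
    by_url = defaultdict(list)
    for tweet in tweets:
        urls = set(URL_PATTERN.findall(tweet))
        # Strip the URLs and normalize whitespace so only the sentence remains.
        text = " ".join(URL_PATTERN.sub("", tweet).split())
        for url in urls:
            if text and text not in by_url[url]:
                by_url[url].append(text)
    pairs = []
    for texts in by_url.values():
        pairs.extend(combinations(texts, 2))
    return pairs


tweets = [
    "Storm closes downtown bridge http://example.com/a",
    "Downtown bridge shut because of the storm http://example.com/a",
    "New phone released today http://example.com/b",
]
pairs = candidate_pairs(tweets)
```

A URL that appears in only one tweet contributes no pairs, so the third tweet is dropped here.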
Towards Deep Semantic Analysis Of Hashtags
Hashtags are semantico-syntactic constructs used across various social
networking and microblogging platforms to enable users to start a topic
specific discussion or classify a post into a desired category. Segmenting and
linking the entities present within the hashtags could therefore help in better
understanding and extraction of information shared across social media.
However, due to the lack of space delimiters in hashtags (e.g., #nsavssnowden),
the segmentation of hashtags into constituent entities ("NSA" and "Edward
Snowden" in this case) is not a trivial task. Most of the current
state-of-the-art social media analytics systems like Sentiment Analysis and
Entity Linking tend to either ignore hashtags, or treat them as a single word.
In this paper, we present a context-aware approach to segment and link entities
in the hashtags to a knowledge base (KB) entry, based on the context within the
tweet. Our approach segments and links the entities in hashtags such that the
coherence between hashtag semantics and the tweet is maximized. To the best of
our knowledge, no existing study addresses the issue of linking entities in
hashtags for extracting semantic information. We evaluate our method on two
different datasets, and demonstrate the effectiveness of our technique in
improving the overall entity linking in tweets via additional semantic
information provided by segmenting and linking entities in a hashtag.
Comment: To Appear in 37th European Conference on Information Retrieval
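The segmentation problem in the abstract above can be illustrated with a small sketch. It is not the authors' method, which links entities to a knowledge base. Here a hypothetical vocabulary stands in for the knowledge base, and word overlap with the tweet stands in for the coherence score. The names `segmentations` and `best_segmentation` are assumptions.

```python
def segmentations(tag, vocab):
    """Return every way to split `tag` into consecutive words from `vocab`."""
    if not tag:
        return [[]]
    results = []
    for end in range(1, len(tag) + 1):
        prefix = tag[:end]
        if prefix in vocab:
            for rest in segmentations(tag[end:], vocab):
                results.append([prefix] + rest)
    return results


def best_segmentation(hashtag, vocab, context_words):
    """Pick the split sharing the most words with the tweet; ties favor fewer pieces."""
    tag = hashtag.lstrip("#").lower()
    candidates = segmentations(tag, vocab)
    if not candidates:
        # No split is covered by the vocabulary; fall back to the whole tag.
        return [tag]
    context = {w.lower() for w in context_words}
    return max(
        candidates,
        key=lambda seg: (sum(w in context for w in seg), -len(seg)),
    )


# Hypothetical vocabulary and tweet text, for illustration only.
vocab = {"nsa", "vs", "snowden", "snow", "den"}
tweet_words = "Snowden leaks reveal more NSA surveillance".split()
result = best_segmentation("#nsavssnowden", vocab, tweet_words)
```

The recursion enumerates all splits without memoization, which is acceptable for short hashtags but would need caching for long ones.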