Improving Distributed Representations of Tweets - Present and Future
Unsupervised representation learning for tweets is an important research
field that helps solve several business applications such as sentiment
analysis, hashtag prediction, paraphrase detection and microblog ranking. A
good tweet representation learning model must handle the idiosyncratic nature
of tweets, which poses several challenges such as short length, informal words,
unusual grammar and misspellings. However, little prior work surveys
representation learning models with a focus on tweets. In this
work, we organize the models by their objective functions, which aids
understanding of the literature. We also outline future directions
that we believe will advance this field by enabling high-quality
tweet representation learning models.
Comment: To be presented in Student Research Workshop (SRW) at ACL 201
Adaptive Representations for Tracking Breaking News on Twitter
Twitter is often the most up-to-date source for finding and tracking breaking
news stories. Therefore, there is considerable interest in developing filters
for tweet streams in order to track and summarize stories. This is a
non-trivial text analytics task as tweets are short, and standard retrieval
methods often fail as stories evolve over time. In this paper we examine the
effectiveness of adaptive mechanisms for tracking and summarizing breaking news
stories. We evaluate the effectiveness of these mechanisms on a number of
recent news events for which manually curated timelines are available.
Assessments based on ROUGE metrics indicate that adaptive approaches are
best suited for tracking evolving stories on Twitter.
Comment: 8 pages
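The abstract reports results in ROUGE metrics without specifying which variant. As a reminder of what such a score measures, here is a minimal sketch of ROUGE-N recall (clipped n-gram overlap divided by the number of reference n-grams). The whitespace tokenization and lowercasing are simplifying assumptions, and the function names are illustrative rather than taken from any ROUGE package.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count.

    Assumes naive lowercase whitespace tokenization (an illustrative choice).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram counts at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Example: a generated timeline entry scored against a curated reference entry.
score = rouge_n_recall("storm hits the coast", "the storm hits the east coast", n=1)
```

Here four of the six reference unigrams are covered, so the unigram recall is 4/6. Clipping with `min` prevents a summary from gaining credit by repeating a word more often than the reference does.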
Extracting News Events from Microblogs
The Twitter stream has become a large source of information for many people, but
the volume of tweets and the noisy nature of their content have long made
harvesting knowledge from Twitter a challenging task for researchers.
Aiming to overcome some of the main challenges of extracting
hidden information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. Our
approach has three steps. The first step uses a neural network (deep
learning) model to detect news-relevant tweets in the stream. The second step
applies a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step ranks the detected events
by the size of the event clusters and the growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach delivers
state-of-the-art performance in detecting current (real) news events.
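The abstract does not describe its novel clustering algorithm or its exact ranking formula. The sketch below therefore illustrates steps two and three only under stated assumptions: the tweets are assumed to have already passed the step-one news filter; a generic single-pass, cosine-threshold clustering over bag-of-words vectors stands in for the paper's algorithm; and the score (cluster size times a smoothed growth ratio between the latest time window and the one before it) is a hypothetical formula. The names `single_pass_cluster`, `rank_events`, `threshold` and `window` are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Cluster:
    """A candidate news event: summed term counts plus member timestamps."""

    def __init__(self):
        self.centroid = Counter()  # summed term counts of member tweets
        self.timestamps = []

    def add(self, vec, ts):
        self.centroid.update(vec)
        self.timestamps.append(ts)


def single_pass_cluster(stream, threshold=0.5):
    """Generic stand-in (not the paper's algorithm): assign each (timestamp, text)
    to its most similar cluster, or start a new cluster if no similarity reaches
    the threshold."""
    clusters = []
    for ts, text in stream:
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c.centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = Cluster()
            clusters.append(best)
        best.add(vec, ts)
    return clusters


def rank_events(clusters, now, window=60.0):
    """Hypothetical score: cluster size times growth of tweet frequency
    (tweets in the latest window relative to the window before it)."""

    def score(c):
        recent = sum(1 for t in c.timestamps if now - window < t <= now)
        previous = sum(1 for t in c.timestamps if now - 2 * window < t <= now - window)
        growth = (recent + 1) / (previous + 1)  # +1 smoothing avoids division by zero
        return len(c.timestamps) * growth

    return sorted(clusters, key=score, reverse=True)
```

A lower `threshold` merges more tweets into fewer, broader events, and a shorter `window` makes the growth term react faster to bursts. Any real system would have to tune both against annotated data such as the corpus mentioned above.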
Synapse at CAp 2017 NER challenge: Fasttext CRF
We present our system for the CAp 2017 NER challenge, which addresses named
entity recognition on French tweets. Our system leverages unsupervised learning
on a larger dataset of French tweets to learn features that feed a CRF model. It
was ranked first, without using any gazetteer or structured external data, with
an F-measure of 58.89%. To the best of our knowledge, it is the first system
to use fastText embeddings (which include subword representations) and an
embedding-based sentence representation for NER.
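To show why subword representations help with the misspellings and informal words common in tweets, here is a minimal sketch of fastText-style subword extraction. Each word is wrapped in `<` and `>`, and its character n-grams (lengths 3 to 6 by default in fastText) plus the whole bracketed word serve as its units. Several simplifications are assumptions for illustration: a plain dictionary `ngram_table` replaces fastText's hashed n-gram buckets, word vectors are averaged from the known n-gram vectors, and the sentence representation is a simple mean of word vectors. The abstract does not say how its sentence representation is computed, and how these vectors become CRF features is not shown.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of '<word>' (lengths minn..maxn) plus the whole bracketed word."""
    token = "<" + word + ">"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    if token not in grams:
        grams.append(token)
    return grams


def mean(vectors, dim):
    """Element-wise mean of equal-length vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def word_vector(word, ngram_table, dim):
    """Average the vectors of the word's n-grams found in a hypothetical lookup table."""
    known = [ngram_table[g] for g in char_ngrams(word) if g in ngram_table]
    return mean(known, dim)


def sentence_vector(tokens, ngram_table, dim):
    """Hypothetical sentence representation: mean of the word vectors."""
    return mean([word_vector(t, ngram_table, dim) for t in tokens], dim)
```

Because a misspelled word still shares most of its character n-grams with the correct spelling, it receives a nearby vector instead of an unknown-word placeholder. This is the property that makes subword embeddings attractive for noisy tweet text.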
- …