Using WordNet for query expansion: ADAPT @ FIRE 2016 microblog track
User-generated content on social websites such as Twitter
is known to be an important source of real-time information on significant events as they occur, for example natural
disasters. Our participation in the FIRE 2016 Microblog
track seeks to exploit WordNet as an external resource
for synonym-based query expansion to support improved
matching between search topics and the target Tweet collection. The results of our participation show that
this is an effective method when used with a standard BM25-based
information retrieval system.
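As an illustration of the idea in this abstract, the following is a minimal sketch of synonym-based query expansion in front of a BM25 ranker. It is not the authors' system: the `TOY_SYNONYMS` dictionary is a hypothetical stand-in for WordNet lookups, the function names and the example tweets are invented for illustration, and the BM25 parameters `k1` and `b` are common defaults rather than values from the paper.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet synonym lookups (assumption, not real WordNet data).
TOY_SYNONYMS = {
    "quake": ["earthquake", "tremor"],
    "aid": ["relief", "assistance"],
}

def expand_query(terms, synonyms):
    """Append the synonyms of each query term, keeping order and dropping duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Invented example tweets, already tokenized.
tweets = [
    "earthquake relief teams reach the village".split(),
    "traffic jam near the stadium".split(),
    "quake victims need aid".split(),
]
query = ["quake", "aid"]
plain = bm25_scores(query, tweets)
expanded = bm25_scores(expand_query(query, TOY_SYNONYMS), tweets)
```

In this toy collection the first tweet shares no term with the original query, so it only receives a non-zero score once the query is expanded. That is the vocabulary-mismatch effect the expansion is meant to address.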
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology to Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
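The abstract reports agreement as a Kappa value without naming the variant. As a rough illustration of how such a figure is obtained, the sketch below computes Cohen's kappa for two annotators, which is an assumption here; the function name and the example labels are invented and are not EveTAR data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both annotators used a single identical label throughout.
    """
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented relevance judgments (1 = relevant, 0 = not relevant).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 8 of 10 items and chance agreement is 0.5, so kappa comes out at 0.6.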
Modeling Temporal Evidence from External Collections
Newsworthy events are broadcast through multiple mediums and prompt the
crowds to produce comments on social media. In this paper, we propose to
leverage these behavioral dynamics to estimate the most relevant time periods
for an event (i.e., query). Recent advances have shown how to improve the
estimation of the temporal relevance of such topics. In this approach, we build
on two major novelties. First, we mine temporal evidence from hundreds of
external sources into topic-based external collections to improve the
robustness of the detection of relevant time periods. Second, we propose a
formal retrieval model that generalizes the use of the temporal dimension
across different aspects of the retrieval process. In particular, we show that
temporal evidence of external collections can be used to (i) infer a topic's
temporal relevance, (ii) select the query expansion terms, and (iii) re-rank
the final results for improved precision. Experiments with TREC Microblog
collections show that the proposed time-aware retrieval model makes an
effective and extensive use of the temporal dimension to improve search results
over the most recent temporal models. Interestingly, we observe a strong
correlation between precision and the temporal distribution of retrieved and
relevant documents. Comment: To appear in WSDM 201
TREC Incident Streams: Finding Actionable Information on Social Media
The Text Retrieval Conference (TREC) Incident Streams track is a new initiative that aims to mature social
media-based emergency response technology. This initiative advances the state of the art in this area through an
evaluation challenge, which attracts researchers and developers from across the globe. The 2018 edition of the track
provides a standardized evaluation methodology and an ontology of emergency-relevant social media information types,
proposes a scale for information criticality, and releases a dataset containing fifteen test events and approximately
20,000 labeled tweets. Analysis of this dataset reveals a significant amount of actionable information on social
media during emergencies (> 10%). While this data is valuable for emergency response efforts, analysis of the
39 state-of-the-art systems demonstrates a performance gap in identifying this data. We therefore find the current
state of the art is insufficient for emergency responders’ requirements, particularly for rare actionable information
for which there is little prior training data available.