Benchmarking news recommendations: the CLEF NewsREEL use case
The CLEF NewsREEL challenge is a campaign-style evaluation lab allowing participants to evaluate and optimize news recommender algorithms. The goal is to create an algorithm that is able to generate news items that users would click, respecting a strict time constraint. The lab challenges participants to either compete in a "living lab" (Task 1) or perform an evaluation that replays recorded streams (Task 2). In this report, we discuss the objectives and challenges of the NewsREEL lab, summarize last year's campaign, and outline the main research challenges that can be addressed by participating in NewsREEL 2016.
Query Expansion with Locally-Trained Word Embeddings
Continuous space word embeddings have received a great deal of attention in
the natural language processing and machine learning communities for their
ability to model term similarity and other relationships. We study the use of
term relatedness in the context of query expansion for ad hoc information
retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when
trained globally, underperform corpus- and query-specific embeddings for
retrieval tasks. These results suggest that other tasks benefiting from global
embeddings may also benefit from local embeddings.
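As a rough illustration of embedding-based query expansion, the sketch below ranks vocabulary terms by cosine similarity to the centroid of the query terms and appends the top `k`. It is not the paper's exact method: the function names and the toy two-dimensional vectors are hypothetical. In the local setting described above, the `embeddings` table would be trained on documents related to the query rather than on a global corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_query(query_terms, embeddings, k=2):
    """Append the k terms closest to the centroid of the query terms."""
    known = [t for t in query_terms if t in embeddings]
    if not known:
        return list(query_terms)
    dim = len(embeddings[known[0]])
    centroid = [sum(embeddings[t][i] for t in known) / len(known)
                for i in range(dim)]
    candidates = [(cosine(centroid, vec), term)
                  for term, vec in embeddings.items()
                  if term not in query_terms]
    candidates.sort(reverse=True)  # most similar first
    return list(query_terms) + [term for _, term in candidates[:k]]

# Hypothetical embeddings, as if trained only on car-related documents.
toy_local_embeddings = {
    "jaguar": [0.9, 0.1],
    "car": [0.8, 0.2],
    "engine": [0.85, 0.15],
    "jungle": [0.1, 0.9],
    "cat": [0.2, 0.8],
}

expanded = expand_query(["jaguar"], toy_local_embeddings, k=2)
print(expanded)
```

With these toy vectors the expansion terms come from the automotive sense of "jaguar", which is the kind of corpus-specific behaviour the abstract attributes to local embeddings.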
REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
How can we extract useful information from a security forum? We focus on
identifying threads of interest to a security professional: (a) alerts of
worrisome events, such as attacks, (b) offering of malicious services and
products, (c) hacking information to perform malicious acts, and (d) useful
security-related experiences. The analysis of security forums is in its infancy
despite several promising recent works. Novel approaches are needed to address
the challenges in this domain: (a) the difficulty in specifying the "topics" of
interest efficiently, and (b) the unstructured and informal nature of the text.
We propose REST, a systematic methodology to: (a) identify threads of interest
based on a, possibly incomplete, bag of words, and (b) classify them into one
of the four classes above. The key novelty of the work is a multi-step weighted
embedding approach: we project words, threads and classes in appropriate
embedding spaces and establish relevance and similarity there. We evaluate our
method with real data from three security forums with a total of 164K posts and
21K threads. First, REST is robust to the initial keyword selection: it can extend the
user-provided keyword set and thus recover from missing keywords.
Second, REST categorizes the threads into the classes of interest with superior
accuracy compared to five other methods: REST exhibits an accuracy between
63.3% and 76.9%. We see our approach as a first step for harnessing the wealth of
information of online forums in a user-friendly way, since the user can loosely
specify her keywords of interest.
Technology Assisted Reviews: Finding the Last Few Relevant Documents by Asking Yes/No Questions to Reviewers
The goal of a technology-assisted review is to achieve high recall with low
human effort. Continuous active learning algorithms have demonstrated good
performance in locating the majority of relevant documents in a collection;
however, their performance reaches a plateau once 80%-90% of them have been
found. Finding the last few relevant documents typically requires exhaustively
reviewing the collection. In this paper, we propose a novel method to identify
these last few, but significant, documents efficiently. Our method makes the
hypothesis that entities carry vital information in documents, and that
reviewers can answer questions about the presence or absence of an entity in
the missing relevant documents. Based on this, we devise a sequential Bayesian
search method that selects the optimal sequence of questions to ask. The
experimental results show that our proposed method can greatly improve
performance while requiring less reviewing effort. Comment: This paper is accepted by SIGIR 201
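The idea of narrowing down a missing document with yes/no entity questions can be sketched as a greedy Bayesian search. This is a simplified illustration, not the paper's algorithm. It assumes a single missing document, noiseless reviewer answers, and a greedy rule that asks about the entity whose posterior mass is closest to 0.5. All names and the toy data are hypothetical.

```python
def pick_question(posterior, doc_entities, asked):
    """Pick the unasked entity whose 'yes' probability is closest to 0.5."""
    entities = sorted({e for ents in doc_entities.values() for e in ents} - asked)
    best, best_gap = None, None
    for entity in entities:
        mass = sum(p for d, p in posterior.items() if entity in doc_entities[d])
        gap = abs(mass - 0.5)
        if best_gap is None or gap < best_gap:
            best, best_gap = entity, gap
    return best

def update(posterior, doc_entities, entity, answer_yes):
    """Bayes update for a noiseless answer: drop inconsistent documents."""
    kept = {d: p for d, p in posterior.items()
            if (entity in doc_entities[d]) == answer_yes}
    total = sum(kept.values())
    return {d: p / total for d, p in kept.items()} if total else posterior

# Hypothetical candidate documents and the entities they mention.
doc_entities = {
    "d1": {"acme", "paris"},
    "d2": {"acme", "berlin"},
    "d3": {"globex", "paris"},
    "d4": {"globex", "berlin"},
}
posterior = {d: 1.0 / len(doc_entities) for d in doc_entities}  # uniform prior
target = "d3"  # the document the simulated reviewer has in mind

asked, questions = set(), []
while len(posterior) > 1:
    entity = pick_question(posterior, doc_entities, asked)
    if entity is None:  # no questions left to ask
        break
    answer = entity in doc_entities[target]  # simulated reviewer answer
    posterior = update(posterior, doc_entities, entity, answer)
    asked.add(entity)
    questions.append(entity)

print(questions, posterior)
```

In this toy run, two questions are enough to isolate one document out of four, which mirrors the motivation of replacing exhaustive review with a short sequence of targeted questions.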
Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
While billions of non-English speaking users rely on search engines every
day, the problem of ad-hoc information retrieval is rarely studied for
non-English languages. This is primarily due to a lack of data sets that are
suitable to train ranking algorithms. In this paper, we tackle the lack of data
by leveraging pre-trained multilingual language models to transfer a retrieval
system trained on English collections to non-English queries and documents. Our
model is evaluated in a zero-shot setting, meaning that we use it to predict
relevance scores for query-document pairs in languages never seen during
training. Our results show that the proposed approach can significantly
outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and
Spanish. We also show that augmenting the English training collection with some
examples from the target language can sometimes improve performance. Comment: ECIR 2020 (short
Language Models
Contains fulltext: 227630.pdf (preprint version) (Open Access
Third International Workshop on Gamification for Information Retrieval (GamifIR'16)
Stronger engagement and greater participation are often crucial
to reach a goal or to solve an issue, such as the emerging
employee engagement crisis, insufficient knowledge sharing,
and chronic procrastination. In many cases we need and
search for tools to beat procrastination or to change people’s
habits. Gamification is an approach that learns from often fun,
creative and engaging games. In principle, it is about understanding
games and applying game design elements in
non-gaming environments. This offers possibilities for wide-ranging
improvements, for example more accurate work, better
retention rates and more cost-effective solutions, by making
motivations for participating more intrinsic than conventional
methods do. In the context of Information Retrieval (IR)
it is not hard to imagine that many tasks could benefit from
gamification techniques. Besides several manual annotation
tasks of data sets for IR research, user participation is important
in order to gather implicit or even explicit feedback
to feed the algorithms. Gamification, however, comes with
its own challenges and its adoption in IR is still in its infancy.
Given the enormous response to the first and second
GamifIR workshops, which were both co-located with ECIR,
and the broad range of topics discussed, we have now organized
the third workshop at SIGIR 2016 to address a range of
emerging challenges and opportunities.