    Benchmarking news recommendations: the CLEF NewsREEL use case

    The CLEF NewsREEL challenge is a campaign-style evaluation lab allowing participants to evaluate and optimize news recommender algorithms. The goal is to create an algorithm that is able to generate news items that users would click, respecting a strict time constraint. The lab challenges participants to compete in either a "living lab" (Task 1) or perform an evaluation that replays recorded streams (Task 2). In this report, we discuss the objectives and challenges of the NewsREEL lab, summarize last year's campaign and outline the main research challenges that can be addressed by participating in NewsREEL 2016

    Query Expansion with Locally-Trained Word Embeddings

    Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus and query specific embeddings for retrieval tasks. These results suggest that other tasks benefiting from global embeddings may also benefit from local embeddings
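The expansion step described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it shows generic embedding-based query expansion (nearest neighbours of the query centroid by cosine similarity), and the function names, the toy two-dimensional vectors, and the parameter `k` are all assumptions made for illustration. In a "local" setting the embedding table would be trained on documents related to the query rather than on a global corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand_query(query_terms, embeddings, k=2):
    """Append the k vocabulary terms closest to the query centroid."""
    known = [embeddings[t] for t in query_terms if t in embeddings]
    if not known:
        return list(query_terms)
    dim = len(known[0])
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    candidates = [
        (cosine(centroid, vec), term)
        for term, vec in embeddings.items()
        if term not in query_terms
    ]
    # Highest similarity first; ties broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return list(query_terms) + [term for _, term in candidates[:k]]

# Hypothetical toy embeddings; real ones would come from a trained model.
toy_embeddings = {
    "gasoline": [1.0, 0.0],
    "tax": [0.0, 1.0],
    "fuel": [0.7, 0.7],
    "levy": [0.4, 0.6],
    "banana": [-1.0, 0.2],
}

expanded = expand_query(["gasoline", "tax"], toy_embeddings, k=2)
```

Swapping `toy_embeddings` for vectors trained on a query-specific document set is the only change needed to move from the global to the local variant in this sketch.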

    Technology Assisted Reviews: Finding the Last Few Relevant Documents by Asking Yes/No Questions to Reviewers

    The goal of a technology-assisted review is to achieve high recall with low human effort. Continuous active learning algorithms have demonstrated good performance in locating the majority of relevant documents in a collection; however, their performance plateaus once 80%-90% of them have been found. Finding the last few relevant documents typically requires exhaustively reviewing the collection. In this paper, we propose a novel method to identify these last few, but significant, documents efficiently. Our method hypothesizes that entities carry vital information in documents, and that reviewers can answer questions about the presence or absence of an entity in the missing relevant documents. Based on this, we devise a sequential Bayesian search method that selects the optimal sequence of questions to ask. The experimental results show that our proposed method can greatly improve performance while requiring less reviewing effort. Comment: This paper is accepted by SIGIR 201

    Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

    While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets that are suitable to train ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance. Comment: ECIR 2020 (short

    Third International Workshop on Gamification for Information Retrieval (GamifIR'16)

    Stronger engagement and greater participation are often crucial to reaching a goal or solving an issue, such as the emerging employee engagement crisis, insufficient knowledge sharing, and chronic procrastination. In many cases we need and search for tools to beat procrastination or to change people’s habits. Gamification is an approach that learns from games, which are often fun, creative and engaging. In principle, it is about understanding games and applying game design elements in non-gaming environments. This offers possibilities for wide-ranging improvements, for example more accurate work, better retention rates and more cost-effective solutions, by making the motivation for participating more intrinsic than conventional methods do. In the context of Information Retrieval (IR) it is not hard to imagine that many tasks could benefit from gamification techniques. Besides several manual annotation tasks of data sets for IR research, user participation is important in order to gather implicit or even explicit feedback to feed the algorithms. Gamification, however, comes with its own challenges and its adoption in IR is still in its infancy. Given the enormous response to the first and second GamifIR workshops that were both co-located with ECIR, and the broad range of topics discussed, we have now organized the third workshop at SIGIR 2016 to address a range of emerging challenges and opportunities.

    Deriving query suggestions for site search

    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T

    Enhanced information retrieval using domain-specific recommender models

    The objective of an information retrieval (IR) system is to retrieve relevant items which meet a user information need. There is currently significant interest in personalized IR which seeks to improve IR effectiveness by incorporating a model of the user’s interests. However, in some situations there may be no opportunity to learn about the interests of a specific user on a certain topic. In our work, we propose an IR approach which combines a recommender algorithm with IR methods to improve retrieval for domains where the system has no opportunity to learn prior information about the user’s knowledge of a domain for which they have not previously entered a query. We use search data from other previous users interested in the same topic to build a recommender model for this topic. When a user enters a query on a topic, new to this user, an appropriate recommender model is selected and used to predict a ranking which the user may find interesting based on the behaviour of previous users with similar queries. The recommender output is integrated with a standard IR method in a weighted linear combination to provide a final result for the user. Experiments using the INEX 2009 data collection with a simulated recommender training set show that our approach can improve on a baseline IR system
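The weighted linear combination mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the min-max normalisation, the weight `alpha`, the function names, and the example scores are all hypothetical choices made so that the two score lists are comparable before they are mixed.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def combine(ir_scores, rec_scores, alpha=0.7):
    """Rank documents by alpha * IR score + (1 - alpha) * recommender score.

    A document missing from one list contributes 0 for that component.
    """
    ir_norm = min_max(ir_scores)
    rec_norm = min_max(rec_scores)
    docs = set(ir_norm) | set(rec_norm)
    fused = {
        doc: alpha * ir_norm.get(doc, 0.0) + (1 - alpha) * rec_norm.get(doc, 0.0)
        for doc in docs
    }
    # Highest fused score first; ties broken by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))

# Hypothetical scores for one query.
ir = {"d1": 12.0, "d2": 9.0, "d3": 3.0}       # e.g. retrieval-model scores
rec = {"d2": 0.9, "d3": 0.8, "d4": 0.1}       # e.g. recommender predictions

ranking = combine(ir, rec, alpha=0.5)
```

Setting `alpha=1.0` reduces the sketch to the IR ranking alone, which makes it easy to compare against a baseline while tuning the weight.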