13,288 research outputs found

    Benchmarking news recommendations: the CLEF NewsREEL use case

    Get PDF
    The CLEF NewsREEL challenge is a campaign-style evaluation lab allowing participants to evaluate and optimize news recommender algorithms. The goal is to create an algorithm that is able to generate news items that users would click, respecting a strict time constraint. The lab challenges participants to compete in either a "living lab" (Task 1) or perform an evaluation that replays recorded streams (Task 2). In this report, we discuss the objectives and challenges of the NewsREEL lab, summarize last year's campaign and outline the main research challenges that can be addressed by participating in NewsREEL 2016

    A Vertical PRF Architecture for Microblog Search

    Full text link
    In microblog retrieval, query expansion can be essential to obtain good search results due to the short size of queries and posts. Since information in microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance feedback (PRF) with an external corpus has a higher chance of retrieving more relevant documents and improving ranking. In this paper, we focus on the research question:how can we reduce the query expansion computational cost while maintaining the same retrieval precision as standard PRF? Therefore, we propose to accelerate the query expansion step of pseudo-relevance feedback. The hypothesis is that using an expansion corpus organized into verticals for expanding the query, will lead to a more efficient query expansion process and improved retrieval effectiveness. Thus, the proposed query expansion method uses a distributed search architecture and resource selection algorithms to provide an efficient query expansion process. Experiments on the TREC Microblog datasets show that the proposed approach can match or outperform standard PRF in MAP and NDCG@30, with a computational cost that is three orders of magnitude lower.Comment: To appear in ICTIR 201

    Report on the Information Retrieval Festival (IRFest2017)

    Get PDF
    The Information Retrieval Festival took place in April 2017 in Glasgow. The focus of the workshop was to bring together IR researchers from the various Scottish universities and beyond in order to facilitate more awareness, increased interaction and reflection on the status of the field and its future. The program included an industry session, research talks, demos and posters as well as two keynotes. The first keynote was delivered by Prof. Jaana Kekalenien, who provided a historical, critical reflection of realism in Interactive Information Retrieval Experimentation, while the second keynote was delivered by Prof. Maarten de Rijke, who argued for more Artificial Intelligence usage in IR solutions and deployments. The workshop was followed by a "Tour de Scotland" where delegates were taken from Glasgow to Aberdeen for the European Conference in Information Retrieval (ECIR 2017

    Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

    Full text link
    This paper reports the use of a document distance-based approach to automatically expand the number of available relevance judgements when these are limited and reduced to only positive judgements. This may happen, for example, when the only available judgements are extracted from a list of references in a published review paper. We compare the results on two document sets: OHSUMED, based on medical research publications, and TREC-8, based on news feeds. We show that evaluations based on these expanded relevance judgements are more reliable than those using only the initially available judgements, especially when the number of available judgements is very limited.Comment: SIGIR 2014 Workshop on Gathering Efficient Assessments of Relevanc

    Query Expansion with Locally-Trained Word Embeddings

    Full text link
    Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus and query specific embeddings for retrieval tasks. These results suggest that other tasks benefiting from global embeddings may also benefit from local embeddings

    Searching Spontaneous Conversational Speech

    Get PDF
    The ACM SIGIR Workshop on Searching Spontaneous Conversational Speech was held as part of the 2007 ACM SIGIR Conference in Amsterdam.\ud The workshop program was a mix of elements, including a keynote speech, paper presentations and panel discussions. This brief report describes the organization of this workshop and summarizes the discussions

    On the power laws of language: word frequency distributions

    Get PDF
    About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a loglog scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon. Theword frequency distributions produced by this model are shown to match the observations both analytically and empirically. © 2017 Copyright held by the owner/author(s)

    Modeling Temporal Evidence from External Collections

    Full text link
    Newsworthy events are broadcast through multiple mediums and prompt the crowds to produce comments on social media. In this paper, we propose to leverage on this behavioral dynamics to estimate the most relevant time periods for an event (i.e., query). Recent advances have shown how to improve the estimation of the temporal relevance of such topics. In this approach, we build on two major novelties. First, we mine temporal evidences from hundreds of external sources into topic-based external collections to improve the robustness of the detection of relevant time periods. Second, we propose a formal retrieval model that generalizes the use of the temporal dimension across different aspects of the retrieval process. In particular, we show that temporal evidence of external collections can be used to (i) infer a topic's temporal relevance, (ii) select the query expansion terms, and (iii) re-rank the final results for improved precision. Experiments with TREC Microblog collections show that the proposed time-aware retrieval model makes an effective and extensive use of the temporal dimension to improve search results over the most recent temporal models. Interestingly, we observe a strong correlation between precision and the temporal distribution of retrieved and relevant documents.Comment: To appear in WSDM 201
    • …
    corecore