Expansion via Prediction of Importance with Contextualization

Author: Frieder Ophir
Goharian Nazli
MacAvaney Sean
Nardini Franco Maria
Perego Raffaele
Tonellotto Nicola
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/05/2020

The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness. Comment: Accepted at SIGIR 2020 (short
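The MRR@10 figure quoted in the abstract above is a standard ranking metric: for each query, take the reciprocal of the rank of the first relevant passage within the top 10 (or 0 if none appears), then average over queries. The sketch below illustrates only that metric, not the EPIC model; the function name `mrr_at_k` and the toy query and passage identifiers are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List, Set


def mrr_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant item within the top k."""
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked_ids in rankings.items():
        relevant_ids = relevant.get(query_id, set())
        # Ranks are 1-based; only the first relevant hit contributes.
        for rank, passage_id in enumerate(ranked_ids[:k], start=1):
            if passage_id in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical toy data: three queries with ranked passage ids.
rankings = {
    "q1": ["p3", "p1", "p7"],  # first relevant passage at rank 2
    "q2": ["p9", "p4"],        # no relevant passage retrieved
    "q3": ["p2"],              # first relevant passage at rank 1
}
relevant = {"q1": {"p1"}, "q2": {"p5"}, "q3": {"p2"}}

score = mrr_at_k(rankings, relevant, k=10)
print(f"MRR@10 = {score:.3f}")
```

Queries with no relevant passage in the top k contribute zero, so the average is taken over all queries rather than only those with a hit.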

arXiv.org e-Print Archive

Expansion via Prediction of Importance with Contextualization

Author: Frieder O.
Goharian N.
MacAvaney S.
Nardini F. M.
Perego R.
Tonellotto N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Personalized content retrieval in context using ontological knowledge

Author: Avrithis Y.
Castells P.
Fernandez M.
Mylonas P.
Vallet D.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

Personalized content retrieval aims at improving the retrieval process by taking into account the particular interests of individual users. However, not all user preferences are relevant in all situations. It is well known that human preferences are complex, multiple, heterogeneous, changing, even contradictory, and should be understood in context with the user goals and tasks at hand. In this paper, we propose a method to build a dynamic representation of the semantic context of ongoing retrieval tasks, which is used to activate different subsets of user interests at runtime, in a way that out-of-context preferences are discarded. Our approach is based on an ontology-driven representation of the domain of discourse, providing enriched descriptions of the semantics involved in retrieval actions and preferences, and enabling the definition of effective means to relate preferences and context.

CiteSeerX

DSpace at NTUA

Online news: Where is the promised context?

Author: Zamith Fernando
Publication date: 01/01/2013

Information Retrieval: Recent Advances and Beyond

Author: Hambarde Kailash A.
Proenca Hugo
Publication date: 01/01/2023

In this paper, we provide a detailed overview of the models used for information retrieval in the first and second stages of the typical processing chain. We discuss the current state-of-the-art models, including term-based, semantic retrieval, and neural methods. Additionally, we delve into the key topics related to the learning process of these models. In this way, the survey offers a comprehensive understanding of the field and is of interest to researchers and practitioners entering or working in the information retrieval domain.

arXiv.org e-Print Archive

Directory of Open Access Journals

Word sense discrimination in information retrieval: a spectral clustering-based approach

Author: Chifu Adrian-Gabriel
Hristea Florentina
Mothe Josiane
Popescu Marius
Publication venue: 'Elsevier BV'
Publication date: 01/07/2014

Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generally supervised ones. In this paper we propose a new unsupervised method that uses word sense discrimination in IR. The method we develop is based on spectral clustering and reorders an initially retrieved document list by boosting documents that are semantically similar to the target query. For several TREC ad hoc collections we show that our method is useful in the case of queries which contain ambiguous terms. We are interested in improving the level of precision after 5, 10 and 30 retrieved documents (P@5, P@10, P@30), respectively. We show that precision can be improved by 8% above current state-of-the-art baselines. We also focus on poorly performing queries.
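The P@5, P@10 and P@30 measures named in the abstract above are precision at a rank cutoff: the fraction of the top k retrieved documents that are relevant. The sketch below shows only that metric under the usual convention of dividing by k even when fewer than k documents were retrieved; it does not reproduce the paper's spectral clustering method, and the function name `precision_at_k` and the toy document identifiers are assumptions made for illustration.

```python
from typing import List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    # Dividing by k (not by the number retrieved) penalizes short result lists.
    return hits / k


# Hypothetical toy ranking for a single query.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d4", "d6"}

p_at_5 = precision_at_k(ranked, relevant, k=5)
print(f"P@5 = {p_at_5:.2f}")
```

A reranking step such as the one the abstract describes changes only the order of `ranked`, so its effect shows up directly in these cutoff-based scores.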

A global resource for genomic predictions of antimicrobial resistance and surveillance of Salmonella Typhi at pathogenwatch.

Author: Aanensen David M
Abudahab Khalil
Argimón Silvia
Baker Stephen
Dougan Gordon
Dyson Zoe A
Goater Richard J
Holt Kathryn E
Keane Jacqueline A
Marks Florian
Nair Satheesh
Page Andrew J
Park Se Eun
Sánchez-Busó Leonor
Taylor Benjamin
Underwood Anthony
Wong Vanessa K
Yeats Corin A
Publication venue: Nature communications
Publication date: 19/06/2021

As whole-genome sequencing capacity becomes increasingly decentralized, there is a growing opportunity for collaboration and the sharing of surveillance data within and between countries to inform typhoid control policies. This vision requires free, community-driven tools that facilitate access to genomic data for public health on a global scale. Here we present the Pathogenwatch scheme for Salmonella enterica serovar Typhi (S. Typhi), a web application enabling the rapid identification of genomic markers of antimicrobial resistance (AMR) and contextualization with public genomic data. We show that the clustering of S. Typhi genomes in Pathogenwatch is comparable to established bioinformatics methods, and that genomic predictions of AMR are highly concordant with phenotypic susceptibility data. We demonstrate the public health utility of Pathogenwatch with examples selected from >4,300 public genomes available in the application. Pathogenwatch provides an intuitive entry point to monitor the emergence and spread of S. Typhi high-risk clones.