Cross-language Information Retrieval
Two key assumptions shape the usual view of ranked retrieval: (1) that the
searcher can choose words for their query that might appear in the documents
that they wish to see, and (2) that ranking retrieved documents will suffice
because the searcher will be able to recognize those which they wished to find.
When the documents to be searched are in a language not known by the searcher,
neither assumption is true. In such cases, Cross-Language Information Retrieval
(CLIR) is needed. This chapter reviews the state of the art for CLIR and
outlines some open research questions. Comment: 49 pages, 0 figures
Evaluating Generative Ad Hoc Information Retrieval
Recent advances in large language models have enabled the development of
viable generative information retrieval systems. A generative retrieval system
returns a grounded generated text in response to an information need instead of
the traditional document ranking. Quantifying the utility of these types of
responses is essential for evaluating generative retrieval systems. As the
established evaluation methodology for ranking-based ad hoc retrieval may seem
unsuitable for generative retrieval, new approaches for reliable, repeatable,
and reproducible experimentation are required. In this paper, we survey the
relevant information retrieval and natural language processing literature,
identify search tasks and system architectures in generative retrieval, develop
a corresponding user model, and study its operationalization. This theoretical
analysis provides a foundation and new insights for the evaluation of
generative ad hoc retrieval systems. Comment: 14 pages, 5 figures, 1 table
Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education
This report documents the program and the outcomes of Dagstuhl Seminar 23031
``Frontiers of Information Access Experimentation for Research and Education'',
which brought together 37 participants from 12 countries.
The seminar addressed technology-enhanced information access (information
retrieval, recommender systems, natural language processing) and specifically
focused on developing more responsible experimental practices leading to more
valid results, both for research as well as for scientific education.
The seminar brought together experts from various sub-fields of information
access, namely IR, RS, NLP, information science, and human-computer interaction
to create a joint understanding of the problems and challenges presented by
next-generation information access systems, from both the research and the
experimentation points of view, to discuss existing solutions and impediments,
and to propose next steps to be pursued in the area in order to improve not
only our research methods and findings but also the education of the new
generation of researchers and developers.
The seminar featured a series of long and short talks delivered by
participants, which helped establish common ground and surface the
topics of interest to be explored as the main output of the seminar. This led
to the definition of five groups which investigated challenges, opportunities,
and next steps in the following areas: reality check, i.e. conducting
real-world studies, human-machine-collaborative relevance judgment frameworks,
overcoming methodological challenges in information retrieval and recommender
systems through awareness and education, results-blind reviewing, and guidance
for authors. Comment: Dagstuhl Seminar 23031, report
Dense Text Retrieval based on Pretrained Language Models: A Survey
Text retrieval is a long-standing research topic on information seeking,
where a system is required to return relevant information resources for users'
queries in natural language. From classic retrieval methods to learning-based
ranking functions, the underlying retrieval models have continually
evolved with ongoing technical innovation. To design effective
retrieval models, a key point lies in how to learn the text representation and
model the relevance matching. The recent success of pretrained language models
(PLMs) sheds light on developing more capable text retrieval approaches by
leveraging the excellent modeling capacity of PLMs. With powerful PLMs, we can
effectively learn the representations of queries and texts in the latent
representation space, and further construct the semantic matching function
between the dense vectors for relevance modeling. Such a retrieval approach is
referred to as dense retrieval, since it employs dense vectors (a.k.a.,
embeddings) to represent the texts. Considering the rapid progress on dense
retrieval, in this survey, we systematically review the recent advances on
PLM-based dense retrieval. Different from previous surveys on dense retrieval,
we take a new perspective to organize the related work by four major aspects,
including architecture, training, indexing and integration, and summarize the
mainstream techniques for each aspect. We thoroughly survey the literature, and
include 300+ related reference papers on dense retrieval. To support our
survey, we create a website for providing useful resources, and release a code
repository and toolkit for implementing dense retrieval models. This survey aims
to provide a comprehensive, practical reference focused on the major progress
for dense text retrieval.
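The matching step described in this abstract (queries and texts mapped to dense vectors, then compared by a matching function) can be illustrated with a minimal sketch. It assumes the embeddings already exist; the vectors, document ids, and the function name `dense_retrieve` are hypothetical, and the inner product is only one common choice of matching function, not necessarily the one any surveyed model uses.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product, used here as the relevance matching function."""
    return sum(a * b for a, b in zip(u, v))


def dense_retrieve(
    query_vec: List[float],
    doc_vecs: Dict[str, List[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every document vector against the query vector and return the top k."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical, hand-written embeddings; a real system would obtain these
# from a pretrained language model encoder.
DOC_VECS = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.8, 0.1],
    "d3": [0.7, 0.2, 0.1],
}
QUERY_VEC = [1.0, 0.0, 0.0]

top = dense_retrieve(QUERY_VEC, DOC_VECS, k=2)
print([doc_id for doc_id, _ in top])
```

The exhaustive scan over all documents is for illustration only; the indexing aspect mentioned in the abstract exists precisely because such a scan does not scale to large collections.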
A Survey on Cross-domain Recommendation: Taxonomies, Methods, and Future Directions
Traditional recommendation systems are faced with two long-standing
obstacles, namely, data sparsity and cold-start problems, which promote the
emergence and development of Cross-Domain Recommendation (CDR). The core idea
of CDR is to leverage information collected from other domains to alleviate the
two problems in one domain. Over the last decade, much effort has been
devoted to cross-domain recommendation. Recently, with the development of deep
learning and neural networks, a large number of methods have emerged. However,
there is a limited number of systematic surveys on CDR, especially regarding
the latest proposed methods as well as the recommendation scenarios and
recommendation tasks they address. In this survey paper, we first propose a
two-level taxonomy of cross-domain recommendation which classifies different
recommendation scenarios and recommendation tasks. We then introduce and
summarize existing cross-domain recommendation approaches under different
recommendation scenarios in a structured manner. We also organize commonly
used datasets. We conclude this survey by providing several potential research
directions in this field.
GNNUERS: Fairness Explanation in GNNs for Recommendation via Counterfactual Reasoning
In recent years, personalization research has been delving into issues of
explainability and fairness. While some techniques have emerged to provide
post-hoc and self-explanatory individual recommendations, there is still a lack
of methods aimed at uncovering unfairness in recommendation systems beyond
identifying biased user and item features. This paper proposes a new algorithm,
GNNUERS, which uses counterfactuals to pinpoint user unfairness explanations in
terms of user-item interactions within a bi-partite graph. By perturbing the
graph topology, GNNUERS reduces differences in utility between protected and
unprotected demographic groups. The paper evaluates the approach using four
real-world graphs from different domains and demonstrates its ability to
systematically explain user unfairness in three state-of-the-art GNN-based
recommendation models. This perturbed network analysis reveals insightful
patterns that confirm the nature of the unfairness underlying the explanations.
The source code and preprocessed datasets are available at
https://github.com/jackmedda/RS-BGExplaine
Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has been responsible for a paradigm shift in natural language
processing (NLP), information retrieval (IR), and beyond. In this survey, we
provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
architectures and dense retrieval techniques that perform ranking directly.
There are two themes that pervade our survey: techniques for handling long
documents, beyond typical sentence-by-sentence processing in NLP, and
techniques for addressing the tradeoff between effectiveness (i.e., result
quality) and efficiency (e.g., query latency, model and index size). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.
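The multi-stage architecture mentioned in this abstract (a cheap first stage selects candidates, a more expensive model reranks them) can be sketched as follows. This is a toy illustration under stated assumptions: the corpus, the function names, and `toy_scorer` are hypothetical, and `toy_scorer` is a simple stand-in for a transformer-based scorer, not an implementation of one.

```python
from typing import Callable, Dict, List


def first_stage(query: str, corpus: Dict[str, str], k: int) -> List[str]:
    """Cheap candidate generation: rank by number of distinct shared terms."""
    q_terms = set(query.lower().split())
    overlap = {
        doc_id: len(q_terms & set(text.lower().split()))
        for doc_id, text in corpus.items()
    }
    ranked = sorted(corpus, key=lambda d: overlap[d], reverse=True)
    return [d for d in ranked if overlap[d] > 0][:k]


def rerank(
    query: str,
    candidates: List[str],
    corpus: Dict[str, str],
    scorer: Callable[[str, str], float],
) -> List[str]:
    """Second stage: reorder only the candidates using a costlier scorer."""
    return sorted(candidates, key=lambda d: scorer(query, corpus[d]), reverse=True)


def toy_scorer(query: str, text: str) -> float:
    """Stand-in for a learned model: fraction of document tokens that are query terms."""
    q_terms = set(query.lower().split())
    tokens = text.lower().split()
    return sum(t in q_terms for t in tokens) / len(tokens)


# Hypothetical three-document corpus.
CORPUS = {
    "x": "text ranking and many other unrelated topics discussed at length",
    "y": "neural ranking",
    "z": "cooking recipes for pasta",
}

query = "text ranking"
candidates = first_stage(query, CORPUS, k=2)
final = rerank(query, candidates, CORPUS, toy_scorer)
print(candidates, final)
```

The point of the split is that the expensive scorer runs only on the `k` candidates rather than on the whole corpus, which is one way to manage the effectiveness and efficiency tradeoff the abstract describes.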
Geographic information extraction from texts
A large volume of unstructured text containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been made in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and also to identify research gaps in geographic information extraction.