Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education
This report documents the program and the outcomes of Dagstuhl Seminar 23031
``Frontiers of Information Access Experimentation for Research and Education'',
which brought together 37 participants from 12 countries.
The seminar addressed technology-enhanced information access (information
retrieval, recommender systems, natural language processing) and specifically
focused on developing more responsible experimental practices leading to more
valid results, both for research and for scientific education.
The seminar brought together experts from various sub-fields of information
access, namely IR, RS, NLP, information science, and human-computer
interaction, to create a joint understanding of the problems and challenges
presented by next-generation information access systems, from both the
research and the experimentation points of view, to discuss existing solutions
and impediments, and to propose next steps to be pursued in the area in order
to improve not only our research methods and findings but also the education
of the new generation of researchers and developers.
The seminar featured a series of long and short talks delivered by
participants, which helped establish common ground and surface topics of
interest to be explored as the main output of the seminar. This led to the
definition of five groups which investigated challenges, opportunities, and
next steps in the following areas: reality check, i.e. conducting real-world
studies; human-machine-collaborative relevance judgment frameworks; overcoming
methodological challenges in information retrieval and recommender systems
through awareness and education; results-blind reviewing; and guidance for
authors.
Comment: Dagstuhl Seminar 23031, report
How Sensitivity Classification Effectiveness Impacts Reviewers in Technology-Assisted Sensitivity Review
All government documents that are released to the public must first be manually reviewed to identify and protect any sensitive information, e.g. confidential information. However, the unassisted manual sensitivity review of born-digital documents is not practical due to, for example, the volume of documents that are created. Previous work has shown that sensitivity classification can be effective for predicting if a document contains sensitive information. However, since all of the released documents must be manually reviewed, it is important to know if sensitivity classification can assist sensitivity reviewers in making their sensitivity judgements. Hence, in this paper, we conduct a digital sensitivity review user study, to investigate if the accuracy of sensitivity classification affects the number of documents that a reviewer correctly judges to be sensitive or not (reviewer accuracy) and the time that it takes to sensitivity review a document (reviewing speed). Our results show that providing reviewers with sensitivity classification predictions, from a classifier that achieves 0.7 Balanced Accuracy, results in a 38% increase in mean reviewer accuracy and an increase of 72% in mean reviewing speed, compared to when reviewers are not provided with predictions. Overall, our findings demonstrate that sensitivity classification is a viable technology for assisting with the sensitivity review of born-digital government documents.
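Balanced Accuracy, the metric the study uses to characterise its classifier, is simply the mean of per-class recall, which makes it robust to the heavy class imbalance typical of sensitivity review. A minimal illustrative sketch (the toy labels are invented, not from the paper):

```python
# Illustrative sketch: Balanced Accuracy is the mean of recall over the
# classes (here sensitive = 1, not sensitive = 0). The example data
# below is hypothetical, chosen only to reproduce a score of 0.7.

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in set(y_true):
        # predictions made on the documents whose true class is c
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(1 for p in preds_for_c if p == c) / len(preds_for_c))
    return sum(recalls) / len(recalls)

# A classifier that finds 4/5 sensitive and 3/5 non-sensitive documents:
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
print(round(balanced_accuracy(y_true, y_pred), 3))  # 0.7 = (0.8 + 0.6) / 2
```

Plain accuracy on the same data would also be 0.7 here only because the classes are balanced; with 95% non-sensitive documents, a trivial "never sensitive" classifier scores high plain accuracy but only 0.5 Balanced Accuracy.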
Unbiased Learning to Rank: Counterfactual and Online Approaches
This tutorial covers and contrasts the two main methodologies in unbiased
Learning to Rank (LTR): Counterfactual LTR and Online LTR. There has long been
an interest in LTR from user interactions; however, this form of implicit
feedback is very biased. In recent years, unbiased LTR methods have been
introduced to remove the effect of different types of bias caused by user
behavior in search. For instance, a well-addressed type of bias is position
bias: the rank at which a document is displayed heavily affects the
interactions it receives. Counterfactual LTR methods deal with such types of
bias by learning from historical interactions while correcting for the effect
of the explicitly modelled biases. Online LTR, in contrast, does not use an
explicit user model; it learns through an interactive process where randomized
results are displayed to the user. Through randomization, the effect of
different types of bias can be removed from the learning process. Though both
methodologies lead to unbiased LTR, their approaches differ considerably, as
do their theoretical guarantees, empirical results, effects on the user
experience during learning, and applicability. Consequently, for practitioners
the choice between the two is highly consequential. By providing an overview
of both approaches and contrasting them, we aim to provide an essential guide
to unbiased LTR so as to aid in understanding and choosing between
methodologies.
Comment: Abstract for tutorial appearing at SIGIR 201
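The position-bias correction the abstract describes is commonly realised with inverse propensity scoring (IPS): a click observed at rank r is up-weighted by 1/P(examined at r). A minimal sketch under assumed, known propensities (the `PROPENSITY` table and the toy click log are illustrative, not from the tutorial):

```python
# Minimal sketch of counterfactual debiasing via inverse propensity
# scoring (IPS). Assumption: the probability that a user examines each
# rank is known. Weighting each click by the inverse of that
# probability makes the estimate unbiased w.r.t. position, in
# expectation, even though low-ranked documents get far fewer clicks.

# Hypothetical examination propensities per rank (rank 0 is the top).
PROPENSITY = [1.0, 0.5, 0.25, 0.125]

def ips_relevance_estimate(click_log):
    """Estimate per-document relevance from (doc_id, rank, clicked) logs."""
    weighted_clicks = {}
    impressions = {}
    for doc_id, rank, clicked in click_log:
        impressions[doc_id] = impressions.get(doc_id, 0) + 1
        if clicked:
            weighted_clicks[doc_id] = (
                weighted_clicks.get(doc_id, 0.0) + 1.0 / PROPENSITY[rank]
            )
    return {d: weighted_clicks.get(d, 0.0) / impressions[d] for d in impressions}

# Doc "b" is always shown at rank 3, so its single click is rare but
# heavily up-weighted (1 / 0.125 = 8) relative to doc "a" at rank 0.
log = [("a", 0, True), ("a", 0, False), ("b", 3, True), ("b", 3, False)]
print(ips_relevance_estimate(log))  # {'a': 0.5, 'b': 4.0}
```

Online LTR sidesteps the need for the `PROPENSITY` table entirely by randomizing the displayed ranking, trading historical data reuse for control over the interactive process.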
Tutorial: Are You My Neighbor?: Bringing Order to Neighbor Computing Problems
Finding nearest neighbors is an important topic that has attracted much attention over the years and has applications in many fields, such as market basket analysis, plagiarism and anomaly detection, community detection, ligand-based virtual screening, etc. As data are easier and easier to collect, finding neighbors has become a potential bottleneck in analysis pipelines. Performing pairwise comparisons given the massive datasets of today is no longer feasible. The high computational complexity of the task has led researchers to develop approximate methods, which find many but not all of the nearest neighbors. Yet, for some types of data, efficient exact solutions have been found by carefully partitioning or filtering the search space in a way that avoids most unnecessary comparisons. In recent years, there have been several fundamental advances in our ability to efficiently identify appropriate neighbors, especially in non-traditional data, such as graphs or document collections. In this tutorial, we provide an in-depth overview of recent methods for finding (nearest) neighbors, focusing on the intuition behind choices made in the design of those algorithms and on the utility of the methods in real-world applications. Our tutorial aims to provide a unifying view of neighbor computing problems, spanning from numerical data to graph data, from categorical data to sequential data, and related application scenarios. For each type of data, we will review the current state-of-the-art approaches used to identify neighbors and discuss how neighbor search methods are used to solve important problems.
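One of the simplest instances of the "filtering to avoid unnecessary comparisons" idea mentioned above is partial-distance pruning: an exact scan that abandons a distance computation the moment its running sum already exceeds the best distance found so far. A minimal sketch (this particular example is illustrative, not taken from the tutorial):

```python
# Illustrative sketch of exact nearest-neighbor search with
# partial-distance pruning: squared Euclidean distance is accumulated
# dimension by dimension, and a candidate is discarded as soon as the
# partial sum can no longer beat the current best. The answer is still
# exact; only provably useless arithmetic is skipped.

def nearest_neighbor(query, data):
    """Return (index, squared distance) of the exact nearest neighbor."""
    best_idx, best_dist = -1, float("inf")
    for i, point in enumerate(data):
        dist = 0.0
        for q, p in zip(query, point):
            dist += (q - p) ** 2
            if dist >= best_dist:  # prune: cannot beat the best so far
                break
        else:  # only runs if the loop completed without pruning
            best_idx, best_dist = i, dist
    return best_idx, best_dist

data = [[0.0, 0.0], [5.0, 5.0], [1.0, 1.0]]
print(nearest_neighbor([1.0, 0.0], data))  # (0, 1.0)
```

On the toy data, the candidate [5.0, 5.0] is rejected after a single dimension (partial sum 16 already exceeds the incumbent 1.0); in high-dimensional data such early exits are where the savings come from, and index structures like trees or inverted lists push the same pruning idea much further.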