Generating Query Suggestions to Support Task-Based Search
We address the problem of generating query suggestions to support users in
completing their underlying tasks (which motivated them to search in the first
place). Given an initial query, these query suggestions should provide a
coverage of possible subtasks the user might be looking for. We propose a
probabilistic modeling framework that obtains keyphrases from multiple sources
and generates query suggestions from these keyphrases. Using the test suites of
the TREC Tasks track, we evaluate and analyze each component of our model.
Comment: Proceedings of the 40th International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '17), 2017
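The probabilistic framework itself is not spelled out in the abstract, but its core idea, pooling keyphrases from multiple sources under source priors and appending the best ones to the query, can be sketched in a few lines. The sources, keyphrases, and weights below are invented for illustration and are not the paper's actual model:

```python
from collections import defaultdict

def suggest_queries(query, source_keyphrases, source_priors, k=3):
    """Rank keyphrases by a mixture over sources,
    P(phrase) = sum_s P(phrase | s) * P(s), then append them to the query."""
    scores = defaultdict(float)
    for source, phrases in source_keyphrases.items():
        prior = source_priors.get(source, 0.0)
        total = sum(phrases.values()) or 1.0
        for phrase, weight in phrases.items():
            # Normalize within each source so a verbose source cannot dominate.
            scores[phrase] += prior * (weight / total)
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [f"{query} {phrase}" for phrase, _ in ranked[:k]]

suggestions = suggest_queries(
    "low carb diet",
    {"query_log": {"recipes": 3, "plan": 1},
     "web_corpus": {"recipes": 1, "side effects": 1}},
    {"query_log": 0.6, "web_corpus": 0.4},
)
```

Because each source's keyphrase weights are normalized before mixing, the source priors alone control how much each evidence source contributes to the final ranking.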
Target Type Identification for Entity-Bearing Queries
Identifying the target types of entity-bearing queries can help improve
retrieval performance as well as the overall search experience. In this work,
we address the problem of automatically detecting the target types of a query
with respect to a type taxonomy. We propose a supervised learning approach with
a rich variety of features. Using a purpose-built test collection, we show that
our approach outperforms existing methods by a remarkable margin. This is an
extended version of the article published with the same title in the
Proceedings of SIGIR '17.
Comment: Extended version of SIGIR '17 short paper, 5 pages
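The paper's supervised approach uses a rich feature set; as a much simpler stand-in, candidate target types can be scored by term overlap between the query and a bag of indicator terms per type. The type profiles and threshold below are illustrative assumptions, not the paper's features:

```python
def detect_target_types(query, type_profiles, threshold=0.2):
    """Rank candidate target types by Jaccard overlap between the query
    terms and a bag of indicator terms per type; keep those above threshold."""
    q_terms = set(query.lower().split())
    scores = {}
    for type_name, indicator_terms in type_profiles.items():
        terms = {t.lower() for t in indicator_terms}
        union = q_terms | terms
        scores[type_name] = len(q_terms & terms) / len(union) if union else 0.0
    return sorted((t for t, s in scores.items() if s >= threshold),
                  key=lambda t: -scores[t])

types = detect_target_types(
    "female american jazz singers",
    {"Person": {"singers", "actors", "female"},
     "Location": {"city", "country", "american"}},
)
```

A supervised model as in the paper would replace the raw overlap score with a learned combination of many such features over the type taxonomy.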
Overview of the TREC 2022 NeuCLIR Track
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to
study the impact of neural approaches to cross-language information retrieval.
The main task in this year's track was ad hoc ranked retrieval of Chinese,
Persian, or Russian newswire documents using queries expressed in English.
Topics were developed using standard TREC processes, except that topics
developed by an annotator for one language were assessed by a different
annotator when evaluating that topic on a different language. There were 172
total runs submitted by twelve teams.
Comment: 22 pages, 13 figures, 10 tables. Part of the Thirty-First Text
REtrieval Conference (TREC 2022) Proceedings. Replaces the misplaced Russian
result table.
Perspectives on Large Language Models for Relevance Judgment
When asked, current large language models (LLMs) like ChatGPT claim that they
can assist us with relevance judgments. Many researchers think this would not
lead to credible IR research. In this perspective paper, we discuss possible
ways for LLMs to assist human experts along with concerns and issues that
arise. We devise a human-machine collaboration spectrum that allows
categorizing different relevance judgment strategies, based on how much the
human relies on the machine. For the extreme point of "fully automated
assessment", we further include a pilot experiment on whether LLM-based
relevance judgments correlate with judgments from trained human assessors. We
conclude the paper by providing two opposing perspectives - for and against the
use of LLMs for automatic relevance judgments - and a compromise perspective,
informed by our analyses of the literature, our preliminary experimental
evidence, and our experience as IR researchers.
We hope to start a constructive discussion within the community to avoid a
stalemate during review, where work is damned if it uses LLMs for evaluation
and damned if it doesn't.
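One way to quantify how well LLM-based judgments agree with those of trained human assessors is a chance-corrected agreement statistic such as Cohen's kappa. The paper does not specify this exact measure; the sketch below, with invented labels, is only a generic illustration of such a comparison:

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two parallel label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Expected agreement if both annotators labelled at random,
    # each following their observed marginal label distribution.
    expected = sum((list(labels_a).count(c) / n) * (list(labels_b).count(c) / n)
                   for c in categories)
    return 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)

human = [1, 1, 0, 0, 1, 0]  # hypothetical human relevance labels
llm   = [1, 0, 0, 0, 1, 1]  # hypothetical LLM labels for the same pairs
kappa = cohens_kappa(human, llm)
```

A kappa near 0 means the LLM agrees with humans no more than chance would predict; values approaching 1 indicate substantial agreement.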
One-Shot Labeling for Automatic Relevance Estimation
Dealing with unjudged documents ("holes") in relevance assessments is a
perennial problem when evaluating search systems with offline experiments.
Holes can reduce the apparent effectiveness of retrieval systems during
evaluation and introduce biases in models trained with incomplete data. In this
work, we explore whether large language models can help us fill such holes to
improve offline evaluations. We examine an extreme, albeit common, evaluation
setting wherein only a single known relevant document per query is available
for evaluation. We then explore various approaches for predicting the relevance
of unjudged documents with respect to a query and the known relevant document,
including nearest neighbor, supervised, and prompting techniques. We find that
although the predictions of these One-Shot Labelers (1SL) frequently disagree
with human assessments, the labels they produce yield a far more reliable
ranking of systems than the single known labels alone. Specifically, the strongest
approaches can consistently reach system ranking correlations of over 0.86 with
the full rankings over a variety of measures. Meanwhile, the approach
substantially increases the reliability of t-tests due to filling holes in
relevance assessments, giving researchers more confidence in results they find
to be significant. Alongside this work, we release an easy-to-use software
package to enable the use of 1SL for evaluation of other ad hoc collections or
systems.
Comment: SIGIR 2023
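Of the techniques mentioned, the nearest-neighbour variant of a One-Shot Labeler is the simplest to sketch: score an unjudged document by its similarity to the single known relevant document. The bag-of-words cosine and the 0.5 threshold below are illustrative choices, not the paper's configuration:

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def one_shot_label(known_relevant, unjudged, threshold=0.5):
    """Nearest-neighbour 1SL: call an unjudged document relevant if it is
    similar enough to the single known relevant document for the query."""
    return 1 if cosine(known_relevant, unjudged) >= threshold else 0

known = "neural ranking models for dense retrieval"
label_close = one_shot_label(known, "dense neural models for retrieval tasks")
label_far = one_shot_label(known, "a recipe for tomato soup")
```

Replacing the word-overlap cosine with embeddings from a supervised or prompted model gives the stronger 1SL variants the paper evaluates.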
The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives
The Archive Query Log (AQL) is a previously unused, comprehensive query log
collected at the Internet Archive over the last 25 years. Its first version
includes 356 million queries, 166 million search result pages, and 1.7 billion
search results across 550 search providers. Although many query logs have been
studied in the literature, the search providers that own them generally do not
publish their logs to protect user privacy and vital business data. Of the few
query logs publicly available, none combines size, scope, and diversity. The
AQL is the first to do so, enabling research on new retrieval models and
(diachronic) search engine analyses. Provided in a privacy-preserving manner,
it promotes open research as well as more transparency and accountability in
the search industry.Comment: SIGIR 2023 resource paper, 13 page
Answering Engine for Sport Statistics: Question Processing
Master's thesis in Computer Science
In recent years, there has been growing interest among computer
scientists for the topic of Linked Data and the Semantic Web. By connecting
and publishing structured data from multiple sources, the Web enables us to
retrieve specific information without needing to go through documents of unstructured
text. Question answering systems can utilise the benefit of Linked
Data, enabling users to ask questions in natural language and receive direct
answers. In this thesis we implement a system that can answer natural
language questions related to the field of Formula 1 statistics. We show how
data is collected and connected based on a conceptual model, and go through
the necessary steps for converting a question into a machine-readable query.
We perform an evaluation of the system, both on component level and on the
system as a whole. We analyse and discuss challenges and topics for improvements,
before we conclude our work and summarise the most important steps
to consider for future work.
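The question-processing step, converting a natural-language question into a machine-readable query, can be illustrated with a single toy template. The pattern, relation name, and output schema below are hypothetical and not taken from the thesis:

```python
import re

# One hypothetical question template; a real pipeline would hold many.
WINNER_PATTERN = re.compile(
    r"who won the (\d{4}) (.+?) grand prix\??$", re.IGNORECASE)

def parse_question(question):
    """Map a natural-language question to a machine-readable query dict,
    a toy stand-in for the thesis's question-processing component."""
    match = WINNER_PATTERN.match(question.strip())
    if match is None:
        return None  # the question fits no known template
    return {
        "answer_type": "Driver",
        "relation": "wonRace",
        "race": {"season": int(match.group(1)),
                 "name": match.group(2).title()},
    }

query = parse_question("Who won the 2019 Monaco Grand Prix?")
```

A structured query like this can then be translated into a SPARQL query over the Linked Data graph to retrieve the direct answer.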
Understanding Differential Search Index for Text Retrieval
The Differentiable Search Index (DSI) is a novel information retrieval (IR)
framework that utilizes a differentiable function to generate a sorted list of
document identifiers in response to a given query. However, due to the
black-box nature of the end-to-end neural architecture, it remains to be
understood to what extent DSI possesses the basic indexing and retrieval
abilities. To mitigate this gap, in this study, we define and examine three
important abilities that a functioning IR framework should possess, namely,
exclusivity, completeness, and relevance ordering. Our analytical
experimentation shows that while DSI demonstrates proficiency in memorizing the
unidirectional mapping from pseudo queries to document identifiers, it falls
short in distinguishing relevant documents from random ones, thereby negatively
impacting its retrieval effectiveness. To address this issue, we propose a
multi-task distillation approach to enhance the retrieval quality without
altering the structure of the model and successfully endow it with improved
indexing abilities. Through experiments conducted on various datasets, we
demonstrate that our proposed method outperforms previous DSI baselines.
Comment: Accepted to Findings of ACL 2023
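Two of the abilities the paper defines, exclusivity (every generated identifier is a valid docid) and completeness (every document is reachable from at least one of its queries), can be expressed as simple checks over any query-to-docid function. The toy lookup table below stands in for a trained DSI model and is not the paper's evaluation code:

```python
def check_dsi_properties(generate, queries_by_doc, valid_docids):
    """Check two basic indexing abilities of a docid-generating function:
    exclusivity: every generated identifier is a valid docid;
    completeness: every document is generated for at least one of its queries."""
    exclusivity = all(generate(q) in valid_docids
                      for qs in queries_by_doc.values() for q in qs)
    completeness = all(any(generate(q) == doc for q in qs)
                       for doc, qs in queries_by_doc.items())
    return exclusivity, completeness

# A toy "model": a lookup table from pseudo queries to document identifiers.
mapping = {"climate report 2020": "d1", "football results": "d2"}
result = check_dsi_properties(
    lambda q: mapping.get(q, "d-unknown"),
    {"d1": ["climate report 2020"], "d2": ["football results"]},
    {"d1", "d2"},
)
```

The paper's third ability, relevance ordering, additionally requires that relevant docids outscore random ones, which is where the analysis finds DSI falls short.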