85 research outputs found
The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives
The Archive Query Log (AQL) is a previously unused, comprehensive query log
collected at the Internet Archive over the last 25 years. Its first version
includes 356 million queries, 166 million search result pages, and 1.7 billion
search results across 550 search providers. Although many query logs have been
studied in the literature, the search providers that own them generally do not
publish their logs to protect user privacy and vital business data. Of the few
query logs publicly available, none combines size, scope, and diversity. The
AQL is the first to do so, enabling research on new retrieval models and
(diachronic) search engine analyses. Provided in a privacy-preserving manner,
it promotes open research as well as more transparency and accountability in
the search industry.Comment: SIGIR 2023 resource paper, 13 page
A Call for Standardization and Validation of Text Style Transfer Evaluation
Text Style Transfer (TST) evaluation is, in practice, inconsistent.
Therefore, we conduct a meta-analysis on human and automated TST evaluation and
experimentation that thoroughly examines existing literature in the field. The
meta-analysis reveals a substantial standardization gap in human and automated
evaluation. In addition, we also find a validation gap: only few automated
metrics have been validated using human experiments. To this end, we thoroughly
scrutinize both the standardization and validation gap and reveal the resulting
pitfalls. This work also paves the way to close the standardization and
validation gap in TST evaluation by calling out requirements to be met by
future research.Comment: Accepted to Findings of ACL 202
Evaluating Generative Ad Hoc Information Retrieval
Recent advances in large language models have enabled the development of
viable generative information retrieval systems. A generative retrieval system
returns a grounded generated text in response to an information need instead of
the traditional document ranking. Quantifying the utility of these types of
responses is essential for evaluating generative retrieval systems. As the
established evaluation methodology for ranking-based ad hoc retrieval may seem
unsuitable for generative retrieval, new approaches for reliable, repeatable,
and reproducible experimentation are required. In this paper, we survey the
relevant information retrieval and natural language processing literature,
identify search tasks and system architectures in generative retrieval, develop
a corresponding user model, and study its operationalization. This theoretical
analysis provides a foundation and new insights for the evaluation of
generative ad hoc retrieval systems.Comment: 14 pages, 5 figures, 1 tabl
Perspectives on Large Language Models for Relevance Judgment
When asked, current large language models (LLMs) like ChatGPT claim that they
can assist us with relevance judgments. Many researchers think this would not
lead to credible IR research. In this perspective paper, we discuss possible
ways for LLMs to assist human experts along with concerns and issues that
arise. We devise a human-machine collaboration spectrum that allows
categorizing different relevance judgment strategies, based on how much the
human relies on the machine. For the extreme point of "fully automated
assessment", we further include a pilot experiment on whether LLM-based
relevance judgments correlate with judgments from trained human assessors. We
conclude the paper by providing two opposing perspectives - for and against the
use of LLMs for automatic relevance judgments - and a compromise perspective,
informed by our analyses of the literature, our preliminary experimental
evidence, and our experience as IR researchers.
We hope to start a constructive discussion within the community to avoid a
stale-mate during review, where work is dammed if is uses LLMs for evaluation
and dammed if it doesn't
A machine learning approach for result caching in web search engines
A commonly used technique for improving search engine performance is result caching. In result caching, precomputed results (e.g., URLs and snippets of best matching pages) of certain queries are stored in a fast-access storage. The future occurrences of a query whose results are already stored in the cache can be directly served by the result cache, eliminating the need to process the query using costly computing resources. Although other performance metrics are possible, the main performance metric for evaluating the success of a result cache is hit rate. In this work, we present a machine learning approach to improve the hit rate of a result cache by facilitating a large number of features extracted from search engine query logs. We then apply the proposed machine learning approach to static, dynamic, and static-dynamic caching. Compared to the previous methods in the literature, the proposed approach improves the hit rate of the result cache up to 0.66%, which corresponds to 9.60% of the potential room for improvement. © 2017 Elsevier Lt
Podify : a podcast streaming platform with automatic logging of user behaviour for academic research
Podcasts are spoken documents that, in recent years, have gained widespread popularity. Despite the growing research interest in this domain, conducting user studies remains challenging due to the lack of datasets that include user behaviour. In particular, there is a need for a podcast streaming platform that reduces the overhead of conducting user studies. To address these issues, in this work, we present Podify. It is the first web-based platform for podcast streaming and consumption specifically designed for research. The platform highly resembles existing streaming systems to provide users with a high level of familiarity on both desktop and mobile. A catalogue of podcast episodes can be easily created via RSS feeds. The platform also offers Elasticsearch-based indexing and search that is highly customisable, allowing research and experimentation in podcast search. Users can manually curate playlists of podcast episodes for consumption. With mechanisms to collect explicit feedback from users (i.e., liking and disliking behaviour), Podify also automatically collects implicit feedback (i.e., all user interactions). Users' behaviour can be easily exported to a readable format for subsequent experimental analysis. A demonstration of the platform is available at https://youtu.be/k9Z5w_KKHr8, with the code and documentation available at https://github.com/NeuraSearch/Podify
Relevance-based language models : new estimations and applications
[Abstratc] Relevance-Based Language Models introduced in the Language Modelling framework the concept of relevance, which is explicit in other retrieval models such as the Probabilistic models. Relevance Models have been mainly used for a specific task within Information Retrieval called Pseudo-Relevance Feedback, a kind of local query expansion technique where relevance is assumed over a top of documents from the initial retrieval and where those documents are used to select expansion terms for the original query and produce a, hopefully more effective, second retrieval. In this thesis we investigate some new estimations for Relevance Models for both Pseudo-Relevance Feedback and other tasks beyond retrieval, particularly, constrained text clustering and item recommendation in Recommender Systems. We study the benefits of our proposals for those tasks in comparison with existing estimations. This new modellings are able not only to improve the effectiveness of the existing estimations and methods but also to outperform their robustness, a critical factor when dealing with Pseudo-Relevance Feedback methods. These objectives are pursued by different means: promoting divergent terms in the estimation of the Relevance Models, presenting new cluster-based retrieval models, introducing new methods for automatically determine the size of the pseudo-relevant set on a query-basis, and originally producing new modellings under the Relevance-Based Language Modelling framework for the constrained text clustering and the item recommendation problems
- …