A self-adapting latency/power tradeoff model for replicated search engines
For many search settings, distributed/replicated search engines deploy a large number of machines to ensure efficient retrieval. This paper investigates how the power consumption of a replicated search engine can be automatically reduced when the system has low contention, without compromising its efficiency. We propose a novel self-adapting model to analyse the trade-off between latency and power consumption for distributed search engines. When query volumes are high and there is contention for the resources, the model automatically increases the necessary number of active machines in the system to maintain acceptable query response times. On the other hand, when the load of the system is low and the queries can be served easily, the model is able to reduce the number of active machines, leading to power savings. The model bases its decisions on examining the current and historical query loads of the search engine. Our proposal is formulated as a general dynamic decision problem, which can be quickly solved by dynamic programming in response to changing query loads. Thorough experiments are conducted to validate the usefulness of the proposed adaptive model using historical Web search traffic submitted to a commercial search engine. Our results show that our proposed self-adapting model can achieve an energy saving of 33% while only degrading mean query completion time by 10 ms compared to a baseline that provisions replicas based on a previous day's traffic.
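The abstract does not give the paper's cost model, so the following is only a minimal sketch of how a replica-count decision of this kind could be solved by dynamic programming. The function `plan_replicas`, its parameters (`capacity`, `power_cost`, `latency_penalty`, `switch_cost`) and the linear cost terms are all illustrative assumptions, and the sketch assumes a load estimate is already available for each time slot.

```python
from typing import List, Tuple


def plan_replicas(
    loads: List[float],
    max_replicas: int,
    capacity: float = 100.0,       # assumed queries/slot one replica can absorb
    power_cost: float = 1.0,       # assumed cost of keeping one replica active
    latency_penalty: float = 5.0,  # assumed cost per unit of overload
    switch_cost: float = 0.5,      # assumed cost of powering a replica on or off
) -> Tuple[float, List[int]]:
    """Return (total_cost, plan), where plan[t] is the replica count in slot t."""
    if not loads:
        return 0.0, []

    def slot_cost(load: float, n: int) -> float:
        # Power grows with active replicas; latency is penalised only when overloaded.
        overload = max(0.0, load - n * capacity) / capacity
        return power_cost * n + latency_penalty * overload

    counts = range(1, max_replicas + 1)
    # best[n] = cheapest cost of any plan for the slots seen so far that ends with n replicas.
    best = {n: slot_cost(loads[0], n) for n in counts}
    back = []  # back[t][n] = replica count chosen in the slot before, given n now

    for load in loads[1:]:
        new_best, pointers = {}, {}
        for n in counts:
            prev = min(counts, key=lambda m: best[m] + switch_cost * abs(n - m))
            new_best[n] = best[prev] + switch_cost * abs(n - prev) + slot_cost(load, n)
            pointers[n] = prev
        best = new_best
        back.append(pointers)

    # Walk the back-pointers from the cheapest final state to recover the plan.
    last = min(counts, key=lambda n: best[n])
    plan = [last]
    for pointers in reversed(back):
        plan.append(pointers[plan[-1]])
    plan.reverse()
    return best[last], plan
```

For example, `plan_replicas([50, 250, 50], max_replicas=3)` scales up for the busy middle slot and back down afterwards. Under different assumed costs the chosen plan would change.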
Investigating Retrieval Method Selection with Axiomatic Features
We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.
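The abstract does not specify the form of the meta-learner, so the sketch below only illustrates one plausible reading: a per-query weight, predicted from features, that linearly interpolates the two methods' scores. The logistic form in `predict_weight`, the function names and the feature vector are assumptions for illustration, not the authors' model.

```python
import math
from typing import Dict, List, Sequence


def predict_weight(features: Sequence[float], coefficients: Sequence[float], bias: float = 0.0) -> float:
    """Map query features to a mixing weight in (0, 1) with a logistic function (assumed form)."""
    z = bias + sum(f * c for f, c in zip(features, coefficients))
    return 1.0 / (1.0 + math.exp(-z))


def combine_scores(scores_a: Dict[str, float], scores_b: Dict[str, float], weight: float) -> List[str]:
    """Rank documents by weight * score_a + (1 - weight) * score_b.

    Documents missing from one method's result list are given a score of 0.0 there.
    """
    docs = set(scores_a) | set(scores_b)
    combined = {
        d: weight * scores_a.get(d, 0.0) + (1.0 - weight) * scores_b.get(d, 0.0)
        for d in docs
    }
    # Sort by descending combined score; break ties by document id for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))
```

A weight near 1 reproduces the first method's ranking and a weight near 0 the second's, so the per-query prediction decides how much each method contributes.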
Neural Vector Spaces for Unsupervised Information Retrieval
We propose the Neural Vector Space Model (NVSM), a method that learns
representations of documents in an unsupervised manner for news article
retrieval. In the NVSM paradigm, we learn low-dimensional representations of
words and documents from scratch using gradient descent and rank documents
according to their similarity with query representations that are composed from
word representations. We show that NVSM performs better at document ranking
than existing latent semantic vector space methods. The addition of NVSM to a
mixture of lexical language models and a state-of-the-art baseline vector space
model yields a statistically significant increase in retrieval effectiveness.
Consequently, NVSM adds a complementary relevance signal. Next to semantic
matching, we find that NVSM performs well in cases where lexical matching is
needed.
NVSM learns a notion of term specificity directly from the document
collection without feature engineering. We also show that NVSM learns
regularities related to Luhn significance. Finally, we give advice on how to
deploy NVSM in situations where model selection (e.g., cross-validation) is
infeasible. We find that an unsupervised ensemble of multiple models trained
with different hyperparameter values performs better than a single
cross-validated model. Therefore, NVSM can safely be used for ranking documents
without supervised relevance judgments.
Comment: TOIS 201
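As a rough illustration of the ranking step described above, the sketch below composes a query representation by averaging word vectors and ranks documents by cosine similarity. It assumes words and documents already share one vector space and that the vectors have been learned elsewhere. The function names and the toy vectors are hypothetical, and the training procedure is not shown.

```python
import math
from typing import Dict, List, Sequence


def compose_query(terms: Sequence[str], word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the query terms that have a learned representation."""
    known = [word_vectors[t] for t in terms if t in word_vectors]
    if not known:
        raise ValueError("no query term has a learned representation")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(terms: Sequence[str], word_vectors: Dict[str, List[float]],
         doc_vectors: Dict[str, List[float]]) -> List[str]:
    """Return document ids ordered by decreasing similarity to the composed query."""
    query = compose_query(terms, word_vectors)
    return sorted(doc_vectors, key=lambda d: (-cosine(query, doc_vectors[d]), d))


# Toy two-dimensional vectors, invented purely for illustration.
word_vectors = {"stock": [1.0, 0.0], "market": [0.8, 0.2], "football": [0.0, 1.0]}
doc_vectors = {"finance": [0.9, 0.1], "sports": [0.1, 0.9]}
print(rank(["stock", "market"], word_vectors, doc_vectors))
```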
Identifying effective translations for cross-lingual Arabic-to-English user-generated speech search
Cross Language Information Retrieval (CLIR) systems are a valuable tool to enable speakers of one language to search for content of interest expressed in a different language. A group for whom this is of particular interest is bilingual Arabic speakers who wish to search for English language content using information needs expressed in Arabic queries. A key challenge in CLIR is crossing the language barrier between the query and the documents. The most common approach to bridging this gap is automated query translation, which can be unreliable for vague or short queries. In this work, we examine the potential for improving CLIR effectiveness by predicting the translation effectiveness using Query Performance Prediction (QPP) techniques. We propose a novel QPP method to estimate the quality of translation for an Arabic-English Cross-lingual User-generated Speech Search (CLUGS) task. We present an empirical evaluation that demonstrates the quality of our method on alternative translation outputs extracted from an Arabic-to-English Machine Translation system developed for this task. Finally, we show how this framework can be integrated in CLUGS to find relevant translations for improved retrieval performance.
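The abstract does not describe the proposed QPP method, so the sketch below swaps in a standard pre-retrieval predictor, average inverse document frequency, to illustrate how a predictor could be used to choose among alternative translations. The function names, the smoothing in the IDF formula and the toy collection statistics are assumptions for illustration.

```python
import math
from typing import Dict, Sequence


def avg_idf(query_terms: Sequence[str], doc_freq: Dict[str, int], num_docs: int) -> float:
    """Average smoothed IDF of the query terms, used here as a stand-in quality predictor."""
    if not query_terms:
        return 0.0
    return sum(
        math.log((num_docs + 1) / (doc_freq.get(t, 0) + 1)) for t in query_terms
    ) / len(query_terms)


def pick_translation(candidates: Sequence[str], doc_freq: Dict[str, int], num_docs: int) -> str:
    """Return the candidate translation with the highest predicted quality.

    Ties are resolved in favour of the earlier candidate.
    """
    return max(candidates, key=lambda c: avg_idf(c.lower().split(), doc_freq, num_docs))


# Toy document frequencies over an assumed 1,000-document collection.
doc_freq = {"the": 1000, "talk": 900, "lecture": 50, "quantum": 5}
print(pick_translation(["the talk", "quantum lecture"], doc_freq, num_docs=1000))
```

Average IDF favours translations made of more specific terms. A predictor tailored to noisy speech transcripts, as the abstract proposes, would presumably use different evidence.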