585 research outputs found
Learning Early Exit Strategies for Additive Ranking Ensembles
Modern search engine ranking pipelines are commonly based on large machine-learned ensembles of regression trees. We propose LEAR, a novel - learned - technique aimed to reduce the average number of trees traversed by documents to accumulate the scores, thus reducing the overall query response time. LEAR exploits a classifier that predicts whether a document can early exit the ensemble because it is unlikely to be ranked among the final top-k results. The early exit decision occurs at a sentinel point, i.e., after having evaluated a limited number of trees, and the partial scores are exploited to filter out non-promising documents. We evaluate LEAR by deploying it in a production-like setting, adopting a state-of-the-art algorithm for ensembles traversal. We provide a comprehensive experimental evaluation on two public datasets. The experiments show that LEAR has a significant impact on the efficiency of the query processing without hindering its ranking quality. In detail, on a first dataset, LEAR is able to achieve a speedup of 3x without any loss in NDCG@10, while on a second dataset the speedup is larger than 5x with a negligible NDCG@10 loss (< 0.05%)
Runtime Optimizations for Prediction with Tree-Based Models
Tree-based models have proven to be an effective solution for web ranking as
well as other problems in diverse domains. This paper focuses on optimizing the
runtime performance of applying such models to make predictions, given an
already-trained model. Although exceedingly simple conceptually, most
implementations of tree-based models do not efficiently utilize modern
superscalar processor architectures. By laying out data structures in memory in
a more cache-conscious fashion, removing branches from the execution flow using
a technique called predication, and micro-batching predictions using a
technique called vectorization, we are able to better exploit modern processor
architectures and significantly improve the speed of tree-based models over
hard-coded if-else blocks. Our work contributes to the exploration of
architecture-conscious runtime implementations of machine learning algorithms
Sacrificing Accuracy for Reduced Computation: Cascaded Inference Based on Softmax Confidence
We study the tradeoff between computational effort and accuracy in a cascade
of deep neural networks. During inference, early termination in the cascade is
controlled by confidence levels derived directly from the softmax outputs of
intermediate classifiers. The advantage of early termination is that
classification is performed using less computation, thus adjusting the
computational effort to the complexity of the input. Moreover, dynamic
modification of confidence thresholds allow one to trade accuracy for
computational effort without requiring retraining. Basing of early termination
on softmax classifier outputs is justified by experimentation that demonstrates
an almost linear relation between confidence levels in intermediate classifiers
and accuracy. Our experimentation with architectures based on ResNet obtained
the following results. (i) A speedup of 1.5 that sacrifices 1.4% accuracy with
respect to the CIFAR-10 test set. (ii) A speedup of 1.19 that sacrifices 0.7%
accuracy with respect to the CIFAR-100 test set. (iii) A speedup of 2.16 that
sacrifices 1.4% accuracy with respect to the SVHN test set
Runtime Optimizations for Tree-Based Machine Learning Models
Tree-based models have proven to be an effective solution for web ranking as well as other machine learning problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, specifically using gradient-boosted regression trees for learning to rank. Although exceedingly simple conceptually, most implementations of tree-based models do not efficiently utilize modern superscalar processors. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures. Experiments on synthetic data and on three standard learning-to-rank datasets show that our approach is significantly faster than standard implementations
Quality versus efficiency in document scoring with learning-to-rank models
Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of docu- ments and a user query, these functions are able to precisely predict a score for each of the documents, in turn exploited to effectively rank them. Although the scoring efficiency of LtR models is critical in several applications – e.g., it directly impacts on response time and throughput of Web query processing – it has received relatively little attention so far.
The goal of this work is to experimentally investigate the scoring efficiency of LtR models along with their ranking quality. Specifically, we show that machine-learned ranking mod- els exhibit a quality versus efficiency trade-off. For example, each family of LtR algorithms has tuning parameters that can influence both effectiveness and efficiency, where higher ranking quality is generally obtained with more complex and expensive models. Moreover, LtR algorithms that learn complex models, such as those based on forests of regression trees, are generally more expensive and more effective than other algorithms that induce simpler models like linear combination of features.
We extensively analyze the quality versus efficiency trade-off of a wide spectrum of state- of-the-art LtR, and we propose a sound methodology to devise the most effective ranker given a time budget. To guarantee reproducibility, we used publicly available datasets and we contribute an open source C++ framework providing optimized, multi-threaded imple- mentations of the most effective tree-based learners: Gradient Boosted Regression Trees (GBRT), Lambda-Mart (λ-MART), and the first public-domain implementation of Oblivious Lambda-Mart (λ-MART), an algorithm that induces forests of oblivious regression trees.
We investigate how the different training parameters impact on the quality versus effi- ciency trade-off, and provide a thorough comparison of several algorithms in the quality- cost space. The experiments conducted show that there is not an overall best algorithm, but the optimal choice depends on the time budget
- …