Optimizing Ranking Models in an Online Setting
Online Learning to Rank (OLTR) methods optimize ranking models by directly
interacting with users, which allows them to be very efficient and responsive.
All OLTR methods introduced during the past decade have extended the
original OLTR method: Dueling Bandit Gradient Descent (DBGD). Recently, a
fundamentally different approach was introduced with the Pairwise
Differentiable Gradient Descent (PDGD) algorithm. To date, the only comparisons
of the two approaches are limited to simulations with cascading click models
and low levels of noise. The main outcome so far is that PDGD converges at
higher levels of performance and learns considerably faster than DBGD-based
methods. However, the PDGD algorithm assumes cascading user behavior,
potentially giving it an unfair advantage. Furthermore, the robustness of both
methods to high levels of noise has not been investigated. Therefore, it is
unclear whether the reported advantages of PDGD over DBGD generalize to
different experimental conditions. In this paper, we investigate whether the
previous conclusions about the PDGD and DBGD comparison generalize from ideal
to worst-case circumstances. We do so in two ways. First, we compare the
theoretical properties of PDGD and DBGD, by taking a critical look at
previously proven properties in the context of ranking. Second, we estimate an
upper and lower bound on the performance of methods by simulating both ideal
user behavior and extremely difficult behavior, i.e., almost-random
non-cascading user models. Our findings show that the theoretical bounds of
DBGD do not apply to any common ranking model and, furthermore, that the
performance of DBGD is substantially worse than that of PDGD in both ideal and
worst-case circumstances. These results reproduce previously published findings
about the relative performance of PDGD vs. DBGD and generalize them to
extremely noisy and non-cascading circumstances.
Comment: European Conference on Information Retrieval (ECIR) 201
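The abstract above names DBGD without describing its update. As a minimal sketch, assuming the commonly described form of the algorithm (perturb the current linear ranker along a random unit direction, compare the two rankers, and step toward the candidate only if it wins), one update might look as follows. The function names, the `candidate_wins` callback (a stand-in for an interleaved comparison based on user clicks), and the default values of `delta` and `eta` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere (normalized Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def dbgd_step(weights, candidate_wins, delta=1.0, eta=0.01, rng=None):
    """One DBGD-style update of a linear ranker (illustrative sketch).

    candidate_wins(current, candidate) is a hypothetical callback that
    returns True when the candidate ranker is preferred by the user.
    """
    rng = rng or random.Random()
    u = sample_unit_vector(len(weights), rng)
    # Exploratory candidate: current weights perturbed by delta along u.
    candidate = [w + delta * ui for w, ui in zip(weights, u)]
    if candidate_wins(weights, candidate):
        # Move a small step eta toward the winning direction.
        return [w + eta * ui for w, ui in zip(weights, u)]
    # Otherwise keep the current ranker unchanged.
    return list(weights)
```

In a simulation, `candidate_wins` would typically wrap a click model and an interleaving method; here it is left abstract so the sketch does not commit to either.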
Balancing Speed and Quality in Online Learning to Rank for Information Retrieval
In Online Learning to Rank (OLTR) the aim is to find an optimal ranking model
by interacting with users. When learning from user behavior, systems must
interact with users while simultaneously learning from those interactions.
Unlike other Learning to Rank (LTR) settings, existing research in this field
has been limited to linear models. This is due to the speed-quality tradeoff
that arises when selecting models: complex models are more expressive and can
find the best rankings but need more user interactions to do so, a requirement
that risks frustrating users during training. Conversely, simpler models can be
optimized on fewer interactions and thus provide a better user experience, but
they will converge towards suboptimal rankings. This tradeoff creates a
deadlock, since novel models will not be able to improve either the user
experience or the final convergence point, without sacrificing the other. Our
contribution is twofold. First, we introduce a fast OLTR model called Sim-MGD
that addresses the speed aspect of the speed-quality tradeoff. Sim-MGD ranks
documents based on similarities with reference documents. It converges rapidly
and, hence, gives a better user experience but it does not converge towards the
optimal rankings. Second, we contribute Cascading Multileave Gradient Descent
(C-MGD) for OLTR that directly addresses the speed-quality tradeoff by using a
cascade that enables combinations of the best of two worlds: fast learning and
high quality final convergence. C-MGD can provide the better user experience of
Sim-MGD while maintaining the same convergence as the state-of-the-art MGD
model. This opens the door for future work to design new models for OLTR
without having to deal with the speed-quality tradeoff.
Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information
and Knowledge Management
Variance Reduction in Gradient Exploration for Online Learning to Rank
Online Learning to Rank (OL2R) algorithms learn from implicit user feedback
on the fly. The key to such algorithms is an unbiased estimation of gradients,
which is often (trivially) achieved by uniformly sampling from the entire
parameter space. Unfortunately, this introduces high variance in gradient
estimation and leads to worse regret in model estimation, especially when
the dimension of the parameter space is large.
In this paper, we aim at reducing the variance of gradient estimation in OL2R
algorithms. We project the selected updating direction into a space spanned by
the feature vectors from examined documents under the current query (termed the
"document space" for short), after an interleaved test. Our key insight is that
the result of an interleaved test is governed solely by a user's relevance
evaluation over the examined documents. Hence, the true gradient introduced by
this test result should lie in the constructed document space, and components
orthogonal to the document space in the proposed gradient can be safely removed
for variance reduction. We prove that the projected gradient is an unbiased
estimation of the true gradient, and show that this lower-variance gradient
estimation results in significant regret reduction. Our proposed method is
compatible with all existing OL2R algorithms which rank documents using a
linear model. Extensive experimental comparisons with several state-of-the-art
OL2R algorithms have confirmed the effectiveness of our proposed method in
reducing the variance of gradient estimation and improving overall performance.
Comment: Proceedings of the 42nd International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '19); Key Words:
Online learning to rank, Dueling bandit, Variance Reduction
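The projection step described in this abstract can be illustrated with a small sketch. Assuming the document space is simply the linear span of the examined documents' feature vectors, the code below builds an orthonormal basis of that span with modified Gram-Schmidt and keeps only the component of a proposed direction that lies inside it. The function names and the tolerance are hypothetical; whether the original method rescales the projected direction afterwards is not stated in the abstract and is not assumed here.

```python
import math


def orthonormal_basis(vectors, tol=1e-10):
    """Orthonormal basis of span(vectors) via modified Gram-Schmidt."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            coeff = sum(ri * bi for ri, bi in zip(r, b))
            r = [ri - coeff * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(sum(x * x for x in r))
        if norm > tol:  # skip vectors already (numerically) in the span
            basis.append([x / norm for x in r])
    return basis


def project_onto_document_space(direction, doc_features):
    """Drop the component of `direction` orthogonal to the document space."""
    basis = orthonormal_basis(doc_features)
    projected = [0.0] * len(direction)
    for b in basis:
        coeff = sum(d * bi for d, bi in zip(direction, b))
        projected = [p + coeff * bi for p, bi in zip(projected, b)]
    return projected


# Two examined documents spanning the first two feature dimensions.
docs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
proposed = [1.0, 2.0, 3.0]
print(project_onto_document_space(proposed, docs))  # third component is removed
```

With few examined documents relative to the number of features, the projected direction lives in a much smaller subspace than the sampled one, which is the intuition behind the variance reduction the abstract claims.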
Fuzzy local linear approximation-based sequential design
When approximating complex high-fidelity black box simulators with surrogate models, the experimental design is often created sequentially. LOLA-Voronoi, a powerful state-of-the-art method for sequential design, combines an exploitation algorithm and an exploration algorithm and adapts the sampling distribution to provide extra samples in non-linear regions. The LOLA algorithm estimates gradients to identify interesting regions, but its high computational complexity results in long computation times when simulators are high-dimensional. In this paper, a new gradient estimation approach for the LOLA algorithm is proposed based on Fuzzy Logic. Experiments show that the new method is considerably faster and results in experimental designs of comparable quality.
Fast and reliable online learning to rank for information retrieval
The amount of digital data we produce every day far surpasses our ability to process this data, and finding useful information in this constant flow of data has become one of the major challenges of the 21st century. Search engines are one way of accessing large data collections. Their algorithms have evolved far beyond simply matching search queries to sets of documents. Today’s most sophisticated search engines combine hundreds of relevance signals to provide the best possible results for each searcher. Current approaches for tuning the parameters of search engines can be highly effective. However, they typically require considerable expertise and manual effort. They rely on supervised learning to rank, meaning that they learn from manually annotated examples of relevant documents for given queries. Obtaining large quantities of sufficiently accurate manual annotations is becoming increasingly difficult, especially for personalized search, access to sensitive data, or search in settings that change over time. In this thesis, I develop new online learning to rank techniques, based on insights from reinforcement learning. In contrast to supervised approaches, these methods allow search engines to learn directly from users’ interactions. User interactions can typically be observed easily and cheaply, and reflect the preferences of real users. Interpreting user interactions and learning from them is challenging, because they can be biased and noisy. The contributions of this thesis include a novel interleaved comparison method, called probabilistic interleave, that allows unbiased comparisons of search engine result rankings, and methods for learning quickly and effectively from the resulting relative feedback. The obtained analytical and experimental results show how search engines can effectively learn from user interactions. In the future, these and similar techniques can open up new ways for gaining useful information from ever larger amounts of data.