
    Optimizing Ranking Models in an Online Setting

    Online Learning to Rank (OLTR) methods optimize ranking models by directly interacting with users, which allows them to be very efficient and responsive. All OLTR methods introduced during the past decade have built on the original OLTR method: Dueling Bandit Gradient Descent (DBGD). Recently, a fundamentally different approach was introduced with the Pairwise Differentiable Gradient Descent (PDGD) algorithm. To date, comparisons of the two approaches have been limited to simulations with cascading click models and low levels of noise. The main outcome so far is that PDGD converges at higher levels of performance and learns considerably faster than DBGD-based methods. However, the PDGD algorithm assumes cascading user behavior, potentially giving it an unfair advantage. Furthermore, the robustness of both methods to high levels of noise has not been investigated. Therefore, it is unclear whether the reported advantages of PDGD over DBGD generalize to different experimental conditions. In this paper, we investigate whether the previous conclusions about the PDGD and DBGD comparison generalize from ideal to worst-case circumstances. We do so in two ways. First, we compare the theoretical properties of PDGD and DBGD by taking a critical look at previously proven properties in the context of ranking. Second, we estimate upper and lower bounds on the performance of the two methods by simulating both ideal user behavior and extremely difficult behavior, i.e., almost-random, non-cascading user models. Our findings show that the theoretical bounds of DBGD do not apply to any common ranking model and, furthermore, that the performance of DBGD is substantially worse than PDGD in both ideal and worst-case circumstances. These results reproduce previously published findings about the relative performance of PDGD vs. DBGD and generalize them to extremely noisy and non-cascading circumstances.
    Comment: European Conference on Information Retrieval (ECIR) 201
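To make the DBGD update concrete, here is a minimal sketch of a single step under simplifying assumptions: the ranker is a linear weight vector, and the interleaved comparison that DBGD judges from user clicks is replaced by a hypothetical `prefers_candidate` callback. The parameter names `delta` (exploration step) and `eta` (learning rate) are illustrative, and PDGD's pairwise update is not shown.

```python
import math
import random


def random_unit_vector(dim, rng):
    # Sample a direction uniformly on the unit sphere by normalizing a Gaussian vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dbgd_step(w, prefers_candidate, delta=1.0, eta=0.01, rng=random):
    """One Dueling Bandit Gradient Descent update (minimal sketch).

    prefers_candidate(w, candidate) -> bool is a hypothetical stand-in for an
    interleaved comparison between the current and the candidate ranker.
    """
    u = random_unit_vector(len(w), rng)
    # Explore: perturb the current weights along a random direction.
    candidate = [wi + delta * ui for wi, ui in zip(w, u)]
    if prefers_candidate(w, candidate):
        # The candidate won the comparison: take a small step towards it.
        return [wi + eta * ui for wi, ui in zip(w, u)]
    # The current ranker won (or tied): keep the weights unchanged.
    return list(w)
```

With a noise-free simulated user that prefers whichever weight vector scores higher against a hidden ideal vector, repeated calls drift towards that ideal. With noisy or non-cascading clicks, the comparison outcome becomes much less reliable, which is the regime the paper above examines.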

    Balancing Speed and Quality in Online Learning to Rank for Information Retrieval

    In Online Learning to Rank (OLTR), the aim is to find an optimal ranking model by interacting with users. When learning from user behavior, systems must interact with users while simultaneously learning from those interactions. Unlike research in other Learning to Rank (LTR) settings, existing OLTR research has been limited to linear models. This is due to the speed-quality tradeoff that arises when selecting models: complex models are more expressive and can find the best rankings but need more user interactions to do so, a requirement that risks frustrating users during training. Conversely, simpler models can be optimized on fewer interactions and thus provide a better user experience, but they converge towards suboptimal rankings. This tradeoff creates a deadlock: a novel model cannot improve either the user experience or the final convergence point without sacrificing the other. Our contribution is twofold. First, we introduce a fast OLTR model called Sim-MGD that addresses the speed aspect of the speed-quality tradeoff. Sim-MGD ranks documents based on their similarities with reference documents. It converges rapidly and, hence, gives a better user experience, but it does not converge towards the optimal rankings. Second, we contribute Cascading Multileave Gradient Descent (C-MGD) for OLTR, which directly addresses the speed-quality tradeoff by using a cascade that combines the best of both worlds: fast learning and high-quality final convergence. C-MGD can provide the better user experience of Sim-MGD while maintaining the same convergence as the state-of-the-art MGD model. This opens the door for future work to design new models for OLTR without having to deal with the speed-quality tradeoff.
    Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
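The abstract only states that Sim-MGD ranks documents by their similarity to reference documents. A minimal sketch of that idea follows. It assumes cosine similarity over document feature vectors and one learnable weight per reference document. How the actual model selects its references or measures similarity is not specified in the abstract, and `rank_by_reference_similarity` is a hypothetical name.

```python
import math


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_features(doc, references):
    # Represent a document by its similarity to each reference document (assumption).
    return [cosine(doc, r) for r in references]


def rank_by_reference_similarity(docs, references, weights):
    """Return document indices sorted by a weighted sum of reference similarities.

    weights has one entry per reference document; in an OLTR setting these are
    the parameters a learner such as MGD would update from user feedback.
    """
    scores = [
        sum(w * s for w, s in zip(weights, similarity_features(d, references)))
        for d in docs
    ]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this sketch, the learnable vector's size is set by the number of reference documents rather than the number of features, which is one plausible reading of why such a model can be optimized from few interactions.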

    Variance Reduction in Gradient Exploration for Online Learning to Rank

    Online Learning to Rank (OL2R) algorithms learn from implicit user feedback on the fly. The key to such algorithms is an unbiased estimation of gradients, which is often (trivially) achieved by uniformly sampling from the entire parameter space. Unfortunately, this introduces high variance in gradient estimation and leads to higher regret, especially when the dimension of the parameter space is large. In this paper, we aim to reduce the variance of gradient estimation in OL2R algorithms. After the interleaved test, we project the selected updating direction into the space spanned by the feature vectors of the examined documents under the current query (termed the "document space" for short). Our key insight is that the result of an interleaved test is governed solely by a user's relevance evaluation over the examined documents. Hence, the true gradient introduced by this test result should lie in the constructed document space, and the components of the proposed gradient that are orthogonal to the document space can be safely removed to reduce variance. We prove that the projected gradient is an unbiased estimation of the true gradient, and show that this lower-variance gradient estimation results in significant regret reduction. Our proposed method is compatible with all existing OL2R algorithms that rank documents using a linear model. Extensive experimental comparisons with several state-of-the-art OL2R algorithms have confirmed the effectiveness of our proposed method in reducing the variance of gradient estimation and improving overall performance.
    Comment: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19); Key Words: Online learning to rank, Dueling bandit, Variance Reduction

    Fuzzy local linear approximation-based sequential design

    When approximating complex high-fidelity black-box simulators with surrogate models, the experimental design is often created sequentially. LOLA-Voronoi, a powerful state-of-the-art method for sequential design, combines an exploitation algorithm and an exploration algorithm and adapts the sampling distribution to provide extra samples in non-linear regions. The LOLA algorithm estimates gradients to identify interesting regions, but its high computational complexity results in long computation times when simulators are high-dimensional. In this paper, a new gradient estimation approach for the LOLA algorithm is proposed based on fuzzy logic. Experiments show that the new method is considerably faster and results in experimental designs of comparable quality.
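For context on what LOLA computes, here is a minimal sketch of a local linear approximation in the classic style: a least-squares gradient fitted from a sample's neighbours, plus a nonlinearity score defined as the total absolute error of that linear fit. This is *not* the fuzzy-logic estimator the abstract proposes. Neighbour selection, which LOLA handles with its own criteria, is assumed to be done already, and all names are hypothetical.

```python
def _solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_linear_gradient(f, x, neighbours):
    """Least-squares gradient of f at x from neighbouring samples.

    Needs at least len(x) neighbours in general position. Returns (gradient,
    nonlinearity), where nonlinearity sums the absolute errors of the linear
    prediction at the neighbours.
    """
    fx = f(x)
    offsets = [[ni - xi for ni, xi in zip(nb, x)] for nb in neighbours]
    deltas = [f(nb) - fx for nb in neighbours]
    d = len(x)
    # Normal equations: (D^T D) g = D^T df, where the rows of D are the offsets.
    ata = [[sum(o[i] * o[j] for o in offsets) for j in range(d)] for i in range(d)]
    atb = [sum(o[i] * df for o, df in zip(offsets, deltas)) for i in range(d)]
    g = _solve(ata, atb)
    error = sum(abs(df - sum(gi * oi for gi, oi in zip(g, o))) for o, df in zip(offsets, deltas))
    return g, error
```

A linear function yields a near-zero nonlinearity score, while a curved function yields a large one. That contrast is how LOLA-style methods decide where extra samples are needed. Forming and solving these systems for every sample is the kind of cost that grows with dimension.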

    Large Scale Co-Regularized Ranking

    Fast and reliable online learning to rank for information retrieval

    The amount of digital data we produce every day far surpasses our ability to process this data, and finding useful information in this constant flow of data has become one of the major challenges of the 21st century. Search engines are one way of accessing large data collections. Their algorithms have evolved far beyond simply matching search queries to sets of documents. Today’s most sophisticated search engines combine hundreds of relevance signals to provide the best possible results for each searcher. Current approaches for tuning the parameters of search engines can be highly effective. However, they typically require considerable expertise and manual effort. They rely on supervised learning to rank, meaning that they learn from manually annotated examples of relevant documents for given queries. Obtaining large quantities of sufficiently accurate manual annotations is becoming increasingly difficult, especially for personalized search, access to sensitive data, or search in settings that change over time. In this thesis, I develop new online learning to rank techniques based on insights from reinforcement learning. In contrast to supervised approaches, these methods allow search engines to learn directly from users’ interactions. User interactions can typically be observed easily and cheaply, and they reflect the preferences of real users. Interpreting user interactions and learning from them is challenging because they can be biased and noisy. The contributions of this thesis include a novel interleaved comparison method, called probabilistic interleave, that allows unbiased comparisons of search engine result rankings, and methods for learning quickly and effectively from the resulting relative feedback. The obtained analytical and experimental results show how search engines can effectively learn from user interactions. In the future, these and similar techniques can open up new ways of gaining useful information from ever larger amounts of data.
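Probabilistic interleave builds the interleaved result list by sampling rather than deterministically. The sketch below assumes the commonly described construction. Each ranking is turned into a distribution with P(d) proportional to 1/rank(d)^tau. At each position, one of the two rankers is picked at random, and a not-yet-placed document is drawn from that ranker's distribution, renormalized over the remaining documents. `tau=3.0` is an assumed default. The thesis's method infers comparison outcomes by marginalizing over possible assignments. The `credit_clicks` helper here only counts clicks per ranker under the single sampled assignment, which is a deliberate simplification.

```python
import random


def probabilistic_interleave(ranking_a, ranking_b, length, tau=3.0, rng=random):
    """Sample an interleaved list and record which ranker (0 or 1) placed each document.

    Assumes both rankings are permutations of the same document set.
    """
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same documents")
    remaining = list(ranking_a)
    interleaved, assignment = [], []
    for _ in range(min(length, len(remaining))):
        source = rng.choice((0, 1))  # pick ranker A or B uniformly at random
        ranking = ranking_a if source == 0 else ranking_b
        # Softmax-like weights 1 / rank^tau over the documents still available;
        # rng.choices normalizes the weights, which renormalizes the distribution.
        weights = [1.0 / (ranking.index(d) + 1) ** tau for d in remaining]
        doc = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(doc)
        interleaved.append(doc)
        assignment.append(source)
    return interleaved, assignment


def credit_clicks(assignment, clicked_positions):
    """Naive outcome: count clicks per ranker under the sampled assignment only."""
    wins = [0, 0]
    for pos in clicked_positions:
        wins[assignment[pos]] += 1
    return wins
```

Because every document has non-zero probability under both rankers, any interleaved list could have been produced by either one. This property is what allows the full method to reason about outcomes without favouring one ranker's positions.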