41 research outputs found

    LambdaLoss: Metric-Driven Loss for Learning-to-Rank

    How to directly optimize ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) is an interesting but challenging problem, because ranking metrics are either flat or discontinuous everywhere. Among existing approaches, LambdaRank is a novel algorithm that incorporates metrics into its learning procedure. Though empirically effective, it still lacks theoretical justification. For example, what is the underlying loss that LambdaRank optimizes for? Due to this, it is unclear whether LambdaRank will always converge. In this paper, we present a well-defined loss for LambdaRank in a probabilistic framework and show that LambdaRank is a special configuration in our framework. This framework, which we call LambdaLoss, provides theoretical justification for LambdaRank. Furthermore, we propose a few more metric-driven loss functions in our LambdaLoss framework. Our loss functions have a clear connection to ranking metrics and can be optimized in our framework efficiently. Experiments on three publicly available data sets show that our methods significantly outperform the state-of-the-art learning-to-rank algorithms. This confirms both the theoretical soundness and the practical effectiveness of the LambdaLoss framework.
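
    The abstract above turns on one idea: since NDCG is flat or discontinuous in the model scores, LambdaRank-style methods weight each pairwise gradient by the NDCG change that swapping the pair would cause. The Python sketch below illustrates that |ΔNDCG| weighting on a toy list; the function names and the brute-force pair loop are illustrative choices, not code from the paper.

        # A minimal sketch, standard library only: each document pair (i, j) is
        # weighted by |delta NDCG|, the NDCG change if i and j swapped positions.
        import math

        def dcg(relevances):
            """Discounted cumulative gain of a ranked list of relevance labels."""
            return sum((2 ** rel - 1) / math.log2(pos + 2)
                       for pos, rel in enumerate(relevances))

        def ndcg(relevances):
            """DCG normalized by the DCG of the ideal (sorted) ordering."""
            ideal = dcg(sorted(relevances, reverse=True))
            return dcg(relevances) / ideal if ideal > 0 else 0.0

        def lambda_weights(scores, relevances):
            """|delta NDCG| for every document pair, used to scale pairwise gradients."""
            order = sorted(range(len(scores)), key=lambda i: -scores[i])
            ranked = [relevances[i] for i in order]  # labels in current ranked order
            base = ndcg(ranked)
            weights = {}
            for a in range(len(ranked)):
                for b in range(a + 1, len(ranked)):
                    swapped = ranked[:]
                    swapped[a], swapped[b] = swapped[b], swapped[a]
                    weights[(order[a], order[b])] = abs(ndcg(swapped) - base)
            return weights

        scores = [2.3, 0.5, 1.7]   # model scores for three documents
        relevances = [3, 0, 1]     # graded relevance labels
        print(lambda_weights(scores, relevances))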

    Efficient AUC Optimization for Information Ranking Applications

    Adequate evaluation of an information retrieval system to estimate future performance is a crucial task. Area under the ROC curve (AUC) is widely used to evaluate the generalization of a retrieval system. However, the objective function optimized in many retrieval systems is the error rate and not the AUC value. This paper provides an efficient and effective non-linear approach to optimize AUC using additive regression trees, with a special emphasis on the use of multi-class AUC (MAUC) because multiple relevance levels are widely used in many ranking applications. Compared to a conventional linear approach, the performance of the non-linear approach is comparable on binary-relevance benchmark datasets and is better on multi-relevance benchmark datasets.
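
    As context for the abstract above, AUC is the probability that a randomly chosen positive item is scored above a randomly chosen negative one, and multi-class AUC (MAUC) averages this over all pairs of relevance levels. The sketch below computes both by direct pair counting, in the spirit of Hand and Till's MAUC; it is a definition-level illustration, not the paper's additive-regression-tree optimizer, and all names are assumptions.

        # Definition-level sketch of AUC and MAUC by exhaustive pair counting.
        from itertools import combinations

        def auc(scores, labels):
            """Pairwise AUC: P(score of a positive > score of a negative); ties count 0.5."""
            pos = [s for s, y in zip(scores, labels) if y == 1]
            neg = [s for s, y in zip(scores, labels) if y == 0]
            if not pos or not neg:
                return 0.0
            wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                       for p in pos for n in neg)
            return wins / (len(pos) * len(neg))

        def mauc(scores, labels):
            """Average pairwise AUC over all pairs of relevance levels."""
            levels = sorted(set(labels))
            pair_aucs = []
            for lo, hi in combinations(levels, 2):
                # Restrict to the two levels and relabel the higher one as positive.
                subset = [(s, 1 if y == hi else 0)
                          for s, y in zip(scores, labels) if y in (lo, hi)]
                sub_scores, sub_labels = zip(*subset)
                pair_aucs.append(auc(sub_scores, sub_labels))
            return sum(pair_aucs) / len(pair_aucs)

        print(mauc([0.9, 0.4, 0.7, 0.1], [2, 1, 2, 0]))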

    Factorizing LambdaMART for cold start recommendations

    Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user. This observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art learning-to-rank algorithm that relies on such a metric. Despite its success, it does not have a principled regularization mechanism, relying instead on empirical approaches to control model complexity, which leaves it prone to overfitting. Motivated by the fact that very often the users' and items' descriptions, as well as the preference behavior, can be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns a low-rank latent representation of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. The low rank essentially acts as a model complexity controller; on top of it we propose additional regularizers that constrain the learned latent representations to reflect the user and item manifolds as these are defined by their original feature-based descriptors and the preference behavior. Finally, we also propose a weighted variant of NDCG to reduce the penalty for similar items with a large rating discrepancy. We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold start setting as well as in the simpler matrix completion setting. In both cases it significantly outperforms current state-of-the-art algorithms.
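
    The scoring rule described above is easy to state concretely: LambdaMART-MF maps user and item feature descriptors to low-rank latent vectors and ranks by their inner product, which is what makes cold-start scoring possible. In the sketch below, the two embedding functions are random stand-ins for the paper's gradient boosted tree ensembles; RANK and every function name are illustrative assumptions.

        # Hedged sketch of inner-product scoring over feature-based embeddings.
        import numpy as np

        RANK = 4  # dimensionality of the latent space; the low rank controls complexity

        def user_embedding(user_features):
            # Stand-in for the boosted-tree ensemble mapping user features -> R^RANK.
            rng = np.random.default_rng(abs(hash(tuple(user_features))) % 2**32)
            return rng.standard_normal(RANK)

        def item_embedding(item_features):
            # Stand-in for the boosted-tree ensemble mapping item features -> R^RANK.
            rng = np.random.default_rng(abs(hash(tuple(item_features))) % 2**32)
            return rng.standard_normal(RANK)

        def relevance_score(user_features, item_features):
            """Score used for ranking: inner product of the latent representations."""
            return float(user_embedding(user_features) @ item_embedding(item_features))

        # Because scores come from feature-based embeddings rather than a learned
        # per-user/per-item table, new (cold start) users and items can be scored too.
        print(relevance_score([0.2, 1.0], [3.0, 0.5, 1.1]))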

    Structured learning for non-smooth ranking losses

    Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has recently been applied to optimize important non-decomposable ranking criteria like AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries: for example, MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
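
    The max-margin structured learning framework the abstract refers to trains a weight vector w so that the true ranking y* beats every other ranking y by a margin scaled by the ranking loss Δ(y, y*). The toy sketch below spells out that structured hinge objective with an MRR-based Δ and a position-discounted feature map; it searches all permutations for the most violating ranking, which is only feasible for tiny lists (the paper's point is precisely to do this inference in almost linear time). All names and the particular feature map are illustrative assumptions.

        # Toy structured hinge: max over rankings y of Delta(y, y*) + w.(psi(y) - psi(y*)).
        from itertools import permutations

        def mrr_loss(ranking, relevances):
            """1 - reciprocal rank of the first relevant document under `ranking`."""
            for pos, doc in enumerate(ranking):
                if relevances[doc] > 0:
                    return 1.0 - 1.0 / (pos + 1)
            return 1.0

        def psi(ranking, doc_features):
            """Joint feature map: document features summed with a position discount."""
            dim = len(doc_features[0])
            out = [0.0] * dim
            for pos, doc in enumerate(ranking):
                for k in range(dim):
                    out[k] += doc_features[doc][k] / (pos + 1)
            return out

        def structured_hinge(w, doc_features, relevances, true_ranking):
            """Loss-augmented inference by brute force over all permutations."""
            dot = lambda a, b: sum(x * y for x, y in zip(a, b))
            base = dot(w, psi(true_ranking, doc_features))
            best = 0.0
            for y in permutations(range(len(doc_features))):
                margin = mrr_loss(y, relevances) + dot(w, psi(y, doc_features)) - base
                best = max(best, margin)
            return best

        feats = [[1.0, 0.2], [0.1, 0.9], [0.4, 0.4]]   # 3 docs x 2 features
        rels = [1, 0, 0]                               # doc 0 is the relevant one
        print(structured_hinge([0.5, -0.3], feats, rels, true_ranking=(0, 1, 2)))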

    Applying a Genetic Algorithm to LambdaMART Forests for Accurate Prediction of Ranking Labels

    Master's thesis, Department of Computer Science and Engineering, College of Engineering, Seoul National University Graduate School, August 2017. Advisor: Srinivasa Rao Satti. In this thesis, principles of the genetic algorithm (GA) are applied to forests of LambdaMART to obtain more accurate ranking results. Ranking is one kind of prediction problem, and various solutions have been proposed for it. Applying machine learning techniques has improved ranking quality; one such technique is ensemble decision tree learning, where trees are trained one by one and together predict the result for given input values. LambdaMART is a fusion of LambdaRank and MART (Multiple Additive Regression Trees), in which gradients of scores are calculated by LambdaRank and multiple trees are generated and trained in predefined steps by MART. LambdaMART was also the main contributor to the winning entry of the Yahoo! Learning to Rank Challenge (2010), though the challenge report notes that ranking performance had reached a saturation point. However, LambdaMART can overfit the training data, meaning it may not predict outcomes precisely on unobserved data. A genetic algorithm, in turn, can provide greater search ability over the solution space, although that ability depends on the design of core operations such as crossover and mutation. Combining this search ability with LambdaMART could enhance solution quality and reduce the chance of overfitting. In this scheme, each LambdaMART forest becomes a chromosome, and multiple forests are the operands of the genetic operations. The scheme achieves higher accuracy than the original LambdaMART, and total training time per forest is also reduced. Contents: Chapter 1, Introduction. Chapter 2, Background (2.1 Information Retrieval: Ranking; 2.1.1 Ranking Problem; 2.1.2 Ranking Measures; 2.2 Classification and Regression Trees; 2.3 Genetic Algorithm (GA); 2.3.1 Selection; 2.3.2 Crossover; 2.3.3 Mutation; 2.3.4 Replacement). Chapter 3, Related Work (3.1 RankNet; 3.2 LambdaRank; 3.3 MART (Multiple Additive Regression Trees); 3.4 LambdaMART). Chapter 4, LambdaMART with GA (4.1 Overview; 4.2 Genetic Operations; 4.2.1 Selection; 4.2.2 Crossover; 4.2.3 Mutation; 4.2.4 Replacement). Chapter 5, Experimental Results (5.1 System Settings and Datasets; 5.2 Implementation; 5.3 Results). Chapter 6, Conclusion. Bibliography. Abstract in Korean. Acknowledgements.
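
    To make the thesis's scheme concrete, the sketch below treats each trained LambdaMART forest (a list of regression trees) as a chromosome and applies the four operations from Chapter 4: tournament selection, one-point crossover over the tree lists, mutation by swapping in trees from a pool, and elitist replacement. The operators, rates, and the toy fitness stand-in are assumptions made for illustration, not the thesis's exact design.

        # Hedged GA-over-forests sketch; "trees" are opaque objects here.
        import random

        def select(population, fitness, k=2):
            """Tournament selection: the fitter of k randomly drawn forests survives."""
            return max(random.sample(population, k), key=fitness)

        def crossover(forest_a, forest_b):
            """One-point crossover: a prefix of trees from one parent, the rest from the other."""
            cut = random.randrange(1, min(len(forest_a), len(forest_b)))
            return forest_a[:cut] + forest_b[cut:]

        def mutate(forest, tree_pool, rate=0.05):
            """Mutation: each tree is replaced by a pool tree with probability `rate`."""
            return [random.choice(tree_pool) if random.random() < rate else t
                    for t in forest]

        def evolve(population, fitness, tree_pool, generations=50):
            for _ in range(generations):
                parents = [select(population, fitness) for _ in range(len(population))]
                children = [mutate(crossover(a, b), tree_pool)
                            for a, b in zip(parents, parents[1:] + parents[:1])]
                # Elitist replacement: keep the best forests among parents and children.
                population = sorted(population + children, key=fitness,
                                    reverse=True)[:len(population)]
            return population[0]

        # Toy usage: "trees" are numbers and fitness is their sum, standing in for
        # a validation ranking measure such as NDCG.
        pool = list(range(10))
        pop = [[random.choice(pool) for _ in range(5)] for _ in range(8)]
        print(evolve(pop, fitness=sum, tree_pool=pool, generations=20))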