LambdaLoss: Metric-Driven Loss for Learning-to-Rank
How to directly optimize ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) is an interesting but challenging problem, because ranking metrics are either flat or discontinuous everywhere. Among existing approaches, LambdaRank is a novel algorithm that incorporates metrics into its learning procedure. Though empirically effective, it still lacks theoretical justification: for example, what is the underlying loss that LambdaRank optimizes? Because of this, it is unclear whether LambdaRank will always converge. In this paper, we present a well-defined loss for LambdaRank in a probabilistic framework and show that LambdaRank is a special configuration within this framework. The framework, which we call LambdaLoss, provides theoretical justification for LambdaRank. Furthermore, we propose several more metric-driven loss functions in the LambdaLoss framework. Our loss functions have a clear connection to ranking metrics and can be optimized efficiently within the framework. Experiments on three publicly available datasets show that our methods significantly outperform state-of-the-art learning-to-rank algorithms, confirming both the theoretical soundness and the practical effectiveness of the LambdaLoss framework.
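Concretely, LambdaRank's key device, which the LambdaLoss framework formalizes, is to scale each pairwise gradient by the NDCG change that swapping the two documents would cause. Below is a minimal NumPy sketch of that device (an illustration of the general idea, assuming NumPy-array inputs, not the paper's implementation):

```python
import numpy as np

def dcg_gain(rel):
    # Standard DCG gain: 2^relevance - 1
    return 2.0 ** rel - 1.0

def lambda_gradients(scores, relevances):
    """LambdaRank-style gradients: pairwise logistic gradients, each
    scaled by the |NDCG change| of swapping the two items' positions."""
    order = np.argsort(-scores)                  # current ranking by score
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))        # rank position of each item
    discount = 1.0 / np.log2(ranks + 2.0)        # 1 / log2(1 + position)

    ideal = np.sort(dcg_gain(relevances))[::-1]
    idcg = np.sum(ideal / np.log2(np.arange(len(scores)) + 2.0))

    lambdas = np.zeros_like(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if relevances[i] <= relevances[j]:
                continue                         # only pairs where i beats j
            # |Delta NDCG| if items i and j swapped rank positions
            delta = abs((dcg_gain(relevances[i]) - dcg_gain(relevances[j]))
                        * (discount[i] - discount[j])) / idcg
            rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
            lambdas[i] += delta * rho            # push i up
            lambdas[j] -= delta * rho            # push j down
    return lambdas
```

Ascending the resulting lambdas (e.g. `scores += lr * lambda_gradients(scores, relevances)`) trades pairwise errors against their metric impact, which is exactly the behavior the paper supplies a well-defined loss for.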
Efficient AUC Optimization for Information Ranking Applications
Adequate evaluation of an information retrieval system to estimate future performance is a crucial task. Area under the ROC curve (AUC) is widely used to evaluate the generalization of a retrieval system. However, the objective function optimized in many retrieval systems is the error rate rather than the AUC value. This paper provides an efficient and effective non-linear approach to optimizing AUC using additive regression trees, with a special emphasis on multi-class AUC (MAUC), because multiple relevance levels are widely used in many ranking applications. Compared to a conventional linear approach, the performance of the non-linear approach is comparable on binary-relevance benchmark datasets and better on multi-relevance benchmark datasets.
Factorizing LambdaMART for cold start recommendations
Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user. This observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art learning-to-rank algorithm relying on such a metric. Despite its success, it has no principled regularization mechanism, relying instead on empirical approaches to control model complexity, which leaves it prone to overfitting.
Motivated by the fact that the users' and items' descriptions, as well as the preference behavior, can very often be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns a low-rank latent representation of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. The low rank essentially acts as a model complexity controller; on top of it we propose additional regularizers that constrain the learned latent representations to reflect the user and item manifolds as defined by their original feature-based descriptors and the preference behavior. Finally, we also propose a weighted variant of NDCG that reduces the penalty for similar items with large rating discrepancies.
We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold-start setting as well as in the simpler matrix-completion setting. In both cases it significantly outperforms the current state-of-the-art algorithms.
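For intuition, the factorized scoring rule, relevance as an inner product of low-rank user and item representations, can be sketched as follows. The shapes and random factors here are illustrative assumptions; in LambdaMART-MF the representations would be produced by gradient boosted trees over the feature descriptors, which this sketch omits:

```python
import numpy as np

# Assumed shapes for illustration: U embeds n_users into a rank-k latent
# space, V embeds n_items into the same space. In LambdaMART-MF each
# latent coordinate would itself be the output of boosted trees applied
# to the user/item feature descriptors (omitted here).
rng = np.random.default_rng(0)
n_users, n_items, k = 100, 500, 8
U = rng.normal(size=(n_users, k))   # user latent factors
V = rng.normal(size=(n_items, k))   # item latent factors

def relevance_scores(user_id, U, V):
    """Predicted relevance of every item for one user: inner products of
    the user's latent vector with all item latent vectors."""
    return V @ U[user_id]           # shape (n_items,)

# Rank items for user 3 and take the top ten as recommendations
top10 = np.argsort(-relevance_scores(3, U, V))[:10]
```

The rank k caps model complexity directly, which is why the paper treats the low rank itself as the primary regularizer and layers manifold-based regularizers on top.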
Structured learning for non-smooth ranking losses
Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has recently been applied to optimize important non-decomposable ranking criteria such as AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries; for example, MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC datasets show that, contrary to conventional wisdom, a test criterion is often not best served by training with that same criterion.
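For reference, the two criteria this paper folds into its max-margin objective, MRR and NDCG, are computed from a ranked list of relevance labels as follows. These are the standard definitions, not code from the paper:

```python
import numpy as np

def mrr(ranked_relevance):
    """Reciprocal rank of the first relevant result (0 if none);
    averaging over queries gives mean reciprocal rank."""
    for pos, rel in enumerate(ranked_relevance, start=1):
        if rel > 0:
            return 1.0 / pos
    return 0.0

def ndcg(ranked_relevance, k=10):
    """NDCG@k with the standard 2^rel - 1 gain and log2 discount."""
    rel = np.asarray(ranked_relevance, dtype=float)[:k]
    dcg = np.sum((2.0 ** rel - 1.0) / np.log2(np.arange(2, rel.size + 2)))
    ideal = np.sort(np.asarray(ranked_relevance, dtype=float))[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) / np.log2(np.arange(2, ideal.size + 2)))
    return dcg / idcg if idcg > 0 else 0.0
```

The structural difference between the two is what motivates different feature maps: MRR depends only on the position of the first relevant result, while NDCG aggregates graded gains over the whole prefix.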
Applying a Genetic Algorithm to LambdaMART Forests for Accurate Prediction of Ranking Labels
Master's thesis, Department of Computer Science and Engineering, College of Engineering, Seoul National University Graduate School, August 2017. Advisor: Srinivasa Rao Satti. In this thesis, principles of the genetic algorithm (GA) are applied to forests of LambdaMART to obtain more accurate ranking results. The ranking problem is one kind of prediction problem, and various solutions have been proposed for it. Applying machine learning techniques has improved the ranking quality of algorithms. One such technique is the ensemble of decision trees, where trees are trained one by one and then used together to predict the result for given input values.
LambdaMART is a fusion of LambdaRank and MART (Multiple Additive Regression Trees): gradients of scores are calculated as in LambdaRank, and multiple trees are generated and trained over a predefined number of boosting steps as in MART. LambdaMART was also the main contributor to the winning entry of the "Yahoo! Learning to Rank Challenge (2010)", although the challenge report notes that ranking-solution performance has reached a saturation point. However, LambdaMART can overfit the training data, meaning that after training it may fail to predict outcomes precisely on unobserved data. In addition, a genetic algorithm can provide a stronger search of the solution space, although this ability depends on the design of core operations such as crossover and mutation.
Combining this search ability with LambdaMART could enhance the solution's quality and reduce the chance of overfitting to the training data. In this scheme, each LambdaMART forest becomes a chromosome, and multiple forests are the operands of the genetic operations. The scheme achieves higher accuracy than the original LambdaMART, and the total training time per forest is also reduced. A minimal sketch of this evolutionary loop is given after the table of contents below.
Chapter 1 Introduction
Chapter 2 Background
2.1 Information Retrieval: Ranking
2.1.1 Ranking Problem
2.1.2 Ranking Measures
2.2 Classification and Regression Trees
2.3 Genetic Algorithm (GA)
2.3.1 Selection
2.3.2 Crossover
2.3.3 Mutation
2.3.4 Replacement
Chapter 3 Related Work
3.1 RankNet
3.2 LambdaRank
3.3 MART (Multiple Additive Regression Tree)
3.4 LambdaMART
Chapter 4 LambdaMART with GA
4.1 Overview
4.2 Genetic Operations
4.2.1 Selection
4.2.2 Crossover
4.2.3 Mutation
4.2.4 Replacement
Chapter 5 Experimental Results
5.1 System Settings and Datasets
5.2 Implementation
5.3 Results
Chapter 6 Conclusion
Bibliography
Abstract (in Korean)
Acknowledgements
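As referenced in the abstract above, here is a minimal sketch of the thesis's evolutionary loop, where each chromosome is a LambdaMART forest (a list of regression trees). The helper names `evaluate_ndcg` and `fresh_trees` are assumptions for illustration, not identifiers from the thesis:

```python
import random

# Assumed helpers (hypothetical names for this sketch):
#   evaluate_ndcg(forest) -> float   NDCG of a forest on validation queries
#   fresh_trees                      a pool of newly trained regression trees
# A "forest" is a plain list of regression trees, as in LambdaMART.

def crossover(parent_a, parent_b):
    """One-point crossover: the child takes a prefix of trees from one
    forest and the suffix from the other (forests need >= 2 trees)."""
    cut = random.randrange(1, min(len(parent_a), len(parent_b)))
    return parent_a[:cut] + parent_b[cut:]

def mutate(forest, fresh_trees, rate=0.05):
    """Mutation: each tree is replaced by a freshly trained one with a
    small probability."""
    return [random.choice(fresh_trees) if random.random() < rate else tree
            for tree in forest]

def evolve(population, evaluate_ndcg, fresh_trees, generations=20):
    """Evolve a population of LambdaMART forests with truncation
    selection, one-point crossover, light mutation, and elitism."""
    for _ in range(generations):
        ranked = sorted(population, key=evaluate_ndcg, reverse=True)
        parents = ranked[: len(ranked) // 2]        # keep the better half
        children = [
            mutate(crossover(random.choice(parents), random.choice(parents)),
                   fresh_trees)
            for _ in range(len(ranked) - len(parents))
        ]
        population = parents + children             # elitism: parents survive
    return max(population, key=evaluate_ndcg)
```

Because short boosting runs produce the tree pool once and the GA only recombines them, a scheme along these lines can amortize training cost across the population, consistent with the reported savings in training time per forest.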