17,731 research outputs found
Optimal Information Retrieval with Complex Utility Functions
Existing retrieval models all attempt to optimize one single utility function, which is often based on the topical relevance of a document with respect to a query. In real applications, retrieval involves more complex utility functions that may involve preferences on several different dimensions. In this paper, we present a general optimization framework for retrieval with complex utility functions. A query language is designed according to this framework to enable users to submit complex queries. We propose an efficient algorithm for retrieval with complex utility functions based on the a-priori algorithm. As a case study, we apply our algorithm to a complex utility retrieval problem in distributed IR. Experiment results show that our algorithm allows for flexible tradeoff between multiple retrieval criteria. Finally, we study the efficiency issue of our algorithm on simulated data
Using the quantum probability ranking principle to rank interdependent documents
A known limitation of the Probability Ranking Principle (PRP) is that it does not cater for dependence between documents. Recently, the Quantum Probability Ranking Principle (QPRP) has been proposed, which implicitly captures dependencies between documents through “quantum interference”. This paper explores whether this new ranking principle leads to improved performance for subtopic retrieval, where novelty and diversity is required. In a thorough empirical investigation, models based on the PRP, as well as other recently proposed ranking strategies for subtopic retrieval (i.e. Maximal Marginal Relevance (MMR) and Portfolio Theory(PT)), are compared against the QPRP. On the given task, it is shown that the QPRP outperforms these other ranking strategies. And unlike MMR and PT, one of the main advantages of the QPRP is that no parameter estimation/tuning is required; making the QPRP both simple and effective. This research demonstrates that the application of quantum theory to problems within information retrieval can lead to significant improvements
Sparse Support Vector Infinite Push
In this paper, we address the problem of embedded feature selection for
ranking on top of the list problems. We pose this problem as a regularized
empirical risk minimization with -norm push loss function () and
sparsity inducing regularizers. We leverage the issues related to this
challenging optimization problem by considering an alternating direction method
of multipliers algorithm which is built upon proximal operators of the loss
function and the regularizer. Our main technical contribution is thus to
provide a numerical scheme for computing the infinite push loss function
proximal operator. Experimental results on toy, DNA microarray and BCI problems
show how our novel algorithm compares favorably to competitors for ranking on
top while using fewer variables in the scoring function.Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012
Efficient Optimization for Rank-based Loss Functions
The accuracy of information retrieval systems is often measured using complex
loss functions such as the average precision (AP) or the normalized discounted
cumulative gain (NDCG). Given a set of positive and negative samples, the
parameters of a retrieval system can be estimated by minimizing these loss
functions. However, the non-differentiability and non-decomposability of these
loss functions does not allow for simple gradient based optimization
algorithms. This issue is generally circumvented by either optimizing a
structured hinge-loss upper bound to the loss function or by using asymptotic
methods like the direct-loss minimization framework. Yet, the high
computational complexity of loss-augmented inference, which is necessary for
both the frameworks, prohibits its use in large training data sets. To
alleviate this deficiency, we present a novel quicksort flavored algorithm for
a large class of non-decomposable loss functions. We provide a complete
characterization of the loss functions that are amenable to our algorithm, and
show that it includes both AP and NDCG based loss functions. Furthermore, we
prove that no comparison based algorithm can improve upon the computational
complexity of our approach asymptotically. We demonstrate the effectiveness of
our approach in the context of optimizing the structured hinge loss upper bound
of AP and NDCG loss for learning models for a variety of vision tasks. We show
that our approach provides significantly better results than simpler
decomposable loss functions, while requiring a comparable training time.Comment: 15 pages, 2 figure
Training linear ranking SVMs in linearithmic time using red-black trees
We introduce an efficient method for training the linear ranking support
vector machine. The method combines cutting plane optimization with red-black
tree based approach to subgradient calculations, and has O(m*s+m*log(m)) time
complexity, where m is the number of training examples, and s the average
number of non-zero features per example. Best previously known training
algorithms achieve the same efficiency only for restricted special cases,
whereas the proposed approach allows any real valued utility scores in the
training data. Experiments demonstrate the superior scalability of the proposed
approach, when compared to the fastest existing RankSVM implementations.Comment: 20 pages, 4 figure
- …