Efficient Optimization for Rank-based Loss Functions
The accuracy of information retrieval systems is often measured using complex
loss functions such as the average precision (AP) or the normalized discounted
cumulative gain (NDCG). Given a set of positive and negative samples, the
parameters of a retrieval system can be estimated by minimizing these loss
functions. However, the non-differentiability and non-decomposability of these
loss functions do not allow for simple gradient-based optimization
algorithms. This issue is generally circumvented by either optimizing a
structured hinge-loss upper bound to the loss function or by using asymptotic
methods like the direct-loss minimization framework. Yet, the high
computational complexity of loss-augmented inference, which is necessary for
both frameworks, prohibits their use on large training data sets. To
alleviate this deficiency, we present a novel quicksort-flavored algorithm for
a large class of non-decomposable loss functions. We provide a complete
characterization of the loss functions that are amenable to our algorithm, and
show that it includes both AP and NDCG based loss functions. Furthermore, we
prove that no comparison-based algorithm can improve upon the computational
complexity of our approach asymptotically. We demonstrate the effectiveness of
our approach in the context of optimizing the structured hinge loss upper bound
of AP and NDCG loss for learning models for a variety of vision tasks. We show
that our approach provides significantly better results than simpler
decomposable loss functions, while requiring a comparable training time.
Comment: 15 pages, 2 figures
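To make the two ranking measures named in the abstract concrete, here is a minimal sketch that computes AP and NDCG for a ranked list of binary relevance labels. It is not the paper's quicksort-flavored algorithm and does not perform loss-augmented inference. The function names and the binary-gain, log2-discount form of NDCG are assumptions chosen for illustration.

```python
import math

def average_precision(ranked_labels):
    """AP for 0/1 relevance labels listed from best-ranked to worst-ranked."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0

def dcg(ranked_labels):
    """Discounted cumulative gain with gain = label and discount = log2(rank + 1)."""
    return sum(label / math.log2(rank + 1)
               for rank, label in enumerate(ranked_labels, start=1))

def ndcg(ranked_labels):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal else 0.0

def rank_by_score(scores, labels):
    """Order the labels by decreasing score (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]

# The corresponding losses are 1 - AP and 1 - NDCG.
ranking = rank_by_score([0.9, 0.7, 0.4, 0.1], [1, 0, 1, 0])
ap_loss = 1.0 - average_precision(ranking)
ndcg_loss = 1.0 - ndcg(ranking)
```

Both values depend on the scores only through the induced ordering, so a small change in a score that leaves the order intact leaves the loss unchanged. This is the non-differentiability the abstract refers to.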
The Complexity of General-Valued CSPs
An instance of the Valued Constraint Satisfaction Problem (VCSP) is given by a finite set of variables, a finite domain of labels, and a sum of functions, each function depending on a subset of the variables. Each function can take finite values specifying costs of assignments of labels to its variables or the infinite value, which indicates an infeasible assignment. The goal is to find an assignment of labels to the variables that minimizes the sum. We study, assuming that P ≠ NP, how the complexity of this very general problem depends on the set of functions allowed in the instances, the so-called constraint language. The case when all allowed functions take values in {0, ∞} corresponds to ordinary CSPs, where one deals only with the feasibility issue and there is no optimization. This case is the subject of the Algebraic CSP Dichotomy Conjecture predicting for which constraint languages CSPs are tractable (i.e. solvable in polynomial time) and for which NP-hard. The case when all allowed functions take only finite values corresponds to finite-valued CSP, where the feasibility aspect is trivial and one deals only with the optimization issue. The complexity of finite-valued CSPs was fully classified by Thapper and Zivny. An algebraic necessary condition for tractability of a general-valued CSP with a fixed constraint language was recently given by Kozik and Ochremiak. As our main result, we prove that if a constraint language satisfies this algebraic necessary condition, and the feasibility CSP (i.e. the problem of deciding whether a given instance has a feasible solution) corresponding to the VCSP with this language is tractable, then the VCSP is tractable. The algorithm is a simple combination of the assumed algorithm for the feasibility CSP and the standard LP relaxation. As a corollary, we obtain that a dichotomy for ordinary CSPs would imply a dichotomy for general-valued CSPs.
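The definition of a VCSP instance above can be illustrated with a tiny brute-force solver. This is only a sketch of the problem statement: it enumerates every assignment and therefore takes exponential time, and it is not the algorithm from the abstract, which combines a feasibility CSP algorithm with the standard LP relaxation. The function name `solve_vcsp` and the instance encoding (a list of `(scope, cost_function)` pairs) are assumptions made for this example.

```python
import itertools
import math

def solve_vcsp(num_vars, domain, constraints):
    """Return (minimum cost, assignment) by exhaustive search.

    constraints: list of (scope, f) pairs, where scope is a tuple of
    variable indices and f maps the labels of those variables to a
    finite cost or math.inf (infeasible).
    """
    best_cost, best_assignment = math.inf, None
    for assignment in itertools.product(domain, repeat=num_vars):
        cost = sum(f(*(assignment[v] for v in scope))
                   for scope, f in constraints)
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

# A hypothetical instance over three variables with labels {0, 1}.
constraints = [
    ((0, 1), lambda a, b: math.inf if a == b else 0),  # hard: x0 != x1
    ((1, 2), lambda a, b: 0 if a == b else 1),         # soft: prefer x1 == x2
    ((0,),   lambda a: 2 if a == 0 else 0),            # unary cost on x0
]
cost, assignment = solve_vcsp(3, (0, 1), constraints)
```

A function with values only in {0, ∞} acts as an ordinary CSP constraint, while the finite-valued functions carry the optimization part. If no assignment is feasible, the sketch returns an infinite cost and no assignment.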
Variational Autoencoders Pursue PCA Directions (by Accident)
The Variational Autoencoder (VAE) is a powerful architecture capable of
representation learning and generative modeling. When it comes to learning
interpretable (disentangled) representations, VAE and its variants show
unparalleled performance. However, the reasons for this are unclear, since a
very particular alignment of the latent embedding is needed but the design of
the VAE does not encourage it in any explicit way. We address this matter and
offer the following explanation: the diagonal approximation in the encoder
together with the inherent stochasticity force local orthogonality of the
decoder. The local behavior of promoting both reconstruction and orthogonality
matches closely how the PCA embedding is chosen. Alongside providing an
intuitive understanding, we justify the statement with full theoretical
analysis as well as with experiments.