
    Impossibility of dimension reduction in the nuclear norm

    Let $\mathsf{S}_1$ (the Schatten--von Neumann trace class) denote the Banach space of all compact linear operators $T:\ell_2\to\ell_2$ whose nuclear norm $\|T\|_{\mathsf{S}_1}=\sum_{j=1}^\infty\sigma_j(T)$ is finite, where $\{\sigma_j(T)\}_{j=1}^\infty$ are the singular values of $T$. We prove that for arbitrarily large $n\in\mathbb{N}$ there exists a subset $\mathcal{C}\subseteq\mathsf{S}_1$ with $|\mathcal{C}|=n$ that cannot be embedded with bi-Lipschitz distortion $O(1)$ into any $n^{o(1)}$-dimensional linear subspace of $\mathsf{S}_1$. Moreover, $\mathcal{C}$ is not even an $O(1)$-Lipschitz quotient of any subset of any $n^{o(1)}$-dimensional linear subspace of $\mathsf{S}_1$. Thus, $\mathsf{S}_1$ does not admit a dimension reduction result à la Johnson and Lindenstrauss (1984), which complements the work of Harrow, Montanaro and Short (2011) on the limitations of quantum dimension reduction under the assumption that the embedding into low dimensions is a quantum channel. Such a statement was previously known with $\mathsf{S}_1$ replaced by the Banach space $\ell_1$ of absolutely summable sequences via the work of Brinkman and Charikar (2003). In fact, the above set $\mathcal{C}$ can be taken to be the same set as the one that Brinkman and Charikar considered, viewed as a collection of diagonal matrices in $\mathsf{S}_1$. The challenge is to demonstrate that $\mathcal{C}$ cannot be faithfully realized in an arbitrary low-dimensional subspace of $\mathsf{S}_1$, while Brinkman and Charikar obtained such an assertion only for subspaces of $\mathsf{S}_1$ that consist of diagonal operators (i.e., subspaces of $\ell_1$). We establish this by proving that the Markov 2-convexity constant of any finite-dimensional linear subspace $X$ of $\mathsf{S}_1$ is at most a universal constant multiple of $\sqrt{\log\dim(X)}$.
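    As a purely illustrative aside (not part of the paper), the nuclear norm used above has a direct finite-dimensional counterpart: for a matrix it is the sum of the singular values. A minimal numerical sketch, assuming NumPy; the matrix in the example is hypothetical.

```python
import numpy as np

# Minimal sketch: the nuclear (Schatten-1 / trace) norm of a finite matrix is
# the sum of its singular values, the finite-dimensional analogue of
# ||T||_{S_1} = sum_j sigma_j(T) in the abstract above.
def nuclear_norm(T: np.ndarray) -> float:
    # compute_uv=False returns only the singular values sigma_j(T)
    return float(np.sum(np.linalg.svd(T, compute_uv=False)))

# Hypothetical example: a diagonal matrix, i.e. an ell_1 point viewed inside
# S_1, as in the Brinkman-Charikar set discussed in the abstract.
x = np.diag([3.0, -1.0, 0.5])
print(nuclear_norm(x))  # 4.5 = |3| + |-1| + |0.5|
```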

    An Efficient Dual Approach to Distance Metric Learning

    Distance metric learning is of fundamental interest in machine learning because the distance metric employed can significantly affect the performance of many learning methods. Quadratic Mahalanobis metric learning is a popular approach to the problem, but typically requires solving a semidefinite programming (SDP) problem, which is computationally expensive. Standard interior-point SDP solvers typically have a complexity of $O(D^{6.5})$ (with $D$ the dimension of the input data), and can thus only practically solve problems with fewer than a few thousand variables. Since the number of variables is $D(D+1)/2$, this limits the size of problem that can practically be solved to around a few hundred dimensions. The complexity of the popular quadratic Mahalanobis metric learning approach thus limits the size of problem to which metric learning can be applied. Here we propose a significantly more efficient approach to the metric learning problem based on the Lagrange dual formulation of the problem. The proposed formulation is much simpler to implement, and therefore allows much larger Mahalanobis metric learning problems to be solved. The time complexity of the proposed method is $O(D^3)$, which is significantly lower than that of the SDP approach. Experiments on a variety of datasets demonstrate that the proposed method achieves an accuracy comparable to the state-of-the-art, but is applicable to significantly larger problems. We also show that the proposed method can be applied to approximately solve more general Frobenius-norm regularized SDP problems.
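    A hedged sketch, not taken from the paper, of the kind of $O(D^3)$ primitive that Frobenius-norm regularized SDP duals typically reduce to: a Euclidean projection of a symmetric matrix onto the positive semidefinite cone via a single eigendecomposition, instead of a full interior-point solve. The example matrix is hypothetical.

```python
import numpy as np

# Sketch under assumptions: each dual iteration costs one eigendecomposition,
# O(D^3), by projecting a symmetric matrix onto the PSD cone.
def project_psd(M: np.ndarray) -> np.ndarray:
    """Project a symmetric D x D matrix onto the positive semidefinite cone."""
    sym = 0.5 * (M + M.T)                     # symmetrize for numerical safety
    w, V = np.linalg.eigh(sym)                # eigendecomposition, O(D^3)
    return (V * np.clip(w, 0.0, None)) @ V.T  # drop negative eigenvalues

# Hypothetical usage: repair an indefinite candidate Mahalanobis matrix.
D = 5
A = np.random.randn(D, D)
M = project_psd(A + A.T)
assert np.all(np.linalg.eigvalsh(M) >= -1e-10)
```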

    Regularized Optimal Transport and the Rot Mover's Distance

    This paper presents a unified framework for smooth convex regularization of discrete optimal transport problems. In this context, the regularized optimal transport turns out to be equivalent to a matrix nearness problem with respect to Bregman divergences. Our framework thus naturally generalizes a previously proposed regularization based on the Boltzmann-Shannon entropy, related to the Kullback-Leibler divergence and solved with the Sinkhorn-Knopp algorithm. We call the regularized optimal transport distance the rot mover's distance, in reference to the classical earth mover's distance. We develop two generic schemes, which we respectively call the alternate scaling algorithm and the non-negative alternate scaling algorithm, to efficiently compute the regularized optimal plans depending on whether the domain of the regularizer lies within the non-negative orthant or not. These schemes are based on Dykstra's algorithm with alternate Bregman projections, and further exploit the Newton-Raphson method when applied to separable divergences. We enhance the separable case with a sparse extension to deal with high data dimensions. We also instantiate our proposed framework and discuss the inherent specificities for well-known regularizers and statistical divergences in the machine learning and information geometry communities. Finally, we demonstrate the merits of our methods with experiments using synthetic data to illustrate the effect of different regularizers and penalties on the solutions, as well as real-world data for a pattern recognition application to audio scene classification.
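    For orientation only, a minimal sketch of the Sinkhorn-Knopp scaling iteration for the entropy (Boltzmann-Shannon / KL) regularized special case mentioned above; it is not the paper's more general alternate scaling scheme, and the cost matrix and marginals are hypothetical.

```python
import numpy as np

# Minimal sketch, KL-regularized special case: alternate scaling computes the
# regularized optimal plan P = diag(u) K diag(v) with K = exp(-C / eps).
def sinkhorn(C, a, b, eps=0.1, n_iter=500):
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # match column marginals
        u = a / (K @ v)     # match row marginals
    return u[:, None] * K * v[None, :]

# Hypothetical example: transport between two small histograms.
a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(C, a, b)
print(P.sum(axis=1), P.sum(axis=0))  # approximately equal to a and b
```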

    Efficient Relaxations for Dense CRFs with Sparse Higher Order Potentials

    Dense conditional random fields (CRFs) have become a popular framework for modelling several problems in computer vision, such as stereo correspondence and multi-class semantic segmentation. By modelling long-range interactions, dense CRFs provide a labelling that captures finer detail than their sparse counterparts. Currently, the state-of-the-art algorithm performs mean-field inference using a filter-based method, but fails to provide a strong theoretical guarantee on the quality of the solution. A question naturally arises as to whether it is possible to obtain a maximum a posteriori (MAP) estimate of a dense CRF using a principled method. Within this paper, we show that this is indeed possible. We show that, by using a filter-based method, continuous relaxations of the MAP problem can be optimised efficiently using state-of-the-art algorithms. Specifically, we solve a quadratic programming (QP) relaxation using the Frank-Wolfe algorithm and a linear programming (LP) relaxation by developing a proximal minimisation framework. By exploiting labelling consistency in the higher-order potentials and utilising the filter-based method, we are able to formulate the above algorithms such that each iteration has a complexity linear in the number of classes and random variables. The presented algorithms can be applied to any labelling problem using a dense CRF with sparse higher-order potentials. In this paper, we use semantic segmentation as an example application, as it demonstrates the ability of the algorithm to scale to dense CRFs with large dimensions. Experiments on the Pascal dataset indicate that the presented algorithms attain lower energies than the mean-field inference method.
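    A hedged sketch, independent of the paper's filter-based formulation, of how Frank-Wolfe applies to a generic QP relaxation of a labelling problem: the variables are stacked per-pixel label distributions, each constrained to the probability simplex, so the linear minimisation oracle just picks the label with the smallest gradient entry for each variable. All names and the toy instance below are assumptions for illustration.

```python
import numpy as np

# Sketch under assumptions: Frank-Wolfe on min_x 0.5 x'Qx + c'x, where x is a
# stack of per-variable label distributions over a product of simplices.
def frank_wolfe_qp(Q, c, n_vars, n_labels, n_iter=100):
    x = np.full(n_vars * n_labels, 1.0 / n_labels)  # uniform initialisation
    for k in range(n_iter):
        grad = Q @ x + c
        # Linear minimisation oracle: per variable, put all mass on the label
        # with the smallest gradient entry (a vertex of the feasible set).
        s = np.zeros_like(x)
        for i in range(n_vars):
            block = grad[i * n_labels:(i + 1) * n_labels]
            s[i * n_labels + np.argmin(block)] = 1.0
        gamma = 2.0 / (k + 2.0)                      # standard step size
        x = (1.0 - gamma) * x + gamma * s
    return x.reshape(n_vars, n_labels)

# Hypothetical toy instance: 3 variables, 2 labels, random convex quadratic.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
print(frank_wolfe_qp(A @ A.T, rng.standard_normal(6), n_vars=3, n_labels=2))
```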