Primal-Dual Algorithms for Non-negative Matrix Factorization with the Kullback-Leibler Divergence
Non-negative matrix factorization (NMF) approximates a given matrix as a
product of two non-negative matrices. Multiplicative algorithms deliver
reliable results, but they converge slowly for high-dimensional data and
may get stuck away from local minima. Gradient descent methods have better
behavior, but only apply to smooth losses such as the least-squares loss. In
this article, we propose a first-order primal-dual algorithm for non-negative
decomposition problems (where one factor is fixed) with the KL divergence,
based on the Chambolle-Pock algorithm. All required computations may be
obtained in closed form and we provide an efficient heuristic way to select
step-sizes. By using alternating optimization, our algorithm readily extends to
NMF and, on synthetic examples, face recognition or music source separation
datasets, it is either faster than existing algorithms, or leads to improved
local optima, or both.
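The multiplicative baseline that the abstract contrasts against can be sketched with the classical Lee-Seung updates for the KL loss (a generic illustration of that baseline, not the paper's primal-dual method; function and parameter names are my own):

```python
import numpy as np

def nmf_kl_multiplicative(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Classical Lee-Seung multiplicative updates for KL-divergence NMF.

    This is the multiplicative baseline discussed above, not the
    primal-dual algorithm proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # Multiplicative update for H (keeps entries non-negative).
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        WH = W @ H + eps
        # Multiplicative update for W.
        W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
    return W, H
```

Each update rescales the current factor by a ratio of non-negative terms, which is what makes the iterates stay in the non-negative orthant but also what can stall them away from good local minima.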
Regularized Optimal Transport and the Rot Mover's Distance
This paper presents a unified framework for smooth convex regularization of
discrete optimal transport problems. In this context, the regularized optimal
transport turns out to be equivalent to a matrix nearness problem with respect
to Bregman divergences. Our framework thus naturally generalizes a previously
proposed regularization based on the Boltzmann-Shannon entropy related to the
Kullback-Leibler divergence, and solved with the Sinkhorn-Knopp algorithm. We
call the regularized optimal transport distance the rot mover's distance in
reference to the classical earth mover's distance. We develop two generic
schemes that we respectively call the alternate scaling algorithm and the
non-negative alternate scaling algorithm, to efficiently compute the
regularized optimal plans depending on whether the domain of the regularizer
lies within the non-negative orthant or not. These schemes are based on
Dykstra's algorithm with alternate Bregman projections, and further exploit the
Newton-Raphson method when applied to separable divergences. We enhance the
separable case with a sparse extension to deal with high data dimensions. We
also instantiate our proposed framework and discuss the inherent specificities
for well-known regularizers and statistical divergences in the machine learning
and information geometry communities. Finally, we demonstrate the merits of our
methods with experiments using synthetic data to illustrate the effect of
different regularizers and penalties on the solutions, as well as real-world
data for a pattern recognition application to audio scene classification.
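The Sinkhorn-Knopp algorithm mentioned above, which solves the Boltzmann-Shannon (entropic) special case of this framework, can be sketched as follows (a minimal illustration; the paper's alternate scaling schemes generalize this to other Bregman projections):

```python
import numpy as np

def sinkhorn(mu, nu, C, gamma=0.05, n_iter=500):
    """Sinkhorn-Knopp scaling for entropy-regularized optimal transport.

    Alternately rescales P = diag(u) K diag(v) so that its row and
    column marginals match mu and nu.
    """
    K = np.exp(-C / gamma)              # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)              # match column marginals
        u = mu / (K @ v)                # match row marginals
    return u[:, None] * K * v[None, :]  # regularized optimal plan
```

The only expensive operations are the two matrix-vector products with the Gibbs kernel, which is what makes the entropic case so cheap compared to unregularized transport.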
Topological Data Analysis with Bregman Divergences
Given a finite set in a metric space, topological data analysis generalizes
hierarchical clustering using a 1-parameter family of homology groups to
quantify connectivity in all dimensions. The connectivity is compactly
described by the persistence diagram. One limitation of the current framework
is the reliance on metric distances, whereas in many practical applications
objects are compared by non-metric dissimilarity measures. Examples are the
Kullback-Leibler divergence, which is commonly used for comparing text and
images, and the Itakura-Saito divergence, popular for speech and sound. These
are two members of the broad family of dissimilarities called Bregman
divergences.
We show that the framework of topological data analysis can be extended to
general Bregman divergences, widening the scope of possible applications. In
particular, we prove that appropriately generalized Čech and Delaunay (alpha)
complexes capture the correct homotopy type, namely that of the corresponding
union of Bregman balls. Consequently, their filtrations give the correct
persistence diagram, namely the one generated by the uniformly growing Bregman
balls. Moreover, we show that unlike the metric setting, the filtration of
Vietoris-Rips complexes may fail to approximate the persistence diagram. We
propose algorithms to compute the thus generalized Čech, Vietoris-Rips and
Delaunay complexes and experimentally test their efficiency. Lastly, we explain
their surprisingly good performance by making a connection with discrete Morse
theory.
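A Bregman divergence is defined from a convex generator F by D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>. A minimal sketch showing how the two divergences named above arise from their generators (the helper names in the code are my own):

```python
import numpy as np

def bregman(F, gradF, x, y):
    """Bregman divergence D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    return F(x) - F(y) - gradF(y) @ (x - y)

# Generator of the (generalized) Kullback-Leibler divergence.
kl_F = lambda x: np.sum(x * np.log(x) - x)
kl_grad = lambda x: np.log(x)

# Generator of the Itakura-Saito divergence (Burg entropy).
is_F = lambda x: -np.sum(np.log(x))
is_grad = lambda x: -1.0 / x
```

Plugging the KL generator into `bregman` recovers sum(x log(x/y) - x + y), and the Burg generator recovers sum(x/y - log(x/y) - 1); neither is symmetric nor satisfies the triangle inequality, which is exactly why the metric-based framework needs extending.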
Proximity Operators of Discrete Information Divergences
Information divergences allow one to assess how close two distributions are
to each other. Among the large panel of available measures, special
attention has been paid to convex φ-divergences, such as the
Kullback-Leibler, Jeffreys-Kullback, Hellinger, Chi-Square, Rényi, and
Iα divergences. While φ-divergences have been extensively
studied in convex analysis, their use in optimization problems often remains
challenging. In this regard, one of the main shortcomings of existing methods
is that the minimization of φ-divergences is usually performed with
respect to one of their arguments, possibly within alternating optimization
techniques. In this paper, we overcome this limitation by deriving new
closed-form expressions for the proximity operator of such two-variable
functions. This makes it possible to employ standard proximal methods for
efficiently solving a wide range of convex optimization problems involving
φ-divergences. In addition, we show that these proximity operators are
useful to compute the epigraphical projection of several functions of practical
interest. The proposed proximal tools are numerically validated in the context
of optimal query execution within database management systems, where the
problem of selectivity estimation plays a central role. Experiments are carried
out on small- to large-scale scenarios.
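As a one-variable illustration of such closed-form proximity operators (the paper itself derives them for two-variable φ-divergences, which is a harder problem), the prox of x log x can be expressed with the Lambert W function:

```python
import numpy as np
from scipy.special import lambertw

def prox_entropy(y, gamma=1.0):
    """Proximity operator of f(x) = x log x on (0, +inf), i.e. the
    minimizer of f(x) + (1/(2*gamma)) * (x - y)**2.

    The optimality condition gamma*(log x + 1) + x - y = 0 is solved
    in closed form with the Lambert W function. A one-variable sketch,
    not the paper's two-variable operators.
    """
    return gamma * np.real(lambertw(np.exp(y / gamma - 1.0) / gamma))
```

Having the prox in closed form is what lets standard proximal splitting methods handle such nonsmooth divergence terms at essentially no extra cost per iteration.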
Blind Source Separation with Optimal Transport Non-negative Matrix Factorization
Optimal transport as a loss for machine learning optimization problems has
recently gained a lot of attention. Building upon recent advances in
computational optimal transport, we develop an optimal transport non-negative
matrix factorization (NMF) algorithm for supervised speech blind source
separation (BSS). Optimal transport allows us to design and leverage a cost
between short-time Fourier transform (STFT) spectrogram frequencies, which
takes into account how humans perceive sound. We give empirical evidence that
using our proposed optimal transport NMF leads to perceptually better results
than Euclidean NMF, for both isolated voice reconstruction and BSS tasks.
Finally, we demonstrate how to use optimal transport for cross domain sound
processing tasks, where frequencies represented in the input spectrograms may
be different from one spectrogram to another.
Comment: 22 pages, 7 figures, 2 additional files
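One plausible way to build a perceptually motivated transport cost between STFT frequency bins is to measure distances on the mel scale; this is only an illustrative stand-in, not the cost design actually used in the paper:

```python
import numpy as np

def frequency_cost(freqs_hz, p=2.0):
    """Hypothetical transport cost between spectrogram frequency bins.

    Frequencies are mapped to the mel scale so that perceptually close
    frequencies are cheap to move mass between. Illustrative only; the
    paper designs its own perceptual cost.
    """
    mel = 2595.0 * np.log10(1.0 + np.asarray(freqs_hz) / 700.0)
    return np.abs(mel[:, None] - mel[None, :]) ** p
```

Such a cost matrix is what distinguishes the optimal transport loss from the Euclidean one: moving spectral mass to a nearby frequency is penalized far less than moving it across the spectrum.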
Factor analysis with finite data
Factor analysis aims to describe high dimensional random vectors by means of
a small number of unknown common factors. In mathematical terms, it is required
to decompose the covariance matrix of the random vector as the sum of
a diagonal matrix, accounting for the idiosyncratic noise in the data,
and a low-rank matrix, accounting for the variance of the common factors,
in such a way that the rank of the low-rank term is as small as possible,
so that the number of common factors is minimal. In practice, however, the
covariance matrix is unknown and must be replaced by its estimate, i.e. the
sample covariance, which comes from a finite amount of data. This paper
provides a strategy to account for the uncertainty in the estimation of the
covariance matrix in the factor analysis problem.
Comment: Draft, the final version will appear in the 56th IEEE Conference on
Decision and Control, Melbourne, Australia, 2017
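The diagonal-plus-low-rank decomposition at the heart of the problem, with the sample covariance standing in for the unknown true covariance, can be sketched with a naive alternating heuristic (a sketch of the classical problem only; it does not implement the paper's strategy for handling finite-data uncertainty):

```python
import numpy as np

def factor_analysis_decomp(S, rank, n_iter=100):
    """Naive alternating heuristic for S ~ D + L with D diagonal
    (idiosyncratic noise) and L low-rank PSD (common factors).

    Alternates a truncated eigendecomposition of S - D with a refit
    of the diagonal. Illustrative only.
    """
    D = np.diag(np.diag(S)) * 0.5
    for _ in range(n_iter):
        # Best rank-k PSD approximation of S - D.
        w, V = np.linalg.eigh(S - D)
        w[:-rank] = 0.0
        w = np.clip(w, 0.0, None)
        L = (V * w) @ V.T
        # Refit the diagonal to the residual, keeping it non-negative.
        D = np.diag(np.clip(np.diag(S - L), 0.0, None))
    return D, L
```

When the input covariance is only a finite-sample estimate, the exact decomposition this heuristic targets need not exist, which is precisely the uncertainty the paper sets out to handle.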
Entropic Wasserstein Gradient Flows
This article details a novel numerical scheme to approximate gradient flows
for optimal transport (i.e. Wasserstein) metrics. These flows have proved
useful to tackle theoretically and numerically non-linear diffusion equations
that model for instance porous media or crowd evolutions. These gradient flows
define a suitable notion of weak solutions for these evolutions and they can be
approximated in a stable way using discrete flows. These discrete flows are
implicit Euler time stepping according to the Wasserstein metric. A bottleneck
of these approaches is the high computational load induced by the resolution of
each step. Indeed, this corresponds to the resolution of a convex optimization
problem involving a Wasserstein distance to the previous iterate. Following
several recent works on the approximation of Wasserstein distances, we consider
a discrete flow induced by an entropic regularization of the transportation
coupling. This entropic regularization allows one to trade the initial
Wasserstein fidelity term for a Kullback-Leibler divergence, which is easier to
deal with numerically. We show how KL proximal schemes, and in particular
Dykstra's algorithm, can be used to compute each step of the regularized flow.
The resulting algorithm is fast, parallelizable, and versatile, because it
only requires multiplications by a Gibbs kernel. On Euclidean domains
discretized on a uniform grid, this corresponds to a linear filtering (for
instance a Gaussian filtering when the ground cost is the squared Euclidean
distance) which
can be computed in nearly linear time. On more general domains, such as
(possibly non-convex) shapes or on manifolds discretized by a triangular mesh,
following a recently proposed numerical scheme for optimal transport, this
Gibbs kernel multiplication is approximated by a short-time heat diffusion
- …
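The claim that multiplication by the Gibbs kernel reduces to a Gaussian filtering on a uniform grid can be illustrated directly (a dense sketch for a 1-D grid; an efficient implementation would use a separable convolution in nearly linear time):

```python
import numpy as np

def gibbs_apply(x, grid, gamma):
    """Multiplication by the Gibbs kernel K = exp(-c/gamma) with the
    squared Euclidean ground cost on a uniform 1-D grid.

    As noted above, this is a Gaussian filtering of x, and it is the
    only expensive operation required by the KL proximal (Dykstra)
    scheme for each entropic JKO step.
    """
    C = (grid[:, None] - grid[None, :]) ** 2
    return np.exp(-C / gamma) @ x
```

On a non-Euclidean domain the same role is played by a short-time heat diffusion, as described at the end of the abstract.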