Blind Source Separation with Optimal Transport Non-negative Matrix Factorization
Optimal transport as a loss for machine learning optimization problems has
recently attracted considerable attention. Building upon recent advances in
computational optimal transport, we develop an optimal transport non-negative
matrix factorization (NMF) algorithm for supervised speech blind source
separation (BSS). Optimal transport allows us to design and leverage a cost
between short-time Fourier transform (STFT) spectrogram frequencies, which
takes into account how humans perceive sound. We give empirical evidence that
using our proposed optimal transport NMF leads to perceptually better results
than Euclidean NMF, for both isolated voice reconstruction and BSS tasks.
Finally, we demonstrate how to use optimal transport for cross domain sound
processing tasks, where frequencies represented in the input spectrograms may
be different from one spectrogram to another.
Comment: 22 pages, 7 figures, 2 additional files
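As a rough illustration of the kind of frequency-aware OT loss this abstract describes, the sketch below computes an entropic OT cost between two normalized spectral frames under a log-frequency ground cost, using Sinkhorn scaling iterations. The cost design, frequency grid, and parameters here are hypothetical stand-ins, not the paper's actual choices.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.05, n_iter=300):
    """Entropic OT between histograms a and b under cost matrix C (Sinkhorn scaling)."""
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)              # rescale to match column marginals b
        u = a / (K @ v)                # rescale to match row marginals a
    P = u[:, None] * K * v[None, :]    # transport plan
    return float(np.sum(P * C))

# Hypothetical log-frequency cost: moving energy between perceptually close
# frequency bins is cheap, between distant bins expensive.
freqs = np.linspace(100.0, 8000.0, 64)
C = (np.log(freqs)[:, None] - np.log(freqs)[None, :]) ** 2
C /= C.max()                           # normalize to [0, 1] for numerical stability

bins = np.arange(64)
a = np.exp(-0.5 * ((bins - 20) / 3.0) ** 2); a /= a.sum()  # toy spectral frame
b = np.exp(-0.5 * ((bins - 24) / 3.0) ** 2); b /= b.sum()  # slightly shifted frame
cost = sinkhorn_cost(a, b, C)
```

A Euclidean loss would penalize the bin shift between `a` and `b` as if the frames were unrelated; the OT cost instead charges only for the short move along the (log-)frequency axis.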
Entropic optimal transport is maximum-likelihood deconvolution
We give a statistical interpretation of entropic optimal transport by showing
that performing maximum-likelihood estimation for Gaussian deconvolution
corresponds to calculating a projection with respect to the entropic optimal
transport distance. This structural result gives theoretical support for the
wide adoption of these tools in the machine learning community.
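The correspondence can be restated schematically (notation mine, and hedged; the exact scaling of the regularization parameter is in the paper). With observations $y_1,\dots,y_n$ from the Gaussian deconvolution model $Y = X + \sigma Z$, $Z \sim \mathcal{N}(0, I)$, the MLE solves

```latex
\hat{\mu}_{\mathrm{MLE}} \in \arg\max_{\mu} \; \frac{1}{n} \sum_{i=1}^{n}
\log \int \varphi_\sigma(y_i - x) \, \mathrm{d}\mu(x),
\qquad \varphi_\sigma \ \text{the} \ \mathcal{N}(0, \sigma^2 I) \ \text{density},
```

and the structural result identifies this maximizer with the projection of the empirical measure $\frac{1}{n}\sum_i \delta_{y_i}$ onto the model class under the entropic OT distance, with entropic regularization strength tied to the noise variance $\sigma^2$.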
Learning Generative Models with Sinkhorn Divergences
The ability to compare two degenerate probability distributions (i.e. two
probability distributions supported on two distinct low-dimensional manifolds
living in a much higher-dimensional space) is a crucial problem arising in the
estimation of generative models for high-dimensional observations such as those
arising in computer vision or natural language. It is known that optimal
transport metrics can remedy this problem, since they were
specifically designed as an alternative to information divergences to handle
such problematic scenarios. Unfortunately, training generative machines using
OT raises formidable computational and statistical challenges, because of (i)
the computational burden of evaluating OT losses, (ii) the instability and lack
of smoothness of these losses, and (iii) the difficulty of robustly estimating these
losses and their gradients in high dimension. This paper presents the first
tractable computational method to train large scale generative models using an
optimal transport loss, and tackles these three issues by relying on two key
ideas: (a) entropic smoothing, which turns the original OT loss into one that
can be computed using Sinkhorn fixed point iterations; (b) algorithmic
(automatic) differentiation of these iterations. These two approximations
result in a robust and differentiable approximation of the OT loss with
streamlined GPU execution. Entropic smoothing generates a family of losses
interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus
allowing one to find a sweet spot that leverages the geometry of OT and the favorable
high-dimensional sample complexity of MMD, which comes with unbiased gradient
estimates. The resulting computational architecture nicely complements standard
deep network generative models with a stack of extra layers implementing the loss
function.
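A minimal sketch of idea (a), entropic smoothing computed by Sinkhorn fixed-point iterations, here written in the log domain for stability. In an actual training loop each iteration would be traced by an autodiff framework (idea (b)), so the loss is differentiable in the generator's parameters; the sample sizes and regularization strength below are illustrative only.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_loss(x, y, eps=0.1, n_iter=100):
    """Entropic OT loss between two point clouds with uniform weights,
    via log-domain Sinkhorn fixed-point iterations on the dual potentials."""
    n, m = len(x), len(y)
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # squared Euclidean cost
    log_a, log_b = -np.log(n) * np.ones(n), -np.log(m) * np.ones(m)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    P = np.exp((f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :])
    return float(np.sum(P * C))  # transport cost of the entropic plan

# Toy "data" and "generated" samples: the generated cloud is shifted by (1, 1).
rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2))
y = rng.normal(size=(40, 2)) + 1.0
loss = sinkhorn_loss(x, y)
```

As the abstract notes, varying `eps` interpolates the behavior of this loss: small values approach unregularized OT, while large values drift toward an MMD-like quantity.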
Model-Informed Machine Learning for Multi-component T2 Relaxometry
Recovering the T2 distribution from multi-echo T2 magnetic resonance (MR)
signals is challenging but has high potential as it provides biomarkers
characterizing the tissue micro-structure, such as the myelin water fraction
(MWF). In this work, we propose to combine machine learning and aspects of
parametric (fitting from the MRI signal using biophysical models) and
non-parametric (model-free fitting of the T2 distribution from the signal)
approaches to T2 relaxometry in brain tissue by using a multi-layer perceptron
(MLP) for the distribution reconstruction. For training our network, we
construct an extensive synthetic dataset derived from biophysical models in
order to constrain the outputs with \textit{a priori} knowledge of \textit{in
vivo} distributions. The proposed approach, called Model-Informed Machine
Learning (MIML), takes as input the MR signal and directly outputs the
associated T2 distribution. We evaluate MIML in comparison to non-parametric
and parametric approaches on synthetic data, an ex vivo scan, and
high-resolution scans of healthy subjects and a subject with Multiple
Sclerosis. In synthetic data, MIML provides more accurate and noise-robust
distributions. In real data, MWF maps derived from MIML exhibit the greatest
conformity to anatomical scans, have the highest correlation to a histological
map of myelin volume, and the best unambiguous lesion visualization and
localization, with superior contrast between lesions and normal appearing
tissue. In whole-brain analysis, MIML is 22 to 4980 times faster than
non-parametric and parametric methods, respectively.
Comment: Preprint submitted to Medical Image Analysis (July 14, 2020).
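The pipeline described above, a biophysical forward model synthesizing training signals and an MLP mapping the signal directly to a discretized T2 distribution, might be sketched as follows. The echo times, T2 pools, layer sizes, and (untrained) weights are invented for illustration; a softmax head keeps the output a valid distribution over T2 bins.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical biophysical forward model: a multi-echo signal as a weighted
# sum of exponential decays from two water pools (illustrative values).
echo_times = np.arange(1, 33) * 10e-3            # 32 echoes, 10 ms spacing
t2_pools = np.array([20e-3, 80e-3])              # e.g. myelin vs. intra/extra-cellular water
weights = np.array([0.2, 0.8])                   # MWF-like mixing weights
signal = np.exp(-echo_times[:, None] / t2_pools[None, :]) @ weights

# Hypothetical MLP head: signal in, discretized T2 distribution out.
n_bins = 60                                      # resolution of the T2 grid
W1, b1 = 0.1 * rng.normal(size=(32, 64)), np.zeros(64)
W2, b2 = 0.1 * rng.normal(size=(64, n_bins)), np.zeros(n_bins)

def predict_t2_distribution(s):
    """Forward pass only: ReLU hidden layer, then a numerically stable
    softmax so the output is non-negative and sums to one."""
    h = np.maximum(0.0, s @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

dist = predict_t2_distribution(signal)
```

Training would regress such softmax outputs onto the known synthetic T2 distributions, which is how the a priori knowledge of in vivo distributions constrains the network.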
Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Gaussian mixture models (GMM) are powerful parametric tools with many
applications in machine learning and computer vision. Expectation maximization
(EM) is the most popular algorithm for estimating the GMM parameters. However,
EM guarantees only convergence to a stationary point of the log-likelihood
function, which could be arbitrarily worse than the optimal solution. Inspired
by the relationship between the negative log-likelihood function and the
Kullback-Leibler (KL) divergence, we propose an alternative formulation for
estimating the GMM parameters using the sliced Wasserstein distance, which
gives rise to a new algorithm. Specifically, we propose minimizing the
sliced-Wasserstein distance between the mixture model and the data distribution
with respect to the GMM parameters. In contrast to the KL-divergence, the
energy landscape for the sliced-Wasserstein distance is more well-behaved and
therefore more suitable for a stochastic gradient descent scheme to obtain the
optimal GMM parameters. We show that our formulation results in parameter
estimates that are more robust to random initializations and demonstrate that
it can estimate high-dimensional data distributions more faithfully than the EM
algorithm.
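The sliced distance reduces to sorting along random one-dimensional projections, which is what makes stochastic gradient descent on it cheap. A Monte-Carlo sketch, assuming equal-size samples and uniform weights (projection count and seed are arbitrary):

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=100, seed=0):
    """Monte-Carlo sliced 2-Wasserstein distance between equal-size samples:
    project onto random unit directions, sort, and average the 1-D transport
    costs (1-D OT between sorted samples is just index-wise matching)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)           # random direction on the sphere
        px, py = np.sort(x @ theta), np.sort(y @ theta)
        total += np.mean((px - py) ** 2)
    return float(np.sqrt(total / n_proj))

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
d_same = sliced_wasserstein(x, x)        # identical samples: distance is zero
d_shift = sliced_wasserstein(x, x + 2.0) # shifted samples: positive distance
```

To fit a GMM with this loss, one would draw samples from the current mixture and descend this distance with respect to the mixture parameters, which is the stochastic scheme the well-behaved energy landscape makes viable.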
Ground Metric Learning on Graphs
Optimal transport (OT) distances between probability distributions are
parameterized by the ground metric they use between observations. Their
relevance for real-life applications strongly hinges on whether that ground
metric parameter is suitably chosen. Selecting it adaptively and
algorithmically from prior knowledge, the so-called ground metric learning (GML)
problem, has therefore appeared in various settings. We consider it in this
paper when the learned metric is constrained to be a geodesic distance on a
graph that supports the measures of interest. This imposes a rich structure for
candidate metrics, but also enables far more efficient learning procedures when
compared to a direct optimization over the space of all metric matrices. We use
this setting to tackle an inverse problem stemming from the observation of a
density evolving with time: we seek a graph ground metric such that the OT
interpolation between the starting and ending densities that result from that
ground metric agrees with the observed evolution. This OT dynamic framework is
relevant to model natural phenomena exhibiting displacements of mass, such as
for instance the evolution of the color palette induced by the modification of
lighting and materials.
Comment: Fixed sign of gradient.
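The structural constraint exploited here, that the ground metric must be a geodesic distance on the supporting graph, can be sketched by parameterizing edge weights and exposing the metric through all-pairs shortest paths; learning then means optimizing those weights so OT interpolations match the observed evolution. Below, only the metric construction, via Floyd-Warshall on a toy graph (the graph and weights are invented):

```python
import numpy as np

def geodesic_ground_metric(edges, n):
    """All-pairs geodesic distances on an undirected weighted graph
    (Floyd-Warshall). In a GML setting the edge weights would be the
    learned parameters, and this matrix the ground metric for the OT solver."""
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for (i, j), w in edges.items():
        D[i, j] = D[j, i] = w                    # undirected edge
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :]) # relax paths through node k
    return D

# Toy graph: a 4-cycle where the direct edge 0-3 is longer than the detour,
# so the geodesic distance takes the path 0-1-2-3.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 0.5, (0, 3): 5.0}
D = geodesic_ground_metric(edges, 4)
```

Restricting candidates to such geodesic matrices is what keeps the search space far smaller than the space of all metric matrices on the nodes.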