From Symmetry to Geometry: Tractable Nonconvex Problems
As science and engineering have become increasingly data-driven, the role of
optimization has expanded to touch almost every stage of the data analysis
pipeline, from the signal and data acquisition to modeling and prediction. The
optimization problems encountered in practice are often nonconvex. While
challenges vary from problem to problem, one common source of nonconvexity is
nonlinearity in the data or measurement model. Nonlinear models often exhibit
symmetries, creating complicated, nonconvex objective landscapes, with multiple
equivalent solutions. Nevertheless, simple methods (e.g., gradient descent)
often perform surprisingly well in practice.
The goal of this survey is to highlight a class of tractable nonconvex
problems, which can be understood through the lens of symmetries. These
problems exhibit a characteristic geometric structure: local minimizers are
symmetric copies of a single "ground truth" solution, while other critical
points occur at balanced superpositions of symmetric copies of the ground
truth, and exhibit negative curvature in directions that break the symmetry.
This structure enables efficient methods to obtain global minimizers. We
discuss examples of this phenomenon arising from a wide range of problems in
imaging, signal processing, and data analysis. We highlight the key role of
symmetry in shaping the objective landscape and discuss the different roles of
rotational and discrete symmetries. This area is rich with observed phenomena
and open problems; we close by highlighting directions for future research.
Comment: review paper submitted to SIAM Review, 34 pages, 10 figures
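The landscape structure described above can be seen in a one-dimensional toy problem (our own illustration, not an example taken from the survey): the objective f(x) = (x^2 - 1)^2 has a discrete sign symmetry f(x) = f(-x), its local minimizers +1 and -1 are symmetric copies of a single ground truth, and the critical point x = 0 is a balanced superposition with negative curvature (f''(0) = -4), which gradient descent escapes.

```python
def grad(x):
    # derivative of f(x) = (x^2 - 1)^2
    return 4.0 * x * (x**2 - 1.0)

def gradient_descent(x0, step=0.05, iters=500):
    # plain gradient descent on the symmetric toy objective
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# From almost any initialization, we land on one of the two
# symmetric global minimizers, x = +1 or x = -1.
for x0 in (-0.7, 0.3, 2.0):
    print(round(gradient_descent(x0), 4))
```

The saddle at x = 0 is unstable: any perturbation that breaks the sign symmetry is amplified, which is exactly the mechanism the survey describes for escaping balanced superpositions.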
Sampling binary sparse coding QUBO models using a spiking neuromorphic processor
We consider the problem of computing a sparse binary representation of an
image. To be precise, given an image and an overcomplete, non-orthonormal
basis, we aim to find a sparse binary vector indicating the minimal set of
basis vectors that when added together best reconstruct the given input. We
formulate this problem with an L2 loss on the reconstruction error, and an
L0 (or, equivalently, an L1) loss on the binary vector enforcing
sparsity. This yields a so-called Quadratic Unconstrained Binary Optimization
(QUBO) problem, whose solution is generally NP-hard to find. The contribution
of this work is twofold. First, the method of unsupervised and unnormalized
dictionary feature learning for a desired sparsity level to best match the data
is presented. Second, the binary sparse coding problem is then solved on the
Loihi 1 neuromorphic chip by the use of stochastic networks of neurons to
traverse the non-convex energy landscape. The solutions are benchmarked against
the classical heuristic simulated annealing. We demonstrate that neuromorphic
computing is suitable for sampling low-energy solutions of binary sparse coding
QUBO models, and that although Loihi 1 is capable of sampling very sparse
solutions of the QUBO models, the implementation needs improvement to be
competitive with simulated annealing.
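As a sketch of how such a QUBO might be assembled, the energy ||x - Db||^2 + lam * sum(b) can be folded into a single matrix Q by expanding the square and using the binary identity b_i^2 = b_i to absorb linear terms into the diagonal. The dictionary D, image x, and penalty weight lam below are made-up toy quantities, and the brute-force solver merely stands in for a sampler like Loihi or simulated annealing on a tiny instance.

```python
import itertools
import numpy as np

def build_qubo(D, x, lam):
    """QUBO matrix Q with b^T Q b = ||x - D b||^2 + lam * sum(b) - ||x||^2
    for binary b (uses b_i^2 = b_i to fold linear terms into the diagonal)."""
    G = D.T @ D
    Q = G.copy()
    np.fill_diagonal(Q, np.diag(G) - 2.0 * (D.T @ x) + lam)
    return Q

def brute_force(Q):
    """Exhaustive search over binary vectors -- only feasible for tiny n."""
    n = Q.shape[0]
    best_b, best_e = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        b = np.array(bits, dtype=float)
        e = b @ Q @ b
        if e < best_e:
            best_b, best_e = b, e
    return best_b, best_e

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 4))          # toy dictionary (tiny, for illustration)
b_true = np.array([1.0, 0.0, 1.0, 0.0])  # sparse binary ground truth
x = D @ b_true                           # noiseless synthetic "image"
Q = build_qubo(D, x, lam=0.01)
b_hat, _ = brute_force(Q)
print(b_hat)
```

On this easy noiseless instance the exhaustive minimizer recovers the sparse support; a real-sized problem is where NP-hardness bites and sampling heuristics take over.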
Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Gaussian mixture models (GMM) are powerful parametric tools with many
applications in machine learning and computer vision. Expectation maximization
(EM) is the most popular algorithm for estimating the GMM parameters. However,
EM guarantees only convergence to a stationary point of the log-likelihood
function, which could be arbitrarily worse than the optimal solution. Inspired
by the relationship between the negative log-likelihood function and the
Kullback-Leibler (KL) divergence, we propose an alternative formulation for
estimating the GMM parameters using the sliced Wasserstein distance, which
gives rise to a new algorithm. Specifically, we propose minimizing the
sliced-Wasserstein distance between the mixture model and the data distribution
with respect to the GMM parameters. In contrast to the KL-divergence, the
energy landscape for the sliced-Wasserstein distance is better behaved and
therefore more suitable for a stochastic gradient descent scheme to obtain the
optimal GMM parameters. We show that our formulation results in parameter
estimates that are more robust to random initializations and demonstrate that
it can estimate high-dimensional data distributions more faithfully than the EM
algorithm.
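A minimal sketch of the sliced Wasserstein distance between two point clouds (a Monte Carlo estimate over random projections, using the closed-form 1D W2 given by sorting; this is an illustration, not the paper's implementation):

```python
import numpy as np

def sliced_wasserstein_sq(X, Y, n_proj=200, rng=None):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance
    between equal-size point clouds X, Y of shape (n, d)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        # random direction on the unit sphere
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        # 1D projections; sorting gives the optimal 1D coupling
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)
    return total / n_proj

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
print(sliced_wasserstein_sq(X, X, rng=2))        # identical clouds -> 0.0
print(sliced_wasserstein_sq(X, X + 3.0, rng=2))  # shifted cloud -> positive
```

Because each term is a simple sort-and-subtract over 1D projections, the estimate is cheap and differentiable almost everywhere, which is what makes it amenable to the stochastic gradient schemes the abstract describes.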