Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Due to their simplicity and excellent performance, parallel asynchronous
variants of stochastic gradient descent have become popular methods to solve a
wide range of large-scale optimization problems on multi-core architectures.
Yet, despite their practical success, support for nonsmooth objectives is still
lacking, making them unsuitable for many problems of interest in machine
learning, such as the Lasso, group Lasso or empirical risk minimization with
convex constraints.
In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse
method inspired by SAGA, a variance reduced incremental gradient algorithm. The
proposed method is easy to implement and significantly outperforms the state of
the art on several nonsmooth, large-scale problems. We prove that our method
achieves a theoretical linear speedup with respect to the sequential version
under assumptions on the sparsity of gradients and block-separability of the
proximal term. Empirical benchmarks on a multi-core architecture illustrate
practical speedups of up to 12x on a 20-core machine.
Comment: Appears in Advances in Neural Information Processing Systems 30 (NIPS
2017), 28 pages
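To make the "block-separability of the proximal term" assumption concrete, here is a minimal sketch of the proximal operator for the Lasso penalty (soft-thresholding), which acts on each coordinate independently. It is not the ProxASAGA algorithm itself; the function names and the `step` and `reg` parameters are illustrative assumptions, not taken from the paper.

```python
def soft_threshold(x, threshold):
    """Proximal operator of threshold * |x| for one coordinate."""
    if x > threshold:
        return x - threshold
    if x < -threshold:
        return x + threshold
    return 0.0


def prox_l1(vector, step, reg):
    """Apply soft-thresholding coordinate by coordinate.

    `step` (step size) and `reg` (L1 regularization weight) are
    hypothetical parameter names chosen for this sketch.
    """
    return [soft_threshold(v, step * reg) for v in vector]


if __name__ == "__main__":
    print(prox_l1([3.0, -0.5, -2.0], step=1.0, reg=1.0))
```

Because each output coordinate depends only on the matching input coordinate, a sparse update that touches a few coordinates only needs to apply the operator to those coordinates.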
Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering
Let Q be a given n×n square symmetric matrix of nonnegative elements between 0 and 1 (similarities). Fuzzy clustering results in fuzzy assignment of individuals to K clusters. In additive fuzzy clustering, the n×K fuzzy memberships matrix P is found by least-squares approximation of the off-diagonal elements of Q by inner products of rows of P. By contrast, kernelized fuzzy c-means is not least-squares and requires an additional fuzziness parameter. The aim is to popularize additive fuzzy clustering by interpreting it as a latent class model, whereby the elements of Q are modeled as the probability that two individuals share the same class on the basis of the assignment probability matrix P. Two new algorithms are provided: a brute force genetic algorithm (differential evolution) and an iterative row-wise quadratic programming algorithm, of which the latter is the more effective. Simulations showed that (1) the method usually has a unique solution, except in special cases, (2) both algorithms reached this solution from random restarts and (3) the number of clusters can be well estimated by AIC. Additive fuzzy clustering is computationally efficient and combines attractive features of both the vector model and the cluster model
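The least-squares criterion described in this abstract can be sketched as follows: each off-diagonal element of Q is compared with the inner product of the corresponding rows of P. This is a minimal illustration only. The function name is hypothetical, and summing over all ordered pairs i ≠ j (rather than i < j) is an assumption that only changes the loss by a factor of two.

```python
def additive_fuzzy_loss(Q, P):
    """Sum of squared differences between off-diagonal Q[i][j]
    and the inner product of rows i and j of P."""
    n = len(Q)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal elements are not fitted
            model = sum(a * b for a, b in zip(P[i], P[j]))
            loss += (Q[i][j] - model) ** 2
    return loss


if __name__ == "__main__":
    # Two individuals with identical memberships in two clusters.
    P = [[0.5, 0.5], [0.5, 0.5]]
    Q = [[1.0, 1.0], [1.0, 1.0]]
    print(additive_fuzzy_loss(Q, P))
```

Under the latent class reading, the inner product of two membership rows is the probability that the two individuals fall in the same class, so a crisp P that matches a block-structured Q gives zero loss.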
Radio Astronomical Image Formation using Constrained Least Squares and Krylov Subspaces
Image formation for radio astronomy can be defined as estimating the spatial
power distribution of celestial sources over the sky, given an array of
antennas. One of the challenges with image formation is that the problem
becomes ill-posed as the number of pixels becomes large. The introduction of
constraints that incorporate a-priori knowledge is crucial. In this paper we
show that in addition to non-negativity, the magnitude of each pixel in an
image is also bounded from above. Indeed, the classical "dirty image" is an
upper bound, but a much tighter upper bound can be formed from the data using
array processing techniques. This formulates image formation as a least squares
optimization problem with inequality constraints. We propose to solve this
constrained least squares problem using active set techniques, and the steps
needed to implement it are described. It is shown that the least squares part
of the problem can be efficiently implemented with Krylov subspace based
techniques, where the structure of the problem allows massive parallelism and
reduced storage needs. The performance of the algorithm is evaluated using
simulations.
Block Coordinate Descent for Sparse NMF
Nonnegative matrix factorization (NMF) has become a ubiquitous tool for data
analysis. An important variant is the sparse NMF problem which arises when we
explicitly require the learnt features to be sparse. A natural measure of
sparsity is the $L_0$ norm, however its optimization is NP-hard. Mixed norms,
such as the $L_1/L_2$ measure, have been shown to model sparsity robustly, based
on intuitive attributes that such measures need to satisfy. This is in contrast
to computationally cheaper alternatives such as the plain $L_1$ norm. However,
present algorithms designed for optimizing the mixed norm $L_1/L_2$ are slow
and other formulations for sparse NMF have been proposed such as those based on
$L_1$ and $L_0$ norms. Our proposed algorithm allows us to solve the mixed norm
sparsity constraints while not sacrificing computation time. We present
experimental evidence on real-world datasets that shows our new algorithm
performs an order of magnitude faster compared to the current state-of-the-art
solvers optimizing the mixed norm and is suitable for large-scale datasets
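As an illustration of a mixed-norm sparsity measure, the sketch below computes the ratio of the L1 norm to the L2 norm and a normalized version of it that ranges from 0 (all entries equal in magnitude) to 1 (a single nonzero entry). That the abstract's mixed norm is this ratio, and the particular normalization used, are assumptions of this sketch; the function names are hypothetical.

```python
import math


def l1_over_l2(vector):
    """Ratio of the L1 norm to the L2 norm of a nonzero vector."""
    l1 = sum(abs(v) for v in vector)
    l2 = math.sqrt(sum(v * v for v in vector))
    if l2 == 0.0:
        raise ValueError("the ratio is undefined for the zero vector")
    return l1 / l2


def normalized_sparsity(vector):
    """Map the L1/L2 ratio to [0, 1]; needs at least two entries."""
    n = len(vector)
    if n < 2:
        raise ValueError("need at least two entries")
    root_n = math.sqrt(n)
    return (root_n - l1_over_l2(vector)) / (root_n - 1.0)


if __name__ == "__main__":
    print(normalized_sparsity([1.0, 0.0, 0.0, 0.0]))
    print(normalized_sparsity([1.0, 1.0, 1.0, 1.0]))
```

Unlike the plain L1 norm, the ratio does not change when the vector is rescaled, which is one of the intuitive attributes a sparsity measure is usually expected to satisfy.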