Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe
(FW) algorithms regained popularity in recent years due to their simplicity,
effectiveness and theoretical guarantees. MP and FW address optimization over
the linear span and the convex hull of a set of atoms, respectively. In this
paper, we consider the intermediate case of optimization over the convex cone,
parametrized as the conic hull of a generic atom set, leading to the first
principled definitions of non-negative MP algorithms for which we give explicit
convergence rates and demonstrate excellent empirical performance. In
particular, we derive sublinear (O(1/t)) convergence on general
smooth and convex objectives, and linear (O(e^{-t})) convergence on
strongly convex objectives, in both cases for general sets of atoms.
Furthermore, we establish a clear correspondence of our algorithms to known
algorithms from the MP and FW literature. Our novel algorithms and analyses
target general atom sets and general objective functions, and hence are
directly applicable to a large variety of learning settings.
Comment: NIPS 2017
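The greedy principle described above can be sketched in a few lines: at each step, pick the atom best aligned with the negative gradient and add it with a non-negative coefficient, so the iterate stays inside the conic hull of the atom set. The sketch below is a minimal illustration under our own assumptions (finite atom set given as matrix columns, exact line search valid for quadratic objectives), not the paper's exact variants; all names are ours.

```python
import numpy as np

def nn_matching_pursuit(f_grad, atoms, steps=100):
    """Toy non-negative Matching Pursuit over the conic hull of `atoms`.

    atoms : (d, n) array whose columns are the atoms.
    f_grad: callable returning the gradient of the objective at x.
    """
    d, n = atoms.shape
    x = np.zeros(d)
    coeffs = np.zeros(n)
    for _ in range(steps):
        g = f_grad(x)
        scores = -atoms.T @ g                 # alignment with descent direction
        j = int(np.argmax(scores))
        if scores[j] <= 0:                    # no atom improves: stop
            break
        a = atoms[:, j]
        gamma = scores[j] / (a @ a)           # exact line search (quadratic f)
        coeffs[j] += gamma                    # gamma >= 0: stay in the cone
        x = x + gamma * a
    return x, coeffs
```

With orthonormal atoms and a least-squares objective, this recovers any non-negative combination of the atoms exactly.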
A Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization: Convergence Analysis and Optimality
Symmetric nonnegative matrix factorization (SymNMF) has important
applications in data analytics problems such as document clustering, community
detection and image segmentation. In this paper, we propose a novel nonconvex
variable splitting method for solving SymNMF. The proposed algorithm is
guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the
nonconvex SymNMF problem. Furthermore, it achieves a global sublinear
convergence rate. We also show that the algorithm can be efficiently
implemented in parallel. Further, sufficient conditions are provided which
guarantee the global and local optimality of the obtained solutions. Extensive
numerical results performed on both synthetic and real data sets suggest that
the proposed algorithm converges quickly to a local minimum solution.
Comment: IEEE Transactions on Signal Processing (to appear)
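To make the splitting idea concrete, the toy sketch below factors a symmetric nonnegative matrix M ≈ H Hᵀ by introducing a second copy W of the factor and alternating ridge-type least-squares updates with a penalty ρ‖W − H‖² that pulls the two copies together. This is an illustrative simplification under our own assumptions, not the paper's convergence-guaranteed algorithm.

```python
import numpy as np

def symnmf_split(M, r, rho=1.0, iters=200, seed=0):
    """Illustrative variable splitting for SymNMF: min ||M - W H^T||_F^2
    + rho ||W - H||_F^2 with W, H >= 0, alternating over W and H."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    H = rng.random((n, r))
    W = H.copy()
    I = np.eye(r)
    for _ in range(iters):
        # Closed-form ridge-type update in W, then clip to the nonneg orthant.
        W = np.maximum((M @ H + rho * H) @ np.linalg.inv(H.T @ H + rho * I), 0)
        # Symmetric update in H.
        H = np.maximum((M.T @ W + rho * W) @ np.linalg.inv(W.T @ W + rho * I), 0)
    return 0.5 * (W + H)   # average the split variables as the final factor
```

On exactly factorizable symmetric nonnegative data, the two copies coincide at convergence and the penalty term vanishes.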
A unified approach to non-negative matrix factorization and probabilistic latent semantic indexing
Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ~ WH. NMF has been shown to yield a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables the method to learn parts with distinct physical interpretations. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data.
We present a generalized approach to NMF based on Renyi's divergence between two non-negative matrices related to the Poisson likelihood. Our approach unifies various competing models and provides a unique framework for NMF. Furthermore, we generalize the equivalence between NMF and probabilistic latent semantic indexing, a well-known method used in text mining and document clustering applications. We evaluate the performance of our method in the unsupervised setting using consensus clustering and demonstrate its applicability using real-life and simulated data.
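For reference, the multiplicative updates mentioned above, specialized to the generalized KL divergence (the divergence associated with the Poisson likelihood), can be written as follows. This is the classical Lee-Seung scheme, not the Renyi-divergence generalization proposed in the paper; the small `eps` guards divisions.

```python
import numpy as np

def nmf_kl(V, r, iters=300, eps=1e-12, seed=0):
    """Multiplicative updates for NMF under the generalized KL divergence,
    i.e. the divergence tied to the Poisson likelihood: V ~ W @ H."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        R = V / (W @ H + eps)                              # elementwise ratio
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)    # update H, keep >= 0
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)    # update W, keep >= 0
    return W, H
```

Because the updates are multiplicative, nonnegativity of W and H is preserved automatically at every iteration.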
Exploring multimodal data fusion through joint decompositions with flexible couplings
A Bayesian framework is proposed to define flexible coupling models for joint
tensor decompositions of multiple data sets. Under this framework, a natural
formulation of the data fusion problem is to cast it in terms of a joint
maximum a posteriori (MAP) estimator. Data-driven scenarios of joint posterior
distributions are provided, including general Gaussian priors and non-Gaussian
coupling priors. We present and discuss implementation issues of algorithms
used to obtain the joint MAP estimator. We also show how this framework can be
adapted to tackle the problem of joint decompositions of large datasets. In the
case of a conditional Gaussian coupling with a linear transformation, we give
theoretical bounds on the data fusion performance using the Bayesian Cramér-Rao
bound. Simulations are reported for hybrid coupling models ranging from simple
additive Gaussian models, to Gamma-type models with positive variables and to
the coupling of data sets which are inherently of different size due to
different resolutions of the measurement devices.
Comment: 15 pages, 7 figures, revised version
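As a toy instance of such a joint MAP estimator, the sketch below couples two matrix factorizations through a Gaussian prior on the difference of their shared-mode factors and minimizes the negative log-posterior by plain gradient descent. All symbols, sizes and step sizes are our illustrative choices (equal coupled dimensions, flat priors on the uncoupled factors); the paper treats general tensor models and richer coupling priors.

```python
import numpy as np

def coupled_map(Y1, Y2, r, lam=1.0, lr=0.01, iters=2000, seed=0):
    """Toy joint MAP for Y1 ~ A @ B1.T and Y2 ~ C @ B2.T with a Gaussian
    coupling prior B1 ~ N(B2, 1/lam), fitted by gradient descent."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((Y1.shape[0], r)) * 0.1
    B1 = rng.standard_normal((Y1.shape[1], r)) * 0.1
    C = rng.standard_normal((Y2.shape[0], r)) * 0.1
    B2 = rng.standard_normal((Y2.shape[1], r)) * 0.1
    for _ in range(iters):
        E1 = A @ B1.T - Y1                     # residual of data set 1
        E2 = C @ B2.T - Y2                     # residual of data set 2
        A -= lr * (E1 @ B1)
        B1 -= lr * (E1.T @ A + lam * (B1 - B2))  # data fit + coupling pull
        C -= lr * (E2 @ B2)
        B2 -= lr * (E2.T @ C + lam * (B2 - B1))
    return A, B1, C, B2
```

The coupling term lam * (B1 - B2) is exactly the gradient of the Gaussian coupling prior's negative log-density, so the fixed points are stationary points of the joint posterior.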
Adaptive Density Estimation for Generative Models
Unsupervised learning of generative models has seen tremendous progress over
recent years, in particular due to generative adversarial networks (GANs),
variational autoencoders, and flow-based models. GANs have dramatically
improved sample quality, but suffer from two drawbacks: (i) they mode-drop,
i.e., they do not cover the full support of the training data, and (ii) they do
not allow for likelihood evaluations on held-out data. In contrast,
likelihood-based training encourages models to cover the full support of the
training data, but yields poorer samples. These mutual shortcomings can in
principle be addressed by training generative latent variable models in a
hybrid adversarial-likelihood manner. However, we show that commonly made
parametric assumptions create a conflict between the two objectives, making
successful hybrid models non-trivial. As a solution, we propose to use deep invertible
transformations in the latent variable decoder. This approach allows for
likelihood computations in image space, is more efficient than fully invertible
models, and can take full advantage of adversarial training. We show that our
model significantly improves over existing hybrid models: offering GAN-like
samples, IS and FID scores that are competitive with fully adversarial models,
and improved likelihood scores.
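The key ingredient, an invertible transformation with a tractable log-determinant so that likelihoods can be evaluated exactly by change of variables, can be illustrated with a single affine coupling layer. This is our toy example with hand-chosen linear scale/shift functions, not the paper's architecture.

```python
import numpy as np

def coupling_forward(z, W, b):
    """One affine coupling layer: x1 = z1, x2 = z2 * exp(s) + t,
    where (s, t) = split(W @ z1 + b). Invertible with cheap log-det."""
    d = z.shape[0] // 2
    z1, z2 = z[:d], z[d:]
    h = W @ z1 + b
    s, t = h[:d], h[d:]
    x = np.concatenate([z1, z2 * np.exp(s) + t])
    return x, s.sum()                          # log |det Jacobian| = sum(s)

def coupling_inverse(x, W, b):
    d = x.shape[0] // 2
    x1, x2 = x[:d], x[d:]
    h = W @ x1 + b                             # x1 == z1, so (s, t) recompute
    s, t = h[:d], h[d:]
    return np.concatenate([x1, (x2 - t) * np.exp(-s)])

def log_likelihood(x, W, b):
    """Exact log p(x) via change of variables with a standard normal base."""
    d = x.shape[0] // 2
    z = coupling_inverse(x, W, b)
    s = (W @ z[:d] + b)[:d]
    log_base = -0.5 * (z @ z) - 0.5 * x.shape[0] * np.log(2 * np.pi)
    return log_base - s.sum()                  # minus forward log-det
```

Because only half of the variables are transformed, the Jacobian is triangular and its log-determinant is just the sum of the scales, which is what makes exact likelihood evaluation cheap.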
A new steplength selection for scaled gradient methods with application to image deblurring
Gradient methods are frequently used in large scale image deblurring problems
since they avoid the onerous computation of the Hessian matrix of the objective
function. Second order information is typically sought by a clever choice of
the steplength parameter defining the descent direction, as in the case of the
well-known Barzilai and Borwein rules. In a recent paper, a strategy for the
steplength selection approximating the inverse of some eigenvalues of the
Hessian matrix has been proposed for gradient methods applied to unconstrained
minimization problems. In the quadratic case, this approach is based on a
Lanczos process applied every m iterations to the matrix of the m most recent
gradients, but the idea can be extended to a general objective function. In
this paper we extend this rule to the case of scaled gradient projection
methods applied to non-negatively constrained minimization problems, and we
test the effectiveness of the proposed strategy in image deblurring problems in
both the presence and the absence of an explicit edge-preserving regularization
term.
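As a minimal illustration of the ingredients involved, the sketch below runs gradient projection with a Barzilai-Borwein steplength on a non-negatively constrained least-squares problem, a toy stand-in for deblurring. The paper's contribution, replacing the BB rule with Hessian eigenvalue approximations obtained from a Lanczos process on recent gradients, is not reproduced here.

```python
import numpy as np

def gp_bb_nonneg(A, b, iters=200):
    """Gradient projection with BB1 steplength for
    min 0.5 * ||A x - b||^2  subject to  x >= 0."""
    x = np.zeros(A.shape[1])
    g = A.T @ (A @ x - b)
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2    # safe initial steplength
    for _ in range(iters):
        x_new = np.maximum(x - alpha * g, 0)   # project onto the nonneg orthant
        g_new = A.T @ (A @ x_new - b)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:
            alpha = (s @ s) / sy               # BB1: approximates 1/eigenvalue
        x, g = x_new, g_new
    return x
```

The BB steplength (s, s)/(s, y) is a Rayleigh-quotient-like estimate of an inverse Hessian eigenvalue, which is precisely the kind of cheap second-order information the steplength-selection literature exploits.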