Dictionary-based Tensor Canonical Polyadic Decomposition
To ensure interpretability of the sources extracted by tensor decomposition, we
introduce in this paper a dictionary-based tensor canonical polyadic
decomposition which enforces one factor to belong exactly to a known
dictionary. A new formulation of sparse coding is proposed which enables
dictionary-based canonical polyadic decomposition of high-dimensional tensors.
The benefits of using a dictionary in tensor decomposition models are explored
in terms of both parameter identifiability and estimation accuracy. The
performance of the proposed algorithms is evaluated on the decomposition of
simulated data and the unmixing of hyperspectral images.
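As a loose illustration of the idea of constraining one CPD factor to exact dictionary atoms, here is a small alternating-least-squares sketch in NumPy. The orthonormal toy dictionary, the greedy atom-matching rule, and all sizes are illustrative assumptions, not the paper's sparse-coding formulation:

```python
import numpy as np

def kr(B, C):
    """Khatri-Rao (column-wise Kronecker) product."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def dictionary_cpd(T, D, rank, n_iter=10):
    """ALS for a rank-`rank` CPD of the 3-way tensor T, where the mode-1
    factor is forced to be exact columns of the dictionary D.
    Returns (atom_indices, A, B, C)."""
    I, J, K = T.shape
    rng = np.random.default_rng(0)
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm atoms
    T1 = T.reshape(I, J * K)                            # mode-1 unfolding
    T2 = T.transpose(1, 0, 2).reshape(J, I * K)         # mode-2 unfolding
    T3 = T.transpose(2, 0, 1).reshape(K, I * J)         # mode-3 unfolding
    for _ in range(n_iter):
        # unconstrained least-squares estimate of the mode-1 factor ...
        A_ls = np.linalg.lstsq(kr(B, C), T1.T, rcond=None)[0].T
        # ... then snap each column to the best-matching unused atom
        An = A_ls / (np.linalg.norm(A_ls, axis=0, keepdims=True) + 1e-12)
        corr = np.abs(Dn.T @ An)
        idx = np.zeros(rank, dtype=int)
        avail = np.ones(D.shape[1], dtype=bool)
        for r in range(rank):
            cand = np.where(avail)[0]
            idx[r] = cand[np.argmax(corr[cand, r])]
            avail[idx[r]] = False
        A = D[:, idx]
        B = np.linalg.lstsq(kr(A, C), T2.T, rcond=None)[0].T
        C = np.linalg.lstsq(kr(A, B), T3.T, rcond=None)[0].T
    return idx, A, B, C

# toy data built from atoms 1 and 4 of an orthonormal dictionary
rng = np.random.default_rng(1)
D = np.eye(8)
B_true = rng.standard_normal((5, 2))
C_true = rng.standard_normal((6, 2))
T = np.einsum('ir,jr,kr->ijk', D[:, [1, 4]], B_true, C_true)
idx, A, B, C = dictionary_cpd(T, D, rank=2)
err = np.linalg.norm(np.einsum('ir,jr,kr->ijk', A, B, C) - T) / np.linalg.norm(T)
```

In this noiseless toy problem the least-squares-then-match step identifies the true atoms after the first sweep, since only the active atoms correlate with the estimated factor columns.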
Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
Nonnegative matrix factorization (NMF) has become a very popular technique in
machine learning because it automatically extracts meaningful features through
a sparse and part-based representation. However, NMF has the drawback of being
highly ill-posed, that is, there typically exist many different but equivalent
factorizations. In this paper, we introduce a completely new way of obtaining
more well-posed NMF problems whose solutions are sparser. Our technique is
based on the preprocessing of the nonnegative input data matrix, and relies on
the theory of M-matrices and the geometric interpretation of NMF. This approach
provably leads to optimal and sparse solutions under the separability
assumption of Donoho and Stodden (NIPS, 2003), and, for rank-three matrices,
makes the number of exact factorizations finite. We illustrate the
effectiveness of our technique on several image datasets.
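A tiny NumPy example of the ill-posedness the abstract refers to: two exact nonnegative factorizations of the same matrix that are not related by mere scaling or permutation of the factors. The specific W, H, and Q are hand-picked for illustration, not taken from the paper:

```python
import numpy as np

# first nonnegative factorization X = W @ H
W = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [0.5, 4.0]])
H = np.array([[2.0, 3.0, 4.0],
              [1.0, 1.0, 1.0]])
X = W @ H

# Q is invertible and nonnegative, and H was chosen so that inv(Q) @ H
# stays nonnegative; hence (W @ Q, inv(Q) @ H) is a second valid NMF of X.
Q = np.array([[1.0, 0.5],
              [0.0, 1.0]])
W2 = W @ Q
H2 = np.linalg.inv(Q) @ H
```

Because Q is not a scaled permutation, the two factorizations are genuinely different solutions of the same NMF problem, which is exactly the ambiguity the proposed preprocessing aims to reduce.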
Model Selection for Nonnegative Matrix Factorization by Support Union Recovery
Nonnegative matrix factorization (NMF) has been widely used in machine
learning and signal processing because of its non-subtractive, part-based
property which enhances interpretability. It is often assumed that the latent
dimensionality (or the number of components) is given. Despite the large amount
of algorithms designed for NMF, there is little literature about automatic
model selection for NMF with theoretical guarantees. In this paper, we propose
an algorithm that first calculates an empirical second-order moment from the
empirical fourth-order cumulant tensor, and then estimates the latent
dimensionality by recovering the support union (the index set of non-zero rows)
of a matrix related to the empirical second-order moment. By assuming a
generative model of the data with additional mild conditions, our algorithm
provably detects the true latent dimensionality. We show on synthetic examples
that our proposed algorithm is able to find an approximately correct number of
components.
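The paper's cumulant construction is involved, but the final support-union step can be sketched: count the rows of a moment-derived matrix whose norms clear a noise threshold. The synthetic stand-in matrix, noise level, and ad-hoc threshold below are all illustrative assumptions, not the paper's derived quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, true_k = 20, 10, 4

# hypothetical stand-in for the moment-derived matrix: its first `true_k`
# rows carry signal, the remaining rows are near-zero noise
M = 0.01 * rng.standard_normal((n_rows, n_cols))
M[:true_k] += rng.standard_normal((true_k, n_cols))

# support-union recovery: the latent dimensionality estimate is the number
# of rows whose norm exceeds a noise threshold (hand-set here; the paper
# derives its threshold from the generative model)
row_norms = np.linalg.norm(M, axis=1)
tau = 10 * np.median(row_norms)
k_hat = int((row_norms > tau).sum())
```

With the signal rows two orders of magnitude above the noise floor, the row-norm threshold cleanly recovers the planted dimensionality.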
A review on initialization methods for nonnegative matrix factorization: Towards omics data experiments
Nonnegative Matrix Factorization (NMF) has acquired a relevant role in the panorama of knowledge extraction, thanks to the peculiarity that non-negativity applies to both bases and weights, which allows meaningful interpretations and is consistent with the natural human part-based learning process. Nevertheless, most NMF algorithms are iterative, so initialization methods affect convergence behaviour, the quality of the final solution, and NMF performance in terms of the residual of the cost function. Studies on the impact of NMF initialization techniques have been conducted for text or image datasets, but very few considerations can be found in the literature for biological datasets, even though NMF has largely demonstrated its usefulness in better understanding biological mechanisms from omics datasets. This paper presents the state of the art on NMF initialization schemes, along with some initial considerations on the impact of initialization methods when microarrays (a simple instance of omics data) are analysed with NMF. Using a series of measures to qualitatively examine the biological information extracted by a given NMF scheme, it preliminarily appears that some information (e.g., represented by genes) can be extracted regardless of the initialization scheme used.
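To make the initialization question concrete, here is a hedged sketch comparing NNDSVD (one of the SVD-based schemes such reviews typically cover) against random initialization, each refined by standard multiplicative updates. The random matrix is a stand-in for an omics table, and the small constant added after initialization is an implementation convenience, not part of the canonical scheme:

```python
import numpy as np

def nndsvd(X, r):
    """NNDSVD initialization: split each singular vector pair into positive
    and negative parts and keep the dominant pair, so the factors start
    close to X's leading low-rank structure."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    W = np.zeros((X.shape[0], r))
    H = np.zeros((r, X.shape[1]))
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])   # leading pair is nonnegative
    H[0] = np.sqrt(S[0]) * np.abs(Vt[0])
    for j in range(1, r):
        u, v = U[:, j], Vt[j]
        up, un = np.maximum(u, 0), np.maximum(-u, 0)
        vp, vn = np.maximum(v, 0), np.maximum(-v, 0)
        mp = np.linalg.norm(up) * np.linalg.norm(vp)
        mn = np.linalg.norm(un) * np.linalg.norm(vn)
        if mp >= mn and mp > 0:
            W[:, j] = np.sqrt(S[j] * mp) * up / np.linalg.norm(up)
            H[j] = np.sqrt(S[j] * mp) * vp / np.linalg.norm(vp)
        elif mn > 0:
            W[:, j] = np.sqrt(S[j] * mn) * un / np.linalg.norm(un)
            H[j] = np.sqrt(S[j] * mn) * vn / np.linalg.norm(vn)
    return W + 1e-6, H + 1e-6   # small offset so multiplicative updates can move

def mu_nmf(X, W, H, n_iter=100, eps=1e-9):
    """Standard multiplicative updates for the Frobenius objective."""
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
X = rng.random((30, 20))        # stand-in for a nonnegative omics-like matrix
r = 5

W0, H0 = nndsvd(X, r)
e0 = np.linalg.norm(X - W0 @ H0)
W, H = mu_nmf(X, W0.copy(), H0.copy())
e1 = np.linalg.norm(X - W @ H)

Wr0, Hr0 = rng.random((30, r)), rng.random((r, 20))
er0 = np.linalg.norm(X - Wr0 @ Hr0)
Wr, Hr = mu_nmf(X, Wr0.copy(), Hr0.copy())
er1 = np.linalg.norm(X - Wr @ Hr)
```

Comparing e1 and er1 across seeds and iteration budgets is exactly the kind of experiment such initialization studies run; multiplicative updates are non-increasing in the Frobenius residual, so both runs improve on their starting point.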
A Fast Gradient Method for Nonnegative Sparse Regression with Self Dictionary
A nonnegative matrix factorization (NMF) can be computed efficiently under
the separability assumption, which asserts that all the columns of the given
input data matrix belong to the cone generated by a (small) subset of them. The
provably most robust methods to identify these conic basis columns are based on
nonnegative sparse regression and self dictionaries, and require the solution
of large-scale convex optimization problems. In this paper we study a
particular nonnegative sparse regression model with self dictionary. As opposed
to previously proposed models, this model yields a smooth optimization problem
where the sparsity is enforced through linear constraints. We show that the
Euclidean projection on the polyhedron defined by these constraints can be
computed efficiently, and propose a fast gradient method to solve our model. We
compare our algorithm with several state-of-the-art methods on synthetic data
sets and real-world hyperspectral images.
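The paper's fast gradient method targets a convex self-dictionary model; as a simpler point of reference for the same separable setting, here is the classical successive projection algorithm (SPA), a different, greedy method (not the paper's) that also identifies the conic basis columns in the noiseless case:

```python
import numpy as np

def spa(X, r):
    """Successive Projection Algorithm: greedily pick the column with the
    largest norm, then project all columns onto its orthogonal complement."""
    R = X.astype(float).copy()
    picked = []
    for _ in range(r):
        j = int(np.argmax((R * R).sum(axis=0)))
        picked.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)
    return picked

# separable toy data: 3 "pure" columns followed by convex combinations of them
rng = np.random.default_rng(3)
W = rng.random((10, 3)) + 0.5
mix = np.array([[0.6, 0.2, 0.2],
                [0.1, 0.8, 0.1],
                [0.3, 0.3, 0.4],
                [0.2, 0.5, 0.3]]).T      # each column sums to 1
X = np.hstack([W, W @ mix])
idx = spa(X, 3)
```

Since a strict convex combination always has smaller norm than the largest vertex it mixes, SPA returns the indices of the three pure columns here.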
Statistically Optimal K-means Clustering via Nonnegative Low-rank Semidefinite Programming
K-means clustering is a widely used machine learning method for identifying
patterns in large datasets. Semidefinite programming (SDP) relaxations have
recently been proposed for solving the K-means optimization problem that
enjoy strong statistical optimality guarantees, but the prohibitive cost of
implementing an SDP solver renders these guarantees inaccessible to practical
datasets. By contrast, nonnegative matrix factorization (NMF) is a simple
clustering algorithm that is widely used by machine learning practitioners, but
without a solid statistical underpinning or rigorous guarantees. In this
paper, we describe an NMF-like algorithm that works by solving a nonnegative
low-rank restriction of the SDP-relaxed K-means formulation using a nonconvex
Burer--Monteiro factorization approach. The resulting algorithm is just as
simple and scalable as state-of-the-art NMF algorithms, while also enjoying the
same strong statistical optimality guarantees as the SDP. In our experiments,
we observe that our algorithm achieves substantially smaller mis-clustering
errors compared to the existing state-of-the-art.
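The full algorithm handles the SDP's constraint set; as a stripped-down illustration of the nonconvex Z = Y Y^T idea, here is symmetric NMF of a similarity matrix via a Ding-et-al.-style multiplicative update. The blob data, kernel bandwidth, and update rule are illustrative simplifications, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated 2-D blobs, 10 points each
pts = np.vstack([rng.normal(0, 0.5, (10, 2)),
                 rng.normal(8, 0.5, (10, 2))])
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
A = np.exp(-d2 / 8.0)                    # Gaussian similarity matrix

k, n = 2, A.shape[0]
Y = np.abs(rng.standard_normal((n, k)))  # nonnegative low-rank variable: Z = Y @ Y.T
err0 = np.linalg.norm(A - Y @ Y.T)
for _ in range(300):
    # multiplicative update for min ||A - Y Y^T||_F^2 s.t. Y >= 0
    # (Ding et al.-style, with 1/2 damping)
    Y *= 0.5 + 0.5 * (A @ Y) / (Y @ (Y.T @ Y) + 1e-12)
err1 = np.linalg.norm(A - Y @ Y.T)

labels = Y.argmax(axis=1)                # each point's dominant component
```

On block-structured similarity matrices like this one, the columns of Y tend toward (scaled) cluster indicators, which is the sense in which the Burer-Monteiro factorization behaves like an NMF-style clustering algorithm.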