Identifying global optimality for dictionary learning
Learning new representations of input observations in machine learning is
often tackled using a factorization of the data. For many such problems,
including sparse coding and matrix completion, learning these factorizations
can be difficult, both in terms of efficiency and in guaranteeing that the solution is
a global minimum. Recently, a general class of objectives has been introduced, which we term induced dictionary learning models (DLMs), that admit an induced convex form enabling global optimization. Though attractive theoretically, this induced form is impractical, particularly for large or growing datasets. In this work, we investigate the use of practical alternating minimization algorithms for induced DLMs that ensure convergence to global
optima. We characterize the stationary points of these models, and, using these
insights, highlight practical choices for the objectives. We then provide
theoretical and empirical evidence that alternating minimization, from a random
initialization, converges to global minima for a large subclass of induced
DLMs. In particular, we take advantage of the existence of the (potentially
unknown) convex induced form to identify when stationary points are global
minima for the dictionary learning objective. We then provide an empirical
investigation into practical optimization choices for using alternating
minimization for induced DLMs, for both batch and stochastic gradient descent.
Comment: Updates to the previous version include a small modification to Proposition 2, to only use normed regularizers, and a modification to the main theorem (previously Theorem 13) to focus on the overcomplete, full-rank setting and to better characterize non-differentiable induced regularizers. The theory has been significantly modified since the previous version.
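As a rough illustration of the kind of alternating minimization discussed above, the sketch below alternates exact block minimizations of a simple quadratically regularized factorization objective. The data, sizes, and the ridge-style regularizer are illustrative stand-ins, not the paper's induced DLM formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: d-dimensional data, k dictionary atoms, n samples.
d, k, n = 10, 5, 200
X = rng.standard_normal((d, n))

D = rng.standard_normal((d, k))   # dictionary, random initialization
H = rng.standard_normal((k, n))   # codes
lam = 0.1                         # regularization strength


def loss(D, H):
    return 0.5 * np.linalg.norm(X - D @ H) ** 2 \
        + 0.5 * lam * (np.linalg.norm(D) ** 2 + np.linalg.norm(H) ** 2)


init_loss = loss(D, H)
for _ in range(50):
    # Exact minimization over the codes H with the dictionary D fixed.
    H = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)
    # Exact minimization over the dictionary D with the codes H fixed.
    D = np.linalg.solve(H @ H.T + lam * np.eye(k), H @ X.T).T
final_loss = loss(D, H)
```

Because each block update is an exact minimizer of the objective over that block, the objective is non-increasing across iterations, which is the basic property alternating minimization analyses build on.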
Robust sketching for multiple square-root LASSO problems
Many learning tasks, such as cross-validation, parameter search, or
leave-one-out analysis, involve multiple instances of similar problems, each
instance sharing a large part of learning data with the others. We introduce a
robust framework for solving multiple square-root LASSO problems, based on a
sketch of the learning data that uses low-rank approximations. Our approach
allows a dramatic reduction in computational effort, in effect reducing the number of observations from the original sample size to the number of singular values retained in the low-rank model, while not sacrificing, and sometimes even improving, the statistical performance.
Theoretical analysis, as well as numerical experiments on both synthetic and real data, illustrates the efficiency of the method in large-scale applications.
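The row-reduction idea can be illustrated with a plain truncated-SVD sketch. The sizes and rank below are made up for the example, and the paper's robust square-root LASSO machinery is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: n observations, d features, approximately rank r.
n, d, r = 1000, 20, 5
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, d)) \
    + 0.01 * rng.standard_normal((n, d))

# Keep only the top-r singular triples; the r x d matrix S plays the
# role of the sketched data, with r "observations" instead of n.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s[:r]) @ Vt[:r]

# Downstream solvers that touch A only through A.T @ A can use S instead.
rel_err = np.linalg.norm(A.T @ A - S.T @ S) / np.linalg.norm(A.T @ A)
```

On near-low-rank data the Gram matrix of the sketch is almost indistinguishable from that of the full data, which is why solves against the sketch lose little statistical accuracy.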
Randomized Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a powerful tool for data mining.
However, the emergence of `big data' has severely challenged our ability to
compute this fundamental decomposition using deterministic algorithms. This
paper presents a randomized hierarchical alternating least squares (HALS)
algorithm to compute the NMF. By deriving a smaller matrix from the nonnegative
input data, a more efficient nonnegative decomposition can be computed. Our
algorithm scales to big data applications while attaining a near-optimal
factorization. The proposed algorithm is evaluated using synthetic and real
world data and shows substantial speedups compared to deterministic HALS.
Comment: This is an extended and revised version of the paper which appeared in JPR
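For reference, the deterministic HALS updates that the randomized algorithm accelerates can be sketched as follows; the randomized variant would apply the same updates to a smaller matrix derived from the data. All sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic nonnegative data with latent rank r.
m, n, r = 100, 80, 4
X = rng.random((m, r)) @ rng.random((r, n))

W = rng.random((m, r))
H = rng.random((r, n))


def rel_err():
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


init_err = rel_err()
for _ in range(200):
    # HALS: cyclic exact updates of one factor row/column at a time,
    # clipped at zero to keep the factors nonnegative.
    WtX, WtW = W.T @ X, W.T @ W
    for j in range(r):
        H[j] = np.maximum(0, H[j] + (WtX[j] - WtW[j] @ H) / (WtW[j, j] + 1e-12))
    XHt, HHt = X @ H.T, H @ H.T
    for j in range(r):
        W[:, j] = np.maximum(0, W[:, j] + (XHt[:, j] - W @ HHt[:, j]) / (HHt[j, j] + 1e-12))
final_err = rel_err()
```

Each clipped coordinate update moves toward (or onto) the constrained minimizer of a one-dimensional quadratic, so the fit error never increases.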
Global and Local Structure Preserving Sparse Subspace Learning: An Iterative Approach to Unsupervised Feature Selection
As we aim at alleviating the curse of high-dimensionality, subspace learning
is becoming more popular. Existing approaches use information about either the global or the local structure of the data, and few studies focus on both simultaneously, even though both contain important information.
In this paper, we propose a global and local structure preserving sparse
subspace learning (GLoSS) model for unsupervised feature selection. The model
can simultaneously realize feature selection and subspace learning. In
addition, we develop a greedy algorithm to establish a generic combinatorial
model, and an iterative strategy based on an accelerated block coordinate
descent is used to solve the GLoSS problem. We also provide a convergence analysis of the whole iterate sequence of the proposed iterative algorithm. Extensive
experiments are conducted on real-world datasets to show the superiority of the
proposed approach over several state-of-the-art unsupervised feature selection
approaches.
Comment: 32 pages, 6 figures, and 60 references
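One classical way to score features by how well they preserve local structure, in the same spirit as the local part of GLoSS (though not the GLoSS model itself), is the Laplacian score on a kNN graph. Everything below, including the data, graph weights, and k, is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: 60 samples, 8 features; features 0-2 carry the cluster
# structure, the rest are pure noise.
n, d = 60, 8
labels = rng.integers(0, 2, n)
X = 0.1 * rng.standard_normal((n, d))
X[:, :3] += labels[:, None]               # informative features

# Local structure: symmetric kNN graph with binary weights.
k = 5
dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
Wg = np.zeros((n, n))
for i in range(n):
    for j in np.argsort(dist[i])[1:k + 1]:
        Wg[i, j] = Wg[j, i] = 1.0
Dg = np.diag(Wg.sum(1))
L = Dg - Wg                               # graph Laplacian

# Laplacian score per feature (lower = better preserves local structure).
scores = []
for f_idx in range(d):
    f = X[:, f_idx]
    f = f - (f @ Dg @ np.ones(n)) / Dg.sum()   # degree-weighted centering
    scores.append((f @ L @ f) / (f @ Dg @ f))
```

Features that vary smoothly over the neighborhood graph, here the cluster-carrying ones, get low scores, so ranking by score yields an unsupervised selection.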
Convolutional Neural Networks with Transformed Input based on Robust Tensor Network Decomposition
Tensor network decomposition, which originated in quantum physics to model entangled many-particle quantum systems, has turned out to be a promising mathematical technique for efficiently representing and processing big data in a parsimonious manner. In this study, we show that tensor networks can systematically partition structured data, e.g. color images, for distributed storage and communication in a privacy-preserving manner. Leveraging the sea of big data and metadata privacy, empirical results show that neighbouring subtensors with implicit information stored in tensor network formats cannot be identified for data reconstruction. This technique complements existing encryption and randomization techniques, which store explicit data representations in one place and are highly susceptible to adversarial attacks such as side-channel attacks and de-anonymization. Furthermore, we propose a theory
for adversarial examples that mislead convolutional neural networks to
misclassification using subspace analysis based on singular value decomposition
(SVD). The theory is extended to analyze higher-order tensors using
tensor-train SVD (TT-SVD); it helps to explain the level of susceptibility of
different datasets to adversarial attacks, the structural similarity of
different adversarial attacks including global and localized attacks, and the
efficacy of different adversarial defenses based on input transformation. An
efficient and adaptive algorithm based on robust TT-SVD is then developed to
detect strong and static adversarial attacks.
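The TT-SVD construction mentioned above can be sketched as a left-to-right sweep of SVDs that peels off one tensor mode at a time. This toy version keeps full ranks, so the reconstruction is exact; the tensor shape is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

# A small 4th-order tensor with an illustrative shape.
shape = (4, 5, 6, 3)
T = rng.standard_normal(shape)

# TT-SVD sweep: at each step, reshape, take an SVD, store the left
# factor as a TT core, and carry the rest to the next mode.
cores, C, r_prev = [], T.reshape(shape[0], -1), 1
for dim in shape[:-1]:
    C = C.reshape(r_prev * dim, -1)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    r = len(s)                            # full rank kept: exact TT
    cores.append(U.reshape(r_prev, dim, r))
    C = np.diag(s) @ Vt
    r_prev = r
cores.append(C.reshape(r_prev, shape[-1], 1))

# Reconstruct by contracting the train of cores left to right.
full = cores[0].reshape(shape[0], -1)
for G in cores[1:]:
    r_in, dim, r_out = G.shape
    full = (full @ G.reshape(r_in, dim * r_out)).reshape(-1, r_out)
full = full.reshape(shape)
err = np.linalg.norm(full - T) / np.linalg.norm(T)
```

Truncating the singular values at each step (rather than keeping all of them, as here) yields the compressed, approximate TT format used in practice.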
Orthogonal Deep Neural Networks
In this paper, we introduce the algorithms of Orthogonal Deep Neural Networks (OrthDNNs) to connect with the recent interest in spectrally regularized deep learning methods. OrthDNNs are theoretically motivated by generalization analysis of modern DNNs, with the aim of finding solution properties of network weights that guarantee better generalization. To this end, we first prove that DNNs are locally isometric on data distributions of practical interest; by
using a new covering of the sample space and introducing the local isometry
property of DNNs into generalization analysis, we establish a new
generalization error bound that is both scale- and range-sensitive to singular
value spectrum of each of the network's weight matrices. We prove that the optimal bound w.r.t. the degree of isometry is attained when each weight matrix has a spectrum of equal singular values, for which an orthogonal weight matrix, or a non-square one with orthonormal rows or columns, is the most straightforward choice, suggesting the algorithms of OrthDNNs. We present algorithms for both strict and approximate OrthDNNs; for the latter we propose a simple yet
effective algorithm called Singular Value Bounding (SVB), which performs as
well as strict OrthDNNs, but at a much lower computational cost. We also
propose Bounded Batch Normalization (BBN) to make compatible use of batch
normalization with OrthDNNs. We conduct extensive comparative studies by using
modern architectures on benchmark image classification. Experiments show the
efficacy of OrthDNNs.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence
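A minimal sketch of the singular value bounding step, assuming it simply clips each weight matrix's singular values into a narrow band around one. The band width and weight matrix below are illustrative, and in training this would be applied periodically rather than once:

```python
import numpy as np

rng = np.random.default_rng(5)


def singular_value_bound(W, eps=0.05):
    """Clip the singular values of W into [1/(1+eps), 1+eps].

    A sketch of the Singular Value Bounding idea from the abstract:
    keeping all singular values near one approximates an orthogonal
    (or orthonormal-row/column) weight matrix.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.clip(s, 1.0 / (1.0 + eps), 1.0 + eps)
    return U @ np.diag(s) @ Vt


W = 0.5 * rng.standard_normal((64, 32))   # hypothetical layer weights
Wb = singular_value_bound(W)
s_after = np.linalg.svd(Wb, compute_uv=False)
```

Because only the spectrum is modified, the singular vectors, and hence the directions the layer amplifies, are preserved.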
A survey of dimensionality reduction techniques
Experimental life sciences like biology and chemistry have seen an explosion of available experimental data in recent decades. Laboratory instruments have become more and more complex, reporting hundreds or thousands of measurements for a single experiment, so statistical methods face challenging tasks when dealing with such high-dimensional data. However, much of the data is highly redundant and can be efficiently brought down to a much smaller number of variables without a significant loss of information. The mathematical procedures that make this reduction possible are called dimensionality reduction techniques; they have been widely developed by fields like Statistics and Machine Learning, and are currently a hot research topic. In this review we categorize the plethora of dimension reduction techniques available and give the mathematical insight behind them.
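The prototypical such technique is PCA, applied here to a toy dataset that is high-dimensional but highly redundant (all dimensions are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(6)

# Redundant data: 2 latent factors embedded in 50 observed dimensions.
n, d, k = 300, 50, 2
Z = rng.standard_normal((n, k))
X = Z @ rng.standard_normal((k, d)) + 0.01 * rng.standard_normal((n, d))

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:k].T                 # n x k reduced representation

# Fraction of variance captured by the first k principal components.
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
```

When the data truly live near a k-dimensional subspace, almost all the variance survives the reduction from 50 variables to 2, which is exactly the redundancy the survey discusses.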
Scalable Deep k-Subspace Clustering
Subspace clustering algorithms are notorious for their scalability issues
because building and processing large affinity matrices are demanding. In this
paper, we introduce a method that simultaneously learns an embedding space along with the subspaces within it so as to minimize a notion of reconstruction error, thus
addressing the problem of subspace clustering in an end-to-end learning
paradigm. To achieve our goal, we propose a scheme to update subspaces within a
deep neural network. This in turn frees us from the need of having an affinity
matrix to perform clustering. Unlike previous attempts, our method can easily
scale up to large datasets, making it unique in the context of unsupervised
learning with deep architectures. Our experiments show that our method
significantly improves the clustering accuracy while enjoying cheaper memory
footprints.
Comment: To appear in ACCV 201
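The network-free core of such subspace updates is the classical k-subspaces iteration: assign each point to its best-fitting subspace, then refit each subspace basis with an SVD. A toy sketch with two random 2-D subspaces in R^10 (all sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Points drawn exactly from two random q-dimensional subspaces of R^d.
d, q, n_per = 10, 2, 100
true_bases = [np.linalg.qr(rng.standard_normal((d, q)))[0] for _ in range(2)]
X = np.vstack([rng.standard_normal((n_per, q)) @ B.T for B in true_bases])


def residuals(X, bases):
    # Distance of each point to its projection onto each subspace.
    return np.stack([np.linalg.norm(X - X @ B @ B.T, axis=1) for B in bases])


bases = [np.linalg.qr(rng.standard_normal((d, q)))[0] for _ in range(2)]
init_obj = (residuals(X, bases).min(0) ** 2).sum()
for _ in range(20):
    assign = residuals(X, bases).argmin(0)        # assignment step
    for c in range(2):
        pts = X[assign == c]
        if len(pts) >= q:                         # refit step via SVD
            bases[c] = np.linalg.svd(pts.T, full_matrices=False)[0][:, :q]
final_obj = (residuals(X, bases).min(0) ** 2).sum()
```

Both steps are exact minimizations given the other, so the clustering objective decreases monotonically; the paper's contribution is performing this kind of update inside a deep network without an affinity matrix.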
Integrative Multi-View Reduced-Rank Regression: Bridging Group-Sparse and Low-Rank Models
Multi-view data have been routinely collected in various fields of science
and engineering. A general problem is to study the predictive association
between multivariate responses and multi-view predictor sets, all of which can
be of high dimensionality. It is likely that only a few views are relevant to
prediction, and the predictors within each relevant view contribute to the
prediction collectively rather than sparsely. We cast this new problem under
the familiar multivariate regression framework and propose an integrative
reduced-rank regression (iRRR), where each view has its own low-rank
coefficient matrix. As such, latent features are extracted from each view in a
supervised fashion. For model estimation, we develop a convex composite nuclear
norm penalization approach, which admits an efficient algorithm via alternating
direction method of multipliers. Extensions to non-Gaussian and incomplete data
are discussed. Theoretically, we derive non-asymptotic oracle bounds of iRRR
under a restricted eigenvalue condition. Our results recover oracle bounds of
several special cases of iRRR including Lasso, group Lasso and nuclear norm
penalized regression. Therefore, iRRR seamlessly bridges group-sparse and
low-rank methods and can achieve a substantially faster convergence rate under
realistic settings of multi-view learning. Simulation studies and an
application in the Longitudinal Studies of Aging further showcase the efficacy
of the proposed methods.
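The workhorse subproblem in ADMM for nuclear-norm penalties is singular value soft-thresholding, the proximal operator of the nuclear norm. A minimal sketch with an arbitrary matrix and threshold (both illustrative, not the iRRR algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(8)


def svt(M, tau):
    """Proximal operator of tau * ||.||_* : soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt


M = rng.standard_normal((30, 20))
out = svt(M, 3.0)
```

Singular values below the threshold are set exactly to zero, which is how the nuclear norm penalty induces low-rank coefficient matrices in each view.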
Inferring biological networks by sparse identification of nonlinear dynamics
Inferring the structure and dynamics of network models is critical to
understanding the functionality and control of complex systems, such as
metabolic and regulatory biological networks. The increasing quality and
quantity of experimental data enable statistical approaches based on
information theory for model selection and goodness-of-fit metrics. We propose
an alternative method to infer networked nonlinear dynamical systems by using
sparsity-promoting optimization to select a subset of nonlinear
interactions representing dynamics on a fully connected network. Our method
generalizes the sparse identification of nonlinear dynamics (SINDy) algorithm
to dynamical systems with rational function nonlinearities, such as biological
networks. We show that dynamical systems with rational nonlinearities may be
cast in an implicit form, where the equations may be identified in the
null-space of a library of mixed nonlinearities including the state and
derivative terms; this approach applies more generally to implicit dynamical
systems beyond those containing rational nonlinearities. This method,
implicit-SINDy, succeeds in inferring three canonical biological models:
Michaelis-Menten enzyme kinetics, the regulatory network for competence in
bacteria, and the metabolic network for yeast glycolysis.
Comment: 11 pages, 6 figures
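The core SINDy regression that implicit-SINDy generalizes can be sketched as sequentially thresholded least squares over a library of candidate terms. The 1-D system, library, and threshold below are illustrative, and the derivatives are taken as exact to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(9)

# Samples of a 1-D state and its (exact) derivative for a sparse system:
# dx/dt = 1.5*x - 0.5*x**3.
x = rng.uniform(-2.0, 2.0, 200)
dx = 1.5 * x - 0.5 * x ** 3

# Library of candidate nonlinearities: [1, x, x^2, x^3].
Theta = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

# Sequentially thresholded least squares: fit, zero out small
# coefficients, refit on the surviving terms, and repeat.
xi = np.linalg.lstsq(Theta, dx, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.1
    xi[small] = 0.0
    big = ~small
    xi[big] = np.linalg.lstsq(Theta[:, big], dx, rcond=None)[0]
```

The recovered coefficient vector is sparse, picking out only the active terms x and x^3; implicit-SINDy instead seeks such sparse vectors in the null space of a library that mixes state and derivative terms, which is what admits rational nonlinearities.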