Robust Subspace Learning: Robust PCA, Robust Subspace Tracking, and Robust Subspace Recovery
PCA is one of the most widely used dimension reduction techniques. A related,
easier problem is "subspace learning" or "subspace estimation". Given
relatively clean data, both are easily solved via singular value decomposition
(SVD). The problem of subspace learning or PCA in the presence of outliers is
called robust subspace learning or robust PCA (RPCA). For long data sequences,
if one tries to use a single lower dimensional subspace to represent the data,
the required subspace dimension may end up being quite large. For such data, a
better model is to assume that it lies in a low-dimensional subspace that can
change over time, albeit gradually. The problem of tracking such data (and the
subspaces) while being robust to outliers is called robust subspace tracking
(RST). This article provides a magazine-style overview of the entire field of
robust subspace learning and tracking. In particular, solutions for three
problems are discussed in detail: RPCA via sparse+low-rank matrix decomposition
(S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an
entire data vector is either an outlier or an inlier. The S+LR formulation
instead assumes that outliers occur on only a few data vector indices and hence
are well modeled as sparse corruptions.
Comment: To appear, IEEE Signal Processing Magazine, July 201
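The S+LR formulation can be illustrated with a minimal principal component pursuit sketch: alternate singular-value thresholding (for the low-rank part) and entrywise soft thresholding (for the sparse part). This is a generic ADMM-style sketch, not the article's algorithm; the step-size heuristic and iteration count are illustrative assumptions.

```python
import numpy as np

def shrink(M, tau):
    """Soft-threshold entries of M toward zero (proximal operator of the l1 norm)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_shrink(M, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca_pcp(M, n_iter=200, lam=None):
    """Split M into low-rank L plus sparse S by alternating proximal steps
    on the augmented Lagrangian of min ||L||_* + lam*||S||_1 s.t. L + S = M."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # standard PCP weighting
    mu = 0.25 * m * n / (np.sum(np.abs(M)) + 1e-12)  # step-size heuristic (assumed)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                       # dual variable
    for _ in range(n_iter):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)
    return L, S
```

With a few percent of entries grossly corrupted, the residual M - L - S shrinks as the dual variable accumulates, and L recovers the low-rank component.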
Detection of non-stationary dynamics using sub-space based representations, cyclic based and variability constraints
The present Master's thesis proposes a methodology for the analysis of non-stationary time series, for filtering and noise-rejection purposes in pattern recognition. The methodology is divided into two approaches. The first analyzes periodic non-stationary behavior that relies on a cyclic process, and examines how additional non-cyclic non-stationarities disrupt the signal analysis. The second addresses the extraction of non-stationary components that affect inherently weakly stationary processes. Both frameworks are based on (cyclo-)stationarity constraints and subspace-based representations, so that by assessing the signal's dynamics the undesired non-stationary components can be identified.
Results are shown for each approach independently on synthetic and real data. The obtained performance demonstrates a high capability for detecting, rejecting, and extracting noise and artifacts in (cyclo-)stationary time series, using stationarity constraints and cyclic conditions derived from the nature of the signal, as well as a high separation capability for stationary signals.
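As a generic illustration of the subspace-based idea (not the thesis's actual pipeline), one can compare the dominant SVD subspaces of trajectory matrices built from successive signal windows: a large principal angle between them flags a non-stationary change. The window length, model order, and rank below are illustrative assumptions.

```python
import numpy as np

def window_subspace(x, order=8, rank=2):
    """Dominant rank-dimensional subspace of a window's trajectory
    (Hankel-type) matrix, estimated via SVD."""
    H = np.lib.stride_tricks.sliding_window_view(x, order).T  # order x n_lags
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    return U[:, :rank]

def max_principal_angle(U1, U2):
    """Largest principal angle (radians) between two column subspaces."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))
```

For a pure sinusoid the trajectory matrix has rank two, so two windows of the same stationary tone share a subspace (angle near zero), while a frequency change shows up as a large angle.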
Principal Components and Long Run Implications of Multivariate Diffusions
We investigate a method for extracting nonlinear principal components. These principal components maximize variation subject to smoothness and orthogonality constraints; but we allow for a general class of constraints and multivariate densities, including densities without compact support and even densities with algebraic tails. We provide primitive sufficient conditions for the existence of these principal components. We characterize the limiting behavior of the associated eigenvalues, the objects used to quantify the incremental importance of the principal components. By exploiting the theory of continuous-time, reversible Markov processes, we give a different interpretation of the principal components and the smoothness constraints. When the diffusion matrix is used to enforce smoothness, the principal components maximize long-run variation relative to the overall variation subject to orthogonality constraints. Moreover, the principal components behave as scalar autoregressions with heteroskedastic innovations; this supports semiparametric identification of a multivariate reversible diffusion process and tests of the overidentifying restrictions implied by such a process from low frequency data. We also explore implications for stationary, possibly non-reversible diffusion processes.
Keywords: Nonlinear principal components, Discrete spectrum, Eigenvalue decay rates, Multivariate diffusion, Quadratic form, Conditional expectations operator
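In the scalar Ornstein-Uhlenbeck special case (standard normal stationary density), this construction reduces to a generalized eigenvalue problem pitting a density-weighted Dirichlet form against the weighted L2 norm, and the principal components are Hermite polynomials with eigenvalues 1, 2, 3, .... A discretized sketch, where the grid, truncation, and differencing scheme are illustrative choices rather than the paper's:

```python
import numpy as np

# Grid on a truncated support; standard normal stationary density (the OU case)
n = 400
x = np.linspace(-4.0, 4.0, n)
h = x[1] - x[0]
w = np.exp(-x**2 / 2)
w /= w.sum()                                    # discrete stationary density

# Smoothness (Dirichlet) form: sum_i w_i f'(x_i)^2 via forward differences
D = (np.eye(n, k=1)[:-1] - np.eye(n)[:-1]) / h  # (n-1, n) difference operator
wm = 0.5 * (w[:-1] + w[1:])                     # density at interval midpoints
A = D.T @ (wm[:, None] * D)

# Generalized problem A f = lam * diag(w) f, whitened to a symmetric one
Bh = 1.0 / np.sqrt(w)
lam = np.linalg.eigvalsh((Bh[:, None] * A) * Bh[None, :])
# lam[0] ~ 0 (constants); lam[1], lam[2] approximate the OU spectrum 1, 2, ...
```

The eigenvalues quantify the incremental importance of each component: the smallest nonzero ones correspond to the slowest, smoothest directions of variation.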
Coarse Molecular Dynamics of a Peptide Fragment: Free Energy, Kinetics, and Long-Time Dynamics Computations
We present a ``coarse molecular dynamics'' approach and apply it to studying
the kinetics and thermodynamics of a peptide fragment dissolved in water. Short
bursts of appropriately initialized simulations are used to infer the
deterministic and stochastic components of the peptide motion parametrized by
an appropriate set of coarse variables. Techniques from traditional numerical
analysis (Newton-Raphson, coarse projective integration) are thus enabled;
these techniques help analyze important features of the free-energy landscape
(coarse transition states, eigenvalues and eigenvectors, transition rates,
etc.). Reverse integration of (irreversible) expected coarse variables backward
in time can assist escape from free energy minima and trace low-dimensional
free energy surfaces. To illustrate the ``coarse molecular dynamics'' approach,
we combine multiple short (0.5-ps) replica simulations to map the free energy
surface of the ``alanine dipeptide'' in water, and to determine the ~ 1/(1000
ps) rate of interconversion between the two stable configurational basins
corresponding to the alpha-helical and extended minima.
Comment: The article has been submitted to "The Journal of Chemical Physics."
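The burst idea can be sketched on a toy coarse variable: estimate the drift and diffusion from the first two moments of short replica bursts, then run Newton-Raphson on the estimated drift to locate coarse fixed points. The double-well stand-in potential, step sizes, and replica counts below are illustrative assumptions, not the paper's peptide simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def burst(x0, dt=0.01, n_reps=200_000):
    """One short burst of replica simulations started at x0. The fine-scale
    model is a hypothetical stand-in: overdamped dynamics in the double well
    V(x) = (x^2 - 1)^2 / 4, i.e. dx = -x(x^2 - 1) dt + sqrt(2) dW."""
    x = np.full(n_reps, float(x0))
    return x - x * (x**2 - 1) * dt + np.sqrt(2 * dt) * rng.standard_normal(n_reps)

def estimate_drift(x0, dt=0.01):
    """Deterministic component: mean displacement per unit time."""
    return float(np.mean(burst(x0, dt) - x0) / dt)

def estimate_diffusion(x0, dt=0.01):
    """Stochastic component: half the displacement variance per unit time."""
    return float(np.var(burst(x0, dt) - x0) / (2 * dt))

def coarse_newton(x0, delta=0.1, n_iter=20, tol=1e-3):
    """Newton-Raphson on the *estimated* drift to locate a coarse fixed point
    (a free-energy minimum or the transition state), using bursts only."""
    x = float(x0)
    for _ in range(n_iter):
        f = estimate_drift(x)
        fp = (estimate_drift(x + delta) - estimate_drift(x - delta)) / (2 * delta)
        step = f / fp
        x -= step
        if abs(step) < tol:
            break
    return x
```

The fixed points of the estimated drift sit at x = -1, 0, 1; the middle one plays the role of the coarse transition state between the two basins.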
Macrostate Data Clustering
We develop an effective nonhierarchical data clustering method using an
analogy to the dynamic coarse graining of a stochastic system. Analyzing the
eigensystem of an interitem transition matrix identifies fuzzy clusters
corresponding to the metastable macroscopic states (macrostates) of a diffusive
system. A "minimum uncertainty criterion" determines the linear transformation
from eigenvectors to cluster-defining window functions. Eigenspectrum gap and
cluster certainty conditions identify the proper number of clusters. The
physically motivated fuzzy representation and associated uncertainty analysis
distinguishes macrostate clustering from spectral partitioning methods.
Macrostate data clustering solves a variety of test cases that challenge other
methods.
Comment: keywords: cluster analysis, clustering, pattern recognition, spectral graph theory, dynamic eigenvectors, machine learning, macrostates, classification
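A minimal sketch of the eigensystem idea: build a Gaussian-kernel inter-item transition matrix, inspect its eigenvalues for a spectral gap, and read cluster assignments off the near-piecewise-constant leading eigenvectors. The kernel width, test data, and hard two-cluster sign rule are illustrative assumptions; the paper's minimum-uncertainty window construction is not reproduced here.

```python
import numpy as np

def macrostate_clusters(X, sigma=0.5):
    """Clusters from the eigensystem of an inter-item transition matrix
    (a diffusion over the data points)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    P = K / K.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    vals, vecs = np.linalg.eig(P)          # real spectrum: P is reversible
    order = np.argsort(-vals.real)
    vals = vals.real[order]
    vecs = vecs.real[:, order]
    # a gap after the leading eigenvalue cluster sets the number of macrostates;
    # for two states, the sign pattern of the second eigenvector (roughly
    # piecewise constant on the macrostates) assigns the labels
    labels = (vecs[:, 1] > 0).astype(int)
    return vals, labels
```

For two well-separated blobs, the leading eigenvalue is exactly 1, the second sits just below 1 (the metastable pair), and the rest fall off after a clear gap.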
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.
Comment: 232 pages
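The compression behind the TT format can be illustrated with the standard TT-SVD scheme, which peels off one core per mode via sequential SVDs of unfoldings. This is a generic sketch, not the monograph's code; the truncation tolerance is an assumption.

```python
import numpy as np

def tt_svd(T, eps=1e-10):
    """Decompose a full tensor into tensor-train (TT) cores by sequential SVDs,
    truncating singular values below eps * s[0] at each step."""
    shape = T.shape
    d = len(shape)
    cores = []
    r = 1
    M = T.reshape(r * shape[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        keep = max(1, int(np.sum(s > eps * s[0])))   # truncated TT rank
        cores.append(U[:, :keep].reshape(r, shape[k], keep))
        r = keep
        M = (s[:keep, None] * Vt[:keep]).reshape(r * shape[k + 1], -1)
    cores.append(M.reshape(r, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into the full tensor."""
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=([-1], [0]))
    return T.squeeze(axis=0).squeeze(axis=-1)
```

A tensor built from cores of ranks (1, 2, 3, 1) is recovered exactly, with the ranks rediscovered by the truncation rule; storage drops from the product of mode sizes to a sum of small core sizes.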