Principal manifolds and graphs in practice: from molecular biology to dynamical systems
We present several applications of non-linear data modeling, using principal
manifolds and principal graphs constructed using the metaphor of elasticity
(elastic principal graph approach). These approaches generalize Kohonen's
self-organizing maps, a class of artificial neural networks. Using
several examples, we show the advantages of non-linear objects over linear ones
for data approximation. We propose four numerical
criteria for comparing linear and non-linear mappings of datasets into the
spaces of lower dimension. The examples are taken from comparative political
science, from the analysis of high-throughput data in molecular biology, and from
the analysis of dynamical systems.
Comment: 12 pages, 9 figures
Representing complex data using localized principal components with application to astronomical data
Often the relation between the variables constituting a multivariate data
space might be characterized by one or more of the terms: ``nonlinear'',
``branched'', ``disconnected'', ``bended'', ``curved'', ``heterogeneous'', or,
more generally, ``complex''. In these cases, simple principal component analysis
(PCA) as a tool for dimension reduction can fail badly. Of the many alternative
approaches proposed so far, local approximations of PCA are among the most
promising. This paper will give a short review of localized versions of PCA,
focusing on local principal curves and local partitioning algorithms.
Furthermore, we discuss projections other than the local principal components.
When performing local dimension reduction for regression or classification
problems it is important to focus not only on the manifold structure of the
covariates, but also on the response variable(s). Local principal components
only achieve the former, whereas localized regression approaches concentrate on
the latter. Local projection directions derived from the partial least squares
(PLS) algorithm offer an interesting trade-off between these two objectives. We
apply these methods to several real data sets. In particular, we consider
simulated astrophysical data from the future Galactic survey mission Gaia.
Comment: 25 pages. In "Principal Manifolds for Data Visualization and
Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds),
Lecture Notes in Computational Science and Engineering, Springer, 2007, pp.
180--204,
http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-
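The partitioning idea behind localized PCA can be illustrated with a minimal sketch. It is not the method of the chapter above: it assumes 2-D data, assumes the partition labels are already given (a real local partitioning algorithm would estimate them), and uses the closed-form angle of the leading eigenvector of a 2x2 covariance matrix. The function names `principal_direction` and `local_pca` are hypothetical.

```python
import math

def principal_direction(points):
    """Return (mean, unit direction) of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def local_pca(points, labels):
    """Fit one principal direction per partition (labels are assumed given)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: principal_direction(g) for lab, g in groups.items()}

# Toy "bent" data: a horizontal arm followed by a vertical arm.
points = [(float(i), 0.0) for i in range(5)] + [(4.0, float(j)) for j in range(1, 6)]
labels = [0] * 5 + [1] * 5  # assumed partition, one label per arm
fits = local_pca(points, labels)
for lab, (mean, direction) in fits.items():
    print(lab, mean, direction)
```

On this toy data, a single global principal component would cut diagonally across the bend, whereas the two local components each follow one arm.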
PHom-GeM: Persistent Homology for Generative Models
Generative neural network models, including Generative Adversarial Networks
(GANs) and Auto-Encoders (AEs), are among the most popular neural network models
for generating adversarial data. A GAN is composed of a generator that
produces synthetic data and a discriminator that discriminates between the
generator's output and the true data. An AE consists of an encoder, which maps the
model distribution to a latent manifold, and a decoder, which maps the latent
manifold to a reconstructed distribution. However, generative models are known
to produce chaotically scattered reconstructed distributions during
training and, consequently, incomplete generated adversarial distributions.
Current distance measures fail to address this problem because they are not
able to acknowledge the shape of the data manifold, i.e. its topological
features, and the scale at which the manifold should be analyzed. We propose
Persistent Homology for Generative Models, PHom-GeM, a new methodology to
assess and measure the distribution of a generative model. PHom-GeM minimizes
an objective function between the true and the reconstructed distributions and
uses persistent homology, the study of the topological features of a space at
different spatial resolutions, to compare the nature of the true and the
generated distributions. Our experiments underline the potential of persistent
homology for Wasserstein GAN in comparison to Wasserstein AE and Variational
AE. The experiments are conducted on a real-world data set particularly
challenging for traditional distance measures and generative neural network
models. PHom-GeM is the first methodology to propose a topological distance
measure, the bottleneck distance, for generative models used to compare
adversarial samples in the context of credit card transactions.
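To make "topological features at different spatial resolutions" concrete, here is a minimal sketch of 0-dimensional persistent homology, that is, the scales at which connected components merge as the distance threshold grows. It is an illustration only, not the PHom-GeM procedure: it does not compute the bottleneck distance, and the function name `h0_death_times` and the toy point sets are assumptions.

```python
import math
from itertools import combinations

def h0_death_times(points):
    """Scales at which connected components merge in a Vietoris-Rips filtration."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies
            parent[ri] = rj
            deaths.append(length)
    return deaths  # the last surviving component never dies

true_pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]  # two clusters
fake_pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]    # one cluster
print(h0_death_times(true_pts))  # [1.0, 1.0, 9.0]
print(h0_death_times(fake_pts))  # [1.0, 1.0, 1.0]
```

The long-lived component in the first set (death at scale 9.0) records the gap between the two clusters. The second set has no such feature, which is the kind of difference in shape that a topological comparison is meant to expose.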
Dimension reduction for linear separation with curvilinear distances
High-dimensional data in its raw form may contain clearly classifiable clusters that are difficult to identify in the high-dimensional representation. After reducing the dimension, a simple classification technique may be able to extract this cluster information while retaining the overall topology of the data set. The supervised method presented here takes a high-dimensional data set consisting of multiple clusters and uses curvilinear distance as the relation between points, projecting them into a lower dimension according to this relation. The resulting representation allows linear separation of cluster data that is not linearly separable in the high-dimensional space, and allows any subsequent unseen data point drawn from the same high-dimensional space to be classified to a cluster.
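The abstract does not say how the curvilinear distance is computed. A common approximation, assumed here, is the shortest-path length over a k-nearest-neighbour graph. The sketch below follows that assumption; the names `knn_graph` and `curvilinear_distances` and the choice `k=2` are hypothetical.

```python
import heapq
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def curvilinear_distances(graph, source):
    """Dijkstra shortest-path lengths from source along graph edges."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Points sampled along a unit semicircle.
points = [(math.cos(math.pi * t / 10), math.sin(math.pi * t / 10)) for t in range(11)]
graph = knn_graph(points, k=2)
along_curve = curvilinear_distances(graph, 0)[10]
straight = math.dist(points[0], points[10])
print(straight, along_curve)
```

The two endpoints are about 2 apart in a straight line, while the distance measured along the sampled curve is close to the arc length pi. A projection built from the second quantity can keep points apart that the Euclidean distance would place close together.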
Visualization of AE's Training on Credit Card Transactions with Persistent Homology
Auto-encoders are among the most popular neural network architectures for
dimension reduction. They are composed of two parts: the encoder, which maps the
model distribution to a latent manifold, and the decoder, which maps the latent
manifold to a reconstructed distribution. However, auto-encoders are known to
produce chaotically scattered data distributions in the latent manifold,
resulting in an incomplete reconstructed distribution. Current distance
measures fail to detect this problem because they are not able to acknowledge
the shape of the data manifolds, i.e. their topological features, and the scale
at which the manifolds should be analyzed. We propose Persistent Homology for
Wasserstein Auto-Encoders, called PHom-WAE, a new methodology to assess and
measure the data distribution of a generative model. PHom-WAE minimizes the
Wasserstein distance between the true distribution and the reconstructed
distribution and uses persistent homology, the study of the topological
features of a space at different spatial resolutions, to compare the nature of
the latent manifold and the reconstructed distribution. Our experiments
underline the potential of persistent homology for Wasserstein Auto-Encoders in
comparison to Variational Auto-Encoders, another type of generative model. The
experiments are conducted on a real-world data set particularly challenging for
traditional distance measures and auto-encoders. PHom-WAE is the first
methodology to propose a topological distance measure, the bottleneck distance,
for Wasserstein Auto-Encoders used to compare decoded samples of high quality
in the context of credit card transactions.
Comment: arXiv admin note: substantial text overlap with arXiv:1905.0989
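For intuition about the Wasserstein distance mentioned in the abstract above, here is a minimal sketch of the one-dimensional special case for two empirical samples of equal size, where the optimal transport plan simply matches sorted values. It is not the high-dimensional objective optimized by a Wasserstein auto-encoder; the function name `wasserstein_1d` and the sample values are assumptions.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    # In one dimension the optimal coupling pairs the i-th smallest values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

true_sample = [0.0, 1.0, 2.0, 3.0]
reconstructed = [4.0, 2.0, 1.0, 3.0]  # the same values shifted by one, shuffled
print(wasserstein_1d(true_sample, reconstructed))  # 1.0
```

The result equals the size of the shift, which shows that the distance measures how far mass has to move. It does not, by itself, describe the shape of either sample.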