Parametric t-Distributed Stochastic Exemplar-centered Embedding
Parametric embedding methods such as parametric t-SNE (pt-SNE) have been
widely adopted for data visualization and out-of-sample data embedding without
further computationally expensive optimization or approximation. However,
pt-SNE is highly sensitive to the batch-size hyperparameter because of
conflicting optimization goals, and it often produces dramatically different
embeddings for different user-defined perplexities. To effectively
solve these issues, we present parametric t-distributed stochastic
exemplar-centered embedding methods. Our strategy learns embedding parameters
by comparing given data only with precomputed exemplars, resulting in a cost
function with linear computational and memory complexity, which is further
reduced by noise contrastive samples. Moreover, we propose a shallow embedding
network with high-order feature interactions for data visualization, which is
much easier to tune than the deep neural network employed by pt-SNE yet
achieves comparable performance. We empirically demonstrate, using several
benchmark datasets, that our proposed methods significantly outperform pt-SNE
in terms of robustness, visual effects, and quantitative evaluations.
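
The exemplar-centered idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `exemplar_kl_cost`, the fixed bandwidth `sigma` (used here in place of perplexity calibration), and the per-point normalization over exemplars are all assumptions made for illustration. The point it shows is that each data point is compared only with the m exemplars, so the cost takes O(n·m) work rather than O(n²).

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def exemplar_kl_cost(X, E, Y, Z, sigma=1.0):
    """Sum over data points of KL(p_i || q_i), where both distributions
    range over the exemplars only (assumed normalization, see lead-in).

    X: high-dimensional data, E: high-dimensional exemplars,
    Y: embedded data, Z: embedded exemplars.
    """
    total = 0.0
    for x, y in zip(X, Y):
        # High-dimensional Gaussian affinities to each exemplar
        # (shifted by the max logit for numerical stability).
        logits = [-sq_dist(x, e) / (2.0 * sigma ** 2) for e in E]
        top = max(logits)
        p_raw = [math.exp(v - top) for v in logits]
        p_sum = sum(p_raw)
        p = [v / p_sum for v in p_raw]
        # Low-dimensional Student-t affinities (one degree of freedom).
        q_raw = [1.0 / (1.0 + sq_dist(y, z)) for z in Z]
        q_sum = sum(q_raw)
        q = [v / q_sum for v in q_raw]
        total += sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0.0)
    return total


# Toy data: two clusters, one exemplar per cluster (hypothetical values).
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0]]
E = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
Z = [[0.0, 0.0], [3.0, 3.0]]
Y_good = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]   # respects the clusters
Y_bad = [[3.0, 3.0], [3.0, 3.0], [0.0, 0.0]]    # clusters swapped
cost_good = exemplar_kl_cost(X, E, Y_good, Z)
cost_bad = exemplar_kl_cost(X, E, Y_bad, Z)
```

In this toy case the embedding that keeps each point near its own exemplar yields the lower cost, which is the behavior a gradient-based optimizer would exploit.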
Nonlinear Dimensionality Reduction Methods in Climate Data Analysis
Linear dimensionality reduction techniques, notably principal component
analysis, are widely used in climate data analysis as a means to aid in the
interpretation of datasets of high dimensionality. These linear methods may not
be appropriate for the analysis of data arising from nonlinear processes
occurring in the climate system. Numerous techniques for nonlinear
dimensionality reduction have been developed recently that may prove useful
for identifying low-dimensional manifolds in
climate datasets arising from nonlinear dynamics. In this thesis I apply three
such techniques to the study of El Niño/Southern Oscillation variability in
tropical Pacific sea surface temperatures and thermocline depth, comparing
observational data with simulations from coupled atmosphere-ocean general
circulation models from the CMIP3 multi-model ensemble.
The three methods used here are a nonlinear principal component analysis
(NLPCA) approach based on neural networks, the Isomap isometric mapping
algorithm, and Hessian locally linear embedding. I use these three methods to
examine El Niño variability in the different datasets and assess the
suitability of these nonlinear dimensionality reduction approaches for climate
data analysis.
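
Of the three methods, Isomap is the simplest to illustrate: it replaces straight-line distances with shortest-path distances along a nearest-neighbour graph before embedding. The Python sketch below shows only that geodesic-distance step, on synthetic points on a semicircle. The function name `knn_geodesics`, the toy data, and the choice k=2 are illustrative assumptions, not taken from the thesis, and the final multidimensional-scaling step of Isomap is omitted.

```python
import math


def knn_geodesics(points, k):
    """Approximate geodesic distances as shortest paths on a symmetric
    k-nearest-neighbour graph (first two steps of Isomap)."""
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Euclidean distances from point i to every other point, nearest first.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d_ij, j in nearest[:k]:
            dist[i][j] = min(dist[i][j], d_ij)
            dist[j][i] = min(dist[j][i], d_ij)
    # Floyd-Warshall all-pairs shortest paths over the neighbour graph.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                through_m = dist[i][m] + dist[m][j]
                if through_m < dist[i][j]:
                    dist[i][j] = through_m
    return dist


# 50 points on a unit semicircle: a one-dimensional manifold in the plane.
arc = [(math.cos(math.pi * t / 49), math.sin(math.pi * t / 49)) for t in range(50)]
geo = knn_geodesics(arc, k=2)
end_to_end_geodesic = geo[0][-1]                 # close to the arc length, pi
end_to_end_euclid = math.dist(arc[0], arc[-1])   # the chord, 2
```

The graph distance between the two ends of the arc is close to the arc length π, whereas the Euclidean distance is 2; a linear method such as principal component analysis sees only the latter.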
I conclude that although, for the application presented here, analysis using
NLPCA, Isomap and Hessian locally linear embedding does not provide additional
information beyond that already provided by principal component analysis, these
methods are effective tools for exploratory data analysis.
Comment: 273 pages, 76 figures; University of Bristol Ph.D. thesis; version
with high-resolution figures available from
http://www.skybluetrades.net/thesis/ian-ross-thesis.pdf (52Mb download)
Point-set manifold processing for computational mechanics: thin shells, reduced order modeling, cell motility and molecular conformations
In many applications, one would like to perform calculations on smooth manifolds of dimension d embedded in a high-dimensional space of dimension D. Often, a continuous description of such a manifold is not known, and instead it is sampled by a set of scattered points in high dimensions. This poses a serious challenge. In this thesis, we approximate the point-set manifold as an overlapping set of smooth parametric descriptions, whose geometric structure is revealed by statistical learning methods and then parameterized by meshfree methods. This approach avoids any global parameterization, and hence is applicable to manifolds of any genus and complex geometry. It combines four ingredients: (1) partitioning of the point set into subregions of trivial topology, (2) the automatic detection of the local geometric structure of the manifold by nonlinear dimensionality reduction techniques, (3) the local parameterization of the manifold using smooth meshfree (here local maximum-entropy) approximants, and (4) patching together the local representations by means of a partition of unity.
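
Ingredient (4) can be illustrated in one dimension with a short Python sketch. It is a simplification under stated assumptions: the weights are normalized polynomial bumps (a Shepard-type partition of unity), and the local models are first-order Taylor expansions of the sine function. Both stand in for the local maximum-entropy approximants and learned local parameterizations used in the thesis, and all names (`bump`, `pou_weights`, `blend`, `local_linear_sin`) are hypothetical.

```python
import math


def bump(x, center, radius):
    """Smooth, compactly supported weight: positive inside the patch, zero outside."""
    t = (x - center) / radius
    return (1.0 - t * t) ** 2 if abs(t) < 1.0 else 0.0


def pou_weights(x, centers, radius):
    """Normalize the patch weights so that they sum to one at x."""
    raw = [bump(x, c, radius) for c in centers]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("x is not covered by any patch")
    return [w / total for w in raw]


def blend(x, centers, radius, local_models):
    """Patch the local representations together with the partition of unity."""
    weights = pou_weights(x, centers, radius)
    return sum(w * f(x) for w, f in zip(weights, local_models) if w > 0.0)


def local_linear_sin(c):
    """Local model valid near c: first-order Taylor expansion of sin."""
    return lambda x: math.sin(c) + math.cos(c) * (x - c)


centers = [0.5 * i for i in range(7)]   # overlapping patches covering [0, 3]
radius = 0.8
models = [local_linear_sin(c) for c in centers]
grid = [0.03 * i for i in range(101)]   # evaluation points in [0, 3]
max_error = max(abs(blend(x, centers, radius, models) - math.sin(x)) for x in grid)
```

Because the weights sum to one everywhere and each vanishes outside its own patch, no global parameterization is needed: every local model only has to be accurate on its own subregion.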
In this thesis we show the generality, flexibility, and accuracy of the method in four different problems. First, we exercise it in the context of Kirchhoff-Love thin shells (d=2, D=3). We test our methodology against classical linear and nonlinear benchmarks in thin-shell analysis, and highlight its ability to handle point-set surfaces of complex topology and geometry. We then tackle problems of much higher dimensionality. We perform reduced order modeling in the context of finite deformation elastodynamics, considering a nonlinear reduced configuration space, in contrast with classical linear approaches based on Principal Component Analysis (d=2, D=10000's). We further quantitatively unveil the geometric structure of the motility strategy of a family of micro-organisms called Euglenids from experimental videos (d=1, D~30000's). Finally, in the context of enhanced sampling in molecular dynamics, we automatically construct collective variables for the molecular conformational dynamics (d=1...6, D~30,1000's).