492 research outputs found

    Parametric t-Distributed Stochastic Exemplar-centered Embedding

    Parametric embedding methods such as parametric t-SNE (pt-SNE) have been widely adopted for data visualization and out-of-sample data embedding without further computationally expensive optimization or approximation. However, the performance of pt-SNE is highly sensitive to the hyper-parameter batch size due to conflicting optimization goals, and often produces dramatically different embeddings with different choices of user-defined perplexities. To effectively solve these issues, we present parametric t-distributed stochastic exemplar-centered embedding methods. Our strategy learns embedding parameters by comparing given data only with precomputed exemplars, resulting in a cost function with linear computational and memory complexity, which is further reduced by noise contrastive samples. Moreover, we propose a shallow embedding network with high-order feature interactions for data visualization, which is much easier to tune but produces comparable performance in contrast to a deep neural network employed by pt-SNE. We empirically demonstrate, using several benchmark datasets, that our proposed methods significantly outperform pt-SNE in terms of robustness, visual effects, and quantitative evaluations.

    Nonlinear Dimensionality Reduction Methods in Climate Data Analysis

    Linear dimensionality reduction techniques, notably principal component analysis, are widely used in climate data analysis as a means to aid in the interpretation of datasets of high dimensionality. These linear methods may not be appropriate for the analysis of data arising from nonlinear processes occurring in the climate system. Numerous techniques for nonlinear dimensionality reduction have been developed recently that may provide a potentially useful tool for the identification of low-dimensional manifolds in climate data sets arising from nonlinear dynamics. In this thesis I apply three such techniques to the study of El Nino/Southern Oscillation variability in tropical Pacific sea surface temperatures and thermocline depth, comparing observational data with simulations from coupled atmosphere-ocean general circulation models from the CMIP3 multi-model ensemble. The three methods used here are a nonlinear principal component analysis (NLPCA) approach based on neural networks, the Isomap isometric mapping algorithm, and Hessian locally linear embedding. I use these three methods to examine El Nino variability in the different data sets and assess the suitability of these nonlinear dimensionality reduction approaches for climate data analysis. I conclude that although, for the application presented here, analysis using NLPCA, Isomap and Hessian locally linear embedding does not provide additional information beyond that already provided by principal component analysis, these methods are effective tools for exploratory data analysis. Comment: 273 pages, 76 figures; University of Bristol Ph.D. thesis; version with high-resolution figures available from http://www.skybluetrades.net/thesis/ian-ross-thesis.pdf (52Mb download)

    Point-set manifold processing for computational mechanics: thin shells, reduced order modeling, cell motility and molecular conformations

    In many applications, one would like to perform calculations on smooth manifolds of dimension d embedded in a high-dimensional space of dimension D. Often, a continuous description of such a manifold is not known, and instead it is sampled by a set of scattered points in high dimensions. This poses a serious challenge. In this thesis, we approximate the point-set manifold as an overlapping set of smooth parametric descriptions, whose geometric structure is revealed by statistical learning methods, and then parametrized by meshfree methods. This approach avoids any global parameterization, and hence is applicable to manifolds of any genus and complex geometry. It combines four ingredients: (1) partitioning of the point set into subregions of trivial topology, (2) the automatic detection of the local geometric structure of the manifold by nonlinear dimensionality reduction techniques, (3) the local parameterization of the manifold using smooth meshfree (here local maximum-entropy) approximants, and (4) patching together the local representations by means of a partition of unity. In this thesis we show the generality, flexibility, and accuracy of the method in four different problems. First, we exercise it in the context of Kirchhoff-Love thin shells (d=2, D=3). We test our methodology against classical linear and nonlinear benchmarks in thin-shell analysis, and highlight its ability to handle point-set surfaces of complex topology and geometry. We then tackle problems of much higher dimensionality. We perform reduced order modeling in the context of finite deformation elastodynamics, considering a nonlinear reduced configuration space, in contrast with classical linear approaches based on Principal Component Analysis (d=2, D=10000's). We further quantitatively unveil the geometric structure of the motility strategy of a family of micro-organisms called Euglenids from experimental videos (d=1, D~30000's). Finally, in the context of enhanced sampling in molecular dynamics, we automatically construct collective variables for the molecular conformational dynamics (d=1...6, D~30,1000's).