    Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections

    Data visualisation helps in understanding data represented by multiple variables, also called features, stored in a large matrix where individuals are stored in rows and variable values in columns. These data structures are frequently called multidimensional spaces. In this paper, we illustrate ways of employing the visual results of multidimensional projection algorithms to understand and fine-tune the parameters of their mathematical framework. Some of the mathematical tools common to these approaches are Laplacian matrices, Euclidean distance, cosine distance, and statistical methods such as Kullback-Leibler divergence, employed to fit probability distributions and reduce dimensions. Two of the relevant algorithms in the data visualisation field are t-distributed stochastic neighbourhood embedding (t-SNE) and Least-Square Projection (LSP). These algorithms can be used to understand several ranges of mathematical functions, including their impact on datasets. In this article, mathematical parameters of underlying techniques such as Principal Component Analysis (PCA) behind t-SNE and mesh reconstruction methods behind LSP are adjusted to reflect the properties afforded by the mathematical formulation. The results, supported by illustrations of the processes of LSP and t-SNE, are meant to inspire students to understand the mathematics behind such methods, in order to apply them in effective data analysis tasks across multiple applications.
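    The PCA step the abstract mentions as underlying t-SNE can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the data matrix, the component count, and the toy dataset are all assumptions for the example.

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto the top-k principal components via SVD.

    X: (n_samples, n_features) matrix (individuals in rows,
    variables in columns, as described in the abstract above).
    Returns the (n_samples, k) projection and explained-variance ratios.
    """
    Xc = X - X.mean(axis=0)                  # centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                        # coordinates in PC space
    var = (S ** 2) / (len(X) - 1)            # variance per component
    return Z, var[:k] / var.sum()

# Toy example: 3-D points that mostly vary along one direction,
# so the first principal component captures nearly all the variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 1.0, 0.2]]) \
    + 0.05 * rng.normal(size=(200, 3))
Z, ratios = pca_project(X, 2)
```

    In practice t-SNE pipelines often apply such a PCA reduction first (e.g. to ~50 dimensions) before the stochastic neighbour embedding itself.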

    PCA Based Bayesian Approach for Automatic Multiple Sclerosis Lesion Detection

    The classical Bayes rule plays a very important role in the field of lesion identification. However, the Bayesian approach is very difficult to apply in high-dimensional spaces for lesion detection. An alternative approach is Principal Component Analysis (PCA) for automatic multiple sclerosis lesion detection problems in high-dimensional spaces. In this study, a PCA-based Bayesian approach is explained for automatic multiple sclerosis lesion detection using Markov Random Fields (MRF) and Singular Value Decomposition (SVD). It is shown that the PCA approach provides a better understanding of the data. Although the Bayesian approach gives effective results, it is not easy to use in high-dimensional spaces. Therefore, PCA-based Bayesian detection will give much more accurate results for automatic multiple sclerosis (MS) lesion detection.
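    The Bayes-rule classification step described here can be sketched as a Gaussian class-conditional classifier on low-dimensional (e.g. PCA-projected) features. This is a hedged toy sketch, not the paper's MRF/SVD pipeline; the two-class synthetic data and diagonal covariance are assumptions for illustration only.

```python
import numpy as np

def fit_gaussian(X):
    """Mean and (diagonal) variance of one class's features."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mu, var):
    """Log density of an independent (diagonal-covariance) Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def bayes_classify(x, params, priors):
    """Bayes rule: pick the class maximising log prior + log likelihood."""
    scores = [np.log(p) + log_likelihood(x, mu, var)
              for (mu, var), p in zip(params, priors)]
    return int(np.argmax(scores))

# Synthetic stand-ins for "healthy tissue" vs "lesion" feature vectors.
rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 1.0, size=(500, 2))
lesion = rng.normal(4.0, 1.0, size=(500, 2))
params = [fit_gaussian(healthy), fit_gaussian(lesion)]
label = bayes_classify(np.array([4.1, 3.9]), params, [0.5, 0.5])
```

    Projecting voxels to a few principal components first, as the abstract advocates, keeps such density estimates tractable in otherwise high-dimensional spaces.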

    On the number of representations providing noiseless subsystems

    This paper studies the combinatoric structure of the set of all representations, up to equivalence, of a finite-dimensional semisimple Lie algebra. This has intrinsic interest as a previously unsolved problem in representation theory, and also has applications to the understanding of quantum decoherence. We prove that for Hilbert spaces of sufficiently high dimension, decoherence-free subspaces exist for almost all representations of the error algebra. For decoherence-free subsystems, we plot the function f_d(n), which is the fraction of all d-dimensional quantum systems which preserve n bits of information through DF subsystems, and note that this function fits an inverse beta distribution. The mathematical tools which arise include techniques from classical number theory. Comment: 17 pp, 4 figs, accepted for Physical Review.

    Statistical Methods in Topological Data Analysis for Complex, High-Dimensional Data

    The utilization of statistical methods and their applications within the new field of study known as Topological Data Analysis has tremendous potential for broadening our exploration and understanding of complex, high-dimensional data spaces. This paper provides an introductory overview of the mathematical underpinnings of Topological Data Analysis, the workflow to convert samples of data to topological summary statistics, and some of the statistical methods developed for performing inference on these topological summary statistics. The intention of this non-technical overview is to motivate statisticians who are interested in learning more about the subject. Comment: 15 pages, 7 figures, 27th Annual Conference on Applied Statistics in Agriculture.
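    The first step of the TDA workflow, turning a point-cloud sample into a topological summary, can be illustrated with the simplest invariant: the 0-th Betti number (connected components) of a neighbourhood graph at a given scale. This is a minimal, assumed sketch; real pipelines track how these counts persist across all scales using persistent-homology libraries.

```python
import numpy as np

def betti0(points, eps):
    """Number of connected components of the eps-neighbourhood graph,
    i.e. the 0-th Betti number at filtration scale eps, computed with
    a small union-find structure."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= eps:
                parent[find(i)] = find(j)   # merge components
    return len({find(i) for i in range(n)})

# Two well-separated clusters: two components at a small scale,
# one component once eps bridges the gap between them.
pts = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
b_small = betti0(pts, 1.0)
```

    Persistence diagrams record exactly when such components (and higher-dimensional holes) appear and merge as eps grows, and the statistical methods surveyed in the paper perform inference on those diagrams.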

    The Riemannian Geometry of Deep Generative Models

    Deep generative models learn a mapping from a low-dimensional latent space to a high-dimensional data space. Under certain regularity conditions, these models parameterize nonlinear manifolds in the data space. In this paper, we investigate the Riemannian geometry of these generated manifolds. First, we develop efficient algorithms for computing geodesic curves, which provide an intrinsic notion of distance between points on the manifold. Second, we develop an algorithm for parallel translation of a tangent vector along a path on the manifold. We show how parallel translation can be used to generate analogies, i.e., to transport a change in one data point into a semantically similar change of another data point. Our experiments on real image data show that the manifolds learned by deep generative models, while nonlinear, are surprisingly close to zero curvature. The practical implication is that linear paths in the latent space closely approximate geodesics on the generated manifold. However, further investigation into this phenomenon is warranted, to identify if there are other architectures or datasets where curvature plays a more prominent role. We believe that exploring the Riemannian geometry of deep generative models, using the tools developed in this paper, will be an important step in understanding the high-dimensional, nonlinear spaces these models learn. Comment: 9 pages.
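    The paper's practical takeaway, that straight lines in latent space approximate geodesics on the generated manifold, can be sketched with a toy decoder. The decoder below is a hypothetical stand-in for a trained generative model, not the paper's network; everything here is an assumed illustration.

```python
import numpy as np

def decoder(z):
    """Toy nonlinear 'generator' mapping a 2-D latent code to a
    3-D data point (a stand-in for a trained deep generative model)."""
    return np.stack([z[..., 0], z[..., 1], np.sin(z[..., 0])], axis=-1)

def linear_latent_path(z0, z1, steps=50):
    """Straight line between two latent codes, pushed through the
    decoder; per the paper, such curves closely approximate
    geodesics when the generated manifold is nearly flat."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return decoder((1 - t) * z0 + t * z1)

path = linear_latent_path(np.array([0.0, 0.0]), np.array([1.0, 2.0]))
```

    Computing true geodesics requires minimising curve length under the pulled-back metric of the decoder; the near-zero curvature reported in the paper is what makes the cheap linear path a good surrogate.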