Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections
Data visualisation helps in understanding data represented by multiple
variables, also called features, stored in a large matrix where individuals are
stored in rows and variable values in columns. These data structures are
frequently called multidimensional spaces. In this paper, we illustrate ways of
employing the visual results of multidimensional projection algorithms to
understand and fine-tune the parameters of their mathematical framework. Some
of the mathematical concepts common to these approaches are Laplacian matrices,
Euclidean distance, cosine distance, and statistical methods such as
Kullback-Leibler divergence, employed to fit probability distributions and
reduce dimensions. Two of the relevant algorithms in the data visualisation
field are t-distributed stochastic neighbour embedding (t-SNE) and
Least-Square Projection (LSP). These algorithms can be used to understand a
range of mathematical functions, including their impact on datasets. In
this article, mathematical parameters of underlying techniques such as
Principal Component Analysis (PCA) behind t-SNE and mesh reconstruction methods
behind LSP are adjusted to reflect the properties afforded by the mathematical
formulation. The results, supported by illustrative methods of the processes of
LSP and t-SNE, are meant to inspire students in understanding the mathematics
behind such methods, in order to apply them in effective data analysis tasks in
multiple applications.
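The quantity at the heart of the t-SNE fitting step mentioned above is the Kullback-Leibler divergence between the high-dimensional affinities P and the low-dimensional affinities Q. As a minimal sketch (toy hand-picked distributions, not affinities computed from real data):

```python
# Sketch of the Kullback-Leibler divergence KL(P || Q) that t-SNE
# minimises when matching low-dimensional to high-dimensional
# affinities. Toy discrete distributions only.
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), skipping p_i = 0 terms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))   # positive: the distributions differ
print(kl_divergence(p, p))   # 0.0: identical distributions
```

The divergence is zero exactly when the two distributions coincide and grows as Q misrepresents P, which is why gradient descent on this quantity pulls the low-dimensional embedding towards the high-dimensional neighbourhood structure.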
PCA Based Bayesian Approach for Automatic Multiple Sclerosis Lesion Detection
The classical Bayes rule plays a very important role in the field of lesion identification. However, the Bayesian approach is difficult to apply in high-dimensional spaces for lesion detection. An alternative approach is Principal Component Analysis (PCA) for automatic multiple sclerosis lesion detection problems in high-dimensional spaces. In this study, a PCA-based Bayesian approach is explained for automatic multiple sclerosis lesion detection using Markov Random Fields (MRF) and Singular Value Decomposition (SVD). It is shown that the PCA approach provides a better understanding of the data. Although the Bayesian approach gives effective results, it is not easy to use in high-dimensional spaces. Therefore, PCA-based Bayesian detection will give much more accurate results for automatic multiple sclerosis (MS) lesion detection.
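The dimensionality-reduction step that makes the Bayesian approach tractable can be illustrated in miniature. Below is a hedged sketch of PCA on 2-D toy data, using the closed-form eigendecomposition of the 2x2 covariance matrix (real pipelines would apply SVD to the full feature matrix; the function name and data are hypothetical):

```python
# Minimal PCA sketch: variance captured along the two principal axes
# of 2-D data, via the closed-form eigenvalues of the 2x2 covariance
# matrix. Toy data; illustrates the reduction step only.
import math

def pca_2d(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x in xs) / n
    c = sum((y - my) ** 2 for y in ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Eigenvalues of a symmetric 2x2 matrix from trace and determinant
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc  # variances along the two axes

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]
lam1, lam2 = pca_2d(xs, ys)   # lam1 >> lam2: one axis carries most variance
```

When the leading eigenvalue dominates, as here, projecting onto the first principal axis discards little information, which is what lets the subsequent Bayesian classification operate in far fewer dimensions.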
On the number of representations providing noiseless subsystems
This paper studies the combinatoric structure of the set of all
representations, up to equivalence, of a finite-dimensional semisimple Lie
algebra. This has intrinsic interest as a previously unsolved problem in
representation theory, and also has applications to the understanding of
quantum decoherence. We prove that for Hilbert spaces of sufficiently high
dimension, decoherence-free subspaces exist for almost all representations of
the error algebra. For decoherence-free subsystems, we plot the function
giving the fraction of all quantum systems of a given dimension which
preserve a given number of bits of information through DF subsystems, and note
that this function fits an inverse beta distribution. The mathematical tools
which arise
include techniques from classical number theory.
Comment: 17 pp, 4 figs, accepted for Physical Review
Statistical Methods in Topological Data Analysis for Complex, High-Dimensional Data
The utilization of statistical methods and their applications within the new
field of study known as Topological Data Analysis has tremendous potential
for broadening our exploration and understanding of complex, high-dimensional
data spaces. This paper provides an introductory overview of the mathematical
underpinnings of Topological Data Analysis, the workflow to convert samples of
data to topological summary statistics, and some of the statistical methods
developed for performing inference on these topological summary statistics. The
intention of this non-technical overview is to motivate statisticians who are
interested in learning more about the subject.
Comment: 15 pages, 7 figures, 27th Annual Conference on Applied Statistics in
Agriculture
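The workflow from a data sample to a topological summary can be sketched for the simplest case, 0-dimensional persistence (connected components): sort pairwise distances and merge components with a union-find, recording the scale at which each component dies. This is a minimal illustration with hypothetical helper names, not a substitute for a TDA library:

```python
# Sketch: 0-dimensional persistence of a point cloud. Components are
# born at scale 0; a component dies when an edge (sorted by length)
# first merges it into another. Union-find over all pairwise edges.
from itertools import combinations
import math

def h0_persistence(points):
    """Return the death scales of H0 classes (one class never dies)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj      # one component dies at scale d
            deaths.append(d)
    return deaths                # n - 1 finite deaths

# Two tight clusters far apart: two small deaths, one large death
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
print(h0_persistence(pts))
```

The long-lived gap between the small death scales and the large one is the kind of feature the statistical methods in the paper perform inference on.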
The Riemannian Geometry of Deep Generative Models
Deep generative models learn a mapping from a low dimensional latent space to
a high-dimensional data space. Under certain regularity conditions, these
models parameterize nonlinear manifolds in the data space. In this paper, we
investigate the Riemannian geometry of these generated manifolds. First, we
develop efficient algorithms for computing geodesic curves, which provide an
intrinsic notion of distance between points on the manifold. Second, we develop
an algorithm for parallel translation of a tangent vector along a path on the
manifold. We show how parallel translation can be used to generate analogies,
i.e., to transport a change in one data point into a semantically similar
change of another data point. Our experiments on real image data show that the
manifolds learned by deep generative models, while nonlinear, are surprisingly
close to zero curvature. The practical implication is that linear paths in the
latent space closely approximate geodesics on the generated manifold. However,
further investigation into this phenomenon is warranted, to identify if there
are other architectures or datasets where curvature plays a more prominent
role. We believe that exploring the Riemannian geometry of deep generative
models, using the tools developed in this paper, will be an important step in
understanding the high-dimensional, nonlinear spaces these models learn.
Comment: 9 pages
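The geodesic computation described above can be sketched numerically: discretise a latent-space path, map each point through the decoder, and minimise the path energy (sum of squared data-space segment lengths) while holding the endpoints fixed. The decoder below is a toy 1-D curve, not a trained deep model, and the optimiser is plain finite-difference gradient descent:

```python
# Sketch of geodesic computation on a generated manifold: minimise
# the discrete path energy sum_t ||g(z_{t+1}) - g(z_t)||^2 over the
# interior latent points. g is a hypothetical toy decoder.
import math

def g(z):
    """Toy decoder: 1-D latent -> point on a 2-D data-space curve."""
    return (z, math.sin(z))

def energy(zs):
    return sum(
        (g(zs[t + 1])[0] - g(zs[t])[0]) ** 2 +
        (g(zs[t + 1])[1] - g(zs[t])[1]) ** 2
        for t in range(len(zs) - 1)
    )

def geodesic(z0, z1, steps=10, iters=200, lr=0.1, eps=1e-5):
    # Initialise with the straight line in latent space
    zs = [z0 + (z1 - z0) * t / steps for t in range(steps + 1)]
    for _ in range(iters):
        for t in range(1, steps):          # endpoints stay fixed
            zp = zs[:]
            zp[t] += eps
            grad = (energy(zp) - energy(zs)) / eps
            zs[t] -= lr * grad
    return zs

path = geodesic(0.0, math.pi)
```

On manifolds that are close to flat, as the paper reports for deep generative models, the optimised path barely moves from the straight latent-space line, which is exactly the practical implication stated in the abstract.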