Study of manifold geometry using non-negative kernel graphs
Given the increasing amounts of data being measured and recorded, effective dimensionality reduction systems have become necessary for a wide variety of tasks. A dataset can be characterized by its geometrical properties, including its point density, curvature, and dimensionality. In this context, the intrinsic dimension (ID) refers to the minimum number of parameters required to characterize a dataset. Many tools have been proposed for the estimation of ID, and the ones that achieve the best results are narrowly focused on this single goal. These highly specialized estimators do not allow for the interpretation of the local geometry of the data in aspects other than ID. Moreover, methods that do make this possible are unable to estimate ID reliably. We propose the use of non-negative kernel (NNK) graphs, an approach to graph construction that characterizes the local geometry of the data, to study the dimension and shape of data manifolds at multiple scales. We use a series of properties related to NNK graphs to gain insight into manifold datasets. In particular, we look at the number of neighbors in an NNK graph, the dimension of the low-rank approximations for both K-nearest neighbor (KNN) and NNK graphs, the diameter of the polytopes defined by NNK graphs, and the principal angles between the low-rank approximations of NNK graphs. Moreover, we study these properties at multiple scales using an algorithm that sparsifies the data by merging points based on a choice of similarity. Using a similarity based on local NNK neighborhoods, we can subsample datasets while preserving the geometrical properties of the initial dataset.
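As a minimal illustration of the kind of local geometric analysis this abstract describes, the intrinsic dimension around each point can be sketched by running PCA on its k-nearest-neighbor neighborhood and counting how many principal components are needed to reach a variance threshold. This is plain KNN + PCA, not the authors' NNK construction; `k` and the 0.95 threshold are illustrative assumptions.

```python
import numpy as np

def local_id_estimate(X, k=10, var_threshold=0.95):
    """Estimate the local intrinsic dimension at each point: run PCA (via SVD)
    on the centered k-nearest-neighbor neighborhood and count the components
    needed to explain `var_threshold` of the local variance."""
    n = X.shape[0]
    # Brute-force pairwise squared distances (fine for small n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    ids = np.empty(n, dtype=int)
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]     # skip the point itself
        local = X[nbrs] - X[nbrs].mean(0)     # center the neighborhood
        s = np.linalg.svd(local, compute_uv=False)
        var = s ** 2 / (s ** 2).sum()
        ids[i] = np.searchsorted(np.cumsum(var), var_threshold) + 1
    return ids

# A circle embedded in 3-D is locally one-dimensional, so the
# estimated local ID should be 1 almost everywhere.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
print(np.median(local_id_estimate(X)))
```

The multiscale behavior the abstract studies would correspond to repeating such an analysis as points are merged and the data becomes sparser.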
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.
Comment: 232 pages
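The tensor train (TT) format emphasized above can be computed by a sweep of sequential truncated SVDs, the standard TT-SVD scheme. The NumPy sketch below is illustrative and not taken from the monograph; the function names and the relative tolerance `eps` are assumptions.

```python
import numpy as np

def tt_svd(T, eps=1e-10):
    """Factor a dense tensor T into tensor-train (TT) cores via sequential
    truncated SVDs. Returns 3-way cores G[k] of shape (r_{k-1}, n_k, r_k)."""
    shape, d = T.shape, T.ndim
    cores, r = [], 1
    M = T.reshape(shape[0], -1)
    for k in range(d - 1):
        M = M.reshape(r * shape[k], -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rank = max(1, int((s > eps * s[0]).sum()))  # drop negligible modes
        cores.append(U[:, :rank].reshape(r, shape[k], rank))
        M = s[:rank, None] * Vt[:rank]              # carry remainder forward
        r = rank
    cores.append(M.reshape(r, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([out.ndim - 1], [0]))
    return out.reshape([c.shape[1] for c in cores])

# A 4x4x4 tensor built as a sum of two rank-1 terms has TT ranks (2, 2),
# so the cores store 2*(4*2) + 2*4*2 + 2*4 = 40 numbers instead of 64.
rng = np.random.default_rng(0)
T = (np.einsum('i,j,k->ijk', *rng.standard_normal((3, 4)))
     + np.einsum('i,j,k->ijk', *rng.standard_normal((3, 4))))
cores = tt_svd(T)
err = np.linalg.norm(tt_reconstruct(cores) - T)
print([c.shape for c in cores], err)
```

The compression becomes dramatic as the order grows: for an order-d tensor with mode sizes n and TT ranks r, storage drops from n^d to roughly d·n·r², which is the sense in which the TT format alleviates the curse of dimensionality.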
Intrinsic Universal Measurements of Non-linear Embeddings
A basic problem in machine learning is to find a mapping from a low
dimensional latent space to a high dimensional observation space. Equipped with
the representation power of non-linearity, a learner can easily find a mapping
which perfectly fits all the observations. However, such a mapping is often not
considered good, because it is not simple enough and overfits. How can
simplicity be defined? This paper offers a formal definition of the amount of
information imposed by a non-linear mapping. The definition is based on
information geometry and is independent of both the observations and any
specific parametrization. We prove its basic properties and discuss its
relationships with parametric and non-parametric embeddings.
Comment: work in progress