A survey of dimensionality reduction techniques
Experimental life sciences such as biology and chemistry have seen, in recent decades, an explosion in the amount of data generated by experiments. Laboratory instruments have become increasingly complex and report hundreds or thousands of measurements for a single experiment, so statistical methods face challenging tasks when dealing with such high-dimensional data. However, much of the data is highly redundant and can be efficiently reduced to a much smaller number of variables without significant loss of information. The mathematical procedures that make this reduction possible are called dimensionality reduction techniques; they have been developed extensively in fields such as statistics and machine learning and remain an active research topic. In this review we categorize the plethora of available dimensionality reduction techniques and give the mathematical insight behind them.
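As a minimal illustration of the redundancy argument (not an example taken from the survey itself), the sketch below uses scikit-learn's PCA to compress a synthetic high-dimensional table whose 200 features are noisy mixtures of only 3 latent variables; the dataset and the 95% variance threshold are assumptions made for the example.

# Minimal sketch: redundant high-dimensional data collapses to a few components.
# The synthetic dataset and the 0.95 variance threshold are illustrative choices.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))              # 3 "true" variables
mixing = rng.normal(size=(3, 200))              # spread into 200 correlated features
X = latent @ mixing + 0.05 * rng.normal(size=(500, 200))

pca = PCA(n_components=0.95)                    # keep 95% of the variance
Z = pca.fit_transform(X)
print(Z.shape)                                  # typically (500, 3): the redundancy is gone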
Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment
Nonlinear manifold learning from unorganized data points is a very
challenging unsupervised learning and data visualization problem with a great
variety of applications. In this paper we present a new algorithm for manifold
learning and nonlinear dimension reduction. Based on a set of unorganized data
points sampled with noise from the manifold, we represent the local geometry of
the manifold using tangent spaces learned by fitting an affine subspace in a
neighborhood of each data point. Those tangent spaces are aligned to give the
internal global coordinates of the data points with respect to the underlying
manifold by way of a partial eigendecomposition of the neighborhood connection
matrix. We present a careful error analysis of our algorithm and show that the
reconstruction errors are of second-order accuracy. We illustrate our algorithm
using curves and surfaces in both 2D/3D and higher-dimensional Euclidean spaces, as well as 64-by-64 pixel face images with varying pose and lighting conditions. We also address several theoretical and algorithmic issues for further research and improvement.
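As a hedged, practical companion to this abstract: scikit-learn ships an LTSA variant of LocallyLinearEmbedding that fits local tangent spaces and aligns them much as described above; the Swiss-roll data and the neighborhood size below are illustrative choices, not values from the paper.

# Minimal sketch: local tangent space alignment on a noisy Swiss roll.
# n_neighbors and the noise level are illustrative, not taken from the paper.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

ltsa = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method="ltsa")
Y = ltsa.fit_transform(X)       # 2D internal coordinates of the 3D manifold
print(Y.shape)                  # (1500, 2)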
Parametrization of white matter manifold-like structures using principal surfaces
In this manuscript, we are concerned with data generated from a diffusion
tensor imaging (DTI) experiment. The goal is to parameterize manifold-like
white matter tracts, such as the corpus callosum, using principal surfaces. We
approach the problem by finding a geometrically motivated surface-based
representation of the corpus callosum and visualize the fractional anisotropy
(FA) values projected onto the surface; the method applies to any other
diffusion summary as well as to other white matter tracts. We provide an
algorithm that 1) constructs the principal surface of a corpus callosum; 2)
flattens the surface into a parametric 2D map; 3) projects the associated FA values onto the map. The algorithm was applied to a longitudinal study containing 466 diffusion tensor images of 176 multiple sclerosis (MS) patients observed at multiple visits. For each subject and visit the study contains a registered DTI scan of the corpus callosum at roughly 20,000 voxels. Extensive simulation studies demonstrate fast convergence and robust performance of the algorithm under a variety of challenging scenarios.
Comment: 27 pages, 5 figures and 1 table
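The following sketch is not the authors' principal-surface algorithm; it is a crude stand-in that uses PCA in place of a principal surface to show the flatten-and-project idea on a synthetic tract with made-up FA values.

# Rough stand-in for the flatten-and-project step: PCA of voxel coordinates as a
# crude surrogate for a principal surface, then a coarse 2D map of FA values.
# The synthetic tract and FA values are assumptions for illustration only.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, 20000)
v = rng.uniform(-0.3, 0.3, 20000)
voxels = np.column_stack([u, v, 0.5 * u**2 + 0.02 * rng.normal(size=20000)])
fa = np.exp(-u**2)                      # fake fractional anisotropy values

# "Flatten" the sheet into 2D coordinates (a real principal surface is nonlinear).
uv = PCA(n_components=2).fit_transform(voxels)

# Project FA onto a coarse 2D map by averaging within grid cells.
H, xedges, yedges = np.histogram2d(uv[:, 0], uv[:, 1], bins=50, weights=fa)
counts, _, _ = np.histogram2d(uv[:, 0], uv[:, 1], bins=[xedges, yedges])
fa_map = np.divide(H, counts, out=np.zeros_like(H), where=counts > 0)
print(fa_map.shape)                     # (50, 50) parametric FA map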
Auto-associative models, nonlinear Principal component analysis, manifolds and projection pursuit
In this paper, auto-associative models are proposed as candidates for the generalization of Principal Component Analysis. We show that these models are dedicated to approximating the dataset by a manifold. Here, the word "manifold" refers to the topological properties of the structure. The approximating manifold is built by a projection pursuit algorithm. At each step of the algorithm, the dimension of the manifold is incremented. Some theoretical properties are provided. In particular, we show that, at each step of the algorithm, the mean residual norm does not increase. Moreover, it is also established that the algorithm converges in a finite number of steps. Some particular auto-associative models are exhibited and compared to classical PCA and some neural network models. Implementation aspects are discussed. We show that, in numerous cases, no optimization procedure is required. Some illustrations on simulated and real data are presented.
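A hedged sketch of the projection-pursuit flavor described above (not the authors' auto-associative model): grow an approximation one direction at a time, removing the leading component of the residual at each step, and observe that the mean residual norm never increases.

# Illustrative sketch only: incremental, one-direction-at-a-time approximation
# of a synthetic dataset; the mean residual norm is non-increasing by construction.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))
X = X - X.mean(axis=0)

residual = X.copy()
for step in range(1, 5):
    # Leading direction of the current residual (one "projection pursuit" step).
    _, _, vt = np.linalg.svd(residual, full_matrices=False)
    a = vt[0]
    residual = residual - np.outer(residual @ a, a)          # remove that component
    print(step, np.mean(np.linalg.norm(residual, axis=1)))   # non-increasing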
Tangent Bundle Manifold Learning via Grassmann&Stiefel Eigenmaps
One of the ultimate goals of Manifold Learning (ML) is to reconstruct an unknown nonlinear low-dimensional manifold embedded in a high-dimensional observation space from a given set of data points sampled from the manifold. We derive a local lower bound for the maximum reconstruction error in a small neighborhood of an arbitrary point. The lower bound is defined in terms of the distance between the tangent spaces to the original manifold and to the estimated manifold at the considered point and the reconstructed point, respectively. We propose an extension of ML, called Tangent Bundle ML, in which proximity is required not only between the original manifold and its estimator but also between their tangent spaces. We present a new algorithm that solves this problem and also gives a new solution for ML.
Comment: 25 pages, 6 figures
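The notion of proximity between tangent spaces can be made concrete with principal angles between locally fitted subspaces; the sketch below is not the paper's Grassmann&Stiefel Eigenmaps algorithm, only an illustration of estimating tangent spaces by local PCA and comparing them.

# Illustration: tangent spaces from local PCA, compared via principal angles.
import numpy as np

def local_tangent(points, center, k, dim):
    # Orthonormal basis of the tangent space estimated from the k nearest neighbors.
    d = np.linalg.norm(points - center, axis=1)
    nbrs = points[np.argsort(d)[:k]]
    _, _, vt = np.linalg.svd(nbrs - nbrs.mean(axis=0), full_matrices=False)
    return vt[:dim].T                        # shape (ambient_dim, dim)

def tangent_distance(Q1, Q2):
    # Largest principal angle between the two subspaces.
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(s.min(), -1.0, 1.0))

rng = np.random.default_rng(3)
t = rng.uniform(0, 2 * np.pi, 1000)
X = np.column_stack([np.cos(t), np.sin(t), 0.01 * rng.normal(size=1000)])  # noisy circle in 3D

Q_a = local_tangent(X, X[0], k=20, dim=1)
Q_b = local_tangent(X, X[1], k=20, dim=1)
print(tangent_distance(Q_a, Q_b))            # small angle for nearby points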
Computational Machines in a Coexistence with Concrete Universals and Data Streams
We discuss how the majority of traditional modeling approaches follow the idealist point of view in scientific modeling, which relies on set-theoretical notions of models based on abstract universals. We show that, while successful in many classical modeling domains, there are fundamental limits to the application of set-theoretical models in dealing with complex systems that have many potential aspects or properties depending on the perspective. As an alternative to abstract universals, we propose a conceptual modeling framework based on concrete universals that can be interpreted as a category-theoretical approach to modeling. We call this modeling framework pre-specific modeling. We further discuss how a certain group of mathematical and computational methods, along with ever-growing data streams, are able to operationalize the concept of pre-specific modeling.
NPTC-net: Narrow-Band Parallel Transport Convolutional Neural Network on Point Clouds
Convolution plays a crucial role in various applications in signal and image processing, analysis, and recognition. It is also the main building block of convolutional neural networks (CNNs). Designing appropriate convolutional neural networks on manifold-structured point clouds can inherit and extend recent advances of CNNs to the analysis and processing of point cloud data. However, one of the major challenges is to define a proper way to "sweep" filters through the point cloud as a natural generalization of planar convolution while reflecting the point cloud's geometry at the same time. In this paper, we consider generalizing convolution by adapting parallel transport on the point cloud. Inspired by a triangulated surface-based method [Stefan C. Schonsheck, Bin Dong, and Rongjie Lai, arXiv:1805.07857], we propose the Narrow-Band Parallel Transport Convolution (NPTC) using a specifically defined connection on a voxel-based narrow-band approximation of point cloud data. With that, we further propose a deep convolutional neural network based on NPTC (called NPTC-net) for point cloud classification and segmentation. Comprehensive experiments show that the proposed NPTC-net achieves similar or better results than current state-of-the-art methods on point cloud classification and segmentation.
Comment: 18 pages, 6 figures
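The paper's parallel-transport connection and narrow-band voxelization are not reproduced below; the sketch only shows the generic "sweep a filter over k-nearest neighborhoods" pattern that point-cloud convolutions generalize, with the hard part NPTC addresses (a consistent correspondence between neighbors and filter taps) crudely replaced by distance ordering.

# Generic neighborhood "convolution" on a point cloud (illustration only).
# points: (N, 3), feats: (N, C_in), weights: (k, C_in, C_out).
import numpy as np

def knn_conv(points, feats, weights, k=8):
    n = len(points)
    out = np.zeros((n, weights.shape[-1]))
    for i in range(n):
        d = np.linalg.norm(points - points[i], axis=1)
        nbrs = np.argsort(d)[:k]             # local "patch" around point i
        for j, idx in enumerate(nbrs):       # one filter tap per neighbor slot
            out[i] += feats[idx] @ weights[j]
    return out

rng = np.random.default_rng(4)
pts = rng.normal(size=(200, 3))
f = rng.normal(size=(200, 4))
w = rng.normal(size=(8, 4, 16))
print(knn_conv(pts, f, w).shape)             # (200, 16)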
ViDaExpert: user-friendly tool for nonlinear visualization and analysis of multidimensional vectorial data
ViDaExpert is a tool for the visualization and analysis of multidimensional vectorial data. ViDaExpert is able to work with data tables of "object-feature" type that may contain numerical feature values as well as textual labels for rows (objects) and columns (features). ViDaExpert implements several statistical methods such as standard and weighted Principal Component Analysis (PCA), the method of elastic maps (a non-linear version of PCA), Linear Discriminant Analysis (LDA), multilinear regression, K-Means clustering, and a variant of the decision tree construction algorithm. Equipped with several user-friendly dialogs for configuring data point representations (size, shape, color) and a fast 3D viewer, ViDaExpert is a handy tool for constructing an interactive 3D scene that represents a table of data in multidimensional space and for performing quick and insightful statistical analysis, from basic to advanced methods.
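ViDaExpert itself is an interactive GUI tool; as a rough scripted analogue of its basic PCA-plus-3D-scene workflow (an assumption, not a description of the tool's internals), the sketch below projects an object-feature table onto three principal components and renders a 3D scatter.

# Rough scripted analogue of a PCA + 3D scatter workflow (not ViDaExpert itself).
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

data = load_iris()                           # any "object-feature" table works here
coords = PCA(n_components=3).fit_transform(data.data)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], c=data.target, s=20)
ax.set_xlabel("PC1"); ax.set_ylabel("PC2"); ax.set_zlabel("PC3")
plt.show()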
EasiCS: the objective and fine-grained classification method of cervical spondylosis dysfunction
Precise diagnosis is of great significance in developing precise treatment plans to restore neck function and reduce the burden posed by cervical spondylosis (CS). However, the currently available neck function assessment methods are subjective and coarse-grained. In this paper, based on the relationship among CS, cervical structure, cervical vertebra function, and surface electromyography (sEMG), we seek to develop clustering algorithms on an sEMG data set collected in a clinical environment and to implement the resulting division. We propose and develop the framework EasiCS, which consists of dimension reduction, the clustering algorithm EasiSOM, and the spectral clustering algorithm EasiSC. EasiCS outperforms the seven commonly used algorithms overall.
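EasiSOM and EasiSC are not specified in this abstract, so the sketch below only shows the general shape of such a reduce-then-cluster pipeline (PCA followed by off-the-shelf spectral clustering) on synthetic vectors standing in for sEMG features.

# General shape of a reduce-then-cluster pipeline; synthetic data stands in for sEMG.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
centers = rng.normal(scale=3, size=(3, 64))  # 3 latent groups, 64 features each
X = np.vstack([c + rng.normal(size=(40, 64)) for c in centers])

Z = PCA(n_components=10).fit_transform(X)    # dimension reduction step
labels = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                            random_state=0).fit_predict(Z)
print(np.bincount(labels))                   # cluster sizes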
Geodesic convolutional neural networks on Riemannian manifolds
Feature descriptors play a crucial role in a wide range of geometry analysis
and processing applications, including shape correspondence, retrieval, and
segmentation. In this paper, we introduce Geodesic Convolutional Neural
Networks (GCNN), a generalization of the convolutional networks (CNN) paradigm
to non-Euclidean manifolds. Our construction is based on a local geodesic
system of polar coordinates to extract "patches", which are then passed through
a cascade of filters and linear and non-linear operators. The coefficients of
the filters and linear combination weights are optimization variables that are
learned to minimize a task-specific cost function. We use GCNN to learn invariant shape features, allowing us to achieve state-of-the-art performance in problems such as shape description, retrieval, and correspondence.
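GCNN's patch operator is built on geodesic polar coordinates over a mesh; the sketch below is a crude Euclidean stand-in (binning a point's neighbors by radius and angle in its estimated tangent plane) meant only to show the "extract a polar patch, then apply a filter" pattern, with all sizes chosen arbitrarily.

# Crude Euclidean stand-in for geodesic polar patches, for illustration only.
import numpy as np

rng = np.random.default_rng(6)
pts = rng.uniform(-1, 1, size=(2000, 2))
surf = np.column_stack([pts, 0.3 * np.sin(2 * pts[:, 0])])   # a wavy surface in 3D
signal = np.cos(3 * pts[:, 1])                               # scalar function on it

def polar_patch(center_idx, n_rho=4, n_theta=8, radius=0.3):
    d = np.linalg.norm(surf - surf[center_idx], axis=1)
    nbrs = np.where((d < radius) & (d > 0))[0]
    local = surf[nbrs] - surf[center_idx]
    _, _, vt = np.linalg.svd(local, full_matrices=False)      # local tangent plane
    xy = local @ vt[:2].T                                     # planar coordinates
    rho = np.linalg.norm(xy, axis=1)
    theta = np.arctan2(xy[:, 1], xy[:, 0])
    patch = np.zeros((n_rho, n_theta))
    counts = np.zeros((n_rho, n_theta))
    ri = np.minimum((rho / radius * n_rho).astype(int), n_rho - 1)
    ti = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    np.add.at(patch, (ri, ti), signal[nbrs])
    np.add.at(counts, (ri, ti), 1)
    return np.divide(patch, counts, out=np.zeros_like(patch), where=counts > 0)

filt = rng.normal(size=(4, 8))
response = np.sum(polar_patch(0) * filt)     # one "geodesic convolution" tap
print(response)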