462 research outputs found
On landmark selection and sampling in high-dimensional data analysis
In recent years, the spectral analysis of appropriately defined kernel
matrices has emerged as a principled way to extract the low-dimensional
structure often prevalent in high-dimensional data. Here we provide an
introduction to spectral methods for linear and nonlinear dimension reduction,
emphasizing ways to overcome the computational limitations currently faced by
practitioners with massive datasets. In particular, a data subsampling or
landmark selection process is often employed to construct a kernel based on
partial information, followed by an approximate spectral analysis termed the
Nystrom extension. We provide a quantitative framework to analyse this
procedure, and use it to demonstrate algorithmic performance bounds on a range
of practical approaches designed to optimize the landmark selection process. We
compare the practical implications of these bounds by way of real-world
examples drawn from the field of computer vision, whereby low-dimensional
manifold structure is shown to emerge from high-dimensional video data streams.Comment: 18 pages, 6 figures, submitted for publicatio
Sparse Projections of Medical Images onto Manifolds
Manifold learning has been successfully applied to a variety of medical imaging problems. Its use in real-time applications requires fast projection onto the low-dimensional space. To this end, out-of-sample extensions are applied by constructing an interpolation function that maps from the input space to the low-dimensional manifold. Commonly used approaches such as the Nyström extension and kernel ridge regression require using all training points. We propose an interpolation function that only depends on a small subset of the input training data. Consequently, in the testing phase each new point only needs to be compared against a small number of input training data in order to project the point onto the low-dimensional space. We interpret our method as an out-of-sample extension that approximates kernel ridge regression. Our method involves solving a simple convex optimization problem and has the attractive property of guaranteeing an upper bound on the approximation error, which is crucial for medical applications. Tuning this error bound controls the sparsity of the resulting interpolation function. We illustrate our method in two clinical applications that require fast mapping of input images onto a low-dimensional space.National Alliance for Medical Image Computing (U.S.) (grant NIH NIBIB NAMIC U54-EB005149)National Institutes of Health (U.S.) (grant NIH NCRR NAC P41-RR13218)National Institutes of Health (U.S.) (grant NIH NIBIB NAC P41-EB-015902
Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies
In motion analysis and understanding it is important to be able to fit a
suitable model or structure to the temporal series of observed data, in order
to describe motion patterns in a compact way, and to discriminate between them.
In an unsupervised context, i.e., no prior model of the moving object(s) is
available, such a structure has to be learned from the data in a bottom-up
fashion. In recent times, volumetric approaches in which the motion is captured
from a number of cameras and a voxel-set representation of the body is built
from the camera views, have gained ground due to attractive features such as
inherent view-invariance and robustness to occlusions. Automatic, unsupervised
segmentation of moving bodies along entire sequences, in a temporally-coherent
and robust way, has the potential to provide a means of constructing a
bottom-up model of the moving body, and track motion cues that may be later
exploited for motion classification. Spectral methods such as locally linear
embedding (LLE) can be useful in this context, as they preserve "protrusions",
i.e., high-curvature regions of the 3D volume, of articulated shapes, while
improving their separation in a lower dimensional space, making them in this
way easier to cluster. In this paper we therefore propose a spectral approach
to unsupervised and temporally-coherent body-protrusion segmentation along time
sequences. Volumetric shapes are clustered in an embedding space, clusters are
propagated in time to ensure coherence, and merged or split to accommodate
changes in the body's topology. Experiments on both synthetic and real
sequences of dense voxel-set data are shown. This supports the ability of the
proposed method to cluster body-parts consistently over time in a totally
unsupervised fashion, its robustness to sampling density and shape quality, and
its potential for bottom-up model constructionComment: 31 pages, 26 figure
A Statistical Toolbox For Mining And Modeling Spatial Data
Most data mining projects in spatial economics start with an evaluation of a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation, based on the Moran’s and the Geary’s coefficients, the adequacy of which is rarely challenged, despite the fact that when reporting on their properties, many users seem likely to make mistakes and to foster confusion. My paper begins by a critical appraisal of the classical definition and rational of these indices. I argue that while intuitively founded, they are plagued by an inconsistency in their conception. Then, I propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship, and opens the way to an augmented toolbox of statistical methods of dimension reduction and data visualization, also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to our definition of corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here, are easily implementable on the existing (free) software, yield methods useful to exploit the proposed corrections in spatial data analysis practice, and, from a mathematical point of view, whose asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, suggests that they own qualities of robustness and a limited sensitivity to the Modifiable Areal Unit Problem (MAUP), valuable in exploratory spatial data analysis
- …