A survey of kernel and spectral methods for clustering
Clustering algorithms are a useful tool for exploring data structures and have been employed in many disciplines. This paper addresses the partitioning clustering problem and surveys two recent approaches, kernel and spectral methods, both of which can produce nonlinear separating hypersurfaces between clusters. The kernel clustering methods presented are kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory: the clustering problem is cast as a graph cut problem in which an appropriate objective function has to be optimized. Because these two seemingly different approaches have been proven to share the same mathematical foundation, an explicit proof that the two paradigms have the same objective is reported. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm. (C) 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
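As a rough illustration of the kernel K-means idea named in this abstract, the sketch below assigns points to clusters using only a kernel (Gram) matrix, so distances to cluster means are computed in feature space without forming the feature map. The function names `rbf_kernel` and `kernel_kmeans`, the RBF kernel choice, the `gamma` parameter, and the random label initialisation are all assumptions made for this example; they are not taken from the surveyed paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) Gram matrix; the kernel choice is an assumption."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Hard-assignment kernel K-means on a precomputed Gram matrix K."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)  # random initial partition
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue  # an empty cluster stays empty in this sketch
            # ||phi(x_i) - m_c||^2
            #   = K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_{j,l} K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels
```

Like ordinary K-means, this procedure only reaches a local optimum, so in practice it would usually be run from several initial partitions.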
Structure in the 3D Galaxy Distribution: I. Methods and Example Results
Three methods for detecting and characterizing structure in point data, such
as that generated by redshift surveys, are described: classification using
self-organizing maps, segmentation using Bayesian blocks, and density
estimation using adaptive kernels. The first two methods are new, and allow
detection and characterization of structures of arbitrary shape and at a wide
range of spatial scales. These methods should elucidate not only clusters, but
also the more distributed, wide-ranging filaments and sheets, and further allow
the possibility of detecting and characterizing an even broader class of
shapes. The methods are demonstrated and compared in application to three data
sets: a carefully selected volume-limited sample from the Sloan Digital Sky
Survey redshift data, a similarly selected sample from the Millennium
Simulation, and a set of points independently drawn from a uniform probability
distribution -- a so-called Poisson distribution. We demonstrate a few of the
many ways in which these methods elucidate large scale structure in the
distribution of galaxies in the nearby Universe.
Comment: Re-posted after referee corrections along with partially re-written
introduction. 80 pages, 31 figures, ApJ in Press. For full sized figures
please download from: http://astrophysics.arc.nasa.gov/~mway/lss1.pd
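To make the "density estimation using adaptive kernels" step concrete, here is a minimal sketch of one common adaptive scheme: a fixed-bandwidth pilot estimate followed by per-point bandwidths that widen in sparse regions. This is the textbook pilot-based construction and is an assumption on our part; the abstract does not say which variant the authors use. The names `gaussian_kde_at` and `adaptive_kde`, the Gaussian kernel, and the parameters `h` and `alpha` are hypothetical.

```python
import numpy as np

def gaussian_kde_at(points, centers, bandwidths):
    """Mean of isotropic Gaussian kernels, one bandwidth per center."""
    d = centers.shape[1]
    diff = points[:, None, :] - centers[None, :, :]
    d2 = np.sum(diff**2, axis=2)                  # shape (n_points, n_centers)
    norm = (2.0 * np.pi * bandwidths**2) ** (d / 2.0)
    return np.mean(np.exp(-0.5 * d2 / bandwidths**2) / norm, axis=1)

def adaptive_kde(points, data, h=1.0, alpha=0.5):
    """Adaptive-bandwidth density estimate evaluated at `points`.

    h     : global (pilot) bandwidth, an assumed tuning parameter
    alpha : sensitivity exponent; 0 recovers the fixed-bandwidth estimate
    """
    n = data.shape[0]
    pilot = gaussian_kde_at(data, data, np.full(n, h))  # pilot density at data
    g = np.exp(np.mean(np.log(pilot)))                  # geometric mean
    local_h = h * (pilot / g) ** (-alpha)               # wider where sparse
    return gaussian_kde_at(points, data, local_h)
```

Each kernel is individually normalised, so the estimate still integrates to one regardless of how the local bandwidths vary.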
Clustering comparison of point processes with applications to random geometric models
In this chapter we review some examples, methods, and recent results
involving comparison of clustering properties of point processes. Our approach
is founded on some basic observations allowing us to consider void
probabilities and moment measures as two complementary tools for capturing
clustering phenomena in point processes. As might be expected, smaller values
of these characteristics indicate less clustering. Also, various global and
local functionals of random geometric models driven by point processes admit
more or less explicit bounds involving void probabilities and moment measures,
thus aiding the study of impact of clustering of the underlying point process.
When stronger tools are needed, directional convex ordering of point processes
happens to be an appropriate choice, as well as the notion of (positive or
negative) association, when comparison to the Poisson point process is
considered. We explain the relations between these tools and provide examples
of point processes admitting them. Furthermore, we sketch some recent results
obtained using the aforementioned comparison tools, regarding percolation and
coverage properties of the Boolean model, the SINR model, subgraph counts in
random geometric graphs, and more generally, U-statistics of point processes.
We also mention some results on Betti numbers for Čech and Vietoris-Rips
random complexes generated by stationary point processes. A general observation
is that many of the results derived previously for the Poisson point process
generalise to some "sub-Poisson" processes, defined as those clustering less
than the Poisson process in the sense of void probabilities and moment
measures, negative association or dcx-ordering.
Comment: 44 pages, 4 figures
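The void probability used in this abstract as a clustering measure is the probability that a given set contains no points of the process. For a homogeneous Poisson process of intensity λ it equals exp(-λ|B|) for a set B of area |B|, which gives a reference value for comparison. The sketch below estimates the void probability of a disc by Monte Carlo simulation; the function names, the unit-square window, and the trial count are assumptions chosen for illustration.

```python
import numpy as np

def poisson_points(intensity, rng):
    """Homogeneous Poisson process on the unit square (area 1)."""
    n = rng.poisson(intensity)
    return rng.random((n, 2))

def empirical_void_probability(sampler, center, radius, n_trials, rng):
    """Fraction of realisations with no point inside the disc."""
    center = np.asarray(center, dtype=float)
    empty = 0
    for _ in range(n_trials):
        pts = sampler(rng)
        d2 = np.sum((pts - center) ** 2, axis=1)
        if not np.any(d2 <= radius**2):
            empty += 1
    return empty / n_trials

def poisson_void_probability(intensity, radius):
    """Closed form exp(-lambda * |B|) for a disc B fully inside the window."""
    return np.exp(-intensity * np.pi * radius**2)
```

Passing a different `sampler` (for example, one that simulates a clustered or a repulsive process with the same intensity) would let the estimated value be compared against the Poisson reference.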
Towards Stratification Learning through Homology Inference
A topological approach to stratification learning is developed for point
cloud data drawn from a stratified space. Given such data, our objective is to
infer which points belong to the same strata. First we define a multi-scale
notion of a stratified space, giving a stratification for each radius level. We
then use methods derived from kernel and cokernel persistent homology to
cluster the data points into different strata, and we prove a result which
guarantees the correctness of our clustering, given certain topological
conditions; some geometric intuition for these topological conditions is also
provided. Our correctness result is then given a probabilistic flavor: we give
bounds on the minimum number of sample points required to infer, with
probability, which points belong to the same strata. Finally, we give an
explicit algorithm for the clustering, prove its correctness, and apply it to
some simulated data.
Comment: 48 pages
Statistical properties of determinantal point processes in high-dimensional Euclidean spaces
The goal of this paper is to quantitatively describe some statistical
properties of higher-dimensional determinantal point processes with a primary
focus on the nearest-neighbor distribution functions. Toward this end, we
express these functions as determinants of $N\times N$ matrices and then
extrapolate to $N\to\infty$. This formulation allows for a quick and accurate
numerical evaluation of these quantities for point processes in Euclidean
spaces of dimension $d$. We also implement an algorithm due to Hough \emph{et
al.} \cite{hough2006dpa} for generating configurations of determinantal point
processes in arbitrary Euclidean spaces, and we utilize this algorithm in
conjunction with the aforementioned numerical results to characterize the
statistical properties of what we call the Fermi-sphere point process for $d = 1$ to 4. This homogeneous, isotropic determinantal point process, discussed
also in a companion paper \cite{ToScZa08}, is the high-dimensional
generalization of the distribution of eigenvalues on the unit circle of a
random matrix from the circular unitary ensemble (CUE). In addition to the
nearest-neighbor probability distribution, we are able to calculate Voronoi
cells and nearest-neighbor extrema statistics for the Fermi-sphere point
process and discuss these as the dimension is varied. The results in this
paper accompany and complement analytical properties of higher-dimensional
determinantal point processes developed in \cite{ToScZa08}.
Comment: 42 pages, 17 figures
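The abstract relates its point process to the eigenvalues on the unit circle of a random matrix from the circular unitary ensemble (CUE). As a small illustration of that one-dimensional case only, the sketch below draws a Haar-distributed unitary matrix through a phase-corrected QR decomposition of a complex Gaussian matrix and computes eigenvalue spacings rescaled to unit mean. This is not the Hough et al. sampling algorithm used in the paper; the function names and the matrix size are assumptions.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed n x n unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n))
         + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    # Fix the phase ambiguity of QR so that the result is Haar distributed.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # scales column j of q by phases[j]

def cue_spacings(n, rng):
    """Consecutive gaps and nearest-neighbor distances of CUE eigenangles.

    Both arrays are rescaled so that the mean consecutive gap is 1.
    """
    angles = np.sort(np.angle(np.linalg.eigvals(haar_unitary(n, rng))))
    # Close the circle by appending the first angle shifted by 2*pi.
    gaps = np.diff(np.concatenate([angles, [angles[0] + 2.0 * np.pi]]))
    gaps = gaps * n / (2.0 * np.pi)
    nearest = np.minimum(gaps, np.roll(gaps, 1))  # min of the two adjacent gaps
    return gaps, nearest
```

A histogram of `nearest` over many draws would show the suppression of small distances that is characteristic of determinantal (repulsive) point processes.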
A Comparative Study of Density Field Estimation for Galaxies: New Insights into the Evolution of Galaxies with Environment in COSMOS out to z~3
It is well known that a galaxy's environment plays a fundamental role in shaping
its properties. We study the environmental effects on galaxy evolution, with an
emphasis on the environment defined as the local number density of galaxies.
The density field is estimated with different estimators (weighted adaptive
kernel smoothing, 10$^{th}$ and 5$^{th}$ nearest neighbors, Voronoi and
Delaunay tessellation) for a $K_{s}<24$ sample of $\sim$190,000 galaxies in the
COSMOS field at $0.1<z<3.1$. The performance of each estimator is evaluated
with extensive simulations. We show that overall, there is a good agreement
between the estimated density fields using different methods over $\sim$2 dex
in overdensity values. However, our simulations show that adaptive kernel and
Voronoi tessellation outperform other methods. Using the Voronoi tessellation
method, we assign surface densities to a mass complete sample of quiescent and
star-forming galaxies out to $z\sim3$. We show that at a fixed stellar mass,
the median color of quiescent galaxies does not depend on their host
environment out to $z\sim3$. We find that the number and stellar mass density
of massive ($>10^{11}M_{\odot}$) star-forming galaxies have not
significantly changed since $z\sim3$, regardless of their environment. However,
for massive quiescent systems at lower redshifts ($z\lesssim1.3$), we find a
significant evolution in the number and stellar mass densities in denser
environments compared to lower density regions. Our results suggest that the
relation between stellar mass and local density is more fundamental than the
color-density relation and that environment plays a significant role in
quenching star formation activity in galaxies at $z\lesssim1$.
Comment: 20 pages, 11 figures, main figures 4,5,8 and 1
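Among the estimators listed in this abstract, the nearest-neighbor one is the simplest to write down: the projected surface density at a galaxy is taken as k divided by the area of the circle reaching its k-th nearest neighbor. The sketch below is a brute-force, unweighted version on 2D positions. It ignores redshift slicing, edge corrections, and any weighting the authors may apply, and the function name `knn_surface_density` is hypothetical.

```python
import numpy as np

def knn_surface_density(positions, k):
    """Surface density k / (pi * d_k^2) at each point.

    positions : (n, 2) array of projected coordinates
    k         : which neighbor to use (e.g. 5 or 10), with k < n
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=2))
    # After sorting each row, column 0 is the zero self-distance,
    # so column k is the distance to the k-th nearest neighbor.
    d_k = np.sort(dist, axis=1)[:, k]
    return k / (np.pi * d_k**2)
```

The pairwise distance matrix makes this quadratic in the number of points, so a sample the size of the one in the abstract would need a spatial index rather than this brute-force form.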