2,595 research outputs found

    A survey of kernel and spectral methods for clustering

    Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof of the fact that these two paradigms have the same objective is reported, since it has been proven that these two seemingly different approaches have the same mathematical foundation. Besides, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm. (C) 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
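As a rough illustration of the kernel K-means idea mentioned in this abstract, the sketch below runs Lloyd-style updates using only a Gram matrix, so cluster means are never formed explicitly in feature space. The function names, the RBF kernel, and the random initialisation are assumptions made for this example and are not taken from the survey.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel choice)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Lloyd-style kernel K-means on a precomputed Gram matrix K (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)  # random initial assignment
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # an empty cluster keeps infinite distance
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/m) sum_j K_ij + (1/m^2) sum_{j,l} K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

Like ordinary K-means, this procedure only reaches a local optimum, so in practice several seeds would usually be tried and the kernel width `gamma` tuned to the data.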

    Structure in the 3D Galaxy Distribution: I. Methods and Example Results

    Three methods for detecting and characterizing structure in point data, such as that generated by redshift surveys, are described: classification using self-organizing maps, segmentation using Bayesian blocks, and density estimation using adaptive kernels. The first two methods are new, and allow detection and characterization of structures of arbitrary shape and at a wide range of spatial scales. These methods should elucidate not only clusters, but also the more distributed, wide-ranging filaments and sheets, and further allow the possibility of detecting and characterizing an even broader class of shapes. The methods are demonstrated and compared in application to three data sets: a carefully selected volume-limited sample from the Sloan Digital Sky Survey redshift data, a similarly selected sample from the Millennium Simulation, and a set of points independently drawn from a uniform probability distribution -- a so-called Poisson distribution. We demonstrate a few of the many ways in which these methods elucidate large scale structure in the distribution of galaxies in the nearby Universe. Comment: Re-posted after referee corrections along with partially re-written introduction. 80 pages, 31 figures, ApJ in Press. For full sized figures please download from: http://astrophysics.arc.nasa.gov/~mway/lss1.pd

    Clustering comparison of point processes with applications to random geometric models

    In this chapter we review some examples, methods, and recent results involving comparison of clustering properties of point processes. Our approach is founded on some basic observations allowing us to consider void probabilities and moment measures as two complementary tools for capturing clustering phenomena in point processes. As might be expected, smaller values of these characteristics indicate less clustering. Also, various global and local functionals of random geometric models driven by point processes admit more or less explicit bounds involving void probabilities and moment measures, thus aiding the study of impact of clustering of the underlying point process. When stronger tools are needed, directional convex ordering of point processes happens to be an appropriate choice, as well as the notion of (positive or negative) association, when comparison to the Poisson point process is considered. We explain the relations between these tools and provide examples of point processes admitting them. Furthermore, we sketch some recent results obtained using the aforementioned comparison tools, regarding percolation and coverage properties of the Boolean model, the SINR model, subgraph counts in random geometric graphs, and more generally, U-statistics of point processes. We also mention some results on Betti numbers for Čech and Vietoris-Rips random complexes generated by stationary point processes. A general observation is that many of the results derived previously for the Poisson point process generalise to some "sub-Poisson" processes, defined as those clustering less than the Poisson process in the sense of void probabilities and moment measures, negative association or dcx-ordering. Comment: 44 pages, 4 figures
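To make the notion of a void probability concrete, the sketch below estimates, by simulation, the probability that a fixed box contains no points of a homogeneous Poisson process on the unit square, and compares it with the closed form exp(-intensity * area) that holds for the Poisson case. The function names, the unit-square window, and the trial count are assumptions made for this example; the chapter itself uses the Poisson process only as the reference against which other processes are compared.

```python
import math
import numpy as np

def empirical_void_probability(intensity, box, n_trials=5000, seed=0):
    """Monte Carlo estimate of P(no points in box) for a homogeneous
    Poisson process on [0, 1]^2. box = (x0, x1, y0, y1). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    x0, x1, y0, y1 = box
    empty = 0
    for _ in range(n_trials):
        n = rng.poisson(intensity)   # number of points in the unit square
        pts = rng.random((n, 2))     # given n, points are i.i.d. uniform
        inside = ((pts[:, 0] >= x0) & (pts[:, 0] < x1)
                  & (pts[:, 1] >= y0) & (pts[:, 1] < y1))
        if not inside.any():
            empty += 1
    return empty / n_trials

def poisson_void_probability(intensity, area):
    """Exact void probability of a homogeneous Poisson process."""
    return math.exp(-intensity * area)
```

Replacing the Poisson sampler with another point process of the same intensity and comparing the two estimates is one simple way to see whether that process has smaller or larger void probabilities than the Poisson reference.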

    Towards Stratification Learning through Homology Inference

    A topological approach to stratification learning is developed for point cloud data drawn from a stratified space. Given such data, our objective is to infer which points belong to the same strata. First we define a multi-scale notion of a stratified space, giving a stratification for each radius level. We then use methods derived from kernel and cokernel persistent homology to cluster the data points into different strata, and we prove a result which guarantees the correctness of our clustering, given certain topological conditions; some geometric intuition for these topological conditions is also provided. Our correctness result is then given a probabilistic flavor: we give bounds on the minimum number of sample points required to infer, with probability, which points belong to the same strata. Finally, we give an explicit algorithm for the clustering, prove its correctness, and apply it to some simulated data. Comment: 48 pages

    Statistical properties of determinantal point processes in high-dimensional Euclidean spaces

    The goal of this paper is to quantitatively describe some statistical properties of higher-dimensional determinantal point processes with a primary focus on the nearest-neighbor distribution functions. Toward this end, we express these functions as determinants of $N \times N$ matrices and then extrapolate to $N \to \infty$. This formulation allows for a quick and accurate numerical evaluation of these quantities for point processes in Euclidean spaces of dimension $d$. We also implement an algorithm due to Hough et al. \cite{hough2006dpa} for generating configurations of determinantal point processes in arbitrary Euclidean spaces, and we utilize this algorithm in conjunction with the aforementioned numerical results to characterize the statistical properties of what we call the Fermi-sphere point process for $d = 1$ to 4. This homogeneous, isotropic determinantal point process, discussed also in a companion paper \cite{ToScZa08}, is the high-dimensional generalization of the distribution of eigenvalues on the unit circle of a random matrix from the circular unitary ensemble (CUE). In addition to the nearest-neighbor probability distribution, we are able to calculate Voronoi cells and nearest-neighbor extrema statistics for the Fermi-sphere point process and discuss these as the dimension $d$ is varied. The results in this paper accompany and complement analytical properties of higher-dimensional determinantal point processes developed in \cite{ToScZa08}. Comment: 42 pages, 17 figures

    A Comparative Study of Density Field Estimation for Galaxies: New Insights into the Evolution of Galaxies with Environment in COSMOS out to z~3

    It is well-known that galaxy environment has a fundamental effect in shaping its properties. We study the environmental effects on galaxy evolution, with an emphasis on the environment defined as the local number density of galaxies. The density field is estimated with different estimators (weighted adaptive kernel smoothing, 10th and 5th nearest neighbors, Voronoi and Delaunay tessellation) for a $K_s < 24$ sample of $\sim$190,000 galaxies in the COSMOS field at $0.1 < z < 3.1$. The performance of each estimator is evaluated with extensive simulations. We show that overall, there is a good agreement between the estimated density fields using different methods over $\sim$2 dex in overdensity values. However, our simulations show that adaptive kernel and Voronoi tessellation outperform other methods. Using the Voronoi tessellation method, we assign surface densities to a mass complete sample of quiescent and star-forming galaxies out to $z \sim 3$. We show that at a fixed stellar mass, the median color of quiescent galaxies does not depend on their host environment out to $z \sim 3$. We find that the number and stellar mass density of massive ($>10^{11} M_{\odot}$) star-forming galaxies have not significantly changed since $z \sim 3$, regardless of their environment. However, for massive quiescent systems at lower redshifts ($z \lesssim 1.3$), we find a significant evolution in the number and stellar mass densities in denser environments compared to lower density regions. Our results suggest that the relation between stellar mass and local density is more fundamental than the color-density relation and that environment plays a significant role in quenching star formation activity in galaxies at $z \lesssim 1$. Comment: 20 pages, 11 figures, main figures 4,5,8 and 1
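For readers unfamiliar with nearest-neighbor density estimators of the kind listed in this abstract, the sketch below computes a common textbook form, the surface density k / (pi * d_k^2), where d_k is the projected distance to the k-th nearest neighbor. The function name and the brute-force distance matrix are assumptions made for this example; the abstract does not state the exact weighting, redshift slicing, or edge corrections the authors apply.

```python
import numpy as np

def knn_surface_density(points, k):
    """Surface density k / (pi * d_k^2) at each point, where d_k is the
    distance to its k-th nearest neighbor. Hypothetical helper using an
    O(n^2) distance matrix, so only suitable for small samples."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    d_k = np.sort(dist, axis=1)[:, k]
    return k / (np.pi * d_k**2)
```

For three points at x = 0, 1 and 3 on a line with k = 1, the nearest-neighbor distances are 1, 1 and 2, giving densities 1/pi, 1/pi and 1/(4 pi). Points near the edge of a survey footprint are systematically underestimated by this form unless an edge correction is added.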