    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technology that identifies clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. This paper presents a comprehensive survey of existing SSC algorithms and recent developments. The SSC algorithms are classified systematically into three main categories, namely, conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201

    Speaker segmentation and clustering

    This survey focuses on two challenging speech processing topics: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, their advantages and disadvantages are indicated, insight into the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved.

    A survey of kernel and spectral methods for clustering

    Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof that these two paradigms have the same objective is reported, since it has been proven that these two seemingly different approaches have the same mathematical foundation. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm. (C) 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

    SDSS-III Baryon Oscillation Spectroscopic Survey data release 12 : galaxy target selection and large-scale structure catalogues

    The Baryon Oscillation Spectroscopic Survey (BOSS), part of the Sloan Digital Sky Survey (SDSS) III project, has provided the largest survey of galaxy redshifts available to date, in terms of both the number of galaxy redshifts measured by a single survey, and the effective cosmological volume covered. Key to analysing the clustering of these data to provide cosmological measurements is understanding the detailed properties of this sample. Potential issues include variations in the target catalogue caused by changes either in the targeting algorithm or properties of the data used, the pattern of spectroscopic observations, the spatial distribution of targets for which redshifts were not obtained, and variations in the target sky density due to observational systematics. We document here the target selection algorithms used to create the galaxy samples that comprise BOSS. We also present the algorithms used to create large-scale structure catalogues for the final Data Release (DR12) samples and the associated random catalogues that quantify the survey mask. The algorithms are an evolution of those used by the BOSS team to construct catalogues from earlier data, and have been designed to accurately quantify the galaxy sample. The code used, designated mksample, is released with this paper.