70,173 research outputs found

    A survey of kernel and spectral methods for clustering

    Clustering algorithms are a useful tool for exploring data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The kernel clustering methods presented are kernel versions of many classical clustering algorithms, e.g., K-means, SOM, and neural gas. Spectral clustering arises from concepts in spectral graph theory, where the clustering problem is formulated as a graph-cut problem in which an appropriate objective function has to be optimized. An explicit proof that these two paradigms share the same objective is reported, since it has been shown that these two seemingly different approaches have the same mathematical foundation. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm. (C) 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved
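    The spectral side of the equivalence described in this abstract can be illustrated with a minimal pipeline: build an affinity matrix, form the graph Laplacian, and embed points via its smallest eigenvectors. This is only a sketch in NumPy; the function names, the RBF affinity choice, and `sigma` are illustrative assumptions, not from the paper.

    ```python
    import numpy as np

    def rbf_affinity(X, sigma=1.0):
        # Pairwise RBF (Gaussian) kernel; the same kernel underlies
        # kernel K-means, which the survey shows shares its objective
        # with spectral clustering.
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    def spectral_embed(X, k, sigma=1.0):
        # Unnormalized graph Laplacian L = D - W; the eigenvectors of its
        # k smallest eigenvalues give an embedding in which clusters
        # become (near-)linearly separable.
        W = rbf_affinity(X, sigma)
        D = np.diag(W.sum(axis=1))
        L = D - W
        vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
        return vecs[:, :k]               # embedding: one row per point
    ```

    Running ordinary K-means on the rows of the embedding then recovers the clusters; that final step is where the graph-cut objective and the kernel K-means objective meet.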

    A kernelized genetic algorithm decision tree with information criteria

    Decision trees are one of the most widely used data mining models, with a long history in machine learning, statistics, and pattern recognition. A main advantage of decision trees is that the resulting data-partitioning model can be easily understood by both the data analyst and the customer, in contrast to more powerful kernel-based models such as Radial Basis Function (RBF) networks and Support Vector Machines. In recent literature, the decision tree has been used as part of a two-step training algorithm for RBF networks; however, the primary function of the decision tree there is not model visualization but dividing the input data into initial potential radial basis spaces. In this dissertation, the kernel trick, using Mercer's condition, is applied during the splitting of the input data under the guidance of a decision tree. This allows the algorithm to search for the best split using the projected feature-space information while remaining in the current data space. The decision tree captures the information of the linear split in the projected feature space and presents the corresponding non-linear split of the input data space. Using a genetic search algorithm, Bozdogan's Information Complexity criterion (ICOMP) serves as the fitness function to determine the best splits, control model complexity, subset the input variables, and decide the optimal choice of kernel function. The decision tree is then applied to radial basis function networks in the areas of regression, nominal classification, and ordinal prediction
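    The core idea, a split that is linear in the projected feature space but non-linear in the input space, can be sketched via the kernel trick alone: the squared distance from a point's image to a group's feature-space mean is computable purely from kernel evaluations. This is a minimal illustrative sketch, not the dissertation's algorithm; the function names and the fixed RBF kernel are assumptions, and the genetic search with ICOMP as fitness is not shown.

    ```python
    import numpy as np

    def rbf(x, y, gamma=0.5):
        # A Mercer kernel: k(x, y) = exp(-gamma * ||x - y||^2)
        return np.exp(-gamma * np.sum((x - y) ** 2))

    def kernel_dist_to_mean(x, group, k=rbf):
        # Squared distance from phi(x) to the mean of {phi(c) for c in group},
        # computed entirely through kernel evaluations (the kernel trick):
        # ||phi(x) - m||^2 = k(x,x) - (2/n) sum_c k(x,c) + (1/n^2) sum_{c,c'} k(c,c')
        n = len(group)
        return (k(x, x)
                - 2.0 / n * sum(k(x, c) for c in group)
                + 1.0 / n ** 2 * sum(k(c, cp) for c in group for cp in group))

    def kernel_split(x, left, right, k=rbf):
        # Send x to the side whose feature-space mean is closer: a linear
        # split in the projected space, a non-linear split in input space.
        d_left = kernel_dist_to_mean(x, left, k)
        d_right = kernel_dist_to_mean(x, right, k)
        return 'left' if d_left <= d_right else 'right'
    ```

    In the dissertation's setting, candidate splits of this form would be scored by ICOMP inside the genetic search rather than by nearest-mean distance alone.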

    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technique that identifies clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have been gaining attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories: conventional SSC (CSSC), independent SSC (ISSC), and extended SSC (XSSC). The characteristics of these algorithms are highlighted, and the potential future development of SSC is also discussed.
    Comment: This paper has been published in Information Sciences Journal in 201
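    What makes SSC "soft" is that each cluster carries a weight per feature rather than a hard feature subset. A minimal sketch of one such update, in the style of entropy-weighted soft subspace K-means, is below; the function name, the exponential weighting form, and `gamma` are illustrative assumptions, not taken from a specific algorithm in the survey.

    ```python
    import numpy as np

    def update_weights(X, labels, centers, gamma=1.0):
        # Soft subspace weight update: cluster c gets per-feature weights
        # W[c, j] that decay exponentially with the within-cluster
        # dispersion on feature j, then normalize so each row sums to 1.
        k, d = centers.shape
        W = np.zeros((k, d))
        for c in range(k):
            pts = X[labels == c]
            disp = ((pts - centers[c]) ** 2).sum(axis=0)  # per-feature dispersion
            e = np.exp(-disp / gamma)
            W[c] = e / e.sum()                            # soft feature weights
        return W
    ```

    Features on which a cluster is tight receive large weights, so each cluster effectively emphasizes its own subspace while no feature is discarded outright, which is the distinction from hard subspace clustering.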