
    Unsupervised and semi-supervised fuzzy clustering with multiple kernels.

    For real-world clustering tasks, the input data is typically not easily separable, either because the data structure is highly complex or because clusters vary in size, density, and shape. Recently, kernel-based clustering has been proposed to perform clustering in a higher-dimensional feature space spanned by embedding maps and corresponding kernel functions. Although good results have been obtained using the Gaussian kernel function, its performance depends on the selection of the scaling parameter among an extensive range of possibilities. This choice is often heavily influenced by prior knowledge about the data and by the patterns we expect to discover. Unfortunately, it is often unclear which kernels are more suitable for a particular task. The problem is aggravated for many real-world clustering applications, in which the distributions of the different clusters in the feature space exhibit large variations. Thus, in the absence of a priori knowledge, a single kernel selected from a predefined group is sometimes insufficient to represent the data. One way to learn optimal scaling parameters is through an exhaustive search for one optimal scaling parameter per cluster. However, this approach is not practical since it is computationally expensive, especially when the data includes a large number of clusters and when the dynamic range of possible values of the scaling parameters is large. Moreover, evaluating the resulting partition in order to select the optimal parameters is not an easy task. To overcome these drawbacks, we introduce two novel fuzzy clustering techniques that use Multiple Kernel Learning to provide an elegant solution for parameter selection. The Fuzzy C-Means with Multiple Kernels algorithm (FCMK) simultaneously finds the optimal partition and the cluster-dependent kernel combination weights that reflect the intrinsic structure of the data. The Relational Fuzzy Clustering with Multiple Kernels algorithm (RFCMK) learns the kernel combination weights by optimizing the relational dissimilarities. Consequently, the learned kernel combination weights reflect the relative density, size, and position of each cluster with respect to the other clusters. We also extend FCMK and RFCMK to the semi-supervised paradigm. We show that incorporating prior knowledge into the unsupervised clustering task, in the form of a small set of constraints on which instances should or should not reside in the same cluster, guides the unsupervised approaches to a better partitioning of the data and helps them avoid local minima, especially for high-dimensional real-world data. All of the proposed algorithms are optimized iteratively by dynamically updating the partition and the kernel combination weights in each iteration. This makes these algorithms simple and fast. Moreover, our algorithms are formulated to work on both vector and relational data. This makes them applicable to data where objects cannot be represented by vectors or when clusters of similar objects cannot be represented efficiently by a single prototype. We also introduce two variants of RFCMK for large data sets that address its scalability issue. The random sample and extend RFCMK (rseRFCMK) computes cluster prototypes from a smaller sample of randomly selected objects, and then extends the partition to the remainder of the data. The single pass RFCMK (spRFCMK) sequentially loads manageable-sized chunks, clusters them in a single pass, and then combines the results from each chunk. Our extensive experiments show that RFCMK and SS-RFCMK outperform existing algorithms. In particular, we show that when data include clusters with various intrinsic structures and densities, learning kernel weights that vary over clusters is crucial in obtaining a good partition.
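
To make the idea of clustering with a combination of Gaussian kernels more concrete, here is a minimal sketch in Python with NumPy. It is not the FCMK or RFCMK update from the abstract above: the kernel weights are fixed and shared by all clusters (FCMK learns them per cluster), and the function names `gaussian_kernels`, `combine_kernels`, and `kernel_fcm` are hypothetical. It only shows how several bandwidths can be merged into one kernel and how fuzzy memberships follow from kernel-induced distances.

```python
import numpy as np

def gaussian_kernels(X, sigmas):
    """Return one Gaussian kernel matrix per bandwidth, stacked as (M, n, n)."""
    sq = np.sum(X**2, axis=1)
    # Pairwise squared Euclidean distances, clipped against rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.stack([np.exp(-d2 / (2.0 * s**2)) for s in sigmas])

def combine_kernels(Ks, weights):
    """Weighted sum of base kernels. Weights are fixed here (an assumption);
    the abstract describes learning them per cluster instead."""
    return np.tensordot(np.asarray(weights, dtype=float), Ks, axes=1)

def kernel_fcm(K, n_clusters, m=2.0, n_iter=50, seed=0):
    """Plain kernel fuzzy c-means on a precomputed kernel matrix K (n, n).
    Returns the membership matrix U with shape (n_clusters, n)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    U = rng.random((n_clusters, n))
    U /= U.sum(axis=0, keepdims=True)          # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U**m
        W /= W.sum(axis=1, keepdims=True)      # centre c = sum_i W[c, i] * phi(x_i)
        # Squared feature-space distance from every point to every centre:
        # K_jj - 2 * sum_i W_ci K_ij + sum_ih W_ci W_ch K_ih
        centre_norm = np.einsum('ci,ih,ch->c', W, K, W)
        d2 = np.diag(K)[None, :] - 2.0 * W @ K + centre_norm[:, None]
        d2 = np.maximum(d2, 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return U
```

A typical call would build the base kernels with something like `gaussian_kernels(X, sigmas=[0.3, 1.0, 3.0])`, merge them with uniform weights via `combine_kernels`, and pass the result to `kernel_fcm`. Replacing the fixed weights with a per-cluster update is exactly the part this sketch leaves out.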

    A survey of kernel and spectral methods for clustering

    Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof that these two paradigms have the same objective is reported, since it has been shown that these two seemingly different approaches share the same mathematical foundation. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm. © 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
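
The graph-cut view of spectral clustering mentioned above can be illustrated with a small sketch. It assumes a Gaussian affinity and the symmetric normalized Laplacian, and it handles only the two-cluster case by thresholding the second eigenvector at zero; the survey covers more general formulations, and the function names here are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix between all pairs of rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def spectral_bipartition(A):
    """Split a graph with affinity matrix A into two groups.

    Uses the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    and the sign of the eigenvector of the second-smallest eigenvalue,
    a relaxation of the normalized cut for two clusters.
    """
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    second = eigvecs[:, 1]
    return (second > 0).astype(int)
```

For more than two clusters, the usual approach is to take the first k eigenvectors as an embedding and run K-means on its rows; that step is left out to keep the sketch short.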

    Self-weighted Multiple Kernel Learning for Graph-based Clustering and Semi-supervised Classification

    Multiple kernel learning (MKL) methods are generally believed to perform better than single-kernel methods. However, some empirical studies show that this is not always true: the combination of multiple kernels may even yield worse performance than using a single kernel. There are two possible reasons for the failure: (i) most existing MKL methods assume that the optimal kernel is a linear combination of base kernels, which may not hold true; and (ii) some kernel weights are inappropriately assigned due to noise and carelessly designed algorithms. In this paper, we propose a novel MKL framework by following two intuitive assumptions: (i) each kernel is a perturbation of the consensus kernel; and (ii) the kernel that is close to the consensus kernel should be assigned a large weight. Impressively, the proposed method can automatically assign an appropriate weight to each kernel without introducing the additional parameters that existing methods require. The proposed framework is integrated into a unified framework for graph-based clustering and semi-supervised classification. We have conducted experiments on multiple benchmark datasets and our empirical results verify the superiority of the proposed framework. Comment: Accepted by IJCAI 2018, Code is availabl
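
The two assumptions in this abstract (each kernel perturbs a consensus kernel, and kernels closer to the consensus deserve larger weights) can be sketched as a simple alternating scheme. The abstract does not state the update rule, so the inverse-distance weighting below is an assumed stand-in rather than the authors' formulation, and `self_weighted_consensus` is a hypothetical name.

```python
import numpy as np

def self_weighted_consensus(kernels, n_iter=20, eps=1e-12):
    """Alternate between a weighted-average consensus kernel and weights
    that shrink with each base kernel's Frobenius distance to it.

    Assumed rule (not from the abstract): w_i proportional to
    1 / (2 * ||H_i - K||_F), normalized to sum to 1.
    """
    H = np.asarray(kernels, dtype=float)       # shape (M, n, n)
    w = np.full(len(H), 1.0 / len(H))          # start from uniform weights
    for _ in range(n_iter):
        K = np.tensordot(w, H, axes=1)         # current consensus kernel
        dist = np.linalg.norm(H - K, axis=(1, 2))
        w = 1.0 / (2.0 * dist + eps)           # eps guards against division by zero
        w /= w.sum()
    K = np.tensordot(w, H, axes=1)             # consensus under the final weights
    return K, w
```

No extra hyperparameter controls the weights in this scheme, which mirrors the claim in the abstract; `eps` is only a numerical guard. A kernel that differs strongly from the others ends up far from the consensus and is down-weighted.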