32,297 research outputs found

    Clustering Methods and Their Applications to Adolescent Healthcare Data

    Get PDF
    Clustering is a mathematical method of data analysis which identifies trends in data by efficiently separating data into a specified number of clusters so is incredibly useful and widely applicable for questions of interrelatedness of data. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of a cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates those connections that link dissimilar points. This is represented as an eigenvector problem where the solution is given by the eigenvectors of the Normalized Graph Laplacian. Spectral clustering establishes groups so that the similarity between points of the same cluster is stronger than similarity between different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Differences were observed between the results of the clustering methods on 3294 individuals and 22 health-related attributes. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health while spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We posit some guesses as to this difference, observe characteristics of the clustering methods, and comment on the viability of spectral clustering on healthcare data

    Clustering and Community Detection in Directed Networks: A Survey

    Full text link
    Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of the aforementioned cases graphs are directed - in the sense that there is directionality on the edges, making the semantics of the edges non symmetric. An interesting feature that real networks present is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence here is that nodes of the same community are highly similar while on the contrary, nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Therefore, naturally there is a recent wealth of research production in the area of mining directed graphs - with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks along with the relevant necessary methodological background and also related applications. The survey commences by offering a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize on. Then we present the relevant work along two orthogonal classifications. The first one is mostly concerned with the methodological principles of the clustering algorithms, while the second one approaches the methods from the viewpoint regarding the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions.Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear

    A survey of kernel and spectral methods for clustering

    Get PDF
    Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel version of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arise from concepts in spectral graph theory and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof of the fact that these two paradigms have the same objective is reported since it has been proven that these two seemingly different approaches have the same mathematical foundation. Besides, fuzzy kernel clustering methods are presented as extensions of kernel K-means clustering algorithm. (C) 2007 Pattem Recognition Society. Published by Elsevier Ltd. All rights reserved
    corecore