92,873 research outputs found

    On Randomly Projected Hierarchical Clustering with Guarantees

    Hierarchical clustering (HC) algorithms are generally limited to small data instances due to their runtime costs. Here we mitigate this shortcoming and explore fast HC algorithms based on random projections for single (SLC) and average (ALC) linkage clustering as well as for the minimum spanning tree problem (MST). We present a thorough adaptive analysis of our algorithms that improve prior work from O(N^2) by up to a factor of N/(log N)^2 for a dataset of N points in Euclidean space. The algorithms maintain, with arbitrarily high probability, the outcome of hierarchical clustering as well as the worst-case running-time guarantees. We also present parameter-free instances of our algorithms. Comment: This version contains the conference paper "On Randomly Projected Hierarchical Clustering with Guarantees", SIAM International Conference on Data Mining (SDM), 2014 and, additionally, proofs omitted in the conference version
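
To make the stated speedup concrete, here is the arithmetic under one reading of the abstract: the prior quadratic bound is divided by the quoted factor. The symbols T_prior and T_best are labels introduced here for illustration and do not appear in the abstract.

```latex
% Assumption: "improve by up to a factor of N/(\log N)^2" means dividing the prior bound by that factor.
\[
  T_{\text{prior}}(N) = O\!\left(N^{2}\right),
  \qquad
  T_{\text{best}}(N)
    = O\!\left(\frac{N^{2}}{N/(\log N)^{2}}\right)
    = O\!\left(N\,(\log N)^{2}\right).
\]
```

Under this reading, the best case is near-linear in N up to a squared logarithmic factor.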

    Applying Cluster Ensemble to Adaptive Tree Structured Clustering

    Adaptive tree structured clustering (ATSC) is our proposed divisive hierarchical clustering method that recursively divides a data set into 2 subsets using a self-organizing feature map (SOM). In each partition, the data set is quantized by the SOM and the quantized data is divided using agglomerative hierarchical clustering. ATSC can divide data sets in feasible time regardless of data size. On the other hand, the clustering results of ATSC are as unstable as those of other divisive hierarchical clustering and partitioned clustering methods. In this paper, we apply cluster ensemble to each data partition of ATSC in order to improve stability. Cluster ensemble is a framework for improving the stability of partitioned clustering. As a result of applying cluster ensemble, ATSC yields unique clustering results that could not be yielded by previous hierarchical clustering methods. This is because a different class distance function is used in each division in ATSC

    Adaptive firefly algorithm for hierarchical text clustering

    Text clustering is essentially used by search engines to increase the recall and precision in information retrieval. As search engines operate on Internet content that is constantly being updated, there is a need for a clustering algorithm that offers automatic grouping of items without prior knowledge of the collection. Existing clustering methods have problems in determining the optimal number of clusters and producing compact clusters. In this research, an adaptive hierarchical text clustering algorithm is proposed based on the Firefly Algorithm. The proposed Adaptive Firefly Algorithm (AFA) consists of three components: document clustering, cluster refining, and cluster merging. The first component introduces the Weight-based Firefly Algorithm (WFA), which automatically identifies initial centers and their clusters for any given text collection. In order to refine the obtained clusters, a second algorithm, termed Weight-based Firefly Algorithm with Relocate (WFAR), is proposed. Such an approach allows the relocation of a pre-assigned document into a newly created cluster. The third component, Weight-based Firefly Algorithm with Relocate and Merging (WFARM), aims to reduce the number of produced clusters by merging non-pure clusters into the pure ones. Experiments were conducted to compare the proposed algorithms against seven existing methods. The percentage of success in obtaining the optimal number of clusters by AFA is 100%, with purity and F-measure 83% higher than the benchmarked methods. As for the entropy measure, the AFA produced the lowest value (0.78) when compared to existing methods. The result indicates that the Adaptive Firefly Algorithm can produce compact clusters. This research contributes to the text mining domain, as hierarchical text clustering facilitates the indexing of documents and information retrieval processes
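
The abstract reports purity without defining it. As a rough illustration, the sketch below computes purity in its commonly used form: the fraction of documents that belong to the majority class of their assigned cluster. Whether the paper uses exactly this definition is an assumption, and the function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of items that fall in the majority class of their cluster.

    Assumption: this is the textbook definition of purity; the paper's
    exact formula is not given in the abstract.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if not class_labels:
        raise ValueError("at least one item is required")

    # Group the true class labels by the cluster each item was assigned to.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)

    # For each cluster, count the items belonging to its most frequent class.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(class_labels)


# Toy data (hypothetical): six documents, two clusters, two topics.
clusters = [0, 0, 0, 1, 1, 1]
topics = ["sports", "sports", "politics", "politics", "politics", "politics"]
print(round(purity(clusters, topics), 4))
```

A purity of 1.0 means every cluster contains documents of a single class; merging non-pure clusters, as the third component does, changes this score by changing the cluster assignments.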

    Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities

    Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N-1)/2 similarities. First, we show that if the intracluster similarities exceed intercluster similarities, then it is possible to correctly determine the hierarchical clustering from as few as 3N log N similarities. We demonstrate that this order-of-magnitude savings in the number of pairwise similarities necessitates sequentially selecting which similarities to obtain in an adaptive fashion, rather than picking them at random. We then propose an active clustering method that is robust to a limited fraction of anomalous similarities, and show how even in the presence of these noisy similarity values we can resolve the hierarchical clustering using only O(N log^2 N) pairwise similarities
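
To give a feel for the savings quoted above, the following sketch compares the complete set of N(N-1)/2 similarities with the 3N log N figure for a few values of N. The abstract does not state the base of the logarithm; base 2 is assumed here, and the function names are hypothetical.

```python
import math


def all_pairs(n: int) -> int:
    """Number of distinct pairwise similarities among n items: n(n-1)/2."""
    return n * (n - 1) // 2


def adaptive_budget(n: int, log_base: float = 2.0) -> int:
    """The 3 N log N figure from the abstract, rounded up.

    Assumption: the logarithm is base 2; the abstract does not say.
    """
    return math.ceil(3 * n * math.log(n, log_base))


for n in (1_000, 100_000):
    full = all_pairs(n)
    budget = adaptive_budget(n)
    # Show how small the adaptive budget is relative to the full set.
    print(f"N={n}: all pairs={full}, 3N log N ~ {budget}, fraction={budget / full:.5f}")
```

The fraction shrinks as N grows, since the budget scales as N log N while the full set scales as N^2.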

    Distributed Clustering Based on Node Density and Distance in Wireless Sensor Networks

    Wireless Sensor Networks (WSNs) are a special type of network that senses and monitors physical parameters and is autonomous in nature. The common method used to implement this autonomy and network management is hierarchical clustering. Hierarchical clustering eases data collection and the forwarding of the collected data to the base station. The proposed Distributed Self-organizing Load Balancing Clustering Algorithm (DSLBCA) for WSNs is designed considering the parameters of neighbor distance, residual energy, and node density. The validity of the DSLBCA has been shown by comparing the network lifetime and energy dissipation with Low Energy Adaptive Clustering Hierarchy (LEACH) and Hybrid Energy Efficient Distributed Clustering (HEED). The proposed algorithm shows improved results in enhancing the lifetime of the network in both stationary and mobile environments

    Analysis of clustering algorithms for spike sorting of multiunit extracellular recordings

    Various techniques have been considered in the past to identify distinct spike shapes from multiunit extracellular recordings. These techniques involve adaptive filtering, template matching, or hierarchical clustering. In this investigation, we have used Principal Component Analysis followed by various clustering techniques to identify distinct spike shapes. An amplitude filter is used to separate spikes from background neuronal activity. The correlation matrix of the spike data is used to compute principal component waveforms. Each spike is thus represented by the coefficients of its principal components. Then, we have used an agglomerative hierarchical clustering algorithm to perform the initial clustering of the data set. The clustering results are then refined by the application of the Expectation-Maximization (EM) algorithm. The Bayesian Information Criterion (BIC) is used to find the best fit of the model to the data set