234,278 research outputs found

    Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Hierarchical clustering methods like Ward's method have been used since decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets but until now no level selection algorithm has been published for this method.</p> <p>Results</p> <p>In this article we provide a general selection scheme, the <it>level independent clustering selection method</it>, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of <it>cohesive clusters</it>. We present results of our method on two data sets, a set of drug like molecules and set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection.</p> <p>Conclusion</p> <p>Using our new cluster selection method together with the method by Palla et al. provides a new interesting clustering mechanism that allows to compute overlapping clusters, which is especially valuable for biological and chemical data sets.</p

    Distribution-Based Trajectory Clustering

    Full text link
    Trajectory clustering enables the discovery of common patterns in trajectory data. Current methods of trajectory clustering rely on a distance measure between two points in order to measure the dissimilarity between two trajectories. The distance measures employed have two challenges: high computational cost and low fidelity. Independent of the distance measure employed, existing clustering algorithms have another challenge: either effectiveness issues or high time complexity. In this paper, we propose to use a recent Isolation Distributional Kernel (IDK) as the main tool to meet all three challenges. The new IDK-based clustering algorithm, called TIDKC, makes full use of the distributional kernel for trajectory similarity measuring and clustering. TIDKC identifies non-linearly separable clusters with irregular shapes and varied densities in linear time. It does not rely on random initialisation and is robust to outliers. An extensive evaluation on 7 large real-world trajectory datasets confirms that IDK is more effective in capturing complex structures in trajectories than traditional and deep learning-based distance measures. Furthermore, the proposed TIDKC has superior clustering performance and efficiency to existing trajectory clustering algorithms

    A static test compaction technique for combinational circuits based on independent fault clustering

    Get PDF
    Testing system-on-chip involves applying huge amounts of test data, which is stored in the tester memory and then transferred to the circuit under test during test application. Therefore, practical techniques, such as test compression and compaction, are required to reduce the amount of test data in order to reduce both the total testing time and the memory requirements for the tester. In this paper, a new static compaction algorithm for combinational circuits is presented. The algorithm is referred to as independent fault clustering. It is based on a new concept called test vector decomposition. Experimental results for benchmark circuits demonstrate the effectiveness of the new static compaction algorithm

    Scalable Hierarchical Clustering with Tree Grafting

    Full text link
    We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the model is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing the ground-truth, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.Comment: 23 pages (appendix included), published at KDD 201
    corecore