Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods
Background: Hierarchical clustering methods such as Ward's method have been used for decades to understand biological and chemical data sets. To obtain a partition of the data set, an optimal level of the hierarchy must be chosen by a so-called level selection algorithm. In 2005, Palla et al. introduced a new kind of hierarchical clustering method that differs from Ward's method in two ways: it can be applied to data for which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., allow items to belong to multiple clusters. These features suit biological and chemical data sets well, but until now no level selection algorithm has been published for this method.
Results: In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph-theoretic notion of cohesive clusters. We present results of our method on two data sets: a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values with respect to a given reference clustering. Moreover, for the PPI data set we show that our graph-theoretic cohesiveness measure indeed selects biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for level independent cluster selection.
Conclusion: Combined with the method by Palla et al., our new cluster selection method provides an interesting clustering mechanism that computes overlapping clusters, which is especially valuable for biological and chemical data sets.
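The level-independent selection idea can be sketched in Python. Everything below is illustrative: plain graph density stands in for the paper's graph-theoretic cohesiveness measure, and the hierarchy representation and threshold are assumptions, not LInCS itself.

```python
# Sketch of level-independent cluster selection: walk the hierarchy
# top-down and keep any cluster that is cohesive enough, regardless
# of its level.  Graph density is a stand-in cohesiveness measure.
from itertools import combinations

def density(cluster, edges):
    """Fraction of possible pairs in `cluster` joined by an edge."""
    n = len(cluster)
    if n < 2:
        return 0.0
    present = sum(1 for u, v in combinations(sorted(cluster), 2)
                  if (u, v) in edges or (v, u) in edges)
    return present / (n * (n - 1) / 2)

def select_cohesive(hierarchy, edges, threshold=0.5):
    """Keep a cluster as soon as it is cohesive; otherwise descend
    into its children.  `hierarchy` is a list of (cluster, children)
    pairs, so selected clusters may come from any level."""
    selected = []
    stack = list(hierarchy)
    while stack:
        cluster, children = stack.pop()
        if density(cluster, edges) >= threshold:
            selected.append(cluster)
        else:
            stack.extend(children)
    return selected

edges = {(1, 2), (2, 3), (1, 3), (4, 5)}
hierarchy = [({1, 2, 3, 4, 5}, [({1, 2, 3}, []), ({4, 5}, [])])]
print(select_cohesive(hierarchy, edges))
```

The root cluster is too sparse, so the walk descends and picks the two dense subclusters; because every cluster is tested against the measure rather than cutting at one level, the output can mix clusters from different depths.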
Distribution-Based Trajectory Clustering
Trajectory clustering enables the discovery of common patterns in trajectory
data. Current methods of trajectory clustering rely on a distance measure
between two points in order to measure the dissimilarity between two
trajectories. The distance measures employed have two challenges: high
computational cost and low fidelity. Independent of the distance measure
employed, existing clustering algorithms face a further challenge: poor
effectiveness or high time complexity. In this paper, we propose to use
a recent Isolation Distributional Kernel (IDK) as the main tool to meet all
three challenges. The new IDK-based clustering algorithm, called TIDKC, makes
full use of the distributional kernel for trajectory similarity measurement and
clustering. TIDKC identifies non-linearly separable clusters with irregular
shapes and varied densities in linear time. It does not rely on random
initialisation and is robust to outliers. An extensive evaluation on 7 large
real-world trajectory datasets confirms that IDK is more effective in capturing
complex structures in trajectories than traditional and deep learning-based
distance measures. Furthermore, the proposed TIDKC achieves superior clustering
performance and efficiency compared to existing trajectory clustering algorithms.
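The distributional view of trajectories can be illustrated with a simplified stand-in: each trajectory is treated as a sample from a point distribution and compared via a kernel mean embedding. The Gaussian kernel here is only a placeholder for the Isolation Distributional Kernel (IDK), which is data-dependent and computed quite differently.

```python
# Sketch: trajectory similarity as the inner product of two kernel
# mean embeddings.  A Gaussian kernel stands in for IDK; this
# illustrates the distributional idea, not the actual TIDKC method.
import math

def gauss(p, q, gamma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-gamma * d2)

def traj_similarity(t1, t2, gamma=1.0):
    """Mean pairwise kernel value between the two point sets."""
    total = sum(gauss(p, q, gamma) for p in t1 for q in t2)
    return total / (len(t1) * len(t2))

a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.1, 0.0), (1.1, 0.0), (2.1, 0.0)]  # slightly shifted copy of a
c = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]  # a distant trajectory

print(traj_similarity(a, b) > traj_similarity(a, c))  # near > far
```

Note that this stand-in compares whole point distributions rather than aligned point pairs, which is what lets a distributional measure sidestep point-to-point distance computations.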
A static test compaction technique for combinational circuits based on independent fault clustering
Testing a system-on-chip involves applying huge amounts of test data, which are stored in the tester memory and then transferred to the circuit under test during test application. Therefore, practical techniques such as test compression and compaction are required to reduce the amount of test data, thereby reducing both the total testing time and the memory requirements of the tester. In this paper, a new static compaction algorithm for combinational circuits is presented. The algorithm is referred to as independent fault clustering. It is based on a new concept called test vector decomposition. Experimental results for benchmark circuits demonstrate the effectiveness of the new static compaction algorithm.
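As a rough illustration of static test compaction (a greedy cover, not the paper's independent fault clustering, which additionally relies on test vector decomposition), consider a hypothetical table mapping each test vector to the faults it detects:

```python
# Sketch: greedy static compaction -- repeatedly pick the vector
# covering the most still-undetected faults until all faults are
# covered.  `detects` is hypothetical example data.
def compact(detects):
    remaining = set().union(*detects.values())
    chosen = []
    while remaining:
        best = max(detects, key=lambda v: len(detects[v] & remaining))
        chosen.append(best)
        remaining -= detects[best]
    return chosen

detects = {
    "t1": {"f1", "f2"},
    "t2": {"f2", "f3"},
    "t3": {"f1", "f2", "f3"},
    "t4": {"f4"},
}
print(compact(detects))  # two vectors suffice instead of four
```

Here "t3" subsumes "t1" and "t2", so the compacted set is half the original while still detecting every fault, which is the effect a static compaction pass aims for.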
Scalable Hierarchical Clustering with Tree Grafting
We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical
clustering with general linkage functions that compute arbitrary similarity
between two point sets. The key components of Grinch are its rotate and graft
subroutines that efficiently reconfigure the hierarchy as new points arrive,
supporting discovery of clusters with complex structure. Grinch is motivated by
a new notion of separability for clustering with linkage functions: we prove
that when the model is consistent with a ground-truth clustering, Grinch is
guaranteed to produce a cluster tree containing the ground-truth, independent
of data arrival order. Our empirical results on benchmark and author
coreference datasets (with standard and learned linkage functions) show that
Grinch is more accurate than other scalable methods, and orders of magnitude
faster than hierarchical agglomerative clustering. Comment: 23 pages (appendix included), published at KDD 201
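The incremental core of such online hierarchical clustering can be sketched as follows. The rotate and graft subroutines that make Grinch robust to arrival order are omitted, and nearest-leaf insertion under squared Euclidean distance is a simplification of a general linkage function.

```python
# Sketch: online hierarchy building -- each arriving point is
# inserted as the sibling of its nearest leaf.  Grinch additionally
# repairs the tree with rotate and graft; those steps are omitted.
class Node:
    def __init__(self, point=None, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def leaves(node):
    if node.point is not None:
        return [node]
    return leaves(node.left) + leaves(node.right)

def insert(root, point):
    if root is None:
        return Node(point=point)
    # nearest leaf under squared Euclidean distance (simple linkage)
    near = min(leaves(root),
               key=lambda l: sum((a - b) ** 2 for a, b in zip(l.point, point)))
    near.left = Node(point=near.point)   # old leaf becomes one child
    near.right = Node(point=point)       # new point becomes its sibling
    near.point = None                    # nearest leaf turns internal
    return root

root = None
for p in [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]:
    root = insert(root, p)
print(sorted(l.point for l in leaves(root)))
```

Without rotate/graft, a bad arrival order can lock in a poor tree shape; the point of those subroutines is precisely to reconfigure the hierarchy so the result is independent of arrival order.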
The role of human factors in stereotyping behavior and perception of digital library users: A robust clustering approach
To deliver effective personalization for digital library users, it is necessary to identify which human factors are most relevant in determining the behavior and perception of these users. This paper examines three key human factors: cognitive styles, levels of expertise and gender differences, and utilizes three individual clustering techniques (k-means, hierarchical clustering and fuzzy clustering) to understand user behavior and perception. Moreover, robust clustering, capable of correcting the bias of individual clustering techniques, is used to obtain a deeper understanding. The robust clustering approach produced results that highlighted the relevance of cognitive style for user behavior, i.e., cognitive style dominates and justifies each of the robust clusters created. We also found that perception was mainly determined by the level of expertise of a user. We conclude that robust clustering is an effective technique for analyzing user behavior and perception.
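One common way to realize such a robust (consensus) clustering over several base clusterings is evidence accumulation over a co-association matrix. The sketch below is illustrative and may differ from the paper's exact procedure; the labelings are hypothetical.

```python
# Sketch: co-association consensus -- items that co-cluster in most
# base clusterings (e.g., k-means, hierarchical, fuzzy) are merged
# via union-find connected components.
def consensus(labelings, threshold=0.5):
    n = len(labelings[0])

    def coassoc(i, j):
        # fraction of base clusterings in which items i and j agree
        return sum(l[i] == l[j] for l in labelings) / len(labelings)

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if coassoc(i, j) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# three hypothetical base clusterings of six users
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
print(consensus(labelings))  # → [[0, 1, 2], [3, 4, 5]]
```

User 2 is misassigned by the second base clustering, but the majority vote across clusterings corrects it, which is the bias-correcting behavior the abstract attributes to robust clustering.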