Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods
Background: Hierarchical clustering methods such as Ward's method have been used for decades to understand biological and chemical data sets. To obtain a partition of the data set, an optimal level of the hierarchy must be chosen by a so-called level selection algorithm. In 2005, Palla et al. introduced a new kind of hierarchical clustering method that differs from Ward's method in two ways: it can be applied to data for which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., allow items to belong to multiple clusters. These features suit biological and chemical data sets well, but until now no level selection algorithm has been published for this method.
Results: In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph-theoretic notion of cohesive clusters. We present results of our method on two data sets: a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values with respect to a given reference clustering. Moreover, for the PPI data set we show that our graph-theoretic cohesiveness measure indeed selects biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for level independent cluster selection.
Conclusion: Combined with the method by Palla et al., our new cluster selection method provides an interesting clustering mechanism that computes overlapping clusters, which is especially valuable for biological and chemical data sets.
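The level-independent selection idea can be sketched in Python. Everything below is illustrative: plain graph density stands in for the paper's graph-theoretic cohesiveness measure, and the hierarchy representation and threshold are assumptions, not LInCS itself.

```python
# Sketch of level-independent cluster selection: walk the hierarchy
# top-down and keep any cluster that is cohesive enough, regardless
# of its level.  Graph density is a stand-in cohesiveness measure.
from itertools import combinations

def density(cluster, edges):
    """Fraction of possible pairs in `cluster` joined by an edge."""
    n = len(cluster)
    if n < 2:
        return 0.0
    present = sum(1 for u, v in combinations(sorted(cluster), 2)
                  if (u, v) in edges or (v, u) in edges)
    return present / (n * (n - 1) / 2)

def select_cohesive(hierarchy, edges, threshold=0.5):
    """Keep a cluster as soon as it is cohesive; otherwise descend
    into its children.  `hierarchy` is a list of (cluster, children)
    pairs, so selected clusters may come from any level."""
    selected = []
    stack = list(hierarchy)
    while stack:
        cluster, children = stack.pop()
        if density(cluster, edges) >= threshold:
            selected.append(cluster)
        else:
            stack.extend(children)
    return selected

edges = {(1, 2), (2, 3), (1, 3), (4, 5)}
hierarchy = [({1, 2, 3, 4, 5}, [({1, 2, 3}, []), ({4, 5}, [])])]
print(select_cohesive(hierarchy, edges))
```

The root cluster is too sparse, so the walk descends and picks the two dense subclusters; because every cluster is tested against the measure rather than cutting at one level, the output can mix clusters from different depths.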
Distribution-Based Trajectory Clustering
Trajectory clustering enables the discovery of common patterns in trajectory
data. Current methods of trajectory clustering rely on a distance measure
between two points in order to measure the dissimilarity between two
trajectories. The distance measures employed have two challenges: high
computational cost and low fidelity. Independent of the distance measure
employed, existing clustering algorithms face a further challenge: poor
effectiveness or high time complexity. In this paper, we propose to use
a recent Isolation Distributional Kernel (IDK) as the main tool to meet all
three challenges. The new IDK-based clustering algorithm, called TIDKC, makes
full use of the distributional kernel for trajectory similarity measurement and
clustering. TIDKC identifies non-linearly separable clusters with irregular
shapes and varied densities in linear time. It does not rely on random
initialisation and is robust to outliers. An extensive evaluation on 7 large
real-world trajectory datasets confirms that IDK is more effective in capturing
complex structures in trajectories than traditional and deep learning-based
distance measures. Furthermore, the proposed TIDKC achieves superior clustering
performance and efficiency compared to existing trajectory clustering algorithms.
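The distributional view of trajectories can be illustrated with a simplified stand-in: each trajectory is treated as a sample from a point distribution and compared via a kernel mean embedding. The Gaussian kernel here is only a placeholder for the Isolation Distributional Kernel (IDK), which is data-dependent and computed quite differently.

```python
# Sketch: trajectory similarity as the inner product of two kernel
# mean embeddings.  A Gaussian kernel stands in for IDK; this
# illustrates the distributional idea, not the actual TIDKC method.
import math

def gauss(p, q, gamma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-gamma * d2)

def traj_similarity(t1, t2, gamma=1.0):
    """Mean pairwise kernel value between the two point sets."""
    total = sum(gauss(p, q, gamma) for p in t1 for q in t2)
    return total / (len(t1) * len(t2))

a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.1, 0.0), (1.1, 0.0), (2.1, 0.0)]  # slightly shifted copy of a
c = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]  # a distant trajectory

print(traj_similarity(a, b) > traj_similarity(a, c))  # near > far
```

Note that this stand-in compares whole point distributions rather than aligned point pairs, which is what lets a distributional measure sidestep point-to-point distance computations.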
A static test compaction technique for combinational circuits based on independent fault clustering
Testing a system-on-chip involves applying huge amounts of test data, which are stored in the tester memory and then transferred to the circuit under test during test application. Therefore, practical techniques such as test compression and compaction are required to reduce the amount of test data, thereby reducing both the total testing time and the memory requirements of the tester. In this paper, a new static compaction algorithm for combinational circuits is presented. The algorithm is referred to as independent fault clustering. It is based on a new concept called test vector decomposition. Experimental results for benchmark circuits demonstrate the effectiveness of the new static compaction algorithm.
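As a rough illustration of static test compaction (a greedy cover, not the paper's independent fault clustering, which additionally relies on test vector decomposition), consider a hypothetical table mapping each test vector to the faults it detects:

```python
# Sketch: greedy static compaction -- repeatedly pick the vector
# covering the most still-undetected faults until all faults are
# covered.  `detects` is hypothetical example data.
def compact(detects):
    remaining = set().union(*detects.values())
    chosen = []
    while remaining:
        best = max(detects, key=lambda v: len(detects[v] & remaining))
        chosen.append(best)
        remaining -= detects[best]
    return chosen

detects = {
    "t1": {"f1", "f2"},
    "t2": {"f2", "f3"},
    "t3": {"f1", "f2", "f3"},
    "t4": {"f4"},
}
print(compact(detects))  # two vectors suffice instead of four
```

Here "t3" subsumes "t1" and "t2", so the compacted set is half the original while still detecting every fault, which is the effect a static compaction pass aims for.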
Scalable Hierarchical Clustering with Tree Grafting
We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical
clustering with general linkage functions that compute arbitrary similarity
between two point sets. The key components of Grinch are its rotate and graft
subroutines that efficiently reconfigure the hierarchy as new points arrive,
supporting discovery of clusters with complex structure. Grinch is motivated by
a new notion of separability for clustering with linkage functions: we prove
that when the model is consistent with a ground-truth clustering, Grinch is
guaranteed to produce a cluster tree containing the ground-truth, independent
of data arrival order. Our empirical results on benchmark and author
coreference datasets (with standard and learned linkage functions) show that
Grinch is more accurate than other scalable methods, and orders of magnitude
faster than hierarchical agglomerative clustering. Comment: 23 pages (appendix included), published at KDD 201
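The incremental core of such online hierarchical clustering can be sketched as follows. The rotate and graft subroutines that make Grinch robust to arrival order are omitted, and nearest-leaf insertion under squared Euclidean distance is a simplification of a general linkage function.

```python
# Sketch: online hierarchy building -- each arriving point is
# inserted as the sibling of its nearest leaf.  Grinch additionally
# repairs the tree with rotate and graft; those steps are omitted.
class Node:
    def __init__(self, point=None, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def leaves(node):
    if node.point is not None:
        return [node]
    return leaves(node.left) + leaves(node.right)

def insert(root, point):
    if root is None:
        return Node(point=point)
    # nearest leaf under squared Euclidean distance (simple linkage)
    near = min(leaves(root),
               key=lambda l: sum((a - b) ** 2 for a, b in zip(l.point, point)))
    near.left = Node(point=near.point)   # old leaf becomes one child
    near.right = Node(point=point)       # new point becomes its sibling
    near.point = None                    # nearest leaf turns internal
    return root

root = None
for p in [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]:
    root = insert(root, p)
print(sorted(l.point for l in leaves(root)))
```

Without rotate/graft, a bad arrival order can lock in a poor tree shape; the point of those subroutines is precisely to reconfigure the hierarchy so the result is independent of arrival order.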
The role of human factors in stereotyping behavior and perception of digital library users: A robust clustering approach
To deliver effective personalization for digital library users, it is necessary to identify which human factors are most relevant in determining the behavior and perception of these users. This paper examines three key human factors: cognitive styles, levels of expertise and gender differences, and utilizes three individual clustering techniques (k-means, hierarchical clustering and fuzzy clustering) to understand user behavior and perception. Moreover, robust clustering, capable of correcting the bias of individual clustering techniques, is used to obtain a deeper understanding. The robust clustering approach produced results that highlighted the relevance of cognitive style for user behavior, i.e., cognitive style dominates and justifies each of the robust clusters created. We also found that perception was mainly determined by the level of expertise of a user. We conclude that robust clustering is an effective technique for analyzing user behavior and perception.
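One common way to realize such a robust (consensus) clustering over several base clusterings is evidence accumulation over a co-association matrix. The sketch below is illustrative and may differ from the paper's exact procedure; the labelings are hypothetical.

```python
# Sketch: co-association consensus -- items that co-cluster in most
# base clusterings (e.g., k-means, hierarchical, fuzzy) are merged
# via union-find connected components.
def consensus(labelings, threshold=0.5):
    n = len(labelings[0])

    def coassoc(i, j):
        # fraction of base clusterings in which items i and j agree
        return sum(l[i] == l[j] for l in labelings) / len(labelings)

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if coassoc(i, j) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# three hypothetical base clusterings of six users
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
print(consensus(labelings))  # → [[0, 1, 2], [3, 4, 5]]
```

User 2 is misassigned by the second base clustering, but the majority vote across clusterings corrects it, which is the bias-correcting behavior the abstract attributes to robust clustering.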