Impact of Data Transformation on the Performance of Different Clustering Methods and Cluster Number Determination Statistics for Analyzing Gene Expression Profile Data
We have assessed the impact of 13 different data transformation methods on the performance of four types of clustering methods (partitioning (K-means), hierarchical distance (Average Linkage), multivariate normal mixture, and non-parametric kernel density) and four cluster number determination statistics (CNDS) (Pseudo F, Pseudo t², Cubic Clustering Criterion (CCC), and Bayesian Information Criterion (BIC)), using both simulated and real gene expression profile data. We found that Square Root, Cubic Root, and Spacing transformations have mostly positive impacts on the performance of the four types of clustering methods, whereas Tukey's Bisquare and Interquantile Range have mostly negative impacts. The impacts of the other transformation methods are clustering method-specific and data type-specific. The performance of CNDS improves with appropriately transformed data. Multivariate Mixture Clustering and Kernel Density Clustering perform better than K-means and Average Linkage in grouping both simulated and real gene expression profile data.
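The two transformations the abstract reports as most consistently helpful, Square Root and Cubic Root, can be sketched in a few lines. This is a minimal illustration of the transforms themselves, not the authors' pipeline; the function names and toy data are ours.

```python
import math

def sqrt_transform(values):
    # Square-root transform, often variance-stabilizing for count-like
    # expression intensities (assumed non-negative here).
    return [math.sqrt(v) for v in values]

def cube_root_transform(values):
    # The cube root is also defined for negative values, e.g. log-ratio data.
    return [math.copysign(abs(v) ** (1.0 / 3.0), v) for v in values]

# Toy illustration: the transform compresses the dynamic range, so
# high-intensity genes no longer dominate Euclidean distances.
raw = [1.0, 4.0, 100.0, 400.0]
print(sqrt_transform(raw))   # [1.0, 2.0, 10.0, 20.0]
print(cube_root_transform([-8.0, 27.0]))
```

The point of transforming before clustering is exactly this range compression: distance-based methods such as K-means and Average Linkage otherwise weight the highest-expression genes disproportionately.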
Application of Cluster Analysis Using Agglomerative Method
Improving the quality of human resources is the main factor supporting increased national productivity across fields and development sectors, and the government's productive investment activities, which spur the nation's competitiveness in the global era, prioritize Indonesia's education development. This study aims to cluster the provinces of Indonesia based on educational indicators using agglomerative methods, namely Average Linkage and Ward's method. Data were collected through documentation techniques from Statistics Indonesia in 2018. The analysis used hierarchical cluster analysis, consisting of data standardization, choosing a similarity or dissimilarity measure between observations, clustering from the distance matrix, and characterizing the resulting clusters. Each method's initial grouping was then evaluated using the ratio of the average within-group standard deviation to the between-group standard deviation, a smaller ratio indicating a better clustering. With four clusters, the Ward method produced a ratio 0.01 smaller than that of the Average Linkage method, indicating that cluster analysis with the Ward method yields better grouping quality than the Average Linkage method.
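The pipeline described above (standardize, choose a dissimilarity, merge agglomeratively from a distance matrix) can be sketched with a minimal average-linkage clusterer. This is an illustrative toy, not the study's implementation; the data and function names are invented.

```python
import math

def standardize(column):
    # Z-score standardization, the first step of the pipeline.
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]

def average_linkage(points, n_clusters):
    # Agglomerative clustering: start with singletons, then repeatedly merge
    # the pair of clusters with the smallest average inter-point distance.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Two obvious groups of "provinces" in a 2-D indicator space.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(average_linkage(points, 2))  # [[0, 1], [2, 3]]
```

Ward's method shares this merge loop and differs only in the criterion: instead of the average inter-point distance, it merges the pair whose union least increases total within-cluster variance.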
Anytime Hierarchical Clustering
We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on a set of measurements along a sequence of trees which, for a fixed data set, we prove must terminate in a chain of nested partitions satisfying a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used linkage functions (e.g., single, average, complete). As an alternative to the standard batch algorithms, we present numerical evidence suggesting that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and for online tracking of clustering trees, applicable to large, dynamically changing databases and to anomaly detection.
Comment: 13 pages, 6 figures, 5 tables; in preparation for submission to a conference
Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods
We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns, comparing it with the underlying industrial activity structure. Specifically, we apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree, and we compare it with other methods including linkage methods and k-medoids. In particular, taking the industrial sector classification of stocks as a benchmark partition, we evaluate how well the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform the other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. A dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, such as crises. These results can be of interest for all applications of clustering methods to portfolio optimization and risk hedging.
Comment: 31 pages, 17 figures
Hierarchical clustering of speakers into accents with the ACCDIST metric
Hierarchical clustering of speakers by their pronunciation patterns could be a useful technique for the discovery of accents and of the relationships between accents and sociological variables. However, it is first necessary to ensure that the clustering is not influenced by the physical characteristics of the speakers. In this study a number of approaches to agglomerative hierarchical clustering of 275 speakers from 14 regional accent groups of the British Isles are formally evaluated. The ACCDIST metric is shown to have superior performance, both in terms of accent purity in the cluster tree and in terms of the interpretability of the higher levels of the tree. Although operating from robust spectral envelope features, the ACCDIST measure also showed the least sensitivity to speaker gender. The conclusion is that, if performed with care, hierarchical clustering could be a useful technique for bottom-up discovery of accent groups.
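The reason ACCDIST is insensitive to speaker physiology is that it compares speakers through the pattern of distances among each speaker's own segments, rather than through raw spectra. A rough sketch of that idea (our simplification, not Huckvale's implementation; the feature vectors and names are invented) might look like:

```python
import math

def within_speaker_table(segments):
    # Distances between every pair of THIS speaker's segment
    # representations (here, toy 2-D feature vectors keyed by vowel label).
    keys = sorted(segments)
    return [math.dist(segments[a], segments[b])
            for i, a in enumerate(keys) for b in keys[i + 1:]]

def accdist_style_dissimilarity(spk1, spk2):
    # Compare the *shapes* of two speakers' within-speaker distance tables
    # via correlation: absolute spectral values cancel out, which is what
    # reduces sensitivity to vocal-tract differences such as gender.
    x, y = within_speaker_table(spk1), within_speaker_table(spk2)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# A speaker and a "larger vocal tract" version with all features scaled:
# the distance-table shape is identical, so the dissimilarity is ~0.
spk1 = {'i': (0.0, 0.0), 'a': (3.0, 0.0), 'u': (0.0, 4.0)}
spk2 = {'i': (0.0, 0.0), 'a': (6.0, 0.0), 'u': (0.0, 8.0)}
print(accdist_style_dissimilarity(spk1, spk2))
```

A table of such pairwise dissimilarities over all speakers is exactly the input an agglomerative clusterer needs to build the accent tree evaluated in the study.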