
    Impact of Data Transformation on the Performance of Different Clustering Methods and Cluster Number Determination Statistics for Analyzing Gene Expression Profile Data

    We assessed the impact of 13 data transformation methods on the performance of four types of clustering methods (partitioning (K-means), hierarchical distance (Average Linkage), multivariate normal mixture, and non-parametric kernel density) and four cluster number determination statistics (CNDS) (Pseudo F, Pseudo t², the Cubic Clustering Criterion (CCC), and the Bayesian Information Criterion (BIC)), using both simulated and real gene expression profile data. We found that the Square Root, Cubic Root, and Spacing transformations have mostly positive impacts on the performance of the four types of clustering methods, whereas Tukey's Bisquare and Interquartile Range have mostly negative impacts. The impacts of the other transformation methods are clustering-method-specific and data-type-specific. The performance of the CNDS improves with appropriately transformed data. Multivariate mixture clustering and kernel density clustering perform better than K-means and Average Linkage in grouping both simulated and real gene expression profile data.
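    As a toy illustration of the pipeline studied here (not the paper's own code), the sketch below applies a square-root transformation before a plain K-means pass on simulated count data; the data and the deterministic initialization are invented for the example:

    ```python
    import numpy as np

    def kmeans(X, init_idx, iters=50):
        """Plain Lloyd's K-means with a fixed, deterministic initialization
        (the rows of X named by init_idx); returns one label per row of X."""
        centers = X[list(init_idx)].astype(float).copy()
        labels = np.zeros(len(X), dtype=int)
        for _ in range(iters):
            # assign each point to its nearest center
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # recompute each center from its members (skip empty clusters)
            for j in range(len(centers)):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return labels

    # Simulated "expression" counts: a low-count and a high-count group.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.poisson(4, (20, 5)), rng.poisson(100, (20, 5))]).astype(float)

    # Square-root transformation (variance-stabilising for count data),
    # then cluster; initial centers are the first and last rows.
    labels_sqrt = kmeans(np.sqrt(X), init_idx=(0, len(X) - 1))
    ```

    The same `kmeans` call on raw `X` lets one compare the clusterings with and without the transformation, which is the kind of comparison the study performs at scale.
    
    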

    Application of Cluster Analysis Using Agglomerative Method

    Improving the quality of human resources is a key factor in raising national productivity across fields and development sectors, and the government's productive investment in the nation's competitiveness in the global era prioritizes Indonesia's education development. This study clusters the provinces of Indonesia by educational indicators using agglomerative methods, specifically the Average Linkage and Ward methods. Data were collected by documentation techniques from Statistics Indonesia for 2018. The analysis used hierarchical cluster analysis: standardizing the data, choosing a similarity or dissimilarity measure between observations, clustering from the distance matrix, and examining the characteristics of the resulting clusters. The better of the two clustering solutions was then determined by the ratio of the average within-cluster standard deviation to the between-cluster standard deviation. With four clusters, the Ward method produced a ratio 0.01 smaller than that of the Average Linkage method, indicating that cluster analysis with the Ward method gives better grouping quality than the Average Linkage method.
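    The comparison described above can be sketched as follows. SciPy is assumed, the data are invented stand-ins for standardized provincial indicators, and the ratio function is one plausible reading of the within-to-between standard deviation criterion, not the study's exact formula:

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def sw_sb_ratio(X, labels):
        """Ratio of the average within-cluster standard deviation (Sw) to the
        standard deviation of the cluster centroids (Sb); smaller is better."""
        ids = np.unique(labels)
        centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
        sw = np.mean([X[labels == c].std() for c in ids])
        sb = centroids.std()
        return sw / sb

    # Hypothetical standardized indicators: four well-separated groups of
    # "provinces" in three indicator dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=m, scale=0.3, size=(8, 3)) for m in (0, 3, 6, 9)])

    # Cut each agglomerative tree into four clusters and compare ratios.
    ward = fcluster(linkage(X, method='ward'), t=4, criterion='maxclust')
    avg = fcluster(linkage(X, method='average'), t=4, criterion='maxclust')
    ```

    On real indicator data the two methods can cut the tree differently, and the solution with the smaller `sw_sb_ratio` would be preferred, mirroring the study's selection rule.
    
    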

    Anytime Hierarchical Clustering

    We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees which, we prove, must terminate for a fixed data set in a chain of nested partitions satisfying a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence suggesting that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and for online tracking of clustering trees, applicable to large, dynamically changing databases and to anomaly detection. Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conference
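    The target object here, a chain of nested partitions compatible with a linkage function, can be illustrated with a standard batch average-linkage construction. (The paper's contribution is instead an anytime procedure that re-edits an arbitrary initial hierarchy; this sketch only shows the kind of structure such a procedure converges to.)

    ```python
    import itertools
    import math

    def average_linkage_step(clusters, points):
        """Merge the pair of clusters with the smallest average inter-point
        distance; each call coarsens the partition by one level."""
        def dist(a, b):
            return math.dist(points[a], points[b])
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            d = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    # Toy configuration of measurements (indices into `points`).
    points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    chain = [[{i} for i in range(len(points))]]  # finest partition
    while len(chain[-1]) > 1:
        chain.append(average_linkage_step(chain[-1], points))
    # `chain` is a chain of nested partitions: every cluster at one level
    # is a union of clusters from the level below.
    ```

    An anytime variant would interleave such local merge/split re-edits with queries, so stopping the loop early still yields a usable (partial) hierarchy.
    
    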

    Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns, comparing it with the underlying industrial activity structure. Specifically, we apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree, and we compare it with other methods, including Linkage and k-medoids. In particular, taking the industrial sector classification of stocks as a benchmark partition, we evaluate how well the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform the other methods, retrieving more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. A dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, such as crises. These results can be of interest for all applications of clustering methods to portfolio optimization and risk hedging. Comment: 31 pages, 17 figures
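    One standard way to score how well a clustering retrieves a benchmark partition such as an industrial sector classification is the Adjusted Rand Index. The sketch below (not from the paper) implements it from the contingency table, with invented labels:

    ```python
    from collections import Counter
    from math import comb

    def adjusted_rand_index(labels_a, labels_b):
        """Adjusted Rand Index between two partitions of the same items:
        1.0 for identical partitions, about 0.0 for independent ones."""
        n = len(labels_a)
        contingency = Counter(zip(labels_a, labels_b))
        index = sum(comb(v, 2) for v in contingency.values())
        sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
        sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
        expected = sum_a * sum_b / comb(n, 2)
        max_index = (sum_a + sum_b) / 2
        return (index - expected) / (max_index - expected)

    # Hypothetical example: a sector benchmark vs. a clustering that
    # misplaces one stock.
    sectors = ["tech", "tech", "tech", "bank", "bank", "bank"]
    clusters = [0, 0, 0, 1, 1, 0]
    perfect = adjusted_rand_index(sectors, sectors)    # 1.0
    partial = adjusted_rand_index(sectors, clusters)   # between 0 and 1
    ```

    Scores like this, computed at each level of each hierarchy, let one compare how much of the sector structure each clustering method recovers and with how many clusters.
    
    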

    Hierarchical clustering of speakers into accents with the ACCDIST metric

    Hierarchical clustering of speakers by their pronunciation patterns could be a useful technique for the discovery of accents and of the relationships between accents and sociological variables. However, it is first necessary to ensure that the clustering is not influenced by the physical characteristics of the speakers. In this study, a number of approaches to agglomerative hierarchical clustering of 275 speakers from 14 regional accent groups of the British Isles are formally evaluated. The ACCDIST metric is shown to have superior performance, both in terms of accent purity in the cluster tree and in terms of the interpretability of the higher levels of the tree. Although operating on robust spectral envelope features, the ACCDIST measure also showed the least sensitivity to speaker gender. The conclusion is that, if performed with care, hierarchical clustering could be a useful technique for discovering accent groups from the bottom up.
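    A minimal sketch of the ACCDIST idea as commonly described: each speaker is represented by a table of distances between their own segment realisations, and speakers are compared by correlating these tables, so absolute spectral differences between voices largely cancel out. The vowel labels and formant-like feature values below are invented for illustration:

    ```python
    import math
    from itertools import combinations

    def pearson(x, y):
        """Pearson correlation of two equal-length sequences."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = math.sqrt(sum((a - mx) ** 2 for a in x))
        vy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (vx * vy)

    def accdist_similarity(feats_a, feats_b):
        """ACCDIST-style similarity: build each speaker's table of distances
        between their own segment means, then correlate the two tables."""
        keys = sorted(set(feats_a) & set(feats_b))
        pairs = list(combinations(keys, 2))
        ta = [math.dist(feats_a[p], feats_a[q]) for p, q in pairs]
        tb = [math.dist(feats_b[p], feats_b[q]) for p, q in pairs]
        return pearson(ta, tb)

    # speaker_c has the same vowel layout as speaker_a but uniformly
    # shifted (e.g. a longer vocal tract), so its *relative* distances
    # still match speaker_a's; speaker_b has a different layout.
    speaker_a = {"i": (300, 2300), "a": (700, 1200), "u": (320, 800)}
    speaker_c = {"i": (250, 2100), "a": (650, 1000), "u": (270, 600)}
    speaker_b = {"i": (500, 1500), "a": (400, 2000), "u": (700, 900)}
    ```

    A similarity matrix of such scores over all speaker pairs can then feed an agglomerative clustering, which is the setting the study evaluates.
    
    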