182,293 research outputs found

    On Randomly Projected Hierarchical Clustering with Guarantees

    Hierarchical clustering (HC) algorithms are generally limited to small data instances due to their runtime costs. Here we mitigate this shortcoming and explore fast HC algorithms based on random projections for single (SLC) and average (ALC) linkage clustering as well as for the minimum spanning tree problem (MST). We present a thorough adaptive analysis of our algorithms that improve prior work from $O(N^2)$ by up to a factor of $N/(\log N)^2$ for a dataset of $N$ points in Euclidean space. The algorithms maintain, with arbitrarily high probability, the outcome of hierarchical clustering as well as the worst-case running-time guarantees. We also present parameter-free instances of our algorithms. Comment: This version contains the conference paper "On Randomly Projected Hierarchical Clustering with Guarantees", SIAM International Conference on Data Mining (SDM), 2014 and, additionally, proofs omitted in the conference version.
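    As a rough reading of the complexity claim above, and assuming the stated factor applies directly to the quadratic baseline, the best-case running time implied by the abstract works out as follows (the "up to" means this is the most favorable case, not a bound that holds for every input):

```latex
\[
  \frac{O(N^2)}{N/(\log N)^2} \;=\; O\!\left(N (\log N)^2\right)
\]
```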

    Algorithms of maximum likelihood data clustering with applications

    We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on the maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that i) it is parameter-free, ii) the number of clusters need not be fixed in advance and iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures, whereas the outcome of standard algorithms has a much wider variability. Comment: Accepted by Physica A; 12 pages, 5 figures. More information at: http://www.sissa.it/dataclusterin
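    The abstract states that the likelihood depends only on Pearson's coefficient of the data. The sketch below shows only how such a pairwise Pearson correlation matrix might be computed with NumPy; it does not reproduce the paper's likelihood expression or its clustering algorithms. The function name `pearson_matrix` and the synthetic series are assumptions made for illustration.

```python
import numpy as np


def pearson_matrix(series):
    """Return the pairwise Pearson correlation matrix.

    `series` is an array of shape (n_series, n_samples); each row is one
    time series (hypothetical input layout, chosen for this sketch).
    """
    x = np.asarray(series, dtype=float)
    # np.corrcoef treats each row as one variable by default.
    return np.corrcoef(x)


# Synthetic example: two series sharing a common signal, one independent.
rng = np.random.default_rng(0)
common = rng.normal(size=200)
data = np.vstack([
    common + 0.1 * rng.normal(size=200),
    common + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
C = pearson_matrix(data)
```

    In this toy setup the two series built from the shared signal are far more correlated with each other than with the independent one, which is the kind of structure a correlation-based clustering method would pick up.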

    A Novel Clustering Algorithm Based on Quantum Games

    Quantum algorithms have achieved enormous success during the last decade. In this paper, we combine the quantum game with the problem of data clustering, and then develop a quantum-game-based clustering algorithm, in which data points in a dataset are considered as players who can make decisions and implement quantum strategies in quantum games. After each round of a quantum game, each player's expected payoff is calculated. The player then uses a link-removing-and-rewiring (LRR) function to change its neighbors and adjust the strength of the links connecting to them in order to maximize its payoff. Further, the algorithms are discussed and analyzed for two cases of strategies, two payoff matrices and two LRR functions. The simulation results demonstrate that data points in datasets are clustered reasonably and efficiently, and that the clustering algorithms converge quickly. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach. Comment: 19 pages, 5 figures, 5 tables

    Review on recent developments in jet finding

    We review recent developments related to jet clustering algorithms and jet finding. These include fast implementations of sequential recombination algorithms, new IRC safe algorithms, quantitative determination of jet areas and quality measures for jet finding, among many others. We also briefly discuss the status of jet finding in heavy ion collisions, where full QCD jets have been measured for the first time at RHIC. Comment: 5 pages, 5 figures, proceedings of the International Symposium on Multiparticle Dynamics 08, 15-20 September 2008, DES