    Hearing the clusters in a graph: A distributed algorithm

    We propose a novel distributed algorithm to cluster graphs. The algorithm recovers the solution obtained from spectral clustering without the need for expensive eigenvalue and eigenvector computations. We prove that, by propagating waves through the graph, a local fast Fourier transform yields the local component of every eigenvector of the Laplacian matrix, thus providing clustering information. For large graphs, the proposed algorithm is orders of magnitude faster than random-walk-based approaches. We prove the equivalence of the proposed algorithm to spectral clustering and derive convergence rates. We demonstrate the benefit of using this decentralized clustering algorithm for community detection in social graphs, for accelerating distributed estimation in sensor networks, and for efficient computation of distributed multi-agent search strategies.
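The wave-propagation idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the discrete update rule u(t+1) = 2u(t) - u(t-1) - c²·L·u(t), the choice of the random-walk Laplacian L = I - D⁻¹A, the unit pulse at node 0 as the initial condition, the toy graph, and the function names. For convenience the sketch also picks the lowest prominent frequency from a summed spectrum and fixes the sign against a reference node, which a truly decentralized version would have to do locally.

```python
import cmath
import math

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N = 6


def neighbors(edges, n):
    """Build adjacency lists for an undirected graph."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def simulate_wave(adj, steps, c2=1.0, source=0):
    """Iterate u(t+1) = 2u(t) - u(t-1) - c2 * (L u)(t) with L = I - D^-1 A.

    Each node needs only its neighbours' current values. c2 = 1 keeps the
    update stable because the eigenvalues of L lie in [0, 2].
    """
    n = len(adj)
    prev = [0.0] * n
    prev[source] = 1.0          # assumed initial pulse at one node
    cur = list(prev)            # u(-1) = u(0)
    history = [list(cur)]
    for _ in range(steps - 1):
        nxt = []
        for i in range(n):
            lap = cur[i] - sum(cur[j] for j in adj[i]) / len(adj[i])
            nxt.append(2.0 * cur[i] - prev[i] - c2 * lap)
        prev, cur = cur, nxt
        history.append(list(cur))
    return history              # history[t][i]


def dft_bin(signal, k):
    """Single DFT coefficient of a real signal at integer bin k."""
    T = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * t / T)
               for t, x in enumerate(signal))


def wave_cluster(edges, n, steps=256):
    """Two-way split from the sign of each node's lowest-frequency component."""
    adj = neighbors(edges, n)
    hist = simulate_wave(adj, steps)
    signals = [[hist[t][i] for t in range(steps)] for i in range(n)]
    bins = list(range(1, steps // 2))   # skip bin 0 (the constant mode)
    spectra = [[dft_bin(s, k) for k in bins] for s in signals]
    power = [sum(abs(spectra[i][b]) ** 2 for i in range(n))
             for b in range(len(bins))]
    threshold = 0.05 * max(power)
    # Lowest-frequency prominent peak: it corresponds to the smallest
    # nonzero Laplacian eigenvalue excited by the pulse.
    peak = next(b for b in range(1, len(power) - 1)
                if power[b] > threshold
                and power[b] >= power[b - 1]
                and power[b] >= power[b + 1])
    coeffs = [spectra[i][peak] for i in range(n)]
    ref = max(coeffs, key=abs)          # reference phase (sketch convenience)
    return [1 if (c * ref.conjugate()).real >= 0 else 0 for c in coeffs]


labels = wave_cluster(EDGES, N)
print(labels)
```

In this sketch each node's coefficient at the selected frequency is proportional to its entry in the corresponding eigenvector, so the sign separates the two triangles. The pulse must excite that mode, which is why a node inside one of the clusters is used as the source.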

    BigFCM: Fast, Precise and Scalable FCM on Hadoop

    Clustering plays an important role in mining big data, both as a modeling technique and as a preprocessing step in many data mining process implementations. Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing each data record to belong to more than one cluster to some degree. However, a serious challenge in fuzzy clustering is the lack of scalability. Massive datasets in emerging fields such as geosciences, biology and networking require high-performance parallel and distributed computation to solve real-world problems. Although some clustering methods have already been adapted to run on big data platforms, their execution time increases sharply for large datasets. In this paper, a scalable Fuzzy C-Means (FCM) clustering algorithm named BigFCM is proposed and designed for the Hadoop distributed data platform. Based on the map-reduce programming model, it exploits several mechanisms, including an efficient caching design, to achieve a reduction in execution time of several orders of magnitude. Extensive evaluation over multi-gigabyte datasets shows that BigFCM is scalable while preserving the quality of clustering.
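To make the fuzzy-membership and map-reduce ideas concrete, here is a minimal sketch of the textbook Fuzzy C-Means update on 1-D data, with the centre update written as per-chunk partial sums (a map step) that are then merged (a reduce step). This is not BigFCM itself: the abstract does not describe its mappers, reducers, or caching design, and the function names, toy data, and initial centres below are hypothetical.

```python
def memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in each cluster (fuzzifier m > 1)."""
    d = [abs(x - c) for c in centers]
    if any(v == 0.0 for v in d):
        # The point coincides with a centre: give it full membership there.
        hits = [1.0 if v == 0.0 else 0.0 for v in d]
        total = sum(hits)
        return [h / total for h in hits]
    p = 2.0 / (m - 1.0)
    inv = [v ** -p for v in d]
    s = sum(inv)
    return [w / s for w in inv]


def map_chunk(chunk, centers, m=2.0):
    """Map step: per-cluster partial sums of u^m * x and u^m for one chunk."""
    num = [0.0] * len(centers)
    den = [0.0] * len(centers)
    for x in chunk:
        for j, u in enumerate(memberships(x, centers, m)):
            w = u ** m
            num[j] += w * x
            den[j] += w
    return num, den


def reduce_partials(partials):
    """Reduce step: merge partial sums and return the updated centres."""
    k = len(partials[0][0])
    num = [sum(p[0][j] for p in partials) for j in range(k)]
    den = [sum(p[1][j] for p in partials) for j in range(k)]
    return [a / b for a, b in zip(num, den)]


def fuzzy_c_means(chunks, centers, m=2.0, iters=50, tol=1e-9):
    """Alternate map and reduce until the centres stop moving."""
    for _ in range(iters):
        new = reduce_partials([map_chunk(c, centers, m) for c in chunks])
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers


# Hypothetical data split into three chunks, as if stored on different workers.
CHUNKS = [[1.0, 1.2, 0.8], [9.0, 9.2, 8.8], [1.1, 9.1]]
final_centers = fuzzy_c_means(CHUNKS, centers=[0.0, 5.0])
print(final_centers)
```

Because the centre update is a ratio of sums, the partial sums can be computed independently per chunk and added in any order, which is the property that makes the method a natural fit for map-reduce.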

    Non-global logarithms in inter-jet energy flow with kt clustering requirement

    Recent work in inter-jet energy flow has identified a class of leading logarithms not previously considered in the literature. These so-called non-global logarithms have been shown to have significant numerical impact on gaps-between-jets calculations at the energies of current particle colliders. Here we calculate, at fixed order and to all orders, the effect of applying clustering to the gluonic final state responsible for these logarithms for a two-jet system with trivial colour flow. Such a clustering algorithm has already been used for experimental measurements at HERA. We find that the impact of the non-global logarithms is reduced, but not removed, when clustering is demanded, a result which is of considerable interest for energy flow observable calculations. Comment: 13 pages, 4 figures
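For orientation, the distance measures of the standard inclusive kt algorithm are written out below. The abstract does not state which variant or radius parameter the calculation uses, so the form and the symbols here ($R$, $y$, $\phi$) are the commonly quoted convention rather than something taken from the paper.

```latex
% Inclusive k_t algorithm (common convention; assumed, not quoted from the paper)
\begin{align}
  d_{ij} &= \min\left(k_{t,i}^{2},\, k_{t,j}^{2}\right) \frac{\Delta R_{ij}^{2}}{R^{2}},
  & \Delta R_{ij}^{2} &= (y_i - y_j)^{2} + (\phi_i - \phi_j)^{2}, \\
  d_{iB} &= k_{t,i}^{2}.
\end{align}
```

At each step the smallest of all $d_{ij}$ and $d_{iB}$ is found: if it is a $d_{ij}$, objects $i$ and $j$ are merged; if it is a $d_{iB}$, object $i$ is declared a jet and removed from the list. Soft gluons close in angle to a harder object are therefore absorbed into it, which is the mechanism by which clustering can change the energy flow into a gap.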