
    Balanced Partitions of Trees and Applications

    We study the k-BALANCED PARTITIONING problem, in which the vertices of a graph are to be partitioned into k sets of size at most ceil(n/k) while minimising the cut size, which is the number of edges connecting vertices in different sets. The problem is well studied for general graphs, for which it cannot be approximated within any factor in polynomial time. However, little is known about restricted graph classes. We show that for trees k-BALANCED PARTITIONING remains surprisingly hard. In particular, approximating the cut size is APX-hard even if the maximum degree of the tree is constant. If instead the diameter of the tree is bounded by a constant, we show that it is NP-hard to approximate the cut size within n^c, for any constant c<1. In the face of these hardness results, we show that allowing near-balanced solutions, in which there are at most (1+eps)ceil(n/k) vertices in any of the k sets, admits a PTAS for trees. Remarkably, the computed cut size is no larger than that of an optimal balanced solution. In the final section of our paper, we harness results on embedding graph metrics into tree metrics to extend our PTAS for trees to general graphs. In addition to being conceptually simpler and easier to analyse, our scheme improves the best factor known on the cut size of near-balanced solutions from O(log^{1.5}(n)/eps^2) [Andreev and Räcke TCS 2006] to O(log n), for weighted graphs. This also settles a question posed by Andreev and Räcke of whether an algorithm with approximation guarantees on the cut size independent of eps exists. ISSN:1868-896
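The two quantities defined in this abstract, the cut size and the (near-)balance condition, can be illustrated with a minimal sketch. It only checks a given partition; it is not the paper's PTAS. The function names `cut_size` and `is_near_balanced`, the edge-list input, and the dictionary `part` (vertex to set index) are assumptions made for illustration.

```python
from collections import Counter
from math import ceil


def cut_size(edges, part):
    """Count the edges whose endpoints lie in different sets."""
    return sum(1 for u, v in edges if part[u] != part[v])


def is_near_balanced(part, k, eps=0.0):
    """Check that at most k sets are used and that each holds at most
    (1 + eps) * ceil(n / k) vertices; eps = 0 is the strictly balanced case."""
    n = len(part)
    limit = (1 + eps) * ceil(n / k)
    sizes = Counter(part.values())
    return len(sizes) <= k and all(s <= limit for s in sizes.values())


# Hypothetical input: a path on 6 vertices (a tree), split into k = 3 sets.
edges = [(i, i + 1) for i in range(5)]
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}

print(cut_size(edges, part))        # edges (1,2) and (3,4) are cut
print(is_near_balanced(part, k=3))  # every set has ceil(6/3) = 2 vertices
```

Raising `eps` relaxes only the size limit; the cut size is computed the same way for balanced and near-balanced partitions.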

    Dynamic load balancing in parallel KD-tree k-means

    One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been explored extensively in two main directions. First, the amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set, and a third solution incorporates a dynamic load balancing policy.
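The load imbalance described in this abstract can be made concrete with a small sketch. It assumes hypothetical per-task costs (for example, the work attached to KD-tree subtrees) and compares a static contiguous split with a greedy longest-task-first assignment. The greedy heuristic is swapped in purely for illustration and is not the paper's dynamic policy; all function names and the cost values are assumptions.

```python
import heapq


def static_blocks(costs, p):
    """Assign contiguous blocks of equal task count to p workers."""
    n = len(costs)
    loads = [0.0] * p
    for i, c in enumerate(costs):
        loads[i * p // n] += c
    return loads


def greedy_lpt(costs, p):
    """Give each task, largest first, to the currently least-loaded worker."""
    heap = [(0.0, w) for w in range(p)]
    for c in sorted(costs, reverse=True):
        load, w = heapq.heappop(heap)
        heapq.heappush(heap, (load + c, w))
    # Return the loads ordered by worker index.
    return [load for load, _ in sorted(heap, key=lambda t: t[1])]


def imbalance(loads):
    """Ratio of the maximum load to the mean load; 1.0 is perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical irregular costs: two expensive tasks followed by cheap ones.
costs = [8, 7, 1, 1, 1, 1, 1, 1]
print(imbalance(static_blocks(costs, 2)))
print(imbalance(greedy_lpt(costs, 2)))
```

With these costs the static split puts both expensive tasks on the same worker, while the cost-aware assignment spreads them out. A scheme that ignores cost irregularity can leave some workers idle in the same way.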