10,114 research outputs found

    Dynamic load balancing in parallel KD-tree k-means

    Get PDF
    One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy

    Self-Stabilizing Token Distribution with Constant-Space for Trees

    Get PDF
    Self-stabilizing and silent distributed algorithms for token distribution in rooted tree networks are given. Initially, each process of a graph holds at most l tokens. Our goal is to distribute the tokens in the whole network so that every process holds exactly k tokens. In the initial configuration, the total number of tokens in the network may not be equal to nk where n is the number of processes in the network. The root process is given the ability to create a new token or remove a token from the network. We aim to minimize the convergence time, the number of token moves, and the space complexity. A self-stabilizing token distribution algorithm that converges within O(n l) asynchronous rounds and needs Theta(nh epsilon) redundant (or unnecessary) token moves is given, where epsilon = min(k,l-k) and h is the height of the tree network. Two novel ideas to reduce the number of redundant token moves are presented. One reduces the number of redundant token moves to O(nh) without any additional costs while the other reduces the number of redundant token moves to O(n), but increases the convergence time to O(nh l). All algorithms given have constant memory at each process and each link register

    Improved Analysis of Deterministic Load-Balancing Schemes

    Full text link
    We consider the problem of deterministic load balancing of tokens in the discrete model. A set of nn processors is connected into a dd-regular undirected network. In every time step, each processor exchanges some of its tokens with each of its neighbors in the network. The goal is to minimize the discrepancy between the number of tokens on the most-loaded and the least-loaded processor as quickly as possible. Rabani et al. (1998) present a general technique for the analysis of a wide class of discrete load balancing algorithms. Their approach is to characterize the deviation between the actual loads of a discrete balancing algorithm with the distribution generated by a related Markov chain. The Markov chain can also be regarded as the underlying model of a continuous diffusion algorithm. Rabani et al. showed that after time T=O(log(Kn)/μ)T = O(\log (Kn)/\mu), any algorithm of their class achieves a discrepancy of O(dlogn/μ)O(d\log n/\mu), where μ\mu is the spectral gap of the transition matrix of the graph, and KK is the initial load discrepancy in the system. In this work we identify some natural additional conditions on deterministic balancing algorithms, resulting in a class of algorithms reaching a smaller discrepancy. This class contains well-known algorithms, eg., the Rotor-Router. Specifically, we introduce the notion of cumulatively fair load-balancing algorithms where in any interval of consecutive time steps, the total number of tokens sent out over an edge by a node is the same (up to constants) for all adjacent edges. We prove that algorithms which are cumulatively fair and where every node retains a sufficient part of its load in each step, achieve a discrepancy of O(min{dlogn/μ,dn})O(\min\{d\sqrt{\log n/\mu},d\sqrt{n}\}) in time O(T)O(T). We also show that in general neither of these assumptions may be omitted without increasing discrepancy. We then show by a combinatorial potential reduction argument that any cumulatively fair scheme satisfying some additional assumptions achieves a discrepancy of O(d)O(d) almost as quickly as the continuous diffusion process. This positive result applies to some of the simplest and most natural discrete load balancing schemes.Comment: minor corrections; updated literature overvie

    Fast Discrete Consensus Based on Gossip for Makespan Minimization in Networked Systems

    Get PDF
    In this paper we propose a novel algorithm to solve the discrete consensus problem, i.e., the problem of distributing evenly a set of tokens of arbitrary weight among the nodes of a networked system. Tokens are tasks to be executed by the nodes and the proposed distributed algorithm minimizes monotonically the makespan of the assigned tasks. The algorithm is based on gossip-like asynchronous local interactions between the nodes. The convergence time of the proposed algorithm is superior with respect to the state of the art of discrete and quantized consensus by at least a factor O(n) in both theoretical and empirical comparisons
    corecore