
    A Practical Approach of Diffusion Load Balancing Algorithms

    This paper studies a practical approach to diffusion load balancing algorithms and its implementation. Three problems are investigated. The first is the determination of the load balancing parameters without any global knowledge. The second is the estimation of the cost and the benefit of a load exchange. The last is the convergence detection of the load balancing algorithm. For this last point we give an algorithm based on simulated annealing to reduce the convergence towards a load distribution in steps that can be done with discrete loads. Several simulations close the paper and illustrate the impact of the various methods and algorithms introduced.
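As a minimal sketch of the kind of scheme this abstract refers to, the following shows a first-order diffusion step on an undirected graph with continuous loads. The edge weight `1 / (max(deg_i, deg_j) + 1)` is one common choice that needs only neighbor information; it is an assumption here, not necessarily the parameter rule proposed in the paper. The stopping test uses the global max-min spread purely for illustration, whereas the paper treats convergence detection as a separate problem. The names `diffusion_step` and `balance` are hypothetical.

```python
def diffusion_step(loads, neighbors):
    """One synchronous first-order diffusion step on an undirected graph.

    loads: dict node -> current load (float)
    neighbors: dict node -> list of adjacent nodes (symmetric adjacency)
    """
    degree = {i: len(neighbors[i]) for i in neighbors}
    new_loads = dict(loads)
    for i in neighbors:
        for j in neighbors[i]:
            if i < j:  # process each undirected edge exactly once
                # Assumed local weight: depends only on the two endpoint degrees.
                alpha = 1.0 / (max(degree[i], degree[j]) + 1)
                flow = alpha * (loads[i] - loads[j])  # positive: i sends to j
                new_loads[i] -= flow
                new_loads[j] += flow
    return new_loads


def balance(loads, neighbors, tol=1e-6, max_steps=10_000):
    """Iterate diffusion steps until the load spread drops below tol.

    The max-min test below needs global knowledge and is for illustration only.
    """
    for step in range(max_steps):
        if max(loads.values()) - min(loads.values()) <= tol:
            return loads, step
        loads = diffusion_step(loads, neighbors)
    return loads, max_steps


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    final, steps = balance({0: 8.0, 1: 0.0, 2: 0.0, 3: 0.0}, ring)
    print(steps, final)
```

Each step conserves the total load, since every flow subtracted from one endpoint of an edge is added to the other.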

    Recent Advances in Graph Partitioning

    We survey recent trends in practical algorithms for balanced graph partitioning, together with applications and future research directions.

    Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures

    A new solver featuring time-space adaptation and error control has recently been introduced to tackle the numerical solution of stiff reaction-diffusion systems. Based on operator splitting, finite volume adaptive multiresolution, and high-order time integrators with specific stability properties for each operator, this strategy yields high computational efficiency for large multidimensional computations on standard architectures such as powerful workstations. However, the data structure of the original implementation, based on trees of pointers, provides limited opportunities for efficiency enhancements, while posing serious challenges in terms of parallel programming and load balancing. The present contribution proposes a new implementation of the whole set of numerical methods, including Radau5 and ROCK4, relying on an entirely different data structure together with the use of a specific library, TBB, for shared-memory, task-based parallelism with work-stealing. The performance of our implementation is assessed in a series of test cases of increasing difficulty in two and three dimensions on multi-core and many-core architectures, demonstrating high scalability.

    Asymptotically Optimal Load Balancing Topologies

    We consider a system of $N$ servers inter-connected by some underlying graph topology $G_N$. Tasks arrive at the various servers as independent Poisson processes of rate $\lambda$. Each incoming task is irrevocably assigned to whichever server has the smallest number of tasks among the one where it appears and its neighbors in $G_N$. Tasks have unit-mean exponential service times and leave the system upon service completion. The above model has been extensively investigated in the case $G_N$ is a clique. Since the servers are exchangeable in that case, the queue length process is quite tractable, and it has been proved that for any $\lambda < 1$, the fraction of servers with two or more tasks vanishes in the limit as $N \to \infty$. For an arbitrary graph $G_N$, the lack of exchangeability severely complicates the analysis, and the queue length process tends to be worse than for a clique. Accordingly, a graph $G_N$ is said to be $N$-optimal or $\sqrt{N}$-optimal when the occupancy process on $G_N$ is equivalent to that on a clique on an $N$-scale or $\sqrt{N}$-scale, respectively. We prove that if $G_N$ is an Erdős-Rényi random graph with average degree $d(N)$, then it is with high probability $N$-optimal and $\sqrt{N}$-optimal if $d(N) \to \infty$ and $d(N) / (\sqrt{N} \log(N)) \to \infty$ as $N \to \infty$, respectively. This demonstrates that optimality can be maintained at $N$-scale and $\sqrt{N}$-scale while reducing the number of connections by nearly a factor $N$ and $\sqrt{N} / \log(N)$ compared to a clique, provided the topology is suitably random. It is further shown that if $G_N$ contains $\Theta(N)$ bounded-degree nodes, then it cannot be $N$-optimal. In addition, we establish that an arbitrary graph $G_N$ is $N$-optimal when its minimum degree is $N - o(N)$, and may not be $N$-optimal even when its minimum degree is $cN + o(N)$ for any $0 < c < 1/2$. Comment: A few relevant results from arXiv:1612.00723 are included for convenience.
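The assignment rule in this abstract can be illustrated with a small event-driven simulation. The sketch below steps through the sequence of arrival and departure events (the jump chain) of the model: each arrival lands at a uniformly chosen server and joins the shortest queue among that server and its graph neighbors, and each busy server completes a task at rate 1. Breaking ties uniformly at random is an assumption, since the abstract does not specify a tie rule, and the function names are hypothetical. The returned queue lengths are a snapshot after a fixed number of events, not a time average.

```python
import random


def simulate_jsq_on_graph(neighbors, lam, num_events, seed=0):
    """Simulate join-the-shortest-queue among graph neighbors.

    neighbors: list where neighbors[i] is an iterable of servers adjacent to i
    lam: per-server Poisson arrival rate (must be positive)
    num_events: number of arrival/departure events to simulate
    """
    rng = random.Random(seed)
    n = len(neighbors)
    queues = [0] * n
    busy = set()  # servers with at least one task
    for _ in range(num_events):
        arrival_rate = n * lam
        total_rate = arrival_rate + len(busy)  # each busy server serves at rate 1
        if rng.random() * total_rate < arrival_rate:
            origin = rng.randrange(n)
            candidates = [origin] + list(neighbors[origin])
            shortest = min(queues[c] for c in candidates)
            # Assumed tie rule: uniform choice among the shortest candidates.
            target = rng.choice([c for c in candidates if queues[c] == shortest])
            queues[target] += 1
            busy.add(target)
        else:
            server = rng.choice(sorted(busy))
            queues[server] -= 1
            if queues[server] == 0:
                busy.discard(server)
    return queues


def fraction_with_at_least(queues, k):
    """Fraction of servers holding k or more tasks."""
    return sum(1 for q in queues if q >= k) / len(queues)


if __name__ == "__main__":
    n = 50
    clique = [[j for j in range(n) if j != i] for i in range(n)]
    queues = simulate_jsq_on_graph(clique, lam=0.7, num_events=20_000)
    print(fraction_with_at_least(queues, 2))
```

Replacing `clique` with a sparser adjacency list makes it possible to compare topologies empirically, although a single finite run says nothing about the asymptotic optimality notions defined in the abstract.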

    Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

    The availability of a large number of processing nodes in a parallel and distributed computing environment enables sophisticated real-time processing over high-speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared-nothing cluster. We propose a framework, based on a fixed or predefined communication pattern, to distribute the join processing load over the shared-nothing cluster. We consider various overheads that arise when scaling over a large number of nodes, and propose solution methodologies to cope with them. We implement the algorithm over a cluster using a message passing system, and present experimental results showing the effectiveness of the join processing algorithm. Comment: 11 pages.
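For readers unfamiliar with the operator being parallelized, here is a minimal single-node sketch of a time-based sliding window equi-join between two streams. It is not the paper's distributed framework: the class name, the stream labels `"R"` and `"S"`, and the assumption that timestamps arrive in non-decreasing order across both streams are all illustrative choices.

```python
from collections import defaultdict, deque


class SlidingWindowJoin:
    """Symmetric hash join over two streams with a time-based window."""

    def __init__(self, window):
        self.window = window
        # Per-stream arrival-ordered buffer, used to expire old tuples.
        self.buffers = {"R": deque(), "S": deque()}
        # Per-stream hash index: join key -> deque of (timestamp, value).
        self.index = {"R": defaultdict(deque), "S": defaultdict(deque)}

    def _expire(self, side, now):
        """Drop tuples of one stream that fell out of the window."""
        buf = self.buffers[side]
        while buf and buf[0][0] < now - self.window:
            _, key, _ = buf.popleft()
            bucket = self.index[side][key]
            bucket.popleft()  # oldest tuple for this key is the one expiring
            if not bucket:
                del self.index[side][key]

    def insert(self, side, ts, key, value):
        """Add a tuple to one stream and return its (R value, S value) matches."""
        other = "S" if side == "R" else "R"
        self._expire("R", ts)
        self._expire("S", ts)
        partners = self.index[other].get(key, ())
        if side == "R":
            matches = [(value, v) for _, v in partners]
        else:
            matches = [(v, value) for _, v in partners]
        self.buffers[side].append((ts, key, value))
        self.index[side][key].append((ts, value))
        return matches


if __name__ == "__main__":
    join = SlidingWindowJoin(window=10)
    print(join.insert("R", 0, "a", "r0"))
    print(join.insert("S", 5, "a", "s5"))
```

Every arriving tuple probes the opposite stream's index before being stored, so each matching pair is emitted exactly once, when its later tuple arrives.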