16,433 research outputs found
A Practical Approach of Diffusion Load Balancing Algorithms
In this paper, a practical approach to diffusion load balancing algorithms and its implementation is studied. Three problems are investigated. The first is the determination of the load balancing parameters without any global knowledge. The second is the estimation of the cost and benefit of a load exchange. The last is the detection of convergence of the load balancing algorithm. For this last point, we give an algorithm based on simulated annealing that reduces convergence towards the stepped load distributions that can arise with discrete loads. Several simulations close this paper and illustrate the impact of the various methods and algorithms introduced.
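The diffusion scheme the abstract refers to can be illustrated with a minimal sketch: each node repeatedly exchanges a fixed fraction of the load difference with each neighbor, so loads diffuse toward the uniform average using only local information. The parameter name `alpha` and the ring topology are illustrative assumptions, not taken from the paper.

```python
# First-order diffusion load balancing sketch (hypothetical example).
# Each node trades a fraction alpha of the load difference with every
# neighbor per step; total load is conserved and converges to uniform.

def diffuse(loads, neighbors, alpha=0.25, steps=200):
    """One synchronous diffusion sweep per step; returns final loads."""
    loads = list(loads)
    for _ in range(steps):
        new = loads[:]
        for i, nbrs in enumerate(neighbors):
            for j in nbrs:
                new[i] += alpha * (loads[j] - loads[i])
        loads = new
    return loads

# 4-node ring: neighbors[i] lists the neighbors of node i.
ring = [[1, 3], [0, 2], [1, 3], [0, 2]]
final = diffuse([40, 0, 0, 0], ring)   # all load starts on node 0
```

Choosing `alpha` so that the per-node sum over neighbors stays below 1 keeps the iteration stable; determining such parameters without global knowledge is exactly the paper's first problem.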
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph partitioning, together with applications and future research directions.
Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures
A new solver featuring time-space adaptation and error control has been
recently introduced to tackle the numerical solution of stiff
reaction-diffusion systems. Based on operator splitting, finite volume adaptive
multiresolution and high order time integrators with specific stability
properties for each operator, this strategy yields high computational
efficiency for large multidimensional computations on standard architectures
such as powerful workstations. However, the data structure of the original
implementation, based on trees of pointers, provides limited opportunities for
efficiency enhancements, while posing serious challenges in terms of parallel
programming and load balancing. The present contribution proposes a new
implementation of the whole set of numerical methods including Radau5 and
ROCK4, relying on a fully different data structure together with the use of a
specific library, TBB, for shared-memory, task-based parallelism with
work-stealing. The performance of our implementation is assessed in a series of
test cases of increasing difficulty in two and three dimensions on multi-core and many-core architectures, demonstrating high scalability.
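The operator-splitting strategy the abstract builds on can be shown in miniature. The sketch below is an illustrative Strang splitting for a 1-D reaction-diffusion equation u_t = D u_xx + R(u) with linear decay R(u) = -ku; the paper's actual solver uses Radau5, ROCK4, and adaptive multiresolution, none of which is reproduced here.

```python
# Strang splitting sketch for u_t = D u_xx - k u (illustrative only):
# half-step of reaction, full step of diffusion, half-step of reaction.
import numpy as np

def step(u, dt, dx, D=1.0, k=1.0):
    """Advance u by dt on a periodic 1-D grid."""
    u = u * np.exp(-k * dt / 2)                       # exact linear decay
    lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx**2
    u = u + dt * D * lap                              # explicit Euler diffusion
    return u * np.exp(-k * dt / 2)

u0 = np.ones(50)
u0[25] = 10.0                                         # localized spike
u1 = step(u0, dt=0.004, dx=0.1)                       # dt*D/dx^2 = 0.4 <= 0.5
```

Splitting lets each sub-operator use an integrator suited to its stiffness, which is the design point the abstract highlights.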
Asymptotically Optimal Load Balancing Topologies
We consider a system of $N$ servers inter-connected by some underlying graph topology $G_N$. Tasks arrive at the various servers as independent Poisson processes of rate $\lambda$. Each incoming task is irrevocably assigned to whichever server has the smallest number of tasks among the one where it appears and its neighbors in $G_N$. Tasks have unit-mean exponential service times and leave the system upon service completion.
The above model has been extensively investigated in the case where $G_N$ is a clique. Since the servers are exchangeable in that case, the queue length process is quite tractable, and it has been proved that for any $\lambda < 1$, the fraction of servers with two or more tasks vanishes in the limit as $N \to \infty$. For an arbitrary graph $G_N$, the lack of exchangeability severely complicates the analysis, and the queue length process tends to be worse than for a clique. Accordingly, a graph $G_N$ is said to be $N$-optimal or $\sqrt{N}$-optimal when the occupancy process on $G_N$ is equivalent to that on a clique on an $N$-scale or $\sqrt{N}$-scale, respectively.
We prove that if $G_N$ is an Erdős–Rényi random graph with average degree $d(N)$, then it is with high probability $N$-optimal and $\sqrt{N}$-optimal if $d(N) \to \infty$ and $d(N)/(\sqrt{N}\log N) \to \infty$ as $N \to \infty$, respectively. This demonstrates that optimality can be maintained at $N$-scale and $\sqrt{N}$-scale while reducing the number of connections by nearly a factor $N$ and $\sqrt{N}/\log N$ compared to a clique, provided the topology is suitably random. It is further shown that if $G_N$ contains $\Theta(N)$ bounded-degree nodes, then it cannot be $N$-optimal. In addition, we establish that an arbitrary graph $G_N$ is $N$-optimal when its minimum degree is $N - o(N)$, and may not be $N$-optimal even when its minimum degree is $cN + o(N)$ for any $0 < c < 1/2$.
Comment: A few relevant results from arXiv:1612.00723 are included for convenience
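The assignment rule in the model above (join the shortest queue among the arrival server and its neighbors) can be sketched directly. The tie-breaking toward the lower index and the example graph are assumptions of this sketch, not specified by the abstract.

```python
# Hypothetical sketch of the model's assignment rule: a task arriving
# at server i joins the shortest queue among {i} and i's neighbors in
# the graph; ties are broken toward the lower server index here.

def assign(queues, adjacency, i):
    """Return the index of the server that receives a task arriving at i."""
    candidates = [i] + adjacency[i]
    return min(candidates, key=lambda j: (queues[j], j))

queues = [3, 1, 2, 0]
adjacency = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
assign(queues, adjacency, 0)   # candidates {0, 1, 2}: server 1 has queue 1
```

On a clique every server is a candidate, recovering the classical join-the-shortest-queue setting; on a sparse graph the candidate set shrinks, which is the trade-off the paper quantifies.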
Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster
The availability of a large number of processing nodes in a parallel and distributed computing environment enables sophisticated real-time processing over high-speed data streams, as required by many emerging applications.
Sliding window stream joins are among the most important operators in a stream
processing system. In this paper, we consider the issue of parallelizing a
sliding window stream join operator over a shared-nothing cluster. We propose a
framework, based on fixed or predefined communication patterns, to distribute
the join processing load over the shared-nothing cluster. We consider various
overheads while scaling over a large number of nodes, and propose
methodologies to cope with these issues. We implement the algorithm over a
cluster using a message passing system, and present the experimental results
showing the effectiveness of the join processing algorithm. Comment: 11 pages
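The operator being parallelized can be illustrated with a minimal single-node sketch of a sliding-window symmetric hash join: each arriving tuple probes the opposite stream's window, then is inserted into its own, and expired tuples are evicted. The time-based window semantics and tuple layout below are assumptions of this sketch; the paper's contribution, distributing this operator over a shared-nothing cluster, is not shown.

```python
# Sliding-window symmetric join sketch (single node, illustrative).
# Streams are time-ordered (timestamp, key) tuples; a pair matches when
# the keys are equal and the timestamps differ by at most `window`.
from collections import deque

def window_join(stream_a, stream_b, window):
    """Merge both streams in time order and return the list of matches."""
    wa, wb = deque(), deque()
    events = sorted([(t, k, 'a') for t, k in stream_a] +
                    [(t, k, 'b') for t, k in stream_b])
    out = []
    for t, k, side in events:
        own, other = (wa, wb) if side == 'a' else (wb, wa)
        while other and other[0][0] < t - window:     # evict expired tuples
            other.popleft()
        out.extend((k, t, t2) for t2, k2 in other if k2 == k)
        own.append((t, k))
    return out

matches = window_join([(0, 'x'), (5, 'y')], [(1, 'x'), (10, 'x')], window=3)
```

In a parallel setting, the windows `wa` and `wb` are partitioned across nodes, and the fixed communication pattern determines which node probes which partition.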