    Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures

    A new solver featuring time-space adaptation and error control has recently been introduced to tackle the numerical solution of stiff reaction-diffusion systems. Based on operator splitting, finite volume adaptive multiresolution, and high-order time integrators with stability properties tailored to each operator, this strategy yields high computational efficiency for large multidimensional computations on standard architectures such as powerful workstations. However, the data structure of the original implementation, based on trees of pointers, offers limited opportunities for efficiency enhancements while posing serious challenges for parallel programming and load balancing. The present contribution proposes a new implementation of the whole set of numerical methods, including the Radau5 and ROCK4 integrators, relying on a fully different data structure and on the TBB library for shared-memory, task-based parallelism with work-stealing. The performance of our implementation is assessed on a series of test cases of increasing difficulty in two and three dimensions on multi-core and many-core architectures, demonstrating high scalability.
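    The abstract names TBB as the task-parallel runtime. As a hedged illustration of that pattern only (not the paper's code; the Cell layout and react kernel are invented placeholders), the C++ sketch below applies a per-cell reaction update over a flat array with tbb::parallel_for, whose work-stealing scheduler supplies the kind of dynamic load balancing the strategy relies on.

```cpp
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <cstddef>
#include <vector>

struct Cell { double u, v; };            // hypothetical two-species state

static void react(Cell& c, double dt) {  // placeholder reaction kernel
    c.u += dt * (c.u * c.u * c.v - c.u);
    c.v += dt * (c.u - c.u * c.u * c.v);
}

void reaction_step(std::vector<Cell>& cells, double dt) {
    // TBB splits the range into tasks; idle threads steal work, which is
    // what balances the load across cores without manual partitioning.
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, cells.size()),
        [&](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                react(cells[i], dt);
        });
}

int main() {
    std::vector<Cell> cells(1 << 20, Cell{1.0, 1.0});
    reaction_step(cells, 1e-3);          // one splitting substep over all cells
}
```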

    WebWave: Globally Load Balanced Fully Distributed Caching of Hot Published Documents

    Document publication over a network as large as the Internet challenges us to harness available server and network resources to meet fast-growing demand. In this paper, we show that large-scale dynamic caching can be employed to globally minimize server idle time, and hence maximize the aggregate server throughput of the whole service. To be efficient, scalable, and robust, a successful caching mechanism must have three properties: (1) maximize the global throughput of the system, (2) find cache copies without recourse to a directory service or to a discovery protocol, and (3) be completely distributed in the sense of operating only on the basis of local information. We develop a precise definition, which we call tree load-balance (TLB), of what it means for a mechanism to satisfy these three goals. We present an algorithm that computes TLB off-line, and a distributed protocol that induces a load distribution that converges quickly to a TLB one. Both algorithms place cache copies of immutable documents on the routing tree that connects the cached document's home server to its clients, thus enabling requests to stumble on cache copies en route to the home server. Funding: Harvard University; The Saudi Cultural Mission to the U.S.A.
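    To make the "cache copies on the routing tree" idea concrete, here is a small sketch, and explicitly not the paper's TLB algorithm: a greedy bottom-up pass that places a cache wherever the request traffic forwarded toward the home server would exceed an assumed capacity threshold, so each cache absorbs its subtree's demand. All names and the threshold rule are illustrative assumptions.

```cpp
#include <vector>

struct Node {
    double demand = 0;              // requests/s originating at this node
    std::vector<int> children;      // routing-tree children (toward clients)
    bool cache = false;
};

// Returns the traffic this subtree forwards up toward the home server.
double place_caches(std::vector<Node>& t, int v, double capacity) {
    double up = t[v].demand;
    for (int c : t[v].children) up += place_caches(t, c, capacity);
    if (up > capacity) {            // node would overload the path above it:
        t[v].cache = true;          // serve the subtree locally from a cache
        up = 0;                     // nothing is forwarded past a cache copy
    }
    return up;
}

int main() {
    // Toy routing tree: node 0 is the home server.
    std::vector<Node> t(4);
    t[0].children = {1};
    t[1].children = {2, 3};
    t[2].demand = 40; t[3].demand = 30;
    place_caches(t, 0, /*capacity=*/50.0);
}
```

    A request climbing the tree is then served by the first cache it meets, which is the "stumble on cache copies en route" behavior the abstract describes.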

    An adaptive hierarchical domain decomposition method for parallel contact dynamics simulations of granular materials

    A fully parallel version of the contact dynamics (CD) method is presented in this paper. For large enough systems, 100% efficiency has been demonstrated for up to 256 processors using a hierarchical domain decomposition with dynamic load balancing. The iterative scheme to calculate the contact forces is kept domain-wise sequential, with data exchange after each iteration step, which ensures its stability. The number of additional iterations required for convergence due to the partially parallel updates at the domain boundaries becomes negligible with an increasing number of particles, which allows for an effective parallelization. Compared to the sequential implementation, we found no influence of the parallelization on the simulation results. Comment: 19 pages, 15 figures; published in Journal of Computational Physics (2011).
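    The iteration pattern described here (sequential sweeps inside each domain, with a data exchange after every global iteration) can be summarized as a schematic loop. The skeleton below is an assumption-laden sketch, not the paper's implementation: the CD kernels are stubbed out and an OpenMP pragma stands in for whatever parallel runtime the authors actually use.

```cpp
#include <vector>

struct Domain { std::vector<double> forces; };   // particles/contacts omitted

void sweep_contacts(Domain&) { /* one sequential contact-force sweep (omitted) */ }
void exchange_boundaries(std::vector<Domain>&) { /* share boundary forces (omitted) */ }
bool converged(const std::vector<Domain>&) { return true; /* residual check (omitted) */ }

void solve_contacts(std::vector<Domain>& domains, int max_iters) {
    for (int it = 0; it < max_iters; ++it) {
        #pragma omp parallel for             // domains iterate concurrently...
        for (int d = 0; d < (int)domains.size(); ++d)
            sweep_contacts(domains[d]);      // ...but sequentially inside each
        exchange_boundaries(domains);        // data exchange after each step
        if (converged(domains)) break;       // boundary-induced extra iterations
    }                                        // become negligible for large systems
}

int main() {
    std::vector<Domain> domains(4);
    solve_contacts(domains, 100);
}
```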

    Locally Optimal Load Balancing

    This work studies distributed algorithms for locally optimal load balancing: we are given a graph of maximum degree $\Delta$, and each node has up to $L$ units of load. The task is to distribute the load more evenly so that the loads of adjacent nodes differ by at most $1$. If the graph is a path ($\Delta = 2$), it is easy to solve the fractional version of the problem in $O(L)$ communication rounds, independently of the number of nodes. We show that this is tight, and that the discrete version of the problem can also be solved in $O(L)$ rounds on paths. For the general case ($\Delta > 2$), we show that fractional load balancing can be solved in $\operatorname{poly}(L, \Delta)$ rounds and discrete load balancing in $f(L, \Delta)$ rounds for some function $f$, independently of the number of nodes. Comment: 19 pages, 11 figures.
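    For intuition, a naive round-based baseline for the discrete problem on a path fits in a few lines. This is only an illustration of the problem statement, not the paper's algorithm: edges are processed by parity so the two transfers at a node never clash, and a unit moves across an edge whenever the imbalance is at least 2.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

void balance_path(std::vector<int>& load) {
    bool moved = true;
    while (moved) {
        moved = false;
        for (int parity = 0; parity < 2; ++parity)       // matching of edges
            for (std::size_t i = parity; i + 1 < load.size(); i += 2) {
                if (load[i] >= load[i + 1] + 2)      { --load[i]; ++load[i + 1]; moved = true; }
                else if (load[i + 1] >= load[i] + 2) { ++load[i]; --load[i + 1]; moved = true; }
            }
    }
}

int main() {
    std::vector<int> load = {5, 0, 0, 3, 0};
    balance_path(load);
    for (int x : load) std::printf("%d ", x);   // adjacent loads differ by <= 1
    std::printf("\n");
}
```

    Each transfer strictly decreases the sum of squared loads, so the loop terminates with every adjacent difference at most 1; its round complexity, however, may be far worse than the $O(L)$ bound the paper proves.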

    Join-Idle-Queue with Service Elasticity: Large-Scale Asymptotics of a Non-monotone System

    We consider the model of a token-based joint auto-scaling and load balancing strategy, proposed in a recent paper by Mukherjee, Dhara, Borst, and van Leeuwaarden (SIGMETRICS '17, arXiv:1703.08373), which offers an efficient scalable implementation and yet achieves asymptotically optimal steady-state delay performance and energy consumption as the number of servers $N \to \infty$. In the above work, the asymptotic results are obtained under the assumption that the queues have fixed-size finite buffers, and therefore the fundamental question of stability of the proposed scheme with infinite buffers was left open. In this paper, we address this fundamental stability question. The system stability under the usual subcritical load assumption is not automatic; moreover, stability may not even hold for all $N$. The key challenge stems from the fact that the process lacks monotonicity, which has been the powerful primary tool for establishing stability in load balancing models. We develop a novel method to prove that the subcritically loaded system is stable for large enough $N$, and establish convergence of the steady-state distributions to the optimal one as $N \to \infty$. The method goes beyond state-of-the-art techniques: it uses an induction-based idea and a "weak monotonicity" property of the model; this technique is of independent interest and may have broader applicability. Comment: 30 pages.
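    As background for the model, the basic Join-Idle-Queue dispatch rule (the plain JIQ scheme, without the service-elasticity extension analyzed in the paper) can be sketched as follows; the server and token handling here are schematic assumptions.

```cpp
#include <deque>
#include <random>
#include <vector>

struct Cluster {
    std::vector<int> queue_len;     // jobs queued at each server
    std::deque<int> idle_tokens;    // ids of servers that reported idle
    std::mt19937 rng{42};

    explicit Cluster(int n) : queue_len(n, 0) {
        for (int i = 0; i < n; ++i) idle_tokens.push_back(i);  // all start idle
    }
    void dispatch() {               // route one arriving job
        int s;
        if (!idle_tokens.empty()) { // a token guarantees an idle server
            s = idle_tokens.front(); idle_tokens.pop_front();
        } else {                    // no tokens: fall back to a random server
            std::uniform_int_distribution<int> pick(0, (int)queue_len.size() - 1);
            s = pick(rng);
        }
        ++queue_len[s];
    }
    void complete(int s) {          // one job finishes at server s
        if (queue_len[s] > 0 && --queue_len[s] == 0)
            idle_tokens.push_back(s);   // server re-registers as idle
    }
};

int main() {
    Cluster c(4);
    for (int i = 0; i < 6; ++i) c.dispatch();  // tokens first, then random
    c.complete(0);                             // server 0 may re-token
    c.dispatch();
}
```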

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Get PDF
    Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computation platforms has recently increased, making it necessary to optimize application performance and to enhance the scalability of massively parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: optimization, parallelization, and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the scalability of the code is studied and new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.
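    As a concrete example of the kind of k-way graph partitioning such a scalability study compares, the sketch below calls METIS_PartGraphKway on a toy graph in CSR form; METIS is an assumed library choice for illustration, not necessarily one of the partitioners actually evaluated for TAU, and a real run would pass the dual graph of the unstructured mesh.

```cpp
#include <metis.h>
#include <cstdio>
#include <vector>

int main() {
    idx_t nvtxs = 4, ncon = 1, nparts = 2, objval = 0;
    std::vector<idx_t> xadj   = {0, 1, 3, 5, 6};     // CSR row pointers (4-vertex path)
    std::vector<idx_t> adjncy = {1, 0, 2, 1, 3, 2};  // CSR adjacency lists
    std::vector<idx_t> part(nvtxs);                  // output: partition of each vertex

    int status = METIS_PartGraphKway(&nvtxs, &ncon, xadj.data(), adjncy.data(),
                                     /*vwgt=*/nullptr, /*vsize=*/nullptr,
                                     /*adjwgt=*/nullptr, &nparts,
                                     /*tpwgts=*/nullptr, /*ubvec=*/nullptr,
                                     /*options=*/nullptr, &objval, part.data());
    if (status == METIS_OK)
        for (idx_t v = 0; v < nvtxs; ++v)
            std::printf("vertex %d -> part %d\n", (int)v, (int)part[v]);
}
```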