19 research outputs found
Asynchronous parallel successive overrelaxation for the symmetric linear complementarity problem
Morphology of fruits, seeds, seedlings and saplings of three species of Macrolobium Schreb. (Leguminosae, Caesalpinioideae) in the Brazilian Amazon floodplain
A class of stable difference schemes for linear elliptic PDEs and their asynchronous parallel computation
Super-Scalable Algorithms for Computing on 100,000 Processors
In the next five years, the number of processors in high-end systems for scientific computing is expected to rise to tens and even hundreds of thousands. For example, the IBM Blue Gene/L can have up to 128,000 processors, and delivery of the first system is scheduled for 2005. Existing deficiencies in the scalability and fault tolerance of scientific applications need to be addressed soon. If the number of processors grows by an order of magnitude while efficiency drops by an order of magnitude, the overall effective computing performance stays the same. Furthermore, the mean time to interrupt of high-end computer systems decreases with scale and complexity. In a 100,000-processor system, failures may occur every couple of minutes, and traditional checkpointing may no longer be feasible. With this paper, we summarize our recent research in super-scalable algorithms for computing on 100,000 processors. We introduce the algorithm properties of scale invariance and natural fault tolerance, and discuss how they can be applied to two different classes of algorithms. We also describe a super-scalable diskless checkpointing algorithm for problems that can't be transformed into a super-scalable variant, or where other solutions are more efficient. Finally, a 100,000-processor simulator is presented as a platform for testing and experimentation.
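The diskless checkpointing mentioned in the abstract keeps redundancy in peer memory rather than on disk. A minimal sketch of one common encoding for this, XOR parity, is shown below; it is an illustration of the general idea, not the paper's specific algorithm, and all names here are hypothetical:

```python
# Illustrative sketch of diskless checkpointing via XOR parity.
# Each "processor" holds its checkpoint state as an equal-length byte
# string; a parity block is the XOR of all states, held by a peer.
# Any single lost state can be rebuilt from the parity block and the
# surviving peers, with no disk involved.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(states):
    """Compute the parity block over all processors' states."""
    parity = bytes(len(states[0]))
    for s in states:
        parity = xor_bytes(parity, s)
    return parity

def recover(lost_index, states, parity):
    """Rebuild the state of one failed processor from the survivors."""
    rebuilt = parity
    for i, s in enumerate(states):
        if i != lost_index:          # skip the failed processor
            rebuilt = xor_bytes(rebuilt, s)
    return rebuilt
```

One parity block protects against a single failure per group; real schemes partition the machine into many small groups so that simultaneous failures in different groups remain recoverable.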
ASYNC Loop Constructs for Relaxed Synchronization
Abstract. Conventional iterative solvers for partial differential equations impose strict data dependencies between each solution point and its neighbors. When implemented in OpenMP, they repeatedly execute barrier synchronization in each iterative step to ensure that data dependencies are strictly satisfied. We propose new parallel annotations to support an asynchronous computation model for iterative solvers. ASYNC DO annotates a loop whose iterations can be executed by multiple processors, like OpenMP parallel DO loops in Fortran (or parallel for loops in C), but it does not require barrier synchronization. ASYNC REDUCTION annotates a loop which performs parallel reduction operations but uses a relaxed tree barrier, instead of the conventional barrier, to synchronize the processors. When a number of ASYNC DO and ASYNC REDUCTION loops are embedded in an iterative loop annotated by ASYNC REGION, the iterative solver allows each data point to be updated using values of its neighbors which may not be the most current, instead of forcing the processor to wait for the new value to arrive. We discuss how the compiler can transform an ASYNC REGION (with embedded ASYNC DO and ASYNC REDUCTION) into an OpenMP parallel section with relaxed synchronization. We present experimental results to show the benefit of using ASYNC loop constructs in 2D and 3D multigrid methods as well as an SOR-preconditioned conjugate gradient linear system solver.
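The relaxed-synchronization model this abstract describes can be sketched in miniature: a strict solver reads only last-iteration neighbor values (what a barrier guarantees), while a relaxed solver tolerates whatever neighbor value is currently visible, stale or fresh. This is a sequential Python stand-in for the OpenMP/Fortran constructs, using a 1D Laplace problem purely for illustration:

```python
def jacobi_step(u):
    # Strict/synchronous step: every interior point reads only values
    # from the previous iteration -- this is the guarantee a per-step
    # barrier enforces in the OpenMP implementation.
    return ([u[0]]
            + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)]
            + [u[-1]])

def relaxed_sweep(u):
    # Relaxed/asynchronous flavor: update in place, so a point may see
    # a neighbor's value from this sweep or the previous one. Chaotic
    # relaxation results show the iteration still converges for such
    # problems, which is why the barrier can be dropped.
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u
```

Both converge to the same fixed point (the linear interpolant of the boundary values); the difference is only in which neighbor values an update is allowed to read, and therefore in how much synchronization is required.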
Area-time optimal VLSI circuits for convolution
SIGLE. Available from CEN Saclay, Service de Documentation, 91191 Gif-sur-Yvette Cedex (France) / INIST-CNRS, Institut de l'Information Scientifique et Technique. FR, France.