19 research outputs found

    Experiments on Asynchronous Partial Gauss-Seidel Method

    Super-Scalable Algorithms for Computing on 100,000 Processors

    In the next five years, the number of processors in high-end systems for scientific computing is expected to rise to tens and even hundreds of thousands. For example, the IBM Blue Gene/L can have up to 128,000 processors, and delivery of the first system is scheduled for 2005. Existing deficiencies in the scalability and fault tolerance of scientific applications need to be addressed soon. If the number of processors grows by an order of magnitude while efficiency drops by an order of magnitude, the overall effective computing performance stays the same. Furthermore, the mean time to interrupt of high-end computer systems decreases with scale and complexity. In a 100,000-processor system, failures may occur every couple of minutes, and traditional checkpointing may no longer be feasible. In this paper, we summarize our recent research in super-scalable algorithms for computing on 100,000 processors. We introduce the algorithm properties of scale invariance and natural fault tolerance, and discuss how they can be applied to two different classes of algorithms. We also describe a super-scalable diskless checkpointing algorithm for problems that cannot be transformed into a super-scalable variant, or where other solutions are more efficient. Finally, a 100,000-processor simulator is presented as a platform for testing and experimentation.
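    The XOR-parity idea behind diskless checkpointing can be shown in a few lines of C. This is a minimal sketch, not the paper's algorithm: the processor count, state size, and single-failure recovery below are illustrative assumptions. Each simulated processor keeps its checkpoint in memory, a parity block holds the XOR of all checkpoints, and any one lost state can be rebuilt from the parity plus the survivors, with no disk involved.

    #include <stdio.h>
    #include <string.h>

    #define P           8   /* simulated processors (illustrative)          */
    #define STATE_BYTES 16  /* size of each local checkpoint (illustrative) */

    int main(void) {
        unsigned char state[P][STATE_BYTES];
        unsigned char parity[STATE_BYTES] = {0};

        /* Give each processor some state and fold it into the parity block. */
        for (int p = 0; p < P; p++)
            for (int b = 0; b < STATE_BYTES; b++) {
                state[p][b] = (unsigned char)(p * 31 + b);
                parity[b]  ^= state[p][b];
            }

        /* Simulate the failure of one processor by wiping its state. */
        int lost = 3;
        unsigned char saved[STATE_BYTES];
        memcpy(saved, state[lost], STATE_BYTES);
        memset(state[lost], 0, STATE_BYTES);

        /* Recover: XOR the parity with every surviving state yields the
           lost state, since each survivor cancels its own contribution. */
        unsigned char rebuilt[STATE_BYTES];
        memcpy(rebuilt, parity, STATE_BYTES);
        for (int p = 0; p < P; p++)
            if (p != lost)
                for (int b = 0; b < STATE_BYTES; b++)
                    rebuilt[b] ^= state[p][b];

        printf("recovery %s\n",
               memcmp(rebuilt, saved, STATE_BYTES) == 0 ? "succeeded" : "failed");
        return 0;
    }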

    Optimal Search on Some Game Trees

    PS-Algorithms and Stochastic Computations

    ASYNC Loop Constructs for Relaxed Synchronization

    Conventional iterative solvers for partial differential equations impose strict data dependencies between each solution point and its neighbors. When implemented in OpenMP, they repeatedly execute barrier synchronization in each iterative step to ensure that those data dependencies are strictly satisfied. We propose new parallel annotations to support an asynchronous computation model for iterative solvers. ASYNC DO annotates a loop whose iterations can be executed by multiple processors, like OpenMP parallel DO loops in Fortran (or parallel for loops in C), but it does not require barrier synchronization. ASYNC REDUCTION annotates a loop that performs parallel reduction operations but uses a relaxed tree barrier, instead of the conventional barrier, to synchronize the processors. When a number of ASYNC DO and ASYNC REDUCTION loops are embedded in an iterative loop annotated by ASYNC REGION, the iterative solver allows each data point to be updated using neighbor values that may not be the most current, instead of forcing the processor to wait for new values to arrive. We discuss how the compiler can transform an ASYNC REGION (with embedded ASYNC DO and ASYNC REDUCTION) into an OpenMP parallel section with relaxed synchronization. We present experimental results showing the benefit of the ASYNC loop constructs in 2D and 3D multigrid methods, as well as in an SOR-preconditioned conjugate gradient linear system solver.
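    A minimal OpenMP sketch in C of the relaxed synchronization that ASYNC DO describes: the nowait clause drops the implied barrier after a worksharing loop, so a fast thread may start the next sweep while others are still finishing the current one, reading neighbor values that are not the most current. The grid size, sweep count, and stencil are illustrative assumptions; the paper's ASYNC annotations are compiler-level constructs, not this exact code.

    #include <stdio.h>

    #define N      1024   /* grid points (illustrative)       */
    #define SWEEPS 100    /* relaxation sweeps (illustrative) */

    static double u[N];

    int main(void) {
        for (int i = 0; i < N; i++)
            u[i] = (double)i;

        #pragma omp parallel
        {
            for (int s = 0; s < SWEEPS; s++) {
                /* ASYNC DO analogue: "nowait" removes the barrier that an
                   ordinary "omp for" implies, so sweeps overlap across
                   threads. Reads of u[i-1] and u[i+1] at chunk boundaries
                   may therefore see stale values; asynchronous (chaotic)
                   relaxation tolerates this, though strictly conforming C
                   would use relaxed atomics to avoid a formal data race. */
                #pragma omp for nowait
                for (int i = 1; i < N - 1; i++)
                    u[i] = 0.5 * (u[i - 1] + u[i + 1]);
            }
        }
        printf("u[N/2] = %f\n", u[N / 2]);
        return 0;
    }

    Compiled with OpenMP enabled (e.g. gcc -fopenmp), the sweeps run with relaxed synchronization; without it, the pragmas are ignored and the same loop runs as an ordinary synchronous solver.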

    Area-time optimal VLSI circuits for convolution

    SIGLE: Available from CEN Saclay, Service de Documentation, 91191 Gif-sur-Yvette Cedex (France) / INIST-CNRS, Institut de l'Information Scientifique et Technique (France)

    Energy-Privacy Trade-Offs in VLSI Computations
