5,087 research outputs found

    Parallel Computation of Finite Element Navier-Stokes codes using MUMPS Solver

    Get PDF
    The study deals with the parallelization of 2D and 3D finite element based Navier-Stokes codes using direct solvers. Development of sparse direct solvers using multifrontal solvers has significantly reduced the computational time of direct solution methods. Although limited by its stringent memory requirements, multifrontal solvers can be computationally efficient. First the performance of MUltifrontal Massively Parallel Solver (MUMPS) is evaluated for both 2D and 3D codes in terms of memory requirements and CPU times. The scalability of both Newton and modified Newton algorithms is tested

    Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

    Get PDF
    The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism

    GeNN: a code generation framework for accelerated brain simulations

    Get PDF
    Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance based Hodgkin-Huxley neurons but that for other models the speedup can differ. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/

    Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

    Get PDF
    The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this paper, we study the benefits and limits of replacing the highly specialized internal scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and StarPU. The tasks graph of the factorization step is made available to the two runtimes, providing them the opportunity to process and optimize its traversal in order to maximize the algorithm efficiency for the targeted hardware platform. A comparative study of the performance of the PaStiX solver on top of its native internal scheduler, PaRSEC, and StarPU frameworks, on different execution environments, is performed. The analysis highlights that these generic task-based runtimes achieve comparable results to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer.Comment: Heterogeneity in Computing Workshop (2014

    Computational methods and software systems for dynamics and control of large space structures

    Get PDF
    Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers

    An assessment of the connection machine

    Get PDF
    The CM-2 is an example of a connection machine. The strengths and problems of this implementation are considered as well as important issues in the architecture and programming environment of connection machines in general. These are contrasted to the same issues in Multiple Instruction/Multiple Data (MIMD) microprocessors and multicomputers

    A massively parallel multi-level approach to a domain decomposition method for the optical flow estimation with varying illumination

    Get PDF
    We consider a variational method to solve the optical flow problem with varying illumination. We apply an adaptive control of the regularization parameter which allows us to preserve the edges and fine features of the computed flow. To reduce the complexity of the estimation for high resolution images and the time of computations, we implement a multi-level parallel approach based on the domain decomposition with the Schwarz overlapping method. The second level of parallelism uses the massively parallel solver MUMPS. We perform some numerical simulations to show the efficiency of our approach and to validate it on classical and real-world image sequences

    Domain Decomposition Based High Performance Parallel Computing\ud

    Get PDF
    The study deals with the parallelization of finite element based Navier-Stokes codes using domain decomposition and state-ofart sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers. Parallel sparse direct solvers are not found to exhibit good scalability. Hence, the parallelization of sparse direct solvers is done using domain decomposition techniques. A highly efficient sparse direct solver PARDISO is used in this study. The scalability of both Newton and modified Newton algorithms are tested
    • …
    corecore