86,351 research outputs found

    Multi-GPU aggregation-based AMG preconditioner for iterative linear solvers

    Full text link
    We present and release in open source format a sparse linear solver which efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large and sparse linear systems on modern parallel computers made of hybrid nodes hosting NVIDIA Graphics Processing Unit (GPU) accelerators. The work extends our previous efforts in the exploitation of a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. Our design for the hybrid implementation has been driven by the best practices for minimizing data communication overhead when multiple GPUs are employed, yet preserving the efficiency of the single GPU kernels. Strong and weak scalability results on well-known benchmark test cases of the new version of the library are discussed. Comparisons with the Nvidia AmgX solution show an improvement of up to 2.0x in the solve phase

    A GPU-based, three-dimensional level set solver with curvature flow

    Get PDF
    technical reportLevel set methods are a powerful tool for implicitly representing deformable surfaces. Since their inception, these techniques have been used to solve prob- lems in fields as varied as computer vision, scientific visualization, computer graphics and computational physics. With the power and flexibility of this approach; however, comes a large computational burden. In the level set ap- proach, surface motion is computed via a partial differential equation (PDE) framework. One possibility for accelerating level-set based applications is to map the solver kernel onto a commodity graphics processing unit (GPU). GPUs are parallel, vector computers whose power is currently increasing at a faster rate than that of CPUs. in this work, we demonstrate a GPU-based, three- dimensional level set solver that is capable of computing curvature flow as well as other speed terms. Results are shown for this solver segmenting the brain surface from an MRI data set

    PC-CUBE: A Personal Computer Based Hypercube

    Get PDF
    PC-CUBE is an ensemble of IBM PCs or close compatibles connected in the hypercube topology with ordinary computer cables. Communication occurs at the rate of 115.2 K-band via the RS-232 serial links. Available for PC-CUBE is the Crystalline Operating System III (CrOS III), Mercury Operating System, CUBIX and PLOTIX which are parallel I/O and graphics libraries. A CrOS performance monitor was developed to facilitate the measurement of communication and computation time of a program and their effects on performance. Also available are CXLISP, a parallel version of the XLISP interpreter; GRAFIX, some graphics routines for the EGA and CGA; and a general execution profiler for determining execution time spent by program subroutines. PC-CUBE provides a programming environment similar to all hypercube systems running CrOS III, Mercury and CUBIX. In addition, every node (personal computer) has its own graphics display monitor and storage devices. These allow data to be displayed or stored at every processor, which has much instructional value and enables easier debugging of applications. Some application programs which are taken from the book Solving Problems on Concurrent Processors (Fox 88) were implemented with graphics enhancement on PC-CUBE. The applications range from solving the Mandelbrot set, Laplace equation, wave equation, long range force interaction, to WaTor, an ecological simulation

    Overlapping Multi-Processing and Graphics Hardware Acceleration: Performance Evaluation

    Get PDF
    Colloque avec actes et comité de lecture.Recently, multi-processing has been shown to deliver good performance in rendering. However, in some applications, processors spend too much time executing tasks that could be more efficiently done through intensive use of new graphics hardware. We present in this paper a novel solution combining multi-processing and advanced graphics hardware, where graphics pipelines are used both for classical visualization tasks and to advantageously perform geometric calculations while remaining computations are handled by multi-processors. The experiment is based on an implementation of a new parallel wavelet radiosity algorithm. The application is executed on the SGI Origin2000 connected to the SGI InfiniteReality2 rendering pipeline. A performance evaluation is presented. Keeping in mind that the approach can benefit all available workstations and super-computers, from small scale (2 processors and 1 graphics pipeline) to large scale (pp processors and nn graphics pipelines), we highlight some important bottlenecks that impede performance. However, our results show that this approach could be a promising avenue for scientific and engineering simulation and visualization applications that need intensive geometric calculations

    Scalable Interactive Volume Rendering Using Off-the-shelf Components

    Get PDF
    This paper describes an application of a second generation implementation of the Sepia architecture (Sepia-2) to interactive volu-metric visualization of large rectilinear scalar fields. By employingpipelined associative blending operators in a sort-last configuration a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels, and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and a variety of GL rendering techniques including surface rendering and shadow map-ping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators
    corecore