
    A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

    We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is to enable efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, the platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the same code on an 8-core Intel Sandy Bridge CPU by a factor of 3.4.
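
    To make the programming model concrete, here is a minimal sketch of a particle-push step offloaded with an OpenACC directive. It is an illustrative stand-in, not the PIC_ENGINE code: the problem is reduced to a 1D electrostatic push with nearest-cell field lookup, and all names are hypothetical.

        // Hypothetical sketch: OpenACC offload of a 1D electrostatic particle push.
        #include <cstddef>
        #include <vector>

        struct Particles {                // structure-of-arrays layout, GPU-friendly
            std::vector<double> x, v;
        };

        void push(Particles& p, const std::vector<double>& E,
                  double dx, double qm, double dt, double L)
        {
            const std::size_t n = p.x.size(), nc = E.size();
            double* x = p.x.data();
            double* v = p.v.data();
            const double* e = E.data();

            // Each particle is independent, so the loop maps directly onto the
            // GPU; the pragma is simply ignored by compilers without OpenACC.
            #pragma acc parallel loop copy(x[0:n], v[0:n]) copyin(e[0:nc])
            for (std::size_t i = 0; i < n; ++i) {
                std::size_t c = static_cast<std::size_t>(x[i] / dx);  // nearest cell
                if (c >= nc) c = nc - 1;
                v[i] += qm * e[c] * dt;            // accelerate in the local field
                x[i] += v[i] * dt;                 // move
                while (x[i] < 0.0) x[i] += L;      // periodic boundaries
                while (x[i] >= L)  x[i] -= L;
            }
        }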

    BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images

    In cryo-electron microscopy (EM), molecular structures are determined from large numbers of projection images of individual particles. To harness the full power of this single-molecule information, we use the Bayesian inference of EM (BioEM) formalism. By ranking structural models using posterior probabilities calculated for individual images, BioEM in principle addresses the challenge of working with highly dynamic or heterogeneous systems not easily handled in traditional EM reconstruction. However, the calculation of these posteriors for large numbers of particles and models is computationally demanding. Here we present highly parallelized, GPU-accelerated computer software that performs this task efficiently. Our flexible formulation employs CUDA, OpenMP, and MPI parallelization combined with both CPU and GPU computing. The resulting BioEM software scales nearly ideally both on pure CPU and on CPU+GPU architectures, thus enabling Bayesian analysis of tens of thousands of images in a reasonable time. The general mathematical framework and robust algorithms are not limited to cryo-electron microscopy but can be generalized to electron tomography and other imaging experiments.
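
    The structure of the ranking step can be pictured as follows: for each structural model, sum over the images the log of the orientation-averaged likelihood, parallelizing over the independent images. The sketch below assumes a Gaussian pixel-noise model and a precomputed grid of projections; it illustrates the shape of the computation, not the BioEM code, and all names are hypothetical.

        // Hypothetical sketch of a BioEM-style ranking step (not the BioEM API).
        #include <algorithm>
        #include <cmath>
        #include <cstddef>
        #include <vector>

        // log-likelihood of one image given one projection (Gaussian pixel noise)
        double logLik(const std::vector<double>& img,
                      const std::vector<double>& proj, double sigma)
        {
            double ss = 0.0;
            for (std::size_t k = 0; k < img.size(); ++k) {
                const double d = img[k] - proj[k];
                ss += d * d;
            }
            return -ss / (2.0 * sigma * sigma);
        }

        // images: N particle images; projs[m]: projections of model m over a
        // grid of orientations. Returns one log-evidence score per model.
        std::vector<double> rankModels(
            const std::vector<std::vector<double>>& images,
            const std::vector<std::vector<std::vector<double>>>& projs,
            double sigma)
        {
            std::vector<double> score(projs.size(), 0.0);
            for (std::size_t m = 0; m < projs.size(); ++m) {
                double total = 0.0;
                // Images are independent: this is the loop that CUDA/OpenMP/MPI
                // parallelism distributes in the real software.
                #pragma omp parallel for reduction(+:total)
                for (std::ptrdiff_t i = 0;
                     i < static_cast<std::ptrdiff_t>(images.size()); ++i) {
                    // log of the orientation-averaged likelihood
                    // (log-mean-exp, stabilised by subtracting the maximum)
                    std::vector<double> ll(projs[m].size());
                    double mx = -1e300;
                    for (std::size_t o = 0; o < projs[m].size(); ++o) {
                        ll[o] = logLik(images[i], projs[m][o], sigma);
                        mx = std::max(mx, ll[o]);
                    }
                    double s = 0.0;
                    for (double v : ll) s += std::exp(v - mx);
                    total += mx + std::log(s / ll.size());
                }
                score[m] = total;   // higher score = better-supported model
            }
            return score;
        }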

    Master slave en-face OCT/SLO

    Master Slave optical coherence tomography (MS-OCT) is an OCT method that does not require resampling of data and can be used to deliver en-face images from several depths simultaneously. As the MS-OCT method requires substantial computational resources, the number of multiple-depth en-face images that can be produced in real time is limited. Here, we demonstrate progress in taking advantage of the parallel processing feature of the MS-OCT technology. Harnessing the capabilities of graphics processing units (GPUs), information from 384 depth positions is acquired in one raster, with real-time display of up to 40 en-face OCT images. These exhibit comparable resolution and sensitivity to the images produced using the conventional Fourier-domain-based method. The GPU facilitates versatile real-time selection of parameters, such as the depth positions of the 40 images out of the set of 384 depth locations, as well as their axial resolution. In each updated displayed frame, in parallel with the 40 en-face OCT images, a scanning laser ophthalmoscopy (SLO) lookalike image is presented together with two B-scan OCT images oriented along perpendicular directions. The thickness of the SLO lookalike image is dynamically determined by the choice of the number of en-face OCT images displayed in the frame and the choice of differential axial distance between them.
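
    The parallelism comes from the Master Slave principle itself: the en-face amplitude at a given depth is obtained by correlating each measured channelled spectrum with a mask associated with that depth, so no spectral resampling or FFT is needed and every (depth, lateral pixel) pair is an independent dot product. The CPU sketch below shows this structure; mask generation and calibration are omitted, and the names are hypothetical rather than the authors' implementation.

        // Hypothetical sketch of the Master Slave correlation step (CPU version).
        #include <complex>
        #include <cstddef>
        #include <vector>

        using cd = std::complex<double>;

        // spectra: nAscans x nSamples channelled spectra from one raster
        // masks:   nDepths x nSamples masks, one per selected depth position
        // returns: nDepths x nAscans amplitudes (one row per en-face image)
        std::vector<std::vector<double>> msCorrelate(
            const std::vector<std::vector<double>>& spectra,
            const std::vector<std::vector<cd>>& masks)
        {
            std::vector<std::vector<double>> enface(
                masks.size(), std::vector<double>(spectra.size(), 0.0));

            // Every (depth, A-scan) pair is an independent dot product; on the
            // GPU each pair maps naturally onto one thread.
            for (std::size_t z = 0; z < masks.size(); ++z)
                for (std::size_t a = 0; a < spectra.size(); ++a) {
                    cd acc{0.0, 0.0};
                    for (std::size_t k = 0; k < masks[z].size(); ++k)
                        acc += spectra[a][k] * masks[z][k];
                    enface[z][a] = std::abs(acc);
                }
            return enface;
        }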

    Interactive ray shading of FRep objects

    In this paper we present a method for interactive rendering of general procedurally defined, functionally represented (FRep) objects, accelerated by graphics hardware, namely Graphics Processing Units (GPUs). We obtain interactive rates by using GPU acceleration for all computations in the rendering algorithm, such as ray-surface intersection, function evaluation, and normal computation. We compute primary rays as well as secondary rays for shadows, reflection, and refraction, both to obtain high-quality output visualizations and to allow further extension to full ray-tracing of FRep objects. The algorithm is well suited to modern GPUs and provides acceptable interactive rates with good-quality results. A wide range of objects can be rendered, including traditional skeletal implicit surfaces, constructive solids, and purely procedural objects such as 3D fractals.
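
    As an illustration of the general scheme (in plain C++ rather than GPU code): march along the ray until the FRep defining function changes sign, refine the hit by bisection, and shade using a central-difference normal. The unit sphere stands in for an arbitrary procedural FRep object; this is a hedged sketch of the approach, not the paper's implementation.

        // Hypothetical sketch: ray shading of an FRep object (f >= 0 inside).
        #include <cmath>
        #include <cstdio>

        struct V3 { double x, y, z; };
        static V3 add(V3 a, V3 b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }
        static V3 mul(V3 a, double s) { return {a.x*s, a.y*s, a.z*s}; }

        // FRep defining function: positive inside, zero on the surface
        static double f(V3 p) { return 1.0 - (p.x*p.x + p.y*p.y + p.z*p.z); }

        // true (and the hit point) if the ray o + t*d crosses the surface
        static bool intersect(V3 o, V3 d, double tMax, double dt, V3* hit)
        {
            double tPrev = 0.0, fPrev = f(o);
            for (double t = dt; t <= tMax; t += dt) {
                double ft = f(add(o, mul(d, t)));
                if (fPrev * ft <= 0.0) {              // sign change: root bracketed
                    for (int i = 0; i < 32; ++i) {    // bisection refinement
                        double tm = 0.5 * (tPrev + t);
                        double fm = f(add(o, mul(d, tm)));
                        if (fPrev * fm <= 0.0) t = tm;
                        else { tPrev = tm; fPrev = fm; }
                    }
                    *hit = add(o, mul(d, t));
                    return true;
                }
                tPrev = t; fPrev = ft;
            }
            return false;
        }

        // central-difference normal; the FRep gradient points inward, so negate
        static V3 normal(V3 p)
        {
            const double h = 1e-4;
            V3 n = { -(f({p.x+h,p.y,p.z}) - f({p.x-h,p.y,p.z})),
                     -(f({p.x,p.y+h,p.z}) - f({p.x,p.y-h,p.z})),
                     -(f({p.x,p.y,p.z+h}) - f({p.x,p.y,p.z-h})) };
            double len = std::sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
            return mul(n, 1.0 / len);
        }

        int main()
        {
            V3 hit;
            if (intersect({0,0,-3}, {0,0,1}, 10.0, 0.05, &hit)) {  // expect z = -1
                V3 n = normal(hit);
                std::printf("hit (%.3f %.3f %.3f) normal (%.3f %.3f %.3f)\n",
                            hit.x, hit.y, hit.z, n.x, n.y, n.z);
            }
        }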

    Fast Reliable Ray-tracing of Procedurally Defined Implicit Surfaces Using Revised Affine Arithmetic

    Fast and reliable rendering of implicit surfaces is an important area in the field of implicit modelling. Direct rendering, namely ray-tracing, is shown to be a suitable technique for obtaining good-quality visualisations of implicit surfaces. We present a technique for reliable ray-tracing of arbitrary procedurally defined implicit surfaces using a modification of Affine Arithmetic called Revised Affine Arithmetic. A wide range of procedurally defined implicit objects can be rendered using this technique, including polynomial surfaces, constructive solids, pseudo-random objects, procedurally defined microstructures, and others. We compare our technique with other reliable techniques based on Interval and Affine Arithmetic to show that ours provides the fastest ray-surface intersections and ray-tracing while remaining reliable. We also suggest possible modifications of this technique for a GPU implementation, targeting real-time rendering of relatively simple implicit models and near-real-time rendering of complex ones.
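
    The reliability argument can be made concrete with plain interval arithmetic, as below: evaluate an enclosure of the defining function over a parameter interval, safely reject the interval if the enclosure excludes zero, and bisect otherwise. Revised Affine Arithmetic plays exactly this role in the paper but tracks correlations between terms, giving much tighter enclosures (note how the interval square below overestimates); the sketch and its names are illustrative, not the paper's implementation.

        // Hypothetical sketch: reliable ray-surface intersection via intervals.
        #include <algorithm>
        #include <cstdio>

        struct Interval { double lo, hi; };
        static Interval operator+(Interval a, Interval b) { return {a.lo+b.lo, a.hi+b.hi}; }
        static Interval operator-(double s, Interval a)   { return {s - a.hi, s - a.lo}; }
        static Interval operator*(Interval a, Interval b) {
            // a*a overestimates squares straddling 0: the looseness that
            // (Revised) Affine Arithmetic reduces by tracking correlations
            double p[4] = {a.lo*b.lo, a.lo*b.hi, a.hi*b.lo, a.hi*b.hi};
            return {*std::min_element(p, p+4), *std::max_element(p, p+4)};
        }
        static Interval affine(double o, double d, Interval t)   // o + d*t
        {
            return (d >= 0) ? Interval{o + d*t.lo, o + d*t.hi}
                            : Interval{o + d*t.hi, o + d*t.lo};
        }

        // enclosure of f(o + t*d) for the unit sphere f(p) = 1 - |p|^2
        static Interval F(const double o[3], const double d[3], Interval t)
        {
            Interval x = affine(o[0], d[0], t), y = affine(o[1], d[1], t),
                     z = affine(o[2], d[2], t);
            return 1.0 - (x*x + y*y + z*z);
        }

        // first root of F on [t.lo, t.hi], found by reliable bisection
        static bool firstHit(const double o[3], const double d[3], Interval t,
                             double eps, double* tHit)
        {
            Interval e = F(o, d, t);
            if (e.lo > 0.0 || e.hi < 0.0) return false;     // 0 excluded: reject
            if (t.hi - t.lo < eps) { *tHit = t.lo; return true; }
            double tm = 0.5 * (t.lo + t.hi);
            return firstHit(o, d, {t.lo, tm}, eps, tHit)    // nearer half first
                || firstHit(o, d, {tm, t.hi}, eps, tHit);
        }

        int main()
        {
            const double o[3] = {0, 0, -3}, d[3] = {0, 0, 1};
            double t;
            if (firstHit(o, d, {0.0, 10.0}, 1e-6, &t))
                std::printf("first intersection at t = %.6f\n", t);  // ~2.0
        }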

    Accelerated High-Resolution Photoacoustic Tomography via Compressed Sensing

    Current 3D photoacoustic tomography (PAT) systems offer either high image quality or high frame rates but are not able to deliver high spatial and temporal resolution simultaneously, which limits their ability to image dynamic processes in living tissue. A particular example is the planar Fabry-Perot (FP) scanner, which yields high-resolution images but takes several minutes to sequentially map the photoacoustic field on the sensor plane, point-by-point. However, as the spatio-temporal complexity of many absorbing tissue structures is rather low, the data recorded in such a conventional, regularly sampled fashion is often highly redundant. We demonstrate that combining variational image reconstruction methods using spatial sparsity constraints with the development of novel PAT acquisition systems capable of sub-sampling the acoustic wave field can dramatically increase the acquisition speed while maintaining a good spatial resolution. First, we describe and model two general spatial sub-sampling schemes. Then, we discuss how to implement them using the FP scanner and demonstrate the potential of these novel compressed sensing PAT devices through simulated data from a realistic numerical phantom and through measured data from a dynamic experimental phantom as well as from in-vivo experiments. Our results show that images with good spatial resolution and contrast can be obtained from highly sub-sampled PAT data if variational image reconstruction methods that describe the tissue structures with suitable sparsity constraints are used. In particular, we examine the use of total variation regularization enhanced by Bregman iterations. These novel reconstruction strategies offer new opportunities to dramatically increase the acquisition speed of PAT scanners that employ point-by-point sequential scanning, as well as to reduce the channel count of parallelized schemes that use detector arrays.
    Comment: submitted to "Physics in Medicine and Biology".
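
    The variational idea can be illustrated on a toy 1D problem: recover a piecewise-constant signal from 25% of its samples by minimising a least-squares data term plus a smoothed total-variation penalty with gradient descent. The real reconstructions replace point sampling with the photoacoustic forward operator and enhance TV with Bregman iterations; everything below is an illustrative stand-in, not the paper's code.

        // Toy stand-in: sub-sampled recovery with a smoothed TV penalty.
        #include <algorithm>
        #include <cmath>
        #include <cstddef>
        #include <cstdio>
        #include <vector>

        int main()
        {
            const int n = 100;
            std::vector<double> truth(n, 0.0);
            for (int i = 30; i < 60; ++i) truth[i] = 1.0;   // one "absorber"

            // sub-sample: keep every 4th point (25% of the full data)
            std::vector<int> idx;
            for (int i = 0; i < n; i += 4) idx.push_back(i);
            std::vector<double> y(idx.size());
            for (std::size_t k = 0; k < idx.size(); ++k) y[k] = truth[idx[k]];

            // gradient descent on ||A u - y||^2/2 + lambda * sum_j sqrt(dj^2 + eps)
            std::vector<double> u(n, 0.0), g(n);
            const double lambda = 0.05, eps = 1e-3, step = 0.2;
            for (int it = 0; it < 5000; ++it) {
                std::fill(g.begin(), g.end(), 0.0);
                // data term: A is point sampling, so A^T(Au - y) only
                // touches the sampled entries
                for (std::size_t k = 0; k < idx.size(); ++k)
                    g[idx[k]] += u[idx[k]] - y[k];
                // smoothed total variation term
                for (int i = 0; i + 1 < n; ++i) {
                    const double d = u[i + 1] - u[i];
                    const double w = d / std::sqrt(d * d + eps);
                    g[i]     -= lambda * w;
                    g[i + 1] += lambda * w;
                }
                for (int i = 0; i < n; ++i) u[i] -= step * g[i];
            }

            double mse = 0.0;
            for (int i = 0; i < n; ++i)
                mse += (u[i] - truth[i]) * (u[i] - truth[i]);
            std::printf("MSE from 25%% of the samples: %.4f\n", mse / n);
        }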