967 research outputs found

    BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images

    In cryo-electron microscopy (EM), molecular structures are determined from large numbers of projection images of individual particles. To harness the full power of this single-molecule information, we use the Bayesian inference of EM (BioEM) formalism. By ranking structural models using posterior probabilities calculated for individual images, BioEM in principle addresses the challenge of working with highly dynamic or heterogeneous systems not easily handled in traditional EM reconstruction. However, the calculation of these posteriors for large numbers of particles and models is computationally demanding. Here we present highly parallelized, GPU-accelerated computer software that performs this task efficiently. Our flexible formulation employs CUDA, OpenMP, and MPI parallelization combined with both CPU and GPU computing. The resulting BioEM software scales nearly ideally both on pure CPU and on CPU+GPU architectures, thus enabling Bayesian analysis of tens of thousands of images in a reasonable time. The general mathematical framework and robust algorithms are not limited to cryo-electron microscopy but can be generalized for electron tomography and other imaging experiments
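The ranking step described in this abstract can be illustrated with a toy sketch. It is not the BioEM code: the one-dimensional "images", the Gaussian noise model, the cyclic shift used as the only nuisance parameter, and the names `log_evidence` and `rank_models` are all assumptions made for illustration. Each model's score is the sum over images of the log-likelihood marginalized over the nuisance parameter, and models are sorted by that score under a uniform model prior.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_likelihood(image, template, sigma):
    """Log-probability of an image given a template, assuming i.i.d. Gaussian pixel noise."""
    squared_error = sum((a - b) ** 2 for a, b in zip(image, template))
    normalisation = len(image) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -squared_error / (2.0 * sigma ** 2) - normalisation


def shifted(template, k):
    """Cyclically shift a list to the right by k positions."""
    k %= len(template)
    return template[-k:] + template[:-k] if k else list(template)


def log_evidence(image, template, sigma):
    """Marginalise the likelihood over all cyclic shifts (uniform prior on the shift)."""
    n = len(template)
    terms = [log_likelihood(image, shifted(template, k), sigma) for k in range(n)]
    return log_sum_exp(terms) - math.log(n)


def rank_models(images, models, sigma=0.5):
    """Return (name, total log-evidence) pairs, best model first (uniform model prior)."""
    scores = {
        name: sum(log_evidence(image, template, sigma) for image in images)
        for name, template in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_models = {"peak": [0.0, 0.0, 1.0, 0.0], "flat": [0.25, 0.25, 0.25, 0.25]}
    toy_images = [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
    for name, score in rank_models(toy_images, toy_models):
        print(name, round(score, 3))
```

The per-image terms are independent of each other, which is the property that makes this kind of calculation easy to distribute over many cores or GPUs.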

    Efficient algorithms for the fast computation of space charge effects caused by charged particles in particle accelerators

    In this dissertation, a Poisson solver is improved in three ways: an efficient integrated Green's function; a discrete cosine transform of the efficient integrated Green's function values; and an implicitly zero-padded fast Fourier transform for the charge density. In addition, high-performance computing technologies are used to improve efficiency further, including OpenMP, OpenMP+CUDA, MPI, and MPI+OpenMP parallelizations. The examples and simulation results are compared with those of a commonly used Poisson solver to demonstrate the accuracy of the approach
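The role of zero padding in an FFT-based open-boundary solver can be shown with a small one-dimensional sketch. It rests on several assumptions: the padding is explicit (the dissertation describes an implicitly zero-padded transform), the kernel passed as `green` is a stand-in rather than the integrated Green's function, and the function names are hypothetical. Padding the charge density to twice its length turns the circular convolution computed by the FFT into the linear convolution the open-boundary problem needs.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def open_boundary_potential(rho, green):
    """phi[i] = sum_j green(|i - j|) * rho[j], computed with zero-padded FFTs."""
    n = len(rho)              # assumed to be a power of two
    m = 2 * n                 # padded length
    rho_pad = list(rho) + [0.0] * n
    g_pad = [0.0] * m
    for d in range(n):
        g_pad[d] = green(d)           # non-negative offsets
        if d > 0:
            g_pad[m - d] = green(d)   # negative offsets wrap around (kernel is even)
    product = [a * b for a, b in zip(fft(rho_pad), fft(g_pad))]
    phi = fft(product, inverse=True)
    return [value.real / m for value in phi[:n]]  # divide by m to normalise the inverse


def direct_potential(rho, green):
    """O(n^2) reference sum used to check the FFT result."""
    n = len(rho)
    return [sum(green(abs(i - j)) * rho[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    density = [1.0, 0.0, 2.0, -1.0]
    kernel = lambda d: 1.0 / (d + 1.0)   # toy kernel, an assumption
    print(open_boundary_potential(density, kernel))
    print(direct_potential(density, kernel))
```

The two printed lists should agree to rounding error; without the padding, charge near one end of the grid would wrongly interact with the other end.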

    GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization

    X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical concern in most image guided radiation therapy procedures. It is the goal of this paper to develop a fast GPU-based algorithm to reconstruct high quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. For this purpose, we have developed an iterative tight frame (TF) based CBCT reconstruction algorithm. A condition that a real CBCT image has a sparse representation under a TF basis is imposed in the iteration process as regularization to the solution. To speed up the computation, a multi-grid method is employed. Our GPU implementation has achieved high computational efficiency and a CBCT image of resolution 512×512×70 can be reconstructed in ~5 min. We have tested our algorithm on a digital NCAT phantom and a physical Catphan phantom. It is found that our TF-based algorithm is able to reconstruct CBCT in the context of undersampling and low mAs levels. We have also quantitatively analyzed the reconstructed CBCT image quality in terms of modulation-transfer-function and contrast-to-noise ratio under various scanning conditions. The results confirm the high CBCT image quality obtained from our TF algorithm. Moreover, our algorithm has also been validated in a real clinical context using a head-and-neck patient case. Comparisons of the developed TF algorithm and the current state-of-the-art TV algorithm have also been made in various cases studied in terms of reconstructed image quality and computation efficiency. Comment: 24 pages, 8 figures, accepted by Phys. Med. Bio

    A scalable, efficient scheme for evaluation of stencil computations over unstructured meshes

    Pre-print. Stencil computations are a common class of operations that appear in many computational scientific and engineering applications. Stencil computations often benefit from compile-time analysis, exploiting data-locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is an example of a numerical method which requires evaluating computationally intensive stencil operations over a mesh. Previous work on stencil computations has focused on structured meshes, while giving little attention to unstructured meshes. Performing stencil operations over an unstructured mesh requires sampling of heterogeneous elements which often leads to inefficient memory access patterns and limits data locality/reuse. In this paper, we present an efficient method for performing stencil computations over unstructured meshes which increases data-locality and cache efficiency, and a scalable approach for stencil tiling and concurrent execution. We provide experimental results in the context of post-processing of dG solutions that demonstrate the effectiveness of our approach

    The State of the Art in Flow Visualization: Dense and Texture-Based Techniques

    Flow visualization has been a very attractive component of scientific visualization research for a long time. Usually very large multivariate datasets require processing. These datasets often consist of a large number of sample locations and several time steps. The steadily increasing performance of computers has recently become a driving factor in the reemergence of flow visualization research, especially in texture-based techniques. In this paper, dense, texture-based flow visualization techniques are discussed. This class of techniques attempts to provide a complete, dense representation of the flow field with high spatio-temporal coherency. An attempt at categorizing closely related solutions is also presented. Fundamentals are briefly addressed, as are the advantages and disadvantages of the methods. Categories and Subject Descriptors (according to ACM CCS): I.3 [Computer Graphics]: visualization, flow visualization, computational flow visualization

    Three-Dimensional Photoacoustic Computed Tomography: Imaging Models and Reconstruction Algorithms

    Photoacoustic computed tomography (PACT), also known as optoacoustic tomography, is a rapidly emerging imaging modality that holds great promise for a wide range of biomedical imaging applications. Much effort has been devoted to the investigation of imaging physics and the optimization of experimental designs. Meanwhile, a variety of image reconstruction algorithms have been developed for the purpose of computed tomography. Most of these algorithms assume full knowledge of the acoustic pressure function on a measurement surface that either encloses the object or extends to infinity, which poses many difficulties for practical applications. To overcome these limitations, iterative image reconstruction algorithms have been actively investigated. However, little work has been conducted on imaging models that incorporate the characteristics of data acquisition systems. Moreover, when applied to experimental data, most studies simplify the inherent three-dimensional wave propagation to two-dimensional imaging models by introducing heuristic assumptions on the transducer responses and/or the object structures. One important reason is that three-dimensional image reconstruction is computationally burdensome. The inaccurate imaging models severely limit the performance of iterative image reconstruction algorithms in practice. In the dissertation, we propose a framework to construct imaging models that incorporate the characteristics of ultrasonic transducers. Based on the imaging models, we systematically investigate various iterative image reconstruction algorithms, including advanced algorithms that employ total variation-norm regularization. In order to accelerate three-dimensional image reconstruction, we develop parallel implementations on graphics processing units. In addition, we derive a fast Fourier-transform based analytical image reconstruction formula. By use of iterative image reconstruction algorithms based on the proposed imaging models, PACT imaging scanners can have a compact size while maintaining high spatial resolution. The research demonstrates, for the first time, the feasibility and advantages of iterative image reconstruction algorithms in three-dimensional PACT

    Doctor of Philosophy

    Dissertation. Memory access irregularities are a major bottleneck for bandwidth limited problems on Graphics Processing Unit (GPU) architectures. GPU memory systems are designed to allow consecutive memory accesses to be coalesced into a single memory access. Noncontiguous accesses within a parallel group of threads working in lock step may cause serialized memory transfers. Irregular algorithms may have data-dependent control flow and memory access, which requires runtime information to be evaluated. Compile time methods for evaluating parallelism, such as static dependence graphs, are not capable of evaluating irregular algorithms. The goals of this dissertation are to study irregularities within the context of unstructured mesh and sparse matrix problems, analyze the impact of vectorization widths on irregularities, and present data-centric methods that improve control flow and memory access irregularity within those contexts. Reordering associative operations has often been exploited for performance gains in parallel algorithms. This dissertation presents a method for associative reordering of stencil computations over unstructured meshes that increases data reuse through caching. This novel parallelization scheme offers considerable speedups over standard methods. Vectorization widths can have significant impact on performance in vectorized computations. Although the hardware vector width is generally fixed, the logical vector width used within a computation can range from one up to the width of the computation. Significant performance differences can occur due to thread scheduling and resource limitations. This dissertation analyzes the impact of vectorization widths on dense numerical computations such as 3D dG postprocessing. It is difficult to efficiently perform dynamic updates on traditional sparse matrix formats. Explicitly controlling memory segmentation allows for in-place dynamic updates in sparse matrices. Dynamically updating the matrix without rebuilding or sorting greatly improves processing time and overall throughput. This dissertation presents a new sparse matrix format, dynamic compressed sparse row (DCSR), which allows for dynamic streaming updates to a sparse matrix. A new method for parallel sparse matrix-matrix multiplication (SpMM) that uses dynamic updates is also presented
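The idea of in-place updates through explicit memory segmentation can be sketched in a few lines. This is a simplified illustration, not the DCSR format from the dissertation: the class name `SlackCSR`, the per-row slack, and the policy of relocating a full row to a larger segment at the end of the arrays are all assumptions. Each row owns a segment of the column and value arrays with spare capacity, so most insertions touch only that segment and never rebuild or re-sort the matrix.

```python
class SlackCSR:
    """CSR-like sparse matrix whose rows own segments with spare capacity (illustrative only)."""

    def __init__(self, n_rows, slack=4):
        self.start = [r * slack for r in range(n_rows)]   # first slot of each row's segment
        self.length = [0] * n_rows                        # used slots per row
        self.capacity = [slack] * n_rows                  # total slots per row
        self.cols = [-1] * (n_rows * slack)               # column indices (-1 marks an empty slot)
        self.vals = [0.0] * (n_rows * slack)              # stored values

    def add(self, row, col, value):
        """Accumulate value into entry (row, col), inserting it in place if it is new."""
        s, n = self.start[row], self.length[row]
        for k in range(s, s + n):
            if self.cols[k] == col:
                self.vals[k] += value
                return
        if n == self.capacity[row]:
            self._grow(row)
            s = self.start[row]
        self.cols[s + n] = col
        self.vals[s + n] = value
        self.length[row] = n + 1

    def _grow(self, row):
        """Move a full row to a new segment of twice the capacity at the end of the arrays."""
        s, n = self.start[row], self.length[row]
        new_capacity = 2 * self.capacity[row]
        new_start = len(self.cols)
        self.cols.extend(self.cols[s:s + n] + [-1] * (new_capacity - n))
        self.vals.extend(self.vals[s:s + n] + [0.0] * (new_capacity - n))
        self.start[row] = new_start
        self.capacity[row] = new_capacity

    def matvec(self, x):
        """Return y = A @ x using only the used part of each row segment."""
        return [
            sum(self.vals[k] * x[self.cols[k]] for k in range(s, s + n))
            for s, n in zip(self.start, self.length)
        ]


if __name__ == "__main__":
    a = SlackCSR(n_rows=2, slack=1)
    a.add(0, 0, 1.0)
    a.add(0, 1, 2.0)   # row 0 is full, so it is relocated to a larger segment
    a.add(1, 1, 3.0)
    print(a.matvec([1.0, 1.0]))
```

The old segment of a relocated row is simply abandoned here; a practical format would need some way to reclaim or bound that wasted space.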

    Doctor of Philosophy

    Dissertation. Visualizing surfaces is a fundamental technique in computer science and is frequently used across a wide range of fields such as computer graphics, biology, engineering, and scientific visualization. In many cases, visualizing an interface between boundaries can provide meaningful analysis or simplification of complex data. Some examples include physical simulation for animation, multimaterial mesh extraction in biophysiology, flow on airfoils in aeronautics, and integral surfaces. However, the quest for high-quality visualization, coupled with increasingly complex data, comes with a high computational cost. Therefore, new techniques are needed to solve surface visualization problems within a reasonable amount of time while also providing sophisticated visuals that are meaningful to scientists and engineers. In this dissertation, novel techniques are presented to facilitate surface visualization. First, a particle system for mesh extraction is parallelized on the graphics processing unit (GPU) with a red-black update scheme to achieve an order of magnitude speed-up over a central processing unit (CPU) implementation. Next, extending the red-black technique to multiple materials showed inefficiencies on the GPU. Therefore, we borrow the underlying data structure from the closest point method, the closest point embedding, and the particle system solver is switched to a hierarchical octree-based approach on the GPU. Third, to demonstrate that the closest point embedding is a fast, flexible data structure for surface particles, it is adapted to unsteady surface flow visualization at near-interactive speeds. Finally, the closest point embedding is a three-dimensional dense structure that does not scale well. Therefore, we introduce a closest point sparse octree that allows the closest point embedding to scale to higher resolution. Further, we demonstrate unsteady line integral convolution using the closest point method
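The red-black update scheme mentioned in this abstract can be illustrated on a simpler problem. The sketch below applies it to Gauss-Seidel relaxation of Laplace's equation on a regular grid, which is an assumption chosen for brevity; the dissertation applies the scheme to a particle system, and the function names here are hypothetical. Grid points are colored like a checkerboard, and every point of one color depends only on points of the other color, so all points of a color could be updated concurrently.

```python
def red_black_sweep(u):
    """One Gauss-Seidel sweep over the interior of grid u, one color at a time."""
    rows, cols = len(u), len(u[0])
    for color in (0, 1):                      # 0 = "red" points, 1 = "black" points
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                if (i + j) % 2 == color:
                    # All four neighbours have the opposite color, so updates
                    # within this color pass are independent of each other.
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return u


def solve_laplace(u, sweeps=200):
    """Relax the interior of u towards a harmonic function; boundary values stay fixed."""
    for _ in range(sweeps):
        red_black_sweep(u)
    return u


if __name__ == "__main__":
    n = 5
    # Boundary held at 1.0, interior initialised to 0.0.
    grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
            for i in range(n)]
    solve_laplace(grid)
    for row in grid:
        print([round(value, 4) for value in row])
```

On a GPU the two inner loops of each color pass would map onto threads, with a synchronization point between the red pass and the black pass.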