3,323 research outputs found

    GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization

    Full text link
    X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical concern in most image guided radiation therapy procedures. It is the goal of this paper to develop a fast GPU-based algorithm to reconstruct high quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. For this purpose, we have developed an iterative tight frame (TF) based CBCT reconstruction algorithm. A condition that a real CBCT image has a sparse representation under a TF basis is imposed in the iteration process as regularization to the solution. To speed up the computation, a multi-grid method is employed. Our GPU implementation has achieved high computational efficiency and a CBCT image of resolution 512\times512\times70 can be reconstructed in ~5 min. We have tested our algorithm on a digital NCAT phantom and a physical Catphan phantom. It is found that our TF-based algorithm is able to reconstrct CBCT in the context of undersampling and low mAs levels. We have also quantitatively analyzed the reconstructed CBCT image quality in terms of modulation-transfer-function and contrast-to-noise ratio under various scanning conditions. The results confirm the high CBCT image quality obtained from our TF algorithm. Moreover, our algorithm has also been validated in a real clinical context using a head-and-neck patient case. Comparisons of the developed TF algorithm and the current state-of-the-art TV algorithm have also been made in various cases studied in terms of reconstructed image quality and computation efficiency.Comment: 24 pages, 8 figures, accepted by Phys. Med. Bio

    GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

    Full text link
    High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) graph problems having low arithmetic intensity. To address some of these challenges, GraphBLAS is an innovative, on-going effort by the graph analytics community to propose building blocks based on sparse linear algebra, which will allow graph algorithms to be expressed in a performant, succinct, composable and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push and pull direction. Exploiting output sparsity allows users to tell the backend which values of the output in a single vectorized computation they do not want computed. Load-balancing is an important feature for balancing work amongst parallel workers. We describe the important load-balancing features for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first high-performance linear algebra-based graph framework on NVIDIA GPUs that is open-source. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.Comment: 50 pages, 14 figures, 14 table

    High performance interior point methods for three-dimensional finite element limit analysis

    Get PDF
    The ability to obtain rigorous upper and lower bounds on collapse loads of various structures makes finite element limit analysis an attractive design tool. The increasingly high cost of computing those bounds, however, has limited its application on problems in three dimensions. This work reports on a high-performance homogeneous self-dual primal-dual interior point method developed for three-dimensional finite element limit analysis. This implementation achieves convergence times over 4.5× faster than the leading commercial solver across a set of three-dimensional finite element limit analysis test problems, making investigation of three dimensional limit loads viable. A comparison between a range of iterative linear solvers and direct methods used to determine the search direction is also provided, demonstrating the superiority of direct methods for this application. The components of the interior point solver considered include the elimination of and options for handling remaining free variables, multifrontal and supernodal Cholesky comparison for computing the search direction, differences between approximate minimum degree [1] and nested dissection [13] orderings, dealing with dense columns and fixed variables, and accelerating the linear system solver through parallelization. Each of these areas resulted in an improvement on at least one of the problems in the test set, with many achieving gains across the whole set. The serial implementation achieved runtime performance 1.7× faster than the commercial solver Mosek [5]. Compared with the parallel version of Mosek, the use of parallel BLAS routines in the supernodal solver saw a 1.9× speedup, and with a modified version of the GPU-enabled CHOLMOD [11] and a single NVIDIA Tesla K20c this speedup increased to 4.65×

    Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

    Full text link
    In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculation using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones.Comment: 44 pages. 1 of USQCD whitepapers
    corecore