GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization
X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical
concern in most image guided radiation therapy procedures. It is the goal of
this paper to develop a fast GPU-based algorithm to reconstruct high quality
CBCT images from undersampled and noisy projection data so as to lower the
imaging dose. For this purpose, we have developed an iterative tight frame (TF)
based CBCT reconstruction algorithm. A condition that a real CBCT image has a
sparse representation under a TF basis is imposed in the iteration process as
regularization to the solution. To speed up the computation, a multi-grid
method is employed. Our GPU implementation has achieved high computational
efficiency and a CBCT image of resolution 512×512×70 can be
reconstructed in ~5 min. We have tested our algorithm on a digital NCAT phantom
and a physical Catphan phantom. It is found that our TF-based algorithm is able
to reconstruct CBCT images in the context of undersampling and low mAs levels. We have
also quantitatively analyzed the reconstructed CBCT image quality in terms of
modulation-transfer-function and contrast-to-noise ratio under various scanning
conditions. The results confirm the high CBCT image quality obtained from our
TF algorithm. Moreover, our algorithm has also been validated in a real
clinical context using a head-and-neck patient case. Comparisons of the
developed TF algorithm and the current state-of-the-art TV algorithm have also
been made in various cases studied in terms of reconstructed image quality and
computational efficiency. Comment: 24 pages, 8 figures, accepted by Phys. Med. Bio
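The regularization idea in the abstract above, a data-fidelity update followed by enforcing sparsity of the transform coefficients, can be illustrated with a small sketch. This is not the paper's algorithm or its GPU code: it assumes a generic dense system matrix `A` in place of the cone-beam projector, a 1-D orthonormal Haar transform as a stand-in for the tight frame, and a plain proximal-gradient loop without the multi-grid acceleration. The names `haar_matrix`, `soft_threshold`, and `reconstruct` are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix; n must be a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    averages = np.kron(h, [1.0, 1.0])               # coarser-scale rows
    details = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-scale differences
    return np.vstack([averages, details]) / np.sqrt(2.0)

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t (proximal map of the l1 norm)."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def reconstruct(A, b, lam=0.01, n_iter=200):
    """Approximately minimize 0.5*||A x - b||^2 + lam*||W x||_1.

    W is an orthonormal Haar transform (an assumption standing in for
    the tight frame), so thresholding in the transform domain is the
    exact proximal step.
    """
    n = A.shape[1]
    W = haar_matrix(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (largest singular value)^2
    x = np.zeros(n)
    for _ in range(n_iter):
        x = x + step * (A.T @ (b - A @ x))           # data-fidelity update
        x = W.T @ soft_threshold(W @ x, step * lam)  # sparsity regularization
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((32, 64))  # 32 measurements, 64 unknowns
    x_true = np.zeros(64)
    x_true[16:40] = 1.0                # piecewise-constant test signal
    x = reconstruct(A, A @ x_true)
    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

The toy problem is undersampled (fewer measurements than unknowns), which is the regime where the sparsity prior matters; the weight `lam` and the iteration count are arbitrary choices for illustration.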
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance graph algorithms are difficult to implement on new parallel
hardware such as GPUs for three reasons:
(1) the difficulty of coming up with graph building blocks, (2) load imbalance
on parallel hardware, and (3) graph problems having low arithmetic intensity.
To address some of these challenges, GraphBLAS is an innovative, on-going
effort by the graph analytics community to propose building blocks based on
sparse linear algebra, which will allow graph algorithms to be expressed in a
performant, succinct, composable and portable manner. In this paper, we examine
the performance challenges of a linear-algebra-based approach to building graph
frameworks and describe new design principles for overcoming these bottlenecks.
Among the new design principles is exploiting input sparsity, which allows
users to write graph algorithms without specifying push and pull direction.
Exploiting output sparsity allows users to tell the backend which values of the
output in a single vectorized computation they do not want computed.
Load-balancing is an important feature for balancing work amongst parallel
workers. We describe the important load-balancing features for handling graphs
with different characteristics. The design principles described in this paper
have been implemented in "GraphBLAST", the first high-performance linear
algebra-based graph framework on NVIDIA GPUs that is open-source. The results
show that on a single GPU, GraphBLAST has on average at least an order of
magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL,
comparable performance to the fastest GPU hardwired primitives and
shared-memory graph frameworks Ligra and Gunrock, and better performance than
any other GPU graph framework, while offering a simpler and more concise
programming model. Comment: 50 pages, 14 figures, 14 table
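The linear-algebra view of graph traversal described in the abstract above can be sketched with a breadth-first search in which each step is a Boolean matrix-vector product, restricted by a mask to the vertices that have not been visited yet. This is only an illustration in dense NumPy, not the GraphBLAST or GraphBLAS API; the function name `bfs_levels` and the convention that `adj[i, j]` marks an edge from `i` to `j` are assumptions.

```python
import numpy as np

def bfs_levels(adj, source):
    """Return the BFS depth of every vertex (-1 if unreachable).

    adj is a dense Boolean adjacency matrix with adj[i, j] == True
    for an edge i -> j (an assumed convention for this sketch).
    """
    n = adj.shape[0]
    levels = np.full(n, -1, dtype=int)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    depth = 0
    while frontier.any():
        levels[frontier] = depth
        # Mask: only unvisited vertices can enter the next frontier,
        # so output entries outside the mask are never computed.
        unvisited = levels == -1
        reached = np.zeros(n, dtype=bool)
        # Boolean matrix-vector product restricted to frontier rows
        # (sparse input) and unvisited columns (masked output).
        reached[unvisited] = adj[np.ix_(frontier, unvisited)].any(axis=0)
        frontier = reached
        depth += 1
    return levels

if __name__ == "__main__":
    adj = np.zeros((5, 5), dtype=bool)
    adj[0, 1] = adj[1, 2] = adj[0, 3] = True  # vertex 4 is isolated
    print(bfs_levels(adj, 0))
```

Restricting the product to frontier rows mirrors the idea of exploiting input sparsity, and the unvisited mask mirrors output sparsity; a real framework would do both on sparse storage rather than dense arrays.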
High performance interior point methods for three-dimensional finite element limit analysis
The ability to obtain rigorous upper and lower bounds on collapse loads of various structures makes finite element limit analysis an attractive design tool. The increasingly high cost of computing those bounds, however, has limited its application to problems in three dimensions. This work reports on a high-performance homogeneous self-dual primal-dual interior point method developed for three-dimensional finite element limit analysis. This implementation achieves convergence times over 4.5× faster than the leading commercial solver across a set of three-dimensional finite element limit analysis test problems, making investigation of three-dimensional limit loads viable. A comparison between a range of iterative linear solvers and direct methods used to determine the search direction is also provided, demonstrating the superiority of direct methods for this application. The components of the interior point solver considered include the elimination of free variables and options for handling those that remain, a comparison of multifrontal and supernodal Cholesky factorizations for computing the search direction, differences between approximate minimum degree [1] and nested dissection [13] orderings, dealing with dense columns and fixed variables, and accelerating the linear system solver through parallelization. Each of these areas resulted in an improvement on at least one of the problems in the test set, with many achieving gains across the whole set. The serial implementation achieved runtime performance 1.7× faster than the commercial solver Mosek [5]. Compared with the parallel version of Mosek, the use of parallel BLAS routines in the supernodal solver saw a 1.9× speedup, and with a modified version of the GPU-enabled CHOLMOD [11] and a single NVIDIA Tesla K20c this speedup increased to 4.65×.
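To make the role of the Cholesky factorization concrete, the following sketch solves a symmetric positive definite system of the normal-equations form A D Aᵀ Δy = r, which is how a direct method typically produces the search direction in a linear-programming-style interior point iteration. It is an assumption-laden illustration, not the solver from the paper: the matrices are small and dense, there is no fill-reducing ordering, no supernodal or multifrontal structure, and the conic, homogeneous self-dual details are omitted. The name `normal_equations_step` is hypothetical.

```python
import numpy as np

def normal_equations_step(A, d, r):
    """Solve (A diag(d) A^T) dy = r with a dense Cholesky factorization.

    Assumes A has full row rank and every entry of d is positive,
    so the system matrix is symmetric positive definite.
    """
    M = (A * d) @ A.T          # A * d scales column j of A by d[j]
    L = np.linalg.cholesky(M)  # M = L L^T with L lower triangular
    # Two solves with the triangular factors. np.linalg.solve is a
    # general solver and does not exploit the triangular structure;
    # it is used here only to keep the sketch NumPy-only.
    z = np.linalg.solve(L, r)
    return np.linalg.solve(L.T, z)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 6))
    d = rng.uniform(0.5, 2.0, size=6)  # positive scaling from the current iterate
    r = rng.standard_normal(3)
    dy = normal_equations_step(A, d, r)
    print("residual norm:", np.linalg.norm((A * d) @ A.T @ dy - r))
```

In a sparse three-dimensional problem the cost is dominated by the factorization, which is why the ordering and the choice of supernodal or multifrontal factorization discussed in the abstract matter.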
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones. Comment: 44 pages. 1 of USQCD whitepapers