Search CORE

6,827 research outputs found

Theano: new features and speed improvements

Author: Bastien Frédéric
Bengio Yoshua
Bergeron Arnaud
Bergstra James
Bouchard Nicolas
Goodfellow Ian
Lamblin Pascal
Pascanu Razvan
Warde-Farley David
Publication venue
Publication date: 23/11/2012
Field of study

Theano is a linear algebra compiler that optimizes a user's symbolically-specified mathematical computations to produce efficient low-level implementations. In this paper, we present new features and efficiency improvements to Theano, and benchmarks demonstrating Theano's performance relative to Torch7, a recently introduced machine learning library, and to RNNLM, a C++ library targeted at recurrent neural networks.Comment: Presented at the Deep Learning Workshop, NIPS 201

arXiv.org e-Print Archive

CiteSeerX

Gunrock: GPU Graph Analytics

Author: Davidson Andrew
Liu Weitang
Osama Muhammad
Owens John D.
Pan Yuechao
Riffel Andy T.
Wang Leyuan
Wang Yangzihao
Wu Yuduo
Yang Carl
Yuan Chenshan
Publication venue
Publication date: 04/01/2017
Field of study

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU

arXiv.org e-Print Archive

eScholarship - University of California

FigShare

Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

Author: Cooper Lee
Kong Jun
Kurc Tahsin
Pan Tony
Saltz Joel
Teodoro George
Publication venue
Publication date: 14/09/2012
Field of study

In this paper, we address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50x and 85x with respect to single core CPU executions for morphological reconstruction and euclidean distance transform, respectively.Comment: 37 pages, 16 figure

arXiv.org e-Print Archive

CiteSeerX

Gunrock: A High-Performance Graph Processing Library on the GPU

Author: Cederman D.
Goel A.
Gonzalez J. E.
Gregor D.
Jia Y.
Low Y.
Pande P. R.
Siek J. G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/01/2016
Field of study

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization

Author: Bin Dong
Cai J
Cho S
Dong B
Gu X
Gu X
Han G Liang Z You J
Hestenes M R
Jacobs F
Jia X
Li M
Men C
Men C H
Meyer Y
NVIDIA
Sharp G C
Shen Z W
Shen Z W Toh K C Yun S
Sidky E Y
Sidky E Y
Steve B Jiang
Tang J
Xu F
Xun Jia
Yan G R
Yifei Lou
Publication venue: 'IOP Publishing'
Publication date: 05/05/2011
Field of study

X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical concern in most image guided radiation therapy procedures. It is the goal of this paper to develop a fast GPU-based algorithm to reconstruct high quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. For this purpose, we have developed an iterative tight frame (TF) based CBCT reconstruction algorithm. A condition that a real CBCT image has a sparse representation under a TF basis is imposed in the iteration process as regularization to the solution. To speed up the computation, a multi-grid method is employed. Our GPU implementation has achieved high computational efficiency and a CBCT image of resolution 512\times512\times70 can be reconstructed in ~5 min. We have tested our algorithm on a digital NCAT phantom and a physical Catphan phantom. It is found that our TF-based algorithm is able to reconstrct CBCT in the context of undersampling and low mAs levels. We have also quantitatively analyzed the reconstructed CBCT image quality in terms of modulation-transfer-function and contrast-to-noise ratio under various scanning conditions. The results confirm the high CBCT image quality obtained from our TF algorithm. Moreover, our algorithm has also been validated in a real clinical context using a head-and-neck patient case. Comparisons of the developed TF algorithm and the current state-of-the-art TV algorithm have also been made in various cases studied in terms of reconstructed image quality and computation efficiency.Comment: 24 pages, 8 figures, accepted by Phys. Med. Bio

arXiv.org e-Print Archive

Crossref

Graphics processing unit accelerating compressed sensing photoacoustic computed tomography with total variation

Author: Bai Yuanyuan
Gao Mingjie
Liu Chengbo
Meng Jing
Si Guangtao
Wang Lihong V.
Publication venue: Optical Society of America
Publication date: 20/01/2020
Field of study

Photoacoustic computed tomography with compressed sensing (CS-PACT) is a commonly used imaging strategy for sparse-sampling PACT. However, it is very time-consuming because of the iterative process involved in the image reconstruction. In this paper, we present a graphics processing unit (GPU)-based parallel computation framework for total-variation-based CS-PACT and adapted into a custom-made PACT system. Specifically, five compute-intensive operators are extracted from the iteration algorithm and are redesigned for parallel performance on a GPU. We achieved an image reconstruction speed 24–31 times faster than the CPU performance. We performed in vivo experiments on human hands to verify the feasibility of our developed method

Caltech Authors

Blur resolved OCT: full-range interferometric synthetic aperture microscopy through dispersion encoding

Author: A.M. Wing
F.E. Pollick
G.J.P. Savelsbergh
J.B.J. Smeets
M. Jeannerod
M. Jeannerod
N. Teasdale
T.G. Fikes
Publication venue: 'The Optical Society'
Publication date: 01/01/2001
Field of study

We present a computational method for full-range interferometric synthetic aperture microscopy (ISAM) under dispersion encoding. With this, one can effectively double the depth range of optical coherence tomography (OCT), whilst dramatically enhancing the spatial resolution away from the focal plane. To this end, we propose a model-based iterative reconstruction (MBIR) method, where ISAM is directly considered in an optimization approach, and we make the discovery that sparsity promoting regularization effectively recovers the full-range signal. Within this work, we adopt an optimal nonuniform discrete fast Fourier transform (NUFFT) implementation of ISAM, which is both fast and numerically stable throughout iterations. We validate our method with several complex samples, scanned with a commercial SD-OCT system with no hardware modification. With this, we both demonstrate full-range ISAM imaging, and significantly outperform combinations of existing methods.Comment: 17 pages, 7 figures. The images have been compressed for arxiv - please follow DOI for full resolutio

arXiv.org e-Print Archive

CiteSeerX

Crossref

Edinburgh Research Explorer

Enlighten