451 research outputs found
ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere
We describe a hybrid Fourier/direct space convolution algorithm for compact
radial (azimuthally symmetric) kernels on the sphere. For high resolution maps
covering a large fraction of the sky, our implementation takes advantage of the
inexpensive massive parallelism afforded by consumer graphics processing units
(GPUs). Applications involve modeling of instrumental beam shapes in terms of
compact kernels, computation of fine-scale wavelet transformations, and optimal
filtering for the detection of point sources. Our algorithm works for any
pixelization where pixels are grouped into isolatitude rings. Even for kernels
that are not bandwidth limited, ringing features are completely absent on an
ECP grid. We demonstrate that they can be highly suppressed on the popular
HEALPix pixelization, for which we develop a freely available implementation of
the algorithm. As an example application, we show that running on a high-end
consumer graphics card our method speeds up beam convolution for simulations of
a characteristic Planck high frequency instrument channel by two orders of
magnitude compared to the commonly used HEALPix implementation on one CPU core
while maintaining at typical a fractional RMS accuracy of about 1 part in 10^5.Comment: 10 pages, 6 figures. Submitted to Astronomy and Astrophysics.
Replaced to match published version. Code can be downloaded at
https://github.com/elsner/arkco
Evaluation of GPU/CPU Co-Processing Models for JPEG 2000 Packetization
With the bottom-line goal of increasing the
throughput of a GPU-accelerated JPEG 2000 encoder, this paper
evaluates whether the post-compression rate control and
packetization routines should be carried out on the CPU or on
the GPU. Three co-processing models that differ in how the
workload is split among the CPU and GPU are introduced. Both
routines are discussed and algorithms for executing them in
parallel are presented. Experimental results for compressing a
detail-rich UHD sequence to 4 bits/sample indicate speed-ups of
200x for the rate control and 100x for the packetization
compared to the single-threaded implementation in the
commercial Kakadu library. These two routines executed on the
CPU take 4x as long as all remaining coding steps on the GPU
and therefore present a bottleneck. Even if the CPU bottleneck
could be avoided with multi-threading, it is still beneficial to
execute all coding steps on the GPU as this minimizes the
required device-to-host transfer and thereby speeds up the
critical path from 17.2 fps to 19.5 fps for 4 bits/sample and to
22.4 fps for 0.16 bits/sample
Using hybrid GPU/CPU kernel splitting to accelerate spherical convolutions
We present a general method for accelerating by more than an order of
magnitude the convolution of pixelated functions on the sphere with a
radially-symmetric kernel. Our method splits the kernel into a compact
real-space component and a compact spherical harmonic space component. These
components can then be convolved in parallel using an inexpensive commodity GPU
and a CPU. We provide models for the computational cost of both real-space and
Fourier space convolutions and an estimate for the approximation error. Using
these models we can determine the optimum split that minimizes the wall clock
time for the convolution while satisfying the desired error bounds. We apply
this technique to the problem of simulating a cosmic microwave background (CMB)
anisotropy sky map at the resolution typical of the high resolution maps
produced by the Planck mission. For the main Planck CMB science channels we
achieve a speedup of over a factor of ten, assuming an acceptable fractional
rms error of order 1.e-5 in the power spectrum of the output map.Comment: 9 pages, 11 figures, 1 table, accepted by Astronomy & Computing w/
minor revisions. arXiv admin note: substantial text overlap with
arXiv:1211.355
Mengenal pasti tahap pengetahuan pelajar tahun akhir Ijazah Sarjana Muda Kejuruteraan di KUiTTHO dalam bidang keusahawanan dari aspek pengurusan modal
Malaysia ialah sebuah negara membangun di dunia. Dalam proses pembangunan
ini, hasrat negara untuk melahirkan bakal usahawan beijaya tidak boleh dipandang
ringan. Oleh itu, pengetahuan dalam bidang keusahawanan perlu diberi perhatian
dengan sewajarnya; antara aspek utama dalam keusahawanan ialah modal. Pengurusan
modal yang tidak cekap menjadi punca utama kegagalan usahawan. Menyedari hakikat
ini, kajian berkaitan Pengurusan Modal dijalankan ke atas 100 orang pelajar Tahun
Akhir Kejuruteraan di KUiTTHO. Sampel ini dipilih kerana pelajar-pelajar ini akan
menempuhi alam pekeijaan di mana mereka boleh memilih keusahawanan sebagai satu
keijaya. Walau pun mereka bukanlah pelajar dari jurusan perniagaan, namun mereka
mempunyai kemahiran dalam mereka cipta produk yang boleh dikomersialkan. Hasil
dapatan kajian membuktikan bahawa pelajar-pelajar ini berminat dalam bidang
keusahawanan namun masih kurang pengetahuan tentang pengurusan modal
terutamanya dalam menentukan modal permulaan, pengurusan modal keija dan caracara
menentukan pembiayaan kewangan menggunakan kaedah jualan harian. Oleh itu,
satu garis panduan Pengurusan Modal dibina untuk memberi pendedahan kepada
mereka
GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000
Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or using expensive solutions to meet the real-time requirements of some environments. Contrarily to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed in a commodity Nvidia GPU, it achieves real time processing of 12K video. The proposed S/W architecture utilizes four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The analysis of throughput indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up various different algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using the density functional theory, wave
function techniques, and quantum field theory.Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 201
Efficient reconfigurable architectures for 3D medical image compression
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Recently, the more widespread use of three-dimensional (3-D) imaging modalities,
such as magnetic resonance imaging (MRI), computed tomography (CT), positron
emission tomography (PET), and ultrasound (US) have generated a massive amount
of volumetric data. These have provided an impetus to the development of other
applications, in particular telemedicine and teleradiology. In these fields, medical
image compression is important since both efficient storage and transmission of data
through high-bandwidth digital communication lines are of crucial importance.
Despite their advantages, most 3-D medical imaging algorithms are computationally intensive with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures exible to allow
for quick upgradeability with real-time applications. Moreover, in order to obtain
efficient solutions for large medical volumes data, an efficient implementation of
these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs) has been proposed as viable system
building block in the construction of high-performance systems at an economical price.
Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent
advantages such as massive parallelism capabilities, multimillion gate counts, and
special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. Comparative study for both non-partial and partial reconfiguration implementation has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are
optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT)with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and block random access memory (BRAM)- based method. An analysis with various medical imaging modalities has been carried out. Results obtained for image de-noising implementation using FRAT exhibits
promising results in reducing Gaussian white noise in medical images. In terms of
hardware implementation, promising trade-offs on maximum frequency, throughput
and area are also achieved. Furthermore, a novel hardware implementation of 3-D medical image compression system with context-based adaptive variable length coding (CAVLC)
has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete
wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveal that
3-D IT demonstrates better computational complexity than the 3-D DWT, whilst
the 3-D DWT with LS exhibits a lossless compression that is significantly useful for
medical image compression. Additionally, an architecture of CAVLC that is capable
of compressing high-definition (HD) images in real-time without any buffer between
the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research is tackling the issues of massive 3-D medical volumes data that requires compression as well as hardware implementation to accelerate the
slowest operations in the system. Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance.Ministry of Higher Education Malaysia (MOHE),
Universiti Tun Hussein Onn Malaysia (UTHM) and the British Counci
- …