Accelerated coplanar facet radio synthesis imaging
Imaging in radio astronomy entails the Fourier inversion of the relation between the sampled spatial coherence of an electromagnetic field and the intensity of its emitting source. This inversion is normally computed by performing a convolutional resampling step and applying the Inverse Fast Fourier Transform, because this leads to computational savings. Unfortunately, the resulting planar approximation of the sky is only valid over small regions. When imaging over wider fields of view, and in particular using telescope arrays with long non-East-West components, significant distortions are introduced in the computed image. We propose a coplanar faceting algorithm, where the sky is split up into many smaller images. Each of these narrow-field images is further corrected using a phase-correcting technique known as w-projection. This eliminates the projection error along the edges of the facets and ensures approximate coplanarity. The combination of faceting and w-projection approaches alleviates the memory constraints of previous w-projection implementations. We compared the scaling performance of both single- and double-precision resampled images in both an optimized multi-threaded CPU implementation and a GPU implementation that uses a memory-access-limiting work distribution strategy. We found that such a w-faceting approach scales slightly better than a traditional w-projection approach on GPUs. We also found that double-precision resampling on GPUs is about 71% slower than its single-precision counterpart, making double-precision resampling on GPUs less power efficient than CPU-based double-precision resampling. Lastly, we have seen that employing only single precision in the resampling summations produces significant error in continuum images for a MeerKAT-sized array over long observations, especially when employing the large convolution filters necessary to create large images.
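As a rough illustration of the w-term that the abstract's facets correct for, the sketch below (plain NumPy; the function name and the toy facet size are my own, not from the paper) evaluates the w-projection phase screen exp(-2*pi*i*w*(sqrt(1 - l^2 - m^2) - 1)) over a narrow field of direction cosines:

```python
import numpy as np

def w_phase_screen(l, m, w):
    # w-term phase: exp(-2*pi*i * w * (sqrt(1 - l^2 - m^2) - 1));
    # this is the factor w-projection folds into the gridding kernel
    n = np.sqrt(1.0 - l**2 - m**2)
    return np.exp(-2j * np.pi * w * (n - 1.0))

# toy facet: 4x4 grid of direction cosines within a narrow field of view
lin = np.linspace(-0.01, 0.01, 4)
l, m = np.meshgrid(lin, lin)
screen = w_phase_screen(l, m, w=100.0)
print(screen.shape)                        # (4, 4)
print(np.allclose(np.abs(screen), 1.0))    # pure phase -> True
```

Because the facet is narrow, (n - 1) stays small and so does the phase, which is precisely why per-facet w-projection is cheap compared with correcting the full wide field at once.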
Radio-Astronomical Imaging on Accelerators
Imaging is considered the most compute-intensive and therefore most challenging part of a radio-astronomical data-processing pipeline. To reach the high dynamic ranges imposed by the high sensitivity and large field of view of the new generation of radio telescopes such as the Square Kilometre Array (SKA), we need to be able to correct for direction-independent effects (DIEs), such as the curvature of the earth, as well as for direction-dependent time-varying effects (DDEs), such as those caused by the ionosphere, during imaging. The novel Image-Domain Gridding (IDG) algorithm was designed to avoid the performance bottlenecks of traditional imaging algorithms. We implement, optimize, and analyze the performance and energy efficiency of IDG on a variety of hardware platforms to find which platform is best for IDG. We analyze traditional CPUs as well as several accelerator architectures. IDG alleviates the limitations of traditional imaging algorithms while enabling the advantages of GPU acceleration: better performance at lower power consumption. The hardware-software co-design has resulted in a highly efficient imager. This makes IDG on GPUs an ideal candidate for meeting the computational and energy-efficiency constraints of the SKA. IDG has been integrated with a widely used astronomical imager (WSClean) and is now used in production by a variety of radio observatories, such as LOFAR and the MWA. It is not only faster and more energy-efficient than its competitors, but it also produces better-quality images.
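For context, here is a minimal sketch of the classical gridding-plus-inverse-FFT inversion whose bottlenecks IDG was designed to avoid. This is not IDG itself: nearest-neighbour gridding stands in for a real convolutional kernel, and all names and sizes are illustrative.

```python
import numpy as np

def grid_visibilities(uv, vis, n):
    # nearest-neighbour gridding of visibilities onto an n x n uv-plane
    # (production imagers use a convolution kernel; this is the zeroth-order version)
    grid = np.zeros((n, n), dtype=complex)
    for (u, v), value in zip(uv, vis):
        grid[int(round(v)) % n, int(round(u)) % n] += value
    return grid

def dirty_image(grid):
    # Fourier inversion of the gridded uv-plane yields the dirty image
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid))).real

uv = [(3.2, 5.1), (10.0, 2.7)]
vis = [1.0 + 0.5j, 0.8 - 0.2j]
img = dirty_image(grid_visibilities(uv, vis, 32))
print(img.shape)  # (32, 32)
```

The scattered, data-dependent writes in the gridding loop are what make this step hard to parallelize efficiently; IDG restructures the work per subgrid to sidestep exactly that access pattern.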
Near Memory Acceleration on High Resolution Radio Astronomy Imaging
Modern radio telescopes like the Square Kilometer Array (SKA) will need to
process in real-time exabytes of radio-astronomical signals to construct a
high-resolution map of the sky. Near-Memory Computing (NMC) could alleviate the
performance bottlenecks due to frequent memory accesses in a state-of-the-art
radio-astronomy imaging algorithm. In this paper, we show that a sub-module
performing a two-dimensional fast Fourier transform (2D FFT) is memory bound
using CPI breakdown analysis on IBM Power9. Then, we present an NMC approach on
FPGA for the 2D FFT that outperforms a CPU by up to a factor of 120 and performs
comparably to a high-end GPU, while using less bandwidth and memory.
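The row-column decomposition behind the memory-bound behaviour can be sketched in a few lines (NumPy here is only a stand-in for the CPU/FPGA kernels discussed above): the row pass streams through contiguous memory, while the column pass strides across rows, which is what stresses the memory system on cache-based machines.

```python
import numpy as np

def fft2_row_column(x):
    # 2D FFT as 1D FFTs over rows, then 1D FFTs over columns.
    # The column pass (axis=0) strides through memory row-by-row,
    # which is why this stage tends to be memory bound on CPUs.
    return np.fft.fft(np.fft.fft(x, axis=1), axis=0)

x = np.random.default_rng(0).standard_normal((64, 64))
print(np.allclose(fft2_row_column(x), np.fft.fft2(x)))  # True
```

Moving this computation next to memory avoids hauling the intermediate array across the memory bus twice, which is the essence of the NMC argument.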
Machine vision and the OMV
The Orbital Maneuvering Vehicle (OMV) is intended to close with orbiting targets for relocation or servicing. It will be controlled via video signals and thruster activation based upon Earth or space station directives. A human operator is squarely in the middle of the control loop for close work. Without directly addressing future, more autonomous versions of a remote servicer, several techniques that will doubtless be important in a future increase of autonomy also have some direct application to the current situation, particularly in the areas of image enhancement and predictive analysis. Several techniques are presented, and a few have been implemented, which support a machine vision capability proposed to be adequate for detection, recognition, and tracking. Once feasibly implemented, they must then be further modified to operate together in real time. This may be achieved by two courses: the use of an array processor and some initial steps toward data reduction. The methodology for adapting to a vector architecture is discussed in preliminary form, and a highly tentative rationale for data reduction at the front end is also discussed. As a by-product, a working implementation of the most advanced graphic display technique, ray-casting, is described.
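A minimal sketch of the ray-casting idea mentioned above, assuming the simplest possible scene, a single sphere (all names and values are illustrative, not from the report): each ray is tested analytically against the surface and the nearest positive intersection distance is returned.

```python
import numpy as np

def cast_ray(origin, direction, center, radius):
    # ray casting against a sphere: solve |o + t*d - c|^2 = r^2 for the
    # nearest positive t, or return None if the ray misses the sphere
    oc = origin - center
    b = 2.0 * np.dot(direction, oc)
    c = np.dot(oc, oc) - radius**2
    disc = b * b - 4.0 * c  # direction is assumed unit length, so a = 1
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 0 else None

t = cast_ray(np.array([0.0, 0.0, -5.0]), np.array([0.0, 0.0, 1.0]),
             np.array([0.0, 0.0, 0.0]), 1.0)
print(t)  # 4.0
```

A full ray-caster repeats this per pixel and per scene primitive, which is why the report pairs it with array processors and front-end data reduction.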
Simulation of reaction-diffusion processes in three dimensions using CUDA
Numerical solution of reaction-diffusion equations in three dimensions is one
of the most challenging applied mathematical problems. Since these simulations
are very time consuming, any ideas and strategies aiming at the reduction of
CPU time are important topics of research. A general and robust idea is the
parallelization of source codes/programs. Recently, the technological
development of graphics hardware created a possibility to use desktop video
cards to solve numerically intensive problems. We present a powerful parallel
computing framework to solve reaction-diffusion equations numerically using the
Graphics Processing Units (GPUs) with CUDA. Four different reaction-diffusion
problems, (i) diffusion of a chemically inert compound, (ii) Turing pattern
formation, (iii) phase separation in the wake of a moving diffusion front and
(iv) air pollution dispersion were solved, and additionally both the Shared
method and the Moving Tiles method were tested. Our results show that parallel
implementation achieves typical acceleration values in the order of 5-40 times
compared to a CPU using a single-threaded implementation on a 2.8 GHz desktop
computer. Comment: 8 figures, 5 tables.
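A toy single-threaded analogue of case (i), diffusion of an inert compound, can be written as an explicit finite-difference update. This is plain NumPy with periodic boundaries, not the paper's CUDA kernels, and the parameters are illustrative, chosen inside the 3D stability limit dt*D/dx^2 <= 1/6.

```python
import numpy as np

def diffusion_step(u, D, dt, dx):
    # explicit finite-difference update for du/dt = D * laplacian(u)
    # on a 3D grid with periodic boundaries (7-point stencil)
    lap = sum(np.roll(u, s, axis=a) for a in range(3) for s in (-1, 1)) - 6.0 * u
    return u + dt * D / dx**2 * lap

u = np.zeros((16, 16, 16))
u[8, 8, 8] = 1.0              # point release of the compound
for _ in range(10):
    u = diffusion_step(u, D=1.0, dt=0.1, dx=1.0)
print(np.isclose(u.sum(), 1.0))  # total mass is conserved -> True
```

The stencil touches six neighbours per cell per step, which is exactly the access pattern the paper's Shared and Moving Tiles methods optimize on the GPU.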
The Murchison Widefield Array: the Square Kilometre Array Precursor at low radio frequencies
The Murchison Widefield Array (MWA) is one of three Square Kilometre Array
Precursor telescopes and is located at the Murchison Radio-astronomy
Observatory in the Murchison Shire of the mid-west of Western Australia, a
location chosen for its extremely low levels of radio frequency interference.
The MWA operates at low radio frequencies, 80-300 MHz, with a processed
bandwidth of 30.72 MHz for both linear polarisations, and consists of 128
aperture arrays (known as tiles) distributed over a ~3 km diameter area. Novel
hybrid hardware/software correlation and real-time imaging and calibration
systems comprise the MWA signal-processing backend. In this paper the as-built
MWA is described both at a system and sub-system level, the expected
performance of the array is presented, and the science goals of the instrument
are summarised. Comment: Submitted to PASA. 11 figures, 2 tables.
Distributed and parallel sparse convex optimization for radio interferometry with PURIFY
Next generation radio interferometric telescopes are entering an era of big
data with extremely large data sets. While these telescopes can observe the sky
in higher sensitivity and resolution than before, computational challenges in
image reconstruction need to be overcome to realize the potential of
forthcoming telescopes. New methods in sparse image reconstruction and convex
optimization techniques (cf. compressive sensing) have been shown to produce
higher-fidelity reconstructions of simulations and real observations than traditional
methods. This article presents distributed and parallel algorithms and
implementations to perform sparse image reconstruction, with significant
practical considerations that are important for implementing these algorithms
for Big Data. We benchmark the algorithms presented, showing that they are
considerably faster than their serial equivalents. We then pre-sample gridding
kernels to scale the distributed algorithms to larger data sizes, showing
application times for 1 Gb to 2.4 Tb data sets over 25 to 100 nodes for up to
50 billion visibilities, and find that the run-times for the distributed
algorithms range from 100 milliseconds to 3 minutes per iteration. This work
presents an important step in working towards computationally scalable and
efficient algorithms and implementations that are needed to image observations
of both extended and compact sources from next generation radio interferometers
such as the SKA. The algorithms are implemented in the latest versions of the
SOPT (https://github.com/astro-informatics/sopt) and PURIFY
(https://github.com/astro-informatics/purify) software packages (version
3.1.0), which have been released alongside this article. Comment: 25 pages, 5 figures.
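The sparse convex optimization the article builds on can be sketched with the basic proximal-gradient (ISTA) iteration on a synthetic compressive-sensing problem. This is a generic textbook sketch, not the SOPT/PURIFY algorithms themselves; all sizes, names, and parameters are illustrative.

```python
import numpy as np

def soft_threshold(x, tau):
    # proximal operator of the l1 norm
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, y, lam, step, n_iter):
    # minimize 0.5*||A x - y||^2 + lam*||x||_1 by proximal gradient descent;
    # step must satisfy step <= 1 / ||A||^2 for convergence
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 100)) / np.sqrt(60)   # toy measurement operator
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]             # 3-sparse ground truth
y = A @ x_true
x_hat = ista(A, y, lam=0.02, step=0.1, n_iter=500)
# three largest recovered coefficients (should lie on the true support)
print(sorted(int(i) for i in np.argsort(-np.abs(x_hat))[:3]))
```

In the distributed setting the article describes, the expensive A and A.T applications (degridding and gridding of visibilities) are what get split across nodes, while the cheap proximal step stays local.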
A Multi-GPU Programming Library for Real-Time Applications
We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionate floating point performance with
high data locality and are thus well suited to implement real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general. Comment: 15 pages, 10 figures.
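The reported speed-ups (1.7 on 2 GPUs, 2.1 on 4) can be put in perspective with Amdahl's law. The back-of-the-envelope sketch below infers the serial fraction implied by the 2-GPU measurement and predicts the ideal 4-GPU speed-up; it deliberately ignores inter-GPU communication costs, so it is an upper-bound model, not an analysis from the paper.

```python
def amdahl_speedup(serial_fraction, n):
    # Amdahl's law: speed-up on n devices when a fixed fraction
    # of the work cannot be parallelized
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

def serial_fraction_from(speedup, n):
    # invert Amdahl's law from a measured speed-up on n devices
    return (n / speedup - 1.0) / (n - 1.0)

s = serial_fraction_from(1.7, 2)        # implied by the 2-GPU measurement
print(round(s, 3))                      # 0.176
print(round(amdahl_speedup(s, 4), 2))   # 2.62
```

The predicted 2.62 exceeds the measured 2.1, which suggests the 4-GPU configuration pays extra communication or synchronization overhead beyond a fixed serial fraction, consistent with the library's emphasis on MPI-like intra-system communication.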