380 research outputs found

    Accelerated coplanar facet radio synthesis imaging

    Get PDF
    Imaging in radio astronomy entails the Fourier inversion of the relation between the sampled spatial coherence of an electromagnetic field and the intensity of its emitting source. This inversion is normally computed by performing a convolutional resampling step and applying the Inverse Fast Fourier Transform, because this leads to computational savings. Unfortunately, the resulting planar approximation of the sky is only valid over small regions. When imaging over wider fields of view, and in particular using telescope arrays with long non-East-West components, significant distortions are introduced in the computed image. We propose a coplanar faceting algorithm, where the sky is split up into many smaller images. Each of these narrow-field images are further corrected using a phase-correcting tech- nique known as w-projection. This eliminates the projection error along the edges of the facets and ensures approximate coplanarity. The combination of faceting and w-projection approaches alleviates the memory constraints of previous w-projection implementations. We compared the scaling performance of both single and double precision resampled images in both an optimized multi-threaded CPU implementation and a GPU implementation that uses a memory-access- limiting work distribution strategy. We found that such a w-faceting approach scales slightly better than a traditional w-projection approach on GPUs. We also found that double precision resampling on GPUs is about 71% slower than its single precision counterpart, making double precision resampling on GPUs less power efficient than CPU-based double precision resampling. Lastly, we have seen that employing only single precision in the resampling summations produces significant error in continuum images for a MeerKAT-sized array over long observations, especially when employing the large convolution filters necessary to create large images

    Radio-Astronomical Imaging on Accelerators

    Get PDF
    Imaging is considered the most compute-intensive and therefore most challenging part of a radio-astronomical data-processing pipeline. To reach the high dynamic ranges imposed by the high sensitivity and large field of view of the new generation of radio telescopes such as the Square Kilometre Array (SKA), we need to be able to correct for direction-independent effects (DIEs) such as the curvature of the earth as well as for direction-dependent time-varying effects (DDEs) such as those caused by the ionosphere during imaging. The novel Image-Domain gridding (IDG) algorithm was designed to avoid the performance bottlenecks of traditional imaging algorithms. We implement, optimize, and analyze the performance and energy efficiency of IDG on a variety of hardware platforms to find which platform is the best for IDG. We analyze traditional CPUs, as well as several accelerators architectures. IDG alleviates the limitations of traditional imaging algorithms while it enables the advantages of GPU acceleration: better performance at lower power consumption. The hardware-software co-design has resulted in a highly efficient imager. This makes IDG on GPUs an ideal candidate for meeting the computational and energy efficiency constraints of the SKA. IDG has been integrated with a widely-used astronomical imager (WSClean) and is now being used in production by a variety of different radio observatories such as LOFAR and the MWA. It is not only faster and more energy-efficient than its competitors, but it also produces better quality images

    Near Memory Acceleration on High Resolution Radio Astronomy Imaging

    Full text link
    Modern radio telescopes like the Square Kilometer Array (SKA) will need to process in real-time exabytes of radio-astronomical signals to construct a high-resolution map of the sky. Near-Memory Computing (NMC) could alleviate the performance bottlenecks due to frequent memory accesses in a state-of-the-art radio-astronomy imaging algorithm. In this paper, we show that a sub-module performing a two-dimensional fast Fourier transform (2D FFT) is memory bound using CPI breakdown analysis on IBM Power9. Then, we present an NMC approach on FPGA for 2D FFT that outperforms a CPU by up to a factor of 120x and performs comparably to a high-end GPU, while using less bandwidth and memory

    Machine vision and the OMV

    Get PDF
    The orbital Maneuvering Vehicle (OMV) is intended to close with orbiting targets for relocation or servicing. It will be controlled via video signals and thruster activation based upon Earth or space station directives. A human operator is squarely in the middle of the control loop for close work. Without directly addressing future, more autonomous versions of a remote servicer, several techniques that will doubtless be important in a future increase of autonomy also have some direct application to the current situation, particularly in the area of image enhancement and predictive analysis. Several techniques are presentet, and some few have been implemented, which support a machine vision capability proposed to be adequate for detection, recognition, and tracking. Once feasibly implemented, they must then be further modified to operate together in real time. This may be achieved by two courses, the use of an array processor and some initial steps toward data reduction. The methodology or adapting to a vector architecture is discussed in preliminary form, and a highly tentative rationale for data reduction at the front end is also discussed. As a by-product, a working implementation of the most advanced graphic display technique, ray-casting, is described

    Simulation of reaction-diffusion processes in three dimensions using CUDA

    Get PDF
    Numerical solution of reaction-diffusion equations in three dimensions is one of the most challenging applied mathematical problems. Since these simulations are very time consuming, any ideas and strategies aiming at the reduction of CPU time are important topics of research. A general and robust idea is the parallelization of source codes/programs. Recently, the technological development of graphics hardware created a possibility to use desktop video cards to solve numerically intensive problems. We present a powerful parallel computing framework to solve reaction-diffusion equations numerically using the Graphics Processing Units (GPUs) with CUDA. Four different reaction-diffusion problems, (i) diffusion of chemically inert compound, (ii) Turing pattern formation, (iii) phase separation in the wake of a moving diffusion front and (iv) air pollution dispersion were solved, and additionally both the Shared method and the Moving Tiles method were tested. Our results show that parallel implementation achieves typical acceleration values in the order of 5-40 times compared to CPU using a single-threaded implementation on a 2.8 GHz desktop computer.Comment: 8 figures, 5 table

    The Murchison Widefield Array: the Square Kilometre Array Precursor at low radio frequencies

    Full text link
    The Murchison Widefield Array (MWA) is one of three Square Kilometre Array Precursor telescopes and is located at the Murchison Radio-astronomy Observatory in the Murchison Shire of the mid-west of Western Australia, a location chosen for its extremely low levels of radio frequency interference. The MWA operates at low radio frequencies, 80-300 MHz, with a processed bandwidth of 30.72 MHz for both linear polarisations, and consists of 128 aperture arrays (known as tiles) distributed over a ~3 km diameter area. Novel hybrid hardware/software correlation and a real-time imaging and calibration systems comprise the MWA signal processing backend. In this paper the as-built MWA is described both at a system and sub-system level, the expected performance of the array is presented, and the science goals of the instrument are summarised.Comment: Submitted to PASA. 11 figures, 2 table

    Distributed and parallel sparse convex optimization for radio interferometry with PURIFY

    Full text link
    Next generation radio interferometric telescopes are entering an era of big data with extremely large data sets. While these telescopes can observe the sky in higher sensitivity and resolution than before, computational challenges in image reconstruction need to be overcome to realize the potential of forthcoming telescopes. New methods in sparse image reconstruction and convex optimization techniques (cf. compressive sensing) have shown to produce higher fidelity reconstructions of simulations and real observations than traditional methods. This article presents distributed and parallel algorithms and implementations to perform sparse image reconstruction, with significant practical considerations that are important for implementing these algorithms for Big Data. We benchmark the algorithms presented, showing that they are considerably faster than their serial equivalents. We then pre-sample gridding kernels to scale the distributed algorithms to larger data sizes, showing application times for 1 Gb to 2.4 Tb data sets over 25 to 100 nodes for up to 50 billion visibilities, and find that the run-times for the distributed algorithms range from 100 milliseconds to 3 minutes per iteration. This work presents an important step in working towards computationally scalable and efficient algorithms and implementations that are needed to image observations of both extended and compact sources from next generation radio interferometers such as the SKA. The algorithms are implemented in the latest versions of the SOPT (https://github.com/astro-informatics/sopt) and PURIFY (https://github.com/astro-informatics/purify) software packages {(Versions 3.1.0)}, which have been released alongside of this article.Comment: 25 pages, 5 figure

    A Multi-GPU Programming Library for Real-Time Applications

    Full text link
    We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general.Comment: 15 pages, 10 figure
    corecore