Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters
General purpose computing on graphics processing units (GPGPU) is
dramatically changing the landscape of high performance computing in astronomy.
In this paper, we identify and investigate several key decision areas, with a
goal of simplifying the early adoption of GPGPU in astronomy. We consider the
merits of OpenCL as an open standard in order to reduce risks associated with
coding in a native, vendor-specific programming environment, and present a GPU
programming philosophy based on using brute force solutions. We assert that
effective use of new GPU-based supercomputing facilities will require a change
in approach from astronomers. This will likely include improved programming
training, an increased need for software development best-practice through the
use of profiling and related optimisation tools, and a greater reliance on
third-party code libraries. As with any new technology, those willing to take
the risks, and make the investment of time and effort to become early adopters
of GPGPU in astronomy, stand to reap great benefits.
Comment: 13 pages, 5 figures, accepted for publication in PAS
Advanced Architectures for Astrophysical Supercomputing
Astronomers have come to rely on the increasing performance of computers to
reduce, analyze, simulate and visualize their data. In this environment, faster
computation can mean more science outcomes or the opening up of new parameter
spaces for investigation. If we are to avoid major issues when implementing
codes on advanced architectures, it is important that we have a solid
understanding of our algorithms. A recent addition to the high-performance
computing scene that highlights this point is the graphics processing unit
(GPU). The hardware originally designed for speeding up graphics rendering in
video games is now achieving speed-ups in general-purpose
computation -- performance that cannot be ignored. We are using a generalized
approach, based on the analysis of astronomy algorithms, to identify the
optimal problem-types and techniques for taking advantage of both current GPU
hardware and future developments in computing architectures.
Comment: 4 pages, 1 figure, to appear in the proceedings of ADASS XIX, Oct 4-8
2009, Sapporo, Japan (ASP Conf. Series
Accelerating incoherent dedispersion
Incoherent dedispersion is a computationally intensive problem that appears
frequently in pulsar and transient astronomy. For current and future transient
pipelines, dedispersion can dominate the total execution time, meaning its
computational speed acts as a constraint on the quality and quantity of science
results. It is thus critical that the algorithm be able to take advantage of
trends in commodity computing hardware. With this goal in mind, we present
analysis of the 'direct', 'tree' and 'sub-band' dedispersion algorithms with
respect to their potential for efficient execution on modern graphics
processing units (GPUs). We find all three to be excellent candidates, and
proceed to describe implementations in C for CUDA using insight gained from the
analysis. Using recent CPU and GPU hardware, the transition to the GPU provides
a speed-up of 9x for the direct algorithm when compared to an optimised
quad-core CPU code. For realistic recent survey parameters, these speeds are
high enough that further optimisation is unnecessary to achieve real-time
processing. Where further speed-ups are desirable, we find that the tree and
sub-band algorithms are able to provide 3-7x better performance at the cost of
certain smearing, memory consumption and development time trade-offs. We finish
with a discussion of the implications of these results for future transient
surveys. Our GPU dedispersion code is publicly available as a C library at:
http://dedisp.googlecode.com/
Comment: 15 pages, 4 figures, 2 tables, accepted for publication in MNRA
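As a rough illustration of what the 'direct' algorithm named in the abstract does, the sketch below shifts each frequency channel by its dispersion delay and sums across channels. It is a plain-Python toy under stated assumptions, not the authors' CUDA implementation and not the API of the dedisp library: the function names, the `data[channel][sample]` layout and the choice of the highest frequency as the reference are all hypothetical. The delay uses the standard cold-plasma relation with a dispersion constant of about 4.148808e3 MHz^2 pc^-1 cm^3 s.

```python
# Approximate dispersion constant, in MHz^2 pc^-1 cm^3 s.
K_DM = 4.148808e3


def dispersion_delay(dm, freq_mhz, ref_freq_mhz):
    """Delay in seconds of freq_mhz relative to ref_freq_mhz at dispersion measure dm."""
    return K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)


def dedisperse_direct(data, freqs_mhz, dt, dm):
    """Direct incoherent dedispersion for a single trial DM.

    data[c][t] is the detected power in channel c at time sample t
    (hypothetical layout), freqs_mhz[c] is the channel frequency and
    dt is the sampling time in seconds. Returns the channel-summed
    time series after removing the per-channel delay.
    """
    ref = max(freqs_mhz)  # assumption: reference to the highest frequency
    shifts = [int(round(dispersion_delay(dm, f, ref) / dt)) for f in freqs_mhz]
    n_out = len(data[0]) - max(shifts)
    return [
        sum(chan[t + s] for chan, s in zip(data, shifts))
        for t in range(n_out)
    ]
```

A search would call `dedisperse_direct` once per trial DM, so the cost grows with the product of channels, samples and trial DMs. That independent, regular work per output sample is what makes a brute-force approach a plausible fit for GPUs.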
Three-dimensional shapelets and an automated classification scheme for dark matter haloes
We extend the two-dimensional Cartesian shapelet formalism to d-dimensions.
Concentrating on the three-dimensional case, we derive shapelet-based equations
for the mass, centroid, root-mean-square radius, and components of the
quadrupole moment and moment of inertia tensors. Using cosmological N-body
simulations as an application domain, we show that three-dimensional shapelets
can be used to replicate the complex sub-structure of dark matter halos and
demonstrate the basis of an automated classification scheme for halo shapes. We
investigate the shapelet decomposition process from an algorithmic viewpoint,
and consider opportunities for accelerating the computation of shapelet-based
representations using graphics processing units (GPUs).
Comment: 19 pages, 11 figures, accepted for publication in MNRA
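To make the idea of a three-dimensional Cartesian shapelet concrete, the sketch below evaluates a basis function as a product of three one-dimensional shapelets. It assumes the commonly used Gauss-Hermite definition with a single scale parameter `beta` shared by all axes; the abstract does not give the paper's exact normalisation or notation, so the function names and conventions here are illustrative assumptions only.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        # H_{k+1}(x) = 2 x H_k(x) - 2 k H_{k-1}(x)
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def shapelet_1d(n, x, beta):
    """Order-n one-dimensional shapelet with scale beta (assumed normalisation)."""
    norm = 1.0 / math.sqrt((2 ** n) * math.sqrt(math.pi) * math.factorial(n) * beta)
    u = x / beta
    return norm * hermite(n, u) * math.exp(-0.5 * u * u)


def shapelet_3d(orders, pos, beta):
    """Separable 3D basis function: product of 1D shapelets along x, y and z."""
    return (
        shapelet_1d(orders[0], pos[0], beta)
        * shapelet_1d(orders[1], pos[1], beta)
        * shapelet_1d(orders[2], pos[2], beta)
    )
```

Because the basis is separable and each coefficient is an independent sum over particles or grid cells, the decomposition is the kind of computation that could map naturally onto many GPU threads.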
Spotting Radio Transients with the help of GPUs
Exploration of the time-domain radio sky has huge potential for advancing our
knowledge of the dynamic universe. Past surveys have discovered large numbers
of pulsars, rotating radio transients and other transient radio phenomena;
however, they have typically relied upon off-line processing to cope with the
high data and processing rate. This paradigm rules out the possibility of
obtaining high-resolution base-band dumps of significant events or of
performing immediate follow-up observations, limiting analysis power to what
can be gleaned from detection data alone. To overcome this limitation,
real-time processing and detection of transient radio events is required. By
exploiting the significant computing power of modern graphics processing units
(GPUs), we are developing a transient-detection pipeline that runs in real-time
on data from the Parkes radio telescope. In this paper we discuss the
algorithms used in our pipeline, the details of their implementation on the GPU
and the challenges posed by the presence of radio frequency interference.
Comment: 4 Pages. To appear in the proceedings of ADASS XXI, ed. P.Ballester
and D.Egret, ASP Conf. Serie
Real-Time RFI Mitigation for the Apertif Radio Transient System
Current and upcoming radio telescopes are being designed with increasing
sensitivity to detect new and mysterious radio sources of astrophysical origin.
While this increased sensitivity improves the likelihood of discoveries, it
also makes these instruments more susceptible to the deleterious effects of
Radio Frequency Interference (RFI). The challenge posed by RFI is exacerbated
by the high data-rates achieved by modern radio telescopes, which require
real-time processing to keep up with the data. Furthermore, the high data-rates
do not allow for permanent storage of observations at high resolution. Offline
RFI mitigation is therefore no longer possible. The real-time requirement
makes RFI mitigation even more challenging because, on the one hand, the techniques
used for mitigation need to be fast and simple, and on the other hand they also
need to be robust enough to cope with just a partial view of the data.
The Apertif Radio Transient System (ARTS) is the real-time, time-domain,
transient detection instrument of the Westerbork Synthesis Radio Telescope
(WSRT), processing 73 Gb of data per second. Even with a deep learning
classifier, the ARTS pipeline requires state-of-the-art real-time RFI
mitigation to reduce the number of false-positive detections. Our solution to
this challenge is RFIm, a high-performance, open-source, tuned, and extensible
RFI mitigation library. The goal of this library is to provide users with RFI
mitigation routines that are designed to run in real-time on many-core
accelerators, such as Graphics Processing Units, and that can be highly-tuned
to achieve code and performance portability to different hardware platforms and
scientific use-cases. Results on the ARTS show that we can achieve real-time
RFI mitigation, with a minimal impact on the total execution time of the search
pipeline, and considerably reduce the number of false-positives.
Comment: 6 pages, 10 figures. To appear in Proceedings from the 2019 Radio
Frequency Interference workshop (RFI 2019), Toulouse, France (23-26
September
- …