Direct N-body code on low-power embedded ARM GPUs
This work arises in the context of the ExaNeSt project, which aims at the design
and development of an exascale-ready supercomputer with a low energy consumption
profile that can still support the most demanding scientific and technical
applications. The ExaNeSt compute unit consists of densely-packed low-power
64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are
heterogeneous architectures where computing power is supplied both by CPUs and
GPUs, and are emerging as a possible low-power and low-cost alternative to
clusters based on traditional CPUs. A state-of-the-art direct N-body code
suitable for astrophysical simulations has been re-engineered in order to
exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs.
Performance tests show that embedded GPUs can be effectively used to accelerate
real-life scientific calculations, and that they are also promising because of
their energy efficiency, which is a crucial design consideration for future
exascale platforms.
Comment: 16 pages, 7 figures, 1 table, accepted for publication in the
Computing Conference 2019 proceeding
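As a rough illustration of what a direct N-body code computes, the sketch below evaluates gravitational accelerations by summing over every pair of particles, which costs O(N^2) operations per step. It is not the code described in the abstract: the function name `direct_accelerations`, the Plummer-style `softening` parameter, and the choice of units with `G = 1` are assumptions made for this example, and it runs on the CPU in plain Python rather than on an embedded GPU.

```python
import math


def direct_accelerations(positions, masses, softening=1e-3, G=1.0):
    """Return the acceleration on each particle by direct pairwise summation.

    positions: list of (x, y, z) tuples; masses: list of floats.
    softening and G are illustrative assumptions, not values from the paper.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * masses[j] * inv_r3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

The outer loop over `i` is independent for each particle, which is the property that makes this kind of algorithm a natural fit for GPU threads.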
Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters
General purpose computing on graphics processing units (GPGPU) is
dramatically changing the landscape of high performance computing in astronomy.
In this paper, we identify and investigate several key decision areas, with a
goal of simplifying the early adoption of GPGPU in astronomy. We consider the
merits of OpenCL as an open standard in order to reduce risks associated with
coding in a native, vendor-specific programming environment, and present a GPU
programming philosophy based on using brute force solutions. We assert that
effective use of new GPU-based supercomputing facilities will require a change
in approach from astronomers. This will likely include improved programming
training, an increased need for software development best-practice through the
use of profiling and related optimisation tools, and a greater reliance on
third-party code libraries. As with any new technology, those willing to take
the risks, and make the investment of time and effort to become early adopters
of GPGPU in astronomy, stand to reap great benefits.
Comment: 13 pages, 5 figures, accepted for publication in PAS
A GPU-based Correlator X-engine Implemented on the CHIME Pathfinder
We present the design and implementation of a custom GPU-based compute
cluster that provides the correlation X-engine of the CHIME Pathfinder radio
telescope. It is among the largest such systems in operation, correlating
32,896 baselines (256 inputs) over 400 MHz of radio bandwidth. Making heavy use
of consumer-grade parts and a custom software stack, the system was developed
at a small fraction of the cost of comparable installations. Unlike existing
GPU backends, this system is built around OpenCL kernels running on
consumer-level AMD GPUs, taking advantage of low-cost hardware and leveraging
packed integer operations to double algorithmic efficiency. The system achieves
the required 105 TOPS in a 10 kW power envelope, making it among the most
power-efficient X-engines in use today.
Comment: 6 pages, 5 figures. Accepted by IEEE ASAP 201
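The baseline count quoted above follows from the number of inputs: with N inputs, counting each unordered pair plus each input's autocorrelation gives N(N+1)/2 products, and 256 × 257 / 2 = 32,896. The sketch below checks that arithmetic and shows, for a single frequency channel, the cross-multiply-and-accumulate operation an X-engine performs. The function names and the sample layout are assumptions for illustration only; the system described in the abstract uses OpenCL kernels with packed integer arithmetic, not Python complex numbers.

```python
def n_baselines(n_inputs, include_autos=True):
    """Number of correlation products for n_inputs signal inputs."""
    if include_autos:
        return n_inputs * (n_inputs + 1) // 2
    return n_inputs * (n_inputs - 1) // 2


def correlate(samples):
    """Accumulate visibilities for one frequency channel.

    samples[t][i] is the complex voltage of input i at time t (assumed layout).
    Returns {(i, j): sum over t of x_i * conj(x_j)} for all i <= j.
    """
    n = len(samples[0])
    vis = {(i, j): 0j for i in range(n) for j in range(i, n)}
    for frame in samples:
        for i in range(n):
            for j in range(i, n):
                vis[(i, j)] += frame[i] * frame[j].conjugate()
    return vis


print(n_baselines(256))  # 32896
```

Dividing the quoted figures, 105 TOPS in 10 kW corresponds to about 10.5 GOPS per watt.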
Adaptive Mesh Fluid Simulations on GPU
We describe an implementation of compressible inviscid fluid solvers with
block-structured adaptive mesh refinement on Graphics Processing Units using
NVIDIA's CUDA. We show that a class of high resolution shock capturing schemes
can be mapped naturally on this architecture. Using the method of lines
approach with the second order total variation diminishing Runge-Kutta time
integration scheme, piecewise linear reconstruction, and a Harten-Lax-van Leer
Riemann solver, we achieve an overall speedup of approximately a factor of 10
on one graphics card as compared to a single core on the host
computer. We attain this speedup in uniform grid runs as well as in problems
with deep AMR hierarchies. Our framework can readily be applied to more general
systems of conservation laws and extended to higher order shock capturing
schemes. This is shown directly by implementing a magneto-hydrodynamic
solver and comparing its performance to the pure hydrodynamic case. Finally, we
also combined our CUDA parallel scheme with MPI to make the code run on GPU
clusters. Close to ideal speedup is observed on up to four GPUs.
Comment: Submitted to New Astronom
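The combination of methods named in this abstract (method of lines, second-order TVD Runge-Kutta, piecewise linear reconstruction, HLL Riemann solver) can be sketched in a few dozen lines for the 1D Euler equations. This is a minimal CPU illustration under stated assumptions, not the authors' CUDA implementation: the ideal-gas index `GAMMA = 1.4`, the minmod limiter, reconstruction in conserved variables, the simple min/max wave-speed estimates, the periodic boundaries, and all function names are choices made for this example.

```python
import math

GAMMA = 1.4  # assumed ideal-gas adiabatic index


def conserved(rho, v, p):
    """Convert primitive (rho, v, p) to conserved (rho, momentum, energy)."""
    return (rho, rho * v, p / (GAMMA - 1.0) + 0.5 * rho * v * v)


def primitive(u):
    rho, mom, ene = u
    v = mom / rho
    return rho, v, (GAMMA - 1.0) * (ene - 0.5 * rho * v * v)


def physical_flux(u):
    rho, v, p = primitive(u)
    return (rho * v, rho * v * v + p, (u[2] + p) * v)


def minmod(a, b):
    """Slope limiter: zero at extrema, otherwise the smaller-magnitude slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def hll_flux(ul, ur):
    """HLL flux with simple min/max estimates of the outermost wave speeds."""
    rl, vl, pl = primitive(ul)
    rr, vr, pr = primitive(ur)
    cl, cr = math.sqrt(GAMMA * pl / rl), math.sqrt(GAMMA * pr / rr)
    sl, sr = min(vl - cl, vr - cr), max(vl + cl, vr + cr)
    fl, fr = physical_flux(ul), physical_flux(ur)
    if sl >= 0.0:
        return fl
    if sr <= 0.0:
        return fr
    return tuple((sr * fl[k] - sl * fr[k] + sl * sr * (ur[k] - ul[k])) / (sr - sl)
                 for k in range(3))


def rhs(cells, dx):
    """Method-of-lines right-hand side, -dF/dx, on a periodic grid."""
    n = len(cells)
    slopes = [tuple(minmod(cells[i][k] - cells[i - 1][k],
                           cells[(i + 1) % n][k] - cells[i][k]) for k in range(3))
              for i in range(n)]
    flux = []  # flux[i] is the flux through the interface between cells i and i+1
    for i in range(n):
        j = (i + 1) % n
        ul = tuple(cells[i][k] + 0.5 * slopes[i][k] for k in range(3))
        ur = tuple(cells[j][k] - 0.5 * slopes[j][k] for k in range(3))
        flux.append(hll_flux(ul, ur))
    return [tuple(-(flux[i][k] - flux[i - 1][k]) / dx for k in range(3))
            for i in range(n)]


def tvd_rk2_step(cells, dx, dt):
    """One second-order TVD Runge-Kutta step: Euler predictor, then averaging."""
    n = len(cells)
    k1 = rhs(cells, dx)
    stage = [tuple(cells[i][k] + dt * k1[i][k] for k in range(3)) for i in range(n)]
    k2 = rhs(stage, dx)
    return [tuple(0.5 * cells[i][k] + 0.5 * (stage[i][k] + dt * k2[i][k])
                  for k in range(3)) for i in range(n)]
```

Each cell's update depends only on a small stencil of neighbours, which is why schemes of this class map naturally onto one GPU thread per cell.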
PONDER - A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope
This paper describes a new real-time versatile backend, the Pulsar Ooty Radio
Telescope New Digital Efficient Receiver (PONDER), which has been designed to
operate along with the legacy analog system of the Ooty Radio Telescope (ORT).
PONDER makes use of current state-of-the-art computing hardware, a
Graphical Processing Unit (GPU) and sufficiently large disk storage to support
high time resolution real-time data of pulsar observations, obtained by
coherent dedispersion over a bandpass of 16 MHz. Four different modes for
pulsar observations are implemented in PONDER to provide standard reduced data
products, such as time-stamped integrated profiles and dedispersed time series,
allowing faster avenues to scientific results for a variety of pulsar studies.
Additionally, PONDER also supports general modes of interplanetary
scintillation (IPS) measurements and very long baseline interferometry data
recording. The IPS mode yields a single-polarisation correlated time series of
solar wind scintillation over a bandwidth (16 MHz) about four times larger
than that of the legacy system, as well as its fluctuation spectrum with high
temporal and frequency resolutions. The key point is that all the above modes
operate in real time. This paper presents the design aspects of PONDER and
outlines the design methodology for future similar backends. It also explains
the principal operations of PONDER, illustrates its capabilities for a variety
of pulsar and IPS observations and demonstrates its usefulness for a variety of
astrophysical studies using the high sensitivity of the ORT.
Comment: 25 pages, 14 figures, Accepted by Experimental Astronom