DiFX2: A more flexible, efficient, robust and powerful software correlator
Software correlation, where a correlation algorithm written in a high-level
language such as C++ is run on commodity computer hardware, has become
increasingly attractive for small to medium sized and/or bandwidth constrained
radio interferometers. In particular, many long baseline arrays (which
typically have fewer than 20 elements and are restricted in observing bandwidth
by costly recording hardware and media) have utilized software correlators for
rapid, cost-effective correlator upgrades to allow compatibility with new,
wider bandwidth recording systems and improve correlator flexibility. The DiFX
correlator, made publicly available in 2007, has been a popular choice in such
upgrades and is now used for production correlation by a number of
observatories and research groups worldwide. Here we describe the evolution in
the capabilities of the DiFX correlator over the past three years, including a
number of new capabilities, substantial performance improvements, and a large
amount of supporting infrastructure to ease use of the code. New capabilities
include the ability to correlate a large number of phase centers in a single
correlation pass, the extraction of phase calibration tones, correlation of
disparate but overlapping sub-bands, the production of rapidly sampled
filterbank and kurtosis data at minimal cost, and many more. The latest version
of the code is at least 15% faster than the original, and in certain situations
many times this value. Finally, we also present detailed test results
validating the correctness of the new code. Comment: 28 pages, 9 figures, accepted for publication in PAS
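As a rough illustration of what a software correlator of the FX type computes, the sketch below Fourier transforms blocks of samples from two stations (the "F" step) and then cross-multiplies and accumulates the spectra (the "X" step). It is not DiFX code: the function names, the naive DFT and the test tone are assumptions chosen to keep the example self-contained, and a production correlator would also apply delay and phase corrections that are omitted here.

```python
import cmath
import math

def dft(block):
    """Naive O(n^2) discrete Fourier transform of one block of samples."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fx_correlate(stream_a, stream_b, nchan):
    """Accumulate the cross-power spectrum of two streams, block by block.

    'F' step: transform each block of nchan samples per station.
    'X' step: multiply one spectrum by the conjugate of the other and sum.
    """
    nblocks = min(len(stream_a), len(stream_b)) // nchan
    acc = [0j] * nchan
    for b in range(nblocks):
        spec_a = dft(stream_a[b * nchan:(b + 1) * nchan])
        spec_b = dft(stream_b[b * nchan:(b + 1) * nchan])
        for k in range(nchan):
            acc[k] += spec_a[k] * spec_b[k].conjugate()
    return [v / nblocks for v in acc]

# Hypothetical test signal: a tone in channel 3 that reaches
# station B one sample later than station A.
nchan = 16
tone = [math.cos(2 * math.pi * 3 * t / nchan) for t in range(4 * nchan + 1)]
vis = fx_correlate(tone[1:], tone[:-1], nchan)

# The strongest positive-frequency channel, and its phase, which
# encodes the one-sample delay between the two stations.
peak = max(range(nchan // 2), key=lambda k: abs(vis[k]))
delay_phase = cmath.phase(vis[peak])
```

In this toy case the phase of the peak channel equals 2*pi*3/16 radians, which is the phase a one-sample delay imposes on a tone in channel 3.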
Data Provenance and Management in Radio Astronomy: A Stream Computing Approach
New approaches for data provenance and data management (DPDM) are required
for mega science projects like the Square Kilometer Array, characterized by
extremely large data volume and intense data rates, therefore demanding
innovative and highly efficient computational paradigms. In this context, we
explore a stream-computing approach with the emphasis on the use of
accelerators. In particular, we make use of a new generation of high
performance stream-based parallelization middleware known as InfoSphere
Streams. Its viability for managing and ensuring interoperability and integrity
of signal processing data pipelines is demonstrated in radio astronomy. IBM
InfoSphere Streams embraces the stream-computing paradigm. It is a shift from
conventional data mining techniques (involving analysis of existing data from
databases) towards real-time analytic processing. We discuss using InfoSphere
Streams for effective DPDM in radio astronomy and propose a way in which
InfoSphere Streams can be utilized for large antenna arrays. We present a
case study: the InfoSphere Streams implementation of an autocorrelating
spectrometer. Using this example, we discuss the advantages of the
stream-computing approach and the utilization of hardware accelerators.
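To make the autocorrelating spectrometer concrete, here is a minimal sketch in plain Python of the general technique: lag products are accumulated one sample at a time as the stream arrives, and the averaged autocorrelation is then cosine transformed into a power spectrum (the Wiener-Khinchin relation). It does not use the InfoSphere Streams API; the class name, method names and test tone are hypothetical.

```python
import math

class AutocorrelationSpectrometer:
    """Accumulates lag products sample by sample, then transforms to a spectrum."""

    def __init__(self, nlags):
        self.nlags = nlags
        self.history = []              # the most recent nlags samples
        self.lag_sums = [0.0] * nlags  # running sum of x[t] * x[t - lag]
        self.count = 0                 # number of accumulated products per lag

    def push(self, sample):
        """Consume one sample from the stream and update every lag product."""
        self.history.append(sample)
        if len(self.history) > self.nlags:
            self.history.pop(0)
        if len(self.history) == self.nlags:
            newest = self.history[-1]
            for lag in range(self.nlags):
                self.lag_sums[lag] += newest * self.history[-1 - lag]
            self.count += 1

    def spectrum(self):
        """Cosine transform of the averaged autocorrelation.

        Channel k corresponds to a frequency of k / (2 * nlags) cycles per sample.
        """
        r = [s / self.count for s in self.lag_sums]
        n = self.nlags
        return [
            r[0] + 2 * sum(r[lag] * math.cos(math.pi * k * lag / n)
                           for lag in range(1, n))
            for k in range(n)
        ]

# Hypothetical input: a tone at 0.125 cycles per sample, which should
# land in channel 2 * 16 * 0.125 = 4 of a 16-lag spectrometer.
nlags = 16
spectrometer = AutocorrelationSpectrometer(nlags)
for t in range(nlags - 1 + 256):
    spectrometer.push(math.cos(2 * math.pi * 0.125 * t))
spec = spectrometer.spectrum()
peak_channel = max(range(nlags), key=lambda k: spec[k])
```

Because each `push` touches only a fixed-size window, the state stays bounded no matter how long the stream runs, which is the property that makes this kind of operator a natural fit for a stream-computing pipeline.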
Wavemoth -- Fast spherical harmonic transforms by butterfly matrix compression
We present Wavemoth, an experimental open source code for computing scalar
spherical harmonic transforms (SHTs). Such transforms are ubiquitous in
astronomical data analysis. Our code performs substantially better than
existing publicly available codes due to improvements on two fronts. First, the
computational core is made more efficient by using small amounts of precomputed
data, as well as paying attention to CPU instruction pipelining and cache
usage. Second, Wavemoth makes use of a fast and numerically stable algorithm
based on compressing a set of linear operators in a precomputation step. The
resulting SHT scales as O(L^2 (log L)^2) for the resolution range of practical
interest, where L denotes the spherical harmonic truncation degree. For low and
medium-range resolutions, Wavemoth tends to be twice as fast as libpsht, which
is the current state of the art implementation for the HEALPix grid. At the
resolution of the Planck experiment, L ~ 4000, Wavemoth is between three and
six times faster than libpsht, depending on the computer architecture and the
required precision. Due to the experimental nature of the project, only
spherical harmonic synthesis is currently supported, although adding support for
spherical harmonic analysis should be trivial. Comment: 13 pages, 6 figures, accepted by ApJ
General Purpose Computation on Graphics Processing Units Using OpenCL
Computational science has emerged as a third pillar of science alongside theory and experiment. Parallelism for scientific computing is provided by a range of shared and distributed memory architectures, such as supercomputers, grid and cluster based systems, and multi-core and multiprocessor systems. In recent years, the use of GPUs (Graphics Processing Units) for general purpose computing, commonly known as GPGPU, has made them an attractive addition to high performance computing (HPC) systems in terms of price-to-performance ratio. Current GPUs consist of several hundred computing cores arranged in streaming multiprocessors, so the available degree of parallelism is high. Moreover, the development of new and easy to use interfacing tools and programming languages such as OpenCL and CUDA has made GPUs suitable for computationally demanding applications such as micromagnetic simulations. In micromagnetic simulations, studying magnetic behavior at very small time and space scales demands a large amount of computation time. The calculation of the magnetostatic field, which has complexity O(N log(N)) when an FFT algorithm is used for the discrete convolution, is the main contributor to the total simulation time, and it is computed many times at each time step. The study and observation of magnetization behavior at sub-nanosecond time scales is crucial to a number of areas such as magnetic sensors, non-volatile storage devices and magnetic nanowires. Micromagnetic codes are generally well suited to parallel programming because they can easily be divided into independent parts that run in parallel, so the current trend is to shift their computationally intensive parts to GPUs. My PhD work mainly focuses on the development of a highly parallel magnetostatic field solver for micromagnetic simulators on GPUs.
I use OpenCL for the GPU implementation because it is an open, cross-platform standard for parallel programming of heterogeneous systems. The magnetostatic field calculation is dominated by the computation of multidimensional FFTs (Fast Fourier Transforms). I have therefore developed a specialized OpenCL based 3D-FFT library for magnetostatic field calculation, which makes it possible to fully exploit the zero-padded input data, without transposition, and the symmetries inherent in the field calculation. It also provides a common interface for GPUs from different vendors. To fully utilize the parallel architecture of GPUs, the code needs to handle many hardware-specific technicalities such as coalesced memory access, data transfer overhead between GPU and CPU, GPU global memory utilization, arithmetic computation, and batch execution. In a second step, to further increase the level of parallelism and performance, I have developed a parallel magnetostatic field solver for multiple GPUs. Using multiple GPUs sidesteps many of the limitations of a single GPU (e.g., on-chip memory resources) by exploiting the combined resources of multiple on-board GPUs. The GPU implementation has shown an impressive speedup over an equivalent OpenMP based parallel implementation on the CPU, which means that micromagnetic simulations requiring weeks of computation on a CPU can now be performed in hours or even minutes on GPUs. In parallel, I also worked on ordered queue management on GPUs. Ordered queue management is used in many applications including real-time systems, operating systems, and discrete event simulations. In most cases, the efficiency of the application itself depends on the sorting algorithm used for its priority queues. Lately, the use of graphics cards for general purpose computing has renewed interest in sorting algorithms.
In this work I have presented an analysis of different sorting algorithms with respect to sorting time, sorting rate and speedup on different GPU and CPU architectures, and provided a new sorting technique on GPUs.
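The FFT-based discrete convolution that dominates the magnetostatic field calculation can be sketched in one dimension as follows. This is only an illustration of the general technique (zero-pad both inputs, transform, multiply pointwise, transform back), written in plain Python rather than OpenCL; it is not the author's 3D library, and the function names are assumptions. Zero padding turns the circular convolution that the FFT naturally computes into the linear convolution that is actually wanted.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    The inverse transform is left unnormalized; the caller divides by len(x).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_convolve(signal, kernel):
    """Linear convolution via zero padding and a pointwise spectral product."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:          # next power of two that avoids wrap-around
        size *= 2
    a = fft(list(signal) + [0.0] * (size - len(signal)))
    b = fft(list(kernel) + [0.0] * (size - len(kernel)))
    product = [p * q for p, q in zip(a, b)]
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]

def direct_convolve(signal, kernel):
    """Reference O(N*M) convolution used to check the FFT version."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out

fast = fft_convolve([1.0, 2.0, 3.0], [0.0, 1.0, 0.5])
slow = direct_convolve([1.0, 2.0, 3.0], [0.0, 1.0, 0.5])
```

Roughly half of each padded input is zeros, which is the kind of structure a specialized FFT can skip instead of transforming it like ordinary data.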
On the impact of communication complexity in the design of parallel numerical algorithms
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
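For orientation, Hockney's classic two-parameter characterization of a vector operation can be sketched as below: the time for a vector of length n is (n + n_half) / r_inf, where r_inf is the asymptotic rate and n_half is the vector length at which half of that rate is reached. The abstract does not spell out the paper's generalization, so this shows only the base model; the function names and the parameter values are illustrative assumptions.

```python
def hockney_time(n, r_inf, n_half):
    """Time to process a vector of length n under Hockney's model.

    r_inf  : asymptotic rate in elements per second
    n_half : vector length at which half of r_inf is achieved
    """
    return (n + n_half) / r_inf

def achieved_rate(n, r_inf, n_half):
    """Effective rate in elements per second for a vector of length n."""
    return n / hockney_time(n, r_inf, n_half)

# Hypothetical machine parameters, chosen only for illustration.
R_INF = 1.0e8
N_HALF = 100

rate_at_n_half = achieved_rate(N_HALF, R_INF, N_HALF)
rate_long_vector = achieved_rate(100 * N_HALF, R_INF, N_HALF)
```

Short vectors are dominated by the fixed startup cost captured by n_half, so the achieved rate approaches r_inf only when n is much larger than n_half.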
- …