1,965 research outputs found

    DiFX2: A more flexible, efficient, robust and powerful software correlator

    Get PDF
    Software correlation, where a correlation algorithm written in a high-level language such as C++ is run on commodity computer hardware, has become increasingly attractive for small to medium sized and/or bandwidth constrained radio interferometers. In particular, many long baseline arrays (which typically have fewer than 20 elements and are restricted in observing bandwidth by costly recording hardware and media) have utilized software correlators for rapid, cost-effective correlator upgrades to allow compatibility with new, wider bandwidth recording systems and improve correlator flexibility. The DiFX correlator, made publicly available in 2007, has been a popular choice in such upgrades and is now used for production correlation by a number of observatories and research groups worldwide. Here we describe the evolution in the capabilities of the DiFX correlator over the past three years, including a number of new capabilities, substantial performance improvements, and a large amount of supporting infrastructure to ease use of the code. New capabilities include the ability to correlate a large number of phase centers in a single correlation pass, the extraction of phase calibration tones, correlation of disparate but overlapping sub-bands, the production of rapidly sampled filterbank and kurtosis data at minimal cost, and many more. The latest version of the code is at least 15% faster than the original, and in certain situations many times this value. Finally, we also present detailed test results validating the correctness of the new code.Comment: 28 pages, 9 figures, accepted for publication in PAS

    Data Provenance and Management in Radio Astronomy: A Stream Computing Approach

    Get PDF
    New approaches for data provenance and data management (DPDM) are required for mega science projects like the Square Kilometer Array, characterized by extremely large data volume and intense data rates, therefore demanding innovative and highly efficient computational paradigms. In this context, we explore a stream-computing approach with the emphasis on the use of accelerators. In particular, we make use of a new generation of high performance stream-based parallelization middleware known as InfoSphere Streams. Its viability for managing and ensuring interoperability and integrity of signal processing data pipelines is demonstrated in radio astronomy. IBM InfoSphere Streams embraces the stream-computing paradigm. It is a shift from conventional data mining techniques (involving analysis of existing data from databases) towards real-time analytic processing. We discuss using InfoSphere Streams for effective DPDM in radio astronomy and propose a way in which InfoSphere Streams can be utilized for large antennae arrays. We present a case-study: the InfoSphere Streams implementation of an autocorrelating spectrometer, and using this example we discuss the advantages of the stream-computing approach and the utilization of hardware accelerators

    Wavemoth -- Fast spherical harmonic transforms by butterfly matrix compression

    Full text link
    We present Wavemoth, an experimental open source code for computing scalar spherical harmonic transforms (SHTs). Such transforms are ubiquitous in astronomical data analysis. Our code performs substantially better than existing publicly available codes due to improvements on two fronts. First, the computational core is made more efficient by using small amounts of precomputed data, as well as paying attention to CPU instruction pipelining and cache usage. Second, Wavemoth makes use of a fast and numerically stable algorithm based on compressing a set of linear operators in a precomputation step. The resulting SHT scales as O(L^2 (log L)^2) for the resolution range of practical interest, where L denotes the spherical harmonic truncation degree. For low and medium-range resolutions, Wavemoth tends to be twice as fast as libpsht, which is the current state of the art implementation for the HEALPix grid. At the resolution of the Planck experiment, L ~ 4000, Wavemoth is between three and six times faster than libpsht, depending on the computer architecture and the required precision. Due to the experimental nature of the project, only spherical harmonic synthesis is currently supported, although adding support or spherical harmonic analysis should be trivial.Comment: 13 pages, 6 figures, accepted by ApJ

    General Purpose Computation on Graphics Processing Units Using OpenCL

    Get PDF
    Computational Science has emerged as a third pillar of science along with theory and experiment, where the parallelization for scientific computing is promised by different shared and distributed memory architectures such as, super-computer systems, grid and cluster based systems, multi-core and multiprocessor systems etc. In the recent years the use of GPUs (Graphic Processing Units) for General purpose computing commonly known as GPGPU made it an exciting addition to high performance computing systems (HPC) with respect to price and performance ratio. Current GPUs consist of several hundred computing cores arranged in streaming multi-processors so the degree of parallelism is promising. Moreover with the development of new and easy to use interfacing tools and programming languages such as OpenCL and CUDA made the GPUs suitable for different computation demanding applications such as micromagnetic simulations. In micromagnetic simulations, the study of magnetic behavior at very small time and space scale demands a huge computation time, where the calculation of magnetostatic field with complexity of O(Nlog(N)) using FFT algorithm for discrete convolution is the main contribution towards the whole simulation time, and it is computed many times at each time step interval. This study and observation of magnetization behavior at sub-nanosecond time-scales is crucial to a number of areas such as magnetic sensors, non volatile storage devices and magnetic nanowires etc. Since micromagnetic codes in general are suitable for parallel programming as it can be easily divided into independent parts which can run in parallel, therefore current trend for micromagnetic code concerns shifting the computationally intensive parts to GPUs. My PhD work mainly focuses on the development of highly parallel magnetostatic field solver for micromagnetic simulators on GPUs. I am using OpenCL for GPU implementation, with consideration that it is an open standard for parallel programming of heterogeneous systems for cross platform. The magnetostatic field calculation is dominated by the multidimensional FFTs (Fast Fourier Transform) computation. Therefore i have developed the specialized OpenCL based 3D-FFT library for magnetostatic field calculation which made it possible to fully exploit the zero padded input data with out transposition and symmetries inherent in the field calculation. Moreover it also provides a common interface for different vendors' GPUs. In order to fully utilize the GPUs parallel architecture the code needs to handle many hardware specific technicalities such as coalesced memory access, data transfer overhead between GPU and CPU, GPU global memory utilization, arithmetic computation, batch execution etc. In the second step to further increase the level of parallelism and performance, I have developed a parallel magnetostatic field solver on multiple GPUs. Utilizing multiple GPUs avoids dealing with many of the limitations of GPUs (e.g., on-chip memory resources) by exploiting the combined resources of multiple on board GPUs. The GPU implementation have shown an impressive speedup against equivalent OpenMp based parallel implementation on CPU, which means the micromagnetic simulations which require weeks of computation on CPU now can be performed very fast in hours or even in minutes on GPUs. In parallel I also worked on ordered queue management on GPUs. Ordered queue management is used in many applications including real-time systems, operating systems, and discrete event simulations. In most cases, the efficiency of an application itself depends on usage of a sorting algorithm for priority queues. Lately, the usage of graphic cards for general purpose computing has again revisited sorting algorithms. In this work i have presented the analysis of different sorting algorithms with respect to sorting time, sorting rate and speedup on different GPU and CPU architectures and provided a new sorting technique on GPU

    On the impact of communication complexity in the design of parallel numerical algorithms

    Get PDF
    This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation
    • …
    corecore