GPU Kernels for High-Speed 4-Bit Astrophysical Data Processing

Author: Bandura Kevin
Denman Nolan
Klages Peter
Recnik Andre
Sievers Jonathan
Vanderlinde Keith
Publication date: 20/03/2015

Interferometric radio telescopes often rely on computationally expensive O(N^2) correlation calculations; fortunately these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU-based X-engine of a new hybrid FX correlator. Channelized data from the F-engine is supplied to the GPUs as 4-bit, offset-encoded real and imaginary integers. Because of the low bit width of the data, two values may be packed into a 32-bit register, allowing multiplication and addition of more than one value with a single fused multiply-add instruction. With this data and calculation packing scheme, as many as 5.6 effective tera-operations per second (TOPS) can be executed on a 4.3 TOPS GPU. The kernel design allows correlations to scale to large numbers of input elements, limited only by maximum buffer sizes on the GPU. This code is currently working on-sky with the CHIME Pathfinder Correlator in BC, Canada. Comment: 5 pages, 4 figures, submitted to IEEE ASAP 2015 Conference
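
The packing idea in the abstract above can be illustrated with a small sketch. This is not the authors' OpenCL kernel; it is a plain Python model that assumes two unsigned 4-bit values are placed 16 bits apart in one 32-bit word, so that a single multiply-add updates both lanes. The names `pack`, `packed_mac`, `unpack`, and `packed_dot` are hypothetical, and the offset-decoding step is left out.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def pack(hi, lo):
    """Place two unsigned 4-bit values in one 32-bit word, 16 bits apart."""
    assert 0 <= hi < 16 and 0 <= lo < 16
    return (hi << 16) | lo


def packed_mac(acc, packed, scalar):
    """One multiply-add that updates both 16-bit lanes of the accumulator."""
    assert 0 <= scalar < 16
    return (acc + packed * scalar) & MASK32


def unpack(acc):
    """Split the accumulator back into its (high, low) lane sums."""
    return acc >> 16, acc & MASK16


def packed_dot(pairs, scalars):
    """Accumulate sum(hi_i * s_i) and sum(lo_i * s_i) with one MAC per step."""
    acc = 0
    for (hi, lo), s in zip(pairs, scalars):
        acc = packed_mac(acc, pack(hi, lo), s)
    return unpack(acc)


if __name__ == "__main__":
    print(packed_dot([(3, 15), (15, 0), (7, 8)], [15, 2, 9]))
```

In this model the lanes stay independent only while the low-lane sum remains below 2^16. With 4-bit operands the largest single product is 225, so up to 291 products can be accumulated before the low lane could carry into the high lane.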

arXiv.org e-Print Archive

Crossref

A parallel grid-based implementation for real time processing of event log data in collaborative applications

Author: Barolli Leonard
Caballé Llobet Santi
Paniagua Macià Claudi
Xhafa Fatos
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2010

Collaborative applications usually register user interaction in the form of semi-structured plain text event log data. Extracting and structuring this data is a prerequisite for later key processes such as the analysis of interactions, assessment of group activity, or the provision of awareness and feedback. Yet, in real situations of online collaborative activity, the processing of log data is usually done offline, since structuring event log data is, in general, a computationally costly process and the amount of log data tends to be very large. Techniques that speed and scale up the structuring and processing of log data with minimal impact on the performance of the collaborative application are thus desirable, so that log data can be processed in real time. In this paper, we present a parallel grid-based implementation for processing in real time the event log data generated in collaborative applications. Our results show the feasibility of using grid middleware to speed and scale up the process of structuring and processing semi-structured event log data. The Grid prototype follows the Master-Worker (MW) paradigm. It is implemented using the Globus Toolkit (GT) and is tested on the PlanetLab platform.
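
The Master-Worker paradigm named in the abstract above can be sketched on a single machine as follows. This is only an illustration using a local thread pool, not the Globus Toolkit prototype; the log line layout (`timestamp user action`) and the names `parse_event`, `worker`, and `master` are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_event(line):
    """Structure one log line (assumed layout: 'timestamp user action...')."""
    timestamp, user, action = line.split(maxsplit=2)
    return {"timestamp": timestamp, "user": user, "action": action}


def worker(chunk):
    """Worker task: structure every non-empty line in its chunk."""
    return [parse_event(line) for line in chunk if line.strip()]


def master(lines, n_workers=4, chunk_size=2):
    """Master: split the log into chunks, farm them out, merge the results."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in the same order as the input chunks.
        results = list(pool.map(worker, chunks))
    return [event for chunk in results for event in chunk]


if __name__ == "__main__":
    log = [
        "2010-01-01T10:00 alice post message",
        "2010-01-01T10:01 bob open file",
        "",
        "2010-01-01T10:02 carol close file",
    ]
    for event in master(log):
        print(event)
```

The master only splits and merges, so the per-chunk work is independent and could in principle be dispatched to remote nodes instead of local threads.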

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

The Oberta in open access

UPCommons (Universitat Politècnica de Catalunya)

Synthetic aperture radar signal processing on the MPP

Author: Ramapriyan H. K.
Seiler E. J.

Satellite-borne Synthetic Aperture Radars (SAR) sense areas of several thousand square kilometers in seconds and transmit phase history signal data at several tens of megabits per second. The Shuttle Imaging Radar-B (SIR-B) has a variable swath of 20 to 50 km and acquired data over 100 km along track in about 13 seconds. Even with the simplification afforded by separability of the reference function, the processing still requires considerable resources: high-speed I/O, large memory, and fast computation. Processing systems with regular hardware take hours to process one Seasat image and about one hour for a SIR-B image. Bringing this processing time closer to acquisition times requires an end-to-end system solution. For the purpose of demonstration, software was implemented on the present Massively Parallel Processor (MPP) configuration for processing Seasat and SIR-B data. The software takes advantage of the high processing speed offered by the MPP, the large Staging Buffer, and the high-speed I/O between the MPP array unit and the Staging Buffer. It was found that with unoptimized Parallel Pascal code, the processing time on the MPP for a 4096 x 4096 sample subset of signal data ranges between 18 and 30.2 seconds depending on options.

NASA Technical Reports Server

GPU-driven recombination and transformation of YCoCg-R video samples

Author: De Neve Wesley
Van de Walle Rik
Van Rijsselbergen Dieter
Publication venue: ACTA Press Anaheim
Publication date: 01/01/2006

Common programmable Graphics Processing Units (GPU) are capable of more than just rendering real-time effects for games. They can also be used for image processing and the acceleration of video decoding. This paper describes an extended implementation of the H.264/AVC YCoCg-R to RGB color space transformation on the GPU. Both the color space transformation and recombination of the color samples from a nontrivial data layout are performed by the GPU. Using mid- to high-range GPUs, this extended implementation offers a significant gain in processing speed compared to an existing basic GPU version and an optimized CPU implementation. An ATI X1900 GPU was capable of processing more than 73 high-resolution 1080p YCoCg-R frames per second, which is over twice the speed of the CPU-only transformation using a Pentium D 820
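
For reference, the YCoCg-R transform mentioned in the abstract above is a reversible integer lifting transform. The sketch below is a plain CPU version in Python, not the GPU implementation the paper describes; the function names `rgb_to_ycocg_r` and `ycocg_r_to_rgb` are chosen for illustration.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R transform; undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 128)
    encoded = rgb_to_ycocg_r(*sample)
    print(encoded, ycocg_r_to_rgb(*encoded))
```

Because every step is an integer add, subtract, or arithmetic shift, the inverse recovers the original RGB samples exactly. The chroma components Co and Cg need one more bit of range than the input samples.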

Ghent University Academic Bibliography

Design of a high-speed digital processing element for parallel simulation

Author: Cwynar D. S.
Milner E. J.

A prototype of a custom designed computer to be used as a processing element in a multiprocessor based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real time simulations are needed for closed loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by: prefetching the next instruction while the current one is executing, transporting data using high speed data busses, and using state of the art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design

NASA Technical Reports Server

Probabilistic wind speed forecasting in Hungary

Author: Baran Sándor
Horányi András
Nemoda Dóra
Publication venue: 'Schweizerbart'
Publication date: 07/01/2013

Prediction of various weather quantities is mostly based on deterministic numerical weather forecasting models. Multiple runs of these models with different initial conditions result in ensembles of forecasts, which are applied for estimating the distribution of future weather quantities. However, the ensembles are usually under-dispersive and uncalibrated, so post-processing is required. In the present work Bayesian Model Averaging (BMA) is applied for calibrating ensembles of wind speed forecasts produced by the operational Limited Area Model Ensemble Prediction System of the Hungarian Meteorological Service (HMS). We describe two possible BMA models for wind speed data of the HMS and show that BMA post-processing significantly improves the calibration and precision of forecasts. Comment: 17 pages, 10 figures
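
For orientation, the general form of a BMA predictive density can be written as below. This is the standard formulation of BMA for ensembles, not necessarily either of the two models in the paper; the symbols $f_k$ (the $K$ ensemble member forecasts), $w_k$ (weights), and $g_k$ (component densities) are notation assumed here.

```latex
p(y \mid f_1, \dots, f_K) = \sum_{k=1}^{K} w_k \, g_k(y \mid f_k),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1 .
```

Each component density $g_k$ is centered on a (possibly bias-corrected) member forecast, and the weights reflect how well each member performed over a training period.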

arXiv.org e-Print Archive

Crossref

DEA University of Debrecen Electronic Archive

Investigation of the speed-up of a dual microcontroller parallel processing system in the execution of a mathematical operation

Author: Harrington Peter
Ng Wai Pang

An investigation of the performance of a two-microcontroller parallel processing system is presented. The system is developed using low-end microcontrollers (PIC 16F877). An 8x8 bit multiply operation and a 16x16 bit multiply operation are executed on a single microcontroller and on the proposed dual microcontroller parallel processing system in order to assess the performance of the proposed system. Results presented show poor performance for the 8x8 bit multiply, with an average speed-up factor of 0.82. This is because the time required to transfer data around the dual microcontroller system is significant in comparison to the time required to complete the multiply operation, nullifying the potential advantage that might be expected of a dual microcontroller system. The 16x16 multiplier exhibited good performance, with results showing a maximum average speed-up factor of 1.7 and an average speed-up factor of 1.5. The 16x16 multiplication takes longer to compute, and the data transfer time between microcontrollers, whilst still having an impact on the overall computation time, is significantly less than for the 8x8 multiplier. A formula has been developed to provide an estimate of the possible speed-up within a system in relation to the process execution time and the time required to communicate data around the proposed system. The proposed system was developed and tested using the Proteus simulation software.
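
The abstract above does not state the formula itself. As a rough illustration of how execution time and communication time trade off, the sketch below uses a simple assumed model (perfectly divisible work plus a fixed communication cost); it is not the authors' formula, and `estimated_speedup` and its parameters are hypothetical names.

```python
def estimated_speedup(t_exec, t_comm, n_proc=2):
    """Assumed model: speed-up = t_exec / (t_exec / n_proc + t_comm).

    t_exec: time to run the operation on one processor
    t_comm: total time spent moving data between processors
    n_proc: number of processors sharing the work (2 for a dual system)
    """
    if t_exec <= 0 or t_comm < 0 or n_proc < 1:
        raise ValueError("times must be non-negative and n_proc at least 1")
    return t_exec / (t_exec / n_proc + t_comm)


if __name__ == "__main__":
    # Short operation: communication dominates and the speed-up drops below 1.
    print(estimated_speedup(t_exec=10, t_comm=10))
    # Longer operation with the same communication cost: speed-up above 1.
    print(estimated_speedup(t_exec=100, t_comm=10))
```

Under this model the speed-up falls below 1 whenever the communication time exceeds the execution time saved by splitting the work, which matches the qualitative behavior the abstract reports for the short 8x8 multiply.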

Northumbria University Research Portal

Man-machine interactive imaging and data processing using high-speed digital mass storage

Author: Alsberg H.
Nathan R.

The role of vision in teleoperation has been recognized as an important element in the man-machine control loop. In most applications of remote manipulation, direct vision cannot be used. To overcome this handicap, the human operator's control capabilities are augmented by a television system. This medium provides a practical and useful link between the workspace and the control station from which the operator performs his tasks. Human performance deteriorates when the images are degraded as a result of instrumental and transmission limitations. Image enhancement is used to bring out selected qualities in a picture to increase the perception of the observer. A general purpose digital computer with an extensive special purpose software system is used to perform an almost unlimited repertoire of processing operations.

NASA Technical Reports Server

Very high speed direct-readout, control and recording system

Author: Turner J. W.
Publication date: 01/07/1972

Characteristics of electronic system for high speed readout, control, and recording of data are discussed. Operation of system is described to show rate of data processing and accuracy obtainable. Primary advantage of system is providing direct recording of parameter value several times per second

NASA Technical Reports Server