2,477 research outputs found

Low-power Programmable Processor for Fast Fourier Transform Based on Transport Triggered Architecture

Authors: Takala Jarmo; Žádník Jakub
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 17/04/2019

This paper describes a low-power processor tailored for fast Fourier transform computations in which the transport-triggering template is exploited. The processor is software-programmable while retaining energy efficiency comparable to existing fixed-function implementations. The power savings are achieved by compressing the computation kernel into one instruction word. The word is stored in an instruction loop buffer, which is more power-efficient than regular instruction memory storage. The processor supports all power-of-two FFT sizes from 64 to 16384 and, given 1 mJ of energy, can compute 20916 transforms of size 1024.

Comment: 5 pages, 4 figures, 1 table, ICASSP 2019 conference

arXiv.org e-Print Archive

Crossref

Trepo - Institutional Repository of Tampere University

Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP

Authors: Hauck S.A.; Smit Gerardus Johannes Maria; van de Burgwal M.D.; Wolkotte P.T.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2009

Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to discover the limitations and flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular than a traditional power-of-two FFT. The results of the implementation show the processing time, accuracy, energy consumption and flexibility of the implementation.

CiteSeerX

Crossref

Directory of Open Access Journals

University of Twente Research Information

A Multi-GPU Programming Library for Real-Time Applications

Authors: Schaetz Sebastian; Uecker Martin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general.

Comment: 15 pages, 10 figures

arXiv.org e-Print Archive

MPG.PuRe

Parallel algorithms for atmospheric modelling

Author: Tett Simon F. B.
Publication venue: The University of Edinburgh
Publication date: 01/01/1992

Edinburgh Research Archive

Parallel Integer Polynomial Multiplication

Authors: Chen Changbo; Covanov Svyatoslav; Mansouri Farnam; Maza Marc Moreno; Xie Ning; Xie Yuzhen
Publication date: 24/09/2016

We propose a new algorithm for multiplying dense polynomials with integer coefficients in a parallel fashion, targeting multi-core processor architectures. Complexity estimates and experimental comparisons demonstrate the advantages of this new approach.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Reducing branch delay to zero in pipelined processors

Authors: González Colás Antonio María; Llaberia Griñó José M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1993

A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.

Peer reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Serial-data computation in VLSI

Author: Smith Stewart Gresty
Publication venue: The University of Edinburgh
Publication date: 01/01/1987

Edinburgh Research Archive