1,124 research outputs found

    FFT for the APE Parallel Computer

    Get PDF
    We present a parallel FFT algorithm for SIMD systems following the `Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, having in mind QCD applications.Comment: 17 pages, 13 figures, figures include

    A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM

    No full text
    In this article, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for the application in the OFDM based IEEE 802.11a Wireless LAN (WLAN) baseband processor. The 64-point FFT is realized by decomposing it into a 2-D structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use any 2-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25 ?m BiCMOS technology. The core area of this chip is 6.8 mm2. The average dynamic power consumption is 41 mW @ 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel (i. e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that though it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption

    A Many-Core Overlay for High-Performance Embedded Computing on FPGAs

    Get PDF
    In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, supported operations and number of ports can be configured independently for each core of the overlay. The overlay was evaluated with matrix multiplication, LU decomposition and Fast-Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform. The results show that using a system-level many-core overlay avoids complex hardware design and still provides good performance results.Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423

    Frequency shifting approach towards textual transcription of heartbeat sounds

    Get PDF
    Auscultation is an approach for diagnosing many cardiovascular problems. Automatic analysis of heartbeat sounds and extraction of its audio features can assist physicians towards diagnosing diseases. Textual transcription allows recording a continuous heart sound stream using a text format which can be stored in very small memory in comparison with other audio formats. In addition, a text-based data allows applying indexing and searching techniques to access to the critical events. Hence, the transcribed heartbeat sounds provides useful information to monitor the behavior of a patient for the long duration of time. This paper proposes a frequency shifting method in order to improve the performance of the transcription. The main objective of this study is to transfer the heartbeat sounds to the music domain. The proposed technique is tested with 100 samples which were recorded from different heart diseases categories. The observed results show that, the proposed shifting method significantly improves the performance of the transcription

    A System for Compressive Sensing Signal Reconstruction

    Full text link
    An architecture for hardware realization of a system for sparse signal reconstruction is presented. The threshold based reconstruction method is considered, which is further modified in this paper to reduce the system complexity in order to provide easier hardware realization. Instead of using the partial random Fourier transform matrix, the minimization problem is reformulated using only the triangular R matrix from the QR decomposition. The triangular R matrix can be efficiently implemented in hardware without calculating the orthogonal Q matrix. A flexible and scalable realization of matrix R is proposed, such that the size of R changes with the number of available samples and sparsity level.Comment: 6 page

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    Fast Fourier Transform algorithm design and tradeoffs

    Get PDF
    The Fast Fourier Transform (FFT) is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional Single Instruction Stream/Multiple Data (SIMD) parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs are measured. Fast Fourier Transform programs are compared to the currently best Cray-2 FFT program

    A general framework for efficient FPGA implementation of matrix product

    Get PDF
    Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe

    A VLSI pipeline design of a fast prime factor DFT on a finite field

    Get PDF
    A conventional prime factor discrete Fourier transform (DFT) algorithm is used to realize a discrete Fourier-like transform on the finite field, GF(q sub n). A pipeline structure is used to implement this prime factor DFT over GF(q sub n). This algorithm is developed to compute cyclic convolutions of complex numbers and to decode Reed-Solomon codes. Such a pipeline fast prime factor DFT algorithm over GF(q sub n) is regular, simple, expandable, and naturally suitable for VLSI implementation. An example illustrating the pipeline aspect of a 30-point transform over GF(q sub n) is presented

    Simulation of Parallel Pipeline Radix 2^2 Architecture

    Get PDF
    In popular orthogonal frequency division multiplexing (OFDM) communication system processing is one of the key procedures Fast Fourier transform (FFT) and inversely for that Fast Fourier Transform (IFFT) is one of them. In this VLSI implementation Structured pipeline architectures, low power consumption, high speed and reduced chip area are the important concerns. In this paper, presentation of the worthy implementation of FFT/IFFT processor for OFDM applications is described. We obtain the single-path delay feedback architecture, to get a ROM of smaller size and this proposed architecture applies a reconfigurable complex multiplier. To minimize the error of truncation we apply a fixed width modified booth multiplier. As a result, the proposed radix-2k feed forward architectures even offer an attractive solution for current applications, and also open up a new research line on feed forward structures
    • …
    corecore