80 research outputs found

    One- and two-level filter-bank convolvers

    Get PDF
    In a recent paper, it was shown in detail that in the case of orthonormal and biorthogonal filter banks we can convolve two signals by directly convolving the subband signals and combining the results. In this paper, we further generalize the result. We also derive the statistical coding gain for the generalized subband convolver. As an application, we derive a novel low sensitivity structure for FIR filters from the convolution theorem. We define and derive a deterministic coding gain of the subband convolver over direct convolution for a fixed wordlength implementation. This gain serves as a figure of merit for the low sensitivity structure. Several numerical examples are included to demonstrate the usefulness of these ideas. By using the generalized polyphase representation, we show that the subband convolvers, linear periodically time varying systems, and digital block filtering can be viewed in a unified manner. Furthermore, the scheme called IFIR filtering is shown to be a special case of the convolver

    Orthonormal and biorthonormal filter banks as convolvers, and convolutional coding gain

    Get PDF
    Convolution theorems for filter bank transformers are introduced. Both uniform and nonuniform decimation ratios are considered, and orthonormal as well as biorthonormal cases are addressed. All the theorems are such that the original convolution reduces to a sum of shorter, decoupled convolutions in the subbands. That is, there is no need to have cross convolution between subbands. For the orthonormal case, expressions for optimal bit allocation and the optimized coding gain are derived. The contribution to coding gain comes partly from the nonuniformity of the signal spectrum and partly from nonuniformity of the filter spectrum. With one of the convolved sequences taken to be the unit pulse function,,e coding gain expressions reduce to those for traditional subband and transform coding. The filter-bank convolver has about the same computational complexity as a traditional convolver, if the analysis bank has small complexity compared to the convolution itself

    Mapping systolic FIR filter banks onto fixed-size linear processor arrays

    Get PDF

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    Matched Filters for Source Detection in the Poissonian Noise Regime

    Full text link
    A procedure is described for estimating an optimum kernel for the detection by convolution of signals among Poissonian noise. The technique is applied to the detection of x-ray point sources in XMM-Newton data, and is shown to yield an improvement in detection sensitivity of up to 60% over the sliding-box method used in the creation of the 1XMM catalog

    Maximizing CNN Accelerator Efficiency Through Resource Partitioning

    Full text link
    Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed. However, this approach leads to inefficient designs because the same processor structure is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers. Using the same FPGA resources as a single large processor, multiple smaller specialized processors increase computational efficiency and lead to a higher overall throughput. Our design methodology achieves 3.8x higher throughput than the state-of-the-art approach on evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x

    FPGA implementations for parallel multidimensional filtering algorithms

    Get PDF
    PhD ThesisOne and multi dimensional raw data collections introduce noise and artifacts, which need to be recovered from degradations by an automated filtering system before, further machine analysis. The need for automating wide-ranged filtering applications necessitates the design of generic filtering architectures, together with the development of multidimensional and extensive convolution operators. Consequently, the aim of this thesis is to investigate the problem of automated construction of a generic parallel filtering system. Serving this goal, performance-efficient FPGA implementation architectures are developed to realize parallel one/multi-dimensional filtering algorithms. The proposed generic architectures provide a mechanism for fast FPGA prototyping of high performance computations to obtain efficiently implemented performance indices of area, speed, dynamic power, throughput and computation rates, as a complete package. These parallel filtering algorithms and their automated generic architectures tackle the major bottlenecks and limitations of existing multiprocessor systems in wordlength, input data segmentation, boundary conditions as well as inter-processor communications, in order to support high data throughput real-time applications of low-power architectures using a Xilinx Virtex-6 FPGA board. For one-dimensional raw signal filtering case, mathematical model and architectural development of the generalized parallel 1-D filtering algorithms are presented using the 1-D block filtering method. Five generic architectures are implemented on a Virtex-6 ML605 board, evaluated and compared. A complete set of results on area, speed, power, throughput and computation rates are obtained and discussed as performance indices for the 1-D convolution architectures. A successful application of parallel 1-D cross-correlation is demonstrated. For two dimensional greyscale/colour image processing cases, new parallel 2-D/3-D filtering algorithms are presented and mathematically modelled using input decimation and output image reconstruction by interpolation. Ten generic architectures are implemented on the Virtex-6 ML605 board, evaluated and compared. Key results on area, speed, power, throughput and computation rate are obtained and discussed as performance indices for the 2-D convolution architectures. 2-D image reconfigurable processors are developed and implemented using single, dual and quad MAC FIR units. 3-D Colour image processors are devised to act as 3-D colour filtering engines. A 2-D cross-correlator parallel engine is successfully developed as a parallel 2-D matched filtering algorithm for locating any MRI slice within a MRI data stack library. Twelve 3-D MRI filtering operators are plugged in and adapted to be suitable for biomedical imaging, including 3-D edge operators and 3-D noise smoothing operators. Since three dimensional greyscale/colour volumetric image applications are computationally intensive, a new parallel 3-D/4-D filtering algorithm is presented and mathematically modelled using volumetric data image segmentation by decimation and output reconstruction by interpolation, after simultaneously and independently performing 3-D filtering. Eight generic architectures are developed and implemented on the Virtex-6 board, including 3-D spatial and FFT convolution architectures. Fourteen 3-D MRI filtering operators are plugged and adapted for this particular biomedical imaging application, including 3-D edge operators and 3-D noise smoothing operators. Three successful applications are presented in 4-D colour MRI (fMRI) filtering processors, k-space MRI volume data filter and 3-D cross-correlator.IRAQI Government
    corecore