192 research outputs found

    Monolithic SAW convolvers using chirp transducers

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows. Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    Mapping systolic FIR filter banks onto fixed-size linear processor arrays

    On the realization of discrete cosine transform using the distributed arithmetic


    A vision-based system for inspecting painted slates

    Purpose – This paper describes the development of a novel automated vision system used to detect visual defects on painted slates. Design/methodology/approach – The vision system consists of two major components covering the opto-mechanical and algorithmic aspects of the system. The first component addresses issues including the mechanical implementation and interfacing of the inspection system; the second covers the development of a fast image-processing procedure able to identify visual defects present on the slate surface. Findings – The inspection system was developed on 400 slates to determine the threshold settings that give the best trade-off between no false positive triggers and correct defect identification. The developed system was tested on more than 300 fresh slates, and the success rate for correct identification was 99.32 per cent for defect-free slates (148 samples) and 96.91 per cent for defective slates (162 samples). Practical implications – The experimental data indicate that automating the inspection of painted slates can be achieved and that installation in a factory is a realistic target. Testing the devised inspection system in a factory-type environment was an important part of the development process, as this enabled us to develop a mechanical system and an image-processing algorithm able to perform slate inspection in an industrial environment. The overall performance of the system indicates that the proposed solution can be considered as a replacement for the existing manual inspection system. Originality/value – The development of a real-time automated system for inspecting painted slates proved to be a difficult task since the slate surface is dark coloured, glossy, has depth profile non-uniformities and is transported at high speeds on a conveyor. To address these issues, the system described in this paper proposes a number of novel solutions, including the illumination set-up and the development of a multi-component image-processing inspection algorithm.

    Matched Filters for Source Detection in the Poissonian Noise Regime

    A procedure is described for estimating an optimum kernel for detecting signals in Poissonian noise by convolution. The technique is applied to the detection of x-ray point sources in XMM-Newton data, and is shown to yield an improvement in detection sensitivity of up to 60% over the sliding-box method used in the creation of the 1XMM catalog.

    FPGA implementations for parallel multidimensional filtering algorithms

    PhD Thesis. One- and multi-dimensional raw data collections introduce noise and artifacts, which need to be recovered from degradations by an automated filtering system before further machine analysis. The need for automating wide-ranging filtering applications necessitates the design of generic filtering architectures, together with the development of multidimensional and extensive convolution operators. Consequently, the aim of this thesis is to investigate the problem of automated construction of a generic parallel filtering system. To serve this goal, performance-efficient FPGA implementation architectures are developed to realize parallel one/multi-dimensional filtering algorithms. The proposed generic architectures provide a mechanism for fast FPGA prototyping of high-performance computations to obtain efficiently implemented performance indices of area, speed, dynamic power, throughput and computation rates, as a complete package. These parallel filtering algorithms and their automated generic architectures tackle the major bottlenecks and limitations of existing multiprocessor systems in wordlength, input data segmentation, boundary conditions and inter-processor communications, in order to support high data throughput real-time applications of low-power architectures using a Xilinx Virtex-6 FPGA board. For the one-dimensional raw signal filtering case, the mathematical model and architectural development of the generalized parallel 1-D filtering algorithms are presented using the 1-D block filtering method. Five generic architectures are implemented on a Virtex-6 ML605 board, evaluated and compared. A complete set of results on area, speed, power, throughput and computation rates is obtained and discussed as performance indices for the 1-D convolution architectures. A successful application of parallel 1-D cross-correlation is demonstrated. 
For the two-dimensional greyscale/colour image processing cases, new parallel 2-D/3-D filtering algorithms are presented and mathematically modelled using input decimation and output image reconstruction by interpolation. Ten generic architectures are implemented on the Virtex-6 ML605 board, evaluated and compared. Key results on area, speed, power, throughput and computation rate are obtained and discussed as performance indices for the 2-D convolution architectures. 2-D image reconfigurable processors are developed and implemented using single, dual and quad MAC FIR units. 3-D colour image processors are devised to act as 3-D colour filtering engines. A 2-D cross-correlator parallel engine is successfully developed as a parallel 2-D matched filtering algorithm for locating any MRI slice within an MRI data stack library. Twelve 3-D MRI filtering operators are plugged in and adapted to be suitable for biomedical imaging, including 3-D edge operators and 3-D noise smoothing operators. Since three-dimensional greyscale/colour volumetric image applications are computationally intensive, a new parallel 3-D/4-D filtering algorithm is presented and mathematically modelled using volumetric data image segmentation by decimation and output reconstruction by interpolation, after simultaneously and independently performing 3-D filtering. Eight generic architectures are developed and implemented on the Virtex-6 board, including 3-D spatial and FFT convolution architectures. Fourteen 3-D MRI filtering operators are plugged in and adapted for this particular biomedical imaging application, including 3-D edge operators and 3-D noise smoothing operators. Three successful applications are presented in 4-D colour MRI (fMRI) filtering processors, k-space MRI volume data filter and 3-D cross-correlator. IRAQI Government

    Efficient convolvers using the Polynomial Residue Number System technique

    The problem of computing linear convolution is a very important one because linear convolution lets us mechanize digital filtering. The linear convolution of two N-point sequences can be computed as the cyclic convolution of two 2N-point sequences: the original sequences, each padded with N zeros. Computed directly, the cyclic convolution of two N-point sequences requires N^2 multiplications, together with additions. A very efficient way of computing the cyclic convolution of two sequences is the Polynomial Residue Number System (PRNS) technique. Using this technique, the cyclic convolution of two N-point sequences can be computed using only N multiplications instead of N^2 multiplications. This is achieved through forward and inverse PRNS transformation mappings. These mappings rely on additions, subtractions and many scaling operations (multiplications by constants). The PRNS technique would lose much of its value if these many scaling operations were difficult to implement. In this thesis we show how to calculate the cyclic convolution of two sequences using the PRNS technique based on forward and inverse transformation mappings that rely on complement operations (negations), additions and rotation operations. The complicated hardware required for the scaling operations is thereby replaced by rotators, which do not require any computational hardware.
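
The zero-padding relation stated in this abstract can be checked with a minimal Python sketch. This is not the PRNS technique: the cyclic convolution below is computed directly, so it shows only that padding each N-point sequence with N zeros turns a cyclic convolution into a linear one. The function names are hypothetical and are not taken from the thesis.

```python
def cyclic_convolve(a, b):
    """Direct cyclic convolution of two equal-length sequences.

    Uses len(a)**2 multiplications; this is the baseline cost, not PRNS.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("sequences must have the same length")
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def linear_convolve_via_cyclic(x, h):
    """Linear convolution of two N-point sequences via cyclic convolution.

    Each sequence is padded with N zeros to length 2N, so the wrap-around
    terms of the cyclic convolution only ever multiply by zero.
    """
    n = len(x)
    if len(h) != n:
        raise ValueError("sequences must have the same length")
    x_padded = list(x) + [0] * n
    h_padded = list(h) + [0] * n
    y = cyclic_convolve(x_padded, h_padded)
    # The linear convolution has 2N - 1 samples; the last padded sample is 0.
    return y[: 2 * n - 1]


def linear_convolve_direct(x, h):
    """Reference linear convolution, used only to check the result."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


if __name__ == "__main__":
    x, h = [1, 2, 3], [4, 5, 6]
    print(linear_convolve_via_cyclic(x, h))
    print(linear_convolve_direct(x, h))
```

Both calls should print the same sequence. Without the padding, the cyclic result would mix the tail of the linear convolution back into its first samples.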