997 research outputs found

    FPGA implementations for parallel multidimensional filtering algorithms

    Get PDF
    PhD ThesisOne and multi dimensional raw data collections introduce noise and artifacts, which need to be recovered from degradations by an automated filtering system before, further machine analysis. The need for automating wide-ranged filtering applications necessitates the design of generic filtering architectures, together with the development of multidimensional and extensive convolution operators. Consequently, the aim of this thesis is to investigate the problem of automated construction of a generic parallel filtering system. Serving this goal, performance-efficient FPGA implementation architectures are developed to realize parallel one/multi-dimensional filtering algorithms. The proposed generic architectures provide a mechanism for fast FPGA prototyping of high performance computations to obtain efficiently implemented performance indices of area, speed, dynamic power, throughput and computation rates, as a complete package. These parallel filtering algorithms and their automated generic architectures tackle the major bottlenecks and limitations of existing multiprocessor systems in wordlength, input data segmentation, boundary conditions as well as inter-processor communications, in order to support high data throughput real-time applications of low-power architectures using a Xilinx Virtex-6 FPGA board. For one-dimensional raw signal filtering case, mathematical model and architectural development of the generalized parallel 1-D filtering algorithms are presented using the 1-D block filtering method. Five generic architectures are implemented on a Virtex-6 ML605 board, evaluated and compared. A complete set of results on area, speed, power, throughput and computation rates are obtained and discussed as performance indices for the 1-D convolution architectures. A successful application of parallel 1-D cross-correlation is demonstrated. For two dimensional greyscale/colour image processing cases, new parallel 2-D/3-D filtering algorithms are presented and mathematically modelled using input decimation and output image reconstruction by interpolation. Ten generic architectures are implemented on the Virtex-6 ML605 board, evaluated and compared. Key results on area, speed, power, throughput and computation rate are obtained and discussed as performance indices for the 2-D convolution architectures. 2-D image reconfigurable processors are developed and implemented using single, dual and quad MAC FIR units. 3-D Colour image processors are devised to act as 3-D colour filtering engines. A 2-D cross-correlator parallel engine is successfully developed as a parallel 2-D matched filtering algorithm for locating any MRI slice within a MRI data stack library. Twelve 3-D MRI filtering operators are plugged in and adapted to be suitable for biomedical imaging, including 3-D edge operators and 3-D noise smoothing operators. Since three dimensional greyscale/colour volumetric image applications are computationally intensive, a new parallel 3-D/4-D filtering algorithm is presented and mathematically modelled using volumetric data image segmentation by decimation and output reconstruction by interpolation, after simultaneously and independently performing 3-D filtering. Eight generic architectures are developed and implemented on the Virtex-6 board, including 3-D spatial and FFT convolution architectures. Fourteen 3-D MRI filtering operators are plugged and adapted for this particular biomedical imaging application, including 3-D edge operators and 3-D noise smoothing operators. Three successful applications are presented in 4-D colour MRI (fMRI) filtering processors, k-space MRI volume data filter and 3-D cross-correlator.IRAQI Government

    High speed VLSI architectures for DWT in biometric image compression: A study

    Get PDF
    AbstractBiometrics is a field that navigates through a vast database and extracts only the qualifying data to accelerate the processes of biometric authentication/recognition. Image compression is a vital part of the process. Various Very Large Scale Integration (VLSI) architectures have emerged to satisfy the real time requirements of the online processing of the applications. This paper studies various techniques that help in realizing the fast operation of the transform stage of the image compression processes. Various parameters that may involve in optimizations for high speed like computing time, silicon area, memory size etc are considered in the survey

    Concepts for on-board satellite image registration. Volume 3: Impact of VLSI/VHSIC on satellite on-board signal processing

    Get PDF
    Anticipated major advances in integrated circuit technology in the near future are described as well as their impact on satellite onboard signal processing systems. Dramatic improvements in chip density, speed, power consumption, and system reliability are expected from very large scale integration. Improvements are expected from very large scale integration enable more intelligence to be placed on remote sensing platforms in space, meeting the goals of NASA's information adaptive system concept, a major component of the NASA End-to-End Data System program. A forecast of VLSI technological advances is presented, including a description of the Defense Department's very high speed integrated circuit program, a seven-year research and development effort

    Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

    Get PDF
    The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption

    Number theoretic techniques applied to algorithms and architectures for digital signal processing

    Get PDF
    Many of the techniques for the computation of a two-dimensional convolution of a small fixed window with a picture are reviewed. It is demonstrated that Winograd's cyclic convolution and Fourier Transform Algorithms, together with Nussbaumer's two-dimensional cyclic convolution algorithms, have a common general form. Many of these algorithms use the theoretical minimum number of general multiplications. A novel implementation of these algorithms is proposed which is based upon one-bit systolic arrays. These systolic arrays are networks of identical cells with each cell sharing a common control and timing function. Each cell is only connected to its nearest neighbours. These are all attractive features for implementation using Very Large Scale Integration (VLSI). The throughput rate is only limited by the time to perform a one-bit full addition. In order to assess the usefulness to these systolic arrays a 'cost function' is developed to compare them with more conventional techniques, such as the Cooley-Tukey radix-2 Fast Fourier Transform (FFT). The cost function shows that these systolic arrays offer a good way of implementing the Discrete Fourier Transform for transforms up to about 30 points in length. The cost function is a general tool and allows comparisons to be made between different implementations of the same algorithm and between dissimilar algorithms. Finally a technique is developed for the derivation of Discrete Cosine Transform (DCT) algorithms from the Winograd Fourier Transform Algorithm. These DCT algorithms may be implemented by modified versions of the systolic arrays proposed earlier, but requiring half the number of cells

    Computer vision algorithms on reconfigurable logic arrays

    Full text link
    corecore