607 research outputs found

    High throughput spatial convolution filters on FPGAs

    Digital signal processing (DSP) on field-programmable gate arrays (FPGAs) has long been appealing because the inherent parallelism in these computations can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, including additional hard blocks such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility.
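
To make the 1-D versus 2-D point concrete, here is a minimal software sketch, not the architecture from the paper: a 2-D spatial filter is computed as one 1-D FIR per kernel row, with the row results summed. The function names are hypothetical, the filter is written in correlation form (no kernel flip), and the 4K frame size of 3840 x 2160 is an assumption, since the abstract does not give exact dimensions.

```python
def fir_1d(row, taps):
    """Valid-mode 1-D FIR in correlation form: one output per full overlap."""
    k = len(taps)
    return [sum(row[x + j] * taps[j] for j in range(k))
            for x in range(len(row) - k + 1)]


def conv2d_by_rows(image, kernel):
    """2-D filter built from one 1-D FIR per kernel row, summed element-wise."""
    kh = len(kernel)
    out = []
    for y in range(len(image) - kh + 1):
        # Each kernel row filters the matching image row independently.
        partial = [fir_1d(image[y + i], kernel[i]) for i in range(kh)]
        out.append([sum(col) for col in zip(*partial)])
    return out


def pixel_rate(width, height, fps):
    """Pixels per second that a streaming filter must sustain."""
    return width * height * fps


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, -1],
          [2, 0, -2],
          [1, 0, -1]]

filtered = conv2d_by_rows(image, kernel)
# Assumed 4K UHD frame size; the abstract only says "4K".
rate_30 = pixel_rate(3840, 2160, 30)
rate_60 = pixel_rate(3840, 2160, 60)
print(filtered, rate_30, rate_60)
```

Under the assumed frame size, 30 to 60 FPS corresponds to roughly 249 to 498 million pixels per second, which is why a design of this kind needs about one output pixel per clock cycle or better.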

    Design and implementation of a general purpose VLSI median filter unit and its applications

    A VLSI median filter unit has been designed and implemented in 3-μm M2 CMOS, using full-custom VLSI design techniques. The unit consists of two single-chip median filters, one extensible and one real-time. The chips are bit-level pipelined systolic structures based on odd/even transposition sorting. The extensible chip is designed for applications requiring variable window sizes and variable word-lengths, whereas the other one is for real-time applications. Various median filtering techniques are easily realized by using the designed chips together with reasonable external hardware.
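
Odd/even transposition sorting, which the abstract names as the basis of the chips, can be sketched in software as follows. This is a word-level illustration with hypothetical function names; it does not model the bit-level pipelining or the systolic layout of the actual chips.

```python
def odd_even_transposition_sort(values):
    """Sort by alternating compare-exchange phases on even and odd pairs.

    For n elements, n phases are sufficient. In hardware, every
    compare-exchange within a phase can run in parallel.
    """
    a = list(values)
    n = len(a)
    for phase in range(n):
        start = phase % 2  # even phase: pairs (0,1),(2,3)...; odd: (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def median_of_window(window):
    """Median of an odd-length window, taken as the middle sorted element."""
    s = odd_even_transposition_sort(window)
    return s[len(s) // 2]


window = [9, 1, 8, 2, 7, 3, 6, 4, 5]  # a 3 x 3 neighbourhood, flattened
print(median_of_window(window))
```

The appeal for VLSI is that each phase touches only neighbouring elements, so the wiring stays local and regular.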

    General purpose VLSI median filter and its applications for image processing

    A general-purpose median filter configuration consisting of two single-chip median filters is proposed. One of the chips is designed for applications requiring variable word-length and variable window size, whereas the other is for real-time applications. The architectures of the chips are based on odd/even transposition sorting. The chips are implemented in 3-μm M2 CMOS using full-custom VLSI design techniques. Together with reasonable external hardware, the chips can be used to realize many median filtering techniques. The VLSI design procedure of the chips and their applications to different median filtering techniques for image processing are presented.

    Study on Low-Power Image Processing for Gastrointestinal Endoscopy


    Design and Implementation of a General-Purpose Median Filter Unit in CMOS VLSI

    A general-purpose median filter unit configuration is proposed in the form of two single-chip median filters, one extensible and one real-time. The networks of the chips are pipelined and systolic at bit level and based on odd/even transposition sorting. The chips are implemented in 3-μm standard CMOS by using full-custom VLSI design techniques. The exact median of the elements in a window of size w = 9 with arbitrary word length L can be found by using only one extensible median filter chip. The filter can be extended to arbitrary window sizes and word lengths by using multiple chips. For w > 9 with arbitrary L, the number of chips required to find the exact medians is no more than the smallest integer greater than or equal to (w/9)^2. Simulation results show that the extensible median filter chip can be clocked at up to 40 MHz and generate 30/L megamedians per second. On the other hand, the real-time median filter chip can find the exact running medians of elements in a window of fixed size w = 9 with L = 8. According to simulations, it can generate up to 50 megamedians per second with a 50-MHz clock. The chips can be used for the realization of various median filtering techniques. In this paper, the algorithms, VLSI implementations, and testing of the chips are presented together with some possible applications. 0018-9200/90/0400-0505$01.00 © 1990 IEE
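
The figures quoted in this abstract can be turned into a small calculator. This is a sketch under stated assumptions: the chip bound is read literally as the ceiling of (w/9) squared, which is one interpretation of the wording, and the function names are hypothetical.

```python
def chip_bound(w, chip_window=9):
    """Upper bound on chips for window size w, read as ceil((w / 9) ** 2).

    Integer arithmetic avoids floating-point rounding in the ceiling.
    """
    return -(-(w * w) // (chip_window * chip_window))


def extensible_rate_megamedians(word_length):
    """Quoted throughput of the extensible chip: 30 / L megamedians per second."""
    return 30 / word_length


for w in (9, 25, 49):
    print(w, chip_bound(w))
print(extensible_rate_megamedians(8))
```

For comparison, the real-time chip is quoted at up to 50 megamedians per second with a 50-MHz clock, which is one median per clock cycle for w = 9 and L = 8.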

    Efficient parallel computation on multiprocessors with optical interconnection networks

    This dissertation studies optical interconnection networks, their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model, the Linear Array with Reconfigurable Pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher-dimensional LARPBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one-dimensional and multi-dimensional LARPBSs: parallel comparison problems, including sorting, merging, and selection; and Boolean matrix multiplication, transitive closure, and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at our disposal, we study the sorting problem for higher-dimensional LARPBSs and obtain the following results:
    • An optimal basic Columnsort algorithm on a 2D LARPBS.
    • Two optimal two-way merge sort algorithms on a 2D LARPBS.
    • An optimal multi-way merge sorting algorithm on a 2D LARPBS.
    • An optimal generalized column sort algorithm on a 2D LARPBS.
    • An optimal generalized column sort algorithm on a 3D LARPBS.
    • An optimal 5-phase sorting algorithm on a 3D LARPBS.
    Results for selection problems are as follows:
    • A constant time maximum-finding algorithm on an LARPBS.
    • An optimal maximum-finding algorithm on an LARPBS.
    • An O((log log n)^2) time parallel selection algorithm on an LARPBS.
    • An O(k(log log n)^2) time parallel multi-selection algorithm on an LARPBS.
    While studying the computation and communication properties of the LARPBS model, we find that Boolean matrix multiplication and its applications to graphs form another set of problems that can be solved efficiently on the LARPBS. The results we have obtained in this area are:
    • A constant time Boolean matrix multiplication algorithm.
    • An O(log n)-time transitive closure algorithm.
    • An O(log n)-time connected components algorithm.
    • An O(log n)-time strongly connected components algorithm.
    The results provided in this dissertation show the strong computation and communication power of optical interconnection networks.
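
The link between a constant-time Boolean matrix product and an O(log n)-time transitive closure is most likely the standard repeated-squaring argument, sketched below. This is a sequential illustration with hypothetical function names, not the LARPBS algorithm itself; the dissertation's actual construction is not given in the abstract.

```python
def bool_matmul(a, b):
    """Boolean matrix product: OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(adj):
    """Reflexive-transitive closure by repeated squaring of (A OR I).

    After s squarings the matrix covers all paths of length up to 2**s,
    so about log2(n) products suffice. Returns (closure, products_used).
    """
    n = len(adj)
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    covered, products = 1, 0
    while covered < n:
        reach = bool_matmul(reach, reach)
        covered *= 2
        products += 1
    return reach, products


# Directed chain 0 -> 1 -> 2 -> 3
chain = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
closure, products = transitive_closure(chain)
print(closure[0][3], closure[3][0], products)
```

If each product costs constant time on the parallel machine, the logarithmic number of products gives the O(log n) bound stated in the list.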

    Accurate depth from defocus estimation with video-rate implementation

    The science of measuring depth from images at video rate using "defocus" has been investigated. The method required two differently focussed images acquired from a single viewpoint using a single camera. The relative blur between the images was used to determine the in-focus axial points of each pixel and hence depth. The depth estimation algorithm researched by Watanabe and Nayar was employed to recover the depth estimates, but the broadband filters, referred to as the Rational filters, were designed using a new procedure: the Two Step Polynomial Approach. The filters designed by the new model were largely insensitive to object texture and were shown to model the blur more precisely than the previous method. Experiments with real planar images demonstrated a maximum RMS depth error of 1.18% for the proposed filters, compared to 1.54% for the previous design. The researched software program required five 2D convolutions to be processed in parallel, and these convolutions were effectively implemented on an FPGA using a two-channel, five-stage pipelined architecture; however, the precision of the filter coefficients and the variables had to be limited within the processor. The number of multipliers required for each convolution was reduced from 49 to 10 (79.5% reduction) using a Triangular design procedure. Experimental results suggested that the pipelined processor provided depth estimates comparable in accuracy to the full-precision Matlab output, and generated depth maps of size 400 x 400 pixels in 13.06 ms, which is faster than video rate. The defocused images (near- and far-focused) were optically registered for magnification using telecentric optics. A frequency-domain approach based on phase correlation was employed to measure the radial shifts due to magnification and also to optimally position the external aperture. The telecentric optics ensured that pixel-to-pixel registration between the defocused images was correct and provided more accurate depth estimates.
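
One way to arrive at 10 multipliers instead of 49 is sketched below, purely as a hypothesis. It assumes a 7 x 7 kernel (suggested by the count of 49) with eightfold symmetry, so that only a triangular wedge of coefficients is distinct. The abstract does not state the kernel size or define the Triangular design procedure, so both are assumptions, as are the function names.

```python
def distinct_coefficients(size):
    """Count distinct coefficients in an odd size x size kernel that is
    symmetric under horizontal, vertical, and diagonal reflection."""
    centre = size // 2
    classes = set()
    for y in range(size):
        for x in range(size):
            dy, dx = abs(y - centre), abs(x - centre)
            # Positions sharing the same unordered offset pair share a coefficient.
            classes.add((min(dy, dx), max(dy, dx)))
    return len(classes)


def frames_per_second(frame_time_ms):
    """Frame rate implied by a per-frame processing time in milliseconds."""
    return 1000.0 / frame_time_ms


full = 7 * 7
shared = distinct_coefficients(7)
fps = frames_per_second(13.06)
print(full, shared, fps)
```

Under these assumptions the wedge holds 4 + 3 + 2 + 1 = 10 coefficients, matching the quoted figure, and 13.06 ms per 400 x 400 depth map corresponds to a little over 76 frames per second.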

    Efficient design space exploration of embedded microprocessors


    Efficient Architecture and Implementation of Vector Median Filter in Co-Design Context

    This work presents an efficient, fast parallel architecture for the Vector Median Filter (VMF) using a combined hardware/software (HW/SW) implementation. The hardware part of the system is implemented in VHDL, whereas the software part is developed in C/C++. The software part of the embedded system runs on the NIOS-II softcore processor under the μClinux operating system. The comparison between the software and HW/SW solutions shows that adding a hardware part to the design speeds up the filtering process relative to the software-only solution. This efficient embedded system implementation can perform well in several image processing applications.
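
For readers unfamiliar with the vector median, the usual definition is the window sample whose summed distance to all other samples is smallest. The sketch below assumes that definition with Euclidean distance; the abstract does not say which norm the architecture uses, and the function name is hypothetical.

```python
import math


def vector_median(window):
    """Return the vector in the window with the smallest total Euclidean
    distance to every vector in the window (itself included, at distance 0)."""
    def total_distance(v):
        return sum(math.dist(v, u) for u in window)
    return min(window, key=total_distance)


# RGB pixels from a small neighbourhood; (200, 0, 0) plays the role of impulse noise.
pixels = [(10, 10, 10), (12, 11, 9), (200, 0, 0), (11, 10, 10), (9, 12, 11)]
print(vector_median(pixels))
```

The output is always one of the input vectors, so the filter never invents a colour. The pairwise distance computations are independent of each other, which is the part that lends itself to a parallel hardware block.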

    A modular and scalable architecture for the realization of high-speed programmable rank-order filters using threshold logic

    We present a new scalable architecture for the realization of fully programmable rank order filters (ROF). Capacitive Threshold Logic (CTL) gates are utilized for the implementation of the multi-input programmable majority (voting) functions required in the architecture. The CTL-based realization of the majority gates used in the ROF architecture allows the filter rank as well as the window size to be user-programmable, using a much smaller silicon area compared to conventional realizations of digital median filters. The proposed filter architecture is completely modular and scalable, and the circuit complexity grows only linearly with maximum window size (m) and with word length (n). A prototype of the proposed filter circuit has been designed and fabricated using double-polysilicon 0.8 μm CMOS technology. Detailed post-layout simulations and test results of the ROF prototype circuit indicate that the new architecture can accommodate sampling clock rates of up to 50 MHz, corresponding to an effective data processing rate of 800 Mb/s for a very large filter with window size 63 and word length of 16 bits.
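
A common way to build a rank-order filter from programmable majority (threshold) functions is bit-serial selection from the most significant bit down. The sketch below illustrates that general scheme in software; it is an assumption that the paper's architecture follows it, since the abstract does not describe the algorithm, and the function names are hypothetical.

```python
def rank_order_select(words, rank, n_bits):
    """Return the rank-th largest word (rank 1 = maximum) bit by bit.

    At each bit position a threshold function asks whether at least `rank`
    inputs present a 1. Inputs that disagree with the chosen output bit are
    frozen at their current bit value for all lower positions.
    """
    frozen = [None] * len(words)  # None: still undecided; 0 or 1: frozen value
    result = 0
    for bit in range(n_bits - 1, -1, -1):
        bits = [(w >> bit) & 1 if f is None else f
                for w, f in zip(words, frozen)]
        out = 1 if sum(bits) >= rank else 0  # programmable threshold
        result = (result << 1) | out
        for i, b in enumerate(bits):
            if frozen[i] is None and b != out:
                frozen[i] = b
    return result


def data_rate_bits_per_s(clock_hz, word_bits):
    """Effective data rate when one word is accepted per sampling clock."""
    return clock_hz * word_bits


samples = [5, 3, 7, 1, 6]
median = rank_order_select(samples, rank=3, n_bits=3)
rate = data_rate_bits_per_s(50_000_000, 16)
print(median, rate)
```

Changing `rank` changes only the threshold, which is consistent with the abstract's claim that rank and window size are user-programmable. The rate calculation reproduces the quoted 800 Mb/s from a 50 MHz clock and 16-bit words.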