    Zynq-Based Reconfigurable System for Real-Time Edge Detection of Noisy Video Sequences

    We implement a Zynq-based self-reconfigurable system that performs real-time edge detection on 1080p video sequences. Although object edge detection is a fundamental tool in computer vision, noise in video frames significantly degrades edge detection results. Moreover, because 1080p video filtering has high computational complexity, a hardware implementation on a reconfigurable hardware fabric is necessary. The proposed embedded system uses the dynamic reconfiguration capability of the Zynq SoC so that different filter bitstreams are partially reconfigured at run time according to the noise density level detected in the incoming video frames. Pratt's Figure of Merit (PFOM), which measures edge detection accuracy, is analyzed for various noise density levels, and we demonstrate that adaptive run-time reconfiguration of the proposed filter bitstreams significantly increases edge detection accuracy while efficiently providing the computing power needed for real-time processing of 1080p video frames. Performance results on configuration time, CPU usage, and hardware resource utilization are also compared.
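
    Pratt's Figure of Merit, used in this abstract to score edge maps, is commonly defined as FOM = (1 / max(N_I, N_D)) * sum over detected edge pixels of 1 / (1 + alpha * d_i^2), where N_I and N_D are the numbers of ideal and detected edge pixels and d_i is the distance from the i-th detected pixel to the nearest ideal edge pixel. The sketch below computes it in plain Python, assuming binary edge maps stored as nested lists and the frequently used scaling constant alpha = 1/9; the function name `pratt_fom` is hypothetical and this is not the authors' evaluation code.

```python
from typing import Sequence


def pratt_fom(ideal: Sequence[Sequence[int]],
              detected: Sequence[Sequence[int]],
              alpha: float = 1.0 / 9.0) -> float:
    """Pratt's Figure of Merit for two equal-sized binary edge maps (1 = edge).

    Hypothetical helper: alpha = 1/9 is an assumed, commonly used constant.
    """
    ideal_pts = [(r, c) for r, row in enumerate(ideal) for c, v in enumerate(row) if v]
    det_pts = [(r, c) for r, row in enumerate(detected) for c, v in enumerate(row) if v]
    n_i, n_d = len(ideal_pts), len(det_pts)
    if n_i == 0 and n_d == 0:
        return 1.0  # assumed convention: two empty edge maps agree perfectly
    if n_i == 0 or n_d == 0:
        return 0.0
    total = 0.0
    for r, c in det_pts:
        # Squared Euclidean distance to the nearest ideal edge pixel (brute force)
        d2 = min((r - ri) ** 2 + (c - ci) ** 2 for ri, ci in ideal_pts)
        total += 1.0 / (1.0 + alpha * d2)
    # Normalizing by the larger count penalizes both missed and spurious edges
    return total / max(n_i, n_d)
```

    A detected edge displaced by one pixel from the ground truth scores 1 / (1 + 1/9) = 0.9 per pixel, so noise-induced spurious or shifted edges lower the score smoothly rather than all-or-nothing.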

    A Scalable H.264/AVC Deblocking Filter Architecture Using Dynamic Partial Reconfiguration

    This paper presents a scalable FPGA-based H.264/AVC deblocking filter architecture that uses dynamic partial reconfiguration, a feature of FPGAs that allows different hardware configurations to be implemented at run time. Architectural scalability that adapts intelligently to different users' requirements is demonstrated through dynamic self-reconfiguration on the reconfigurable hardware fabric. When the full capability of the proposed design is used, up to four different edges can be filtered simultaneously, resulting in a significant reduction of total processing time. The architecture can readily supply the computing capability required for different resolutions and frame rates of video sequences. The implemented architecture has been evaluated on a Xilinx Virtex-4 ML410 FPGA board. The design can operate at a maximum frequency of 103 MHz. Reconfiguration is performed through the Internal Configuration Access Port (ICAP) to achieve the maximum performance needed by real-time applications. ©2010 IEEE
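
    The scaling decision described here can be pictured as picking the smallest number of parallel edge filters that meets a target resolution and frame rate at the reported 103 MHz clock. The sketch below assumes a hypothetical per-macroblock cycle cost for a single filter unit and an ideal linear speed-up with the number of units; neither figure comes from the paper, and `units_needed` is an illustrative name.

```python
import math
from typing import Optional

F_CLK_HZ = 103e6  # maximum clock frequency reported in the abstract
MAX_UNITS = 4     # the design filters up to four edges in parallel


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks covering a frame (partial MBs round up)."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def units_needed(width: int, height: int, fps: float,
                 cycles_per_mb_one_unit: float,
                 f_clk: float = F_CLK_HZ,
                 max_units: int = MAX_UNITS) -> Optional[int]:
    """Smallest number of parallel edge-filter units meeting real time.

    cycles_per_mb_one_unit is a hypothetical cost; ideal linear speed-up
    with the number of units is assumed. Returns None if even max_units
    cannot keep up.
    """
    mb_rate = macroblocks_per_frame(width, height) * fps  # macroblocks per second
    for k in range(1, max_units + 1):
        cycles_per_second = mb_rate * cycles_per_mb_one_unit / k
        if cycles_per_second <= f_clk:
            return k
    return None
```

    For example, with an assumed cost of 1000 cycles per macroblock, 1080p at 30 frames/s (8160 macroblocks per frame) would call for three units under this model, while CIF at the same rate would need only one. A configuration manager could then load the matching partial bitstream through ICAP.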

    A Self-Reconfigurable Platform for Scalable DCT Computation Using Compressed Partial Bitstreams and BlockRAM Prefetching

    In this paper, we propose a self-reconfigurable platform that can reconfigure the architecture of discrete cosine transform (DCT) computations at run time using dynamic partial reconfiguration. The scalable DCT architecture can compute different numbers of DCT coefficients in zig-zag scan order to adapt to different requirements, such as power consumption, hardware resources, and performance. We propose a configuration manager, implemented on the embedded processor, that adaptively controls the reconfiguration of the scalable DCT architecture at run time. In addition, we use the Lempel-Ziv-Storer-Szymanski (LZSS) algorithm to compress the partial bitstreams, and on-chip BlockRAM as a cache to reduce the latency of loading partial bitstreams from off-chip memory for run-time reconfiguration. A hardware module is designed for parallel reconfiguration of the partial bitstreams. The experimental results show that our approach can reduce external memory accesses by 69% and can achieve a 400 MB/s reconfiguration rate. Detailed trade-offs among power, throughput, and quality are investigated and used as criteria for self-reconfiguration. © 2009 IEEE
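
    To make the compression step concrete, here is a minimal Lempel-Ziv-Storer-Szymanski (LZSS) round trip in Python. The token layout (one flag byte per eight tokens, literals as single bytes, matches as a 12-bit offset plus a 4-bit length for lengths 3 to 18) is a common LZSS layout assumed for illustration, not the paper's bitstream format; the BlockRAM cache and the hardware decompressor are not modeled.

```python
WINDOW = 4096   # 12-bit offsets: 1..4096 bytes back (assumed layout)
MIN_MATCH = 3   # shorter matches are cheaper as literals
MAX_MATCH = 18  # 4-bit length field stores length - MIN_MATCH


def lzss_compress(data: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        flag_pos = len(out)
        out.append(0)  # placeholder flag byte for the next (up to) 8 tokens
        flags = 0
        for bit in range(8):
            if i >= n:
                break
            best_len, best_off = 0, 0
            # Brute-force search of the sliding window for the longest match
            for j in range(max(0, i - WINDOW), i):
                length = 0
                while (length < MAX_MATCH and i + length < n
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_off = length, i - j
            if best_len >= MIN_MATCH:
                flags |= 1 << bit  # flag bit set: back-reference token
                code = ((best_off - 1) << 4) | (best_len - MIN_MATCH)
                out += code.to_bytes(2, "big")
                i += best_len
            else:
                out.append(data[i])  # flag bit clear: literal byte
                i += 1
        out[flag_pos] = flags
    return bytes(out)


def lzss_decompress(blob: bytes) -> bytes:
    out = bytearray()
    p = 0
    while p < len(blob):
        flags = blob[p]
        p += 1
        for bit in range(8):
            if p >= len(blob):
                break
            if flags & (1 << bit):
                code = int.from_bytes(blob[p:p + 2], "big")
                p += 2
                off = (code >> 4) + 1
                length = (code & 0xF) + MIN_MATCH
                for _ in range(length):  # byte-by-byte copy handles overlaps
                    out.append(out[-off])
            else:
                out.append(blob[p])
                p += 1
    return bytes(out)
```

    Partial bitstreams often contain long runs of identical configuration words, which is where this kind of dictionary coder pays off; decompression needs only sequential reads and a small history buffer, which suits an on-chip implementation.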

    Efficient VLSI Architecture for Video Transcoding

    In this paper, we present a unified architecture that can perform the Discrete Cosine Transform (DCT), the Inverse Discrete Cosine Transform (IDCT), and DCT-domain motion estimation and compensation (DCT-ME/MC). The proposed architecture is a wavefront-array-based processor with a highly modular structure of 8 × 8 processing elements (PEs). By exploiting statistical properties and arithmetic operations, it can serve as a high-performance hardware accelerator for video transcoding applications. We show how different core algorithms can be mapped onto the same hardware fabric and executed through the predefined PEs. In addition to a simplified design process and savings in hardware resources, we demonstrate that a high throughput rate can be achieved for IDCT and DCT-MC by fully exploiting the sparseness of the DCT coefficient matrix. © 2009 IEEE
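
    The sparseness argument can be illustrated in software: in a row-column 2-D IDCT, an all-zero row of coefficients yields an all-zero intermediate row, so its 1-D transform can be skipped, and that row also contributes nothing to the column pass. The sketch assumes an orthonormal 8 × 8 DCT-II in floating point; it is a plain row-column IDCT, not the wavefront-array PE mapping described in the paper, and `sparse_idct2` is a hypothetical name.

```python
import math
from typing import List, Tuple

N = 8
Matrix = List[List[float]]


def dct_basis() -> Matrix:
    """Orthonormal DCT-II basis: C[k][n] = c(k) * cos((2n + 1) * k * pi / 16)."""
    basis = []
    for k in range(N):
        ck = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        basis.append([ck * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                      for n in range(N)])
    return basis


C = dct_basis()


def sparse_idct2(X: Matrix) -> Tuple[Matrix, int]:
    """Compute x = C^T X C row by row, skipping all-zero coefficient rows.

    Returns the 8x8 pixel block and the number of 1-D row transforms skipped.
    """
    Y = [[0.0] * N for _ in range(N)]
    nonzero_rows = []
    for u in range(N):
        if any(X[u]):  # only rows with at least one nonzero coefficient
            nonzero_rows.append(u)
            Y[u] = [sum(X[u][v] * C[v][n] for v in range(N)) for n in range(N)]
    skipped = N - len(nonzero_rows)
    # Column pass: zero rows of Y contribute nothing, so sum over nonzero rows only
    x = [[sum(C[u][m] * Y[u][n] for u in nonzero_rows) for n in range(N)]
         for m in range(N)]
    return x, skipped
```

    After quantization most high-frequency rows are zero, so in typical transcoding inputs many of the eight row transforms, and the corresponding terms of the column pass, never need to be computed.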

    Reconfigurable Architecture for ZQDCT Using Computational Complexity Prediction and Bitstream Relocation

    Because discrete cosine transform (DCT) computation has high computational complexity, prediction of zero-quantized DCT (ZQDCT) coefficients has been studied extensively as a way to reduce that complexity. In this letter, we propose a reconfigurable architecture to support ZQDCT computation. Twelve different modes of DCT computation, including zonal coding, multiblock processing, and a parallel-sequential stage mode, can be performed using the proposed architecture. We develop a hybrid model-based quality-priority algorithm that reduces power consumption, required hardware resources, and computation time with small quality degradation. © 2010 IEEE
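
    Zonal coding, one of the modes listed, computes only a low-frequency corner of the coefficient block and treats the rest as zero-quantized. The sketch below assumes a square `zone` × `zone` corner of an orthonormal 8 × 8 DCT-II and counts multiplications to show the saving; the zone shape, the name `zonal_dct2`, and the cost metric are illustrative assumptions, and the paper's hybrid model-based mode selection is not modeled.

```python
import math
from typing import List, Tuple

N = 8
Matrix = List[List[float]]

# Orthonormal 8-point DCT-II basis: C[k][n] = c(k) * cos((2n + 1) * k * pi / 16)
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]


def zonal_dct2(x: Matrix, zone: int) -> Tuple[Matrix, int]:
    """Forward 8x8 DCT computing only the zone x zone low-frequency corner.

    Coefficients outside the zone are left at zero (assumed to quantize to
    zero). Returns the coefficient block and the multiplication count.
    """
    mults = 0
    # Row pass: only the first `zone` horizontal frequencies of each row
    T = [[0.0] * zone for _ in range(N)]
    for m in range(N):
        for v in range(zone):
            T[m][v] = sum(x[m][n] * C[v][n] for n in range(N))
            mults += N
    # Column pass: only the first `zone` vertical frequencies
    X = [[0.0] * N for _ in range(N)]
    for u in range(zone):
        for v in range(zone):
            X[u][v] = sum(C[u][m] * T[m][v] for m in range(N))
            mults += N
    return X, mults
```

    With `zone=8` this reduces to a full separable DCT (1024 multiplications in this direct form), while `zone=4` needs 384, which is the kind of complexity-versus-quality trade-off a mode selector can exploit.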

    A Bit-Rate Aware Scalable H.264/AVC Deblocking Filter Using Dynamic Partial Reconfiguration

    In H.264/AVC, a deblocking filter improves visual quality by reducing blocking artifacts in decoded video frames. The deblocking filter accounts for one third of the computational complexity of the decoder. This paper exploits scalability at both the hardware and the algorithmic level to improve performance and reduce computational complexity. First, we propose a modular deblocking filter architecture that can be scaled to match the computing capability required for various bit-rates, resolutions, and frame rates of video sequences. The scalable architecture is FPGA-based and uses dynamic partial reconfiguration, a feature of FPGAs that allows different hardware configurations to be implemented at run time. The proposed design can be scaled to filter up to four different edges simultaneously, significantly reducing total processing time. Second, our experiments show that the increased presence of skipped macroblocks at lower bit-rates significantly reduces computational complexity, because redundant filtering operations can be avoided. The implemented architecture is evaluated using the Xilinx Virtex-4 ML410 FPGA board. The design operates at a maximum frequency of 103 MHz. Reconfiguration is performed through the Internal Configuration Access Port (ICAP) to achieve the maximum performance needed by real-time applications. © Springer Science+Business Media, LLC 2011
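
    The effect of skipped macroblocks on the filtering workload can be estimated by counting luma edges. The sketch assumes 4 × 4 transforms (three internal vertical and three internal horizontal 16-pixel edges per macroblock, plus its left and top boundary edges), that internal edges of a skipped macroblock have boundary strength 0 and need no filtering, and, conservatively, that every macroblock-boundary edge inside the picture is still processed even though its actual strength depends on the neighbor. The function name is hypothetical, and chroma edges are ignored.

```python
from typing import List


def luma_edges_to_filter(mb_skip: List[List[bool]],
                         skip_internal_of_skipped: bool = True) -> int:
    """Count 16-pixel luma edges the deblocking filter processes in one frame.

    mb_skip[r][c] is True for a skipped macroblock. Picture-border edges are
    never filtered; internal edges of skipped macroblocks are bypassed when
    skip_internal_of_skipped is True (assumed boundary strength 0).
    """
    rows = len(mb_skip)
    cols = len(mb_skip[0]) if rows else 0
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if c > 0:   # left macroblock-boundary edge
                edges += 1
            if r > 0:   # top macroblock-boundary edge
                edges += 1
            # Three internal vertical plus three internal horizontal edges
            if not (skip_internal_of_skipped and mb_skip[r][c]):
                edges += 6
    return edges
```

    Comparing the count with and without the skip rule for the macroblock-type map of a low-bit-rate frame gives a rough measure of how much filtering work can be avoided.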

    Performance Evaluation of FPGA-Based Hardware Accelerator: A Case Study

    FPGAs have been used as hardware accelerators for many scientific applications in recent years. This paper investigates the performance of an FPGA hardware accelerator with cell units capable of floating-point operations through a case study of the Dirichlet Boundary Problem (DBP). We concentrate on accelerator performance with real-time results updated in PC memory. An FPGA architecture for the DBP application is designed and implemented on an FPGA computing card with a Xilinx XC4VLX100 chip. A performance model is established for the FPGA implementation based on the communication time for data sharing between the host PC and the FPGA and the execution time within the FPGA accelerator. The experimental environment and hardware resource utilization are discussed. Finally, the model is analyzed and verified to find the optimum performance.
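
    The performance model described here splits run time into host-FPGA communication and execution inside the accelerator. A minimal sketch of that decomposition follows; the linear bandwidth-plus-latency link model, the per-cell throughput parameter, the field names, and the assumption that communication and computation do not overlap are illustrative assumptions, not the paper's model.

```python
from dataclasses import dataclass


@dataclass
class AcceleratorModel:
    """Hypothetical host + FPGA timing model: T_total = T_comm + T_exec."""
    link_bandwidth_bytes_s: float    # effective host<->FPGA transfer rate (assumed)
    link_latency_s: float            # fixed overhead per transfer (assumed)
    f_clk_hz: float                  # accelerator clock frequency
    flops_per_cell_per_cycle: float  # floating-point throughput of one cell unit

    def comm_time(self, bytes_moved: float, transfers: int) -> float:
        # Per-transfer latency plus bandwidth-limited streaming time
        return transfers * self.link_latency_s + bytes_moved / self.link_bandwidth_bytes_s

    def exec_time(self, flops: float, cells: int) -> float:
        # Ideal parallel speed-up across cells is assumed
        return flops / (cells * self.flops_per_cell_per_cycle * self.f_clk_hz)

    def total_time(self, flops: float, bytes_moved: float,
                   transfers: int, cells: int) -> float:
        # Communication and computation are assumed not to overlap
        return self.comm_time(bytes_moved, transfers) + self.exec_time(flops, cells)
```

    Because adding cells shrinks only the execution term, once the results streamed back to PC memory dominate the total, further cells yield diminishing returns; locating that crossover is the kind of optimum such a model helps identify.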