
    Hardware acceleration architectures for MPEG-Based mobile video platforms: a brief overview

    This paper presents a brief overview of past and current hardware acceleration (HwA) approaches that have been proposed for the most computationally intensive compression tools of the MPEG-4 standard. These approaches are classified by their historical evolution and by their architectural approach. Both the evolutionary and the functional classifications are analyzed in order to speculate on possible trends in the HwA architectures to be employed in mobile video platforms.

    Built-In Self-Test for SAD Module in Motion Array Detection

    A novel built-in self-detection and correction (BISDC) architecture is developed for motion estimation computing arrays (MECAs). Based on the error detection and correction concepts of biresidue codes, any single error in each processing element (PE) of an MECA can be effectively detected and corrected online using the proposed built-in self-detection (BISD) and built-in self-correction circuits. Performance analysis and evaluation demonstrate that the proposed BISDC architecture detects and corrects single-bit errors well, with minor area overhead. An advanced model is also proposed for multi-bit error detection using an efficient adder implementation, and the results of the efficient adder are compared with those of the processing element.
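    The abstract gives neither the moduli nor the circuit details, so the following is only a software sketch of the biresidue idea under assumed parameters: a PE's SAD output is checked against two independently accumulated residues, modulo 7 and modulo 31 (hypothetical choices, 2^3 − 1 and 2^5 − 1). A single bit flip in the result changes it by ±2^j, and for j < 15 each such error yields a distinct, nonzero pair of syndromes, so the pair both flags the error and identifies the correction. All function names are hypothetical, and the 15-bit limit means the sketch targets 4×4 blocks of 8-bit pixels, whose SAD never exceeds 16 × 255 = 4080.

```python
from typing import Dict, List, Tuple

M1, M2 = 7, 31   # assumed moduli: 2**3 - 1 and 2**5 - 1
WIDTH = 15       # 3 * 5: bit positions j < 15 give distinct syndrome pairs


def sad(cur: List[int], ref: List[int]) -> int:
    """Reference SAD of two flattened blocks (the value a PE should output)."""
    return sum(abs(a - b) for a, b in zip(cur, ref))


def residue_sad(cur: List[int], ref: List[int], m: int) -> int:
    """SAD modulo m, accumulated entirely in modulo-m arithmetic (the check path)."""
    acc = 0
    for a, b in zip(cur, ref):
        acc = (acc + abs(a - b) % m) % m
    return acc


def build_syndrome_table() -> Dict[Tuple[int, int], int]:
    """Map each syndrome pair to the single-bit error (+/- 2**j) that causes it."""
    table: Dict[Tuple[int, int], int] = {}
    for j in range(WIDTH):
        for e in (1 << j, -(1 << j)):
            key = (e % M1, e % M2)
            assert key not in table, "moduli do not separate all single-bit errors"
            table[key] = e
    return table


TABLE = build_syndrome_table()


def check_and_correct(observed: int, cur: List[int], ref: List[int]) -> Tuple[int, bool]:
    """Return (corrected SAD, error_detected) for a possibly faulty PE output."""
    s1 = (observed - residue_sad(cur, ref, M1)) % M1
    s2 = (observed - residue_sad(cur, ref, M2)) % M2
    if (s1, s2) == (0, 0):
        return observed, False
    e = TABLE.get((s1, s2))
    if e is None:
        raise ValueError("syndrome does not match any single-bit error")
    # observed = true + e, so subtracting e restores the correct SAD.
    return observed - e, True
```

For example, with `cur = list(range(16))` and `ref = [0] * 16` the correct SAD is 120; flipping bit 3 gives 112, and `check_and_correct(112, cur, ref)` returns `(120, True)`.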

    Extended Successive Elimination Algorithm for Fast Optimal Block Matching Motion Estimation

    In this paper, we propose an extended successive elimination algorithm (SEA) for fast optimal block matching motion estimation (ME). By reinterpreting the typical sum of absolute differences (SAD) measure, we obtain additional decision criteria for discarding impossible candidate motion vectors. Experimental results show that the proposed algorithm reduces computational complexity by up to 19.85% on average compared with the multilevel successive elimination algorithm. The proposed algorithm can also be combined with other SEA variants to improve ME performance.
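    The extended criteria are not spelled out in the abstract, but they build on the baseline SEA bound: by the triangle inequality, |ΣB − ΣC| ≤ SAD(B, C) for a target block B and candidate C, so any candidate whose block-sum difference already reaches the best SAD found so far can be skipped without computing its SAD. The sketch below shows only this baseline rule, not the extended algorithm; the function names are hypothetical, and block sums are recomputed naively for clarity, whereas practical implementations obtain them from running sums.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def block(frame: Frame, y: int, x: int, n: int) -> Frame:
    """Extract the n x n block whose top-left corner is (y, x)."""
    return [row[x:x + n] for row in frame[y:y + n]]


def block_sum(b: Frame) -> int:
    return sum(sum(row) for row in b)


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def sea_search(cur: Frame, ref: Frame, y0: int, x0: int, n: int, r: int
               ) -> Tuple[Optional[Tuple[int, int]], Optional[int], int]:
    """Full search over a +/-r window with baseline SEA pruning.

    Returns (best motion vector, best SAD, number of candidates skipped).
    """
    target = block(cur, y0, x0, n)
    t_sum = block_sum(target)
    best_sad: Optional[int] = None
    best_mv: Optional[Tuple[int, int]] = None
    skipped = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            y, x = y0 + dy, x0 + dx
            # Ignore candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + n > len(ref) or x + n > len(ref[0]):
                continue
            cand = block(ref, y, x, n)
            # |sum(target) - sum(cand)| <= SAD(target, cand): this candidate
            # cannot beat the current best, so its SAD is never computed.
            if best_sad is not None and abs(t_sum - block_sum(cand)) >= best_sad:
                skipped += 1
                continue
            d = sad(target, cand)
            if best_sad is None or d < best_sad:
                best_sad, best_mv = d, (dy, dx)
    return best_mv, best_sad, skipped
```

Because a candidate is skipped only when its SAD is provably no smaller than the current best, the result matches an exhaustive full search that updates only on a strictly smaller SAD; that is what makes SEA-style methods "fast optimal".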

    Image Processing using Approximate Data-path Units

    In this work, we present approximate adders and multipliers that reduce the data-path complexity of specialized hardware for various image processing systems. These approximate circuits have lower area, latency and power consumption than their accurate counterparts and produce fairly accurate results. We build upon the work on approximate adders and multipliers presented in [23] and [24]. First, we show how the choice of algorithm and parallel adder design can be used to implement the 2D Discrete Cosine Transform (DCT) with good performance but low area. Our implementation of the 2D DCT has PSNR performance comparable to that of the algorithm presented in [23], with a ~35-50% reduction in area. Next, we use the approximate 2x2 multiplier presented in [24] to implement parallel approximate multipliers. We demonstrate that if some of the 2x2 multipliers in the design of the parallel multiplier are accurate, the accuracy of the multiplier improves significantly, especially when two large numbers are multiplied. We choose Gaussian FIR filter and Fast Fourier Transform (FFT) algorithms to illustrate the efficacy of our proposed approximate multiplier. We show that applying the proposed approximate multiplier improves the PSNR of a 32x32 FFT implementation by 4.7 dB compared with an implementation using the approximate multiplier described in [24]. We also implement a state-of-the-art image enlargement algorithm, Segment Adaptive Gradient Angle (SAGA) [29], in hardware. The algorithm is mapped to pipelined hardware blocks, and we synthesized the design using 90 nm technology. We show that a 64x64 image can be processed in 496.48 µs when clocked at 100 MHz. When evaluated against the original image, the average PSNR of our implementation is 31.33 dB using accurate parallel adders and multipliers and 30.86 dB using approximate parallel adders and multipliers.
The PSNR of both designs is comparable to that of the double-precision floating-point MATLAB implementation of the algorithm.
Dissertation/Thesis, M.S. Computer Science 201
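    The claim that making some 2×2 blocks accurate helps most for large operands can be illustrated with a small model. This is a minimal sketch assuming the widely used approximate 2×2 block that outputs 7 instead of 9 for 3 × 3 (an assumption; the exact design in [24] is not reproduced here). It builds an n-bit unsigned multiplier recursively from 2×2 blocks and can optionally make only the most significant block accurate; which blocks the thesis actually keeps accurate is not stated, so that choice is hypothetical, as are the function names.

```python
def mul2x2(a: int, b: int, accurate: bool) -> int:
    """2-bit x 2-bit block; the assumed approximate variant returns 7 for 3 x 3."""
    if not accurate and a == 3 and b == 3:
        return 7
    return a * b


def mul(a: int, b: int, bits: int, accurate_msb_block: bool = False) -> int:
    """Unsigned bits x bits multiply built recursively from 2x2 blocks.

    bits must be a power of two >= 2. If accurate_msb_block is True, only the
    2x2 block that multiplies the two most significant bit pairs is accurate.
    """
    if bits == 2:
        return mul2x2(a, b, accurate_msb_block)
    h = bits // 2
    mask = (1 << h) - 1
    ah, al = a >> h, a & mask
    bh, bl = b >> h, b & mask
    hh = mul(ah, bh, h, accurate_msb_block)  # only this branch can hold the accurate block
    hl = mul(ah, bl, h)
    lh = mul(al, bh, h)
    ll = mul(al, bl, h)
    # Recombine the four partial products with exact shifts and additions.
    return (hh << (2 * h)) + ((hl + lh) << h) + ll


print(mul(255, 255, 8))        # 50575 (exact product is 65025)
print(mul(255, 255, 8, True))  # 58767
```

With 255 × 255 every 2×2 block sees 3 × 3, the worst case for the assumed block, so replacing just the most significant block with an accurate one removes much of the error. Small operands that never produce 3 × 3 in any block, such as 2 × 3, stay exact.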

    Dynamic power consumption estimation and reduction for full search motion estimation hardware

    Motion Estimation (ME) is the most computationally intensive and most power-consuming part of video compression and video enhancement systems. ME is used in video compression standards such as MPEG-4 and H.264, and in video enhancement algorithms such as frame rate conversion and de-interlacing. Since portable devices operate on batteries, it is important to reduce power consumption to extend battery life. In addition, excessive power consumption degrades the performance of integrated circuits, increases packaging and cooling costs, reduces reliability, and may cause device failures. Therefore, estimating and reducing the power consumption of motion estimation hardware is very important. In this thesis, we propose a novel dynamic power estimation technique for full search ME hardware. We estimated the power consumption of two full search ME hardware implementations on a Xilinx Virtex II FPGA using several existing high- and low-level dynamic power estimation techniques as well as our technique. Gate-level timing-simulation-based power estimation of full search ME hardware for an average frame using the Xilinx XPower tool takes 6 to 18 hours on a state-of-the-art PC, whereas estimating the power consumption of the same ME hardware for the same frame takes a few seconds with our technique. The average and maximum differences between the power consumption estimated by our technique and that estimated by the XPower tool for four different video sequences are 3% and 13%, respectively. We also propose a novel dynamic power reduction technique for ME hardware. We quantified the impact of glitch reduction, clock gating and the proposed technique on the power consumption of two full search ME hardware implementations on a Xilinx Virtex II FPGA using the Xilinx XPower tool. Glitch reduction and clock gating together achieved an average dynamic power reduction of 21%.
The proposed technique achieved an average dynamic power reduction of 23% with an average PSNR loss of 0.4 dB, a larger reduction than the pixel truncation technique achieves at a similar PSNR loss. We also show that our dynamic power estimation technique can be used to develop novel dynamic power reduction techniques. To do this, we used our technique to estimate the dynamic power consumption of the ME hardware when two different dynamic power reduction techniques are applied. The results show that if a power reduction technique only changes the input data order of the ME hardware, the proposed dynamic power estimation technique can quickly estimate that technique's effectiveness. However, if the architecture of the ME hardware is modified, the accuracy of the power estimates decreases. Therefore, the proposed power estimation technique should be improved for this case.
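    The abstract does not describe how the proposed estimator works. As background only: CMOS dynamic power is commonly modeled as P_dyn = α·C·V²·f, where α is the switching activity, so when capacitance, voltage and frequency are fixed, a technique that only reorders the input data changes power mainly through α. The hypothetical sketch below counts bit toggles on an 8-bit pixel bus for two scan orders of the same block. It is a generic switching-activity proxy, not the thesis's technique, and the names `toggles`, `raster` and `snake` are illustrative.

```python
from typing import List


def toggles(words: List[int], width: int = 8) -> int:
    """Total number of bit transitions on a width-bit bus driven by `words` in order."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))


def raster(block: List[List[int]]) -> List[int]:
    """Row-by-row, left-to-right scan."""
    return [p for row in block for p in row]


def snake(block: List[List[int]]) -> List[int]:
    """Row-by-row scan that reverses direction on every other row."""
    return [p for i, row in enumerate(block) for p in (row if i % 2 == 0 else row[::-1])]


b = [[0, 1, 2, 3],
     [0, 1, 2, 3]]
print(toggles(raster(b)))  # 10
print(toggles(snake(b)))   # 8
```

On this horizontal gradient the snake order avoids the large jump from the end of one row to the start of the next, so fewer bits toggle. A proxy of this kind only ranks data orderings for fixed hardware, which matches the abstract's observation that accuracy drops once the ME architecture itself changes.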

    Reconfigurable Architecture For H.264/AVC Variable Block Size Motion Estimation Based On Motion Activity And Adaptive Search Range

    Motion estimation (ME) plays a key role in video coding systems, achieving high compression ratios by removing temporal redundancy among video frames. In the H.264/AVC video coding standard in particular, the ME engine demands a large amount of computation because the standard supports a wide range of block sizes within a macroblock to increase the accuracy of finding the best-matching block in previous frames. We propose a scalable architecture for H.264/AVC Variable Block Size (VBS) Motion Estimation with adaptive computing capability that supports various search ranges, input video resolutions, and frame rates. The hardware architecture of the proposed ME consists of scalable Sum of Absolute Difference (SAD) arrays that perform the Full Search Block Matching Algorithm (FSBMA) on smaller 4x4 blocks. It is also shown that, by predicting motion activity and adaptively adjusting the Search Range (SR) on the reconfigurable hardware platform, the computational cost of the ME required for inter-frame encoding in H.264/AVC can be reduced significantly. Dynamic Partial Reconfiguration is a feature of Field Programmable Gate Arrays (FPGAs) that makes the best use of hardware resources and power by allowing adaptive algorithms to be implemented at run time. We exploit this feature to implement the proposed reconfigurable ME architecture and maximize its benefits through prediction of motion activity in the video sequences, run-time adaptation of the SR, and fractional ME refinement. The implemented ME architecture supports real-time applications at a maximum frequency of 90 MHz with multiple reconfigurable regions. Compared with reconfiguring the complete design, partial reconfiguration produces a smaller bitstream, which allows the FPGA to switch between configurations faster.
The proposed architecture has a modular structure, regular data flow, and an efficient memory organization with fewer memory accesses. Increasing the number of active partially reconfigurable modules from one to four yields a fourfold increase in data reuse. In addition, the adaptive SR reduction algorithm applied at frame level reduces the computational load of ME significantly, with only a small degradation in PSNR (≤ 0.1 dB).
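    The abstract states that the SAD arrays perform full search on 4x4 blocks. A common way for VBS hardware to cover the remaining H.264/AVC partitions, assumed here rather than taken from the thesis, is to add up the 4x4 SADs computed for the same candidate motion vector: the sixteen 4x4 SADs of a 16x16 macroblock yield all 41 partition SADs (16 of 4x4, 8 of 8x4, 8 of 4x8, 4 of 8x8, 2 of 16x8, 2 of 8x16 and 1 of 16x16). In this sketch partition names are width x height, and `vbs_sads` is a hypothetical name.

```python
from typing import Dict, List


def vbs_sads(s: List[List[int]]) -> Dict[str, List[int]]:
    """Derive all H.264/AVC partition SADs of a 16x16 macroblock for one candidate MV.

    s[r][c] is the SAD of the 4x4 sub-block in row r, column c (r, c in 0..3).
    """
    out: Dict[str, List[int]] = {}
    out["4x4"] = [s[r][c] for r in range(4) for c in range(4)]
    # 8x4: two horizontally adjacent 4x4 blocks.
    out["8x4"] = [s[r][c] + s[r][c + 1] for r in range(4) for c in (0, 2)]
    # 4x8: two vertically adjacent 4x4 blocks.
    out["4x8"] = [s[r][c] + s[r + 1][c] for r in (0, 2) for c in range(4)]
    # 8x8 quadrants, kept as a 2x2 grid so the larger partitions can reuse them.
    q = [[s[r][c] + s[r][c + 1] + s[r + 1][c] + s[r + 1][c + 1] for c in (0, 2)]
         for r in (0, 2)]
    out["8x8"] = [v for row in q for v in row]
    out["16x8"] = [q[i][0] + q[i][1] for i in range(2)]  # top and bottom halves
    out["8x16"] = [q[0][j] + q[1][j] for j in range(2)]  # left and right halves
    out["16x16"] = [sum(out["8x8"])]
    return out
```

Reusing sums this way means only the 4x4 SAD units touch pixel data; every larger partition costs a few additions per candidate, which is why the 4x4 array can serve as the scalable building block.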

    Low-power VLSI design for motion estimation using adaptive pixel truncation

    Power consumption is very critical for portable video applications such as portable videophones and digital camcorders. Motion estimation (ME) in the video encoder requires a huge amount of computation and hence consumes the largest portion of power. In this paper, we propose a novel method of reducing the power consumption of ME by adaptively changing the pixel resolution during the computation of the motion vector. The pixel resolution is changed by masking, or truncating, the least significant bits of the pixel data, as governed by the bit-rate control mechanism. Experimental results show that, on average, more than 4 bits can be truncated without significantly affecting picture quality. This results in a power reduction of more than 60%.
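    A minimal sketch of the truncation step, assuming 8-bit pixels: the k least significant bits of both operands are masked to zero before the absolute difference is taken. In the paper, k is adapted by the bit-rate control mechanism; that policy is not modeled here, and `sad_truncated` is a hypothetical name.

```python
from typing import Sequence


def sad_truncated(cur: Sequence[int], ref: Sequence[int], k: int) -> int:
    """SAD of 8-bit pixels after masking the k least significant bits (k = 0 is exact)."""
    mask = 0xFF & ~((1 << k) - 1)  # e.g. k = 4 -> 0b11110000
    return sum(abs((a & mask) - (b & mask)) for a, b in zip(cur, ref))


print(sad_truncated([100], [97], 0))  # 3
print(sad_truncated([100], [97], 2))  # 4
```

The masked values keep their bit positions, so truncated SADs stay on the same scale as exact ones; in hardware, the logic for the masked bits can simply stop switching, which is where the power saving comes from.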

    Fast pattern matching in Walsh-Hadamard domain and its application in video processing.

    Li Ngai. Thesis (M.Phil.), Chinese University of Hong Kong, 2006. Includes bibliographical references. Abstracts in English and Chinese.
    Contents:
    Chapter 1. Introduction
        1.1. A Brief Review on Pattern Matching
        1.2. Objective of the Research Work
        1.3. Organization of the Thesis
        1.4. Notes on Publications
    Chapter 2. Background Information
        2.1. Introduction
        2.2. Review of Block Based Pattern Matching
            2.2.1. Gradient Descent Strategy
            2.2.2. Simplified Matching Operations
            2.2.3. Fast Full-Search Methods
            2.2.4. Transform-domain Manipulations
    Chapter 3. Statistical Rejection Threshold for Pattern Matching
        3.1. Introduction
        3.2. Walsh Hadamard Transform
        3.3. Coarse-to-fine Pattern Matching in Walsh Hadamard Domain
            3.3.1. Bounding Euclidean Distance in Walsh Hadamard Domain
            3.3.2. Fast Projection Scheme
            3.3.3. Using the Projection Scheme for Pattern Matching
        3.4. Statistical Rejection Threshold
        3.5. Experimental Results
        3.6. Conclusions
        3.7. Notes on Publication
    Chapter 4. Fast Walsh Search
        4.1. Introduction
        4.2. Approximating Sum-of-absolute Difference Using PSAD
        4.3. Two-level Threshold Scheme
        4.4. Block Matching Using SADDCC
        4.5. Optimization of Threshold and Number of Coefficients in PSAD
        4.6. Candidate Elimination by the Mean of PSAD
        4.7. Computation Requirement
        4.8. Experimental Results
        4.9. Conclusions
        4.10. Notes on Publications
    Chapter 5. Conclusions & Future Works
        5.1. Contributions and Conclusions
            5.1.1. Statistical Rejection Threshold for Pattern Matching
            5.1.2. Fast Walsh Search
        5.2. Future Works
    References
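    The distance bound named in Chapter 3 rests on the Walsh-Hadamard transform being orthogonal up to scale: for the unnormalized n-point transform H, ‖H(p − w)‖² = n·‖p − w‖², so the sum of squared differences over any subset of WH coefficients, divided by n, is a lower bound on the squared Euclidean distance that tightens as coefficients are added. A candidate window can therefore be rejected as soon as a partial bound exceeds a threshold. The sketch below works in 1-D on a flattened block and computes the full transform for brevity; the thesis's 2-D projection scheme, basis ordering and statistical threshold are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence


def fwht(v: Sequence[int]) -> List[int]:
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.

    len(v) must be a power of two; the output satisfies sum(c*c) == len(v) * sum(x*x).
    """
    a = list(v)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly
        h *= 2
    return a


def lower_bounds(p: Sequence[int], w: Sequence[int]) -> List[float]:
    """bounds[k-1] lower-bounds ||p - w||^2 using only the first k WH coefficients."""
    n = len(p)
    diff = fwht([a - b for a, b in zip(p, w)])  # the transform is linear
    bounds: List[float] = []
    acc = 0
    for c in diff:
        acc += c * c
        bounds.append(acc / n)
    return bounds


def rejects(p: Sequence[int], w: Sequence[int], threshold: float, k: int) -> bool:
    """Reject window w once the k-coefficient bound already exceeds the threshold."""
    return lower_bounds(p, w)[k - 1] > threshold
```

Once all n coefficients are used, the bound equals the exact squared distance, so a rejection based on any prefix never discards a window whose true distance is at or below the threshold.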

    Implementing video compression algorithms on reconfigurable devices

    The increasing density offered by Field Programmable Gate Arrays (FPGAs), coupled with their short design cycle, has made them a popular choice for implementing a wide range of algorithms and complete systems. In this thesis, the implementation of video compression algorithms on FPGAs is studied. Two areas are specifically focused on: the integration of a video encoder into a complete system, and the power consumption of FPGA-based video encoders. Two FPGA-based video compression systems are described, one targeting surveillance applications and one targeting video conferencing applications. The FPGA video surveillance system uses a novel memory format to improve the efficiency with which input video sequences can be loaded over the system bus. The power consumption of an FPGA video encoder is analyzed, and the results indicate that the motion estimation stage of the encoder consumes the most power. An algorithm that reuses the intra prediction results generated during the encoding process is then proposed to reduce the power consumed on an FPGA video encoder's external memory bus. Finally, the power reduction algorithm is implemented within an FPGA video encoder. The results show that, in addition to reducing power on the external memory bus, the algorithm also reduces power in the motion estimation stage of an FPGA-based video encoder.

    Analysis of the impact of pel decimation on high-resolution video coding

    Master's dissertation, Universidade Federal de Santa Catarina, Technology Center, Graduate Program in Computer Science, Florianópolis, 2014.
As the number of pixels per frame tends to increase with the upcoming adoption of ultra-high resolutions, pixel subsampling, also known as pel decimation, appears as a viable means of improving the energy efficiency of video coding. This work investigates the impact on energy and quality when pel decimation is applied to the Sum of Absolute Differences (SAD) calculation, the most widely used similarity metric in the motion estimation step of video coding. First, a quality assessment of 15 pel decimation patterns is presented. The 10,680 experimental points used provide statistical evidence that the proposed 4:3 ratio leads to an encoding speedup of more than two times compared with full sampling, losing only 5% in DSSIM and 1% in PSNR. Compared with lower sampling ratios, it presents a better trade-off between speedup and quality loss. To obtain estimates of silicon area and energy per block, five SAD architectures were designed and synthesized for an industrial standard cell library. One of them can be configured to operate with 1:1, 4:3, 2:1 or 4:1 sampling ratios, whereas the rest are each tailored to operate exclusively with one of these ratios. The configurable architecture consumes 3.54 pJ/block when operating in full sampling (60% less than the non-configurable version). Its energy can be further reduced to 1.34 pJ/block by using the 4:1 ratio, with losses of 2.8% in PSNR and 14.1% in DSSIM. Finally, it is shown that the speedup of a given subsampling pattern is due to the reduction of both the number of sampled pixels and the total number of SAD calculations.
Therefore, by modeling the energy components of video coding, it is shown that the energy efficiency of the whole video compression process can be increased beyond the sampling ratio. By using a configurable SAD architecture operating at the 4:1 ratio, energy savings of up to 95.11% are achieved.
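    A minimal sketch of a pel-decimated SAD, assuming that an a:b ratio means b of every a pixels are sampled and that patterns tile in 2×2 cells. The masks below are hypothetical stand-ins, since the abstract does not define the 15 patterns evaluated, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

Mask = Tuple[Tuple[int, int], Tuple[int, int]]

# Hypothetical 2x2 sampling masks (1 = pixel used in the SAD).
PATTERNS: Dict[str, Mask] = {
    "1:1": ((1, 1), (1, 1)),
    "4:3": ((1, 1), (1, 0)),
    "2:1": ((1, 0), (0, 1)),
    "4:1": ((1, 0), (0, 0)),
}


def decimated_sad(cur: List[List[int]], ref: List[List[int]], pattern: str) -> int:
    """SAD restricted to the pixels selected by the tiled 2x2 mask."""
    m = PATTERNS[pattern]
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(len(cur)) for x in range(len(cur[0]))
               if m[y % 2][x % 2])


def sampled_pixels(h: int, w: int, pattern: str) -> int:
    """Number of pixels an h x w block contributes under the given pattern."""
    m = PATTERNS[pattern]
    return sum(m[y % 2][x % 2] for y in range(h) for x in range(w))


print([sampled_pixels(8, 8, p) for p in PATTERNS])  # [64, 48, 32, 16]
```

Because every candidate block is scored with the same mask, the decimated SAD needs no rescaling to rank candidates; the saving comes from the smaller number of absolute differences and additions per block.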