11 research outputs found

    Reconfigurable Architecture For H.264/AVC Variable Block Size Motion Estimation Based On Motion Activity And Adaptive Search Range

    Motion Estimation (ME) plays a key role in video coding systems, achieving high compression ratios by removing temporal redundancies among video frames. Especially in the newest H.264/AVC video coding standard, the ME engine demands a large amount of computation because it supports a wide range of block sizes for a given macroblock in order to find the best matching block in the previous frames more accurately. We propose a scalable architecture for H.264/AVC Variable Block Size (VBS) Motion Estimation with adaptive computing capability to support various search ranges, input video resolutions, and frame rates. The hardware architecture of the proposed ME consists of scalable Sum of Absolute Difference (SAD) arrays which can perform the Full Search Block Matching Algorithm (FSBMA) for smaller 4x4 blocks. It is also shown that by predicting motion activity and adaptively adjusting the Search Range (SR) on the reconfigurable hardware platform, the computational cost of ME required for inter-frame encoding in the H.264/AVC video coding standard can be reduced significantly. Dynamic Partial Reconfiguration is a unique feature of Field Programmable Gate Arrays (FPGAs) that makes the best use of hardware resources and power by allowing an adaptive algorithm to be implemented at run-time. We exploit this feature of FPGAs to implement the proposed reconfigurable ME architecture and maximize the architectural benefits through prediction of motion activity in the video sequences, adaptation of SR during run-time, and fractional ME refinement. The implemented ME architecture can support real-time applications at a maximum frequency of 90 MHz with multiple reconfigurable regions. Compared to reconfiguration of the complete design, partial reconfiguration results in a smaller bitstream size, which allows the FPGA to switch between configurations faster.
The proposed architecture has a modular structure, regular data flow, and efficient memory organization with fewer memory accesses. Increasing the number of active partial reconfigurable modules from one to four yields a 4-fold increase in data re-use. Also, by introducing an adaptive SR reduction algorithm at frame level, the computational load of ME is reduced significantly with only a small degradation in PSNR (≤0.1 dB).
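
The full-search SAD matching and search-range adaptation described in this abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: the function names, the frame layout (lists of rows of integer samples), and in particular the `adapt_search_range` rule are assumptions made here for illustration only.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, n=4, search_range=2):
    """Return ((mvx, mvy), best_sad) for the n x n block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    # Start from the zero motion vector so ties favour no motion.
    best_mv, best_sad = (0, 0), sad(cur, ref, cx, cy, cx, cy, n)
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if cost < best_sad:
                best_mv, best_sad = (mvx, mvy), cost
    return best_mv, best_sad


def adapt_search_range(prev_mvs, min_sr=1, max_sr=16):
    """Hypothetical frame-level rule (an assumption, not from the source):
    size the next search range from the largest motion vector component
    found in the previous frame, plus a margin of one pixel."""
    peak = max((max(abs(x), abs(y)) for x, y in prev_mvs), default=0)
    return max(min_sr, min(max_sr, peak + 1))
```

The cost of the full search grows with the square of the search range, which is why shrinking the range for low-motion frames reduces the computational load.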

    VLSI architecture design approaches for real-time video processing

    This paper discusses programmable and dedicated approaches to real-time video processing applications. Various VLSI architectures, including design examples of both approaches, are reviewed. Finally, several practical designs for real-time video processing applications are discussed to provide guidelines to VLSI designers for further real-time video processing design work.

    Análise do impacto de pel decimation na codificação de vídeos de alta resolução (Analysis of the impact of pel decimation on high-resolution video coding)

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2014. As the number of pixels per frame tends to increase with the upcoming adoption of ultra-high resolutions, pixel subsampling, also known as pel decimation, appears as a viable means to improve the energy efficiency of video coding. This work investigates the impacts on energy and quality when pel decimation is applied to the Sum of Absolute Differences (SAD) calculation, which is the most used similarity metric in the motion estimation step of video coding. First, a quality assessment of 15 pel decimation patterns is presented. The 10,680 experimental points used provide statistical evidence that the proposed 4:3 ratio leads to an encoding speedup of more than two times in comparison to full sampling, losing only 5% in DSSIM and 1% in PSNR. Compared with lower sampling ratios, it presents a better trade-off between speedup and quality loss. To obtain estimates for silicon area and energy per block, five SAD architectures were designed and synthesized for an industrial standard cell library. Among those, one can be configured to operate with 1:1, 4:3, 2:1 or 4:1 sampling ratios, whereas the rest are each tailored to operate exclusively with one of these ratios. The configurable architecture consumes 3.54 pJ/block operating in full sampling (60% lower than the non-configurable version). The energy can be further reduced to 1.34 pJ/block by using the 4:1 ratio, with losses of 2.8% in PSNR and 14.1% in DSSIM. Finally, it is shown that the speedup of a given subsampling pattern is due to the reduction of both the number of sampled pixels and the total number of SAD calculations.
Therefore, by modeling the video coding energy components, it is shown that the energy efficiency of the whole video compression process can be increased beyond the sampling ratio. By using a configurable SAD architecture operating in 4:1 ratio, the energy savings are up to 95.11%.
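
As a rough illustration of pel decimation applied to SAD, the sketch below computes the SAD over only the pixel positions selected by a sampling pattern. The two patterns shown (a checkerboard for 2:1 and every other row and column for 4:1) are assumptions chosen for simplicity; the abstract does not specify the 15 patterns evaluated in the dissertation, and the proposed 4:3 pattern is not reproduced here.

```python
def decimated_sad(cur_block, ref_block, pattern):
    """SAD over only the positions where pattern[y][x] is true.

    Returns (sad, number_of_sampled_pixels).
    """
    total = 0
    samples = 0
    for y, row in enumerate(pattern):
        for x, keep in enumerate(row):
            if keep:
                total += abs(cur_block[y][x] - ref_block[y][x])
                samples += 1
    return total, samples


def full_pattern(n):
    """1:1 sampling: every pixel contributes."""
    return [[True] * n for _ in range(n)]


def checkerboard_pattern(n):
    """Illustrative 2:1 pattern (assumption): keep pixels where x + y is even."""
    return [[(x + y) % 2 == 0 for x in range(n)] for y in range(n)]


def quarter_pattern(n):
    """Illustrative 4:1 pattern (assumption): keep every other row and column."""
    return [[x % 2 == 0 and y % 2 == 0 for x in range(n)] for y in range(n)]
```

For a 4x4 block these patterns sample 16, 8 and 4 pixels respectively, so the number of absolute-difference operations per SAD falls in proportion to the sampling ratio.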

    High performance hardware architectures for one bit transform based motion estimation

    Motion Estimation (ME) is the most computationally intensive and most power consuming part of video compression and video enhancement systems. ME is used in video compression standards such as MPEG-4 and H.264, and in video enhancement algorithms such as frame rate conversion and de-interlacing. One bit transform (1BT) based ME algorithms have low computational complexity. Therefore, in this thesis, we propose high performance hardware architectures for 1BT based fixed block size (FBS) single reference frame (SRF) ME, variable block size (VBS) SRF ME, and multiple reference frame (MRF) ME. The Constrained One Bit Transform (C-1BT) ME algorithm improves the ME performance of 1BT ME, and the early terminated C-1BT ME algorithm reduces the computational complexity of C-1BT ME. Therefore, in this thesis, we also propose a high performance early terminated C-1BT ME hardware architecture. The proposed FBS SRF ME hardware architectures perform full search ME for 4 macroblocks in parallel, and they are faster than the 1BT based ME hardware reported in the literature. In addition, they use less on-chip memory than previous 1BT based ME hardware by using a novel data reuse scheme and memory organization. The proposed VBS SRF ME and MRF ME hardware architectures are the first 1BT based VBS ME and MRF ME hardware architectures in the literature. The proposed MRF ME hardware is designed to be reconfigurable so that the number and selection of reference frames can be statically configured based on the application requirements. The proposed early terminated C-1BT ME hardware architecture is the first early terminated C-1BT ME hardware architecture in the literature. All of the proposed ME hardware architectures are implemented in Verilog HDL and mapped to Xilinx FPGAs. All FPGA implementations are verified with post place & route simulations.
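
The low complexity of 1BT based matching comes from reducing each pixel to a single bit and comparing blocks with XOR instead of subtraction. The sketch below shows the idea under stated assumptions: a 3x3 box mean stands in for the filter used to binarize the frame (the thesis's actual filter is not given in the abstract), and the function names are hypothetical. The C-1BT constraint mask and early termination are not modeled.

```python
def one_bit_transform(frame):
    """Binarize a frame: a bit is 1 where the pixel is >= its local mean.

    Assumption: a 3x3 box mean, clamped at the frame borders, is used
    here as a simple stand-in for the filter in the 1BT literature.
    """
    h, w = len(frame), len(frame[0])
    bits = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - 1), min(h, y + 2))
            xs = range(max(0, x - 1), min(w, x + 2))
            neigh = [frame[j][i] for j in ys for i in xs]
            # Compare pixel * count against the sum to avoid a division.
            row.append(1 if frame[y][x] * len(neigh) >= sum(neigh) else 0)
        bits.append(row)
    return bits


def nnmp(cur_bits, ref_bits, cx, cy, rx, ry, n):
    """Number of non-matching points: XOR two n x n bit blocks, count ones."""
    return sum(
        cur_bits[cy + dy][cx + dx] ^ ref_bits[ry + dy][rx + dx]
        for dy in range(n)
        for dx in range(n)
    )
```

In a search loop, `nnmp` takes the place of SAD as the matching cost; in hardware the XOR and bit count replace the subtractors and absolute-value units of a SAD datapath.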

    Algoritmo de estimação de movimento e sua arquitetura de hardware para HEVC (Motion estimation algorithm and its hardware architecture for HEVC)

    Doctorate in Electrical Engineering. Video coding has been used in applications such as video surveillance, video conferencing, video streaming, video broadcasting and video storage. In a typical video coding standard, many algorithms are combined to compress a video. Of those algorithms, motion estimation is the most complex task. Hence, it is necessary to implement this task in real time using appropriate VLSI architectures. This thesis proposes a new fast motion estimation algorithm and its real-time implementation. The results show that the proposed algorithm and its motion estimation hardware architecture outperform the state of the art. The proposed architecture operates at a maximum frequency of 241.6 MHz and is able to process 1080p@60Hz with all possible variable block sizes specified in the HEVC standard, as well as with a motion vector search range of up to ±64 pixels.
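
To put the reported figures in perspective, the following back-of-the-envelope calculation estimates how many clock cycles such an architecture has available per pixel and per block. It assumes that 1080p means 1920x1080 luma samples and uses a 64x64 block as the unit of work; both are assumptions made here, and frame padding and any pipeline overheads are ignored.

```python
CLOCK_HZ = 241.6e6          # maximum operating frequency from the abstract
WIDTH, HEIGHT = 1920, 1080  # assumption: 1080p luma resolution
FPS = 60                    # frame rate from the abstract
BLOCK = 64                  # assumption: 64x64 block as the unit of work


def cycle_budget(clock_hz, width, height, fps, block):
    """Return (cycles per pixel, cycles per block x block region)."""
    pixels_per_second = width * height * fps
    cycles_per_pixel = clock_hz / pixels_per_second
    cycles_per_block = cycles_per_pixel * block * block
    return cycles_per_pixel, cycles_per_block


per_pixel, per_block = cycle_budget(CLOCK_HZ, WIDTH, HEIGHT, FPS, BLOCK)
print(f"{per_pixel:.3f} cycles per pixel")
print(f"{per_block:.0f} cycles per {BLOCK}x{BLOCK} block")
```

Under these assumptions the budget is a little under two cycles per pixel, which gives a sense of how much parallelism a ±64 pixel search needs in order to run in real time.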

    Low power motion estimation hardware designs

    Motion Estimation (ME) is the most computationally intensive and most power consuming part of video compression and video enhancement systems. ME is used in video compression standards such as H.264/MPEG-4 and in video enhancement algorithms such as frame rate conversion and de-interlacing. Half pixel (HP) ME increases video coding efficiency at the expense of increased computational complexity. Therefore, in this thesis, we designed and implemented efficient integer pixel (IP) ME hardware implementing the full search ME algorithm, and we proposed techniques for reducing the dynamic power consumption of IP and HP ME hardware. The proposed ME hardware architectures are implemented in Verilog HDL and mapped to Xilinx FPGAs. The FPGA implementations are verified with post place & route simulations. We proposed a comparison prediction (CP) technique for reducing the power consumption of IP block matching (BM) ME hardware. The CP technique reduces the power consumption of the absolute difference operations performed by IP BM ME hardware. The proposed technique can easily be used in all IP BM ME hardware. It reduced the power consumption of a fixed block size IP BM ME hardware implementing the full search algorithm by 9.3% with 0.04% PSNR loss on a Xilinx XC2VP30-7 FPGA. We also proposed two techniques for reducing the power consumption of H.264 HP ME hardware. The first technique is vector dependent sum of absolute difference (SAD) reuse, which reduces the amount of computation for variable block size H.264 HP ME with no PSNR loss. The second technique is a novel modification of the HP search algorithm which adaptively tries to use the IP motion vector trajectories to reduce the HP search to 1-D. This technique causes an average PSNR loss of 0.36 dB. The two techniques reduced the power consumption of a variable block size H.264 HP ME hardware by 6% and 31%, respectively, on a Xilinx Virtex 6 FPGA.

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in the detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme, which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, the use of adaptive techniques is seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by the hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device.
These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved, ranging from 83% for low motion-activity scenes to 12.5% for high motion-activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32 dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02 dB to 6.67 dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before the applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by first constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.

    Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing

    Since the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications – such as video processing – exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative to improve the performance and energy-efficiency requirements for many applications by trading its intrinsic error tolerance with algorithm and circuit efficiency. Exact computing also imposes a worst-case timing to the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: How these timing errors propagate from the underlying hardware all the way up to the entire algorithm behavior, where they just may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of performing investigations of both AxC (i.e., from approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., from voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate-level by crossing them dynamically, linking the hardware result with the algorithm-level, and vice versa during the evolution of the application’s runtime. 
Existing frameworks investigate AxC and TS techniques at circuit level (i.e., at the output of the accelerator), agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less optimization. Unlike the state of the art, the proposed framework offers a holistic approach to assessing the tradeoff of AxC and TS techniques at the application level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level that still fulfill the required good-enough quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating within an HEVC encoder as a case study. Application-level results showed that the SAD based on approximate adders achieves savings of up to 45% in energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with an increase of around 6% in BD-BR. The framework also reveals that the boost of about 6.96% (at 50°) to 17.41% (at 75° with 10-year aging) in the maximum clock frequency achieved with TS hardware design is totally lost to a processing overhead of 8.06% to 46.96% when an unreliable block matching algorithm (BMA) is chosen. We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also presents approximate DTT (Discrete Tchebichef Transform) hardware proposals that explore transform matrix approximation, truncation and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency by up to 64%, reduces the circuit area by up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped to FPGA shows an increase of up to 58.9% in the maximum frequency and savings of about 28.7% and 32.2% in slices and dynamic power, respectively compared with stat

    A framework for surface metrology on Cultural Heritage objects based on scanning conoscopic holography

    The application of surface metrology and dimensional analysis to the study of artworks can reveal important information on the object and aid the integration of multiple diagnostic techniques. However, applying these disciplines to Cultural Heritage objects calls for particular care and specific requirements.
In this dissertation, I present the results of the implementation of different systems, based on Conoscopic Holography range finders, for measuring the surface. Conoscopic holography range finders are viable instruments for measuring distances with micrometer accuracy at different scales; coupled with micrometric stages, they can be used for acquiring areal scans of the object under investigation. To ease their application to artworks, I built a framework for applying surface metrology to Cultural Heritage objects. The framework covers different aspects of the research workflow, comprising the creation of sample collections, the strategies for scanning the object, the storage and analysis of the data and, eventually, the uncertainty linked to the measurement. This framework aims to make the implementation of surface metrology and dimensional analysis scanning systems tailored to the analysis of Cultural Heritage objects more accessible. The results collected on a variety of artwork materials (metals, panel paintings, canvas, paper, parchment and mural paintings) show how these systems can be used for monitoring the effects of cleaning procedures, the dimensional stability of the artworks and their ageing.