3 research outputs found

    Low complexity hardware oriented H.264/AVC motion estimation algorithm and related low power and low cost architecture design

    Get PDF
    制度:新 ; 報告番号:甲2999号 ; 学位の種類:博士(工学) ; 授与年月日:2010/3/15 ; 早大学位記番号:新525

    Algorithms and Hardware Co-Design of HEVC Intra Encoders

    Get PDF
    Digital video is becoming extremely important nowadays and its importance has greatly increased in the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard H.264/AVC released in 2003 is inefficient when it comes to UHD videos. The increasing desire for superior compression efficiency to H.264/AVC leads to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers a double compression ratio at the same level of video quality or substantial improvement of video quality at the same video bitrate. Yet, HE-VC/H.265 possesses superior compression efficiency, its complexity is several times more than H.264/AVC, impeding its high throughput implementation. Currently, most of the researchers have focused merely on algorithm level adaptations of HEVC/H.265 standard to reduce computational intensity without considering the hardware feasibility. What’s more, the exploration of efficient hardware architecture design is not exhaustive. Only a few research works have been conducted to explore efficient hardware architectures of HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore the deep learning approach in mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations, including mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation is applied to reduce the complexity in rate-distortion (RD) calculation of each CU. Group-based CABAC rate estimation is proposed to parallelize syntax elements processing to greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PE) and each PE performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, rate-distortion estimation independently. PU blocks with different PU sizes will be processed by the different prediction engines (PE) simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of the deep learning approach, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes of luma prediction blocks. All angular prediction modes are classified into 7 prediction groups. Each group contains 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, then a small scale FCLNN is exploited to refine the mode prediction

    Análise do impacto de pel decimation na codificação de vídeos de alta resolução

    Get PDF
    Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2014.Ao mesmo tempo em que o número de pixels por quadro tende a aumentar pela iminente adoção de resoluções ultra altas, a subamostragem de pixels, também conhecida por pel decimation, surge como uma opção viável para aumentar a eficiência energética da codificação de vídeo. Este trabalho investiga os impactos em energia e qualidade, quando pel decimation é aplicado ao cálculo da Soma das Diferenças Absolutas (SAD), a qual é a métrica de similaridade mais utilizada durante a etapa de estimação de movimento. Primeiramente, apresenta-se uma análise de qualidade de 15 padrões de subamostragem. Os 10.860 pontos experimentais usados proporcionam evidência estatística de que a razão de amostragem 4:3 proposta apresenta velocidade de codificação duas vezes maior do que a amostragem completa, perdendo apenas 5% em DSSIM e 1% em PSNR. A razão 4:3 apresenta o melhor custo-benefício entre aceleração e redução de qualidade, comparando-se com razões de menor amostragem. Para obter estimativas de área em silício e energia por bloco, cinco arquiteturas para cálculo da SAD foram projetadas e sintetizadas para uma biblioteca standard cell industrial. Dentre elas, uma pode ser configurada para operar com razões de amostragem 1:1, 4:3, 2:1 ou 4:1, ao passo que as demais foram projetadas para operar exclusivamente com cada uma destas razões de amostragem. A arquitetura configurável, operando em amostragem completa, consome 3,54 pJ/bloco (60% menos que a versão não-configurável), podendo ser reduzida até 1,34 pJ/bloco utilizando-se a razão de amostragem 4:1, com redução de 2,8% em PSNR e 14,1% em DSSIM. Finalmente, demonstra-se que a aceleração de codificação de um determinado padrão de subamostragem deve-se à redução conjunta do número de pixels amostradas e do número total de cálculos de SAD. Assim, modelando-se as componentes de energia da codificação de vídeos, demonstra-se que a eficiência energética do processo de codificação como um todo pode ser melhorada além da razão de subamostragem. Utilizando-se uma arquitetura de SAD configurável, a economia de energia pode ser de até 95,11%.Abstract : As the number of pixels per frame tends to increase by the upcoming adoption of ultra high resolutions, pixel subsampling, also known as pel decimation, appears as a viable means to improve the energy efficiency of video coding. This work investigates the impacts on energy and quality when pel decimation is applied to the Sum of Absolute Differences (SAD) calculation, which is the most used similarity metric in motion estimation step of video coding. Firstly, a quality assessment of 15 pel decimation patterns is presented. The 10,680 experimental points used provide statistical evidence that the proposed 4:3 ratio leads to an encoding speedup of more than two times in comparison to full sampling, losing only 5% in DSSIM and 1% in PSNR. Compared with lower sampling ratios, it presents a better trade-off between speedup and quality loss. To obtain estimates for silicon area and energy per block, five SAD architectures were designed and synthesized for an industrial standard cell library. Among those, one can be configured to operate with 1:1, 4:3, 2:1 or 4:1 sampling ratios, whereas the rest are tailored to operate exclusively with each one of these ratios. The configurable architecture consumes 3.54pJ/block operating in full sampling (60% lower than the nonconfigurable). The energy can be further reduced until 1.34pJ/block by using 4:1 ratio, with losses of 2.8% in PSNR and 14.1% in DSSIM. Finally, it is shown that the speedup of a given subsampling pattern is due the reduction of both the number of sampled pixels and the total number of SAD calculations. Therefore, by modeling the video coding energy components, it is shown that the whole video compression energy efficiency can be increased beyond the sampling ratio. By using a configurable SAD architecture operating in 4:1 ratio the energy savings are up to 95:11%
    corecore