222 research outputs found

    Algorithms and Hardware Co-Design of HEVC Intra Encoders

    Get PDF
    Digital video is becoming extremely important nowadays and its importance has greatly increased in the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard H.264/AVC released in 2003 is inefficient when it comes to UHD videos. The increasing desire for superior compression efficiency to H.264/AVC leads to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers a double compression ratio at the same level of video quality or substantial improvement of video quality at the same video bitrate. Yet, HE-VC/H.265 possesses superior compression efficiency, its complexity is several times more than H.264/AVC, impeding its high throughput implementation. Currently, most of the researchers have focused merely on algorithm level adaptations of HEVC/H.265 standard to reduce computational intensity without considering the hardware feasibility. What’s more, the exploration of efficient hardware architecture design is not exhaustive. Only a few research works have been conducted to explore efficient hardware architectures of HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore the deep learning approach in mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations, including mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation is applied to reduce the complexity in rate-distortion (RD) calculation of each CU. Group-based CABAC rate estimation is proposed to parallelize syntax elements processing to greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PE) and each PE performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, rate-distortion estimation independently. PU blocks with different PU sizes will be processed by the different prediction engines (PE) simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of the deep learning approach, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes of luma prediction blocks. All angular prediction modes are classified into 7 prediction groups. Each group contains 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, then a small scale FCLNN is exploited to refine the mode prediction

    Towards visualization and searching :a dual-purpose video coding approach

    Get PDF
    In modern video applications, the role of the decoded video is much more than filling a screen for visualization. To offer powerful video-enabled applications, it is increasingly critical not only to visualize the decoded video but also to provide efficient searching capabilities for similar content. Video surveillance and personal communication applications are critical examples of these dual visualization and searching requirements. However, current video coding solutions are strongly biased towards the visualization needs. In this context, the goal of this work is to propose a dual-purpose video coding solution targeting both visualization and searching needs by adopting a hybrid coding framework where the usual pixel-based coding approach is combined with a novel feature-based coding approach. In this novel dual-purpose video coding solution, some frames are coded using a set of keypoint matches, which not only allow decoding for visualization, but also provide the decoder valuable feature-related information, extracted at the encoder from the original frames, instrumental for efficient searching. The proposed solution is based on a flexible joint Lagrangian optimization framework where pixel-based and feature-based processing are combined to find the most appropriate trade-off between the visualization and searching performances. Extensive experimental results for the assessment of the proposed dual-purpose video coding solution under meaningful test conditions are presented. The results show the flexibility of the proposed coding solution to achieve different optimization trade-offs, notably competitive performance regarding the state-of-the-art HEVC standard both in terms of visualization and searching performance.Em modernas aplicações de vídeo, o papel do vídeo decodificado é muito mais que simplesmente preencher uma tela para visualização. Para oferecer aplicações mais poderosas por meio de sinais de vídeo,é cada vez mais crítico não apenas considerar a qualidade do conteúdo objetivando sua visualização, mas também possibilitar meios de realizar busca por conteúdos semelhantes. Requisitos de visualização e de busca são considerados, por exemplo, em modernas aplicações de vídeo vigilância e comunicações pessoais. No entanto, as atuais soluções de codificação de vídeo são fortemente voltadas aos requisitos de visualização. Nesse contexto, o objetivo deste trabalho é propor uma solução de codificação de vídeo de propósito duplo, objetivando tanto requisitos de visualização quanto de busca. Para isso, é proposto um arcabouço de codificação em que a abordagem usual de codificação de pixels é combinada com uma nova abordagem de codificação baseada em features visuais. Nessa solução, alguns quadros são codificados usando um conjunto de pares de keypoints casados, possibilitando não apenas visualização, mas também provendo ao decodificador valiosas informações de features visuais, extraídas no codificador a partir do conteúdo original, que são instrumentais em aplicações de busca. A solução proposta emprega um esquema flexível de otimização Lagrangiana onde o processamento baseado em pixel é combinado com o processamento baseado em features visuais objetivando encontrar um compromisso adequado entre os desempenhos de visualização e de busca. Os resultados experimentais mostram a flexibilidade da solução proposta em alcançar diferentes compromissos de otimização, nomeadamente desempenho competitivo em relação ao padrão HEVC tanto em termos de visualização quanto de busca

    Video Stream Adaptation In Computer Vision Systems

    Get PDF
    Computer Vision (CV) has been deployed recently in a wide range of applications, including surveillance and automotive industries. According to a recent report, the market for CV technologies will grow to $33.3 billion by 2019. Surveillance and automotive industries share over 20% of this market. This dissertation considers the design of real-time CV systems with live video streaming, especially those over wireless and mobile networks. Such systems include video cameras/sensors and monitoring stations. The cameras should adapt their captured videos based on the events and/or available resources and time requirement. The monitoring station receives video streams from all cameras and run CV algorithms for decisions, warnings, control, and/or other actions. Real-time CV systems have constraints in power, computational, and communicational resources. Most video adaptation techniques considered the video distortion as the primary metric. In CV systems, however, the main objective is enhancing the event/object detection/recognition/tracking accuracy. The accuracy can essentially be thought of as the quality perceived by machines, as opposed to the human perceptual quality. High-Efficiency Video Coding (HEVC) is a recent encoding standard that seeks to address the limited communication bandwidth problem as a result of the popularity of High Definition (HD) videos. Unfortunately, HEVC adopts algorithms that greatly slow down the encoding process, and thus results in complications in real-time systems. This dissertation presents a method for adapting live video streams to limited and varying network bandwidth and energy resources. It analyzes and compares the rate-accuracy and rate-energy characteristics of various video streams adaptation techniques in CV systems. We model the video capturing, encoding, and transmission aspects and then provide an overall model of the power consumed by the video cameras and/or sensors. In addition to modeling the power consumption, we model the achieved bitrate of video encoding. We validate and analyze the power consumption models of each phase as well as the aggregate power consumption model through extensive experiments. The analysis includes examining individual parameters separately and examining the impacts of changing more than one parameter at a time. For HEVC, we develop an algorithm that predicts the size of the block without iterating through the exhaustive Rate Distortion Optimization (RDO) method. We demonstrate the effectiveness of the proposed algorithm in comparison with existing algorithms. The proposed algorithm achieves approximately 5 times the encoding speed of the RDO algorithm and 1.42 times the encoding speed of the fastest analyzed algorithm

    Resource-Constrained Low-Complexity Video Coding for Wireless Transmission

    Get PDF
    corecore