40 research outputs found

    Architecture exploration and VLSI design of multi-symbol arithmetic encoders for the AV1 coding format

    To reduce the impact of video on global Internet capacity, companies rely upon video coding standards and formats, also known as codecs, to reduce the overall sizes of videos before transmitting or storing them. AV1, a promising state-of-the-art and royalty-free video coding format first released in 2018, applies novel techniques to boost compression results. Amongst its core components, AV1 comprises an entropy coding block, which is responsible for losslessly encoding symbols generated by other core modules (e.g., intra prediction, motion compensation, etc.). The arithmetic encoder, which is part of the entropy encoder, is a bottleneck because it is difficult to parallelize, and it relies upon two primary operations: the CDF Operation and the Boolean Operation, where CDF stands for Cumulative Distribution Function. This thesis proposes a baseline VLSI design, named AE-AV1, as the first AV1 arithmetic encoder found in the literature, capable of ultra-high performance (i.e., real-time processing of 8K@120fps videos). Moreover, additional versions of this architecture are proposed as AE-AV1-LP and AE-AV1-MB, which are, respectively, a low-power version and a novel design applying a Multi-Boolean technique also introduced in this thesis. All the proposed designs were synthesized using the Cadence™ RC tool and the ST 65nm PDK. As AV1 is well known for being an open-source alternative in the video coding industry, the AE-AV1 architecture was also synthesized from Verilog to GDSII layout using a fully open-source ASIC flow (i.e., the OpenROAD tool, the OpenLane flow, and the ASAP7 and SkyWater 130nm PDKs). The architectures reached frequencies of 581 MHz, 563 MHz and 590 MHz for the AE-AV1, AE-AV1-LP and AE-AV1-MB 2-bool versions, respectively. With regard to throughput, all the introduced architectures can sustain 8K@120fps real-time video processing, with rates of 1.032 Gbit/s, 0.999 Gbit/s and 1.117 Gbit/s, respectively.
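
    As a rough illustration of the CDF Operation the abstract refers to, the following minimal Python sketch shows how a multi-symbol arithmetic encoder narrows its coding interval using a cumulative distribution function. It is a generic textbook-style coder, not the bit-exact AV1 algorithm (AV1's range coder is a hardware-friendly variant with 15-bit probabilities); all names here are illustrative.

```python
class ArithmeticEncoder:
    """Toy multi-symbol arithmetic encoder driven by a CDF table.

    Illustrative of the 'CDF Operation' only; AV1's actual entropy
    coder is a bit-exact range coder and differs in detail.
    """

    PREC = 32
    TOP = (1 << PREC) - 1
    HALF = 1 << (PREC - 1)
    QUARTER = 1 << (PREC - 2)

    def __init__(self):
        self.low = 0
        self.high = self.TOP
        self.pending = 0
        self.bits = []

    def _emit(self, bit):
        self.bits.append(bit)
        # Flush bits deferred while the interval straddled the midpoint.
        self.bits.extend([1 - bit] * self.pending)
        self.pending = 0

    def encode(self, sym, cdf):
        # cdf[k] = cumulative count of symbols <= k; cdf[-1] is the total.
        # e.g. enc.encode(2, [5, 9, 12, 16]) encodes symbol 2 out of 4.
        total = cdf[-1]
        lo = cdf[sym - 1] if sym > 0 else 0
        hi = cdf[sym]
        span = self.high - self.low + 1
        self.high = self.low + span * hi // total - 1   # CDF Operation:
        self.low = self.low + span * lo // total        # shrink interval
        while True:
            if self.high < self.HALF:                   # lower half
                self._emit(0)
            elif self.low >= self.HALF:                 # upper half
                self._emit(1)
                self.low -= self.HALF
                self.high -= self.HALF
            elif self.low >= self.QUARTER and self.high < 3 * self.QUARTER:
                self.pending += 1                       # straddle: defer
                self.low -= self.QUARTER
                self.high -= self.QUARTER
            else:
                break
            self.low <<= 1
            self.high = (self.high << 1) | 1

    def finish(self):
        # Terminate: one more scaled bit disambiguates the final interval.
        self.pending += 1
        self._emit(0 if self.low < self.QUARTER else 1)
```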

    Low-complexity scalable and multiview video coding


    Analysis and Comparison of Modern Video Compression Standards for Random-access Light-field Compression

    Light-field (LF) 3D displays are anticipated to be the next-generation 3D displays, providing smooth motion parallax, a wide field of view (FOV), and a higher depth range than current autostereoscopic displays. Projection-based multi-view LF 3D displays bring the desired new functionalities through a set of projection engines creating light sources for the continuous light field to be created. Such displays require a high number of perspective views as input to fully exploit the visualization capabilities and viewing angle provided by the LF technology. Delivering, processing and de/compressing this number of views poses big technical challenges. However, when processing light fields in a distributed system, access patterns in ray space are quite regular: some processing nodes do not need all views, and the necessary views may be used only partially. This trait can be exploited through partial decoding of pictures to enable less complex and thus real-time operation. However, none of the recent video coding standards (e.g., the Advanced Video Coding (AVC)/H.264 and High Efficiency Video Coding (HEVC)/H.265 standards) provides partial decoding of video pictures. Such a feature can be achieved by partitioning video pictures into partitions that can be processed independently, at the cost of lowering the compression efficiency. Examples of such partitioning features introduced by modern video coding standards include slices and tiles, which enable random access into the video bitstreams at a specific granularity. In addition, some extra requirements have to be imposed on the standard partitioning tools for them to be applicable in the context of partial decoding. This leads to partitions called self-contained, i.e., isolated or independently decodable regions of the video pictures. This work studies the problem of creating self-contained partitions in the conventional AVC/H.264 and HEVC/H.265 standards and in the HEVC multi-view (MV-HEVC) and 3D (3D-HEVC) extensions, using slices and tiles, respectively. The requirements that need to be fulfilled in order to build self-contained partitions are described, and an encoder-side solution is proposed. Further, the work examines how slicing/tiling can be used to facilitate random access into the video bitstreams, how the number of slices/tiles affects the compression ratio under different prediction structures, and how much effect partial decoding has on decoding time. Overall, the experimental results indicate that the finer the partitioning, the higher the compression loss. The use of self-contained partitions makes the decoding operation very efficient and less complex.
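
    The core encoder-side requirement for a self-contained partition can be summarized as: no prediction may reference samples outside the co-located partition. The minimal Python sketch below shows such a motion-vector restriction; the function name, the pixel-based tile representation and the interpolation-filter margin are our assumptions for illustration, not code from the thesis.

```python
# Sketch of the encoder-side constraint that makes a tile "self-contained":
# a motion vector is admissible only if the reference block, plus the margin
# needed by the sub-pel interpolation filter, stays inside the co-located
# tile of the reference picture. The margin of 3 pixels is illustrative.

def mv_is_self_contained(block_x, block_y, block_w, block_h,
                         mv_x, mv_y, tile, margin=3):
    """tile = (x0, y0, x1, y1) in pixels; mv in integer pixels."""
    x0, y0, x1, y1 = tile
    ref_left = block_x + mv_x - margin
    ref_top = block_y + mv_y - margin
    ref_right = block_x + block_w + mv_x + margin
    ref_bottom = block_y + block_h + mv_y + margin
    return (ref_left >= x0 and ref_top >= y0 and
            ref_right <= x1 and ref_bottom <= y1)

# During motion search, candidate vectors failing this test are discarded,
# trading some compression efficiency for independent decodability.
```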

    ECSIC: Epipolar Cross Attention for Stereo Image Compression

    In this paper, we present ECSIC, a novel learned method for stereo image compression. Our proposed method compresses the left and right images in a joint manner by exploiting the mutual information between the images of the stereo image pair using a novel stereo cross attention (SCA) module and two stereo context modules. The SCA module performs cross-attention restricted to the corresponding epipolar lines of the two images and processes them in parallel. The stereo context modules improve the entropy estimation of the second encoded image by using the first image as a context. We conduct an extensive ablation study demonstrating the effectiveness of the proposed modules and a comprehensive quantitative and qualitative comparison with existing methods. ECSIC achieves state-of-the-art performance among stereo image compression models on the two popular stereo image datasets Cityscapes and InStereo2k, while allowing for fast encoding and decoding, making it highly practical for real-time applications.
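
    For rectified stereo pairs, corresponding epipolar lines are horizontal image rows, so cross-attention restricted to epipolar lines can be implemented by folding rows into the batch dimension. The PyTorch sketch below illustrates this idea under that rectification assumption; the module and tensor names are ours, and the real SCA module may differ in detail.

```python
import torch
import torch.nn as nn

class EpipolarCrossAttention(nn.Module):
    """Cross-attention restricted to matching rows of a rectified stereo
    pair (for rectified images, epipolar lines are horizontal scanlines).
    Minimal sketch; `channels` must be divisible by `heads`.
    """

    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, feat_q, feat_kv):
        # feat_*: (B, C, H, W). Fold rows into the batch so attention
        # only mixes positions along the same epipolar line.
        b, c, h, w = feat_q.shape
        q = feat_q.permute(0, 2, 3, 1).reshape(b * h, w, c)
        kv = feat_kv.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn(q, kv, kv)          # attend along the row
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)
```

    All H rows are processed in a single batched attention call, which is what allows the epipolar lines to be handled in parallel.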

    Error resilience and concealment techniques for high-efficiency video coding

    This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study of error robustness revealed that HEVC has weak protection against network losses, with a significant impact in the form of video quality degradation. Based on this evidence, the first contribution of this work is a new method to reduce the temporal dependencies between motion vectors, improving the decoded video quality without compromising the compression efficiency. The second contribution is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimization to reduce the number of predictions from any single reference. At the streaming stage, a prioritization algorithm based on spatial dependencies selects a reduced set of motion vectors to be transmitted as side information, reducing mismatched motion predictions at the decoder. The problem of error-concealment-aware video coding is also investigated to enhance overall error robustness. A new approach based on scalable coding and optimal error concealment selection is proposed, where the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder to be used in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case. A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream and the transmission network. Overall, the new robust video coding methods investigated in this thesis yield consistent quality gains over existing methods, including those implemented in the HEVC reference software. Furthermore, the proposed methods also achieve a better trade-off between coding efficiency and error robustness.
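
    The constrained Lagrangian rate-distortion selection mentioned above can be pictured as minimizing J = D + λR per block while capping how often any single reference picture is chosen. The Python sketch below is a simplified illustration under our own naming, not the thesis implementation.

```python
# Sketch of constrained Lagrangian rate-distortion reference selection:
# pick the reference picture minimizing J = D + lam * R for a block,
# subject to a cap on the share of blocks predicted from any single
# reference (the constraint that spreads temporal dependencies).

def pick_reference(candidates, lam, usage, max_share, total_blocks):
    """candidates: list of (ref_id, distortion, rate) for one block."""
    best_ref, best_cost = None, float("inf")
    for ref_id, dist, rate in candidates:
        if usage.get(ref_id, 0) >= max_share * total_blocks:
            continue                      # reference already over-used
        cost = dist + lam * rate          # Lagrangian RD cost J = D + lam*R
        if cost < best_cost:
            best_ref, best_cost = ref_id, cost
    if best_ref is not None:
        usage[best_ref] = usage.get(best_ref, 0) + 1
    return best_ref
```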

    Challenges and solutions in H.265/HEVC for integrating consumer electronics in professional video systems


    GPU Parallelization of HEVC In-Loop Filters

    In the High Efficiency Video Coding (HEVC) standard, multiple decoding modules have been designed to take advantage of parallel processing. In particular, the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset) were conceived to be exploited by parallel architectures. However, the type of parallelism offered mostly suits the capabilities of multi-core CPUs, making it a real challenge to efficiently exploit massively parallel architectures such as Graphics Processing Units (GPUs), mainly due to the data dependencies between the HEVC decoding procedures. Accordingly, this paper presents a novel strategy to increase the amount of parallelism, and the resulting performance, of the HEVC in-loop filters on GPU devices. For this purpose, the proposed algorithm performs the HEVC filtering at frame level and employs intrinsic GPU vector instructions. Compared to state-of-the-art HEVC in-loop filter implementations, the proposed approach also reduces the amount of required memory transfers, further boosting the performance. Experimental results show that the proposed GPU in-loop filters deliver a significant improvement in decoding performance; for example, average frame rates of 76 frames per second (FPS) and 125 FPS for Ultra HD 4K are achieved on an embedded NVIDIA GPU for the All Intra and Random Access configurations, respectively.
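
    The frame-level parallelism the paper exploits rests on the fact that, once the filtering decisions are known, each block edge can be filtered independently of the others. The toy NumPy sketch below makes that independence visible, with vectorized columns standing in for GPU threads; the smoothing rule is deliberately simplified and is not the actual HEVC deblocking filter.

```python
import numpy as np

# Frame-level view of deblocking: every 8-pixel-aligned vertical edge can
# be smoothed independently, which is what maps each edge (or each pixel
# row of an edge) to its own GPU thread. The real HEVC filter adds
# boundary-strength decisions and clipping, omitted here for brevity.

def weak_deblock_vertical(frame, step=8):
    """frame: 2D uint8 luma plane; returns a weakly smoothed copy."""
    f = frame.astype(np.int32)
    for x in range(step, f.shape[1], step):
        p0, q0 = f[:, x - 1], f[:, x]      # samples astride the edge,
        delta = (q0 - p0) // 4             # all rows processed at once
        f[:, x - 1] = np.clip(p0 + delta, 0, 255)
        f[:, x] = np.clip(q0 - delta, 0, 255)
    return f.astype(np.uint8)
```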

    MAP Joint Source-Channel Arithmetic Decoding for Compressed Video

    In order to achieve robust video transmission over error-prone telecommunication channels, several mechanisms have been introduced. These mechanisms try to detect, correct or conceal errors in the received video stream. In this thesis, the performance of the video codec is improved in terms of error rates without increasing the data bit rate. This is done by exploiting the residual syntactic/semantic redundancy inside compressed video, along with optimizing the configuration of the state-of-the-art entropy coding, i.e., binary arithmetic coding, and optimizing the quantization of the channel output. The thesis is divided into four phases. In the first phase, a breadth-first suboptimal sequential maximum a posteriori (MAP) decoder is employed for joint source-channel arithmetic decoding of H.264 symbols. The proposed decoder uses not only the intentional redundancy inserted via a forbidden symbol (FS) but also exploits residual redundancy through a syntax checker; in contrast to previous methods, this is done as each channel bit is decoded. Simulations using intra prediction modes show improvements in error rates, e.g., a reduction of the syntax element error rate by an order of magnitude for a channel SNR of 7.33 dB. The cost of this improvement is additional computational complexity spent on the syntax checking. In the second phase, the configuration of the FS in the symbol set is studied. The delay probability function, i.e., the probability distribution of the number of bits required to detect an error, is calculated for various FS configurations. The probability of missed error detection is calculated as a figure of merit for optimizing the FS configuration. The simulation results show the effectiveness of the proposed figure of merit and support the FS configuration in which the FS lies entirely between the other information-carrying symbols as the best. In the third phase, a new method for estimating the a priori probability of particular syntax elements is proposed. This estimation is based on the interdependency among previously decoded syntax elements and is categorized as either reliable or unreliable. The decoder uses this prior information when it is reliable; otherwise, the MAP decoder considers the syntax elements equiprobable and in turn uses maximum likelihood (ML) decoding. The reliability detection is carried out using a threshold on the local entropy of syntax elements in the neighboring macroblocks. In the last phase, a new measure to assess the performance of the channel quantizer is proposed. This measure is based on the statistics of the rank of the true candidate among the sorted list of candidates in the MAP decoder. Simulation results show that a quantizer designed based on the proposed measure is superior to quantizers designed based on maximum mutual information or minimum mean square error.
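
    A breadth-first sequential MAP decoder of the kind described in the first phase can be sketched as a pruned tree search over channel bits: candidate bit paths are ranked by an a posteriori metric and discarded as soon as arithmetic decoding emits the forbidden symbol or the syntax checker fails. The Python sketch below is schematic; the metric and checker are placeholders, not the thesis's actual functions.

```python
import heapq

# Breadth-first sequential MAP decoding sketch: keep the M most probable
# candidate bit paths, extend each by one channel bit, and drop any path
# whose partial arithmetic decode hits the forbidden symbol (FS) or fails
# the syntax check. `metric` and `decodes_ok` are placeholder hooks.

def map_decode(soft_bits, metric, decodes_ok, M=16):
    """soft_bits: per-bit channel LLRs; metric(path, bit, llr) -> cost."""
    paths = [(0.0, [])]                       # (additive cost, bit path)
    for llr in soft_bits:
        nxt = []
        for cost, path in paths:
            for bit in (0, 1):
                cand = path + [bit]
                if not decodes_ok(cand):      # FS hit or syntax error
                    continue
                heapq.heappush(nxt, (cost + metric(path, bit, llr), cand))
        paths = heapq.nsmallest(M, nxt)       # breadth-first pruning
    return paths[0][1] if paths else None
```

    Pruning on the syntax check at every bit, rather than after a full symbol, is what distinguishes the approach described above from earlier methods.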

    End to end Multi-Objective Optimisation of H.264 and HEVC Codecs

    All multimedia devices now incorporate video CODECs that comply with international video coding standards such as H.264/MPEG4-AVC and the new High Efficiency Video Coding standard (HEVC), otherwise known as H.265. Although the standard CODECs have been designed to include algorithms with optimal efficiency, a large number of coding parameters can be used to fine-tune their operation within known constraints, e.g., available computational power, bandwidth, consumer QoS requirements, etc. With so many parameters involved, determining which of them play a significant role in providing optimal quality of service within given constraints is a challenge in itself; how to select the values of the significant parameters so that the CODEC performs optimally under the given constraints is a further important question to be answered. This thesis proposes a framework that uses machine learning algorithms to model the performance of a video CODEC based on the significant coding parameters. Means of modelling both Encoder and Decoder performance are proposed. We define objective functions that can be used to model the performance-related properties of a CODEC, i.e., video quality, bit-rate and CPU time, and show that these objective functions can be practically utilised in video Encoder/Decoder designs, in particular in their performance optimisation within given operational and practical constraints. A multi-objective optimisation framework based on Genetic Algorithms is thus proposed to optimise the performance of a video codec. The framework is designed to jointly minimise the CPU time and bit-rate and to maximise the quality of the compressed video stream. The thesis presents the use of this framework in the performance modelling and multi-objective optimisation of the most widely used video coding standard in practice at present, H.264, and the latest video coding standard, H.265/HEVC. When a communication network is used to transmit video, performance-related parameters of the communication channel will impact the end-to-end performance of the video CODEC: network delays and packet loss will degrade the quality of the video received at the decoder, i.e., even an optimally configured video CODEC will deliver a sub-optimal experience under adverse network conditions. Given the above, the thesis proposes the design, integration and testing of a novel approach to simulating a wired network, with the UDP protocol used for the transmission of video data. This network is subsequently used to simulate the impact of packet loss and network delays on optimally coded video, based on the framework previously proposed for the modelling and optimisation of video CODECs. The quality of received video under different levels of packet loss and network delay is simulated, allowing conclusions on the impact on transmitted video depending on its content and features.
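
    As a sketch of such an optimisation loop, the toy genetic algorithm below searches a codec parameter space while scalarizing the three objectives (quality up, bit-rate and CPU time down) into one weighted fitness. A genuine multi-objective setup would maintain a Pareto front (e.g., NSGA-II); the `evaluate` hook standing in for an actual encoder run, the weights and the genetic operators are all our assumptions.

```python
import random

# Toy GA over codec parameter vectors. param_space is a list (one entry
# per parameter, assumed to have at least two parameters) of allowed
# values; evaluate(ind) is a placeholder for running the encoder and
# returning (quality, bitrate, cpu_time), e.g., (PSNR, kbps, seconds).

def evolve(param_space, evaluate, pop=20, gens=30, w=(1.0, 0.5, 0.5)):
    rand = lambda: [random.choice(v) for v in param_space]

    def fitness(ind):
        quality, bitrate, cpu = evaluate(ind)
        return w[0] * quality - w[1] * bitrate - w[2] * cpu

    population = [rand() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]          # elitist selection
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))     # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(len(child))      # point mutation
            child[i] = random.choice(param_space[i])
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```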