
    Architecture exploration and VLSI design of multi-symbol arithmetic encoders for the AV1 coding format

    To reduce the impact of video on global Internet capacity, companies rely upon video coding standards and formats, also known as codecs, to reduce the sizes of videos before transmitting or storing them. AV1, a promising state-of-the-art and royalty-free video coding format first released in 2018, aims to reduce the sizes of videos by applying novel techniques to boost its compression results. Among its core components, AV1 comprises an entropy coding block, which is responsible for losslessly encoding the symbols generated by the other core modules (e.g., intra prediction, motion compensation, etc.). The arithmetic encoder, which is part of the entropy encoder, is a bottleneck because it is difficult to parallelize, and it relies upon two primary operations: the CDF Operation and the Boolean Operation, where CDF stands for Cumulative Distribution Function. This thesis proposes a baseline VLSI design, named AE-AV1, as the first AV1 arithmetic encoder in the literature, capable of reaching ultra-high performance (i.e., processing 8K@120fps videos in real time). Moreover, additional versions of this architecture are proposed as AE-AV1-LP and AE-AV1-MB, which are, respectively, a low-power version and a novel design applying a Multi-Boolean technique also introduced in this thesis. All the proposed designs were synthesized using the Cadenceℱ RC tool and the ST 65nm PDK. As AV1 is well known for being an open-source alternative in the video coding industry, the AE-AV1 architecture was also synthesized from Verilog to GDSII layout using a fully open-source ASIC flow (i.e., the OpenROAD tool, the OpenLane flow, and the ASAP7 and SkyWater 130nm PDKs). The architectures reach frequencies of 581 MHz, 563 MHz, and 590 MHz for the AE-AV1, AE-AV1-LP, and AE-AV1-MB 2-bool versions, respectively. In terms of throughput, all of the introduced architectures sustain 8K@120fps real-time video processing, with rates of 1.032 Gbit/s, 0.999 Gbit/s, and 1.117 Gbit/s, respectively.
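    The CDF Operation at the core of such an encoder can be pictured as a range-coder interval update driven by a cumulative distribution table, with the Boolean Operation as its two-symbol special case. The following C sketch illustrates that update under simplifying assumptions (the names range_enc and encode_symbol are ours, and the real AV1 entropy coder prescribes specific rounding, renormalization, and adaptive CDF update rules that are omitted here):

```c
#include <stdint.h>

#define PROB_BITS  15
#define PROB_TOTAL (1u << PROB_BITS)  /* CDF values scaled to 32768 */

/* Hypothetical state of a multi-symbol arithmetic (range) encoder. */
typedef struct {
    uint32_t low;    /* lower bound of the current coding interval */
    uint32_t range;  /* width of the current coding interval */
} range_enc;

/* Encode symbol s given cdf[], where cdf[k] is the cumulative frequency
 * of symbols 0..k and cdf[nsyms-1] == PROB_TOTAL. Renormalization and
 * carry propagation are omitted for brevity. */
static void encode_symbol(range_enc *rc, const uint16_t *cdf, int s)
{
    uint32_t fl = (s > 0) ? cdf[s - 1] : 0;  /* sub-interval start */
    uint32_t fh = cdf[s];                    /* sub-interval end   */

    /* Narrow [low, low + range) to the sub-interval assigned to s. */
    rc->low  += (uint32_t)(((uint64_t)rc->range * fl) >> PROB_BITS);
    rc->range = (uint32_t)(((uint64_t)rc->range * (fh - fl)) >> PROB_BITS);
    /* A real encoder would now renormalize rc->range and emit bits. */
}
```

    The serial dependency is visible in the sketch: each symbol needs the low/range values produced by the previous one, which is why the encoder resists parallelization and why batching several Boolean (two-symbol) updates per cycle, as in the Multi-Boolean technique, is attractive.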

    Perceptual video quality assessment: the journey continues!

    Perceptual Video Quality Assessment (VQA) is one of the most fundamental and challenging problems in the field of Video Engineering. Along with video compression, it has become one of the two dominant theoretical and algorithmic technologies in television streaming and social media. Over the last two decades, the volume of video traffic over the internet has grown exponentially, powered by rapid advancements in cloud services, faster video compression technologies, and increased access to high-speed, low-latency wireless internet connectivity. This has given rise to issues related to delivering extraordinary volumes of picture and video data to an increasingly sophisticated and demanding global audience. Consequently, developing algorithms to measure the quality of pictures and videos as perceived by humans has become increasingly critical, since these algorithms can be used to perceptually optimize trade-offs between quality and bandwidth consumption. VQA models have evolved from algorithms developed for generic 2D videos to specialized algorithms explicitly designed for on-demand video streaming, user-generated content (UGC), virtual and augmented reality (VR and AR), cloud gaming, high dynamic range (HDR), and high frame rate (HFR) scenarios. Along the way, we describe the advancement in algorithm design, beginning with traditional hand-crafted, feature-based methods and finishing with the deep-learning models powering today's accurate VQA algorithms. We also discuss the evolution of subjective video quality databases containing videos and human-annotated quality scores, which are the necessary tools to create, test, compare, and benchmark VQA algorithms. Finally, we discuss emerging trends in VQA algorithm design and offer general perspectives on the evolution of Video Quality Assessment in the foreseeable future.
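    Many of the earlier hand-crafted VQA models follow a simple pattern: compute a per-frame quality measurement against a reference, then temporally pool the frame scores into one video-level score. Below is a minimal C sketch of that pattern, using plain PSNR and mean pooling purely for illustration (function names are ours; real VQA models use far richer perceptual features and pooling strategies):

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Per-frame PSNR between a reference and a distorted frame of n pixels. */
static double frame_psnr(const uint8_t *ref, const uint8_t *dis, size_t n)
{
    double mse = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)ref[i] - (double)dis[i];
        mse += d * d;
    }
    mse /= (double)n;
    return (mse == 0.0) ? INFINITY : 10.0 * log10(255.0 * 255.0 / mse);
}

/* Mean temporal pooling of per-frame scores into one video-level score. */
static double pool_mean(const double *scores, size_t frames)
{
    double sum = 0.0;
    for (size_t i = 0; i < frames; i++)
        sum += scores[i];
    return sum / (double)frames;
}
```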

    Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform

    High Efficiency Video Coding (HEVC) is important for image processing, as it reduces bandwidth and increases video quality. There are different methods that can be used to implement HEVC. This thesis focuses on the design and implementation of application-specific accelerators for the IDCT/IDST algorithms of the HEVC standard. These algorithms are inherently parallel tasks, which makes them suitable for execution on heterogeneous multicore platforms using accelerators, which are required for power-efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used as the template for an accelerator. The CGRA plays a major role in a Heterogeneous Accelerator-Rich Platform (HARP), as it is capable of accelerating non-parallel loops with low loop counts. This thesis explores various IDCT and IDST algorithms with different designs and templates, arriving at a unique final architecture. The final design computes a 4-point IDST together with a 4/8-point IDCT. In addition, different dimensions of the CGRA template are used in order to obtain different types of accelerators. Several CGRAs are combined in a successive arrangement with Reduced Instruction Set Computer (RISC) cores over a Network-on-Chip (NoC). The aim is to study the performance of the accelerator for the IDCT and the IDST, evaluated through the data movement over the NoC along with a comparison of accelerator performance in clock cycles, in order to calculate the efficiency of the system. The results show that the 4-point IDST and IDCT can be computed in 56 clock cycles, and the 8-point IDCT in 64 cycles. Another important factor considered in the study is power and energy consumption. The dynamic power dissipated in routing the data reached 4.03 mW, while the energy consumption was 1.76 ΌJ for the 4-point system (IDCT and IDST) and 3.06 ΌJ for the 8-point system (IDCT). Processing Elements (PEs) implementing the transform algorithms were operated at 200 MHz. Finally, these results show that 1080p video at 30 frames per second can be processed using an FPGA.
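    To make concrete why these transforms map well to small accelerators, the 4-point inverse DST that HEVC applies to 4x4 intra luma residuals reduces to a small integer matrix pass. Here is a minimal C sketch of one 1-D pass using the DST-VII basis from the standard (the function name is ours; a full 2-D inverse applies this pass to columns and then rows, with right-shifts of 7 and then 12 for 8-bit video, plus clipping that is omitted here):

```c
/* HEVC 4-point DST-VII basis (forward matrix rows, from the standard). */
static const int g_dst4[4][4] = {
    { 29,  55,  74,  84 },
    { 74,  74,   0, -74 },
    { 84, -29, -74,  55 },
    { 55, -84,  74, -29 },
};

/* One 1-D inverse pass: dst[n] = (sum_k src[k] * g_dst4[k][n] + rnd) >> shift.
 * Multiplying by the transposed matrix inverts the scaled orthogonal
 * forward transform. */
static void idst4_1d(const int src[4], int dst[4], int shift)
{
    const int rnd = 1 << (shift - 1);
    for (int n = 0; n < 4; n++) {
        int acc = 0;
        for (int k = 0; k < 4; k++)
            acc += src[k] * g_dst4[k][n];
        dst[n] = (acc + rnd) >> shift;
    }
}
```

    The fixed multiply-accumulate structure, with no data-dependent control flow, is what makes such kernels natural candidates for CGRA processing elements.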

    Quality of Experience in Immersive Video Technologies

    Over the last decades, several technological revolutions have impacted the television industry, such as the shifts from black & white to color and from standard to high definition. Nevertheless, considerable further improvements can still be achieved to provide a better multimedia experience, for example with ultra-high definition, high dynamic range & wide color gamut, or 3D. These so-called immersive technologies aim at providing better, more realistic, and emotionally stronger experiences. To measure quality of experience (QoE), subjective evaluation is the ultimate means, since it relies on a pool of human subjects. However, reliable and meaningful results can only be obtained if experiments are properly designed and conducted following a strict methodology. In this thesis, we build a rigorous framework for the subjective evaluation of new types of image and video content. We propose different procedures and analysis tools for measuring QoE in immersive technologies. As immersive technologies capture more information than conventional technologies, they have the ability to provide more details, enhanced depth perception, as well as better color, contrast, and brightness. To measure the impact of immersive technologies on the viewers' QoE, we apply the proposed framework for designing experiments and analyzing the collected subjects' ratings. We also analyze eye movements to study human visual attention during immersive content playback. Since immersive content carries more information than conventional content, efficient compression algorithms are needed for storage and transmission over existing infrastructures. To determine the bandwidth required for high-quality transmission of immersive content, we use the proposed framework to conduct meticulous evaluations of recent image and video codecs in the context of immersive technologies. Subjective evaluation is time-consuming, expensive, and not always feasible. Consequently, researchers have developed objective metrics to automatically predict quality. To measure the performance of objective metrics in assessing immersive content quality, we perform several in-depth benchmarks of state-of-the-art and commonly used objective metrics. For this aim, we use ground-truth quality scores collected under our subjective evaluation framework. To improve QoE, we propose different systems, in particular for stereoscopic and autostereoscopic 3D displays. The proposed systems can help reduce the artifacts generated at the visualization stage, which impact picture quality, depth quality, and visual comfort. To demonstrate the effectiveness of these systems, we use the proposed framework to measure viewers' preference between these systems and the standard 2D & 3D modes. In summary, this thesis tackles the problems of measuring, predicting, and improving QoE in immersive technologies. To address these problems, we build a rigorous framework and apply it through several in-depth investigations. We place essential concepts of multimedia QoE under this framework. These concepts are not only of a fundamental nature but have also shown their impact in very practical applications. In particular, the JPEG, MPEG, and VCEG standardization bodies have adopted these concepts to select technologies proposed for standardization and to validate the resulting standards in terms of compression efficiency.
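    The basic statistics behind such subjective experiments are worth making concrete: each stimulus is rated by a pool of subjects, and the ratings are condensed into a mean opinion score (MOS) with a confidence interval. A minimal C sketch in the spirit of ITU-R BT.500-style analysis follows (function names are ours; real analyses add subject screening, outlier rejection, and typically a Student-t rather than a normal critical value for small pools):

```c
#include <math.h>
#include <stddef.h>

/* Mean Opinion Score: average of all subjects' ratings for one stimulus. */
static double mos(const double *ratings, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += ratings[i];
    return sum / (double)n;
}

/* Half-width of the 95% confidence interval around the MOS,
 * using the normal approximation (requires n >= 2). */
static double ci95(const double *ratings, size_t n)
{
    if (n < 2)
        return 0.0;
    double m = mos(ratings, n), var = 0.0;
    for (size_t i = 0; i < n; i++)
        var += (ratings[i] - m) * (ratings[i] - m);
    var /= (double)(n - 1);              /* sample variance */
    return 1.96 * sqrt(var / (double)n); /* normal critical value */
}
```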

    Bitstream-based video quality modeling and analysis of HTTP-based adaptive streaming

    The proliferation of affordable capture technology and increased internet bandwidth allow high-quality videos (resolutions > 1080p, framerates ≄ 60fps) to be streamed online. HTTP-based adaptive streaming is the preferred method for streaming videos, adjusting video quality to the available bandwidth. Although adaptive streaming reduces the occurrences of video playout being stopped (called "stalling") due to narrow network bandwidth, the automatic adaptation has an impact on the quality perceived by the user, which results in the need to systematically assess the perceived quality. Such an evaluation is usually done on a short-term basis (a few seconds) and on an overall-session basis (up to several minutes). In this thesis, both of these aspects are assessed using subjective and instrumental methods. The subjective assessment of short-term video quality consists of a series of lab-based video quality tests that have resulted in publicly available datasets. The overall integral quality was subjectively assessed in lab tests with human viewers mimicking a real-life viewing scenario. In addition to the lab tests, an out-of-the-lab test method was investigated for both short-term video quality and overall session quality assessment, to explore alternative approaches to subjective quality assessment. The instrumental method of quality evaluation was addressed in terms of bitstream-based and hybrid pixel-based video quality models developed as part of this thesis. For this, a family of models, namely AVQBits, was conceived using the results of the lab tests as ground truth. Based on the available input information, four different instances of AVQBits are presented: a Mode 3, a Mode 1, a Mode 0, and a Hybrid Mode 0 model. The model instances have been evaluated, and they perform better than or on par with other state-of-the-art models. These models have further been applied to 360° and gaming videos, HFR content, and images. Also, a long-term integration (1-5 min) model based on the ITU-T P.1203.3 model is presented. In this work, the different instances of AVQBits, with their per-1-second quality scores, are employed as the video quality component of the proposed long-term integration model. All AVQBits variants, as well as the long-term integration module and the subjective test data, have been made publicly available for further research.
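    To make the long-term integration idea concrete, here is an illustrative C sketch that pools per-1-second quality scores into a session score. This is not the ITU-T P.1203.3 algorithm (which also models stalling and recency effects); it simply blends the session mean with the mean of the worst decile, reflecting the common finding that poor segments dominate remembered quality. All names and weights are ours:

```c
#include <stddef.h>
#include <stdlib.h>

/* Ascending comparator for qsort over doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Pool per-1-second scores into one session score (sorts scores in place).
 * Illustrative only; the 0.5/0.5 weights are arbitrary. */
static double session_score(double *per_sec, size_t n)
{
    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += per_sec[i];
    mean /= (double)n;

    /* Mean of the worst 10% of seconds (at least one second). */
    qsort(per_sec, n, sizeof *per_sec, cmp_double);
    size_t k = (n / 10 > 0) ? n / 10 : 1;
    double worst = 0.0;
    for (size_t i = 0; i < k; i++)
        worst += per_sec[i];
    worst /= (double)k;

    return 0.5 * mean + 0.5 * worst;
}
```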