Search CORE

99 research outputs found

Learned Video Compression via Heterogeneous Deformable Compensation Network

Author: Chen Chang Wen
Chen Zhenzhong
Wang Huairui
Publication venue
Publication date: 22/07/2022
Field of study

Learned video compression has recently emerged as an essential research topic in developing advanced video compression technologies, where motion compensation is considered one of the most challenging issues. In this paper, we propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance caused by single-size deformable kernels in downsampled feature domain. More specifically, instead of utilizing optical flow warping or single-size-kernel deformable alignment, the proposed algorithm extracts features from the two adjacent frames to estimate content-adaptive heterogeneous deformable (HetDeform) kernel offsets. Then we transform the reference features with the HetDeform convolution to accomplish motion compensation. Moreover, we design a Spatial-Neighborhood-Conditioned Divisive Normalization (SNCDN) to achieve more effective data Gaussianization combined with the Generalized Divisive Normalization. Furthermore, we propose a multi-frame enhanced reconstruction module for exploiting context and temporal information for final quality enhancement. Experimental results indicate that HDCVC achieves superior performance than the recent state-of-the-art learned video compression approaches

arXiv.org e-Print Archive

Performance analysis of Discrete Cosine Transform in Multibeamforming

Author: Gias Ziad 1983-
Publication venue: 'University of Saskatchewan Library'
Publication date: 11/02/2020
Field of study

Aperture arrays are widely used in beamforming applications where element signals are steered to a particular direction of interest and a single beam is formed. Multibeamforming is an extension of single beamforming, which is desired in the fields where sources located in multiple directions are of interest. Discrete Fourier Transform (DFT) is usually used in these scenarios to segregate the received signals based on their direction of arrivals. In case of broadband signals, DFT of the data at each sensor of an array decomposes the signal into multiple narrowband signals. However, if hardware cost and implementation complexity are of concern while maintaining the desired performance, Discrete Cosine Transform (DCT) outperforms DFT. In this work, instead of DFT, the Discrete Cosine Transform (DCT) is used to decompose the received signal into multiple beams into multiple directions. DCT offers simple and efficient hardware implementation. Also, while low frequency signals are of interest, DCT can process correlated data and perform close to the ideal Karhunen-Loeve Transform (KLT). To further improve the accuracy and reduce the implementation cost, an efficient technique using Algebraic Integer Quantization (AIQ) of the DCT is presented. Both 8-point and 16-point versions of DCT using AIQ mapping have been presented and their performance is analyzed in terms of accuracy and hardware complexity. It has been shown that the proposed AIQ DCT offers considerable savings in hardware compared to DFT and classical DCT while maintaining the same accuracy of beam steering in multibeamforming application

University of Saskatchewan Research Archive

Neural Video Compression using GANs for Detail Synthesis and Propagation

Author: Agustsson Eirikur
Ballé Johannes
Johnston Nick
Mentzer Fabian
Minnen David
Toderici George
Publication venue
Publication date: 12/07/2022
Field of study

We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective: we i) synthesize detail by conditioning the generator on a latent extracted from the warped previous reconstruction to then ii) propagate this detail with high-quality flow. We find that user studies are required to compare methods, i.e., none of our quantitative metrics were able to predict all studies. We present the network design choices in detail, and ablate them with user studies.Comment: First two authors contributed equally. ECCV Camera ready versio

arXiv.org e-Print Archive

Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing

Author: Paim Guilherme Pereira
Publication venue
Publication date: 01/01/2021
Field of study

Since the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications – such as video processing – exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative to improve the performance and energy-efficiency requirements for many applications by trading its intrinsic error tolerance with algorithm and circuit efficiency. Exact computing also imposes a worst-case timing to the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: How these timing errors propagate from the underlying hardware all the way up to the entire algorithm behavior, where they just may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of performing investigations of both AxC (i.e., from approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., from voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate-level by crossing them dynamically, linking the hardware result with the algorithm-level, and vice versa during the evolution of the application’s runtime. Existing frameworks perform investigations of AxC and TS techniques at circuit-level (i.e., at the output of the accelerator) agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less optimization. Unlike state of the art, the framework proposed offers a holistic approach to assessing the tradeoff of AxC and TS techniques at the application-level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level to fulfill the required good enough quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating into an HEVC encoder as a case study. Application-level results showed that the SAD based on the approximate adders achieve savings of up to 45% of energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with around 6% of increase in BD-BR. The framework also reveals that the boost of about 6.96% (at 50°) to 17.41% (at 75° with 10- Y aging) in the maximum clock frequency achieved with TS hardware design is totally lost by the processing overhead from 8.06% to 46.96% when choosing an unreliable algorithm to the blocking match algorithm (BMA). We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also shows approximate DTT (Discrete Tchebichef Transform) hardware proposals by exploring a transform matrix approximation, truncation and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency up to 64%, minimizes the circuit area in up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped for FPGA shows an increase of up to 58.9% on the maximum frequency and savings of about 28.7% and 32.2% on slices and dynamic power, respectively compared with stat

Lume 5.8

Lossy Image Compression with Conditional Diffusion Models

Author: Mandt Stephan
Yang Ruihan
Publication venue
Publication date: 30/09/2022
Field of study

Denoising diffusion models have recently marked a milestone in high-quality image generation. One may thus wonder if they are suitable for neural image compression. This paper outlines an end-to-end optimized image compression framework based on a conditional diffusion model, drawing on the transform-coding paradigm. Besides the latent variables inherent to the diffusion process, this paper introduces an additional discrete "content" latent variable to condition the denoising process on. This variable is equipped with a hierarchical prior for entropy coding. The remaining "texture" latent variables characterizing the diffusion process are synthesized (either stochastically or deterministically) at decoding time. We furthermore show that the performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving five datasets and 16 image perceptual quality assessment metrics show that our approach not only compares favorably in terms of rate and perceptual distortion tradeoffs but also shows robust performance under all metrics while other baselines show less consistent behavior.Comment: Accepted at the ECCV 2022 Workshop on Uncertainty Quantification for Computer Visio

arXiv.org e-Print Archive

High-Level Synthesis Implementation of HEVC Intra Encoder

Author: Viitamäki Vili
Publication venue
Publication date: 07/11/2018
Field of study

High Efficiency Video Coding (HEVC) is the latest video coding standard that aims to alleviate the increasing transmission and storage needs of modern video applications. Compared with its predecessor, HEVC is able to halve the bit rate required for high quality video, but at the cost of increased complexity. High complexity makes HEVC video encoding slow and resource intensive but also ideal for hardware acceleration. With increasingly more complex designs, the effort required for traditional hardware development at register-transfer level (RTL) grows substantially. High-Level Synthesis (HLS) aims to solve this by raising the abstraction level through automatic tools that generate RTL-level code from general programming languages like C or C++. In this Thesis, we made use of Catapult-C HLS tool to create an intra coding accelerator for an HEVC encoder on a Field Programmable Gate Array (FPGA). We used the C source code of Kvazaar open-source HEVC encoder as a reference model for accelerator implementation. Over 90 % of the implementation including all major intra coding tools were implemented with HLS, with the rest being ready made IP blocks and hand-written RTL components. The accelerator was synthesized into an Arria 10 FPGA chip that was able to accommodate three accelerators and associated interface components. With two FPGAs connected to a high-end PC, our encoder was able to encode 2160p Ultra-High definition (UHD) video at 123 fps. Total FPGA resource usage was around 80 % with 346k Adaptive logic modules (ALMs) and 1227 Digital signal processors (DSPs)

Trepo - Institutional Repository of Tampere University

TUT DPub

深層学習に基づく画像圧縮と品質評価

Author: Cheng Zhengxue
Publication venue
Publication date: 01/01/2019
Field of study

早大学位記番号:新8427早稲田大

Waseda University Repository

An Energy-efficient Live Video Coding and Communication over Unreliable Channels

Author: Belyaev Evgeny
Publication venue: Tampere University of Technology
Publication date: 01/01/2015
Field of study

In the ﬁeld of multimedia communications there exist many important applications where live or real-time video data is captured by a camera, compressed and transmitted over the channel which can be very unreliable and, at the same time, computational resources or battery capacity of the transmission device are very limited. For example, such scenario holds for video transmission for space missions, vehicle-to-infrastructure video delivery, multimedia wireless sensor networks, wireless endoscopy, video coding on mobile phones, high deﬁnition wireless video surveillance and so on. Taking into account such restrictions, a development of eﬃcient video coding techniques for these applications is a challenging problem. The most popular video compression standards, such as H.264/AVC, are based on the hybrid video coding concept, which is very eﬃcient when video encoding is performed oﬀ-line or non real-time and the pre-encoded video is played back. However, the high computational complexity of the encoding and the high sensitivity of the hybrid video bit stream to losses in the communication channel constitute a signiﬁcant barrier of using these standards for the applications mentioned above. In this thesis, as an alternative to the standards, a video coding based on three-dimensional discrete wavelet transform (3-D DWT) is considered as a candidate to provide a good trade-oﬀ between encoding eﬃciency, computational complexity and robustness to channel losses. Eﬃcient tools are proposed to reduce the computational complexity of the 3-D DWT codec. These tools cover all levels of the codec’s development such as adaptive binary arithmetic coding, bit-plane entropy coding, wavelet transform, packet loss protection based on error-correction codes and bit rate control. These tools can be implemented as end-to-end solution and directly used in real-life scenarios. The thesis provides theoretical, simulation and real-world results which show that the proposed 3-D DWT codec can be more preferable than the standards for live video coding and communication over highly unreliable channels and or in systems where the video encoding computational complexity or power consumption plays a critical role

Trepo - Institutional Repository of Tampere University