Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.
Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
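For readers unfamiliar with the "simple metric" this abstract contrasts with perceptual tools, here is a minimal sketch of PSNR over flat lists of sample values. The function name `psnr` and the default peak of 255 (8-bit samples) are assumptions made for illustration; nothing here is taken from the Daala code base.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    `peak` is the maximum possible sample value (255 assumed for 8-bit video).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)
```

Because PSNR depends only on the mean squared error, it scores all errors of the same magnitude equally, regardless of where they fall in the image or how visible they are.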
A content-aware quantisation mechanism for transform domain distributed video coding
The discrete cosine transform (DCT) is widely applied in modern codecs to remove spatial redundancies, with the resulting DCT coefficients being quantised to achieve compression as well as bit-rate control. In distributed video coding (DVC) architectures like DISCOVER, DCT coefficient quantisation is traditionally performed using predetermined quantisation matrices (QMs), which means the compression is heavily dependent on the sequence being coded. This makes bit-rate control challenging, with the situation exacerbated in the coding of high-resolution sequences due to QM scarcity and the non-uniform bit-rate gaps between them. This paper introduces a novel content-aware quantisation (CAQ) mechanism to overcome the limitations of existing quantisation methods in transform-domain DVC. CAQ creates a frame-specific QM to reduce quantisation errors by analysing the distribution of DCT coefficients. In contrast to the predetermined QMs, which are applicable only to 4x4 block sizes, CAQ produces QMs for larger block sizes to enhance compression at higher resolutions. This provides superior bit-rate control and better output quality by seeking to fully exploit the available bandwidth, which is especially beneficial in bandwidth-constrained scenarios. In addition, CAQ generates superior perceptual results by applying different weightings to the DCT coefficients to reflect the human visual system. Experimental results corroborate that CAQ both quantitatively and qualitatively provides enhanced output quality in bandwidth-limited scenarios, by consistently utilising over 90% of available bandwidth.
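The general idea of quantising DCT coefficients with a matrix can be sketched as follows. This is a generic illustration and not the CAQ mechanism or the DISCOVER quantiser: the matrix is treated as per-coefficient step sizes, the values in `EXAMPLE_QM` are hypothetical, and all function names are invented for this sketch.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * Y * C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def quantise(coeffs, qm):
    """Divide each coefficient by its step size and round to an integer level."""
    n = len(coeffs)
    return [[int(round(coeffs[i][j] / qm[i][j])) for j in range(n)]
            for i in range(n)]


def dequantise(levels, qm):
    """Scale integer levels back to approximate coefficient values."""
    n = len(levels)
    return [[levels[i][j] * qm[i][j] for j in range(n)] for i in range(n)]


# Hypothetical 4x4 step sizes: coarser steps for higher frequencies.
EXAMPLE_QM = [[4 + 4 * (i + j) for j in range(4)] for i in range(4)]

# Usage: transform, quantise, and reconstruct one 4x4 block.
block = [[10 * i + 3 * j + 50 for j in range(4)] for i in range(4)]
coeffs = dct2(block)
levels = quantise(coeffs, EXAMPLE_QM)
reconstructed = idct2(dequantise(levels, EXAMPLE_QM))
```

Larger step sizes produce smaller integer levels and therefore fewer bits, at the cost of a larger reconstruction error. A fixed matrix applies the same trade-off to every sequence, which is the dependence on content that the abstract describes.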
Sparse Decomposition of Audio Signals Using a Perceptual Measure of Distortion. Application to Lossy Audio Coding.
State-of-the-art audio codecs use time-frequency transforms derived from cosine bases, followed by a quantization stage. The quantization steps are set according to perceptual considerations. In the last decade, several studies applied adaptive sparse time-frequency transforms to audio coding, e.g. on unions of cosine bases using a Matching-Pursuit-derived algorithm. This was shown to significantly improve the coding efficiency. We propose another approach based on a variational algorithm, i.e. the optimization of a cost function taking into account both a perceptual distortion measure derived from a hearing model and a sparsity constraint, which favors coding efficiency. In this early version, we show that, using a coding scheme without perceptual control of quantization, our method outperforms a codec from the literature with the same quantization scheme. In future work, a more sophisticated quantization scheme would probably allow our method to challenge standard codecs, e.g. AAC.
The AV1 Constrained Directional Enhancement Filter (CDEF)
This paper presents the constrained directional enhancement filter designed
for the AV1 royalty-free video codec. The in-loop filter is based on a
non-linear low-pass filter and is designed for vectorization efficiency. It
takes into account the direction of edges and patterns being filtered. The
filter works by identifying the direction of each block and then adaptively
filtering with a high degree of control over the filter strength along the
direction and across it. The proposed enhancement filter is shown to improve
the quality of the Alliance for Open Media (AOM) AV1 and Thor video codecs in
particular in low complexity configurations.
Comment: 5 page
Perceptual impact of the loss function on deep-learning image coding performance
Nowadays, deep-learning image coding solutions have shown similar or better
compression efficiency than conventional solutions based on hand-crafted
transforms and spatial prediction techniques. These deep-learning codecs
require a large training set of images and a training methodology to obtain a
suitable model (set of parameters) for efficient compression. The training is
performed with an optimization algorithm which provides a way to minimize the
loss function. Therefore, the loss function plays a key role in the overall
performance and includes a differentiable quality metric that attempts to mimic
human perception. The main objective of this paper is to study the perceptual
impact of several image quality metrics that can be used in the loss function
of the training process, through a crowdsourcing subjective image quality
assessment study. From this study, it is possible to conclude that the choice
of the quality metric is critical for the perceptual performance of the
deep-learning codec and that can vary depending on the image content.
Comment: 5 pages, 4 figure
An efficient rate control algorithm for a wavelet video codec
Rate control plays an essential role in video coding and transmission to provide the best video quality at the receiver's end given the constraint of certain network conditions. In this paper, a rate control algorithm using the Quality Factor (QF) optimization method is proposed for the wavelet-based video codec and implemented on an open source Dirac video encoder. A mathematical model, which we call the Rate-QF (R-QF) model, is derived to generate the optimum QF for the current coding frame according to the target bitrate. The proposed algorithm is a complete one-pass process and does not require complex mathematical calculation. The process of calculating the QF is quite simple and further calculation is not required for each coded frame. The experimental results show that the proposed algorithm can control the bitrate precisely (within 1% of the target bitrate on average). Moreover, the variation of bitrate over each Group of Pictures (GOP) is lower than that of H.264. This is an advantage in preventing buffer overflow and underflow for real-time multimedia data streaming.