41 research outputs found
Estimation of Perceptual Redundancies of HEVC Encoded Dynamic Textures
Statistical redundancies have been the dominant target in image/video compression standards. Perceptually, there exist further redundancies that can be removed to enhance compression efficiency. In this paper, we considered short-term homogeneous patches that fall into the foveal vision as dynamic textures, for which a psychophysical test was used to estimate their amount of perceptual redundancy. We demonstrated the possible rate saving from exploiting these redundancies. We further designed a learning model that can precisely predict the amount of redundancy, and accordingly proposed a generalized perceptual optimization framework
A content-aware quantisation mechanism for transform domain distributed video coding
The discrete cosine transform (DCT) is widely applied in modern codecs to remove spatial redundancies, with the resulting DCT coefficients being quantised to achieve compression as well as bit-rate control. In distributed video coding (DVC) architectures like DISCOVER, DCT coefficient quantisation is traditionally performed using predetermined quantisation matrices (QM), which means the compression is heavily dependent on the sequence being coded. This makes bit-rate control challenging, with the situation exacerbated in the coding of high resolution sequences due to QM scarcity and the non-uniform bit-rate gaps between them. This paper introduces a novel content-aware quantisation (CAQ) mechanism to overcome the limitations of existing quantisation methods in transform domain DVC. CAQ creates a frame-specific QM to reduce quantisation errors by analysing the distribution of DCT coefficients. In contrast to the predetermined QM that is applicable to only 4x4 block sizes, CAQ produces QM for larger block sizes to enhance compression at higher resolutions. This provides superior bit-rate control and better output quality by seeking to fully exploit the available bandwidth, which is especially beneficial in bandwidth constrained scenarios. In addition, CAQ generates superior perceptual results by innovatively applying different weightings to the DCT coefficients to reflect the human visual system. Experimental results corroborate that CAQ both quantitatively and qualitatively provides enhanced output quality in bandwidth limited scenarios, by consistently utilising over 90% of available bandwidth
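As a rough illustration of the matrix-based DCT quantisation this abstract starts from, the sketch below transforms a 4x4 block and divides each coefficient by an entry of a quantisation matrix. The step-size matrix `qm` and the sample block are hypothetical values chosen for illustration; they are not the DISCOVER matrices and this is not the CAQ algorithm itself.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantise(coeffs, qm):
    """Divide each coefficient by its step size and round to an integer level."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, qm)]

def dequantise(levels, qm):
    """Reconstruct approximate coefficients from the integer levels."""
    return [[l * q for l, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, qm)]

# Hypothetical step sizes: coarser steps for higher-frequency coefficients.
qm = [[8, 16, 16, 32],
      [16, 16, 32, 32],
      [16, 32, 32, 64],
      [32, 32, 64, 64]]

# Hypothetical 4x4 block with a horizontal gradient.
block = [[10, 20, 30, 40] for _ in range(4)]
levels = quantise(dct2(block), qm)
approx = dequantise(levels, qm)
```

A fixed `qm` like this one is what makes the resulting bit-rate depend on the content; a content-aware scheme would instead derive the matrix from the coefficient statistics of each frame.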
JOINT CODING OF MULTIMODAL BIOMEDICAL IMAGES USING CONVOLUTIONAL NEURAL NETWORKS
The massive volume of data generated daily by the gathering of medical images with
different modalities might be difficult to store in medical facilities and share through
communication networks. To alleviate this issue, efficient compression methods
must be implemented to reduce the amount of storage and transmission resources
required in such applications. However, since the preservation of all image details
is highly important in the medical context, the use of lossless image compression
algorithms is of utmost importance.
This thesis presents the research results on a lossless compression scheme designed
to encode both computerized tomography (CT) and positron emission tomography
(PET). Different techniques, such as image-to-image translation, intra prediction,
and inter prediction are used. Redundancies between both image modalities are
also investigated. To perform the image-to-image translation approach, we resort to
lossless compression of the original CT data and apply a cross-modality image translation
generative adversarial network to obtain an estimation of the corresponding
PET.
Two approaches were implemented and evaluated to determine a PET residue
that will be compressed along with the original CT. In the first method, the
residue resulting from the differences between the original PET and its estimation
is encoded, whereas in the second method, the residue is obtained using the encoder's
inter-prediction coding tools. Thus, instead of compressing two independent
picture modalities, i.e., both images of the original PET-CT pair, the proposed method
independently encodes only the CT, alongside the PET residue.
Along with the proposed pipeline, a post-processing optimization algorithm that
modifies the estimated PET image by altering the contrast and rescaling the image
is implemented to maximize the compression efficiency.
Four different versions (subsets) of a publicly available PET-CT pair dataset
were tested. The first proposed subset was used to demonstrate that the concept
developed in this work is capable of surpassing the traditional compression schemes.
The obtained results showed gains of up to 8.9% using HEVC. On the other
hand, JPEG2k proved less suitable, as it failed to obtain good results,
reaching a compression gain of only -9.1%.
For the remaining (more challenging) subsets, the results reveal that the proposed refined post-processing scheme attains,
when compared to conventional compression methods, up to 6.33% compression gain
using HEVC, and 7.78% using VVC
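The first residue method described in this abstract can be sketched as follows: the encoder stores the difference between the original PET and its estimate, and the decoder adds that difference back to the same estimate, so the round trip stays lossless. This is a minimal sketch under stated assumptions: `zlib` stands in for the lossless codec (the thesis uses HEVC, VVC and JPEG2k), the estimate would come from the translation network but is a hypothetical list here, and the function names are illustrative.

```python
import zlib
from array import array

def encode_residue(original, estimate):
    """Losslessly compress the sample-wise difference original - estimate."""
    # Signed 16-bit storage, because the residue can be negative.
    residue = array("h", (o - e for o, e in zip(original, estimate)))
    return zlib.compress(residue.tobytes())

def decode_with_estimate(payload, estimate):
    """Rebuild the original samples from the estimate and the stored residue."""
    residue = array("h")
    residue.frombytes(zlib.decompress(payload))
    return [e + r for e, r in zip(estimate, residue)]

# Hypothetical sample values; a real pipeline would operate on image planes.
pet_original = [10, 12, 200, 201, 50, 49]
pet_estimate = [9, 12, 198, 205, 50, 47]

payload = encode_residue(pet_original, pet_estimate)
pet_decoded = decode_with_estimate(payload, pet_estimate)
```

The decoder must be able to regenerate exactly the same estimate from the decoded CT; any mismatch in the estimate would break the lossless property.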
Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark
HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for
over-the-top (OTT) video streaming services, due to its ability to deliver a
seamless streaming experience. A key component of HAS is the bitrate ladder,
which provides the encoding parameters (e.g., bitrate-resolution pairs) to
encode the source video. The representations in the bitrate ladder allow the
client's player to dynamically adjust the quality of the video stream based on
network conditions by selecting the most appropriate representation from the
bitrate ladder. The most straightforward and lowest complexity approach
involves using a fixed bitrate ladder for all videos, consisting of
pre-determined bitrate-resolution pairs known as one-size-fits-all. Conversely,
the most reliable technique relies on intensively encoding all resolutions over
a wide range of bitrates to build the convex hull, thereby optimizing the
bitrate ladder for each specific video. Several techniques have been proposed
to predict content-based ladders without performing a costly exhaustive search
encoding. This paper provides a comprehensive review of various methods,
including both conventional and learning-based approaches. Furthermore, we
conduct a benchmark study focusing exclusively on various learning-based
approaches for predicting content-optimized bitrate ladders across multiple
codec settings. The considered methods are evaluated on our proposed
large-scale dataset, which includes 300 UHD video shots encoded with software
and hardware encoders using three state-of-the-art encoders, including
AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis
provides baseline methods and insights, which will be valuable for future
research in the field of bitrate ladder prediction. The source code of the
proposed benchmark and the dataset will be made publicly available upon
acceptance of the paper
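The exhaustive convex-hull construction that this abstract contrasts with prediction methods can be sketched as follows: encode every resolution at several bitrates, discard encodes that are dominated by a cheaper or equal-rate encode of higher quality, then keep the upper convex hull of what remains. The bitrates, quality scores and resolution labels below are hypothetical, and the hull is computed on a linear bitrate axis for simplicity; it is not the benchmark's implementation.

```python
def pareto_front(encodes):
    """Keep encodes whose quality exceeds every encode at lower or equal bitrate."""
    front = []
    for enc in sorted(encodes, key=lambda e: (e[0], -e[1])):
        if not front or enc[1] > front[-1][1]:
            front.append(enc)
    return front

def upper_hull(front):
    """Upper convex hull of (bitrate, quality) points sorted by bitrate."""
    hull = []
    for enc in front:
        while len(hull) >= 2:
            x1, y1 = hull[-2][0], hull[-2][1]
            x2, y2 = hull[-1][0], hull[-1][1]
            # A non-negative cross product means the middle point lies on or
            # below the chord between its neighbours, so it is dropped.
            cross = (x2 - x1) * (enc[1] - y1) - (y2 - y1) * (enc[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(enc)
    return hull

# Hypothetical (bitrate in kbps, quality score, resolution) measurements.
encodes = [
    (500, 60.0, "540p"), (1000, 72.0, "540p"), (2000, 78.0, "540p"),
    (1500, 74.0, "720p"),
    (1000, 68.0, "1080p"), (2000, 84.0, "1080p"), (4000, 92.0, "1080p"),
    (4000, 90.0, "2160p"), (8000, 96.0, "2160p"),
]

ladder = upper_hull(pareto_front(encodes))
```

The cost of this approach lies in producing `encodes` in the first place, which is the exhaustive search that content-based prediction methods aim to avoid.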
Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression
Recently, the performance of neural image compression (NIC) has steadily
improved thanks to recent lines of study, reaching or outperforming
state-of-the-art conventional codecs. Despite significant progress, current NIC
methods still rely on ConvNet-based entropy coding, limited in modeling
long-range dependencies due to their local connectivity and the increasing
number of architectural biases and priors, resulting in complex underperforming
models with high decoding latency. Motivated by the efficiency investigation of
the Transformer-based transform coding framework, namely SwinT-ChARM, we propose
to enhance the latter, first, with a more straightforward yet effective
Transformer-based channel-wise auto-regressive prior model, resulting in an
absolute image compression transformer (ICT). Through the proposed ICT, we can
capture both global and local contexts from the latent representations and
better parameterize the distribution of the quantized latents. Further, we
leverage a learnable scaling module with a sandwich ConvNeXt-based
pre-/post-processor to accurately extract more compact latent codes while
reconstructing higher-quality images. Extensive experimental results on
benchmark datasets showed that the proposed framework significantly improves
the trade-off between coding efficiency and decoder complexity over the
versatile video coding (VVC) reference encoder (VTM-18.0) and the neural codec
SwinT-ChARM. Moreover, we provide model scaling studies to verify the
computational efficiency of our approach and conduct several objective and
subjective analyses to bring to the fore the performance gap between the
adaptive image compression transformer (AICT) and the neural codec SwinT-ChARM
Low complexity in-loop perceptual video coding
The tradition of broadcast video is today complemented with user generated content, as portable devices support video coding. Similarly, computing is becoming ubiquitous, where the Internet of Things (IoT) incorporates heterogeneous networks to communicate with personal and/or infrastructure devices. In both cases, the emphasis is on bandwidth and processor efficiency, which means increasing the signalling options in video encoding. Consequently, the assessment of pixel differences applies a uniform cost to be processor efficient; in contrast, the Human Visual System (HVS) has non-uniform sensitivity based upon lighting, edges and textures. Existing perceptual assessments are natively incompatible and processor demanding, making perceptual video coding (PVC) unsuitable for these environments. This research allows existing perceptual assessment at the native level using low complexity techniques, before producing new pixel-based image quality assessments (IQAs). To manage these IQAs a framework was developed and implemented in the high efficiency video coding (HEVC) encoder. This resulted in bit-redistribution, where more bits and smaller partitioning were allocated to perceptually significant regions. Using a HEVC optimised processor the timing increase was < +4% and < +6% for video streaming and recording applications respectively, 1/3 of an existing low complexity PVC solution. Future work should be directed towards perceptual quantisation which offers the potential for perceptual coding gain
Learned-based Intra Coding Tools for Video Compression.
The increase in demand for video rendering in 4K and beyond displays, as well
as immersive video formats, requires the use of efficient compression techniques. In
this thesis novel methods for enhancing the efficiency of current and next generation
video codecs are investigated. Several aspects that influence the way conventional video
coding methods work are considered. The methods proposed in this thesis utilise Neural
Networks (NNs) trained for regression tasks in order to predict data. In particular,
Convolutional Neural Networks (CNNs) are used to predict Rate-Distortion (RD) data
for intra-coded frames. Moreover, novel intra-prediction methods are proposed with
the aim of providing new ways to exploit redundancies overlooked by traditional intra-prediction
tools. Additionally, it is shown how such methods can be simplified in order
to derive less resource-demanding tools
Verilog implementation of the VESA DSC compression algorithm
This work consists of implementing the VESA DSC v1.1 compression standard in Verilog. The project is in the testing and optimisation phase, in order to meet timing constraints. It is expected to be completed in early June. Once this is done, a comparison will be made between an approach using high-level synthesis tools and the "manual" approach (RTL