Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs
Recently, deep learning-based methods have been applied to image compression
and have achieved many promising results. In this paper, we propose an improved
hybrid layered image compression framework by combining deep learning and the
traditional image codecs. At the encoder, we first use a convolutional neural
network (CNN) to obtain a compact representation of the input image, which is
losslessly encoded by the FLIF codec as the base layer of the bit stream. A
coarse reconstruction of the input is obtained by another CNN from the
reconstructed compact representation. The residual between the input and the
coarse reconstruction is then obtained and encoded by the H.265/HEVC-based BPG
codec as the enhancement layer of the bit stream. Experimental results using
the Kodak and Tecnick datasets show that the proposed scheme outperforms the
state-of-the-art deep learning-based layered coding scheme and traditional
codecs including BPG in both PSNR and MS-SSIM metrics across a wide range of
bit rates, when the images are coded in the RGB444 domain.
Comment: Submitted to Signal Processing: Image Communication
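The layered idea above (a compact base layer plus a coded residual as the enhancement layer) can be sketched with plain arrays. In this hypothetical sketch, downsampling stands in for the CNN analysis network and scalar quantization of the residual stands in for the BPG codec; neither is the paper's actual method.

```python
import numpy as np

def encode_layered(img, factor=4, q_step=8):
    """Toy two-layer encoder. A downsampled copy stands in for the CNN's
    compact representation (the FLIF-coded base layer); the quantized residual
    against a coarse upsampled reconstruction stands in for the BPG-coded
    enhancement layer."""
    h, w = img.shape
    base = img[::factor, ::factor].copy()
    # Nearest-neighbor upsampling as a stand-in for the reconstruction CNN.
    coarse = np.repeat(np.repeat(base, factor, axis=0), factor, axis=1)[:h, :w]
    residual = img.astype(np.int32) - coarse.astype(np.int32)
    res_q = np.round(residual / q_step).astype(np.int32)
    return base, res_q

def decode_layered(base, res_q, shape, factor=4, q_step=8):
    """Rebuild the coarse image from the base layer, then add the
    dequantized residual from the enhancement layer."""
    h, w = shape
    coarse = np.repeat(np.repeat(base, factor, axis=0), factor, axis=1)[:h, :w]
    rec = coarse.astype(np.int32) + res_q * q_step
    return np.clip(rec, 0, 255).astype(np.uint8)
```

With `q_step=1` the enhancement layer carries the residual losslessly, so the reconstruction is exact; larger steps trade distortion for rate, mirroring the base/enhancement split in the abstract.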
MPAI-EEV: Standardization Efforts of Artificial Intelligence based End-to-End Video Coding
The rapid advancement of artificial intelligence (AI) technology has led to
the prioritization of standardizing the processing, coding, and transmission of
video using neural networks. To address this priority area, the Moving Picture,
Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a
suite of standards called MPAI-EEV for "end-to-end optimized neural video
coding." The aim of this AI-based video standard project is to compress the
number of bits required to represent high-fidelity video data by utilizing
data-trained neural coding technologies. This approach is not constrained by
how data coding has traditionally been applied in the context of a hybrid
framework. This paper presents an overview of recent and ongoing
standardization efforts in this area and highlights the key technologies and
design philosophy of EEV. It also provides a comparison and report on some
primary efforts such as the coding efficiency of the reference model.
Additionally, it discusses emerging activities such as learned
Unmanned-Aerial-Vehicles (UAVs) video coding which are currently planned, under
development, or in the exploration phase. With a focus on UAV video signals,
this paper addresses the current status of these preliminary efforts. It also
indicates development timelines, summarizes the main technical details, and
provides pointers to further points of reference. The exploration experiment
shows that the EEV model performs better than the state-of-the-art video coding
standard H.266/VVC in terms of perceptual evaluation metrics.
The Comprehensive Review of Neural Network: An Intelligent Medical Image Compression for Data Sharing
In the healthcare environment, digital images are the most commonly shared information. They have become a vital resource in healthcare services, facilitating decision-making and treatment procedures. Medical images require large volumes of storage, and the storage scale continues to grow with advances in medical imaging technology. To enhance the interaction and coordination between healthcare institutions, the efficient exchange of medical information is necessary; sharing medical images with zero loss of information must therefore be guaranteed. Image compression for this purpose should be as intelligent as possible: it must retain the diagnostically valuable information while minimizing unnecessary information. Artificial neural networks have been used to solve many issues in image processing and have proved superior to traditional methods in compression applications involving noisy or incomplete images, yielding high compression ratios and noise reduction. This paper reviews previous studies on intelligent medical image compression with neural network approaches for data sharing.
Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications
Recent deep learning methods have led to increased interest in solving
high-efficiency end-to-end transmission problems. These methods, which we call
nonlinear transform source-channel coding (NTSCC), extract the semantic latent
features of the source signal and learn an entropy model to guide variable-rate
joint source-channel coding for transmitting the latent features over
wireless channels. In this paper, we propose a comprehensive framework for
improving NTSCC that achieves higher system coding gain, better model
versatility, and a more flexible adaptation strategy aligned with semantic
guidance. This new, more sophisticated NTSCC model is ready to support
large-size data interaction in emerging XR, which catalyzes the application of
semantic communications. Specifically, we propose three improvements.
First, we introduce a contextual entropy model to better capture the spatial
correlations among the semantic latent features, from which more accurate rate
allocation and contextual joint source-channel coding are developed to enable
higher coding gain. On that basis, we further propose response network
architectures to formulate a versatile NTSCC, i.e., a once-trained model that
supports various rates and channel states, which benefits practical deployment.
Finally, we propose an online latent feature editing method to enable more
flexible coding-rate control aligned with specific semantic guidance. By
applying these three improvements together, a deployment-friendly semantic
coded transmission system emerges. Our improved NTSCC system has been
experimentally verified to achieve considerable bandwidth savings over the
state-of-the-art engineered VTM + 5G LDPC coded transmission system, with
lower processing latency.
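The entropy-guided variable-rate idea can be illustrated with a toy allocator: channel bandwidth is split among latent features in proportion to their estimated entropies. The proportional rule and the helper below are illustrative assumptions, not the paper's learned allocation mechanism.

```python
import numpy as np

def allocate_symbols(entropies, total_symbols):
    """Toy semantic rate allocation: give each latent feature a number of
    channel symbols proportional to its estimated entropy, loosely mirroring
    how an entropy model can guide variable-rate joint source-channel coding.
    Hypothetical helper; real NTSCC learns this end to end."""
    entropies = np.asarray(entropies, dtype=float)
    weights = entropies / entropies.sum()
    alloc = np.floor(weights * total_symbols).astype(int)
    # Hand leftover symbols (from flooring) to the highest-entropy features.
    leftover = total_symbols - alloc.sum()
    for i in np.argsort(-entropies)[:leftover]:
        alloc[i] += 1
    return alloc
```

A feature estimated to carry five times the entropy of another receives roughly five times the channel symbols, which is the intuition behind entropy-guided rate allocation.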
Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation
The application of the context-adaptive entropy model significantly improves
the rate-distortion (R-D) performance, in which hyperpriors and autoregressive
models are jointly utilized to effectively capture the spatial redundancy of
the latent representations. However, the latent representations still contain
some spatial correlations. In addition, these methods based on the
context-adaptive entropy model cannot be accelerated in the decoding process by
parallel computing devices, e.g., FPGAs or GPUs. To alleviate these limitations,
we propose a learned multi-resolution image compression framework, which
exploits the recently developed octave convolutions to factorize the latent
representations into the high-resolution (HR) and low-resolution (LR) parts,
similar to a wavelet transform, which further improves the R-D performance. To
speed up decoding, our scheme does not use a context-adaptive entropy model.
Instead, we exploit an additional hyper layer, including a hyper encoder and a
hyper decoder, to further remove the spatial redundancy of the latent
representation.
Moreover, the cross-resolution parameter estimation (CRPE) is introduced into
the proposed framework to enhance the flow of information and further improve
the rate-distortion performance. An additional information-fidelity loss is
proposed to the total loss function to adjust the contribution of the LR part
to the final bit stream. Experimental results show that our method separately
reduces the decoding time by approximately 73.35 % and 93.44 % compared with
that of state-of-the-art learned image compression methods, and the R-D
performance is still better than H.266/VVC(4:2:0) and some learning-based
methods on both PSNR and MS-SSIM metrics across a wide range of bit rates.
Comment: Accepted by Signal Processing
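The HR/LR factorization can be sketched as a channel split with average pooling on the low-resolution branch. This is a loose, hypothetical analogue of the generalized octave convolution, not the trained transform, and it assumes even spatial dimensions.

```python
import numpy as np

def octave_split(x, alpha=0.5):
    """Toy octave-style factorization of a feature map of shape (C, H, W):
    a fraction alpha of the channels becomes the low-resolution (LR) part via
    2x2 average pooling, the rest stays high-resolution (HR), loosely
    analogous to a wavelet split. H and W must be even."""
    c = x.shape[0]
    c_lr = int(alpha * c)
    hr = x[c_lr:]                       # high-resolution branch, full H x W
    lr_full = x[:c_lr]
    c0, h, w = lr_full.shape
    # 2x2 average pooling for the low-resolution branch.
    lr = lr_full.reshape(c0, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return hr, lr
```

The LR branch holds half-resolution features, so its portion of the latent costs fewer bits, which is why the abstract weights its contribution to the bit stream with an extra loss term.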
Adaptive Semantic Communications: Overfitting the Source and Channel for Profit
Most semantic communication systems leverage deep learning models to provide
end-to-end transmission performance surpassing the established source and
channel coding approaches. So far, research has mainly focused on
architecture and model improvements, but a model trained over a full
dataset and ergodic channel responses is unlikely to be optimal for every test
instance. Due to limitations on the model capacity and imperfect optimization
and generalization, such learned models will be suboptimal especially when the
testing data distribution or channel response is different from that in the
training phase, as is likely to be the case in practice. To tackle this, in
this paper, we propose a novel semantic communication paradigm by leveraging
the deep learning model's overfitting property. Our model can for instance be
updated after deployment, which can further lead to substantial gains in terms
of the transmission rate-distortion (RD) performance. This new system is named
adaptive semantic communication (ASC). In our ASC system, the ingredients of
wireless transmitted stream include both the semantic representations of source
data and the adapted decoder model parameters. Specifically, we take the
overfitting concept to the extreme, proposing a series of ingenious methods to
adapt the semantic codec or representations to an individual data or channel
state instance. The whole ASC system design is formulated as an optimization
problem whose goal is to minimize the loss function that is a tripartite
tradeoff among the data rate, model rate, and distortion terms. The experiments
(including user study) verify the effectiveness and efficiency of our ASC
system. Notably, the substantial gain of our overfitted coding paradigm can
catalyze the upgrading of semantic communication to a new era.
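The tripartite trade-off described above can be written down directly. The weights below are hypothetical placeholders for the tuned trade-off, and the update-decision helper is an illustrative reading of when transmitting adapted decoder parameters pays off; neither is the paper's exact formulation.

```python
def asc_objective(data_rate, model_rate, distortion,
                  lam_model=1.0, lam_dist=50.0):
    """Toy tripartite cost in the spirit of the ASC formulation: combine the
    rate of the semantic representation, the rate of the transmitted decoder
    parameter updates, and the distortion. lam_model and lam_dist are
    hypothetical trade-off weights."""
    return data_rate + lam_model * model_rate + lam_dist * distortion

def should_send_update(base_distortion, adapted_distortion, model_rate,
                       lam_dist=50.0):
    """Overfitting pays off only when the weighted distortion reduction from
    the adapted decoder exceeds the extra model-rate cost of sending it."""
    gain = lam_dist * (base_distortion - adapted_distortion)
    return gain > model_rate
```

This captures the core economics of the ASC stream: decoder-parameter updates are worth transmitting exactly when their distortion benefit outweighs their rate cost.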
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation
Deep learning-based image compression has made great progress recently.
However, many leading schemes use a serial context-adaptive entropy model to
improve the rate-distortion (R-D) performance, which is very slow. In addition,
the complexities of the encoding and decoding networks are quite high and not
suitable for many practical applications. In this paper, we introduce four
techniques to balance the trade-off between complexity and performance. We
are the first to introduce a deformable convolutional module into a compression
framework, which can remove more redundancies in the input image, thereby
enhancing compression performance. Second, we design a checkerboard context
model with two separate distribution parameter estimation networks and
different probability models, which enables parallel decoding without
sacrificing the performance compared to the sequential context-adaptive model.
Third, we develop an improved three-step knowledge distillation and training
scheme to achieve different trade-offs between the complexity and the
performance of the decoder network, which transfers both the final and
intermediate results of the teacher network to the student network to help its
training. Fourth, we introduce regularization to make the numerical
values of the latent representation more sparse. Then we only encode non-zero
channels in the encoding and decoding process, which can greatly reduce the
encoding and decoding time. Experiments show that compared to the
state-of-the-art learned image coding scheme, our method can be about 20 times
faster in encoding and 70-90 times faster in decoding, and our R-D performance
is also higher. Our method outperforms the traditional approach in
H.266/VVC-intra (4:4:4) and some leading learned schemes in terms of PSNR and
MS-SSIM metrics when tested on the Kodak and Tecnick-40 datasets.
Comment: Submitted to Trans. Journal
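The checkerboard idea in the second technique can be illustrated by partitioning the latent grid into anchor and non-anchor positions: anchors are decoded first from the hyperprior alone, then the remaining positions are decoded in a single parallel pass conditioned on their decoded neighbors. A minimal sketch (the parity convention for anchors is an assumption):

```python
import numpy as np

def checkerboard_masks(h, w):
    """Toy checkerboard partition of an h x w latent grid into two
    complementary boolean masks. Decoding can proceed in two parallel passes:
    all 'anchor' positions first, then all 'non-anchor' positions, instead of
    a slow raster-order serial context model."""
    idx = np.add.outer(np.arange(h), np.arange(w))   # idx[i, j] = i + j
    anchor = (idx % 2 == 0)                          # even-parity positions
    non_anchor = ~anchor
    return anchor, non_anchor
```

Because every non-anchor position is surrounded by anchors (and vice versa), each pass has no intra-pass dependencies, which is what makes the decoding parallelizable.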
Data-driven visual quality estimation using machine learning
Today a lot of visual content is produced and accessible, due to improvements in technology such as smartphones and the internet. This results in a need to assess the quality perceived by users in order to further improve the experience. However, only a few of the state-of-the-art quality models are specifically designed for higher resolutions, predict more than the mean opinion score, or use machine learning. One goal of the thesis is to train and evaluate such machine-learning models for higher resolutions with several datasets. At first, an objective evaluation of image quality at higher resolutions is performed. The images are compressed using video encoders, and it is shown that AV1 performs best in terms of quality and compression. This evaluation is followed by the analysis of a crowdsourcing test in comparison with a lab test investigating image quality. Afterward, deep learning-based models for image quality prediction and an extension for video quality are proposed. However, the deep learning-based video quality model is not practically usable because of performance constraints. For this reason, pixel-based video quality models using well-motivated features covering image and motion aspects are proposed and evaluated. These models can be used to predict mean opinion scores for videos, or even to predict other video quality-related information, such as a rating distribution.
The introduced model architecture can be applied to other video problems, such as video classification, gaming video quality prediction, gaming genre classification, or encoding parameter estimation. Furthermore, one important aspect is the processing time of such models. Hence, a generic approach to speed up state-of-the-art video quality models is introduced, which shows that a significant amount of processing time can be saved while achieving similar prediction accuracy. The models have been made publicly available as open source so that the developed frameworks can be used for further research. Moreover, the presented approaches may be usable as building blocks for newer media formats.
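The pixel-based image-and-motion feature idea can be sketched with two toy features: a gradient-based spatial-activity measure and a frame-difference temporal measure. These are illustrative stand-ins for the thesis's well-motivated feature set, not its actual features.

```python
import numpy as np

def simple_quality_features(frames):
    """Toy pixel-based features for a video of shape (T, H, W):
    - spatial: mean absolute gradient magnitude (image activity)
    - temporal: mean absolute frame difference (motion activity)
    Real pixel-based quality models use richer feature sets."""
    frames = np.asarray(frames, dtype=float)
    gy = np.abs(np.diff(frames, axis=1)).mean()   # vertical gradients
    gx = np.abs(np.diff(frames, axis=2)).mean()   # horizontal gradients
    spatial = (gx + gy) / 2.0
    temporal = np.abs(np.diff(frames, axis=0)).mean()
    return spatial, temporal
```

Features of this kind can be computed cheaply per frame and then fed to a regressor that predicts a mean opinion score or a rating distribution, which is the pipeline the abstract describes.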