Analysis of Neural Video Compression Networks for 360-Degree Video Coding
With increasing efforts to bring high-quality virtual reality
technologies to the market, efficient 360-degree video compression gains in
importance. As such, the state-of-the-art H.266/VVC video coding standard
integrates dedicated tools for 360-degree video, and considerable efforts have
been put into designing 360-degree projection formats with improved compression
efficiency. For the fast-evolving field of neural video compression networks
(NVCs), the effects of different 360-degree projection formats on the overall
compression performance have not yet been investigated. It is thus unclear
whether resampling from the conventional equirectangular projection (ERP) to
other projection formats yields similar gains for NVCs as for hybrid video
codecs, and which formats perform best. In this paper, we analyze several
generations of NVCs and an extensive set of 360-degree projection formats with
respect to their compression performance for 360-degree video. Based on our
analysis, we find that projection format resampling also yields significant
improvements in compression performance for NVCs. The adjusted cubemap
projection (ACP) and equatorial cylindrical projection (ECP) are shown to perform
best and achieve rate savings of more than 55% compared to ERP based on WS-PSNR
for the most recent NVC. Remarkably, the observed rate savings are higher than
for H.266/VVC, emphasizing the importance of projection format resampling for
NVCs.

Comment: 5 pages, 4 figures, 1 table, accepted for Picture Coding Symposium
2024 (PCS 2024)
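The WS-PSNR metric used for the rate savings above weights each ERP pixel by its solid angle on the sphere, so that the oversampled polar rows do not dominate the distortion. A minimal sketch using the standard ERP weighting (the implementation details are an illustration, not taken from the paper):

```python
import numpy as np

def ws_psnr(ref, dist, max_val=255.0):
    """Weighted-to-spherically-uniform PSNR for equirectangular (ERP) frames.

    Each row j is weighted by cos((j + 0.5 - H/2) * pi / H), the standard ERP
    weight, so distortion near the poles counts less than at the equator.
    """
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    h = ref.shape[0]
    row_weights = np.cos((np.arange(h) + 0.5 - h / 2.0) * np.pi / h)
    # Broadcast the per-row weight over columns (and channels, if present).
    shape = (h,) + (1,) * (ref.ndim - 1)
    w = np.broadcast_to(row_weights.reshape(shape), ref.shape)
    wmse = np.sum(w * (ref - dist) ** 2) / np.sum(w)
    return 10.0 * np.log10(max_val ** 2 / wmse)
```

Rate savings "based on WS-PSNR" then means the BD-rate is computed with this spherically weighted metric in place of plain PSNR.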
Conditional Residual Coding: A Remedy for Bottleneck Problems in Conditional Inter Frame Coding
Conditional coding is a new video coding paradigm enabled by
neural-network-based compression. It can be shown that conditional coding is in
theory better than traditional residual coding, which is widely used in
video compression standards like HEVC or VVC. However, on closer inspection, it
becomes clear that conditional coders can suffer from information bottlenecks
in the prediction path, i.e., that due to the data processing inequality not
all information from the prediction signal can be passed to the reconstructed
signal, thereby impairing the coder performance. In this paper we propose the
conditional residual coding concept, which we derive from information
theoretical properties of the conditional coder. This coder significantly
reduces the influence of bottlenecks, while maintaining the theoretical
performance of the conditional coder. We provide a theoretical analysis of the
coding paradigm and demonstrate the performance of the conditional residual
coder in a practical example. We show that conditional residual coders
alleviate the disadvantages of conditional coders while being able to maintain
their advantages over residual coders. In the spectrum of residual and
conditional coding, we can therefore consider them as "the best from both
worlds".

Comment: 12 pages, 8 figures
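The information-theoretic argument sketched in the abstract can be stated compactly. With X the current frame and X̃ the prediction, a residual coder is bounded by the entropy of the residual, while a conditional coder is bounded by the conditional entropy; conditional residual coding attains the conditional bound because subtracting X̃ is invertible when X̃ is known (notation chosen here for illustration; the paper's exact derivation may differ):

```latex
\underbrace{H(X - \tilde{X})}_{\text{residual coding}}
\;\ge\;
H(X - \tilde{X} \mid \tilde{X})
\;=\;
\underbrace{H(X \mid \tilde{X})}_{\text{conditional coding}}
```

Coding the residual conditioned on the prediction thus keeps the theoretical gain of conditional coding, while the signal actually passed through the network is the low-energy residual, which is less sensitive to bottlenecks.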
Boosting Neural Image Compression for Machines Using Latent Space Masking
Today, many image coding scenarios do not have a human as the final intended
user, but rather a machine that performs computer vision tasks on the decoded
image. Hence, the primary goal is not to preserve visual quality but to
maintain the task accuracy of the machine for a given bitrate. Since deep
neural networks set the benchmark results for such tasks, they are mostly
employed to solve the analysis tasks at the decoder side.
Moreover, neural networks have also found their way into the field of image
compression recently. These two developments allow for an end-to-end training
of the neural compression network for an analysis network as information sink.
Therefore, we first conduct such a training with a task-specific loss to
enhance the coding performance of neural compression networks. Compared to
standard VVC, this method saves 41.4% of the bitrate for Mask R-CNN as the
analysis network on the uncompressed Cityscapes dataset. As our main
contribution, we propose LSMnet, a network that runs in parallel to the encoder
network and masks out elements of the latent space that are presumably not
required for the analysis network. This approach saves an additional 27.3% of
the bitrate compared to the basic neural compression network optimized
with the task loss. In addition, we are the first to utilize a feature-based
distortion in the training loss within the context of machine-to-machine
communication, which allows for a training without annotated data. We provide
extensive analyses on the Cityscapes dataset including cross-evaluation with
different analysis networks and present exemplary visual results. Inference
code and pre-trained models are published at
https://github.com/FAU-LMS/NCN_for_M2M.

Comment: 12 pages, 9 figures, 3 tables; This work has been accepted for the
IEEE T-CSVT special issue "Learned Visual Data Compression for both Human and
Machine". Copyright may be transferred without notice, after which this
version may no longer be accessible.
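The latent-masking idea can be illustrated with a toy sketch: a relevance score per latent element, which LSMnet would predict with a small network running in parallel to the encoder, is thresholded into a binary mask, and masked-out elements are zeroed so they become nearly free to entropy-code. The function name, threshold, and tensor sizes below are hypothetical; this is not the published LSMnet architecture:

```python
import numpy as np

def mask_latent(y, relevance, threshold=0.5):
    """Zero out latent elements whose predicted task relevance is below
    `threshold`. Zeroed elements cost almost no bitrate after entropy coding,
    which is where the additional rate savings come from.

    `relevance` stands in for the output of a mask-prediction network; here
    it is simply passed in as an array.
    """
    mask = (relevance > threshold).astype(y.dtype)
    return y * mask, float(mask.mean())  # masked latent, fraction kept

# Toy usage: mask a random latent tensor with random relevance scores.
rng = np.random.default_rng(0)
y = rng.normal(size=(192, 16, 16))       # latent tensor (channels, H, W)
relevance = rng.random(y.shape)
y_masked, kept = mask_latent(y, relevance)
```

In an end-to-end setup the mask predictor would be trained jointly with the task loss, so that only elements the analysis network actually needs survive.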
On Benefits and Challenges of Conditional Interframe Video Coding in Light of Information Theory
The rise of variational autoencoders for image and video compression has
opened the door to many elaborate coding techniques. One example is the
possibility of conditional interframe coding: instead of transmitting the
residual between the original frame and the predicted frame (often obtained by
motion compensation), the current frame is transmitted under the condition of
knowing the prediction signal. In practice, conditional coding can be
straightforwardly implemented using a conditional autoencoder, which has also
shown good results in recent works. In this paper, we provide an information
theoretical analysis of conditional coding for inter frames and show in which
cases gains compared to traditional residual coding can be expected. We also
show the effect of information bottlenecks which can occur in practical video
coders in the prediction signal path due to the network structure, as a
consequence of the data-processing theorem or due to quantization. We
demonstrate that conditional coding has theoretical benefits over residual
coding but that there are cases in which the benefits are quickly canceled by
small information bottlenecks of the prediction signal.

Comment: 5 pages, 4 figures, accepted to be presented at PCS 2022. arXiv admin
note: text overlap with arXiv:2112.08011. Update Note: Fixed notation in Eq.
10, no changes otherwise.
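The bottleneck effect described above follows directly from the data-processing inequality: if the decoder conditions on a processed version Z = f(X̃) of the prediction rather than on X̃ itself, e.g. because of the network structure or quantization, the achievable rate bound can only get worse (notation chosen here for illustration):

```latex
Z = f(\tilde{X})
\quad\Longrightarrow\quad
H(X \mid Z) \;\ge\; H(X \mid \tilde{X})
```

Any information about X that f discards must therefore be paid for in rate, which is how even small bottlenecks can cancel the theoretical gain of conditional coding.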
The Violent Interstellar Medium of Nearby Dwarf Galaxies
High resolution HI observations of nearby dwarf galaxies (most of which are
situated in the M 81 group at a distance of about 3.2 Mpc) reveal that their
neutral interstellar medium (ISM) is dominated by hole-like features most of
which are expanding. A comparison of the physical properties of these holes
with the ones found in more massive spiral galaxies (such as M 31 and M 33)
shows that they tend to reach much larger sizes in dwarf galaxies. This can be
understood in terms of the galaxy's gravitational potential. The origin of
these features is still a matter of debate. In general, young star-forming
regions (OB-associations) are held responsible for their formation. This
picture, however, is not without its critics, and other mechanisms such as the
infall of high velocity clouds, turbulent motions or even gamma ray bursters
have been recently proposed. Here I will present one example of a supergiant
shell in IC 2574 which corroborates the picture that OB associations are indeed
creating these structures. This particular supergiant shell is currently the
most promising case to study the combined effects of stellar winds and
supernova explosions, which shape the neutral interstellar medium of
(dwarf) galaxies.

Comment: 8 pages, 4 figures, accepted for publication in PASA, in press.
Online version: http://www.atnf.csiro.au/pasa/16_1/walter/paper
Learning Frequency-Specific Quantization Scaling in VVC for Standard-Compliant Task-driven Image Coding
Today, visual data is often analyzed by a neural network without any human
being involved, which demands specialized codecs. For standard-compliant
codec adaptations towards certain information sinks, HEVC or VVC provide the
possibility of frequency-specific quantization with scaling lists. This is a
well-known method for the human visual system, where scaling lists are derived
from psycho-visual models. In this work, we employ scaling lists when
performing VVC intra coding for neural networks as information sink. To this
end, we propose a novel data-driven method to obtain optimal scaling lists for
arbitrary neural networks. Experiments with Mask R-CNN as information sink
reveal that coding the Cityscapes dataset with the proposed scaling lists
results in peak bitrate savings of 8.9% over VVC with constant quantization. By
that, our approach also outperforms scaling lists optimized for the human
visual system. The generated scaling lists can be found under
https://github.com/FAU-LMS/VCM_scaling_lists.

Comment: Originally submitted at IEEE ICIP 202
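Mechanically, a scaling list modulates the quantization step per transform-coefficient frequency. The toy sketch below uses the HEVC/VVC convention that a scaling-list entry of 16 is neutral; it illustrates the mechanism only, not the normative integer quantization arithmetic of VVC, and the example values are made up:

```python
import numpy as np

def quantize(coeffs, qstep, scaling_list):
    """Frequency-specific quantization: each coefficient's effective step is
    the base step scaled by its scaling-list entry (16 = neutral)."""
    return np.round(coeffs / (qstep * scaling_list / 16.0))

def dequantize(levels, qstep, scaling_list):
    """Inverse mapping back to the coefficient domain."""
    return levels * qstep * scaling_list / 16.0

# Coarser quantization for the high-frequency (bottom) rows of a 2x2 block.
coeffs = np.array([[64.0, 32.0], [16.0, 8.0]])
scaling_list = np.array([[16, 16], [32, 32]])
levels = quantize(coeffs, qstep=4.0, scaling_list=scaling_list)
```

A data-driven method like the one proposed would pick the list entries to minimize the analysis network's task loss per bitrate, instead of deriving them from a psycho-visual model.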
Mixed-Initiative Planning for Manned-Unmanned Teaming Missions
The proposed presentation describes an adaptive mixed-initiative agent which assists during mission
(re-) planning to enable efficient multi-vehicle mission management. Our application comprises
future military manned-unmanned teaming missions. The mixed-initiative agent is capable
of planning and scheduling tasks of manned and unmanned aircraft. Instead of replacing the
human’s role as mission manager, however, the agent acts as an additional team member and
supports the human with task proposals and flaw corrections. The agent thus provides support
on a level that was formerly exclusively owned by human operators. The type and extent of
support is adapted to the particular situation automatically. By reducing the pilot’s work
share in the planning process, pilot mental workload can be reduced significantly. However,
the probability of lapses in plan awareness increases.
Processing Energy Modeling for Neural Network Based Image Compression
Nowadays, the compression performance of neural-network-based image
compression algorithms outperforms state-of-the-art compression approaches such
as JPEG or HEIC-based image compression. Unfortunately, most neural-network-based
compression methods are executed on GPUs and consume a high amount of
energy during execution. Therefore, this paper performs an in-depth analysis of
the energy consumption of state-of-the-art neural-network-based compression
methods on a GPU and shows that the energy consumption of compression networks
can be estimated using the image size with mean estimation errors of less than
7%. Finally, using a correlation analysis, we find that the number of
operations per pixel is the main driving force for energy consumption and
deduce that the network layers up to the second downsampling step are consuming
most energy.

Comment: 5 pages, 3 figures, accepted for IEEE International Conference on
Image Processing (ICIP) 202
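The reported finding that energy can be estimated from image size with mean errors below 7% amounts to a simple per-pixel energy model. A least-squares sketch under that assumption (the measurement values below are made up for illustration; real data would come from GPU power measurements):

```python
import numpy as np

def fit_energy_model(num_pixels, energy_joules):
    """Fit E ~ a * num_pixels + b by least squares: `a` is the per-pixel
    energy (driven, per the paper's correlation analysis, by operations per
    pixel) and `b` a fixed overhead."""
    A = np.stack([np.asarray(num_pixels, float),
                  np.ones(len(num_pixels))], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(energy_joules, float),
                                 rcond=None)
    return a, b

# Synthetic measurements that follow the assumed linear law exactly.
pixels = np.array([100_000, 200_000, 400_000, 800_000])
energy = 3e-5 * pixels + 2.0
a, b = fit_energy_model(pixels, energy)
```

Once fitted on a few measured images, such a model predicts the energy cost of compressing a new image from its resolution alone.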