396 research outputs found
Content-Adaptive Variable Framerate Encoding Scheme for Green Live Streaming
Adaptive live video streaming applications use a fixed predefined
configuration for the bitrate ladder with constant framerate and encoding
presets in a session. However, selecting optimized framerates and presets for
every bitrate ladder representation can enhance perceptual quality, improve
computational resource allocation, and thus improve streaming energy efficiency.
In particular, low framerates for low-bitrate representations reduce
compression artifacts and decrease encoding energy consumption. In addition, an
optimized preset may lead to improved compression efficiency. In this light,
this paper proposes a Content-adaptive Variable Framerate (CVFR) encoding
scheme, which offers two modes of operation: ecological (ECO) and high-quality
(HQ). CVFR-ECO optimizes for the highest encoding energy savings by predicting
the optimized framerate for each representation in the bitrate ladder. CVFR-HQ
takes it further by predicting each representation's optimized
framerate-encoding preset pair using low-complexity discrete cosine transform
energy-based spatial and temporal features for compression efficiency and
sustainable storage. We demonstrate the advantage of CVFR using the x264
open-source video encoder. The results show that CVFR-ECO yields an average
PSNR and VMAF increase of 0.02 dB and 2.50 points, respectively, for the same
bitrate, compared to the fastest preset highest framerate encoding. CVFR-ECO
also yields an average encoding and storage energy consumption reduction of
34.54% and 76.24%, considering a just noticeable difference (JND) of six VMAF
points. In comparison, CVFR-HQ yields an average increase in PSNR and VMAF of
2.43 dB and 10.14 points, respectively, for the same bitrate. Finally, CVFR-HQ
resulted in an average reduction in storage energy consumption of 83.18%,
considering a JND of six VMAF points.
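The abstract reports storage savings "considering a JND of six VMAF points" but does not spell out how the threshold is applied. One plausible reading, shown here purely as an assumption, is that a representation is stored only if its VMAF exceeds that of the last kept representation by at least one JND. The function name `prune_ladder` and every number in `example_ladder` are hypothetical and are not taken from the paper.

```python
# Minimal sketch of JND-based ladder pruning (an assumed interpretation,
# not the authors' published algorithm).

JND_VMAF = 6.0  # JND threshold quoted in the abstract, in VMAF points


def prune_ladder(ladder, jnd=JND_VMAF):
    """Drop representations that are perceptually redundant.

    ladder: iterable of (bitrate_kbps, vmaf) pairs.
    A representation is kept only if its VMAF is at least `jnd` points
    above the VMAF of the previously kept (lower-bitrate) representation.
    """
    kept = []
    for bitrate_kbps, vmaf in sorted(ladder):
        if not kept or vmaf - kept[-1][1] >= jnd:
            kept.append((bitrate_kbps, vmaf))
    return kept


# Hypothetical ladder: (bitrate in kbps, VMAF score).
example_ladder = [
    (145, 40.0), (300, 44.0), (600, 52.0), (900, 55.0),
    (1600, 63.0), (2400, 66.0), (3400, 72.0),
]

pruned = prune_ladder(example_ladder)
print(pruned)
```

With these made-up scores, three of the seven representations fall within one JND of a cheaper neighbour and are dropped, which illustrates how a JND threshold could translate into fewer stored files.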
State of the art in 2D content representation and compression
Deliverable D1.3 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D3.1 of the project.
SpatioTemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment
Perceptual video quality assessment models are either frame-based or
video-based, i.e., they apply spatiotemporal filtering or motion estimation to
capture temporal video distortions. Despite their good performance on video
quality databases, video-based approaches are time-consuming and harder to
efficiently deploy. To balance between high performance and computational
efficiency, Netflix developed the Video Multi-method Assessment Fusion (VMAF)
framework, which integrates multiple quality-aware features to predict video
quality. Nevertheless, this fusion framework does not fully exploit temporal
video quality measurements which are relevant to temporal video distortions. To
this end, we propose two improvements to the VMAF framework: SpatioTemporal
VMAF and Ensemble VMAF. Both algorithms exploit efficient temporal video
features which are fed into a single or multiple regression models. To train
our models, we designed a large subjective database and evaluated the proposed
models against state-of-the-art approaches. The compared algorithms will be
made available as part of the open source package in
https://github.com/Netflix/vmaf
Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark
HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for
over-the-top (OTT) video streaming services, due to its ability to deliver a
seamless streaming experience. A key component of HAS is the bitrate ladder,
which provides the encoding parameters (e.g., bitrate-resolution pairs) to
encode the source video. The representations in the bitrate ladder allow the
client's player to dynamically adjust the quality of the video stream based on
network conditions by selecting the most appropriate representation from the
bitrate ladder. The most straightforward and lowest complexity approach
involves using a fixed bitrate ladder for all videos, consisting of
pre-determined bitrate-resolution pairs known as one-size-fits-all. Conversely,
the most reliable technique relies on intensively encoding all resolutions over
a wide range of bitrates to build the convex hull, thereby optimizing the
bitrate ladder for each specific video. Several techniques have been proposed
to predict content-based ladders without performing a costly exhaustive search
encoding. This paper provides a comprehensive review of various methods,
including both conventional and learning-based approaches. Furthermore, we
conduct a benchmark study focusing exclusively on various learning-based
approaches for predicting content-optimized bitrate ladders across multiple
codec settings. The considered methods are evaluated on our proposed
large-scale dataset, which includes 300 UHD video shots encoded with software
and hardware encoders using three state-of-the-art encoders, including
AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis
provides baseline methods and insights, which will be valuable for future
research in the field of bitrate ladder prediction. The source code of the
proposed benchmark and the dataset will be made publicly available upon
acceptance of the paper.
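The exhaustive approach described in this abstract (encode every resolution at many bitrates, then build the convex hull) can be sketched as follows. This is a minimal illustration, not the benchmark's code: the function name `convex_hull_ladder`, the linear bitrate axis, and all quality values in `encodes` are assumptions made for the example.

```python
# Minimal sketch: build a content-specific ladder from exhaustive encodes
# by taking the upper convex hull of (bitrate, quality) points.
# All data below is hypothetical.


def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (uses fields 0 and 1)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_ladder(encodes):
    """encodes: list of (bitrate_kbps, quality, resolution) tuples."""
    # Step 1: at each bitrate keep only the best-quality resolution.
    best = {}
    for enc in encodes:
        if enc[0] not in best or enc[1] > best[enc[0]][1]:
            best[enc[0]] = enc
    points = sorted(best.values(), key=lambda e: e[0])

    # Step 2: upper convex hull (monotone chain, left to right).
    hull = []
    for p in points:
        # Remove the last point while it lies on or below the line
        # joining its predecessor to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)

    # Step 3: keep only points that still improve quality.
    ladder = []
    for p in hull:
        if not ladder or p[1] > ladder[-1][1]:
            ladder.append(p)
    return ladder


# Hypothetical exhaustive encodes: (bitrate kbps, quality score, resolution).
encodes = [
    (500, 60, "540p"), (1000, 64, "540p"), (2000, 70, "540p"), (4000, 72, "540p"),
    (500, 45, "1080p"), (1000, 62, "1080p"), (2000, 82, "1080p"), (4000, 90, "1080p"),
    (1000, 50, "2160p"), (2000, 78, "2160p"), (4000, 93, "2160p"), (8000, 97, "2160p"),
]

ladder = convex_hull_ladder(encodes)
for bitrate, quality, resolution in ladder:
    print(bitrate, quality, resolution)
```

In this toy data the 1000 kbps point falls below the hull and is discarded, and the selected resolution rises with bitrate. The cost that prediction methods try to avoid is visible in the input: every entry of `encodes` corresponds to one full encode.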
3D Wavelet-Based Video Codec with Human Perceptual Model
This thesis explores the use of a human perceptual model in video compression, channel coding, error concealment and subjective image quality measurement. The perceptual distortion model just-noticeable-distortion (JND) is investigated. A video encoding/decoding scheme based on 3D wavelet decomposition and the human perceptual model is implemented. It provides a priori compression quality control which is distinct from the conventional video coding system. JND is applied in quantizer design to improve the subjective quality of compressed video. The 3D wavelet decomposition helps to remove spatial and temporal redundancy and provides scalability of video quality. In order to conceal the errors that may occur under bad wireless channel conditions, a slicing method and a joint source channel coding scenario that combines RCPC with CRC and uses the distortion information to allocate convolutional coding rates are proposed. A new subjective quality index based on JND is proposed and used to evaluate the overall performance at different signal to noise ratios (SNR) and at different compression ratios. Due to the wide use of arithmetic coding (AC) in data compression, we consider it as a readily available unit in the video codec system for broadcasting. A new scheme for the conditional access (CA) sub-system is designed based on the cryptographic property of arithmetic coding. Its performance is analyzed along with its application in a multi-resolution video compression system. This scheme simplifies the conditional access sub-system and provides satisfactory system reliability.
Video compression algorithms for HEVC and beyond
PhD thesis. Due to the increasing number of new services and devices that allow the creation, distribution and consumption of video content, the amount of video information being transmitted all over the world is constantly growing. Video compression technology is essential to cope with the ever-increasing volume of digital video data being distributed in today's networks, as more efficient video compression techniques allow support for higher volumes of video data under the same memory/bandwidth constraints. This is especially relevant with the introduction of new and more immersive video formats associated with significantly higher amounts of data. In this thesis, novel techniques for improving the efficiency of current and future video coding technologies are investigated. Several aspects that influence the way conventional video coding methods work are considered. In particular, the properties and limitations of the Human Visual System are exploited to tune the performance of video encoders towards better subjective quality. Additionally, it is shown how the visibility of specific types of visual artefacts can be prevented during the video encoding process, in order to avoid subjective quality degradations in the compressed content. Techniques for higher video compression efficiency are also explored, targeting to improve the compression capabilities of state-of-the-art video coding standards. Finally, the application of video coding technologies to practical use-cases is considered. Accurate estimation models are devised to control the encoding time and bit rate associated with compressed video signals, in order to meet specific encoding time and transmission time restrictions.
Image Processing Using FPGAs
This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs.