89 research outputs found
Weighted bi-prediction for light field image coding
Light field imaging based on a single-tier camera equipped with a microlens array – also known as integral, holoscopic, and plenoptic imaging – has recently emerged as a practical and prospective approach for future visual applications and services. However, successfully deploying actual light field imaging applications and services will require developing adequate coding solutions to efficiently handle the massive amount of data involved in these systems. In this context, self-similarity compensated prediction is a non-local spatial prediction scheme based on block matching that has been shown to achieve high efficiency for light field image coding based on the High Efficiency Video Coding (HEVC) standard. As previously shown by the authors, this is possible by simply averaging two predictor blocks that are jointly estimated from a causal search window in the current frame itself, referred to as self-similarity bi-prediction. However, theoretical analyses for motion compensated bi-prediction have suggested that it is still possible to achieve further rate-distortion performance improvements by adaptively estimating the weighting coefficients of the two predictor blocks. Therefore, this paper presents a comprehensive study of the rate-distortion performance for HEVC-based light field image coding when using different sets of weighting coefficients for self-similarity bi-prediction. Experimental results demonstrate that it is possible to extend the previous theoretical conclusions to light field image coding and show that the proposed adaptive weighting coefficient selection leads to bit savings of up to 5% compared to the previous self-similarity bi-prediction scheme.
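The difference between plain averaging and adaptive weighting described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate weight set, the function names, and the use of the sum of squared errors (SSE) as the only selection criterion are assumptions made for illustration. A real encoder would normally use a rate-distortion cost that also accounts for the bits needed to signal the chosen weights.

```python
# Hypothetical candidate weight pairs (w0, w1); the abstract does not list the sets studied.
CANDIDATE_WEIGHTS = [(0.5, 0.5), (0.25, 0.75), (0.75, 0.25), (0.375, 0.625), (0.625, 0.375)]


def weighted_biprediction(p0, p1, w0, w1):
    """Combine two predictor blocks (flat lists of samples) sample by sample."""
    return [w0 * a + w1 * b for a, b in zip(p0, p1)]


def sse(block, prediction):
    """Sum of squared errors between the original block and a prediction."""
    return sum((x - y) ** 2 for x, y in zip(block, prediction))


def select_weights(block, p0, p1, candidates=CANDIDATE_WEIGHTS):
    """Return the candidate (w0, w1) pair with the lowest SSE.

    Assumption: distortion only; signalling cost is ignored in this sketch.
    """
    return min(candidates, key=lambda w: sse(block, weighted_biprediction(p0, p1, *w)))


# Toy 4-sample block that happens to equal 0.75 * p0 + 0.25 * p1.
block = [10, 20, 30, 40]
p0 = [8, 16, 24, 32]
p1 = [16, 32, 48, 64]

best = select_weights(block, p0, p1)
averaging_error = sse(block, weighted_biprediction(p0, p1, 0.5, 0.5))
adaptive_error = sse(block, weighted_biprediction(p0, p1, *best))
```

In this toy case the fixed (0.5, 0.5) average leaves a residual error, while the selected pair reproduces the block exactly; the gains reported in the abstract come from real light field content and a full HEVC-based codec, not from this simplification.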
Error control strategies in H.265|HEVC video transmission
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. With the rapid development in video coding technologies in the last decade, high-resolution video delivery suffers from packet loss due to unreliable transmission channels (time-varying characteristics). Error resilience approaches at the channel coding level are less efficient to implement in real-time video transmission because the encoded video samples have variable code lengths. Therefore, error resilience in the video coding standard plays a vital role in reducing the effect of error propagation and improving the perceived visual quality. The main work in this thesis is to develop an efficient error resilience mechanism for the H.265|HEVC video coding standard to reduce the effects of error propagation in error-prone conditions. In this thesis, two error resilience algorithms are proposed. The first is the Adaptive Slice Encoding (ASE) error resilience algorithm. The concept of this algorithm is to extract and protect the most active slices in the coded bitstream based on an adaptive search window. This algorithm can be applied in low-delay video transmission with and without a feedback channel. It is also designed to be compatible with the reference coding software manual (HM16) for the H.265|HEVC coding standard. The second proposed algorithm is a joint encoder-decoder error resilience algorithm called Error Resilience based on Supplemental Enhancement Information (ERSEI). A feedback status message from the decoder notifies the encoder to start encoding a clean random-access picture adaptively, based on the decoded picture hash message status at the decoder. At the same time, the decoder is notified to start the error concealment process whilst waiting to receive correct video data. A recovery point message from the decoder feedback channel is used to update the encoder with error messages.
In this thesis, extensive experimental work, evaluation, and comparison with state-of-the-art related algorithms have been conducted to evaluate the proposed algorithms. Furthermore, the best trade-off between the coding efficiency of the proposed error resilience algorithms and their error resilience performance has been considered at the design stage. The experimental evaluation covers both error-free and error-prone encoding conditions. The results show significant improvements in Y-PSNR and in the subjective quality of the decoded bitstream when using the proposed algorithms in error-prone conditions with a variety of packet loss rates (PLRs).
Moreover, experimental work is conducted to test the algorithms' complexity in terms of the processing time required at both the encoding and decoding stages. Additionally, the performance of both the H.264|AVC and H.265|HEVC coding standards is evaluated in error-free and error-prone environments.
For the ASE algorithm, when compared with the improved region of interest (IROI) and region of interest (ROI) algorithms, the most obvious finding was a significant improvement in visual quality at PLRs of 2-18%.
For the ERSEI algorithm, when compared with the default HM16 with pixel copy concealment and with motion compensated error concealment (MCEC) techniques, the evaluation results indicate clear visual quality enhancement at PLRs of 1, 2, 6, and 8%. The Ministry of Higher Education and Scientific Research in Ira
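The hash-driven feedback loop that the thesis abstract describes for ERSEI can be sketched roughly as follows. This is an illustrative toy and not the thesis's algorithm: the status strings, function names, and the single hash over a flat byte buffer are assumptions. The HEVC decoded picture hash SEI message can carry an MD5, CRC, or checksum value; MD5 is used here only because it is available in the Python standard library.

```python
import hashlib


def picture_md5(samples: bytes) -> str:
    """Hash of a decoded picture, simplified to one digest over all samples."""
    return hashlib.md5(samples).hexdigest()


def decoder_feedback(decoded_samples: bytes, signalled_hash: str) -> str:
    """Compare the decoder's own hash with the one signalled by the encoder.

    "OK" and "MISMATCH" are hypothetical status values for this sketch.
    """
    return "OK" if picture_md5(decoded_samples) == signalled_hash else "MISMATCH"


def encoder_next_picture_type(feedback: str) -> str:
    """On a mismatch, restart prediction from a clean random-access picture."""
    return "CLEAN_RANDOM_ACCESS" if feedback == "MISMATCH" else "INTER"


# Encoder side: hash of the reconstructed picture, sent alongside the bitstream.
reconstructed = bytes([16] * 64)
signalled = picture_md5(reconstructed)

# Decoder side: one sample corrupted by a simulated packet loss.
corrupted = bytes([16] * 63 + [17])

status_clean = decoder_feedback(reconstructed, signalled)
status_lossy = decoder_feedback(corrupted, signalled)
next_type = encoder_next_picture_type(status_lossy)
```

While waiting for the clean picture, the decoder in the abstract's description conceals the damaged area; that concealment step is outside the scope of this sketch.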
HEVC-based 3D holoscopic video coding using self-similarity compensated prediction
Holoscopic imaging, also known as integral, light field, and plenoptic imaging, is an appealing technology for glassless 3D video systems, which has recently emerged as a prospective candidate for future image and video applications, such as 3D television. However, to successfully introduce 3D holoscopic video applications into the market, adequate coding tools that can efficiently handle 3D holoscopic video are necessary. In this context, this paper discusses the requirements and challenges for 3D holoscopic video coding, and presents an efficient 3D holoscopic coding scheme based on High Efficiency Video Coding (HEVC). The proposed 3D holoscopic codec makes use of the self-similarity (SS) compensated prediction concept to efficiently exploit the inherent correlation of the 3D holoscopic content in Intra- and Inter-coded frames, as well as a novel vector prediction scheme to take advantage of the peculiar characteristics of the SS prediction data. Extensive experiments were conducted, and have shown that the proposed solution is able to outperform HEVC as well as other coding solutions proposed in the literature. Moreover, a consistently better performance is also observed for a set of different quality metrics proposed in the literature for 3D holoscopic content, as well as for the visual quality of views synthesized from decompressed 3D holoscopic content.
Employing Information and Communications Technologies in Homes and Cities for the Health and Well-Being of Older People
He X and Sheriff RE (Eds.) Employing ICT in Homes and Cities for the Health and Well-Being of Older People. Workshop Proceedings of ICT4HOP’16. 15-17 Aug 2016. Sichuan University, Chengdu, China. British Council, Researcher Links, Newton Fund, NSF
Dense light field coding: a survey
Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems.
Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.
NERV++: An Enhanced Implicit Neural Video Representation
Neural fields, also known as implicit neural representations (INRs), have
shown a remarkable capability of representing, generating, and manipulating
various data types, allowing for continuous data reconstruction at a low memory
footprint. Though promising, INRs applied to video compression still need to
improve their rate-distortion performance by a large margin, and require a huge
number of parameters and long training iterations to capture high-frequency
details, limiting their wider applicability. Resolving this problem remains a
quite challenging task, which would make INRs more accessible in compression
tasks. We take a step towards resolving these shortcomings by introducing
NeRV++, an enhanced implicit neural video representation that is a more
straightforward yet effective enhancement over the original NeRV decoder
architecture, featuring separable conv2d residual blocks (SCRBs) that sandwich
the upsampling block (UB), and a bilinear interpolation skip layer for improved
feature representation. NeRV++ allows videos to be directly represented as a
function approximated by a neural network, and significantly enhances the
representation capacity beyond current INR-based video codecs. We evaluate our
method on the UVG, MCL-JVC, and Bunny datasets, achieving competitive results
for video compression with INRs. This achievement narrows the gap to
autoencoder-based video coding, marking a significant stride in INR-based video
compression research.
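One reason separable convolutions are attractive when parameter count matters can be shown with a small calculation. The sketch below assumes that "separable conv2d" means a depthwise convolution followed by a 1x1 pointwise convolution, which the abstract does not confirm; the function names and channel sizes are hypothetical, and biases are ignored.

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1x1 pointwise convolution.

    Assumption: depthwise-separable form; the NeRV++ blocks may differ in detail.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 64 output channels, 3x3 kernel.
standard = conv2d_params(64, 64, 3)
separable = separable_conv2d_params(64, 64, 3)
ratio = standard / separable
```

For this hypothetical layer the separable form needs roughly an eighth of the weights of the standard convolution, which is the kind of saving that leaves room for extra residual blocks at a similar model size.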
Applications in Electronics Pervading Industry, Environment and Society
This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, which was held in November 2019. All these papers have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. The focus of these papers is on cyber physical systems (CPS), with research proposals for new sensor acquisition and ADC (analog to digital converter) methods, high-speed communication systems, cybersecurity, big data management, and data processing including emerging machine learning techniques. Physical implementation aspects are discussed as well as the trade-off found between functional performance and hardware/system costs
C3: High-performance and low-complexity neural compression from a single image or video
Most neural compression models are trained on large datasets of images or
videos in order to generalize to unseen data. Such generalization typically
requires large and expressive architectures with a high decoding complexity.
Here we introduce C3, a neural compression method with strong rate-distortion
(RD) performance that instead overfits a small model to each image or video
separately. The resulting decoding complexity of C3 can be an order of
magnitude lower than neural baselines with similar RD performance. C3 builds on
COOL-CHIC (Ladune et al.) and makes several simple and effective improvements
for images. We further develop new methodology to apply C3 to videos. On the
CLIC2020 image benchmark, we match the RD performance of VTM, the reference
implementation of the H.266 codec, with less than 3k MACs/pixel for decoding.
On the UVG video benchmark, we match the RD performance of the Video
Compression Transformer (Mentzer et al.), a well-established neural video
codec, with less than 5k MACs/pixel for decoding.
Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
The success of the Neural Radiance Fields (NeRFs) for modeling and free-view
rendering static objects has inspired numerous attempts on dynamic scenes.
Current techniques that utilize neural rendering for facilitating free-view
videos (FVVs) are restricted to either offline rendering or are capable of
processing only brief sequences with minimal motion. In this paper, we present
a novel technique, Residual Radiance Field or ReRF, as a highly compact neural
representation to achieve real-time FVV rendering on long-duration dynamic
scenes. ReRF explicitly models the residual information between adjacent
timestamps in the spatial-temporal feature space, with a global
coordinate-based tiny MLP as the feature decoder. Specifically, ReRF employs a
compact motion grid along with a residual feature grid to exploit inter-frame
feature similarities. We show such a strategy can handle large motions without
sacrificing quality. We further present a sequential training scheme to
maintain the smoothness and the sparsity of the motion/residual grids. Based on
ReRF, we design a special FVV codec that achieves three orders of magnitude
compression rate and provides a companion ReRF player to support online
streaming of long-duration FVVs of dynamic scenes. Extensive experiments
demonstrate the effectiveness of ReRF for compactly representing dynamic
radiance fields, enabling an unprecedented free-viewpoint viewing experience in
speed and quality. Comment: Accepted by CVPR 2023. Project page, see
https://aoliao12138.github.io/ReRF
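The idea of modelling a frame as the previous frame's features, moved by a motion grid, plus a sparse residual can be sketched in one dimension. This is a toy under stated assumptions and not the ReRF method: ReRF operates on 3D feature grids decoded by a small MLP, whereas the integer per-cell offsets, the clamping at the borders, and all function names here are invented for illustration.

```python
def warp(prev_grid, motion):
    """For each cell i, fetch the previous-frame feature at i + motion[i] (clamped)."""
    n = len(prev_grid)
    return [prev_grid[min(max(i + m, 0), n - 1)] for i, m in enumerate(motion)]


def encode_residual(curr_grid, prev_grid, motion):
    """Residual left after motion compensation from the previous frame."""
    return [c - w for c, w in zip(curr_grid, warp(prev_grid, motion))]


def reconstruct(prev_grid, motion, residual):
    """Rebuild the current frame from the previous frame, motion, and residual."""
    return [w + r for w, r in zip(warp(prev_grid, motion), residual)]


prev_grid = [1, 2, 3, 4, 5]
curr_grid = [2, 3, 4, 5, 7]   # content shifted by one cell, plus one new value
motion = [1, 1, 1, 1, 0]      # hypothetical per-cell offsets into prev_grid

residual = encode_residual(curr_grid, prev_grid, motion)
residual_without_motion = encode_residual(curr_grid, prev_grid, [0] * 5)
rebuilt = reconstruct(prev_grid, motion, residual)
```

With motion compensation the residual is zero everywhere except the one genuinely new cell, while without it every cell carries a nonzero residual; sparse residuals of this kind are what make the representation cheap to compress and stream frame by frame.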