147,763 research outputs found
Joint exploration model based light field image coding: A comparative study
© 2017 IEEE. The recent light field imaging technology has been attracting a lot of interests due to its potential applications in a large number of areas including Virtual Reality, Augmented Reality (VR/AR), Teleconferencing, and E-learning. Light Field (LF) data is able to provide rich visual information such as scene rendering with changes in depth of field, viewpoint, and focal length. However, Light Field data usually associates to a critical problem - the massive data. Therefore, compressing LF data is one of the main challenges in LF research. In this context, we present in this paper a comparative study for compressing LF data with not only the widely used image/video coding standards, such as JPEG-2000, H.264/AVC, HEVC and Google/VP9 but also with the most recent image/video coding solution, the Joint Exploration Model. In addition, this paper also proposes a LF image coding flow, which can be used as a benchmark for future LF compression evaluation. Finally, the compression efficiency of these coding solutions is thoroughly compared throughout a rich set of test conditions
Rate Distortion Theory for Causal Video Coding: Characterization, Computation Algorithm, Comparison, and Code Design
Due to the sheer volume of data involved, video coding is an important application of lossy source coding, and has received wide industrial interest and support as evidenced by the development and success of a series of video coding standards. All MPEG-series and H-series video coding standards proposed so far are based upon a video coding paradigm called predictive video coding, where video source frames Xᔹ,i=1,2,...,N, are encoded in a frame by frame manner, the encoder and decoder for each frame Xᔹ, i =1, 2, ..., N, enlist help only from all previous encoded frames Sj, j=1, 2, ..., i-1.
In this thesis, we will look further beyond all existing and proposed video coding standards,
and introduce a new coding paradigm called causal video coding, in which the encoder for each frame Xᔹ
can use all previous original frames Xj, j=1, 2, ..., i-1, and all previous
encoded frames Sj, while the corresponding decoder can use only all
previous encoded frames. We consider all studies, comparisons, and designs on causal video coding
from an information theoretic
point of view.
Let R*c(Dâ,...,D_N) (R*p(Dâ,...,D_N), respectively)
denote the minimum total rate required to achieve a given distortion
level Dâ,...,D_N > 0 in causal video coding (predictive video coding, respectively).
A novel computation
approach is proposed to analytically characterize, numerically
compute, and compare the
minimum total rate of causal video coding R*c(Dâ,...,D_N)
required to achieve a given distortion (quality) level Dâ,...,D_N > 0.
Specifically, we first show that for jointly stationary and ergodic
sources Xâ, ..., X_N, R*c(Dâ,...,D_N) is equal
to the infimum of the n-th order total rate distortion function
R_{c,n}(Dâ,...,D_N) over all n, where
R_{c,n}(Dâ,...,D_N) itself is given by the minimum of an
information quantity over a set of auxiliary random variables. We
then present an iterative algorithm for computing
R_{c,n}(Dâ,...,D_N) and demonstrate the convergence of the
algorithm to the global minimum. The global convergence of the
algorithm further enables us to not only establish a single-letter
characterization of R*c(Dâ,...,D_N) in a novel way when the
N sources are an independent and identically distributed (IID)
vector source, but also demonstrate
a somewhat surprising result (dubbed the more and less coding
theorem)---under some conditions on source frames and distortion,
the more frames need to be encoded and transmitted, the less amount
of data after encoding has to be actually sent.
With the help of the algorithm, it is also shown by example that
R*c(Dâ,...,D_N) is in general much smaller than the total rate
offered by the traditional greedy coding method by which each frame
is encoded in a local optimum manner based on all information
available to the encoder of the frame.
As a by-product, an extended Markov lemma is
established for correlated ergodic sources.
From an information theoretic point of view,
it is interesting to compare causal
video coding and predictive video coding,
which all existing video
coding standards proposed so far are based upon.
In this thesis, by fixing N=3,
we first derive a single-letter characterization
of R*p(Dâ,Dâ,Dâ) for an IID
vector source (Xâ,Xâ,Xâ) where Xâ and Xâ are independent, and then demonstrate the existence of such Xâ,Xâ,Xâ for which R*p(Dâ,Dâ,Dâ)>R*c(Dâ,Dâ,Dâ) under some conditions on source frames and distortion. This result makes causal video coding an attractive framework for future video coding systems and standards.
The design of causal video coding is also considered in the thesis from an information
theoretic perspective by modeling each frame as a stationary information source.
We first put forth a concept called causal scalar quantization, and then
propose an algorithm for designing optimum fixed-rate causal scalar quantizers
for causal video coding to minimize the total distortion among all sources.
Simulation results show that in comparison with fixed-rate predictive scalar quantization,
fixed-rate causal scalar quantization offers as large as 16% quality improvement (distortion reduction)
A Research on Enhancing Reconstructed Frames in Video Codecs
A series of video codecs, combining encoder and decoder, have been developed to improve the human experience of video-on-demand: higher quality videos at lower bitrates. Despite being at the leading of the compression race, the High Efficiency Video Coding (HEVC or H.265), the latest Versatile Video Coding (VVC) standard, and compressive sensing (CS) are still suffering from lossy compression. Lossy compression algorithms approximate input signals by smaller file size but degrade reconstructed data, leaving space for further improvement. This work aims to develop hybrid codecs taking advantage of both state-of-the-art video coding technologies and deep learning techniques: traditional non-learning components will either be replaced or combined with various deep learning models. Note that related studies have not made the most of coding information, this work studies and utilizes more potential resources in both encoder and decoder for further improving different codecs.In the encoder, motion compensated prediction (MCP) is one of the key components that bring high compression ratios to video codecs. For enhancing the MCP performance, modern video codecs offer interpolation filters for fractional motions. However, these handcrafted fractional interpolation filters are designed on ideal signals, which limit the codecs in dealing with real-world video data. This proposal introduces a deep learning approach for all Luma and Chroma fractional pixels, aiming for more accurate motion compensation and coding efficiency.One extraordinary feature of CS compared to other codecs is that CS can recover multiple images at the decoder by applying various algorithms on the one and only coded data. Note that the related works have not made use of this property, this work enables a deep learning-based compressive sensing image enhancement framework using multiple reconstructed signals. Learning to enhance from multiple reconstructed images delivers a valuable mechanism for training deep neural networks while requiring no additional transmitted data.In the encoder and decoder of modern video coding standards, in-loop filters (ILF) dedicate the most important role in producing the final reconstructed image quality and compression rate. This work introduces a deep learning approach for improving the handcrafted ILF for modern video coding standards. We first utilize various coding resources and present novel deep learning-based ILF. Related works perform the rate-distortion-based ILF mode selection at the coding-tree-unit (CTU) level to further enhance the deep learning-based ILF, and the corresponding bits are encoded and transmitted to the decoder. In this work, we move towards a deeper approach: a reinforcement-learning based autonomous ILF mode selection scheme is presented, enabling the ability to adapt to different coding unit (CU) levels. Using this approach, we require no additional bits while ensuring the best image quality at local levels beyond the CTU level.While this research mainly targets improving the recent video coding standard VVC and the sparse-based CS, it is also flexibly designed to adapt the previous and future video coding standards with minor modifications.ć棫ïŒć·„ćŠïŒæłæżć€§ćŠ (Hosei University
Recommended from our members
Multimedia delivery in the future internet
The term âNetworked Mediaâ implies that all kinds of media including text, image, 3D graphics, audio
and video are produced, distributed, shared, managed and consumed on-line through various networks,
like the Internet, Fiber, WiFi, WiMAX, GPRS, 3G and so on, in a convergent manner [1]. This white
paper is the contribution of the Media Delivery Platform (MDP) cluster and aims to cover the Networked
challenges of the Networked Media in the transition to the Future of the Internet.
Internet has evolved and changed the way we work and live. End users of the Internet have been confronted
with a bewildering range of media, services and applications and of technological innovations concerning
media formats, wireless networks, terminal types and capabilities. And there is little evidence that the pace
of this innovation is slowing. Today, over one billion of users access the Internet on regular basis, more
than 100 million users have downloaded at least one (multi)media file and over 47 millions of them do so
regularly, searching in more than 160 Exabytes1 of content. In the near future these numbers are expected
to exponentially rise. It is expected that the Internet content will be increased by at least a factor of 6, rising
to more than 990 Exabytes before 2012, fuelled mainly by the users themselves. Moreover, it is envisaged
that in a near- to mid-term future, the Internet will provide the means to share and distribute (new)
multimedia content and services with superior quality and striking flexibility, in a trusted and personalized
way, improving citizensâ quality of life, working conditions, edutainment and safety.
In this evolving environment, new transport protocols, new multimedia encoding schemes, cross-layer inthe
network adaptation, machine-to-machine communication (including RFIDs), rich 3D content as well as
community networks and the use of peer-to-peer (P2P) overlays are expected to generate new models of
interaction and cooperation, and be able to support enhanced perceived quality-of-experience (PQoE) and
innovative applications âon the moveâ, like virtual collaboration environments, personalised services/
media, virtual sport groups, on-line gaming, edutainment. In this context, the interaction with content
combined with interactive/multimedia search capabilities across distributed repositories, opportunistic P2P
networks and the dynamic adaptation to the characteristics of diverse mobile terminals are expected to
contribute towards such a vision.
Based on work that has taken place in a number of EC co-funded projects, in Framework Program 6 (FP6)
and Framework Program 7 (FP7), a group of experts and technology visionaries have voluntarily
contributed in this white paper aiming to describe the status, the state-of-the art, the challenges and the way
ahead in the area of Content Aware media delivery platforms
Multiview Video Coding for Virtual Reality
Virtual reality (VR) is one of the emerging technologies in recent years. It brings a sense of real world experience in simulated environments, hence, it is being used in many applications for example in live sporting events, music recordings and in many other interactive multimedia applications. VR makes use of multimedia content, and videos are a major part of it. VR videos are captured from multiple directions to cover the entire 360 field-of-view. It usually employs, multiple cameras of wide field-of-view such as fisheye lenses and the camera arrangement can also vary from linear to spherical set-ups. Videos in VR system are also subjected to constraints such as, variations in network bandwidth, heterogeneous mobile devices with limited decoding capacity, adaptivity for view switching in the display.
The uncompressed videos from multiview cameras are redundant and impractical for storage and transmission. The existing video coding standards compresses the multiview videos effi ciently. However, VR systems place certain limitations on the video and camera arrangements, such as, it assumes rectilinear properties for video, translational motion model for prediction and the camera set-up to be linearly arranged.
The aim of the thesis is to propose coding schemes which are compliant to the current video coding standards of H.264/AVC and its successor H.265/HEVC, the current state-of-the-art and multiview/scalable extensions. This thesis presents methods that compress the multiview videos which are captured from eight cameras that are arranged spherically, pointing radially outwards. The cameras produce circular fi sheye videos of 195 degree field-of-view. The final goal is to present methods, which optimize the bitrate in both storage and transmission of videos for the VR system.
The presented methods can be categorized into two groups: optimizing storage bitrate and optimizing streaming bitrate of multiview videos. In the storage bitrate category, six methods were experimented. The presented methods competed against simulcast coding of individual views. The coding schemes were experimented with two data sets of 8 views each. The method of scalable coding with inter-layer prediction in all frames outperformed simulcast coding with approximately 7.9%. In the case of optimizing streaming birates, five methods were experimented. The method of scalable plus multiview skip-coding outperformed the simulcast method of coding by 36% on average.
Future work will focus on pre-processing the fi sheye videos to rectilinear videos, in-order to fit them to the current translational model of the video coding standards. Moreover, the methods will be tested in comprehensive applications and system requirements
Overview of MV-HEVC prediction structures for light field video
Light field video is a promising technology for delivering the required six-degrees-of-freedom for natural content in virtual reality. Already existing multi-view coding (MVC) and multi-view plus depth (MVD) formats, such as MV-HEVC and 3D-HEVC, are the most conventional light field video coding solutions since they can compress video sequences captured simultaneously from multiple camera angles. 3D-HEVC treats a single view as a video sequence and the other sub-aperture views as gray-scale disparity (depth) maps. On the other hand, MV-HEVC treats each view as a separate video sequence, which allows the use of motion compensated algorithms similar to HEVC. While MV-HEVC and 3D-HEVC provide similar results, MV-HEVC does not require any disparity maps to be readily available, and it has a more straightforward implementation since it only uses syntax elements rather than additional prediction tools for inter-view prediction. However, there are many degrees of freedom in choosing an appropriate structure and it is currently still unknown which one is optimal for a given set of application requirements. In this work, various prediction structures for MV-HEVC are implemented and tested. The findings reveal the trade-off between compression gains, distortion and random access capabilities in MVHEVC light field video coding. The results give an overview of the most optimal solutions developed in the context of this work, and prediction structure algorithms proposed in state-of-the-art literature. This overview provides a useful benchmark for future development of light field video coding solutions
- âŠ