Search CORE

147,763 research outputs found

Joint exploration model based light field image coding: A comparative study

Author: Cong HP
Hoang Van X
Perry S
Vu TA
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/11/2017
Field of study

© 2017 IEEE. The recent light field imaging technology has been attracting a lot of interests due to its potential applications in a large number of areas including Virtual Reality, Augmented Reality (VR/AR), Teleconferencing, and E-learning. Light Field (LF) data is able to provide rich visual information such as scene rendering with changes in depth of field, viewpoint, and focal length. However, Light Field data usually associates to a critical problem - the massive data. Therefore, compressing LF data is one of the main challenges in LF research. In this context, we present in this paper a comparative study for compressing LF data with not only the widely used image/video coding standards, such as JPEG-2000, H.264/AVC, HEVC and Google/VP9 but also with the most recent image/video coding solution, the Joint Exploration Model. In addition, this paper also proposes a LF image coding flow, which can be used as a benchmark for future LF compression evaluation. Finally, the compression efficiency of these coding solutions is thoroughly compared throughout a rich set of test conditions

Crossref

OPUS - University of Technology Sydney

Rate Distortion Theory for Causal Video Coding: Characterization, Computation Algorithm, Comparison, and Code Design

Author: Zheng Lin
Publication venue: 'University of Waterloo'
Publication date: 01/01/2012
Field of study

Due to the sheer volume of data involved, video coding is an important application of lossy source coding, and has received wide industrial interest and support as evidenced by the development and success of a series of video coding standards. All MPEG-series and H-series video coding standards proposed so far are based upon a video coding paradigm called predictive video coding, where video source frames Xᵢ,i=1,2,...,N, are encoded in a frame by frame manner, the encoder and decoder for each frame Xᵢ, i =1, 2, ..., N, enlist help only from all previous encoded frames Sj, j=1, 2, ..., i-1. In this thesis, we will look further beyond all existing and proposed video coding standards, and introduce a new coding paradigm called causal video coding, in which the encoder for each frame Xᵢ can use all previous original frames Xj, j=1, 2, ..., i-1, and all previous encoded frames Sj, while the corresponding decoder can use only all previous encoded frames. We consider all studies, comparisons, and designs on causal video coding from an information theoretic point of view. Let R*c(D₁,...,D_N) (R*p(D₁,...,D_N), respectively) denote the minimum total rate required to achieve a given distortion level D₁,...,D_N > 0 in causal video coding (predictive video coding, respectively). A novel computation approach is proposed to analytically characterize, numerically compute, and compare the minimum total rate of causal video coding R*c(D₁,...,D_N) required to achieve a given distortion (quality) level D₁,...,D_N > 0. Specifically, we first show that for jointly stationary and ergodic sources X₁, ..., X_N, R*c(D₁,...,D_N) is equal to the infimum of the n-th order total rate distortion function R_{c,n}(D₁,...,D_N) over all n, where R_{c,n}(D₁,...,D_N) itself is given by the minimum of an information quantity over a set of auxiliary random variables. We then present an iterative algorithm for computing R_{c,n}(D₁,...,D_N) and demonstrate the convergence of the algorithm to the global minimum. The global convergence of the algorithm further enables us to not only establish a single-letter characterization of R*c(D₁,...,D_N) in a novel way when the N sources are an independent and identically distributed (IID) vector source, but also demonstrate a somewhat surprising result (dubbed the more and less coding theorem)---under some conditions on source frames and distortion, the more frames need to be encoded and transmitted, the less amount of data after encoding has to be actually sent. With the help of the algorithm, it is also shown by example that R*c(D₁,...,D_N) is in general much smaller than the total rate offered by the traditional greedy coding method by which each frame is encoded in a local optimum manner based on all information available to the encoder of the frame. As a by-product, an extended Markov lemma is established for correlated ergodic sources. From an information theoretic point of view, it is interesting to compare causal video coding and predictive video coding, which all existing video coding standards proposed so far are based upon. In this thesis, by fixing N=3, we first derive a single-letter characterization of R*p(D₁,D₂,D₃) for an IID vector source (X₁,X₂,X₃) where X₁ and X₂ are independent, and then demonstrate the existence of such X₁,X₂,X₃ for which R*p(D₁,D₂,D₃)>R*c(D₁,D₂,D₃) under some conditions on source frames and distortion. This result makes causal video coding an attractive framework for future video coding systems and standards. The design of causal video coding is also considered in the thesis from an information theoretic perspective by modeling each frame as a stationary information source. We first put forth a concept called causal scalar quantization, and then propose an algorithm for designing optimum fixed-rate causal scalar quantizers for causal video coding to minimize the total distortion among all sources. Simulation results show that in comparison with fixed-rate predictive scalar quantization, fixed-rate causal scalar quantization offers as large as 16% quality improvement (distortion reduction)

University of Waterloo's Institutional Repository

A Research on Enhancing Reconstructed Frames in Video Codecs

Author: PHAM DO Kim Chi
Publication venue
Publication date: 15/09/2022
Field of study

A series of video codecs, combining encoder and decoder, have been developed to improve the human experience of video-on-demand: higher quality videos at lower bitrates. Despite being at the leading of the compression race, the High Efficiency Video Coding (HEVC or H.265), the latest Versatile Video Coding (VVC) standard, and compressive sensing (CS) are still suffering from lossy compression. Lossy compression algorithms approximate input signals by smaller file size but degrade reconstructed data, leaving space for further improvement. This work aims to develop hybrid codecs taking advantage of both state-of-the-art video coding technologies and deep learning techniques: traditional non-learning components will either be replaced or combined with various deep learning models. Note that related studies have not made the most of coding information, this work studies and utilizes more potential resources in both encoder and decoder for further improving different codecs.In the encoder, motion compensated prediction (MCP) is one of the key components that bring high compression ratios to video codecs. For enhancing the MCP performance, modern video codecs offer interpolation filters for fractional motions. However, these handcrafted fractional interpolation filters are designed on ideal signals, which limit the codecs in dealing with real-world video data. This proposal introduces a deep learning approach for all Luma and Chroma fractional pixels, aiming for more accurate motion compensation and coding efficiency.One extraordinary feature of CS compared to other codecs is that CS can recover multiple images at the decoder by applying various algorithms on the one and only coded data. Note that the related works have not made use of this property, this work enables a deep learning-based compressive sensing image enhancement framework using multiple reconstructed signals. Learning to enhance from multiple reconstructed images delivers a valuable mechanism for training deep neural networks while requiring no additional transmitted data.In the encoder and decoder of modern video coding standards, in-loop filters (ILF) dedicate the most important role in producing the final reconstructed image quality and compression rate. This work introduces a deep learning approach for improving the handcrafted ILF for modern video coding standards. We first utilize various coding resources and present novel deep learning-based ILF. Related works perform the rate-distortion-based ILF mode selection at the coding-tree-unit (CTU) level to further enhance the deep learning-based ILF, and the corresponding bits are encoded and transmitted to the decoder. In this work, we move towards a deeper approach: a reinforcement-learning based autonomous ILF mode selection scheme is presented, enabling the ability to adapt to different coding unit (CU) levels. Using this approach, we require no additional bits while ensuring the best image quality at local levels beyond the CTU level.While this research mainly targets improving the recent video coding standard VVC and the sparse-based CS, it is also flexibly designed to adapt the previous and future video coding standards with minor modifications.博士（工学）法政大学 (Hosei University

Hosei University Repository

Recommended from our members

Multimedia delivery in the future internet

Author: Aggoun A
Amon P
Arbel I
Chernilov A
Cosmas J
Garcia G
Jari A
Keller S
Kontopoulos C
Lamy-Bergot C
Leon A
Mattavelli M
Mauthe A
Mota T
Naumann M
Navarro A
Negru O
Pinto F
Shao B
Timmerer C
Tsekleves E
Zahariadis T
Publication venue: 'Society for Leukocyte Biology'
Publication date: 01/01/2008
Field of study

The term “Networked Media” implies that all kinds of media including text, image, 3D graphics, audio and video are produced, distributed, shared, managed and consumed on-line through various networks, like the Internet, Fiber, WiFi, WiMAX, GPRS, 3G and so on, in a convergent manner [1]. This white paper is the contribution of the Media Delivery Platform (MDP) cluster and aims to cover the Networked challenges of the Networked Media in the transition to the Future of the Internet. Internet has evolved and changed the way we work and live. End users of the Internet have been confronted with a bewildering range of media, services and applications and of technological innovations concerning media formats, wireless networks, terminal types and capabilities. And there is little evidence that the pace of this innovation is slowing. Today, over one billion of users access the Internet on regular basis, more than 100 million users have downloaded at least one (multi)media file and over 47 millions of them do so regularly, searching in more than 160 Exabytes1 of content. In the near future these numbers are expected to exponentially rise. It is expected that the Internet content will be increased by at least a factor of 6, rising to more than 990 Exabytes before 2012, fuelled mainly by the users themselves. Moreover, it is envisaged that in a near- to mid-term future, the Internet will provide the means to share and distribute (new) multimedia content and services with superior quality and striking flexibility, in a trusted and personalized way, improving citizens’ quality of life, working conditions, edutainment and safety. In this evolving environment, new transport protocols, new multimedia encoding schemes, cross-layer inthe network adaptation, machine-to-machine communication (including RFIDs), rich 3D content as well as community networks and the use of peer-to-peer (P2P) overlays are expected to generate new models of interaction and cooperation, and be able to support enhanced perceived quality-of-experience (PQoE) and innovative applications “on the move”, like virtual collaboration environments, personalised services/ media, virtual sport groups, on-line gaming, edutainment. In this context, the interaction with content combined with interactive/multimedia search capabilities across distributed repositories, opportunistic P2P networks and the dynamic adaptation to the characteristics of diverse mobile terminals are expected to contribute towards such a vision. Based on work that has taken place in a number of EC co-funded projects, in Framework Program 6 (FP6) and Framework Program 7 (FP7), a group of experts and technology visionaries have voluntarily contributed in this white paper aiming to describe the status, the state-of-the art, the challenges and the way ahead in the area of Content Aware media delivery platforms

Brunel University Research Archive

Multiview Video Coding for Virtual Reality

Author: Kashyap Kammachi Sreedhar
Publication venue
Publication date: 13/01/2016
Field of study

Virtual reality (VR) is one of the emerging technologies in recent years. It brings a sense of real world experience in simulated environments, hence, it is being used in many applications for example in live sporting events, music recordings and in many other interactive multimedia applications. VR makes use of multimedia content, and videos are a major part of it. VR videos are captured from multiple directions to cover the entire 360 field-of-view. It usually employs, multiple cameras of wide field-of-view such as fisheye lenses and the camera arrangement can also vary from linear to spherical set-ups. Videos in VR system are also subjected to constraints such as, variations in network bandwidth, heterogeneous mobile devices with limited decoding capacity, adaptivity for view switching in the display. The uncompressed videos from multiview cameras are redundant and impractical for storage and transmission. The existing video coding standards compresses the multiview videos effi ciently. However, VR systems place certain limitations on the video and camera arrangements, such as, it assumes rectilinear properties for video, translational motion model for prediction and the camera set-up to be linearly arranged. The aim of the thesis is to propose coding schemes which are compliant to the current video coding standards of H.264/AVC and its successor H.265/HEVC, the current state-of-the-art and multiview/scalable extensions. This thesis presents methods that compress the multiview videos which are captured from eight cameras that are arranged spherically, pointing radially outwards. The cameras produce circular fi sheye videos of 195 degree field-of-view. The final goal is to present methods, which optimize the bitrate in both storage and transmission of videos for the VR system. The presented methods can be categorized into two groups: optimizing storage bitrate and optimizing streaming bitrate of multiview videos. In the storage bitrate category, six methods were experimented. The presented methods competed against simulcast coding of individual views. The coding schemes were experimented with two data sets of 8 views each. The method of scalable coding with inter-layer prediction in all frames outperformed simulcast coding with approximately 7.9%. In the case of optimizing streaming birates, five methods were experimented. The method of scalable plus multiview skip-coding outperformed the simulcast method of coding by 36% on average. Future work will focus on pre-processing the fi sheye videos to rectilinear videos, in-order to fit them to the current translational model of the video coding standards. Moreover, the methods will be tested in comprehensive applications and system requirements

Trepo - Institutional Repository of Tampere University

Digital recording of performing arts: formats and conversion

Author: Barbarien Joeri
Coppens SamUGent000141139848802000264764977310338918F99AD20E-F0ED-11E1-A9DE-61C894A0A6B4
De Cock Jan001999328230801001861346F71F8358-F0ED-11E1-A9DE-61C894A0A6B4
Evens TomeditorPS010020011964888020000000360000-0002-7274-7432F83A851C-F0ED-11E1-A9DE-61C894A0A6B4
Jacobs Marc
Mannens ErikTW060019873040688010020739380000-0001-7946-4884F80364A6-F0ED-11E1-A9DE-61C894A0A6B4
Moreels DrieseditorCA208020010888600000-0002-5297-107437DCF06A-F0EE-11E1-A9DE-61C894A0A6B4
Notebaert StijnUGent801001861750002001630362053248469389F71FEFC8-F0ED-11E1-A9DE-61C894A0A6B4
Schelkens Peter
Van de Walle RikCA01CA05TW068010009619730000-0002-7491-5145F4CFF146-F0ED-11E1-A9DE-61C894A0A6B4
Publication venue: Vlaams Theater Instituut (VTI)
Publication date: 01/01/2009
Field of study

Ghent University Academic Bibliography

Archivsystem Ask23

Overview of MV-HEVC prediction structures for light field video

Author: Avramelos Vasileios
Lambert Peter
Van Wallendael Glenn
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2019
Field of study

Light field video is a promising technology for delivering the required six-degrees-of-freedom for natural content in virtual reality. Already existing multi-view coding (MVC) and multi-view plus depth (MVD) formats, such as MV-HEVC and 3D-HEVC, are the most conventional light field video coding solutions since they can compress video sequences captured simultaneously from multiple camera angles. 3D-HEVC treats a single view as a video sequence and the other sub-aperture views as gray-scale disparity (depth) maps. On the other hand, MV-HEVC treats each view as a separate video sequence, which allows the use of motion compensated algorithms similar to HEVC. While MV-HEVC and 3D-HEVC provide similar results, MV-HEVC does not require any disparity maps to be readily available, and it has a more straightforward implementation since it only uses syntax elements rather than additional prediction tools for inter-view prediction. However, there are many degrees of freedom in choosing an appropriate structure and it is currently still unknown which one is optimal for a given set of application requirements. In this work, various prediction structures for MV-HEVC are implemented and tested. The findings reveal the trade-off between compression gains, distortion and random access capabilities in MVHEVC light field video coding. The results give an overview of the most optimal solutions developed in the context of this work, and prediction structure algorithms proposed in state-of-the-art literature. This overview provides a useful benchmark for future development of light field video coding solutions

Ghent University Academic Bibliography