HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos
We present a no-reference video quality model and algorithm that delivers
standout performance for High Dynamic Range (HDR) videos, which we call
HDR-ChipQA. HDR videos represent wider ranges of luminances, details, and
colors than Standard Dynamic Range (SDR) videos. The growing adoption of HDR in
massively scaled video networks has driven the need for video quality
assessment (VQA) algorithms that better account for distortions on HDR content.
In particular, standard VQA models may fail to capture conspicuous distortions
at the extreme ends of the dynamic range, because the features that drive them
may be dominated by distortions that pervade the mid-ranges of the signal. We
introduce a new approach whereby a local expansive nonlinearity emphasizes
distortions occurring at the higher and lower ends of the local luma range,
allowing for the definition of additional quality-aware features that are
computed along a separate path. These features are not HDR-specific, and also
improve VQA on SDR video contents, albeit to a reduced degree. We show that
this preprocessing step significantly boosts the power of distortion-sensitive
natural video statistics (NVS) features when used to predict the quality of HDR
content. In similar manner, we separately compute novel wide-gamut color
features using the same nonlinear processing steps. We have found that our
model significantly outperforms SDR VQA algorithms on the only publicly
available, comprehensive HDR database, while also attaining state-of-the-art
performance on SDR content.
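The local expansive nonlinearity can be illustrated with a short sketch. This is not the authors' exact formulation; the window size, the rescaling to [-1, 1], and the signed-exponential form below are illustrative assumptions, chosen only to show how values near the local luma extremes get amplified relative to the mid-range:

```python
import numpy as np

def local_expansive_nonlinearity(luma, win=16, delta=4.0):
    """Emphasize the extremes of the local luma range.

    Each non-overlapping window is rescaled to [-1, 1], then passed
    through a signed exponential: mid-range values stay small while
    values near the local minimum or maximum are strongly amplified.
    (Illustrative sketch; window size and nonlinearity are assumptions.)
    """
    out = np.zeros_like(luma, dtype=np.float64)
    H, W = luma.shape
    for i in range(0, H, win):
        for j in range(0, W, win):
            patch = luma[i:i + win, j:j + win].astype(np.float64)
            lo, hi = patch.min(), patch.max()
            # Rescale the window's values to roughly [-1, 1].
            scaled = 2.0 * (patch - lo) / (hi - lo + 1e-6) - 1.0
            # Expansive, sign-preserving nonlinearity (zero at mid-range).
            out[i:i + win, j:j + win] = np.sign(scaled) * np.expm1(delta * np.abs(scaled))
    return out
```

Quality-aware features (e.g., the NVS statistics mentioned above) would then be computed on this transformed signal along a separate path, in addition to the usual path on the raw luma.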
Making Video Quality Assessment Models Robust to Bit Depth
We introduce a novel feature set, which we call HDRMAX features, that when
included into Video Quality Assessment (VQA) algorithms designed for Standard
Dynamic Range (SDR) videos, sensitizes them to distortions of High Dynamic
Range (HDR) videos that are inadequately accounted for by these algorithms.
While these features are not specific to HDR, and also improve the quality
prediction performance of VQA models on SDR content, they are especially
effective on HDR. HDRMAX features modify powerful priors drawn from Natural
Video Statistics (NVS) models by enhancing their measurability where they
visually impact the brightest and darkest local portions of videos, thereby
capturing distortions that are often poorly accounted for by existing VQA
models. As a demonstration of the efficacy of our approach, we show that, while
current state-of-the-art VQA models perform poorly on 10-bit HDR databases,
their performances are greatly improved by the inclusion of HDRMAX features
when tested on HDR and 10-bit distorted videos.
Comment: Published in IEEE Signal Processing Letters, 202
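The NVS priors referred to above are commonly built on mean-subtracted, contrast-normalized (MSCN) coefficients, whose empirical distributions change predictably under distortion. A minimal sketch follows; it uses a box filter for the local statistics (published models typically use Gaussian weighting), and the window size and stabilizing constant are illustrative:

```python
import numpy as np

def _box(x, win):
    # Box-filtered local mean via an integral image (same-size output).
    pad = win // 2
    xp = np.pad(x, pad, mode="reflect")
    c = np.zeros((xp.shape[0] + 1, xp.shape[1] + 1))
    c[1:, 1:] = xp.cumsum(0).cumsum(1)
    return (c[win:, win:] - c[:-win, win:]
            - c[win:, :-win] + c[:-win, :-win]) / win ** 2

def mscn(frame, win=7, eps=1.0):
    """Mean-Subtracted Contrast-Normalized coefficients.

    Subtract the local mean and divide by the local standard deviation.
    Pristine content yields near-Gaussian MSCN histograms; distortions
    skew them, which is what NVS-based quality features measure.
    (Box-filter sketch; window size and eps are assumptions.)
    """
    f = frame.astype(np.float64)
    mu = _box(f, win)
    var = np.clip(_box(f * f, win) - mu ** 2, 0.0, None)
    return (f - mu) / (np.sqrt(var) + eps)
```

HDRMAX-style features would apply statistics like these after a luma-expanding nonlinearity, so that deviations in the brightest and darkest local regions are no longer swamped by the mid-range.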
HDR or SDR? A Subjective and Objective Study of Scaled and Compressed Videos
We conducted a large-scale study of human perceptual quality judgments of
High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to
various levels of scaling and compression, viewed on three different display
devices.
HDR videos are able to present wider color gamuts, better contrasts, and
brighter whites and darker blacks than SDR videos. While conventional
expectations are that HDR quality is better than SDR quality, we have found
subject preference of HDR versus SDR depends heavily on the display device, as
well as on resolution scaling and bitrate. To study this question, we collected
more than 23,000 quality ratings from 67 volunteers who watched 356 videos on
OLED, QLED, and LCD televisions. Since it is of interest to be able to measure
the quality of videos under these scenarios, e.g. to inform decisions regarding
scaling, compression, and SDR vs HDR, we tested several well-known
full-reference and no-reference video quality models on the new database.
Towards advancing progress on this problem, we also developed a novel
no-reference model called HDRPatchMAX, which exploits both classical and
bit-depth-sensitive distortion statistics to predict quality more accurately
than existing metrics.
Networking for Immersive Telepresence: Architectures and Protocols - A Case Study
Immersive telepresence allows an observer to view a remote scene from any viewpoint of choice and thus gives a sense of presence, as opposed to conventional video viewing where the user can only view content from the viewpoint of a single camera. We specifically consider a depth-based rendering approach for enabling telepresence. In this paper, we explore various networking architectures in which telepresence can be enabled over the Internet using this approach. From these architectures, we derive a set of requirements for describing and conducting a telepresence session. Based on these requirements, we present a session protocol that draws upon the features of RTSP and SDP and extends them. Various media streams from a single camera viewpoint are aggregated as a view group, and the concept of this view group is used to support the various architectures. Both unicast and multicast configurations, with central or distributed servers, are supported by this protocol. Finally, we present the overall end-to-end architecture used in our current implementation.
The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by Exploiting the 3-D Stereoscopic Depth Map
Compression can be achieved by exploiting knowledge both internal and external to a given image or video source. In this paper, we present means for generating and exploiting the specific external knowledge of a 3D stereoscopic depth map of the given scene to compress the given monoscopic source. Several instances in which the depth map can potentially increase compression or provide improved functionality are presented to motivate further work along this line of reasoning.
1. INTRODUCTION
The primary goal of any image (2D) or video (2D+t) compression algorithm is to generate a representation of the source that is smaller than the source's raw bitmap. The ability to compactly represent the source implies knowledge about the source content that is in some sense deeper than the raw pixel intensity arrays. In this paper we distinguish between the internal and external knowledge about the source. Both of them facilitate description, and thus enable compression. Internal knowledge is kn..
Multiresolution Based Hierarchical Disparity Estimation for Stereo Image Pair Compression
In this paper a multiresolution based approach is proposed for compressing `still' stereo image pairs. In Section II the task at hand is contrasted with the stereo disparity estimation problem in the machine vision community; a block based scheme on the lines of a motion estimation scheme is suggested as a possible approach. In Section III, the suitability of hierarchical techniques for disparity estimation is outlined. Section IV provides an overview of wavelet decomposition. Section V details the multiresolution approach taken. In Section VI, the typical computational gains and compression ratios possible with this scheme are computed. Subjective and objective evaluations of several different compressed stereo image pairs highlight the efficacy of the proposed compression scheme. Possible extensions of this approach to stereo image sequence compression are discussed in the last section.
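The block-based scheme described above is analogous to block motion estimation restricted to horizontal shifts. A single-resolution sketch follows (block size, search range, and SAD matching are illustrative assumptions; the hierarchical version would repeat this search coarse-to-fine across resolution levels, using each coarse estimate to narrow the search at the next finer level):

```python
import numpy as np

def block_disparity(left, right, block=8, max_disp=16):
    """Block-based disparity estimation, motion-estimation style.

    For each block of the left image, search over horizontal shifts d
    in the right image and keep the one minimizing the sum of absolute
    differences (SAD). Returns one disparity per block.
    (Illustrative sketch; block size and search range are assumptions.)
    """
    H, W = left.shape
    disp = np.zeros((H // block, W // block), dtype=int)
    for bi in range(H // block):
        for bj in range(W // block):
            i, j = bi * block, bj * block
            ref = left[i:i + block, j:j + block].astype(np.int32)
            best_sad, best_d = None, 0
            # A left-image point at x appears at x - d in the right image.
            for d in range(0, min(max_disp, j) + 1):
                cand = right[i:i + block, j - d:j - d + block].astype(np.int32)
                sad = np.abs(ref - cand).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_d = sad, d
            disp[bi, bj] = best_d
    return disp
```

Only the disparity field and the block residuals then need to be coded for the second image, which is the source of the compression gain.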
A Depth Map Representation for Real-Time Transmission and View-Based Rendering of a Dynamic 3D Scene
Segmentation Based Coding Of Stereoscopic Image Sequences
A binocular disparity based segmentation scheme to compactly represent one image of a stereoscopic image pair given the other image was proposed earlier by us. That scheme adapted the excess bit count, needed to code the additional image, to the binocular disparity detail present in the image pair. This paper addresses the issue of extending such a segmentation in the temporal dimension to achieve efficient stereoscopic sequence compression. The easiest conceivable temporal extension would be to code one of the sequences using an MPEG-type scheme while the frames of the other stream are coded based on the segmentation. However, such independent compression of one of the streams fails to take advantage of the segmentation or the additional disparity information available. To achieve better compression by exploiting this additional information, we propose the following scheme. Each frame in one of the streams is segmented based on disparity. An MPEG-type frame structure is used for motion..