1,846 research outputs found
Foveated Video Streaming for Cloud Gaming
Good user experience with interactive cloud-based multimedia applications,
such as cloud gaming and cloud-based VR, requires low end-to-end latency and
large amounts of downstream network bandwidth at the same time. In this paper,
we present a foveated video streaming system for cloud gaming. The system
adapts video stream quality by adjusting the encoding parameters on the fly to
match the player's gaze position. We conduct measurements with a prototype that
we developed for a cloud gaming system in conjunction with eye tracker
hardware. Evaluation results suggest that such foveated streaming can reduce
bandwidth requirements by even more than 50% depending on parametrization of
the foveated video coding and that it is feasible from the latency perspective.Comment: Submitted to: IEEE 19th International Workshop on Multimedia Signal
Processin
JND-Based Perceptual Video Coding for 4:4:4 Screen Content Data in HEVC
The JCT-VC standardized Screen Content Coding (SCC) extension in the HEVC HM
RExt + SCM reference codec offers an impressive coding efficiency performance
when compared with HM RExt alone; however, it is not significantly perceptually
optimized. For instance, it does not include advanced HVS-based perceptual
coding methods, such as JND-based spatiotemporal masking schemes. In this
paper, we propose a novel JND-based perceptual video coding technique for HM
RExt + SCM. The proposed method is designed to further improve the compression
performance of HM RExt + SCM when applied to YCbCr 4:4:4 SC video data. In the
proposed technique, luminance masking and chrominance masking are exploited to
perceptually adjust the Quantization Step Size (QStep) at the Coding Block (CB)
level. Compared with HM RExt 16.10 + SCM 8.0, the proposed method considerably
reduces bitrates (Kbps), with a maximum reduction of 48.3%. In addition to
this, the subjective evaluations reveal that SC-PAQ achieves visually lossless
coding at very low bitrates.Comment: Preprint: 2018 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP 2018
Spatiotemporal adaptive quantization for the perceptual video coding of RGB 4:4:4 data
Due to the spectral sensitivity phenomenon of the Human Visual System (HVS), the color channels of raw RGB 4:4:4 sequences contain significant psychovisual redundancies; these redundancies can be perceptually quantized. The default quantization systems in the HEVC standard are known as Uniform Reconstruction Quantization (URQ) and Rate Distortion Optimized Quantization (RDOQ); URQ and RDOQ are not perceptually optimized for the coding of RGB 4:4:4 video data. In this paper, we propose a novel spatiotemporal perceptual quantization technique named SPAQ. With application for RGB 4:4:4 video data, SPAQ exploits HVS spectral sensitivity-related color masking in addition to spatial masking and temporal masking; SPAQ operates at the Coding Block (CB) level and the Prediction Unit (PU) level. The proposed technique perceptually adjusts the Quantization Step Size (QStep) at the CB level if high variance spatial data in G, B and R CBs is detected and also if high motion vector magnitudes in PUs are detected. Compared with anchor 1 (HEVC HM 16.17 RExt), SPAQ considerably reduces bitrates with a maximum reduction of approximately 80%. The Mean Opinion Score (MOS) in the subjective evaluations, in addition to the SSIM scores, show that SPAQ successfully achieves perceptually lossless compression compared with anchors
Neural Video Compression with Diverse Contexts
For any video codecs, the coding efficiency highly relies on whether the
current signal to be encoded can find the relevant contexts from the previous
reconstructed signals. Traditional codec has verified more contexts bring
substantial coding gain, but in a time-consuming manner. However, for the
emerging neural video codec (NVC), its contexts are still limited, leading to
low compression ratio. To boost NVC, this paper proposes increasing the context
diversity in both temporal and spatial dimensions. First, we guide the model to
learn hierarchical quality patterns across frames, which enriches long-term and
yet high-quality temporal contexts. Furthermore, to tap the potential of
optical flow-based coding framework, we introduce a group-based offset
diversity where the cross-group interaction is proposed for better context
mining. In addition, this paper also adopts a quadtree-based partition to
increase spatial context diversity when encoding the latent representation in
parallel. Experiments show that our codec obtains 23.5% bitrate saving over
previous SOTA NVC. Better yet, our codec has surpassed the under-developing
next generation traditional codec/ECM in both RGB and YUV420 colorspaces, in
terms of PSNR. The codes are at https://github.com/microsoft/DCVC.Comment: Accepted by CVPR 2023. Codes are at https://github.com/microsoft/DCV
Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
RLFC: Random Access Light Field Compression using Key Views and Bounded Integer Encoding
We present a new hierarchical compression scheme for encoding light field
images (LFI) that is suitable for interactive rendering. Our method (RLFC)
exploits redundancies in the light field images by constructing a tree
structure. The top level (root) of the tree captures the common high-level
details across the LFI, and other levels (children) of the tree capture
specific low-level details of the LFI. Our decompressing algorithm corresponds
to tree traversal operations and gathers the values stored at different levels
of the tree. Furthermore, we use bounded integer sequence encoding which
provides random access and fast hardware decoding for compressing the blocks of
children of the tree. We have evaluated our method for 4D two-plane
parameterized light fields. The compression rates vary from 0.08 - 2.5 bits per
pixel (bpp), resulting in compression ratios of around 200:1 to 20:1 for a PSNR
quality of 40 to 50 dB. The decompression times for decoding the blocks of LFI
are 1 - 3 microseconds per channel on an NVIDIA GTX-960 and we can render new
views with a resolution of 512X512 at 200 fps. Our overall scheme is simple to
implement and involves only bit manipulations and integer arithmetic
operations.Comment: Accepted for publication at Symposium on Interactive 3D Graphics and
Games (I3D '19
- …