Foveated Video Streaming for Cloud Gaming
Good user experience with interactive cloud-based multimedia applications,
such as cloud gaming and cloud-based VR, requires low end-to-end latency and
large amounts of downstream network bandwidth at the same time. In this paper,
we present a foveated video streaming system for cloud gaming. The system
adapts video stream quality by adjusting the encoding parameters on the fly to
match the player's gaze position. We conduct measurements with a prototype that
we developed for a cloud gaming system in conjunction with eye tracker
hardware. Evaluation results suggest that such foveated streaming can reduce
bandwidth requirements by more than 50%, depending on the parametrization of the foveated video coding, and that the approach is feasible from the latency perspective.
Comment: Submitted to the IEEE 19th International Workshop on Multimedia Signal Processing.
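The abstract does not spell out how the encoder is parametrized. As a rough, hypothetical illustration of gaze-adaptive encoding, the sketch below (Python with NumPy) builds a per-macroblock quantization-parameter (QP) offset map that grows with distance from the reported gaze position; the block size, offset range, and falloff are illustrative assumptions, not the parametrization used in the paper.

```python
import numpy as np

def qp_offset_map(width, height, gaze_xy, block=16, max_offset=10, inner_radius=0.1):
    """Per-block QP offsets: 0 near the gaze point, up to max_offset in the periphery.

    gaze_xy is in pixels; inner_radius is the fraction of the frame diagonal that
    stays at full quality. All numbers here are illustrative defaults.
    """
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    # Block-center coordinates.
    xs = (np.arange(cols) + 0.5) * block
    ys = (np.arange(rows) + 0.5) * block
    dx = xs[None, :] - gaze_xy[0]
    dy = ys[:, None] - gaze_xy[1]
    dist = np.sqrt(dx**2 + dy**2)
    diag = np.hypot(width, height)
    # Linear ramp: no offset inside the foveal region, max_offset at ~half the diagonal.
    ramp = np.clip((dist / diag - inner_radius) / (0.5 - inner_radius), 0.0, 1.0)
    return np.round(ramp * max_offset).astype(int)

# Example: 1080p frame, gaze slightly left of center.
offsets = qp_offset_map(1920, 1080, gaze_xy=(800, 540))
print(offsets.shape, offsets.min(), offsets.max())
```

A larger QP offset in the periphery lets the encoder spend fewer bits there while the foveal region keeps its original quality.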
Foveated Video Streaming for Cloud Gaming
Video gaming is generally a computationally intensive application, and providing a pleasant user experience may require specialized hardware such as Graphics Processing Units (GPUs). Computational resources and power consumption are constraints that limit visually complex gaming on, for example, laptops, tablets, and smartphones. Cloud gaming is a possible approach to providing a pleasant gaming experience on thin clients with limited computational and energy resources. In a cloud gaming architecture, the gameplay video is rendered and encoded in the cloud and streamed to a client, where it is displayed. User inputs are captured at the client and streamed back to the server, where they are relayed to the game. High quality of experience requires the streamed video to be of high visual quality, which translates to substantial downstream bandwidth requirements. The visual acuity of the human eye is non-uniform: it is highest along the optical axis of the eye and drops off rapidly away from it. This phenomenon, called foveation, makes the practice of encoding all areas of a video frame at the same resolution wasteful.
In this thesis, foveated video streaming from a cloud gaming server to a cloud gaming client is investigated. A prototype cloud gaming system with foveated video streaming is implemented. The cloud gaming server of the prototype is configured to encode gameplay video in a foveated fashion based on gaze location data provided by the cloud gaming client. The effect of foveated encoding on the output bitrate of the streamed video is investigated. Measurements are performed using games from various genres and with different player points of view to explore changes in video bitrate with different parameters of foveation. Latencies involved in foveated video streaming for cloud gaming, including the latency of the eye tracker used in the thesis, are also briefly discussed.
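The prototype has the client deliver gaze samples to the server; the transport and message format are not given in the abstract, so the following is only a plausible stand-in: a client loop that forwards gaze coordinates from a hypothetical tracker callback to the server over UDP.

```python
import json
import socket
import time

def read_gaze():
    """Hypothetical stand-in for an eye-tracker SDK call; a real client would
    pull samples from the tracker's gaze stream instead."""
    return {"x": 0.5, "y": 0.5, "t": time.time()}  # normalized screen coordinates

def stream_gaze(server=("127.0.0.1", 9999), rate_hz=60, count=600):
    """Forward gaze samples to the cloud gaming server as small UDP datagrams.

    UDP keeps the added latency minimal, and occasional loss is tolerable because
    a newer sample supersedes a lost one. Address and wire format are assumptions.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for _ in range(count):
            sock.sendto(json.dumps(read_gaze()).encode(), server)
            time.sleep(1.0 / rate_hz)
    finally:
        sock.close()

stream_gaze(count=10)  # send a short burst of samples for demonstration
```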
On the Interplay of Foveated Rendering and Video Encoding
Humans have sharp central vision but low peripheral visual acuity. Prior work has taken advantage of this phenomenon in two ways: foveated rendering (FR) reduces the computational workload of rendering by producing lower visual quality for peripheral regions, and foveated video encoding (FVE) reduces the bitrate of streamed video through heavier compression of peripheral regions. Remote rendering systems require both rendering and video encoding, and the two techniques can be combined to reduce both computing and bandwidth consumption. We report early results from such a combination with remote VR rendering. The results highlight that FR causes a large bitrate overhead when combined with normal video encoding, but combining it with FVE can mitigate this overhead.
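One way to picture the combination is a single eccentricity-driven lookup that drives both stages: coarser shading rates for FR and larger QP offsets for FVE as the angle from the gaze direction grows. The thresholds and values below are illustrative assumptions, not those measured in the paper.

```python
def foveation_params(ecc_deg):
    """Map eccentricity (degrees from the gaze direction) to a shading rate for
    foveated rendering and a QP offset for foveated encoding.

    The thresholds and values are illustrative placeholders.
    """
    if ecc_deg < 5:        # foveal region: full shading rate, no extra compression
        return {"shading_rate": "1x1", "qp_offset": 0}
    elif ecc_deg < 15:     # near periphery: coarser shading, mildly heavier compression
        return {"shading_rate": "2x2", "qp_offset": 4}
    else:                  # far periphery: coarsest shading, heaviest compression
        return {"shading_rate": "4x4", "qp_offset": 8}

for ecc in (2, 10, 30):
    print(ecc, foveation_params(ecc))
```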
Foveated Streaming of Real-Time Graphics
Remote rendering systems comprise powerful servers that render graphics on behalf of low-end client devices and stream the graphics as compressed video, enabling high-end gaming and Virtual Reality on those devices. One key challenge with them is the amount of bandwidth required for streaming high-quality video. Humans have spatially non-uniform visual acuity: we have sharp central vision, but our ability to discern details decreases rapidly with angular distance from the point of gaze. This phenomenon, called foveation, can be exploited to reduce the need for bandwidth. In this paper, we study three different methods to produce a foveated video stream of real-time rendered graphics in a remote rendering system: 1) foveated shading as part of the rendering pipeline, 2) foveation as a post-processing step after rendering and before video encoding, and 3) foveated video encoding. We report results from a number of experiments with these methods. They suggest that foveated rendering alone does not help save bandwidth. Instead, the two other methods decrease the resulting video bitrate significantly, but they also have different quality-per-bit and latency profiles, which makes them desirable solutions in slightly different situations.
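Method 2 (foveation as a post-processing step) can be approximated by blurring the periphery before the frame reaches the encoder, which removes high-frequency detail the viewer cannot resolve and lets a standard encoder spend fewer bits there. A minimal sketch with NumPy and SciPy follows; the ring radii and blur strengths are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate_frame(frame, gaze_xy, rings=((0.15, 2.0), (0.35, 5.0))):
    """Keep a sharp foveal region and blur the periphery in concentric rings.

    rings: (inner radius as a fraction of the frame diagonal, Gaussian sigma).
    Pixels beyond each radius receive at least that ring's blur.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1]) / np.hypot(w, h)
    out = frame.astype(np.float32).copy()
    for radius, sigma in rings:
        # Blur only spatial axes; leave the color channels untouched.
        blurred = gaussian_filter(frame.astype(np.float32), sigma=(sigma, sigma, 0))
        mask = dist >= radius
        out[mask] = blurred[mask]
    return out.astype(frame.dtype)

# Example: blur the periphery of a random 1080p RGB frame around a gaze point.
frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
foveated = foveate_frame(frame, gaze_xy=(960, 540))
```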
Video Caching, Analytics and Delivery at the Wireless Edge: A Survey and Future Directions
Future wireless networks will provide high-bandwidth, low-latency, and ultra-reliable Internet connectivity to meet the requirements of different applications, ranging from mobile broadband to the Internet of Things. To this end, mobile edge caching, computing, and communication (edge-C3) have emerged to bring network resources (i.e., bandwidth, storage, and computing) closer to end users. Edge-C3 improves network resource utilization as well as the quality of experience (QoE) of end users. Recently, several video-oriented mobile applications (e.g., live content sharing, gaming, and augmented reality) have leveraged edge-C3 in diverse scenarios involving video streaming in both the downlink and the uplink. Hence, a large number of recent works have studied the implications of video analysis and streaming through edge-C3. This article presents an in-depth survey of video edge-C3 challenges and state-of-the-art solutions in next-generation wireless and mobile networks. Specifically, it includes: a tutorial on video streaming in mobile networks (e.g., video encoding and adaptive bitrate streaming); an overview of mobile network architectures, enabling technologies, and applications for video edge-C3; video edge computing and analytics in uplink scenarios (e.g., architectures, analytics, and applications); and video edge caching, computing, and communication methods in downlink scenarios (e.g., collaborative, popularity-based, and context-aware). A new taxonomy for video edge-C3 is proposed, and the major contributions of recent studies are first highlighted and then systematically compared. Finally, several open problems and key challenges for future research are outlined.
On Improving QoE of Remote Rendered Graphics
A new class of interactive multimedia experiences leverages real-time remote rendering with video encoding to provide high-quality visual experiences on low-end devices, so-called thin clients. The basic architecture entails offloading some or all of the rendering calculations of a complex computer graphics scene to a remote server, often a cloud graphics server, which renders the scene, encodes it, and sends it to a client as video. The video is then decoded by the thin client and displayed to the user. Cloud gaming and Cloud Virtual Reality (VR) are two example use cases of such experiences. These applications have two principal constraints: downstream bandwidth and motion-to-photon (M2P) latency. The quality of experience (QoE) of such applications can be improved by reducing the downstream bandwidth needed for a given visual quality of the encoded video and by reducing the perceived M2P latency, that is, the perceived latency between a user action and the corresponding frame update at the client. In this thesis, we investigate avenues to improve the QoE of remotely rendered graphics applications by addressing the above constraints. We evaluate the feasibility of leveraging the characteristics of the Human Visual System (HVS) to reduce the downstream bandwidth needed for streaming high-quality graphics video. Specifically, we investigate the phenomenon of foveation in the context of real-time video encoding and evaluate different parametrizations and schemes of foveated video encoding (FVE). We also investigate whether synergies exist between FVE and foveated rendering (FR). To address the low-latency requirements of interactive remotely rendered graphics applications, we investigate Machine Learning (ML) based approaches to predict the human motion kinematics used by a rendering engine to render a scene. Specifically, we investigate head pose and gaze prediction using past pose and gaze data. Accurate head pose information is critical for field-of-view (FoV) rendering, and accurate gaze information is critical for foveated encoding and rendering. The investigated approaches focus on lightweight data ingest and low-latency inference to avoid introducing additional latency into the rendering and media delivery pipeline.
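For the prediction side, the abstract stresses lightweight ingest and low-latency inference. As a baseline illustration only (the thesis evaluates ML-based approaches, not this), a constant-velocity extrapolation of the most recent pose or gaze samples is about as lightweight as a predictor can get:

```python
import numpy as np

def constant_velocity_forecast(samples, timestamps, horizon_s):
    """Extrapolate the latest motion linearly over the prediction horizon.

    samples: (N, D) array of past pose or gaze measurements (e.g. yaw/pitch/roll
    or normalized gaze coordinates); timestamps: (N,) seconds. This baseline is
    an assumption for illustration, not the thesis' final model.
    """
    velocity = (samples[-1] - samples[-2]) / (timestamps[-1] - timestamps[-2])
    return samples[-1] + velocity * horizon_s

# Example: forecast gaze 50 ms ahead from two samples spaced 10 ms apart.
past = np.array([[0.48, 0.52], [0.50, 0.51]])
t = np.array([0.00, 0.01])
print(constant_velocity_forecast(past, t, horizon_s=0.05))
```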
Cloud Gaming With Foveated Video Encoding
Cloud gaming enables playing high-end games, originally designed for PC or game console setups, on low-end devices such as netbooks and smartphones by offloading graphics rendering to GPU-powered cloud servers. However, transmitting the high-resolution video requires a large amount of network bandwidth, even though it is a compressed video stream. Foveated video encoding (FVE) reduces the bandwidth requirement by taking advantage of the non-uniform acuity of the human visual system and by knowing where the user is looking. Based on a consumer-grade real-time eye tracker and an open-source cloud gaming platform, we provide a cloud gaming FVE prototype that is game-agnostic and requires no modifications to the underlying game engine. In this article, we describe the prototype and its evaluation through measurements with representative games from different genres to understand the effect of parametrization of the FVE scheme on bandwidth requirements and to understand its feasibility from the latency perspective. We also present results from a user study on first-person shooter games. The results suggest that it is possible to find a "sweet spot" for the encoding parameters so that users hardly notice the presence of foveated encoding, while at the same time the scheme yields most of the achievable bandwidth savings.
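The reported savings depend on how large the full-quality foveal region is. A crude, purely illustrative way to see why the parametrization matters is to compute how much of the frame falls outside a foveal circle of a given radius (gaze assumed at the frame center); the actual bandwidth figures in the article come from measurements, not from an estimate like this.

```python
import numpy as np

def peripheral_fraction(width, height, inner_radius):
    """Fraction of the frame area outside the full-quality foveal circle.

    inner_radius is a fraction of the frame diagonal; gaze is assumed at the
    frame center. A rough proxy for where foveated encoding can save bits.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    dist = np.hypot(xs - width / 2, ys - height / 2) / np.hypot(width, height)
    return float((dist > inner_radius).mean())

# Sweep a few foveal-region sizes for a 1080p frame.
for r in (0.05, 0.10, 0.20):
    print(f"inner_radius={r:.2f}: {peripheral_fraction(1920, 1080, r):.0%} of pixels in periphery")
```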
Learning to Predict Head Pose in Remotely-Rendered Virtual Reality
Accurate characterization of Head Mounted Display (HMD) pose in a virtual scene is essential for rendering immersive graphics in Extended Reality (XR). Remote rendering employs servers in the cloud or at the edge of the network to overcome the computational limitations of either standalone or tethered HMDs. Unfortunately, it increases the latency experienced by the user; for this reason, predicting HMD pose in advance is highly beneficial, as long as it achieves high accuracy. This work provides a thorough characterization of solutions that forecast HMD pose in remotely rendered virtual reality (VR) by considering six degrees of freedom. Specifically, it provides an extensive evaluation of pose representations, forecasting methods, machine learning models, and the use of multiple modalities along with joint and separate training. In particular, a novel three-point representation of pose is introduced together with a data fusion scheme for long short-term memory (LSTM) neural networks. Our findings show that machine learning models benefit from using multiple modalities, even though simple statistical models perform surprisingly well. Moreover, joint training is comparable to separate training with carefully chosen pose representations and data fusion strategies.
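As a hedged sketch of the forecasting setup (not the paper's exact architecture), the PyTorch model below maps a window of past poses to a pose a fixed horizon ahead. The flat 9-D "three points on the HMD" input is only loosely inspired by the paper's three-point representation, and the hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class PoseForecaster(nn.Module):
    """LSTM that maps a window of past poses to a pose a fixed horizon ahead.

    A pose here is a flat 9-D vector (three 3-D points rigidly attached to the
    HMD); the representation, fusion scheme, and hyperparameters in the paper
    differ from these placeholders.
    """

    def __init__(self, pose_dim=9, hidden=128, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, history):            # history: (batch, window, pose_dim)
        out, _ = self.lstm(history)        # out: (batch, window, hidden)
        return self.head(out[:, -1])       # predicted pose at the target horizon

# Toy usage: 60-sample history (e.g. one second at 60 Hz), batch of 4,
# random data standing in for real HMD traces.
model = PoseForecaster()
history = torch.randn(4, 60, 9)
future_pose = model(history)               # shape: (4, 9)
print(future_pose.shape)
```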