2,741 research outputs found
Optimized Data Representation for Interactive Multiview Navigation
In contrast to traditional media streaming services, where the same media
content is delivered to different users, interactive multiview navigation
applications enable users to choose their own viewpoints and freely navigate in
a 3-D scene. The interactivity brings new challenges in addition to the
classical rate-distortion trade-off, which considers only the compression
performance and viewing quality. On the one hand, interactivity necessitates
sufficient viewpoints for richer navigation; on the other hand, it requires
low bandwidth and delay costs for smooth navigation during view
transitions. In this paper, we formally describe the novel trade-offs posed by
the navigation interactivity and classical rate-distortion criterion. Based on
an original formulation, we look for the optimal design of the data
representation by introducing novel rate and distortion models and practical
solving algorithms. Experiments show that the proposed data representation
method outperforms the baseline solution by providing lower resource
consumption and higher visual quality in all navigation configurations, which
confirms the potential of the proposed data representation in
practical interactive navigation systems.
Navigation domain representation for interactive multiview imaging
Enabling users to interactively navigate through different viewpoints of a
static scene is an interesting new functionality in 3D streaming systems. While
it opens exciting perspectives towards rich multimedia applications, it
requires the design of novel representations and coding techniques in order to
solve the new challenges imposed by interactive navigation. Interactivity
clearly brings new design constraints: the encoder is unaware of the exact
decoding process, while the decoder has to reconstruct information from
incomplete subsets of data, since the server generally cannot transmit images
for all possible viewpoints due to resource constraints. In this paper, we
propose a novel multiview data representation that satisfies bandwidth
and storage constraints in an interactive multiview streaming system. In
particular, we partition the multiview navigation domain into segments, each of
which is described by a reference image and some auxiliary information. The
auxiliary information enables the client to recreate any viewpoint in the
navigation segment via view synthesis. The decoder is then able to navigate
freely in the segment without further data requests to the server; it requests
additional data only when it moves to a different segment. We discuss the
benefits of this novel representation in interactive navigation systems and
further propose a method to optimize the partitioning of the navigation domain
into independent segments, under bandwidth and storage constraints.
Experimental results confirm the potential of the proposed representation;
namely, our system achieves compression performance similar to classical
inter-view coding, while it provides the high level of flexibility that is
required for interactive streaming. Hence, our new framework represents a
promising solution for 3D data representation in novel interactive multimedia
services.
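The segment-based navigation described in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional navigation domain indexed by view number and a uniform partition into equal-sized segments; the abstract instead optimizes the partition under bandwidth and storage constraints, which is not reproduced here. The function names and the no-caching assumption (a client re-requests a segment it has left and re-entered) are hypothetical.

```python
from bisect import bisect_right


def partition_domain(num_views, segment_size):
    """Return the first view index of each segment.

    Assumption: a uniform partition, used only for illustration.
    """
    return list(range(0, num_views, segment_size))


def segment_of(view, starts):
    """Index of the segment that contains the given view."""
    return bisect_right(starts, view) - 1


def count_requests(path, starts):
    """Count server requests along a navigation path.

    The client fetches one reference image plus auxiliary information
    when it enters a segment, then synthesizes every view inside that
    segment locally. Assumption: nothing is cached after leaving.
    """
    requests = 0
    current = None
    for view in path:
        segment = segment_of(view, starts)
        if segment != current:
            requests += 1
            current = segment
    return requests


starts = partition_domain(num_views=12, segment_size=4)
path = [0, 1, 2, 3, 4, 5, 4, 3, 9]
print(count_requests(path, starts))
```

Larger segments reduce the number of requests along a path but require more data per request, which is the kind of trade-off the partition optimization has to balance.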
Foveated Video Streaming for Cloud Gaming
Good user experience with interactive cloud-based multimedia applications,
such as cloud gaming and cloud-based VR, requires low end-to-end latency and
large amounts of downstream network bandwidth at the same time. In this paper,
we present a foveated video streaming system for cloud gaming. The system
adapts video stream quality by adjusting the encoding parameters on the fly to
match the player's gaze position. We conduct measurements with a prototype that
we developed for a cloud gaming system in conjunction with eye tracker
hardware. Evaluation results suggest that such foveated streaming can reduce
bandwidth requirements by more than 50%, depending on the parametrization of
the foveated video coding, and that it is feasible from the latency perspective.
Comment: Submitted to: IEEE 19th International Workshop on Multimedia Signal
Processing
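As a rough illustration of gaze-dependent quality adaptation, the sketch below builds a per-block map of quantization-parameter (QP) offsets that grows with distance from the gaze point. The abstract does not say which encoding parameters are adjusted, so the use of QP offsets, the linear ramp, and every parameter name here are assumptions rather than the authors' method.

```python
import math


def qp_offset_map(width_blocks, height_blocks, gaze, fovea_radius, slope, max_offset):
    """Per-block QP offsets: 0 inside the foveal region, rising outside it.

    gaze          -- (x, y) gaze position in block coordinates
    fovea_radius  -- radius, in blocks, that keeps full quality
    slope         -- offset added per block of distance beyond the radius
    max_offset    -- upper bound on the offset (coarsest quality)
    All of these are hypothetical tuning parameters.
    """
    gaze_x, gaze_y = gaze
    rows = []
    for y in range(height_blocks):
        row = []
        for x in range(width_blocks):
            distance = math.hypot(x - gaze_x, y - gaze_y)
            excess = max(0.0, distance - fovea_radius)
            row.append(min(max_offset, int(excess * slope)))
        rows.append(row)
    return rows


offsets = qp_offset_map(8, 4, gaze=(2, 1), fovea_radius=1.5, slope=2.0, max_offset=10)
for row in offsets:
    print(row)
```

A higher offset means coarser quantization, so blocks far from the gaze point spend fewer bits; the map would be recomputed whenever the eye tracker reports a new gaze position.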
Network streaming and compression for mixed reality tele-immersion
Bulterman, D.C.A. [Promotor]; Cesar, P.S. [Copromotor]
Content-Based Hyperspectral Image Compression Using a Multi-Depth Weighted Map With Dynamic Receptive Field Convolution
In content-based image compression, the importance map guides bit allocation based on its ability to represent the importance of image contents. In this paper, we improve the representational power of the importance map using a Squeeze-and-Excitation (SE) block, and propose a multi-depth structure to reconstruct non-important channel information at low bit rates. Furthermore, Dynamic Receptive Field convolution (DRFc) is introduced to improve the ability of normal convolution to extract edge information, so as to increase the weight of edge content in the importance map and improve the reconstruction quality of edge regions. Results indicate that our proposed method can extract an importance map with clear edges and fewer artifacts, which benefits bit rate allocation in content-based image compression. Compared with typical compression methods, our proposed method improves Peak Signal-to-Noise Ratio (PSNR), structural similarity (SSIM) and spectral angle mapper (SAM) scores on three public datasets, and produces a better visual result with sharp edges and fewer artifacts. In particular, our proposed method reduces the SAM by 42.8% compared to a recent state-of-the-art (SOTA) method at the same low bit rate (0.25 bpp) on the KAIST dataset.
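For readers unfamiliar with two of the metrics named in this abstract, the following sketch computes PSNR and the spectral angle using their standard definitions. It is not taken from the paper: the flat-list input format, the default peak value of 1.0, and the function names are assumptions made for illustration.

```python
import math


def psnr(reference, reconstructed, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length sample lists.

    Assumption: samples are normalized so that `peak` is the maximum value.
    """
    squared_errors = [(r - x) ** 2 for r, x in zip(reference, reconstructed)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def spectral_angle(ref_spectrum, rec_spectrum):
    """Angle in radians between two pixel spectra (lower is better)."""
    dot = sum(a * b for a, b in zip(ref_spectrum, rec_spectrum))
    norm_ref = math.sqrt(sum(a * a for a in ref_spectrum))
    norm_rec = math.sqrt(sum(b * b for b in rec_spectrum))
    if norm_ref == 0 or norm_rec == 0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_ref * norm_rec)))
    return math.acos(cosine)


print(psnr([0.0, 0.0], [0.1, 0.1]))
print(spectral_angle([1.0, 0.0], [0.0, 1.0]))
```

The spectral angle depends only on the direction of a spectrum, not its magnitude, so a uniformly brighter or darker reconstruction of a pixel still scores an angle close to zero.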
Light field image coding with flexible viewpoint scalability and random access
This paper proposes a novel light field image compression approach with viewpoint scalability and random access functionalities. Although current state-of-the-art image coding algorithms for light fields already achieve high compression ratios, there is a lack of support for such functionalities, which are important for ensuring compatibility with different displays/capturing devices, enhanced user interaction and low decoding delay. The proposed solution enables various encoding profiles with different flexible viewpoint scalability and random access capabilities, depending on the application scenario. When compared to other state-of-the-art methods, the proposed approach consistently presents higher bitrate savings (44% on average), namely when compared to the pseudo-video sequence coding approach based on HEVC. Moreover, the proposed scalable codec also outperforms the MuLE and WaSP verification models, achieving average bitrate savings of 37% and 47%, respectively. The various flexible encoding profiles proposed add fine control over the image prediction dependencies, which allows the tradeoff between coding efficiency and viewpoint random access to be exploited, consequently decreasing the maximum random access penalties that range from 0.60 to 0.15, for lenslet and HDCA light fields.
Visual Distortions in 360-degree Videos.
Omnidirectional (or 360°) images and videos are emergent signals being used in many areas, such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360° content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of the distortions are specific to the nature of 360° images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360° signals going through the different processing elements of the visual communication pipeline. While their impact on viewers' visual perception and the immersive experience at large is still unknown (and thus an open research topic), this review serves the purpose of proposing a taxonomy of the visual distortions that can be encountered in 360° signals. Their underlying causes in the end-to-end 360° content distribution pipeline are identified. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and allowing the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360° content in interactive and immersive applications.