2,766 research outputs found
Providing 3D video services: the challenge from 2D to 3DTV quality of experience
Recently, three-dimensional (3D) video has decisively burst onto the entertainment industry scene, and has arrived in households even before the standardization process has been completed. 3D television (3DTV) adoption and deployment can be seen as a major leap in television history, similar to previous transitions from black and white (B&W) to color, from analog to digital television (TV), and from standard definition to high definition. In this paper, we analyze current 3D video technology trends in order to define a taxonomy of the availability and possible introduction of 3D-based services. We also propose an audiovisual network services architecture which provides a smooth transition from two-dimensional (2D) to 3DTV in an Internet Protocol (IP)-based scenario. Based on subjective assessment tests, we also analyze those factors which will influence the quality of experience in those 3D video services, focusing on effects of both coding and transmission errors. In addition, examples of the application of the architecture and results of assessment tests are provided
Recommended from our members
Multimedia delivery in the future internet
The term “Networked Media” implies that all kinds of media including text, image, 3D graphics, audio
and video are produced, distributed, shared, managed and consumed on-line through various networks,
like the Internet, Fiber, WiFi, WiMAX, GPRS, 3G and so on, in a convergent manner [1]. This white
paper is the contribution of the Media Delivery Platform (MDP) cluster and aims to cover the Networked
challenges of the Networked Media in the transition to the Future of the Internet.
Internet has evolved and changed the way we work and live. End users of the Internet have been confronted
with a bewildering range of media, services and applications and of technological innovations concerning
media formats, wireless networks, terminal types and capabilities. And there is little evidence that the pace
of this innovation is slowing. Today, over one billion of users access the Internet on regular basis, more
than 100 million users have downloaded at least one (multi)media file and over 47 millions of them do so
regularly, searching in more than 160 Exabytes1 of content. In the near future these numbers are expected
to exponentially rise. It is expected that the Internet content will be increased by at least a factor of 6, rising
to more than 990 Exabytes before 2012, fuelled mainly by the users themselves. Moreover, it is envisaged
that in a near- to mid-term future, the Internet will provide the means to share and distribute (new)
multimedia content and services with superior quality and striking flexibility, in a trusted and personalized
way, improving citizens’ quality of life, working conditions, edutainment and safety.
In this evolving environment, new transport protocols, new multimedia encoding schemes, cross-layer inthe
network adaptation, machine-to-machine communication (including RFIDs), rich 3D content as well as
community networks and the use of peer-to-peer (P2P) overlays are expected to generate new models of
interaction and cooperation, and be able to support enhanced perceived quality-of-experience (PQoE) and
innovative applications “on the move”, like virtual collaboration environments, personalised services/
media, virtual sport groups, on-line gaming, edutainment. In this context, the interaction with content
combined with interactive/multimedia search capabilities across distributed repositories, opportunistic P2P
networks and the dynamic adaptation to the characteristics of diverse mobile terminals are expected to
contribute towards such a vision.
Based on work that has taken place in a number of EC co-funded projects, in Framework Program 6 (FP6)
and Framework Program 7 (FP7), a group of experts and technology visionaries have voluntarily
contributed in this white paper aiming to describe the status, the state-of-the art, the challenges and the way
ahead in the area of Content Aware media delivery platforms
Visual Distortions in 360-degree Videos.
Omnidirectional (or 360°) images and videos are emergent signals being used in many areas, such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360° content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of the distortions are specific to the nature of 360° images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360° signals going through the different processing elements of the visual communication pipeline. While their impact on viewers' visual perception and the immersive experience at large is still unknown-thus, it is an open research topic-this review serves the purpose of proposing a taxonomy of the visual distortions that can be encountered in 360° signals. Their underlying causes in the end-to-end 360° content distribution pipeline are identified. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and allowing the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360° content in interactive and immersive applications
WAVELET BASED DATA HIDING OF DEM IN THE CONTEXT OF REALTIME 3D VISUALIZATION (Visualisation 3D Temps-Réel à Distance de MNT par Insertion de Données Cachées Basée Ondelettes)
The use of aerial photographs, satellite images, scanned maps and digital elevation models necessitates the setting up of strategies for the storage and visualization of these data. In order to obtain a three dimensional visualization it is necessary to drape the images, called textures, onto the terrain geometry, called Digital Elevation Model (DEM). Practically, all these information are stored in three different files: DEM, texture and position/projection of the data in a geo-referential system. In this paper we propose to stock all these information in a single file for the purpose of synchronization. For this we have developed a wavelet-based embedding method for hiding the data in a colored image. The texture images containing hidden DEM data can then be sent from the server to a client in order to effect 3D visualization of terrains. The embedding method is integrable with the JPEG2000 coder to accommodate compression and multi-resolution visualization. Résumé L'utilisation de photographies aériennes, d'images satellites, de cartes scannées et de modèles numériques de terrains amène à mettre en place des stratégies de stockage et de visualisation de ces données. Afin d'obtenir une visualisation en trois dimensions, il est nécessaire de lier ces images appelées textures avec la géométrie du terrain nommée Modèle Numérique de Terrain (MNT). Ces informations sont en pratiques stockées dans trois fichiers différents : MNT, texture, position et projection des données dans un système géo-référencé. Dans cet article, nous proposons de stocker toutes ces informations dans un seul fichier afin de les synchroniser. Nous avons développé pour cela une méthode d'insertion de données cachées basée ondelettes dans une image couleur. Les images de texture contenant les données MNT cachées peuvent ensuite être envoyées du serveur au client afin d'effectuer une visualisation 3D de terrains. Afin de combiner une visualisation en multirésolution et une compression, l'insertion des données cachées est intégrable dans le codeur JPEG 2000
Asymmetric 3D video coding based on regions of perceptual relevance
This dissertation presents a study and experimental research on asymmetric coding of
stereoscopic video. A review on 3D technologies, video formats and coding is rst presented
and then particular emphasis is given to asymmetric coding of 3D content and
performance evaluation methods, based on subjective measures, of methods using asymmetric
coding.
The research objective was de ned to be an extension of the current concept of asymmetric
coding for stereo video. To achieve this objective the rst step consists in de ning
regions in the spatial dimension of auxiliary view with di erent perceptual relevance
within the stereo pair, which are identi ed by a binary mask. Then these regions are
encoded with better quality (lower quantisation) for the most relevant ones and worse
quality (higher quantisation) for the those with lower perceptual relevance. The actual
estimation of the relevance of a given region is based on a measure of disparity according
to the absolute di erence between views. To allow encoding of a stereo sequence using
this method, a reference H.264/MVC encoder (JM) has been modi ed to allow additional
con guration parameters and inputs. The nal encoder is still standard compliant.
In order to show the viability of the method subjective assessment tests were performed
over a wide range of objective qualities of the auxiliary view. The results of these tests
allow us to prove 3 main goals. First, it is shown that the proposed method can be
more e cient than traditional asymmetric coding when encoding stereo video at higher
qualities/rates. The method can also be used to extend the threshold at which uniform
asymmetric coding methods start to have an impact on the subjective quality perceived
by the observers. Finally the issue of eye dominance is addressed. Results from stereo
still images displayed over a short period of time showed it has little or no impact on the
proposed method
Algorithms for compression of high dynamic range images and video
The recent advances in sensor and display technologies have brought upon the High Dynamic Range (HDR) imaging capability. The modern multiple exposure HDR sensors can achieve the dynamic range of 100-120 dB and LED and OLED display devices have contrast ratios of 10^5:1 to 10^6:1.
Despite the above advances in technology the image/video compression algorithms and associated hardware are yet based on Standard Dynamic Range (SDR) technology, i.e. they operate within an effective dynamic range of up to 70 dB for 8 bit gamma corrected images. Further the existing infrastructure for content distribution is also designed for SDR, which creates interoperability problems with true HDR capture and display equipment.
The current solutions for the above problem include tone mapping the HDR content to fit SDR. However this approach leads to image quality associated problems, when strong dynamic range compression is applied. Even though some HDR-only solutions have been proposed in literature, they are not interoperable with current SDR infrastructure and are thus typically used in closed systems.
Given the above observations a research gap was identified in the need for efficient algorithms for the compression of still images and video, which are capable of storing full dynamic range and colour gamut of HDR images and at the same time backward compatible with existing SDR infrastructure. To improve the usability of SDR content it is vital that any such algorithms should accommodate different tone mapping operators, including those that are spatially non-uniform.
In the course of the research presented in this thesis a novel two layer CODEC architecture is introduced for both HDR image and video coding. Further a universal and computationally efficient approximation of the tone mapping operator is developed and presented. It is shown that the use of perceptually uniform colourspaces for internal representation of pixel data enables improved compression efficiency of the algorithms. Further proposed novel approaches to the compression of metadata for the tone mapping operator is shown to improve compression performance for low bitrate video content. Multiple compression algorithms are designed, implemented and compared and quality-complexity trade-offs are identified. Finally practical aspects of implementing the developed algorithms are explored by automating the design space exploration flow and integrating the high level systems design framework with domain specific tools for synthesis and simulation of multiprocessor systems. The directions for further work are also presented
T4DT: Tensorizing Time for Learning Temporal 3D Visual Data
Unlike 2D raster images, there is no single dominant representation for 3D
visual data processing. Different formats like point clouds, meshes, or
implicit functions each have their strengths and weaknesses. Still, grid
representations such as signed distance functions have attractive properties
also in 3D. In particular, they offer constant-time random access and are
eminently suitable for modern machine learning. Unfortunately, the storage size
of a grid grows exponentially with its dimension. Hence they often exceed
memory limits even at moderate resolution. This work explores various low-rank
tensor formats, including the Tucker, tensor train, and quantics tensor train
decompositions, to compress time-varying 3D data. Our method iteratively
computes, voxelizes, and compresses each frame's truncated signed distance
function and applies tensor rank truncation to condense all frames into a
single, compressed tensor that represents the entire 4D scene. We show that
low-rank tensor compression is extremely compact to store and query
time-varying signed distance functions. It significantly reduces the memory
footprint of 4D scenes while surprisingly preserving their geometric quality.
Unlike existing iterative learning-based approaches like DeepSDF and NeRF, our
method uses a closed-form algorithm with theoretical guarantees
- …