18 research outputs found

    A New Fast Motion Estimation and Mode Decision algorithm for H.264 Depth Maps encoding in Free Viewpoint TV

    In this paper, we consider a scenario where 3D scenes are modeled through a View+Depth representation. This representation is to be used at the rendering side to generate synthetic views for free viewpoint video. The encoding of both types of data (view and depth) is carried out using two H.264/AVC encoders. In this scenario we address the reduction of the encoding complexity of the depth data. First, an analysis of the Mode Decision and Motion Estimation processes has been conducted on both view and depth sequences, in order to capture the correlation between them. Taking advantage of this correlation, we propose a fast mode decision and motion estimation algorithm for depth encoding. Results show that the proposed algorithm reduces the computational burden with a negligible loss in the quality of the rendered synthetic views. Quality measurements have been conducted using the Video Quality Metric.
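
    As a rough illustration of the kind of correlation-driven shortcut described above (not the paper's exact rules), the sketch below reuses the mode and motion vector chosen for the co-located texture macroblock to prune the candidate modes and shrink the motion search range for the depth macroblock. All mode names, thresholds, and window sizes are illustrative placeholders.

```python
# Illustrative sketch of a texture-guided fast mode decision / motion
# estimation policy for depth macroblocks (not the paper's exact rules).

LARGE_MODES = {"SKIP", "16x16", "16x8", "8x16"}   # coarse partitions
ALL_MODES   = LARGE_MODES | {"8x8", "8x4", "4x8", "4x4", "INTRA"}

def depth_candidates(texture_mb):
    """Choose which modes to test for a depth MB from its co-located texture MB."""
    if texture_mb["mode"] == "SKIP":
        # Static, homogeneous area: test only SKIP and 16x16.
        return ["SKIP", "16x16"]
    if texture_mb["mode"] in LARGE_MODES:
        # Moderate detail: drop the small sub-partitions.
        return sorted(LARGE_MODES)
    # Complex texture block: fall back to the full mode set.
    return sorted(ALL_MODES)

def depth_search_window(texture_mb, full_range=32, reduced_range=4):
    """Centre the depth ME search on the texture MV and shrink the range."""
    mvx, mvy = texture_mb.get("mv", (0, 0))
    rng = reduced_range if texture_mb["mode"] in LARGE_MODES else full_range
    return {"center": (mvx, mvy), "range": rng}

if __name__ == "__main__":
    tex_mb = {"mode": "16x16", "mv": (3, -1)}
    print(depth_candidates(tex_mb))      # ['16x16', '16x8', '8x16', 'SKIP']
    print(depth_search_window(tex_mb))   # {'center': (3, -1), 'range': 4}
```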

    Fast mode decision for Multiview Video Coding based on scene geometry

    A new fast mode decision (FMD) algorithm for multi-view video coding (MVC) is presented. The coding of each view is based on an analysis of the homogeneity of the depth map, corrected with the motion analysis of a reference view that is encoded with traditional methods, and on the use of the disparity differences between the views. This approach reduces the burden of the rate-distortion motion analysis by exploiting the availability of a depth map and the presence of disparity vectors, which are assumed to be provided by the acquisition process.
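
    A minimal sketch of the general principle follows, assuming a simple variance test as the depth-homogeneity criterion; the paper's actual decision rules and thresholds are not reproduced here.

```python
import numpy as np

# Illustrative sketch: classify a macroblock as homogeneous from the depth map
# and prune the MVC mode search accordingly (the threshold is a placeholder,
# not the value used in the paper).

def is_homogeneous(depth_block: np.ndarray, var_threshold: float = 4.0) -> bool:
    """A depth block with very low variance likely belongs to a flat region."""
    return float(np.var(depth_block)) < var_threshold

def candidate_modes(depth_block, ref_view_mode):
    if is_homogeneous(depth_block):
        # Flat region: large partitions are almost always chosen; skip the rest.
        return ["SKIP", "16x16"]
    # Non-homogeneous region: bias the search with the reference view's mode.
    return ["SKIP", "16x16", ref_view_mode, "8x8"]

if __name__ == "__main__":
    flat = np.full((16, 16), 128, dtype=np.uint8)
    noisy = np.random.default_rng(0).integers(0, 255, (16, 16), dtype=np.uint8)
    print(candidate_modes(flat, "16x8"))    # ['SKIP', '16x16']
    print(candidate_modes(noisy, "16x8"))   # ['SKIP', '16x16', '16x8', '8x8']
```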

    Multi-party holomeetings: toward a new era of low-cost volumetric holographic meetings in virtual reality

    Fueled by advances in multi-party communications, the adoption of increasingly mature immersive technologies, and the COVID-19 pandemic, a new wave of social virtual reality (VR) platforms has emerged to support socialization, interaction, and collaboration among multiple remote users who are integrated into shared virtual environments. Social VR aims to increase levels of (co-)presence and interaction quality by overcoming the limitations of 2D windowed representations in traditional multi-party video conferencing tools, although most existing solutions rely on 3D avatars to represent users. This article presents a social VR platform that supports real-time volumetric holographic representations of users, based on point clouds captured by off-the-shelf RGB-D sensors, and analyzes the platform's potential for conducting interactive holomeetings (i.e., holoconferencing scenarios). This work evaluates the platform's performance and readiness for conducting meetings with up to four users, and it provides insights into aspects of the user experience when using single-camera, low-cost capture systems in scenarios with both frontal and side viewpoints. Overall, the obtained results confirm the platform's maturity and the potential of holographic communications for conducting interactive multi-party meetings, even when using low-cost, single-camera capture systems in scenarios where users are sitting or have limited translational movement along the X, Y, and Z axes within the 3D virtual environment (commonly known as 3 Degrees of Freedom plus, 3DoF+).

    The authors would like to thank the members of the EU H2020 VR-Together consortium for their valuable contributions, especially Marc Martos and Mohamad Hjeij for their support in development and evaluation tasks. This work has been partially funded by: the EU's Horizon 2020 program, under agreement nº 762111 (VR-Together project); by ACCIÓ (Generalitat de Catalunya), under agreement COMRDI18-1-0008 (ViVIM project); and by Cisco Research and the Silicon Valley Community Foundation, under the grant Extended Reality Multipoint Control Unit (ID: 1779376). The work by Mario Montagud has been additionally funded by Spain's Agencia Estatal de Investigación under grant RYC2020-030679-I (AEI / 10.13039/501100011033) and by Fondo Social Europeo. The work of David Rincón was supported by Spain's Agencia Estatal de Investigación within the Ministerio de Ciencia e Innovación under Project PID2019-108713RB-C51 MCIN/AEI/10.13039/501100011033.
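
    For context on the capture side, the sketch below shows the standard pinhole back-projection used to turn a single RGB-D frame into a colored point cloud; the intrinsic parameters are placeholder values and the code is not taken from the platform described above.

```python
import numpy as np

# Minimal sketch of turning one RGB-D frame into a colored point cloud with the
# pinhole camera model; fx, fy, cx, cy are placeholder intrinsics.

def rgbd_to_point_cloud(depth_m, rgb, fx, fy, cx, cy):
    """depth_m: HxW depth in meters; rgb: HxWx3 uint8. Returns (N,3) xyz and (N,3) colors."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    valid = z > 0                      # drop pixels with no depth reading
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    xyz = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid]
    return xyz, colors

if __name__ == "__main__":
    depth = np.full((480, 640), 1.5, dtype=np.float32)   # fake flat wall at 1.5 m
    rgb = np.zeros((480, 640, 3), dtype=np.uint8)
    xyz, col = rgbd_to_point_cloud(depth, rgb, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    print(xyz.shape, col.shape)                          # (307200, 3) (307200, 3)
```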

    Toward hyper-realistic and interactive social VR experiences in live TV scenarios

    Social Virtual Reality (VR) allows multiple distributed users to get together in shared virtual environments to socially interact and/or collaborate. This article explores the applicability and potential of Social VR in the broadcast sector, focusing on a live TV show use case. For such a purpose, a novel and lightweight Social VR platform is introduced. The platform provides three key features compared to state-of-the-art solutions. First, it allows the real-time integration of remote users in shared virtual environments, using realistic volumetric representations and affordable capturing systems, and thus does not rely on synthetic avatars. Second, it supports a seamless and rich integration of heterogeneous media formats, including 3D scenarios, dynamic volumetric representations of users, and (live/stored) stereoscopic 2D and 180º/360º videos. Third, it enables low-latency interaction between the volumetric users and a video-based presenter (Chroma keying), and a dynamic control of the media playout to adapt to the session's evolution. The production process of an immersive TV show, used to evaluate the experience, is also described. On the one hand, the results from objective tests show the satisfactory performance of the platform. On the other hand, the promising results from user tests support the potential impact of the presented platform, opening up new opportunities in the broadcast sector, among others.

    This work has been partially funded by the European Union's Horizon 2020 program, under agreement nº 762111 (VRTogether project), and partially by ACCIÓ, under agreement COMRDI18-1-0008 (ViVIM project). The work by Mario Montagud has been additionally funded by the Spanish Ministry of Science, Innovation and Universities with a Juan de la Cierva – Incorporación grant (reference IJCI-2017-34611). The authors would also like to thank the EU H2020 VRTogether project consortium for their relevant and valuable contributions.
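
    As an illustration of the presenter integration step, here is a toy chroma-key matte that removes a green background and composites the presenter over another image; the thresholds and logic are simplified assumptions, not the platform's actual production pipeline.

```python
import numpy as np

# Toy chroma-key sketch: build an alpha matte from a green-screen frame and
# composite the presenter over a background image. Thresholds are illustrative.

def chroma_key(frame, background, green_dominance=40):
    """frame, background: HxWx3 uint8 RGB arrays of the same size."""
    f = frame.astype(np.int16)
    r, g, b = f[..., 0], f[..., 1], f[..., 2]
    # A pixel counts as "background" when green clearly dominates red and blue.
    is_green = (g - np.maximum(r, b)) > green_dominance
    alpha = (~is_green).astype(np.float32)[..., None]
    return (alpha * frame + (1 - alpha) * background).astype(np.uint8)

if __name__ == "__main__":
    presenter = np.zeros((4, 4, 3), dtype=np.uint8)
    presenter[..., 1] = 255                      # pure green "screen"
    presenter[1:3, 1:3] = (200, 120, 90)         # a non-green "person" patch
    scene = np.full((4, 4, 3), 30, dtype=np.uint8)
    out = chroma_key(presenter, scene)
    print(out[0, 0], out[1, 1])                  # background color, presenter color
```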

    Depth perceptual video coding for free viewpoint video based on H.264/AVC

    A novel scheme for depth sequence compression, based on a perceptual coding algorithm, is proposed. A depth sequence describes the position of the objects in the 3D scene and is used, in Free Viewpoint Video, for the generation of synthetic video sequences. In perceptual video coding, the characteristics of the human visual system are exploited to improve compression efficiency. Since depth sequences are never displayed, perceptual video coding applied directly to them is not effective. The proposed algorithm is based on a novel perceptual rate-distortion optimization process, evaluated over the perceptual distortion of the rendered views generated with the encoded depth sequences. The experimental results show the effectiveness of the proposed method, which achieves a considerable improvement in the perceptual quality of the rendered views.
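
    The core idea can be sketched as a rate-distortion loop in which the distortion term is measured on the view synthesized with the candidate reconstructed depth rather than on the depth itself; `encode`, `render_view`, and `perceptual_distortion` below are hypothetical hooks standing in for the real encoder, the DIBR renderer, and a perceptual metric.

```python
# Sketch of a render-aware RD mode decision for a depth block: distortion is
# measured on the synthesized view, not on the depth map itself.
# encode(), render_view() and perceptual_distortion() are placeholder hooks.

def choose_depth_mode(block, candidate_modes, lam, encode, render_view,
                      perceptual_distortion, reference_view):
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:
        recon_depth, bits = encode(block, mode)     # hypothetical encoder hook
        synth = render_view(recon_depth)            # view synthesized with this depth
        dist = perceptual_distortion(synth, reference_view)
        cost = dist + lam * bits                    # J = D_perceptual + lambda * R
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

    The lambda parameter plays the same role as in a conventional Lagrangian rate-distortion optimization; only the distortion term changes.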

    A pipeline for multiparty volumetric video conferencing: Transmission of point clouds over low latency DASH

    The advent of affordable 3D capture and display hardware is making volumetric videoconferencing feasible. This technology increases the immersion of the participants, breaking the flat restriction of 2D screens, by allowing them to collaborate and interact in shared virtual reality spaces. In this paper we introduce the design and development of an architecture intended for volumetric videoconferencing that provides a highly realistic 3D representation of the participants, based on point clouds. A point cloud representation is suitable for real-time applications like video conferencing due to its low complexity and because it does not need a time-consuming reconstruction process. As the transport protocol we selected low-latency DASH, due to its popularity and its client-based adaptation mechanisms for tiling. This paper presents the architectural design, details the implementation, and provides some reference results. The demo will showcase the system in action, enabling volumetric videoconferencing using point clouds.
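
    A toy version of a client-side, tile-based quality selection loop in the spirit of DASH adaptation is sketched below: visible tiles are upgraded first within a throughput budget. The representations, bitrates, and policy are illustrative assumptions, not the system's actual heuristic.

```python
# Toy client-side adaptation sketch for tiled point-cloud streaming over
# low-latency DASH: give visible tiles higher quality first, within a
# throughput budget. Tile and bitrate numbers are illustrative only.

QUALITY_BITRATES_KBPS = [300, 1200, 4000]   # low / medium / high representations

def select_tile_qualities(tiles, budget_kbps):
    """tiles: list of dicts like {'id': 't0', 'visible': True}."""
    # Start everything at the lowest representation.
    choice = {t["id"]: 0 for t in tiles}
    spent = sum(QUALITY_BITRATES_KBPS[0] for _ in tiles)
    # Upgrade visible tiles first, then the rest, while the budget allows.
    for wanted_visible in (True, False):
        for t in tiles:
            if t["visible"] != wanted_visible:
                continue
            for q in (1, 2):
                extra = QUALITY_BITRATES_KBPS[q] - QUALITY_BITRATES_KBPS[choice[t["id"]]]
                if spent + extra <= budget_kbps:
                    choice[t["id"]] = q
                    spent += extra
    return choice, spent

if __name__ == "__main__":
    tiles = [{"id": "front", "visible": True}, {"id": "back", "visible": False}]
    print(select_tile_qualities(tiles, budget_kbps=6000))
    # ({'front': 2, 'back': 1}, 5200)
```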

    Artificial intelligence-assisted quantification of COVID-19 pneumonia burden from computed tomography improves prediction of adverse outcomes over visual scoring systems

    Objective: We aimed to evaluate the effectiveness of utilizing artificial intelligence (AI) to quantify the extent of pneumonia from chest CT scans, and to determine its ability to predict clinical deterioration or mortality in patients admitted to the hospital with COVID-19, in comparison to semi-quantitative visual scoring systems. Methods: A deep-learning algorithm was used to quantify the pneumonia burden, while semi-quantitative pneumonia severity scores were estimated through visual assessment. The primary outcome was clinical deterioration, a composite end point including admission to the intensive care unit, need for invasive mechanical ventilation or vasopressor therapy, and in-hospital death. Results: The final population comprised 743 patients (mean age 65 ± 17 years, 55% men), of whom 175 (23.5%) experienced clinical deterioration or death. The area under the receiver operating characteristic curve (AUC) for predicting the primary outcome was significantly higher for AI-assisted quantitative pneumonia burden (0.739, p = 0.021) compared with the visual lobar severity score (0.711, p < 0.001) and the visual segmental severity score (0.722, p = 0.042). AI-assisted pneumonia assessment exhibited lower performance when applied for calculation of the lobar severity score (AUC of 0.723, p = 0.021). The time taken for AI-assisted quantification of pneumonia burden was lower (38 ± 10 s) than that of the visual lobar (328 ± 54 s, p < 0.001) and segmental (698 ± 147 s, p < 0.001) severity scores. Conclusion: AI-assisted quantification of pneumonia burden from chest CT scans offers a more accurate prediction of clinical deterioration in patients with COVID-19 compared to semi-quantitative severity scores, while requiring only a fraction of the analysis time. Advances in knowledge: Quantitative pneumonia burden assessed using AI demonstrated higher performance for predicting clinical deterioration compared to current semi-quantitative scoring systems. Such an AI system has the potential to be applied for image-based triage of COVID-19 patients in clinical practice.
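
    For illustration, here is a hedged sketch of how such a quantitative burden could be computed from segmentation masks and how an AUC for outcome prediction is typically obtained with scikit-learn; all arrays below are synthetic placeholders, not study data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Sketch: pneumonia burden = opacity volume / total lung volume from binary
# segmentation masks, then AUC of the burden against a clinical outcome.
# All arrays are synthetic placeholders, not the study's data.

def pneumonia_burden(lung_mask: np.ndarray, opacity_mask: np.ndarray) -> float:
    """Fraction of segmented lung voxels labelled as pneumonia opacity."""
    lung_voxels = lung_mask.sum()
    return float((opacity_mask & lung_mask).sum() / lung_voxels) if lung_voxels else 0.0

if __name__ == "__main__":
    lung = np.ones((4, 4, 4), dtype=bool)
    opacity = np.zeros_like(lung)
    opacity[:2] = True
    print("burden:", pneumonia_burden(lung, opacity))   # 0.5

    rng = np.random.default_rng(1)
    # Fake per-patient burdens and outcomes (1 = deterioration or death).
    burdens = rng.uniform(0.0, 0.6, size=200)
    outcomes = (burdens + rng.normal(0, 0.15, size=200) > 0.35).astype(int)
    print("AUC:", round(roc_auc_score(outcomes, burdens), 3))
```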

    Complexity and Quality Optimization for Multi-View plus Depth Video Coding

    3D video, free viewpoint TV, and other three-dimensional imaging systems have represented, and still represent, an emerging trend in digital video technologies. Multiview plus Depth (MVD) is one of the most common 3D video representations. An MVD scene is recorded from several viewpoints, capturing different representations of the scene from a wide range of directions. For each viewpoint, two video components are captured: the scene texture, represented as a traditional 2D video with the usual color components (RGB or similar), and the scene geometry, represented as a gray-level image, called a depth map, containing the information related to the distance of the scene objects from the camera. Thanks to the multiple texture-plus-depth representations, a 3D scene can be fully reconstructed, providing the user with a perception of immersion. As with previous imaging technologies, since compression is one of the most important steps in a digital video representation pipeline, the need to encode the scene information efficiently has also arisen in 3D video. Considering that an MVD scenario involves an increasing amount of data due to the multiple viewpoints, and also includes new information such as the depth maps, encoding techniques have had to evolve in order to minimize the impact of the growing data volume and to adapt to the characteristics of the depth information. The work presented in this thesis focuses on adapting traditional AVC/H.264-based compression methods to the MVD environment. The goal is to reduce the computational load, which is dramatically increased by the large number of video representations, but also to increase the efficiency of the encoding process in terms of rate-distortion, focusing on the quality of the 3D video rendered from the multiple texture-plus-depth representations.
    The first area of research has been the reduction of the computational load of the Mode Decision (MD) stage, one of the most computationally expensive stages of the encoding process. The geometry information provided by the depth maps has been exploited to predict the geometry and motion of the objects in the scene. In addition, an analysis of the depth information has provided an understanding of how the motion information of the texture and depth components is correlated. The work has then focused on reducing the computational load of depth map compression, this time involving both MD and Motion Estimation (ME) and exploiting the correlation between the motion of the texture and that of the depth. As a result, the computational load of the compression process has been considerably reduced, reaching a reduction in encoding time of up to 40% for the texture and up to 58% for the depth, compared to the exhaustive mode and motion-vector search of a traditional AVC/H.264 encoder. In both cases, the quality loss has been negligible. However, reducing the computational load has not been the only goal of the work presented in this thesis. A considerably novel area has also been explored, introducing new perceptual coding paradigms for the compression of the depth. The last part of this thesis focuses on applying perceptual methodologies, widely exploited in traditional 2D video compression, to the compression of the depth. The depth is used only for 3D reconstruction purposes, such as the generation of synthetic views; since it is never shown to the audience, its compression artifacts affect only the reconstructed synthetic views of the texture. The perceptual work in this thesis has therefore focused on adapting traditional 2D perceptual compression techniques to the MVD representation format, optimizing the perceptual quality of the synthetic views. The performance of the proposed perceptual techniques for depth compression has been evaluated using perceptual quality metrics, achieving a bit-rate reduction of up to 13% and an improvement of up to 0.3 dB according to the Bjøntegaard measurements.
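
    Since the gains are reported "according to the Bjøntegaard measurements", the following is a compact sketch of the standard Bjøntegaard delta-PSNR computation: fit PSNR as a cubic polynomial of the log bitrate for each codec and average the difference of the fits over the overlapping rate range. The RD points are made-up examples, not the thesis results.

```python
import numpy as np

# Sketch of the standard Bjontegaard delta-PSNR (BD-PSNR) computation.

def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    lr_ref, lr_test = np.log10(rates_ref), np.log10(rates_test)
    p_ref = np.polyfit(lr_ref, psnr_ref, 3)        # cubic fit of PSNR vs log-rate
    p_test = np.polyfit(lr_test, psnr_test, 3)
    lo = max(lr_ref.min(), lr_test.min())          # overlapping rate range
    hi = min(lr_ref.max(), lr_test.max())
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    return (int_test - int_ref) / (hi - lo)        # average PSNR gain in dB

if __name__ == "__main__":
    rates = np.array([400, 800, 1600, 3200], dtype=float)     # kbps, anchor points
    psnr_anchor = np.array([34.0, 36.5, 38.8, 40.6])
    psnr_proposed = psnr_anchor + np.array([0.15, 0.25, 0.30, 0.28])
    print(f"BD-PSNR: {bd_psnr(rates, psnr_anchor, rates, psnr_proposed):+.2f} dB")
```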