
    From Capture to Display: A Survey on Volumetric Video

    Volumetric video, which offers immersive viewing experiences, is gaining prominence. With six degrees of freedom, it provides viewers with greater immersion and interactivity than traditional video. Despite this potential, volumetric video services pose significant challenges. This survey conducts a comprehensive review of the existing literature on volumetric video. We first present a general framework for volumetric video services, followed by a discussion of prerequisites for volumetric video, encompassing representations, open datasets, and quality assessment metrics. We then delve into the current methodologies for each stage of the volumetric video service pipeline, detailing capture, compression, transmission, rendering, and display techniques. Lastly, we explore the applications enabled by this technology and present an array of research challenges and opportunities in the domain of volumetric video services. This survey aims to provide a holistic understanding of this burgeoning field and to shed light on future research trajectories, helping bring the vision of volumetric video to fruition.

    Real-time Streaming Video Denoising with Bidirectional Buffers

    Video streams are delivered continuously to save storage and device-memory costs. Real-time denoising algorithms are typically run on the user device to remove the noise introduced during the shooting and transmission of video streams. However, sliding-window-based methods feed multiple input frames to produce a single output frame and lack computational efficiency. Recent multi-output inference works propagate bidirectional temporal features with a parallel or recurrent framework, which either suffers from performance drops at the temporal edges of clips or cannot achieve online inference. In this paper, we propose a Bidirectional Streaming Video Denoising (BSVD) framework to achieve high-fidelity real-time denoising for streaming videos with both past and future temporal receptive fields. Bidirectional temporal fusion has been considered inapplicable to online inference (e.g., in MoViNet); however, we introduce a novel Bidirectional Buffer Block as the core module of BSVD, which makes it possible within our pipeline-style inference. In addition, our method is concise and flexible enough to be used for both non-blind and blind video denoising. We compare our model with various state-of-the-art video denoising models qualitatively and quantitatively on synthetic and real noise. Our method outperforms previous methods in both restoration fidelity and runtime. Our source code is publicly available at https://github.com/ChenyangQiQi/BSVD. Comment: Accepted to ACM MM 2022.
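
    The pipeline-style streaming inference with both past and future receptive fields described above can be illustrated with a minimal sketch: a buffer holds a symmetric temporal window and emits one output per input frame at constant latency once enough future context has arrived. The class name, window logic, and placeholder averaging "denoiser" are illustrative assumptions, not the paper's actual architecture.

```python
from collections import deque


class BidirectionalBuffer:
    """Toy sketch of streaming inference with a past/future temporal
    window, in the spirit of a bidirectional buffer block."""

    def __init__(self, future_frames: int):
        self.future = future_frames  # lookahead before the first output
        self.buffer = deque()

    def push(self, frame: float):
        """Feed one input frame; return one denoised frame once enough
        future context has accumulated, else None (constant latency)."""
        self.buffer.append(frame)
        if len(self.buffer) < 2 * self.future + 1:
            return None
        window = list(self.buffer)
        self.buffer.popleft()  # slide the window by one frame
        # Placeholder "denoising": average the temporal window.
        return sum(window) / len(window)
```

    Each call to `push` both consumes one input and (after the warm-up phase) produces one output, which is what makes the scheme usable online, unlike sliding-window methods that recompute the whole window per output.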

    A baseline roadmap for advanced wireless research beyond 5G

    This paper presents a baseline roadmap for the evolution of 5G New Radio over the next decade. Three timescales are considered: short-term (circa 2022), medium-term (circa 2025), and long-term (circa 2030). The evolution of the target key performance indicators (KPIs) is first analyzed by accounting for forecasts of emerging use cases and their requirements, together with assumptions about the pace of technology advancements. The baseline roadmap is then derived by identifying the top-10, and subsequently the top-5, technology trends envisioned to bring significant added value at each timescale. Being intrinsically predictive, our proposed baseline roadmap cannot assert with certainty the values of the target KPIs or the shortlisting of the technology trends. It is, however, aimed at driving discussions and collecting feedback from the wireless research community for future tuning and refinement as the 5G evolution journey progresses. This research was funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 824994.

    Enhancing the broadcasted TV consumption experience with broadband omnidirectional video content

    The current wide range of heterogeneous consumption devices and delivery technologies offers the opportunity to provide related content that enhances and enriches the TV consumption experience. This paper describes a solution to handle the delivery and synchronous consumption of traditional broadcast TV content and related broadband omnidirectional video content. The solution supports hybrid (broadcast/broadband) delivery technologies and has been designed to be compatible with the Hybrid Broadcast Broadband TV (HbbTV) standard. In particular, some specifications of HbbTV, such as the use of global timestamps and discovery mechanisms, have been adopted. However, additional functionalities have been designed to achieve accurate synchronization and to support the playout of omnidirectional video content on current consumption devices. To prove that commercial hybrid environments could be immediately enhanced with this type of content, the proposed solution has been included in a testbed and evaluated both objectively and subjectively. Regarding the omnidirectional video content, the two most common types of projections are supported: equirectangular and cube map. The results of the objective assessment show that the playout of broadband-delivered omnidirectional video content on companion devices can be accurately synchronized with the playout of traditional broadcast 2D content on the TV. The results of the subjective assessment show the high interest of users in this type of new enriched and immersive experience, which contributes to enhancing their Quality of Experience (QoE) and engagement. This work was supported by the Generalitat Valenciana, Investigación Competitiva Proyectos, through the Research and Development Program Grants for Research Groups to be Consolidated, under Grant AICO/2017/059 and Grant AICO/2017.
    Marfil-Reguero, D.; Boronat, F.; López, J.; Vidal Meló, A. (2019). Enhancing the broadcasted TV consumption experience with broadband omnidirectional video content. IEEE Access. 7:171864-171883. https://doi.org/10.1109/ACCESS.2019.2956084
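
    The global-timestamp approach to inter-device synchronization mentioned above can be sketched as follows: each device reports a (media time, wall-clock time) pair stamped against a shared clock, the asynchrony is the difference between the two media positions projected to a common instant, and the companion device seeks by that amount. Function names and the millisecond units are assumptions for illustration, not the paper's actual protocol.

```python
def playout_asynchrony_ms(tv_media_ms, tv_wallclock_ms,
                          companion_media_ms, companion_wallclock_ms):
    """Estimate asynchrony between two playout timelines from
    (media time, wall clock) pairs against a shared clock.
    Positive result => the companion device lags the TV."""
    # Project both media positions to the same wall-clock instant.
    now = max(tv_wallclock_ms, companion_wallclock_ms)
    tv_now = tv_media_ms + (now - tv_wallclock_ms)
    companion_now = companion_media_ms + (now - companion_wallclock_ms)
    return tv_now - companion_now


def corrected_seek_ms(companion_media_ms, asynchrony_ms):
    """Seek target that realigns the companion stream with the TV."""
    return companion_media_ms + asynchrony_ms
```

    In practice the correction would be applied gradually (playback-rate adjustment) rather than as a hard seek, to avoid visible jumps; the sketch only shows the timestamp arithmetic.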

    Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments

    Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to larger dynamic environments beyond a fixed size of a few square meters remains challenging. In this paper, we aim to share 3D live-telepresence experiences in large-scale environments beyond room scale, with both static and dynamic scene entities, at practical bandwidth requirements, based only on lightweight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system built upon a novel hybrid volumetric scene representation. Static content is held in a voxel-based representation that stores not only the reconstructed surface geometry but also object semantics and accumulated dynamic movement over time; dynamic scene parts are held in a point-cloud-based representation, with the separation from static parts achieved using semantic and instance information extracted from the input frames. Static and dynamic content are streamed independently yet simultaneously: potentially moving but currently static scene entities are seamlessly integrated into the static model until they become dynamic again, and static and dynamic data are fused at the remote client. As a result, our system achieves VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our approach in terms of visual quality and performance, and includes ablation studies of the involved design choices.
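
    The hybrid static/dynamic representation can be sketched as a simple routing step: each semantically labelled point either updates a voxel map (static content) or is appended to the per-frame dynamic point cloud, with movable-but-currently-static entities remaining in the static model until they move. The voxel size, field layout, and function names are illustrative assumptions, not the system's actual data structures.

```python
VOXEL_SIZE = 0.05  # metres; illustrative choice


def voxel_key(x, y, z, size=VOXEL_SIZE):
    """Quantise a 3D point to its integer voxel index."""
    return (int(x // size), int(y // size), int(z // size))


def integrate_frame(points, static_voxels, dynamic_points):
    """Route labelled points into the static voxel map or the per-frame
    dynamic point cloud. A point goes to the dynamic set only if its
    instance is currently moving; movable-but-static entities stay in
    the static model, mirroring the hybrid representation above."""
    for x, y, z, label, is_moving in points:
        if is_moving:
            dynamic_points.append((x, y, z, label))
        else:
            static_voxels[voxel_key(x, y, z)] = label
```

    Streaming the `static_voxels` map incrementally and the `dynamic_points` list per frame corresponds to the independent-yet-simultaneous streaming of the two representations described in the abstract.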

    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs), and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post-production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric: where it is designed to augment, rather than replace, human creativity.

    Light field image processing: an overview

    Light field imaging has emerged as a technology that allows capturing richer visual information from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene by integrating over the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher-dimensional representation of visual data offers powerful capabilities for scene understanding and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, and material classification. On the other hand, the high dimensionality of light fields also brings up new challenges in terms of data capture, data compression, content editing, and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics, and signal processing communities. In this paper, we present a comprehensive overview and discussion of research in this field over the past 20 years. We cover all aspects of light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data.
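
    Post-capture refocusing, one of the capabilities mentioned above, is classically done by shift-and-sum: each angular view is shifted in proportion to its offset from the central view and the views are averaged, bringing one depth plane into focus. A toy 1D version is sketched below, with a 2D (view, pixel) list standing in for the full 4D light field; the integer-shift simplification and function name are assumptions.

```python
def refocus_1d(light_field, alpha):
    """Toy shift-and-sum refocusing on a 1D light field.
    light_field[u][s]: radiance of the ray from view u at pixel s.
    alpha: integer per-view pixel shift selecting the in-focus depth.
    Each view is sampled with a shift of alpha * (u - centre), then
    all views are averaged."""
    n_views = len(light_field)
    width = len(light_field[0])
    centre = n_views // 2
    out = []
    for s in range(width):
        acc, cnt = 0.0, 0
        for u, view in enumerate(light_field):
            src = s + alpha * (u - centre)
            if 0 <= src < width:  # ignore rays shifted out of bounds
                acc += view[src]
                cnt += 1
        out.append(acc / cnt if cnt else 0.0)
    return out
```

    A scene point whose per-view disparity matches `alpha` is reinforced by every view and appears sharp, while points at other depths are averaged across positions and blur out.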

    Multi-party holomeetings: toward a new era of low-cost volumetric holographic meetings in virtual reality

    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Fueled by advances in multi-party communications, increasingly mature immersive technologies, and the COVID-19 pandemic, a new wave of social virtual reality (VR) platforms has emerged to support socialization, interaction, and collaboration among multiple remote users who are integrated into shared virtual environments. Social VR aims to increase levels of (co-)presence and interaction quality by overcoming the limitations of 2D windowed representations in traditional multi-party video conferencing tools, although most existing solutions rely on 3D avatars to represent users. This article presents a social VR platform that supports real-time volumetric holographic representations of users based on point clouds captured by off-the-shelf RGB-D sensors, and it analyzes the platform's potential for conducting interactive holomeetings (i.e., holoconferencing scenarios). This work evaluates the platform's performance and readiness for conducting meetings with up to four users, and it provides insights into aspects of the user experience when using single-camera, low-cost capture systems in scenarios with both frontal and side viewpoints. Overall, the obtained results confirm the platform's maturity and the potential of holographic communications for conducting interactive multi-party meetings, even when using low-cost, single-camera capture systems in scenarios where users are sitting or have limited translational movement along the X, Y, and Z axes within the 3D virtual environment (commonly known as 3 Degrees of Freedom plus, 3DoF+).
    The authors would like to thank the members of the EU H2020 VR-Together consortium for their valuable contributions, especially Marc Martos and Mohamad Hjeij for their support in development and evaluation tasks. This work has been partially funded by: the EU's Horizon 2020 program, under agreement nº 762111 (VR-Together project); by ACCIÓ (Generalitat de Catalunya), under agreement COMRDI18-1-0008 (ViVIM project); and by Cisco Research and the Silicon Valley Community Foundation, under the grant Extended Reality Multipoint Control Unit (ID: 1779376). The work by Mario Montagud has been additionally funded by Spain's Agencia Estatal de Investigación under grant RYC2020-030679-I (AEI / 10.13039/501100011033) and by Fondo Social Europeo. The work of David Rincón was supported by Spain's Agencia Estatal de Investigación within the Ministerio de Ciencia e Innovación under Project PID2019-108713RB-C51 MCIN/AEI/10.13039/501100011033.
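
    Point-cloud capture from an off-the-shelf RGB-D sensor, as used by the platform above, rests on pinhole back-projection: each depth pixel is lifted to a 3D camera-space point using the sensor intrinsics. A minimal sketch of the standard model follows; the intrinsic values in the comments are illustrative, not those of any particular sensor.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (metres) into a 3D
    camera-space point using the pinhole camera model.
    fx, fy: focal lengths in pixels; cx, cy: principal point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

    Applying this to every valid depth pixel in a frame yields the raw point cloud that is then colored from the RGB image and streamed to the other meeting participants.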