106 research outputs found

    3D Scene Geometry Estimation from 360° Imagery: A Survey

    Full text link
    This paper provides a comprehensive survey of pioneering and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured with omnidirectional optics. We first revisit the basic concepts of the spherical camera model and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360°, spherical, or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting recent advances in learning-based solutions suited for spherical data. Classical stereo matching is then revisited in the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extended to multi-view camera setups, categorized among light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and figures of merit for each purpose, and list recent results for completeness. We conclude the paper by pointing out current and future trends.
    Comment: Published in ACM Computing Surveys
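    As context for the spherical camera model the survey revisits, the sketch below maps an equirectangular pixel to a unit ray direction on the sphere, the basic operation behind most 360° pipelines the survey covers. It is a minimal illustration under one common axis and angle-range convention; the convention itself is an assumption, not something fixed by the survey.

```python
import numpy as np

def equirect_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit ray direction.

    Assumed convention: longitude spans [-pi, pi] across the image
    width and latitude spans [pi/2, -pi/2] down the height.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])
```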

    Interactive Panorama VR360 for Corporate Communications: An Industrial Scenario Case Study

    Get PDF
    This case study explores the implementation of interactive panoramas for a corporate communications virtual reality 360 (VR360) application. An interactive panorama presents spherical panoramic imagery as a 360-degree visual experience, rather than the single-angle view of a conventional static image. The study proposes using interactive panoramas as a corporate communications tool, considered from three experimental PEP aspects. The exploration of the PEP framework examines the suitability of its three key aspects, people, equipment, and product, for an interactive panorama VR360 experience. As interactive panoramas have advanced, online social media providers have introduced them as an essential feature that allows individual and corporate users to post 360° content. This case study draws on the actual ongoing marketing experience and technical insight of a participating case company and several industrial scenario use cases. In this paper we describe interactive panorama content creation, the PEP framework, a user study, and directions for future work.

    Video Upright Adjustment and Stabilization

    Get PDF
    Keywords: upright adjustment, video stabilization, camera path
    We propose a novel video upright adjustment method that can reliably correct the slanted content often found in casual videos. Our approach combines deep learning and Bayesian inference to estimate accurate rotation angles from video frames. We train a convolutional neural network (CNN) to obtain initial estimates of the rotation angles of input video frames. The initial estimates from the network are temporally inconsistent and inaccurate. To resolve this, we use Bayesian inference. We analyze the estimation errors of the network and derive an error model. We then use the error model to formulate video upright adjustment as a maximum a posteriori problem, in which we estimate temporally consistent rotation angles from the initial estimates while respecting the relative rotations between consecutive frames. Finally, we propose a joint approach to video stabilization and upright adjustment, which minimizes the information loss caused by handling stabilization and upright adjustment separately. Experimental results show that our video upright adjustment method can effectively correct slanted video content, and that its combination with video stabilization can achieve visually pleasing results from shaky and slanted videos.
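    One plausible reading of the maximum a posteriori formulation above is a quadratic objective that balances fidelity to the per-frame CNN estimates against agreement with the relative rotations between consecutive frames. The sketch below solves such an objective in closed form; the Gaussian error model, the weight lam, and the source of the relative rotations are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def smooth_angles(theta_hat, rel_rot, lam=10.0):
    """Estimate temporally consistent roll angles (degrees).

    Minimizes  sum_t (theta_t - theta_hat_t)^2
             + lam * sum_t (theta_{t+1} - theta_t - rel_rot_t)^2,
    the MAP estimate under assumed Gaussian noise on the per-frame
    CNN estimates and on the frame-to-frame relative rotations.
    """
    n = len(theta_hat)
    # Normal equations A @ theta = b of the quadratic objective.
    A = np.eye(n)
    b = np.array(theta_hat, dtype=float)
    for t in range(n - 1):
        # Terms from lam * (theta_{t+1} - theta_t - r_t)^2.
        A[t, t] += lam
        A[t + 1, t + 1] += lam
        A[t, t + 1] -= lam
        A[t + 1, t] -= lam
        b[t] -= lam * rel_rot[t]
        b[t + 1] += lam * rel_rot[t]
    return np.linalg.solve(A, b)
```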

    Learning geometric and lighting priors from natural images

    Get PDF
    Understanding images is needed for a plethora of tasks, from compositing to image relighting, including 3D object reconstruction. These tasks allow artists to realize masterpieces or help operators to safely make decisions based on visual stimuli. For many of these tasks, the physical and geometric models that the scientific community has developed give rise to ill-posed problems with several solutions, only one of which is generally reasonable. To resolve these indeterminations, reasoning about the visual and semantic context of a scene is usually relayed to an artist or an expert who uses their experience to carry out the work. This is because humans are able to reason globally about the scene in order to obtain plausible and appreciable results. Would it be possible to model this experience from visual data and partly or totally automate these tasks? This is the topic of this thesis: modeling priors using deep machine learning to solve typically ill-posed problems. More specifically, we cover three research axes: 1) surface reconstruction using photometric cues, 2) outdoor illumination estimation from a single image, and 3) camera calibration estimation from a single image with generic content. These three topics are addressed from a data-driven perspective. Each of these axes includes in-depth performance analyses and, despite the reputation for opacity of deep machine learning algorithms, we offer studies on the visual cues captured by our methods.

    Transitioning360: Content-aware NFoV Virtual Camera Paths for 360° Video Playback

    Get PDF
    Despite the increasing number of head-mounted displays, many 360° VR videos are still viewed on existing 2D displays. To this end, a subset of the 360° video content is often shown inside a manually or semi-automatically selected normal-field-of-view (NFoV) window. However, during playback, simply watching an NFoV video can easily miss concurrent off-screen content. We present Transitioning360, a tool for 360° video navigation and playback on 2D displays that transitions between multiple NFoV views tracking potentially interesting targets or events. Our method computes virtual NFoV camera paths considering content awareness and diversity in an offline preprocess. During playback, the user can watch any NFoV view corresponding to a precomputed camera path. Moreover, our interface shows other candidate views, providing a sense of concurrent events. At any time, the user can transition to other candidate views for fast navigation and exploration. Experimental results, including a user study, demonstrate that the viewing experience using our method is more enjoyable and convenient than with previous methods.
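    To make the idea of a virtual NFoV camera concrete, the sketch below computes ray directions for one camera-path pose (yaw, pitch, field of view); sampling the equirectangular frame along these rays renders the NFoV window for that pose. The conventions and parameter names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nfov_rays(yaw, pitch, fov_deg, out_w, out_h):
    """Ray directions for a virtual NFoV camera inside a panorama.

    yaw and pitch orient the virtual camera (radians); fov_deg is
    the horizontal field of view. Each output pixel gets one unit
    ray, to be looked up in the equirectangular frame.
    """
    f = 0.5 * out_w / np.tan(0.5 * np.radians(fov_deg))
    xs = np.arange(out_w) - 0.5 * out_w
    ys = np.arange(out_h) - 0.5 * out_h
    x, y = np.meshgrid(xs, ys)
    d = np.stack([x, -y, np.full_like(x, f)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # Rotate rays by pitch (about x), then yaw (about y).
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return d @ (ry @ rx).T
```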

    Image-Based Rendering Of Real Environments For Virtual Reality

    Get PDF

    Assembling convolution neural networks for automatic viewing transformation

    Get PDF
    Images taken under different camera poses appear rotated or distorted, which leads to poor viewing experiences. This paper proposes a new framework that automatically transforms images to a conformable view setting by assembling different convolutional neural networks. Specifically, a referential 3D ground plane is first derived from the RGB image, and a novel projection mapping algorithm is developed to achieve automatic viewing transformation. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art vanishing-point-based methods by a large margin in terms of accuracy and robustness.
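    The paper's projection mapping through a referential ground plane is more involved, but a standard building block for this kind of viewing transformation is the rotation-induced homography: for a pure camera rotation R with intrinsics K, pixels map through H = K R K^{-1}. The sketch below corrects an estimated roll this way; the focal-length guess and the use of OpenCV are assumptions for illustration, not the paper's method.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def upright_warp(image, roll_deg, fx=None):
    """Rectify an in-plane rotated image with a rotation homography.

    For a pure camera rotation R, pixels map through H = K R K^-1,
    so correcting an estimated roll is a single perspective warp.
    """
    h, w = image.shape[:2]
    fx = fx or 0.5 * w  # crude focal-length guess if uncalibrated
    K = np.array([[fx, 0.0, w / 2.0],
                  [0.0, fx, h / 2.0],
                  [0.0, 0.0, 1.0]])
    a = np.radians(roll_deg)
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])  # roll about the optical axis
    H = K @ R @ np.linalg.inv(K)
    return cv2.warpPerspective(image, H, (w, h))
```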

    Navigating Immersive and Interactive VR Environments With Connected 360° Panoramas

    Get PDF
    Emerging research is expanding the use of 360-degree spherical panoramas of real-world environments in 360 VR experiences beyond video and image viewing. However, most of these experiences are strictly guided, with few opportunities for interaction or exploration. There is a desire to develop cohesive virtual environments created with 360 VR that allow choice in navigation, versus scripted experiences with limited interaction. Unlike standard VR with the freedom of synthetic graphics, designing appropriate user interfaces (UIs) for 360 VR navigation is challenging within the limitations of fixed assets. To tackle this gap, we designed RealNodes, a software system that presents an interactive and explorable 360 VR environment. We also developed four visual guidance UIs for 360 VR navigation. The results of a pilot study showed that the choice of UI had a significant effect on task completion times, with one of our methods, Arrow, performing best. Arrow also exhibited positive but non-significant trends in average measures of preference, user engagement, and simulator sickness. RealNodes, the UI designs, and the pilot study results contribute preliminary information to inspire future investigation of how to design effective explorable scenarios in 360 VR and visual guidance metaphors for navigation in applications using 360 VR environments.
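    A scene of connected panoramas like the one RealNodes presents can be modeled as a simple graph whose edges are navigation hotspots anchored at viewing directions. The sketch below is a hypothetical data structure for that idea, including a test for which hotspots fall inside the current field of view; the names, angles, and files are invented for illustration and are not RealNodes' actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PanoNode:
    """One 360-degree panorama in an explorable environment."""
    pano_path: str  # equirectangular image file (hypothetical)
    neighbors: dict = field(default_factory=dict)  # yaw deg -> node id

# Hypothetical two-room scene: each hotspot sits at the yaw angle
# where the connected panorama lies.
scene = {
    "lobby":   PanoNode("lobby.jpg",   {90.0: "hallway"}),
    "hallway": PanoNode("hallway.jpg", {270.0: "lobby"}),
}

def visible_hotspots(node_id, view_yaw, fov=100.0):
    """Return neighbor ids whose hotspot falls inside the view."""
    half = fov / 2.0
    return [nid for yaw, nid in scene[node_id].neighbors.items()
            if abs((yaw - view_yaw + 180.0) % 360.0 - 180.0) <= half]
```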

    Data-driven depth and 3D architectural layout estimation of an interior environment from monocular panoramic input

    Get PDF
    Recent years have seen significant interest in the automatic 3D reconstruction of indoor scenes, leading to a distinct and very active sub-field within 3D reconstruction. The main objective is to convert rapidly measured data representing real-world indoor environments into models encompassing geometric, structural, and visual abstractions. This thesis focuses on the particular subject of extracting geometric information from single panoramic images, using either visual data alone or sparse registered depth information. The appeal of this setup lies in the efficiency and cost-effectiveness of data acquisition using 360° images. The challenge, however, is that creating a comprehensive model from mostly visual input is extremely difficult, due to noise, missing data, and clutter. My research has concentrated on leveraging prior information, in the form of architectural and data-driven priors derived from large annotated datasets, to develop end-to-end deep learning solutions for specific tasks in the structured reconstruction pipeline. My first contribution consists of a deep neural network architecture for estimating a depth map from a single monocular indoor panorama, operating directly on the equirectangular projection. Leveraging the characteristics of indoor 360-degree images and recognizing the impact of gravity on indoor scene design, the network efficiently encodes the scene into vertical spherical slices. By exploiting long- and short-term relationships among these slices, it recovers an equirectangular depth map directly from the corresponding RGB image. My second contribution generalizes the approach to handle multimodal input, also covering the situation in which the equirectangular input image is paired with a sparse depth map, as provided by common capture setups. Depth is inferred using an efficient single-branch network with a dynamic gating system, processing both dense visual data and sparse geometric data. Additionally, a new augmentation strategy enhances the model's robustness to various types of sparsity, including those from structured light sensors and LiDAR setups. While the first two contributions focus on per-pixel geometric information, my third contribution addresses the recovery of the 3D shape of permanent room surfaces from a single panoramic image. Unlike previous methods, this approach tackles the problem in 3D, expanding the reconstruction space. It employs a graph convolutional network to directly infer the room structure as a 3D mesh, deforming a graph-encoded tessellated sphere mapped to the spherical panorama. Gravity-aligned features are actively incorporated using a projection layer with multi-head self-attention, and specialized losses guide plausible solutions in the presence of clutter and occlusions. Benchmarks on publicly available data show that all three methods provide significant improvements over the state-of-the-art.
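    The vertical-slice encoding described for the first contribution can be sketched as follows: split the equirectangular image into gravity-aligned vertical slices, encode each slice, and relate the slice sequence with a recurrent layer. This is a minimal PyTorch illustration of the idea only; the layer sizes, the bidirectional LSTM, and the coarse per-slice depth output are assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

class SliceDepthNet(nn.Module):
    """Minimal sketch: encode vertical slices of an equirectangular
    image and relate them with a bidirectional LSTM."""

    def __init__(self, height=256, feat=128):
        super().__init__()
        # Each slice is a tall, narrow column: (3, height, slice_w).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((height // 4, 1)),
        )
        self.rnn = nn.LSTM(64 * (height // 4), feat,
                           bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * feat, height)  # coarse depth per row

    def forward(self, pano, slice_w=8):
        # pano: (batch, 3, height, width) equirectangular image.
        slices = pano.split(slice_w, dim=3)            # vertical slices
        feats = [self.encoder(s).flatten(1) for s in slices]
        seq, _ = self.rnn(torch.stack(feats, dim=1))   # slice sequence
        return self.head(seq).transpose(1, 2)          # (b, h, n_slices)
```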