4 research outputs found
Field-of-View IoU for Object Detection in 360° Images
360° cameras have gained popularity over the last few years. In this
paper, we propose two fundamental techniques, Field-of-View IoU (FoV-IoU) and
360Augmentation, for object detection in 360° images. Although most object
detection neural networks designed for perspective images are applicable to
360° images in equirectangular projection (ERP) format, their performance
deteriorates owing to the distortion in ERP images. Our method can be readily
integrated with existing perspective object detectors and significantly
improves their performance. FoV-IoU computes the intersection-over-union of
two Field-of-View bounding boxes in a spherical image and can be used for
training, inference, and evaluation. 360Augmentation is a data
augmentation technique specific to the 360° object detection task; it
randomly rotates a spherical image and solves the bias due to the
sphere-to-plane projection. We conduct extensive experiments on the 360indoor
dataset with different types of perspective object detectors and show the
consistent effectiveness of our method.
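To illustrate the kind of rotation that 360Augmentation relies on, the following is a minimal sketch of the simplest case only: a rotation about the vertical axis (yaw), which in ERP format reduces to a circular shift of the image columns. It is not the authors' implementation, and the abstract does not say which rotation axes they use. The function names `rotate_erp_yaw` and `shift_longitude`, the list-of-rows image representation, and the convention that longitude increases to the right and lies in [-180, 180) are all assumptions made for this example.

```python
import random


def rotate_erp_yaw(image, yaw_deg):
    """Rotate an ERP image (a list of pixel rows) about the vertical axis.

    A yaw rotation of the sphere is a circular column shift in ERP.
    Returns the rotated image and the column shift that was applied.
    """
    width = len(image[0])
    # Quantize the requested yaw to a whole number of columns.
    shift = round(yaw_deg / 360.0 * width) % width
    rotated = [row[-shift:] + row[:-shift] for row in image]
    return rotated, shift


def shift_longitude(lon_deg, shift, width):
    """Move a box-centre longitude by the applied column shift.

    Assumes longitude grows to the right and wraps to [-180, 180).
    """
    applied_yaw = shift * 360.0 / width
    return (lon_deg + applied_yaw + 180.0) % 360.0 - 180.0


# Hypothetical 2 x 8 "image" whose pixel values are their column indices.
erp = [list(range(8)), list(range(8))]
rotated, shift = rotate_erp_yaw(erp, 90.0)
new_lon = shift_longitude(170.0, shift, width=8)

# A random yaw, as an augmentation step might draw it.
random_yaw = random.uniform(0.0, 360.0)
augmented, _ = rotate_erp_yaw(erp, random_yaw)
```

Because the shift is circular, no pixels are lost, and a box that crosses the left or right image border simply wraps around. Rotations about other axes would need a full resampling of the sphere and are not covered by this sketch.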
Transitioning360: Content-aware NFoV Virtual Camera Paths for 360° Video Playback
Despite the increasing number of head-mounted displays, many 360° VR videos are still being viewed by users on existing 2D displays. To this end, a subset of the 360° video content is often shown inside a manually or semi-automatically selected normal-field-of-view (NFoV) window. However, during playback, a viewer watching a single NFoV video can easily miss concurrent off-screen content. We present Transitioning360, a tool for 360° video navigation and playback on 2D displays by transitioning between multiple NFoV views that track potentially interesting targets or events. Our method computes virtual NFoV camera paths considering content awareness and diversity in an offline preprocessing step. During playback, the user can watch any NFoV view corresponding to a precomputed camera path. Moreover, our interface shows other candidate views, providing a sense of concurrent events. At any time, the user can transition to other candidate views for fast navigation and exploration. Experimental results including a user study demonstrate that the viewing experience using our method is more enjoyable and convenient than previous methods.
Fusing Multimedia Data Into Dynamic Virtual Environments
In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments.
First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform to render an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing the culture, and crowd-sourced tourism.
We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos, using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects to explore the applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality.
Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate several applications such as virtual tourism, immersive telepresence, and remote education.