
    Visual Distortions in 360-degree Videos

    Omnidirectional (or 360°) images and videos are emergent signals used in many areas, such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360° content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of these distortions are specific to the nature of 360° images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360° signals as they pass through the different processing elements of the visual communication pipeline. While their impact on viewers' visual perception and on the immersive experience at large is still unknown and remains an open research topic, this review proposes a taxonomy of the visual distortions that can be encountered in 360° signals and identifies their underlying causes in the end-to-end 360° content distribution pipeline. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and for the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360° content in interactive and immersive applications.

    Image-Based Rendering Of Real Environments For Virtual Reality

    Novel Methods and Algorithms for Presenting 3D Scenes

    In recent years, improvements in the acquisition and creation of 3D models have led to an increasing availability of 3D content and to a widening of the audience such content is created for, bringing into focus the need for effective ways to visualize and interact with it. Until recently, the task of virtually inspecting a 3D object or navigating inside a 3D scene was carried out using human-machine interaction (HMI) metaphors controlled through mouse and keyboard events. However, this interaction approach may be cumbersome for a general audience. Furthermore, the inception and spread of touch-based mobile devices, such as smartphones and tablets, redefined the interaction problem entirely, since neither mouse nor keyboard is available anymore. The problem is made even worse by the fact that these devices are typically less powerful than desktop machines, while high-quality rendering is a computationally intensive task. In this thesis, we present a series of novel methods for the easy presentation of 3D content, both when it is already available in digitized form and when it must be acquired from the real world by image-based techniques. In the first case, we propose a method that takes as input the 3D scene of interest and an example video, and automatically produces a video of the input scene that resembles the given example. In other words, our algorithm allows the user to replicate an existing video, for example one created by a professional animator, on a different 3D scene. In the context of image-based techniques, exploiting the inherent spatial organization of photographs taken for the 3D reconstruction of a scene, we propose an intuitive interface for smooth stereoscopic navigation of the acquired scene, providing an immersive experience without the need for a complete 3D reconstruction. Finally, we propose an interactive framework for improving low-quality 3D reconstructions obtained through image-based reconstruction algorithms. Using a few strokes on the input images, the user can specify high-level geometric hints to improve incomplete or noisy reconstructions caused by conditions that commonly arise for objects such as buildings, streets, and other human-made functional elements.
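
    The abstract does not detail the video-replication algorithm; as a toy sketch of only the most basic ingredient, transferring a camera path between scenes, one can normalize camera positions against the source scene's bounding box and re-express them in the target scene's. All names below are illustrative, and a real method would also have to handle orientation, timing, and scene semantics:

```python
import numpy as np

def retarget_camera_path(path, src_bounds, dst_bounds):
    """Re-express a camera path from a source scene in a target scene
    by normalizing positions against the source bounding box and
    rescaling them into the target bounding box.

    path: (N, 3) array of camera positions from the example video.
    src_bounds / dst_bounds: (min_xyz, max_xyz) tuples per scene.
    """
    src_min, src_max = (np.asarray(b, dtype=float) for b in src_bounds)
    dst_min, dst_max = (np.asarray(b, dtype=float) for b in dst_bounds)
    t = (np.asarray(path, dtype=float) - src_min) / (src_max - src_min)
    return dst_min + t * (dst_max - dst_min)
```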

    Impact of Imaging and Distance Perception in VR Immersive Visual Experience

    Virtual reality (VR) headsets have evolved to offer unprecedented viewing quality. Meanwhile, they have become lightweight, wireless, and low-cost, which has opened them up to new applications and a much wider audience. VR headsets can now provide users with a greater understanding of events and accuracy of observation, making decision-making faster and more effective. However, the spread of immersive technologies has seen slow take-up, with the adoption of virtual reality limited to a few applications, typically related to entertainment. This reluctance appears to be due to the often-necessary change of operating paradigm and some scepticism towards the "VR advantage". The need therefore arises to evaluate the contribution that a VR system can make to user performance, for example in monitoring and decision-making. This will help system designers understand when immersive technologies can be proposed to replace or complement standard display systems such as a desktop monitor. In parallel with the evolution of VR headsets, 360° cameras have also evolved and are now capable of instantly acquiring photographs and videos in stereoscopic 3D (S3D) at very high resolutions. 360° images are innately suited to VR headsets, where the captured view can be observed and explored through natural head rotation. Acquired views can even be experienced and navigated from the inside as they are captured. The combination of omnidirectional images and VR headsets has opened up a new way of creating immersive visual representations, which we call photo-based VR. This new methodology combines traditional model-based rendering with high-quality omnidirectional texture-mapping. Photo-based VR is particularly suitable for applications related to remote visits and realistic scene reconstruction, useful for monitoring and surveillance systems, control panels, and operator training. This PhD study investigates the potential of photo-based VR representations. It starts by evaluating the role of immersion and user performance in today's graphical visual experience, and then uses this as a reference to develop and evaluate new photo-based VR solutions. With the current literature on the photo-based VR experience and associated user performance being very limited, this study builds new knowledge from the proposed assessments. We conduct five user studies on a few representative applications, examining how visual representations can be affected by system factors (camera- and display-related) and how they can influence human factors (such as realism, presence, and emotions). Particular attention is paid to realistic depth perception, for which we develop targeted photo-based VR solutions intended to provide users with a correct perception of space dimensions and object size; we call this true-dimensional visualization. The presented work contributes to the unexplored fields of photo-based VR and true-dimensional visualization, offering immersive system designers a thorough understanding of the benefits, potential, and types of applications in which these new methods can make a difference. This thesis manuscript and its findings have been partly presented in scientific publications: five conference papers at Springer and IEEE symposia [1], [2], [3], [4], [5], and one journal article in an IEEE periodical [6].
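
    The link between capture geometry and perceived size and distance that true-dimensional visualization targets can be illustrated with textbook stereo geometry. The sketch below uses illustrative names and an idealized vergence model; it is not one of the solutions developed in the thesis:

```python
import math

def vergence_angle_deg(depth_m, ipd_m=0.063):
    """Angle (degrees) between the two eyes' lines of sight when
    fixating a point at depth_m, for an average 63 mm IPD."""
    return math.degrees(2.0 * math.atan((ipd_m / 2.0) / depth_m))

def apparent_depth_m(true_depth_m, baseline_m, ipd_m=0.063):
    """Depth a viewer with the given IPD perceives when stereo content
    was captured with a camera baseline of baseline_m, under idealized
    geometry: matching the capture and viewing vergence angles gives
    d' = d * ipd / baseline."""
    return true_depth_m * (ipd_m / baseline_m)
```

    Under this model, content captured with a 100 mm stereo baseline and viewed with a 63 mm IPD would make an object at 10 m appear at roughly 6.3 m, exactly the kind of size and distance misperception that true-dimensional visualization aims to avoid.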

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing a visual representation of a location for use in a VE is usually a tedious process that requires either manual modelling of the environment or the use of specific hardware. Capturing environment dynamics is not straightforward either, and is usually performed with specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render, and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully-panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks; to support it, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution for the spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. In this sense, videos in context are a suitable alternative to more complex, and often expensive, solutions. These findings benefit many applications, including teleconferencing, virtual tourism, and remote assistance.
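
    Spatially embedding a video into a panorama comes down to registering the video camera's viewing directions in the panorama's coordinate frame. A minimal sketch of the core direction-to-pixel mapping for an equirectangular panorama follows (illustrative names; a full system also needs the relative pose between the video camera and the panorama):

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a unit view direction (x right, y up, z forward) to pixel
    coordinates in an equirectangular panorama of size width x height."""
    lon = math.atan2(x, z)                   # [-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y)))  # [-pi/2, pi/2]
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v
```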

    Multi-task near-field perception for autonomous driving using surround-view fisheye cameras

    The formation of eyes led to the big bang of evolution. The dynamics changed from a primitive organism waiting for food to come into contact to an organism actively seeking food using visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has defects. Over millions of years, humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars covers the environment in a range of 0-10 meters with 360° coverage around the vehicle. It is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions. Until now, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications; because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency. To address these issues, we concentrate on the following: 1) developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks, such as geometric and semantic tasks, using convolutional neural networks; 2) using multi-task learning to overcome computational bottlenecks by sharing initial convolutional layers between tasks and developing optimization strategies that balance the tasks.
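
    As a minimal illustration of the second point, shared initial convolutional layers with per-task heads, here is a toy PyTorch sketch. The architecture, layer sizes, and fixed loss weights are placeholders, not the networks or balancing strategies developed in the thesis:

```python
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """Toy multi-task network: convolutional layers shared between
    tasks, with one lightweight decoder head per task."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(  # shared feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, n_classes, 1)  # semantic task
        self.depth_head = nn.Conv2d(64, 1, 1)        # geometric task

    def forward(self, x):
        f = self.encoder(x)  # computed once, reused by every head
        return self.seg_head(f), self.depth_head(f)

def total_loss(seg_loss, depth_loss, w_seg=1.0, w_depth=0.5):
    """Fixed weights are the simplest way to balance task losses; the
    thesis develops more elaborate balancing strategies."""
    return w_seg * seg_loss + w_depth * depth_loss
```

    Sharing the encoder means the most expensive computation runs once per frame regardless of the number of tasks, which is what relieves the computational bottleneck on embedded automotive hardware.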

    Stereoscopic high dynamic range imaging

    Two modern technologies show promise to dramatically increase immersion in virtual environments. Stereoscopic imaging captures two images representing the views of the two eyes and allows for better depth perception. High dynamic range (HDR) imaging accurately represents real-world lighting, as opposed to traditional low dynamic range (LDR) imaging, and provides better contrast and more natural-looking scenes. The combination of the two technologies, to gain the advantages of both, has until now been mostly unexplored due to the current limitations of the imaging pipeline. This thesis reviews both fields, proposes a stereoscopic high dynamic range (SHDR) imaging pipeline outlining the challenges that need to be resolved to enable SHDR, and focuses on the capture and compression aspects of that pipeline. The problems of capturing SHDR images, which would potentially require two HDR cameras and introduce ghosting, are mitigated by capturing an HDR and LDR pair and using it to generate SHDR images. A detailed user study compared four different methods of generating SHDR images. Results demonstrated that one of the methods can produce images perceptually indistinguishable from the ground truth. Insights obtained while developing static image operators guided the design of SHDR video techniques. Three methods for generating SHDR video from an HDR-LDR video pair are proposed and compared to ground-truth SHDR videos. Results showed little overall error and identified the method with the least error. Once captured, SHDR content needs to be efficiently compressed. Five backward-compatible SHDR compression methods are presented. The proposed methods can encode SHDR content at a size only slightly larger than that of a traditional single LDR image (18% larger for one method), and the backward-compatibility property encourages early adoption of the format. The work presented in this thesis has introduced and advanced capture and compression methods for the adoption of SHDR imaging. In general, this research paves the way for the novel field of SHDR imaging, which should lead to an improved and more realistic representation of captured scenes.
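
    The abstract leaves the four SHDR-generation methods unspecified. As a deliberately naive baseline for the underlying idea, expanding the LDR view so it can be paired with the co-captured HDR view, consider this sketch (illustrative names; real expansion operators are considerably more sophisticated than a global gamma expansion):

```python
import numpy as np

def expand_ldr_to_hdr(ldr, hdr_ref, gamma=2.2):
    """Naive inverse tone mapping: linearize an 8-bit LDR view and
    rescale it to the luminance range of the co-captured HDR view.

    ldr: (H, W, 3) uint8 image from the second eye's camera.
    hdr_ref: (H, W, 3) float HDR image from the first eye's camera.
    Returns a rough HDR estimate for the LDR view, so the pair can be
    displayed as a stereoscopic HDR image.
    """
    linear = (ldr.astype(np.float64) / 255.0) ** gamma  # undo display gamma
    return linear * hdr_ref.max()                       # match the HDR peak
```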