
    Estimation of signal distortion using effective sampling density for light field-based free viewpoint video

    In a light field-based free viewpoint video (LF-based FVV) system, effective sampling density (ESD) is defined as the number of rays per unit area of the scene that have been acquired and are selected in the rendering process for reconstructing an unknown ray. This paper extends the concept of ESD and shows that ESD is a tractable metric that quantifies the joint impact of imperfections in LF acquisition and rendering. By deriving and analyzing ESD for commonly used LF acquisition and rendering methods, it is shown that ESD is an effective indicator, determined by system parameters, which can be used to estimate output video distortion directly, without access to the ground truth. This claim is verified by extensive numerical simulations and comparison to PSNR. Furthermore, an empirical relationship between the output distortion (in PSNR) and the calculated ESD is established to allow direct assessment of the overall video distortion without an actual implementation of the system. A small-scale subjective user study is also conducted, which indicates a correlation of 0.91 between ESD and perceived quality.
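
    As a hedged illustration of how such a metric could drive distortion estimation, the sketch below computes a toy ESD value and maps it to PSNR through a fitted curve. The logarithmic form and the constants a and b are illustrative assumptions, not the paper's actual empirical model.

```python
import numpy as np

# Toy sketch: ESD as rays per unit scene area that are both acquired and
# selected by the renderer, then a hypothetical fitted ESD -> PSNR curve.

def effective_sampling_density(cameras_per_m, pixels_per_m, selected_fraction):
    """Rays per m^2 of scene surface that survive acquisition and rendering."""
    acquired = cameras_per_m * pixels_per_m      # rays captured per m^2
    return acquired * selected_fraction          # rays actually used per m^2

def estimate_psnr(esd, a=6.5, b=12.0):
    """Hypothetical empirical fit: quality improves with log10(ESD)."""
    return a * np.log10(esd) + b

esd = effective_sampling_density(cameras_per_m=8, pixels_per_m=500,
                                 selected_fraction=0.25)
print(f"ESD = {esd:.0f} rays/m^2 -> estimated quality ~ {estimate_psnr(esd):.1f} dB")
```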

    Optimized Data Representation for Interactive Multiview Navigation

    In contrast to traditional media streaming services, where a single media content is delivered to different users, interactive multiview navigation applications enable users to choose their own viewpoints and freely navigate in a 3-D scene. The interactivity brings new challenges in addition to the classical rate-distortion trade-off, which considers only compression performance and viewing quality. On the one hand, interactivity necessitates sufficient viewpoints for richer navigation; on the other hand, it requires low bandwidth and delay costs for smooth navigation during view transitions. In this paper, we formally describe the novel trade-offs posed by navigation interactivity and the classical rate-distortion criterion. Based on an original formulation, we look for the optimal design of the data representation by introducing novel rate and distortion models and practical solving algorithms. Experiments show that the proposed data representation method outperforms the baseline solution by providing lower resource consumption and higher visual quality in all navigation configurations, which confirms the potential of the proposed data representation for practical interactive navigation systems.
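
    A minimal sketch of this trade-off, under the simplifying assumptions that views lie on a 1-D path and that synthesis distortion grows with the distance to the nearest stored view; the paper's actual rate and distortion models and solving algorithms are more elaborate.

```python
from itertools import combinations

VIEWS = list(range(10))          # candidate camera viewpoints on a 1-D path
RATE_PER_VIEW = 1.0              # hypothetical coding rate per stored view
LAMBDA = 0.8                     # Lagrangian weight between rate and distortion

def nav_distortion(stored):
    """Expected synthesis distortion over a uniform navigation path,
    assumed proportional to the distance to the nearest stored view."""
    return sum(min(abs(v - s) for s in stored) for v in VIEWS) / len(VIEWS)

def best_representation(num_stored):
    """Exhaustively pick the stored-view subset minimizing D + lambda * R."""
    return min(combinations(VIEWS, num_stored),
               key=lambda s: nav_distortion(s) + LAMBDA * RATE_PER_VIEW * len(s))

for k in (2, 3, 5):
    s = best_representation(k)
    print(k, s, round(nav_distortion(s), 2))
```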

    Steered mixture-of-experts for light field images and video: representation and coding

    Research in light field (LF) processing has increased considerably over the last decade, largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids, which are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays arriving at a certain region from any angle. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid bitrates with respect to subjective visual quality of 4-D LF images. For 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of 4x in bitrate at the same quality. At least equally important, our method inherently offers functionality for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) lightweight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
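
    The kernel idea can be sketched in a few lines: a pixel is reconstructed as a gating-weighted mixture of expert outputs, where the gates are normalized, steered (anisotropic) Gaussians. The sketch below uses constant-color experts and hand-picked 2-D parameters purely for illustration; in SMoE proper the parameters are obtained by model training over the higher-dimensional LF.

```python
import numpy as np

# Minimal 2-D SMoE-style reconstruction sketch. Each kernel k has a center
# mu_k, a covariance Sigma_k (the "steering"), and a simple expert (here a
# constant color m_k). A pixel is the gate-weighted sum of expert outputs.

def smoe_reconstruct(x, mus, sigmas, colors):
    """x: (2,) pixel coordinate; returns the reconstructed intensity."""
    gates = []
    for mu, sig in zip(mus, sigmas):
        d = x - mu
        gates.append(np.exp(-0.5 * d @ np.linalg.inv(sig) @ d))
    gates = np.array(gates)
    gates /= gates.sum()                     # normalize to a soft partition
    return gates @ np.array(colors)          # mixture of expert outputs

mus = [np.array([8.0, 8.0]), np.array([24.0, 20.0])]
sigmas = [np.eye(2) * 30.0, np.array([[60.0, 25.0], [25.0, 20.0]])]  # steered
img = np.array([[smoe_reconstruct(np.array([i, j], float), mus, sigmas, [0.2, 0.9])
                 for j in range(32)] for i in range(32)])
```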

    Characteristics of flight simulator visual systems

    The physical parameters of the flight simulator visual system that characterize the system and determine its fidelity are identified and defined. The characteristics of visual simulation systems are discussed in terms of the basic categories of spatial, energy, and temporal properties, corresponding to the three fundamental quantities of length, mass, and time. Each of these parameters is further addressed in terms of its effect, its appropriate units or descriptors, methods of measurement, and its use or importance to image quality.

    Large-Scale Light Field Capture and Reconstruction

    This thesis discusses approaches and techniques to convert Sparsely-Sampled Light Fields (SSLFs) into Densely-Sampled Light Fields (DSLFs), which can be used for visualization on 3DTV and Virtual Reality (VR) devices. As an example, a movable 1D large-scale light field acquisition system for capturing SSLFs in real-world environments is evaluated. This system consists of 24 sparsely placed RGB cameras and two Kinect V2 sensors. The real-world SSLF data captured with this setup can be leveraged to reconstruct real-world DSLFs. To this end, three challenging problems must be solved for this system: (i) how to estimate the rigid transformation from the coordinate system of a Kinect V2 to the coordinate system of an RGB camera; (ii) how to register the two Kinect V2 sensors across a large displacement; (iii) how to reconstruct a DSLF from an SSLF with moderate and large disparity ranges. To overcome these three challenges, we propose: (i) a novel self-calibration method, which takes advantage of the geometric constraints from the scene and the cameras, for estimating the rigid transformations from the camera coordinate frame of one Kinect V2 to the camera coordinate frames of the 12 nearest RGB cameras; (ii) a novel coarse-to-fine approach for recovering the rigid transformation from the coordinate system of one Kinect to that of the other by means of local color and geometry information; (iii) several novel algorithms, falling into two groups, for reconstructing a DSLF from an input SSLF: novel view synthesis methods inspired by state-of-the-art video frame interpolation algorithms, and Epipolar-Plane Image (EPI) inpainting methods inspired by Shearlet Transform (ST)-based DSLF reconstruction approaches.
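
    For intuition on challenge (i), the sketch below shows a generic rigid-registration step, the standard Kabsch/Procrustes solution from point correspondences. It is only a stand-in for the kind of transform being recovered; the thesis' actual self-calibration exploits geometric constraints from the scene and cameras rather than given correspondences.

```python
import numpy as np

def rigid_transform(P, Q):
    """Kabsch: find rotation R (3x3) and translation t (3,) with Q ~ R @ P + t.
    P, Q: (3, N) arrays of corresponding 3-D points."""
    cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cp) @ (Q - cq).T                     # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    t = (cq - R @ cp).ravel()
    return R, t

# quick check on synthetic data
rng = np.random.default_rng(1)
P = rng.normal(size=(3, 50))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Q = R_true @ P + np.array([[0.5], [1.0], [-0.2]])
R_est, t_est = rigid_transform(P, Q)
print(np.allclose(R_est, R_true))
```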

    Recording, compression and representation of dense light fields

    The concept of light fields allows image-based capture of scenes, providing, on a recorded dataset, many of the features available in computer graphics, such as simulation of different viewpoints or changes of core camera parameters, including depth of field. Because the recorded dimensionality increases from two for a regular image to four for a light field, previous works mainly concentrate on small or undersampled light field recordings. This thesis is concerned with the recording of a dense light field dataset, including the estimation of suitable sampling parameters as well as the implementation of the required capture, storage, and processing methods. Towards this goal, the influence of the optical system on the, possibly band-unlimited, light field signal is examined, deriving the required sampling rates from the bandlimiting effects of the camera and optics. To increase storage capacity and bandwidth, a very fast image compression method is introduced, compressing an order of magnitude faster than previous methods and reducing the I/O bottleneck for light field processing. A fiducial marker system is provided for the calibration of the recorded dataset, which offers a higher number of reference points than previous methods, improving camera pose estimation. In conclusion, this work demonstrates the feasibility of densely sampling a large light field and provides a dataset that may be used for evaluation or as a reference for light field processing tasks such as interpolation, rendering, and sampling.
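
    As a rough illustration of deriving a sampling rate from a bandlimit, the following sketch applies the classic plenoptic-sampling rule of thumb: keep the disparity spread across the scene's depth range below one pixel between adjacent cameras. This is a simplified assumption for illustration; the thesis' actual derivation additionally models the bandlimiting effects of the camera and optics.

```python
def max_camera_spacing(focal_length_px, z_min, z_max):
    """Largest camera baseline (in scene units) keeping the disparity spread
    between the nearest (z_min) and farthest (z_max) scene depth <= 1 pixel.

    Disparity of a point at depth z for baseline b is f_px * b / z, so the
    spread is f_px * b * (1/z_min - 1/z_max); bounding it by 1 px gives:"""
    return 1.0 / (focal_length_px * (1.0 / z_min - 1.0 / z_max))

# e.g. 2000 px focal length, scene depth range 2 m .. 10 m:
print(f"{max_camera_spacing(2000, 2.0, 10.0) * 1000:.2f} mm max spacing")
```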

    Evaluation of learning-based techniques in novel view synthesis

    Novel view synthesis is a long-standing topic at the intersection of computer vision and computer graphics, where the fundamental goal is to synthesize an image from a novel viewpoint given a sparse set of reference images. The rapid development of deep learning has introduced a wide range of new ideas and methods in novel view synthesis, in which parts of the synthesis process are treated as a supervised learning problem. In particular, neural scene representations paired with volume rendering have achieved state-of-the-art results in novel view synthesis, but the area remains nascent and its literature sparse. This thesis presents an overview of learning-based view synthesis, experiments with state-of-the-art view synthesis methods, evaluates them quantitatively and qualitatively, and finally discusses their properties. Furthermore, we introduce a novel multi-view stereo dataset captured with a hand-held camera and demonstrate the process of collecting and preparing multi-view stereo datasets for view synthesis. The findings in this thesis indicate that learning-based view synthesis methods excel at synthesizing plausible views of challenging scenes, including situations with complex geometry as well as transparent and reflective materials. Furthermore, we found that it is possible to render such scenes in real time and to greatly reduce the time needed to prepare a scene for view synthesis by using a pre-trained network that aggregates information from nearby views.
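
    As context for "neural scene representations paired with volume rendering", the sketch below shows the standard emission-absorption quadrature used by NeRF-style methods: densities and colors sampled along a ray are alpha-composited into a pixel color. In a real system the densities and colors come from a trained network; here they are random placeholders.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """sigmas: (N,) densities, colors: (N, 3), deltas: (N,) sample spacings."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                  # per-segment opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # transmittance
    weights = trans * alphas                                 # contribution weights
    return weights @ colors                                  # composited ray color

rng = np.random.default_rng(0)
print(render_ray(rng.uniform(0, 2, 64),        # placeholder densities
                 rng.uniform(0, 1, (64, 3)),   # placeholder sample colors
                 np.full(64, 0.05)))           # uniform sample spacing
```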

    Optimized Camera Handover Scheme in Free Viewpoint Video Streaming

    Free-viewpoint video (FVV) is a promising approach that allows users to control their viewpoint and generate virtual views from any desired perspective. The individual user viewpoints are synthesized from two or more camera streams and corresponding depth sequences. In the case of continuous viewpoint changes, the camera inputs of the view synthesis process must be switched seamlessly in order to avoid starvation of the viewpoint synthesizer algorithm. Starvation occurs when the desired user viewpoint cannot be synthesized from the currently streamed camera views, interrupting the FVV playout. In this paper we propose three camera handover schemes (TCC, MA, SA) based on viewpoint prediction, in order to minimize the probability of playout stalls and to find the trade-off between image quality and camera handover frequency. Our simulation results show that the introduced camera switching methods can reduce the handover frequency by more than 40%, hence viewpoint synthesis starvation and playout interruptions can be minimized. By providing seamless viewpoint changes, the quality of experience can be significantly improved, making the new FVV service more attractive in the future.
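
    A hedged sketch of the general mechanism only, not the paper's exact TCC, MA, or SA schemes: extrapolate the user's viewpoint and trigger a handover before the prediction leaves the currently streamable range, with a safety margin that trades handover frequency against starvation risk. Viewpoints, the margin, and the prediction horizon are illustrative assumptions.

```python
def predict_viewpoint(history, horizon=5):
    """Linear extrapolation `horizon` steps ahead from the last two samples."""
    v_prev, v_now = history[-2], history[-1]
    return v_now + (v_now - v_prev) * horizon

def handover_needed(predicted, covered, margin=0.1):
    """covered: (lo, hi) viewpoint range synthesizable from current streams.
    Switch cameras before the predicted viewpoint nears the range edge."""
    lo, hi = covered
    return predicted < lo + margin or predicted > hi - margin

history = [2.0, 2.1, 2.25, 2.45]          # user viewpoint trace on a 1-D arc
if handover_needed(predict_viewpoint(history), covered=(1.0, 3.0)):
    print("prefetch next camera pair")     # avoid synthesizer starvation
```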