Estimation of signal distortion using effective sampling density for light field-based free viewpoint video
In a light field-based free viewpoint video (LF-based FVV) system, effective sampling density (ESD) is defined as the number of rays per unit area of the scene that has been acquired and is selected in the rendering process for reconstructing an unknown ray. This paper extends the concept of ESD and shows that ESD is a tractable metric that quantifies the joint impact of the imperfections of LF acquisition and rendering. By deriving and analyzing ESD for the commonly used LF acquisition and rendering methods, it is shown that ESD is an effective indicator determined by system parameters and can be used to directly estimate output video distortion without access to the ground truth. This claim is verified by extensive numerical simulations and comparison to PSNR. Furthermore, an empirical relationship between the output distortion (in PSNR) and the calculated ESD is established to allow direct assessment of the overall video distortion without an actual implementation of the system. A small-scale subjective user study is also conducted, which indicates a correlation of 0.91 between ESD and perceived quality.
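The abstract above uses PSNR as the reference distortion measure against which ESD is compared. As a reminder of what that measure computes, here is a minimal sketch of the standard PSNR definition. It assumes NumPy and 8-bit images (peak value 255); the function name `psnr` is illustrative and not taken from the paper, and the empirical ESD-to-PSNR relationship itself is not reproduced because the abstract does not state it.

```python
import numpy as np


def psnr(reference, rendered, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    # Mean squared error over all pixels (and channels, if present).
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images have no distortion
    return 10.0 * np.log10(peak ** 2 / mse)
```

Computing PSNR requires the ground-truth image, which is the dependency the abstract says an ESD-based estimate avoids.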
Optimized Data Representation for Interactive Multiview Navigation
In contrast to traditional media streaming services, where a unique media
content is delivered to different users, interactive multiview navigation
applications enable users to choose their own viewpoints and freely navigate in
a 3-D scene. The interactivity brings new challenges in addition to the
classical rate-distortion trade-off, which considers only the compression
performance and viewing quality. On the one hand, interactivity necessitates
sufficient viewpoints for richer navigation; on the other hand, it requires
low bandwidth and delay costs for smooth navigation during view
transitions. In this paper, we formally describe the novel trade-offs posed by
the navigation interactivity and classical rate-distortion criterion. Based on
an original formulation, we look for the optimal design of the data
representation by introducing novel rate and distortion models and practical
solving algorithms. Experiments show that the proposed data representation
method outperforms the baseline solution by providing lower resource
consumption and higher visual quality in all navigation configurations, which
certainly confirms the potential of the proposed data representation in
practical interactive navigation systems
Steered mixture-of-experts for light field images and video : representation and coding
Research in light field (LF) processing has increased substantially over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid-range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4 in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution
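To make the idea of kernels forming a continuous approximation more concrete, the following is a minimal sketch of a gated mixture of local experts evaluated at a single coordinate. It assumes Gaussian gating functions and linear experts, which the abstract does not spell out; the function names and the dictionary keys (`weight`, `mu_x`, `cov_xx`, `mu_y`, `slope`) are hypothetical, and the example is 2-D rather than 4-D or 5-D for brevity.

```python
import numpy as np


def gaussian(x, mean, cov):
    """Multivariate normal density evaluated at a single point x."""
    d = x - mean
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm


def smoe_predict(x, kernels):
    """Gate-weighted sum of linear experts at coordinate x (illustrative only)."""
    x = np.asarray(x, dtype=np.float64)
    # Soft assignment of x to each kernel; the covariance steers the kernel.
    gates = np.array([k["weight"] * gaussian(x, k["mu_x"], k["cov_xx"])
                      for k in kernels])
    gates = gates / gates.sum()  # can underflow far away from every kernel
    # Each expert is a local linear model around its own centre.
    experts = np.array([k["mu_y"] + k["slope"] @ (x - k["mu_x"])
                        for k in kernels])
    return float(gates @ experts)


# Two hypothetical kernels with constant experts (zero slope).
kernels = [
    {"weight": 0.5, "mu_x": np.array([0.0, 0.0]), "cov_xx": np.eye(2),
     "mu_y": 0.2, "slope": np.zeros(2)},
    {"weight": 0.5, "mu_x": np.array([10.0, 0.0]), "cov_xx": np.eye(2),
     "mu_y": 0.8, "slope": np.zeros(2)},
]
midpoint_value = smoe_predict([5.0, 0.0], kernels)
```

Because each output sample depends only on the kernel parameters and its own coordinate, every pixel can be evaluated independently, which is consistent with the random access and pixel-parallel reconstruction properties listed in the abstract.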
Camera positioning for 3D panoramic image rendering
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Virtual camera realisation and the proposition of a trapezoidal camera architecture are the two broad contributions of this thesis. Firstly, multiple cameras and their arrangement constitute a critical component that affects the integrity of visual content acquisition for multi-view video. Currently, linear, convergence, and divergence arrays are the prominent camera topologies adopted. However, the large number of cameras required and their synchronisation are two prominent challenges usually encountered. The use of virtual cameras can significantly reduce the number of physical cameras used with respect to any of the known camera structures, hence reducing some of the other implementation issues. This thesis explores the use of image-based rendering with and without geometry in the implementations leading to the realisation of virtual cameras. The virtual camera implementation was carried out from the perspective of a depth map (geometry) and the use of multiple image samples (no geometry). Prior to the virtual camera realisation, the generation of depth maps was investigated using region match measures widely known for solving the image point correspondence problem. The constructed depth maps have been compared with the ones generated using the dynamic programming approach. In both the geometry and no-geometry approaches, the virtual cameras lead to the rendering of views from a textured depth map, the construction of a 3D panoramic image of a scene by stitching multiple image samples and performing superposition on them, and the computation of a virtual scene from a stereo pair of panoramic images. The quality of these rendered images was assessed through either objective or subjective analysis in Imatest software. Furthermore, metric reconstruction of a scene was performed by re-projection of the pixel points from multiple image samples with a single centre of projection. This was done using the sparse bundle adjustment algorithm. The statistical summary obtained after the application of this algorithm provides a gauge for the efficiency of the optimisation step. The optimised data was then visualised in the Meshlab software environment, hence providing the reconstructed scene. Secondly, with any of the well-established camera arrangements, all cameras are usually constrained to the same horizontal plane. Therefore, occlusion becomes an extremely challenging problem, and a robust camera set-up is required in order to resolve the hidden parts of any scene objects. To adequately meet the visibility condition for scene objects, and given that occlusion of the same scene objects can occur, a multi-plane camera structure is highly desirable. Therefore, this thesis also explores a trapezoidal camera structure for image acquisition. The approach here is to assess the feasibility and potential of several physical cameras of the same model being sparsely arranged on the edge of an efficient trapezoid graph. This is implemented in both Matlab and Maya. The quality of the depth maps rendered in Matlab are better in Quality
Characteristics of flight simulator visual systems
The physical parameters of the flight simulator visual system that characterize the system and determine its fidelity are identified and defined. The characteristics of visual simulation systems are discussed in terms of the basic categories of spatial, energy, and temporal properties corresponding to the three fundamental quantities of length, mass, and time. Each of these parameters is further addressed in relation to its effect, its appropriate units or descriptors, methods of measurement, and its use or importance to image quality
Large-Scale Light Field Capture and Reconstruction
This thesis discusses approaches and techniques to convert Sparsely-Sampled Light Fields (SSLFs) into Densely-Sampled Light Fields (DSLFs), which can be used for visualization on 3DTV and Virtual Reality (VR) devices. As an example, a movable 1D large-scale light field acquisition system for capturing SSLFs in real-world environments is evaluated. This system consists of 24 sparsely placed RGB cameras and two Kinect V2 sensors. The real-world SSLF data captured with this setup can be leveraged to reconstruct real-world DSLFs. To this end, three challenging problems need to be solved for this system: (i) how to estimate the rigid transformation from the coordinate system of a Kinect V2 to the coordinate system of an RGB camera; (ii) how to register the two Kinect V2 sensors with a large displacement; (iii) how to reconstruct a DSLF from an SSLF with moderate and large disparity ranges. To overcome these three challenges, we propose: (i) a novel self-calibration method, which takes advantage of the geometric constraints from the scene and the cameras, for estimating the rigid transformations from the camera coordinate frame of one Kinect V2 to the camera coordinate frames of the 12 nearest RGB cameras; (ii) a novel coarse-to-fine approach for recovering the rigid transformation from the coordinate system of one Kinect to the coordinate system of the other by means of local color and geometry information; (iii) several novel algorithms that can be categorized into two groups for reconstructing a DSLF from an input SSLF, including novel view synthesis methods, which are inspired by state-of-the-art video frame interpolation algorithms, and Epipolar-Plane Image (EPI) inpainting methods, which are inspired by the Shearlet Transform (ST)-based DSLF reconstruction approaches
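Problems (i) and (ii) above both come down to estimating a rigid transformation between two coordinate systems. The thesis proposes its own self-calibration and coarse-to-fine methods, which are not described in the abstract; as a generic point of reference only, the sketch below uses the classical Kabsch algorithm, which recovers a rotation and translation from already matched 3-D point pairs. It assumes NumPy, and the function name `estimate_rigid_transform` is illustrative.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Least-squares R, t with dst ~= src @ R.T + t (Kabsch algorithm).

    src and dst are (N, 3) arrays of corresponding points, N >= 3,
    and the points must not all be collinear.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    h = (src - src_c).T @ (dst - dst_c)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = dst_c - rot @ src_c
    return rot, trans
```

In practice the hard part is obtaining reliable correspondences between a depth sensor and an RGB camera, which is where scene and camera constraints of the kind the abstract mentions would come in.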
Recording, compression and representation of dense light fields
The concept of light fields allows image-based capture of scenes, providing, on a recorded dataset, many of the features available in computer graphics, such as simulation of different viewpoints or changes to core camera parameters, including depth of field. Because the recorded dimension increases from two for a regular image to four for a light field recording, previous works mainly concentrate on small or undersampled light field recordings. This thesis is concerned with the recording of a dense light field dataset, including the estimation of suitable sampling parameters, as well as the implementation of the required capture, storage and processing methods. Towards this goal, the influence of an optical system on the possibly non-bandlimited light field signal is examined, deriving the required sampling rates from the bandlimiting effects of the camera and optics. To increase storage capacity and bandwidth, a very fast image compression method is introduced, providing an order of magnitude faster compression than previous methods and reducing the I/O bottleneck for light field processing. A fiducial marker system is provided for the calibration of the recorded dataset, which provides a higher number of reference points than previous methods, improving camera pose estimation. In conclusion, this work demonstrates the feasibility of dense sampling of a large light field, and provides a dataset which may be used for evaluation or as a reference for light field processing tasks like interpolation, rendering and sampling.
Evaluation of learning-based techniques in novel view synthesis
Novel view synthesis is a long-standing topic at the intersection of computer vision and computer graphics, where the fundamental goal is to synthesize an image from a novel viewpoint given a sparse set of reference images. The rapid development of deep learning has introduced a wide range of new ideas and methods in novel view synthesis, where parts of the synthesis process are treated as a supervised learning problem. Specifically, neural scene representations paired with volume rendering have achieved state-of-the-art results in novel view synthesis, but this remains a nascent field with little literature.
This thesis presents an overview of learning-based view synthesis, experiments with state-of-the-art view synthesis methods, evaluates them quantitatively and qualitatively and finally discusses their properties. Furthermore, we introduce a novel multi-view stereo dataset captured with a hand-held camera and demonstrate the process of collecting and preparing multi-view stereo datasets for view synthesis.
The findings in this thesis indicate that learning-based view synthesis methods excel at synthesizing plausible views from challenging scenes, including situations with complex geometry as well as transparent and reflective materials. Furthermore, we found that it is possible to render such scenes in real time and greatly reduce the time to prepare a scene for view synthesis by using a pre-trained network that aggregates information from nearby views.
Optimized Camera Handover Scheme in Free Viewpoint Video Streaming
Free-viewpoint video (FVV) is a promising approach that allows users to control their viewpoint and generate virtual views from any desired perspective. The individual user viewpoints are synthesized from two or more camera streams and the corresponding depth sequences. In case of continuous viewpoint changes, the camera inputs of the view synthesis process must be changed seamlessly in order to avoid starvation of the viewpoint synthesizer algorithm. Starvation occurs when the desired user viewpoint cannot be synthesized with the currently streamed camera views, so the FVV playout is interrupted. In this paper we propose three camera handover schemes (TCC, MA, SA) based on viewpoint prediction in order to minimize the probability of playout stalls and find the tradeoff between the image quality and the camera handover frequency. Our simulation results show that the introduced camera switching methods can reduce the handover frequency by more than 40%, hence the viewpoint synthesis starvation and the playout interruption can be minimized. By providing seamless viewpoint changes, the quality of experience can be significantly improved, making the new FVV service more attractive in the future
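The abstract does not describe how the TCC, MA, and SA schemes work, so the following is not an implementation of any of them. It is only a minimal sketch of the general idea of prediction-driven handover, assuming a 1-D camera array, linear extrapolation of the viewpoint, and synthesis from the two cameras enclosing the viewpoint. All function names and parameters are hypothetical.

```python
def predict_viewpoint(history, lookahead):
    """Linearly extrapolate a 1-D viewpoint from its last two samples."""
    velocity = history[-1] - history[-2]
    return history[-1] + velocity * lookahead


def bracketing_cameras(camera_positions, viewpoint):
    """Indices of the two adjacent cameras whose positions enclose the viewpoint."""
    for i in range(len(camera_positions) - 1):
        if camera_positions[i] <= viewpoint <= camera_positions[i + 1]:
            return i, i + 1
    return None  # viewpoint lies outside the camera array


def needs_handover(streamed_pair, camera_positions, history, lookahead):
    """True if the predicted viewpoint needs a different camera pair."""
    predicted = predict_viewpoint(history, lookahead)
    target = bracketing_cameras(camera_positions, predicted)
    return target is not None and target != streamed_pair


cameras = [0.0, 1.0, 2.0, 3.0]
# The user moves right from 0.6 to 0.8; two steps ahead they leave pair (0, 1).
switch_now = needs_handover((0, 1), cameras, [0.6, 0.8], lookahead=2)
```

Requesting the new pair before the user actually crosses the camera boundary is what would avoid the starvation described above; a longer lookahead hides more network delay but risks unnecessary handovers when the prediction is wrong.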