Consistent Video Filtering for Camera Arrays
Visual formats have advanced beyond single-view images and videos: 3D movies are commonplace, researchers have developed multi-view navigation systems, and VR is helping to push light field cameras to the mass market. However, editing tools for these media are still nascent, and even simple filtering operations like color correction or stylization are problematic: naively applying image filters per frame or per view rarely produces satisfying results due to time and space inconsistencies. Our method preserves and stabilizes filter effects while being agnostic to the inner workings of the filter. It captures filter effects in the gradient domain, then uses input frame gradients as a reference to impose temporal and spatial consistency. Our least-squares formulation adds minimal overhead compared to naive data processing. Further, when filter cost is high, we introduce a filter transfer strategy that reduces the number of per-frame filtering computations by an order of magnitude, with only a small reduction in visual quality. We demonstrate our algorithm on several camera array formats including stereo videos, light fields, and wide baselines.
Acquisition, compression and rendering of depth and texture for multi-view video
Three-dimensional (3D) video and imaging technologies are an emerging trend in the development of digital video systems, as we presently witness the appearance of 3D displays, coding systems, and 3D camera setups. Three-dimensional multi-view video is typically obtained from a set of synchronized cameras that capture the same scene from different viewpoints. This technique enables applications such as free-viewpoint video and 3D-TV. Free-viewpoint video applications let the user interactively select and render a virtual viewpoint of the scene. A 3D experience, as in 3D-TV, is obtained if the data representation and display make it possible to distinguish the relief of the scene, i.e., the depth within the scene. With 3D-TV, the depth of the scene can be perceived using a multi-view display that simultaneously renders several views of the same scene. To render these multiple views on a remote display, efficient transmission, and thus compression, of the multi-view video is necessary. However, a major problem when dealing with multi-view video is the intrinsically large amount of data to be compressed, decompressed and rendered. We aim at an efficient and flexible multi-view video system, and explore three different aspects. First, we develop an algorithm for acquiring a depth signal from a multi-view setup. Second, we present efficient 3D rendering algorithms for a multi-view signal. Third, we propose coding techniques for 3D multi-view signals, based on the use of an explicit depth signal. Accordingly, the thesis is divided into three parts. The first part (Chapter 3) addresses the problem of 3D multi-view video acquisition. Multi-view video acquisition refers to the task of estimating and recording a 3D geometric description of the scene. A 3D description of the scene can be represented by a so-called depth image, which can be estimated by triangulation of the corresponding pixels in the multiple views.
Initially, we focus on the problem of depth estimation using two views, and present the basic geometric model that enables the triangulation of corresponding pixels across the views. Next, we review two calculation/optimization strategies for determining corresponding pixels: a local and a one-dimensional optimization strategy. Second, to generalize from the two-view case, we introduce a simple geometric model for estimating the depth using multiple views simultaneously. Based on this geometric model, we propose a new multi-view depth-estimation technique, employing a one-dimensional optimization strategy that (1) reduces the noise level in the estimated depth images and (2) enforces consistent depth images across the views. The second part (Chapter 4) details the problem of multi-view image rendering. Multi-view image rendering refers to the process of generating synthetic images using multiple views. Two different rendering techniques are initially explored: a 3D image warping and a mesh-based rendering technique. Each of these methods has its limitations and suffers from either high computational complexity or low image rendering quality. As a consequence, we present two image-based rendering algorithms that strike a better balance between these two issues. First, we derive an alternative formulation of the relief texture algorithm, extended to the geometry of multiple views. The proposed technique features two advantages: it avoids rendering artifacts ("holes") in the synthetic image and it is suitable for execution on a standard Graphics Processing Unit (GPU). Second, we propose an inverse mapping rendering technique that allows a simple and accurate re-sampling of synthetic pixels. Experimental comparisons with 3D image warping show an improvement of rendering quality of 3.8 dB for the relief texture mapping and 3.0 dB for the inverse mapping rendering technique.
The third part concentrates on the compression problem of multi-view texture and depth video (Chapters 5–7). In Chapter 5, we extend the standard H.264/MPEG-4 AVC video compression algorithm to handle the compression of multi-view video. As opposed to the Multi-view Video Coding (MVC) standard, which encodes only the multi-view texture data, the proposed encoder performs the compression of both the texture and the depth multi-view sequences. The proposed extension is based on exploiting the correlation between the multiple camera views. To this end, two different approaches for predictive coding of views have been investigated: a block-based disparity-compensated prediction technique and a View Synthesis Prediction (VSP) scheme. Whereas VSP relies on an accurate depth image, the block-based disparity-compensated prediction scheme can be performed without any geometry information. Our encoder adaptively selects the most appropriate prediction scheme using a rate-distortion criterion for an optimal prediction-mode selection. We present experimental results for several texture and depth multi-view sequences, yielding a quality improvement of up to 0.6 dB for the texture and 3.2 dB for the depth, when compared to solely performing H.264/MPEG-4 AVC disparity-compensated prediction. Additionally, we discuss the trade-off between random access to a user-selected view and the coding efficiency. Experimental results illustrating and quantifying this trade-off are provided. In Chapter 6, we focus on the compression of a depth signal. We present a novel depth image coding algorithm which concentrates on the special characteristics of depth images: smooth regions delineated by sharp edges. The algorithm models these smooth regions using parameterized piecewise-linear functions and sharp edges by a straight line, so that it is more efficient than a conventional transform-based encoder.
To optimize the quality of the coding system for a given bit rate, a special global rate-distortion optimization balances the rate against the accuracy of the signal representation. For typical bit rates, i.e., between 0.01 and 0.25 bit/pixel, experiments have revealed that the coder outperforms a standard JPEG-2000 encoder by 0.6-3.0 dB. Preliminary results were published in the Proceedings of the 26th Symposium on Information Theory in the Benelux. In Chapter 7, we propose a novel joint depth-texture bit-allocation algorithm for the joint compression of texture and depth images. The described algorithm combines the depth and texture Rate-Distortion (R-D) curves to obtain a single R-D surface that allows the optimization of the joint bit-allocation in relation to the obtained rendering quality. Experimental results show an estimated gain of 1 dB compared to a compression performed without joint bit-allocation optimization. Besides this, our joint R-D model can be readily integrated into a multi-view H.264/MPEG-4 AVC coder because it yields the optimal compression setting with a limited computational effort.
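The abstract above describes estimating depth by triangulating corresponding pixels across views. As a rough illustration only, the sketch below uses the textbook special case of two rectified, parallel cameras, where depth follows from disparity as Z = f * B / d. This is an assumed simplification and not necessarily the geometric model developed in the thesis; the function name and the parameters `focal_px` and `baseline_m` are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a point seen by two rectified, parallel cameras.

    Assumed textbook model: Z = f * B / d, with
      disparity_px  horizontal pixel offset between the two corresponding pixels (d)
      focal_px      focal length expressed in pixels (f)
      baseline_m    distance between the two camera centres in metres (B)
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid here.
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Halving the disparity doubles the estimated depth.
    near = depth_from_disparity(disparity_px=20.0, focal_px=1000.0, baseline_m=0.1)
    far = depth_from_disparity(disparity_px=10.0, focal_px=1000.0, baseline_m=0.1)
    print(near, far)
```

Because depth is inversely proportional to disparity, a fixed error in the pixel correspondence causes a larger depth error for distant points, which is one reason noise reduction and cross-view consistency matter in depth estimation.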
Representation and coding of 3D video data
Deliverable D4.1 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D4.1 of the project.
Naturalistic depth perception
Making inferences about the 3-dimensional spatial structure of natural scenes is a critical visual function. While spatial discrimination both in depth and on the image plane has been well characterized for simple stimuli, little is known about our ability to discriminate depth in natural scenes, particularly at far distances. To begin filling in this gap we: (i) developed a database of 80 stereoscopic images paired with the corresponding measured distance information, (ii) used these scenes as psychophysical stimuli and measured near-far discrimination acuity in 4 observers as a function of distance and the visual angle separating the targets, (iii) made additional measurements under patched-eye (monocular) viewing conditions to evaluate the importance of binocular vision in depth discrimination as a function of viewing geometries. We find that binocular thresholds are roughly a constant Weber fraction of the distance for absolute distances ranging from 4 to 28 meters. Further, measured thresholds were around 1% for small separations, and increased to 4% for stimuli separated by 10 deg. Thus, the ability to discriminate depth in natural scenes is very good out to considerable distances. To investigate the basis of this discrimination ability, monocular thresholds were measured. We found that monocular thresholds were elevated for distances less than 15 meters, but were comparable to binocular thresholds for greater distances. Accurate depth perception depends on combining (fusing) multiple sources of sensory information. Thus binocular thresholds probably involve fusing separate monocular and disparity-derived estimates. Under the assumption of Gaussian distributed independent estimates, Bayes rule provides a simple reliability-weighted summation model of cue combination. Using disparity threshold measurements by Blakemore (1970), and the current monocular thresholds, parameter-free predictions were generated for the current binocular thresholds.
These predictions were in broad agreement with the data, suggesting that the disparity and monocular cues are separable and combined optimally in natural scenes.
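The reliability-weighted summation model mentioned above has a standard closed form for two independent Gaussian estimates: each cue is weighted by its inverse variance, and the combined variance is the reciprocal of the summed inverse variances. The sketch below shows that general rule only. The function and parameter names are hypothetical, and it does not reproduce the study's threshold data or its exact prediction procedure.

```python
import math


def combine_estimates(x_mono, sigma_mono, x_disp, sigma_disp):
    """Fuse two independent Gaussian estimates of the same quantity.

    Each estimate is weighted by its reliability (inverse variance).
    Returns the combined estimate and its standard deviation.
    """
    w_mono = 1.0 / sigma_mono ** 2   # reliability of the monocular estimate
    w_disp = 1.0 / sigma_disp ** 2   # reliability of the disparity estimate
    combined = (w_mono * x_mono + w_disp * x_disp) / (w_mono + w_disp)
    combined_sigma = math.sqrt(1.0 / (w_mono + w_disp))
    return combined, combined_sigma


if __name__ == "__main__":
    # Illustrative numbers only: two equally reliable cues that disagree slightly.
    estimate, sigma = combine_estimates(10.0, 1.0, 12.0, 1.0)
    print(estimate, sigma)
```

Under this rule the combined standard deviation is never larger than that of the more reliable cue, and when one cue is much noisier than the other the combined estimate is dominated by the better one.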
A Nonlinear Force-Free Magnetic Field Approximation Suitable for Fast Forward-Fitting to Coronal Loops. II. Numeric Code and Tests
Based on a second-order approximation of nonlinear force-free magnetic field
solutions in terms of uniformly twisted field lines derived in Paper I, we
develop here a numeric code that is capable of forward-fitting such analytical
solutions to arbitrary magnetogram (or vector magnetograph) data combined with
(stereoscopically triangulated) coronal loop 3D coordinates. We test the code
here by forward-fitting to six potential field and six nonpotential field cases
simulated with our analytical model, as well as by forward-fitting to an
exactly force-free solution of the Low and Lou (1990) model. The
forward-fitting tests demonstrate: (i) a satisfactory convergence behavior
(with typical misalignment angles of ), (ii)
relatively fast computation times (from seconds to a few minutes), and (iii)
the high fidelity of retrieved force-free α-parameters ( for simulations and for the Low and Lou model). The
salient feature of this numeric code is the relatively fast computation of a
quasi-force-free magnetic field, which closely matches the geometry of coronal
loops in active regions, and complements the existing nonlinear force-free
field (NLFFF) codes based on photospheric magnetograms without coronal
constraints. Comment: Solar Physics (in press), 25 pages, 11 figures
No reference quality assessment of stereo video based on saliency and sparsity
With the growing popularity of video technology, stereoscopic video quality assessment (SVQA) has become increasingly important. Existing SVQA methods do not perform well because they do not fully utilize the information in the videos. In this paper, we consider several kinds of information in the videos together and construct a simple model, based on saliency and sparsity, to combine and analyze the diverse features. First, we use the 3-D saliency map of the sum map, which retains the basic information of the stereoscopic video, as a tool to evaluate video quality. Second, we use sparse representation to decompose the sum map of 3-D saliency into coefficients, then compute features from the sparse coefficients to obtain an effective representation of the video content. Next, to reduce the correlation between the features, we feed them into a stacked auto-encoder, which maps the vectors to a higher-dimensional space under a sparsity constraint; the result is then passed to a support vector machine to obtain the quality assessment scores. Throughout this process, we take advantage of saliency and sparsity to extract and simplify features. Experiments show that the proposed method fits the subjective scores well.