21,123 research outputs found

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Author: Gao Xinyu
Jiao Shaohui
Jin Xiaogang
Yang Ziyi
Zhang Yuqing
Zhou Wen
Publication date: 22/09/2023

Implicit neural representation has opened up new avenues for dynamic scene reconstruction and rendering. Nonetheless, state-of-the-art methods of dynamic neural rendering rely heavily on these implicit representations, which frequently struggle to capture the intricate details of objects in the scene. Furthermore, implicit methods struggle to achieve real-time rendering in general dynamic scenes, limiting their use in a wide range of tasks. To address these issues, we propose a deformable 3D Gaussian Splatting method that reconstructs scenes using explicit 3D Gaussians and learns the Gaussians in canonical space with a deformation field to model monocular dynamic scenes. We also introduce a smoothing training mechanism with no extra overhead to mitigate the impact of inaccurate poses in real datasets on the smoothness of time interpolation tasks. Through differentiable Gaussian rasterization, the deformable 3D Gaussians achieve not only higher rendering quality but also real-time rendering speed. Experiments show that our method significantly outperforms existing methods in both rendering quality and speed, making it well suited for tasks such as novel-view synthesis, time synthesis, and real-time rendering.
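The canonical-space idea in this abstract can be illustrated with a minimal sketch. It assumes that the deformation field maps a canonical Gaussian center and a time value to a position offset; the abstract does not specify the field's inputs, outputs, or architecture, so `toy_deformation_field` and `deform` are hypothetical names, and the analytic sine motion merely stands in for a learned field.

```python
import math

def toy_deformation_field(position, t):
    """Hypothetical stand-in for a learned deformation field.

    Returns an (dx, dy, dz) offset for one canonical Gaussian center at
    time t. A real method would learn this mapping; here it is a fixed
    sine wave so the example stays self-contained.
    """
    x, _, _ = position
    return (0.0, 0.1 * math.sin(2.0 * math.pi * t + x), 0.0)

def deform(canonical_positions, t, field=toy_deformation_field):
    """Move canonical Gaussian centers to their positions at time t."""
    deformed = []
    for p in canonical_positions:
        offset = field(p, t)
        deformed.append(tuple(pi + di for pi, di in zip(p, offset)))
    return deformed

canonical = [(0.0, 0.0, 0.0), (1.0, 0.5, 2.0)]
frame = deform(canonical, t=0.25)  # canonical list itself is left untouched
```

The canonical set is stored once and only the offsets change with time, which is what lets a single explicit representation describe the whole sequence.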

arXiv.org e-Print Archive

Soft bilateral filtering shadows using multiple image-based algorithms

Author: Ali HH
Kolivand H
Sunar MS
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

This study introduces the Soft Bilateral Filtering Shadows method for dynamic scenes, which uses multiple matrices of the light sample points to address the lack of realism in real-time soft shadow generation. While a geometry-based shadow algorithm requires one time-consuming pass per polygon to render shadows, the adopted shadow map algorithm needs a single rendering pass for each sample point of the light source and generates shadows at low cost. The method renders complex scenes and accurately eliminates the inherent deficiencies of shadow maps. To compute the shadow maps, view matrices are used for each sample point of the extended light source. The penumbra region is then interpolated using bilateral filtering to create the soft shadows. The soft shadows depend on multiple shadow maps, which provide antialiasing. The method uses a fragment shader to render multiple shadow maps with penumbra and umbra regions. The main contribution of this article is the bilateral interpolation of image-based shadows. The method concentrates the effect of the computation at the edges of the penumbra region. Furthermore, the filtering yields soft shadows with the lowest possible number of light sample points. The generated soft shadows have good performance and high quality; therefore, they are suitable for interactive applications. © 2016 Springer Science+Business Media New Yor
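A rough sketch of the two ingredients named in this abstract: averaging per-sample shadow tests into a visibility value, then smoothing it with a bilateral filter. The abstract does not give the filter's exact form, so this is a generic 1D bilateral filter on a CPU list rather than the authors' fragment-shader implementation; the function names and default parameters are assumptions.

```python
import math

def average_visibility(shadow_tests):
    """Average binary shadow tests (1 = lit, 0 = shadowed) over light samples.

    shadow_tests[k][i] is the test result for light sample k at pixel i.
    """
    n_samples = len(shadow_tests)
    return [sum(column) / n_samples for column in zip(*shadow_tests)]

def bilateral_filter_1d(values, radius=2, sigma_s=1.5, sigma_r=0.2):
    """Generic bilateral filter along one scanline.

    Each neighbor is weighted by spatial closeness (sigma_s) and by how
    similar its value is to the center value (sigma_r), so smoothing is
    suppressed across large jumps.
    """
    out = []
    for i, center in enumerate(values):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(values), i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            w_range = math.exp(-((values[j] - center) ** 2) / (2.0 * sigma_r ** 2))
            total += w_space * w_range * values[j]
            weight_sum += w_space * w_range
        out.append(total / weight_sum)
    return out

# Three light samples, six pixels: the visibility steps down in coarse levels.
tests = [[1, 1, 1, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 1, 1, 0, 0]]
soft = bilateral_filter_1d(average_visibility(tests), sigma_r=0.5)
```

With only a few light samples the averaged visibility is banded; the filter smooths those bands, and a smaller `sigma_r` keeps sharper transitions intact.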

LJMU Research Online (Liverpool John Moores University)

Universiti Teknologi Malaysia Institutional Repository

Wireless Software Synchronization of Multiple Distributed Cameras

Author: Ansari Sameer
Chen Jiawen
Garg Rahul
Wadhwa Neal
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/06/2019

We present a method for precisely time-synchronizing the capture of image sequences from a collection of smartphone cameras connected over WiFi. Our method is entirely software-based, has only modest hardware requirements, and achieves an accuracy of less than 250 microseconds on unmodified commodity hardware. It does not use image content and synchronizes cameras prior to capture. The algorithm operates in two stages. In the first stage, we designate one device as the leader and synchronize each client device's clock to it by estimating network delay. Once clocks are synchronized, the second stage initiates continuous image streaming, estimates the relative phase of image timestamps between each client and the leader, and shifts the streams into alignment. We quantitatively validate our results on a multi-camera rig imaging a high-precision LED array and qualitatively demonstrate significant improvements to multi-view stereo depth estimation and stitching of dynamic scenes. We release as open source 'libsoftwaresync', an Android implementation of our system, to inspire new types of collective capture applications. Comment: Main: 9 pages, 10 figures. Supplemental: 3 pages, 5 figure
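The two stages described in this abstract can be sketched as follows. The abstract says only that network delay is estimated, so this example assumes an NTP-style round-trip exchange with symmetric delay and keeps the sample with the smallest round trip; it is not the libsoftwaresync API, and all function names here are hypothetical.

```python
def estimate_offset(t_send, t_leader_recv, t_leader_reply, t_recv):
    """Estimate the leader-minus-client clock offset from one round trip.

    t_send and t_recv are read on the client clock; t_leader_recv and
    t_leader_reply are read on the leader clock. Assumes the network
    delay is the same in both directions.
    Returns (offset, round_trip_delay).
    """
    delay = (t_recv - t_send) - (t_leader_reply - t_leader_recv)
    offset = ((t_leader_recv - t_send) + (t_leader_reply - t_recv)) / 2.0
    return offset, delay

def best_offset(samples):
    """Use the exchange with the smallest round trip, which is least
    likely to have been skewed by queuing in one direction."""
    estimates = [estimate_offset(*s) for s in samples]
    offset, _ = min(estimates, key=lambda e: e[1])
    return offset

def phase_shift(client_ts, leader_ts, period):
    """Signed shift that moves the client's frame grid onto the leader's.

    Both timestamps must already be on the same (synchronized) clock.
    The result lies in [-period / 2, period / 2).
    """
    diff = (leader_ts - client_ts) % period
    if diff >= period / 2:
        diff -= period
    return diff

# Leader clock runs 100 units ahead; the second exchange has asymmetric delay.
samples = [(0, 105, 106, 11), (0, 120, 121, 26)]
offset = best_offset(samples)
```

Stage one yields a common clock; stage two then only needs the frame period to decide how far to shift each client stream.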

arXiv.org e-Print Archive

Crossref

SceneFlowFields: Dense Interpolation of Sparse Scene Flow Correspondences

Author: Bailer Christian
Kuschk Georg
Schuster René
Stricker Didier
Wasenmüller Oliver
Publication date: 27/10/2017

While most scene flow methods use either variational optimization or a strong rigid motion assumption, we show for the first time that scene flow can also be estimated by dense interpolation of sparse matches. To this end, we find sparse matches across two stereo image pairs that are detected without any prior regularization and perform dense interpolation preserving geometric and motion boundaries by using edge information. A few iterations of variational energy minimization are performed to refine our results, which are thoroughly evaluated on the KITTI benchmark and additionally compared to the state of the art on MPI Sintel. For application in an automotive context, we further show that an optional ego-motion model helps to boost performance and blends smoothly into our approach to produce a segmentation of the scene into static and dynamic parts. Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 201
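To illustrate what interpolating sparse matches while preserving boundaries via edge information can mean, here is a deliberately simplified sketch. It is not the paper's interpolation scheme: each pixel on a single scanline simply copies the value of its geodesically nearest seed, where crossing an image edge adds extra cost. The function name and the cost model are assumptions.

```python
def edge_aware_fill(n, seeds, edge_cost):
    """Densify sparse values along a scanline of n pixels.

    seeds maps pixel index -> known value (e.g. a flow vector).
    edge_cost[i] is the extra cost of stepping between pixel i and i + 1
    (length n - 1); large values mark image edges.
    Each pixel receives the value of the seed with the smallest
    accumulated path cost, so fills tend to stop at strong edges.
    """
    dist = [float("inf")] * n
    value = [None] * n
    for i, v in seeds.items():
        dist[i] = 0.0
        value[i] = v
    # Forward sweep: propagate seeds left to right.
    for i in range(1, n):
        d = dist[i - 1] + 1.0 + edge_cost[i - 1]
        if d < dist[i]:
            dist[i], value[i] = d, value[i - 1]
    # Backward sweep: propagate seeds right to left.
    for i in range(n - 2, -1, -1):
        d = dist[i + 1] + 1.0 + edge_cost[i]
        if d < dist[i]:
            dist[i], value[i] = d, value[i + 1]
    return value

# Two seeds with a strong edge between pixels 1 and 2.
dense = edge_aware_fill(6, {0: 1.0, 5: 9.0}, [0.0, 10.0, 0.0, 0.0, 0.0])
```

Without the edge term the split between the two seeds falls at the midpoint; with it, the split snaps to the edge location.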

arXiv.org e-Print Archive

Crossref

High-speed Video from Asynchronous Camera Array

Author: Lu Si
Publication date: 01/01/2019

This paper presents a method for capturing high-speed video using an asynchronous camera array. Our method sequentially fires each sensor in a camera array with a small time offset and assembles captured frames into a high-speed video according to the time stamps. The resulting video, however, suffers from parallax jittering caused by the viewpoint difference among sensors in the camera array. To address this problem, we develop a dedicated novel view synthesis algorithm that transforms the video frames as if they were captured by a single reference sensor. Specifically, for any frame from a non-reference sensor, we find the two temporally neighboring frames captured by the reference sensor. Using these three frames, we render a new frame with the same time stamp as the non-reference frame but from the viewpoint of the reference sensor. To do so, we segment these frames into super-pixels and then apply local content-preserving warping to warp them to form the new frame. We employ a multi-label Markov Random Field method to blend these warped frames. Our experiments show that our method can produce high-quality and high-speed video of a wide variety of scenes with large parallax, scene dynamics, and camera motion and outperforms several baseline and state-of-the-art approaches. Comment: 10 pages, 82 figures, Published at IEEE WACV 201
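The bookkeeping part of this pipeline (staggered firing, assembly by time stamp, and lookup of the two neighboring reference frames) can be sketched in a few lines. The view synthesis itself is not shown, and the frame representation as `(timestamp, sensor_id)` pairs and all function names are assumptions made for illustration.

```python
import bisect
import heapq

def make_stream(sensor_id, n_frames, period, offset):
    """Time stamps of one sensor that fires every `period`, delayed by `offset`."""
    return [(offset + k * period, sensor_id) for k in range(n_frames)]

def assemble(streams):
    """Merge per-sensor frame lists (each sorted by time) into one sequence."""
    return list(heapq.merge(*streams))

def reference_neighbors(ref_times, t):
    """Return the reference time stamps (previous, next) around time t.

    ref_times must be sorted. Returns None when t is not bracketed by
    two reference frames.
    """
    i = bisect.bisect_right(ref_times, t)
    if i == 0 or i == len(ref_times):
        return None
    return ref_times[i - 1], ref_times[i]

# Three sensors, each with period 30, staggered by 10: effective period 10.
streams = [make_stream(s, 3, 30, 10 * s) for s in range(3)]
video = assemble(streams)
ref_times = [t for t, sensor in video if sensor == 0]
neighbors = reference_neighbors(ref_times, 40)
```

Each non-reference frame, together with its two reference neighbors, forms the three-frame input that the abstract describes for rendering the new view.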

arXiv.org e-Print Archive

Crossref

PDXScholar (Portland State University)

Neural View-Interpolation for Sparse Light Field Video

Author: Bemana M.
Myszkowski K.
Ritschel T.
Seidel H.
Publication date: 01/01/2019

We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, a NN LF will likely have lower quality than a same-sized pixel basis representation. Second, only little training data, e.g., 9 exemplars per frame, is available for sparse LF videos. Third, there is no generalization across LFs, but across view and time instead. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Unlike the linear pixel basis, a NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NNs, however, this representation is now interpolatable: if the image output for sparse view coordinates is plausible, it is for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution.
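The interface of such a "one-off" network, a function from continuous view-plus-time coordinates to a color, can be sketched with a tiny untrained multilayer perceptron. This shows only the coordinate-to-color mapping; it omits training and the occlusion-aware warping step, and the layer sizes, activations, and function names are assumptions rather than the authors' architecture.

```python
import math
import random

def init_mlp(sizes, seed=0):
    """Create random weights for a fully connected network.

    sizes lists the layer widths, e.g. [3, 8, 3] for (u, v, t) -> RGB.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, coords):
    """Map continuous (u, v, t) coordinates to an RGB triple in [0, 1]."""
    h = list(coords)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * x for w, x in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            h = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid output
        else:
            h = [max(0.0, v) for v in z]  # ReLU hidden layer
    return h

net = init_mlp([3, 8, 3])
# Any intermediate view or time can be queried, not just the training views.
rgb = forward(net, (0.5, 0.5, 0.25))
```

Because the input is continuous, querying a coordinate between two training views requires no special handling, which is the interpolation property the abstract relies on.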

MPG.PuRe