Cascaded Scene Flow Prediction using Semantic Segmentation
Given two consecutive frames from a pair of stereo cameras, 3D scene flow
methods simultaneously estimate the 3D geometry and motion of the observed
scene. Many existing approaches use superpixels for regularization, but may
predict inconsistent shapes and motions inside rigidly moving objects. We
instead assume that scenes consist of foreground objects rigidly moving in
front of a static background, and use semantic cues to produce pixel-accurate
scene flow estimates. Our cascaded classification framework accurately models
3D scenes by iteratively refining semantic segmentation masks, stereo
correspondences, 3D rigid motion estimates, and optical flow fields. We
evaluate our method on the challenging KITTI autonomous driving benchmark, and
show that accounting for the motion of segmented vehicles leads to
state-of-the-art performance.
Comment: International Conference on 3D Vision (3DV), 2017 (oral presentation)
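The rigid-motion assumption in the abstract above can be illustrated with a minimal sketch. If a segmented object moves with a single rotation R and translation t between frames, each of its 3D points p has scene flow R p + t - p, and a static background (identity motion) has zero flow. The function names, the yaw-only rotation, and the sample points are hypothetical and chosen only for illustration; this is not the paper's implementation.

```python
import math

def yaw_rotation(angle_rad):
    """3x3 rotation about the vertical (y) axis; a hypothetical vehicle turn."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def rigid_scene_flow(points, rotation, translation):
    """Return the 3D flow vector (R p + t - p) for every point p."""
    flows = []
    for p in points:
        moved = [sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                 for i in range(3)]
        flows.append([moved[i] - p[i] for i in range(3)])
    return flows

# Hypothetical 3D points on one segmented vehicle (camera coordinates, metres).
vehicle_points = [[1.0, 0.0, 10.0], [2.0, 0.5, 12.0]]

# The vehicle approaches the camera by 0.5 m without turning.
flows = rigid_scene_flow(vehicle_points, yaw_rotation(0.0), [0.0, 0.0, -0.5])

# Static background: identity rotation and zero translation give zero flow.
static_flows = rigid_scene_flow(vehicle_points, yaw_rotation(0.0), [0.0, 0.0, 0.0])
```

Because every point on the object shares the same (R, t), the predicted flow is consistent inside the object mask by construction, which is the property the abstract contrasts with superpixel-based regularization.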
HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
Diffusion models have emerged as the best approach for generative modeling of
2D images. Part of their success is due to the possibility of training them on
millions if not billions of images with a stable learning objective. However,
extending these models to 3D remains difficult for two reasons. First, finding
a large quantity of 3D training data is much more complex than for 2D images.
Second, while it is conceptually trivial to extend the models to operate on 3D
rather than 2D grids, the associated cubic growth in memory and compute
complexity makes this infeasible. We address the first challenge by introducing
a new diffusion setup that can be trained, end-to-end, with only posed 2D
images for supervision; and the second challenge by proposing an image
formation model that decouples model memory from spatial memory. We evaluate
our method on real-world data, using the CO3D dataset, which has not been used
to train 3D generative models before. We show that our diffusion models are
scalable, train robustly, and are competitive with existing approaches for 3D
generative modeling in terms of sample quality and fidelity.
Comment: CVPR 2023 conference; project page at: https://holodiffusion.github.io
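The cubic growth in memory mentioned in the abstract above can be made concrete with a small arithmetic sketch. It counts the bytes needed to store one dense feature grid at a given resolution in 2D and in 3D. The function name, the channel count, and the 4-byte value size are assumptions for illustration, not figures from the paper.

```python
def grid_bytes(resolution, dims, channels=4, bytes_per_value=4):
    """Bytes needed for one dense grid with `resolution` cells per axis.

    `channels` and `bytes_per_value` are illustrative assumptions
    (4 feature channels stored as 32-bit floats).
    """
    return (resolution ** dims) * channels * bytes_per_value

# Doubling the resolution multiplies a 2D grid by 4 and a 3D grid by 8.
for n in (64, 128, 256):
    mib_2d = grid_bytes(n, dims=2) / 2 ** 20
    mib_3d = grid_bytes(n, dims=3) / 2 ** 20
    print(f"resolution {n}: 2D {mib_2d:.3f} MiB, 3D {mib_3d:.1f} MiB")
```

The ratio between the 3D and 2D cost at the same resolution is the resolution itself, so the gap widens as the grid gets finer.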
DiTTO: Diffusion-inspired Temporal Transformer Operator
Solving partial differential equations (PDEs) using a data-driven approach
has become increasingly common. The recent development of the operator learning
paradigm has enabled the solution of a broader range of PDE-related problems.
We propose an operator learning method to solve time-dependent PDEs
continuously in time without needing any temporal discretization. The proposed
approach, named DiTTO, is inspired by latent diffusion models. While diffusion
models are usually used in generative artificial intelligence tasks, their
time-conditioning mechanism is extremely useful for PDEs. The
diffusion-inspired framework is combined with elements from the Transformer
architecture to improve its capabilities.
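As a rough illustration of the time-conditioning idea referred to above, the sketch below maps a continuous time value to a fixed-size vector using the sinusoidal embedding that is common in diffusion models. Whether DiTTO uses exactly this form is an assumption; the function name, the embedding size, and the `max_period` constant are hypothetical choices made for the example.

```python
import math

def time_embedding(t, dim=8, max_period=10000.0):
    """Sinusoidal embedding of a continuous time value t.

    Returns `dim` values: sines then cosines at geometrically spaced
    frequencies. `dim` is assumed to be even.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

# Any real-valued time can be embedded, not only times on a training grid.
emb_a = time_embedding(0.25)
emb_b = time_embedding(0.26)
```

Because the embedding is defined for any real t, a network conditioned on it can be queried at times between the training snapshots, which is one way to read the abstract's claim of continuous-in-time prediction.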
We demonstrate the effectiveness of the new approach on a wide variety of
PDEs in multiple dimensions, namely the 1-D Burgers' equation, 2-D
Navier-Stokes equations, and the acoustic wave equation in 2-D and 3-D. DiTTO
achieves state-of-the-art results in terms of accuracy for these problems. We
also present a method to improve the performance of DiTTO by using fast
sampling concepts from diffusion models. Finally, we show that DiTTO can
accurately perform zero-shot super-resolution in time
- …