Decoupled Diffusion Models with Explicit Transition Probability
Recent diffusion probabilistic models (DPMs) have shown remarkable abilities
in content generation; however, they often suffer from complex forward
processes, which lead to inefficient solvers for the reverse process and
prolonged sampling times. In this paper, we address these challenges by
focusing on the diffusion process itself: we propose to decouple the intricate
diffusion process into two comparatively simpler processes to improve
generative quality and speed. In particular, we present a novel
diffusion paradigm named DDM (Decoupled Diffusion Models) based on the Itô
diffusion process, in which the image distribution is approximated by an
explicit transition probability while the noise path is controlled by the
standard Wiener process. We find that decoupling the diffusion process reduces
the learning difficulty and the explicit transition probability improves the
generative speed significantly. We derive a new training objective for DPMs,
which enables the model to learn to predict the noise and image components
separately. Moreover, given the novel forward diffusion equation, we derive the
reverse denoising formula of DDM that naturally supports fewer steps of
generation without ordinary differential equation (ODE) based accelerators. Our
experiments demonstrate that DDM outperforms previous DPMs by a large margin in
the few-function-evaluation setting and achieves comparable performance in the
many-function-evaluation setting. We also show that our framework can be applied
to image-conditioned generation and high-resolution image synthesis, and that
it can generate high-quality images with only 10 function evaluations.
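The decoupled forward process described above can be illustrated with a minimal numpy sketch: the image component follows an explicit deterministic path while the noise component follows a standard Wiener process. The linear attenuation choice `h_t = -x0` below and the function name `ddm_forward` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def ddm_forward(x0, t, rng):
    """Sketch of a decoupled forward step: a deterministic image term plus
    an independent Wiener-process noise term. The linear attenuation
    h_t = -x0 is a hypothetical choice for illustration only."""
    # Deterministic image component: x0 integrated under h_t = -x0,
    # giving (1 - t) * x0 for t in [0, 1]; it vanishes exactly at t = 1.
    image_term = x0 + t * (-x0)
    # Stochastic noise component: standard Wiener process with variance t.
    noise_term = np.sqrt(t) * rng.standard_normal(x0.shape)
    return image_term + noise_term

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
x1 = ddm_forward(x0, 1.0, rng)  # at t = 1 only the noise term remains
```

Because the image path is explicit rather than learned, the endpoint of the deterministic term is known in closed form, which is what allows a reverse formula with fewer sampling steps.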
Tensorformer: Normalized Matrix Attention Transformer for High-quality Point Cloud Reconstruction
Surface reconstruction from raw point clouds has been studied for decades in
the computer graphics community and is in high demand from modern modeling and
rendering applications. Classic solutions, such as Poisson surface
reconstruction, require point normals as extra input to produce reasonable
results. Modern transformer-based methods can work without normals, but their
results are less fine-grained due to limited encoding performance in local
fusion from discrete points. We introduce a novel normalized matrix attention
transformer (Tensorformer) to perform high-quality reconstruction. The proposed
matrix attention allows for simultaneous point-wise and channel-wise message
passing, whereas previous vector attention loses neighbor-point information
across different channels. This brings more degrees of freedom to feature
learning and thus facilitates better modeling of local geometries. Our method
achieves state-of-the-art results on two commonly used datasets, ShapeNetCore
and ABC, attaining a 4% improvement in IoU on ShapeNet. Our implementation will
be released upon acceptance.
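The contrast between vector and matrix attention can be sketched as follows: vector attention assigns one scalar weight per neighbor, while matrix attention assigns a full channel-by-channel weight matrix per neighbor, so each output channel can aggregate every input channel of every neighbor. This is a toy numpy sketch under assumed shapes and normalization, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def matrix_attention(q, k, v):
    """Sketch of matrix attention for one query point.
    q: (C,) query feature; k, v: (N, C) neighbor keys/values.
    Each neighbor gets a C x C score matrix (outer product of q and k),
    enabling point-wise AND channel-wise message passing at once."""
    C = q.shape[0]
    scores = np.einsum('c,nd->ncd', q, k) / np.sqrt(C)  # (N, C, C)
    weights = softmax(scores, axis=0)  # normalize across neighbors
    # Output channel c mixes every input channel d of every neighbor n.
    return np.einsum('ncd,nd->c', weights, v)

out = matrix_attention(np.ones(4), np.ones((3, 4)), np.ones((3, 4)))
```

Vector attention would collapse `scores` to a single scalar per neighbor before weighting `v`, which is exactly the cross-channel information loss the abstract points out.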
2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds
The commonly adopted detect-then-match approach to registration encounters
difficulties in cross-modality cases due to incompatible keypoint
detection and inconsistent feature description. We propose 2D3D-MATR, a
detection-free method for accurate and robust registration between images and
point clouds. Our method adopts a coarse-to-fine pipeline where it first
computes coarse correspondences between downsampled patches of the input image
and the point cloud and then extends them to form dense correspondences between
pixels and points within the patch region. The coarse-level patch matching is
based on a transformer that jointly learns global contextual constraints with
self-attention and cross-modality correlations with cross-attention. To resolve
the scale ambiguity in patch matching, we construct a multi-scale pyramid for
each image patch and learn to find for each point patch the best matching image
patch at a proper resolution level. Extensive experiments on two public
benchmarks demonstrate that 2D3D-MATR outperforms the previous state-of-the-art
P2-Net by around percentage points on inlier ratio and over points on
registration recall. Our code and models are available at
https://github.com/minhaolee/2D3DMATR.
Comment: Accepted by ICCV 202
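The scale-ambiguity resolution described above can be sketched as a search over a feature pyramid: for each point-patch descriptor, compare against image-patch descriptors at every resolution level and keep the best cosine match. The function name and shapes below are hypothetical, illustrating the idea rather than the paper's actual matching module.

```python
import numpy as np

def best_scale_match(point_feat, pyramid_feats):
    """Sketch: pick the image patch and pyramid level that best matches one
    point-patch feature by cosine similarity.
    point_feat: (C,); pyramid_feats: list over levels, each (num_patches, C)."""
    p = point_feat / np.linalg.norm(point_feat)
    best_sim, best_level, best_idx = -np.inf, -1, -1
    for level, feats in enumerate(pyramid_feats):
        f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sims = f @ p  # cosine similarity to every patch at this level
        j = int(np.argmax(sims))
        if sims[j] > best_sim:
            best_sim, best_level, best_idx = sims[j], level, j
    return best_sim, best_level, best_idx

# Toy example: the second patch at level 1 aligns exactly with the query.
point = np.array([1.0, 0.0])
pyramid = [np.array([[0.0, 1.0]]),
           np.array([[0.0, 1.0], [2.0, 0.0]])]
sim, level, idx = best_scale_match(point, pyramid)
```

Selecting the level jointly with the patch is what lets a coarse-to-fine pipeline tolerate unknown relative scale between the image and the point cloud.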