LLA-FLOW: A Lightweight Local Aggregation on Cost Volume for Optical Flow Estimation
Lack of texture often causes ambiguity in matching, and handling this issue
is an important challenge in optical flow estimation. Some methods insert
stacked transformer modules that allow the network to use global information of
the cost volume for estimation. However, global information aggregation often incurs
serious memory and time costs during training and inference, which hinders
model deployment. We draw inspiration from the traditional local region
constraint and design the local similarity aggregation (LSA) and the shifted
local similarity aggregation (SLSA). The aggregation for cost volume is
implemented with lightweight modules that act on the feature maps. Experiments
on the final pass of Sintel show that our approach incurs a lower cost while
maintaining competitive performance.
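The classical local-region constraint the abstract refers to can be illustrated with a simple sketch: averaging each pixel's matching costs over a small spatial window so that textureless pixels borrow evidence from their neighbors. This is not the paper's learned LSA/SLSA module (which operates on feature maps with lightweight learned layers); it is just the traditional local aggregation it draws inspiration from, with the function name and window size chosen here for illustration.

```python
import numpy as np

def local_cost_aggregation(cost, k=3):
    """Average each pixel's matching costs over a k x k spatial window.

    cost: (H, W, D) array of matching costs for D candidate displacements.
    Local averaging suppresses spurious minima in textureless regions,
    following the classical local-region constraint.
    """
    H, W, D = cost.shape
    pad = k // 2
    # Replicate-pad the borders so every pixel has a full k x k window.
    padded = np.pad(cost, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(cost)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W]
    return out / (k * k)
```

A single noisy cost spike of magnitude 9 in an otherwise zero cost map is averaged down to 1 by a 3 x 3 window, showing how local aggregation damps ambiguous matches.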
Elucidating the solution space of extended reverse-time SDE for diffusion models
Diffusion models (DMs) demonstrate potent image generation capabilities in
various generative modeling tasks. Nevertheless, their primary limitation lies
in slow sampling speed, requiring hundreds or thousands of sequential function
evaluations through large neural networks to generate high-quality images.
Sampling from DMs can be seen alternatively as solving corresponding stochastic
differential equations (SDEs) or ordinary differential equations (ODEs). In
this work, we formulate the sampling process as an extended reverse-time SDE
(ER SDE), unifying prior explorations into ODEs and SDEs. Leveraging the
semi-linear structure of ER SDE solutions, we offer exact solutions and
arbitrarily high-order approximate solutions for VP SDE and VE SDE,
respectively. Based on the solution space of the ER SDE, we yield mathematical
insights elucidating the superior performance of ODE solvers over SDE solvers
in terms of fast sampling. Additionally, we unveil that VP SDE solvers stand on
par with their VE SDE counterparts. Finally, we devise fast and training-free
samplers, ER-SDE-Solvers, achieving state-of-the-art performance across all
stochastic samplers. Experimental results demonstrate 3.45 FID in 20
function evaluations and 2.24 FID in 50 function evaluations on the ImageNet
dataset.
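The reverse-time SDE view of sampling can be sketched with a plain Euler-Maruyama integrator for a VP SDE. This is a generic first-order reverse-SDE step, not the paper's ER-SDE-Solvers or their semi-linear exact solutions; the constant beta, the step count, and the function name are illustrative assumptions, and the score function must be supplied by the caller (in practice, a trained network).

```python
import numpy as np

def reverse_vp_sde_sample(score, beta=1.0, n_steps=500, n_samples=2000, seed=0):
    """Euler-Maruyama integration of the reverse-time VP SDE
        dx = [-0.5 * beta * x - beta * score(x, t)] dt + sqrt(beta) dw_bar,
    run backward from t = 1 to t = 0, starting from the N(0, 1) prior.

    score: callable (x, t) -> estimate of grad log p_t(x).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)  # sample the Gaussian prior at t = 1
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        # Reverse-time drift: forward drift minus g^2 times the score.
        drift = -0.5 * beta * x - beta * score(x, t)
        # Step backward in time and inject the diffusion noise.
        x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n_samples)
    return x
```

With the analytic score of a standard normal, score(x, t) = -x, the marginal N(0, 1) is stationary under this dynamics, so the returned samples should remain approximately standard normal; that gives a cheap sanity check without any trained network.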
Dual Correlation Network for Efficient Video Semantic Segmentation
Video data pose a significant challenge to semantic segmentation due to their large volume and strong inter-frame redundancy. In this paper, we propose a dual local and global correlation network tailored for efficient video semantic segmentation. It consists of three modules: 1) a local attention based module, which measures correlation and achieves feature aggregation in a local region between key frame and non-key frame; 2) a consistent constraint module, which considers long-range correlation among pixels from a global view for promoting intra-frame semantic consistency of non-key frame; and 3) a key frame decision module, which selects key frames adaptively based on the ability of feature transferring. Extensive experiments on the Cityscapes and CamVid video datasets demonstrate that our proposed method reduces inference time significantly while maintaining high accuracy. The implementation is available at https://github.com/An01168/DCNVSS
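The local attention idea in module 1) can be sketched as follows: each non-key-frame pixel attends over a small neighborhood of key-frame features via softmax-weighted similarity and aggregates them. This is a minimal illustration of windowed cross-frame attention, not the paper's actual module; the function name, window size, and scaled dot-product choice are assumptions.

```python
import numpy as np

def local_attention_aggregate(key_feat, nonkey_feat, k=3):
    """For each non-key-frame pixel, compute softmax similarities against a
    k x k neighborhood of key-frame features and aggregate that window.

    key_feat, nonkey_feat: (H, W, C) feature maps.
    Returns aggregated (H, W, C) features for the non-key frame.
    """
    H, W, C = key_feat.shape
    pad = k // 2
    padded = np.pad(key_feat, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(nonkey_feat)
    for y in range(H):
        for x in range(W):
            window = padded[y:y + k, x:x + k].reshape(-1, C)   # (k*k, C)
            # Scaled dot-product similarity against the query pixel.
            sim = window @ nonkey_feat[y, x] / np.sqrt(C)
            w = np.exp(sim - sim.max())                        # stable softmax
            w /= w.sum()
            out[y, x] = w @ window                             # weighted sum
    return out
```

Restricting attention to a k x k window keeps the cost O(H * W * k^2 * C) instead of the O((H * W)^2 * C) of full global attention, which is the efficiency argument behind local aggregation.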
An α-Matte Boundary Defocus Model Based Cascaded Network for Multi-focus Image Fusion
Capturing an all-in-focus image with a single camera is difficult since the
depth of field of the camera is usually limited. An alternative method to
obtain the all-in-focus image is to fuse several images focusing at different
depths. However, existing multi-focus image fusion methods cannot obtain clear
results for areas near the focused/defocused boundary (FDB). In this paper, a
novel α-matte boundary defocus model is proposed to generate realistic
training data with the defocus spread effect precisely modeled, especially for
areas near the FDB. Based on this α-matte defocus model and the
generated data, a cascaded boundary aware convolutional network termed MMF-Net
is proposed and trained, aiming to achieve clearer fusion results around the
FDB. More specifically, the MMF-Net consists of two cascaded sub-nets for
initial fusion and boundary fusion, respectively; these two sub-nets are
designed to first obtain a guidance map of FDB and then refine the fusion near
the FDB. Experiments demonstrate that with the help of the new α-matte
boundary defocus model, the proposed MMF-Net outperforms the state-of-the-art
methods both qualitatively and quantitatively.
Comment: 10 pages, 8 figures, journal. Unfortunately, I cannot spell one of the authors' names correctly.
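The alpha-matte compositing idea behind the defocus model can be sketched in two steps: soften a binary focus mask into an alpha matte (so the defocus spread effect is spread across the boundary rather than a hard edge), then composite the focused and defocused layers. This is a simplified illustration of matte compositing, not the paper's precise model; the box-blur matte and the function names are assumptions made for the example.

```python
import numpy as np

def soften_mask(mask, k=5):
    """Box-blur a binary focus mask into a soft alpha matte, so transitions
    near the focused/defocused boundary (FDB) are gradual."""
    pad = k // 2
    padded = np.pad(mask.astype(float), pad, mode="edge")
    out = np.zeros(mask.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out / (k * k)

def alpha_matte_composite(focused, defocused, alpha):
    """Alpha-matte composite of a focused layer over a defocused layer:
        I = alpha * focused + (1 - alpha) * defocused
    Near the FDB, 0 < alpha < 1 mixes the two layers, modeling the
    defocus spread effect in synthesized training images."""
    return alpha * focused + (1.0 - alpha) * defocused
```

Generating training pairs this way gives pixel-accurate ground truth (the focused layer) even inside the blended boundary band, which is exactly where hard-mask synthesis breaks down.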
- …