There and Back Again: Self-supervised Multispectral Correspondence Estimation
Across a wide range of applications, from autonomous vehicles to medical
imaging, multi-spectral images provide an opportunity to extract additional
information not present in color images. One of the most important steps in
making this information readily available is the accurate estimation of dense
correspondences between different spectra.
Due to the nature of cross-spectral images, most correspondence solving
techniques for the visual domain are simply not applicable. Furthermore, most
cross-spectral techniques utilize spectra-specific characteristics to perform
the alignment. In this work, we aim to address the dense correspondence
estimation problem in a way that generalizes to more than one spectrum. We do
this by introducing a novel cycle-consistency metric that allows us to
self-supervise. This, combined with our spectra-agnostic loss functions, allows
us to train the same network across multiple spectra.
We demonstrate our approach on the challenging task of dense RGB-FIR
correspondence estimation. We also show the performance of our unmodified
network on the cases of RGB-NIR and RGB-RGB, where we achieve higher accuracy
than similar self-supervised approaches. Our work shows that cross-spectral
correspondence estimation can be solved in a common framework that learns to
generalize alignment across spectra.
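To make the cycle-consistency idea concrete, the snippet below is a minimal sketch, not the paper's actual metric: it assumes a network that predicts dense flows in both directions (`flow_ab`, `flow_ba`) and penalizes the deviation of the A-to-B-to-A round trip from the identity mapping.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp `img` (B,C,H,W) with a dense flow field (B,2,H,W)."""
    b, _, h, w = img.shape
    # Base sampling grid in pixel coordinates (channel 0 = x, channel 1 = y).
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(img.device)   # (2,H,W)
    coords = grid.unsqueeze(0) + flow                            # (B,2,H,W)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)        # (B,H,W,2)
    return F.grid_sample(img, norm_grid, align_corners=True)

def cycle_consistency_loss(flow_ab, flow_ba):
    """Penalize deviation of the A->B->A round trip from the identity map."""
    # Sample the B->A flow at the locations each pixel of A maps to in B.
    flow_ba_warped = warp(flow_ba, flow_ab)
    # A perfectly consistent cycle returns every pixel to its start point.
    return (flow_ab + flow_ba_warped).abs().mean()
```

Because the loss compares only the network's own predictions, it requires no ground-truth correspondences and no spectrum-specific appearance assumptions, which is what allows the same objective to be reused across spectra.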
Algorithms for multi-frame image super-resolution under applicative noise based on deep neural networks
We consider algorithms for multi-frame super-resolution that reconstruct a high-resolution image by accumulating a sequence of low-resolution images under applicative noise. Applicative noise manifests itself as local regions of anomalous observations in each image and is itself a factor that degrades resolution. This problem has so far received insufficient attention, while the use of deep neural networks is a promising approach to image processing, including multi-frame super-resolution. The paper reviews existing approaches to this problem and proposes a new one based on several convolutional neural networks. The distinctive feature of the proposed approach, and of the algorithms built on it, is iterative processing of the input low-resolution image sequence, with neural networks applied at different stages of processing: registration of the low-resolution images, segmentation and detection of regions corrupted by applicative noise, and transformations aimed directly at increasing resolution. This approach makes it possible to combine the strengths of existing methods while eliminating their main drawback: the need for approximate mathematical data models, which are required to synthesize image-processing algorithms within statistical decision theory. To update the current estimate of the high-resolution image, a dedicated convolutional neural network organized as a directed acyclic graph is proposed. Experimental studies demonstrate that the proposed algorithm works and that it reconstructs the high-resolution image more accurately than alternative solutions.
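As a purely illustrative sketch, not the authors' implementation, the staged iterative structure described above might look like the skeleton below; `register`, `detect_applicative_noise`, and `refine_hr` are hypothetical stand-ins for the three network stages the abstract names.

```python
import torch
import torch.nn.functional as F

def multiframe_super_resolution(lr_frames, register, detect_applicative_noise,
                                refine_hr, scale=2, n_iters=5):
    """Hypothetical skeleton of the iterative multi-frame SR loop.

    lr_frames: list of low-resolution tensors of shape (C, H, W)
    register / detect_applicative_noise / refine_hr: callables standing in
    for the three CNN stages the abstract describes (registration,
    noise-region segmentation, resolution enhancement).
    """
    # Initialize the HR estimate by upsampling the reference frame.
    hr = F.interpolate(lr_frames[0].unsqueeze(0), scale_factor=scale,
                       mode="bilinear", align_corners=False)
    for _ in range(n_iters):
        for frame in lr_frames:
            aligned = register(frame, hr)             # align LR frame to HR estimate
            mask = detect_applicative_noise(aligned)  # 1 = clean, 0 = corrupted
            # Only pixels judged clean contribute to the HR update.
            hr = refine_hr(hr, aligned * mask)
    return hr
```

The point of the structure is that corrupted regions are masked out per frame before fusion, so anomalous observations in one frame do not contaminate the accumulated high-resolution estimate.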
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network
architecture for optical flow. RAFT extracts per-pixel features, builds
multi-scale 4D correlation volumes for all pairs of pixels, and iteratively
updates a flow field through a recurrent unit that performs lookups on the
correlation volumes. RAFT achieves state-of-the-art performance. On KITTI, RAFT
achieves an F1-all error of 5.10%, a 16% error reduction from the best
published result (6.10%). On Sintel (final pass), RAFT obtains an
end-point-error of 2.855 pixels, a 30% error reduction from the best published
result (4.098 pixels). In addition, RAFT has strong cross-dataset
generalization as well as high efficiency in inference time, training speed,
and parameter count. Code is available at https://github.com/princeton-vl/RAFT.
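The all-pairs correlation construction is straightforward to sketch. The following is a minimal illustration of the idea, omitting the lookup and indexing machinery of the released code: correlate every pixel of frame 1 with every pixel of frame 2, then pool the last two dimensions into a multi-scale pyramid.

```python
import torch
import torch.nn.functional as F

def all_pairs_correlation(fmap1, fmap2, num_levels=4):
    """Build RAFT-style multi-scale 4D correlation volumes.

    fmap1, fmap2: per-pixel feature maps of shape (B, D, H, W).
    Returns a pyramid; level k has shape (B*H*W, 1, H/2**k, W/2**k).
    """
    b, d, h, w = fmap1.shape
    f1 = fmap1.view(b, d, h * w)
    f2 = fmap2.view(b, d, h * w)
    # Dot product between every pixel in frame 1 and every pixel in frame 2.
    corr = torch.matmul(f1.transpose(1, 2), f2) / d**0.5   # (B, H*W, H*W)
    corr = corr.view(b * h * w, 1, h, w)                   # 4D volume
    # Pool only the last two dims to form the pyramid; the recurrent
    # update unit then performs lookups at each level.
    pyramid = [corr]
    for _ in range(num_levels - 1):
        corr = F.avg_pool2d(corr, kernel_size=2, stride=2)
        pyramid.append(corr)
    return pyramid
```

Pooling only the last two dimensions keeps full resolution over frame-1 pixels while giving the update unit both fine and coarse views of frame 2, which is what lets a single recurrent unit handle large and small displacements.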
Probabilistic Pixel-Adaptive Refinement Networks
Encoder-decoder networks have found widespread use in various dense
prediction tasks. However, the strong reduction of spatial resolution in the
encoder leads to a loss of location information as well as boundary artifacts.
To address this, image-adaptive post-processing methods have proven beneficial
by leveraging the high-resolution input image(s) as guidance data. We extend
such approaches by considering an important orthogonal source of information:
the network's confidence in its own predictions. We introduce probabilistic
pixel-adaptive convolutions (PPACs), which not only depend on image guidance
data for filtering, but also respect the reliability of per-pixel predictions.
As such, PPACs enable image-adaptive smoothing and simultaneously propagate
high-confidence predictions into less reliable regions, while respecting
object boundaries. We demonstrate their utility in refinement
networks for optical flow and semantic segmentation, where PPACs lead to a
clear reduction in boundary artifacts. Moreover, our proposed refinement step
is able to substantially improve the accuracy on various widely used
benchmarks.
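To make the idea concrete, here is a minimal, hypothetical sketch of a PPAC-style filtering step, not the paper's exact parameterization: filter weights combine a Gaussian affinity on guidance-feature differences with the per-pixel confidence of each neighbor.

```python
import torch
import torch.nn.functional as F

def ppac_like_filter(pred, guidance, confidence, kernel_size=5, sigma=0.1):
    """Confidence-weighted, guidance-adaptive smoothing (illustrative sketch).

    pred:       dense prediction to refine, shape (B, C, H, W)
    guidance:   guidance features, e.g. the input image, shape (B, G, H, W)
    confidence: per-pixel reliability in [0, 1], shape (B, 1, H, W)
    """
    b, c, h, w = pred.shape
    k, pad = kernel_size, kernel_size // 2
    # Unfold local neighborhoods: (B, C*k*k, H*W) -> (B, C, k*k, H, W).
    patches = F.unfold(pred, k, padding=pad).view(b, c, k * k, h, w)
    g_patch = F.unfold(guidance, k, padding=pad).view(b, -1, k * k, h, w)
    c_patch = F.unfold(confidence, k, padding=pad).view(b, 1, k * k, h, w)
    # Guidance affinity: neighbors with similar guidance features count more,
    # which keeps the smoothing from crossing object boundaries.
    diff = ((g_patch - guidance.unsqueeze(2)) ** 2).sum(dim=1, keepdim=True)
    affinity = torch.exp(-diff / (2 * sigma ** 2))
    # Confidence enters multiplicatively: unreliable neighbors are
    # down-weighted, so high-confidence pixels propagate into low-confidence
    # regions rather than the other way around.
    weights = affinity * c_patch
    weights = weights / weights.sum(dim=2, keepdim=True).clamp(min=1e-8)
    return (patches * weights).sum(dim=2)
```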