Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes
Unsupervised deep learning for optical flow computation has achieved
promising results. Most existing deep-net based methods rely on image
brightness consistency and local smoothness constraints to train the networks.
Their performance degrades in regions where repetitive textures or occlusions occur.
occur. In this paper, we propose Deep Epipolar Flow, an unsupervised optical
flow method which incorporates global geometric constraints into network
learning. In particular, we investigate multiple ways of enforcing the epipolar
constraint in flow estimation. To alleviate a "chicken-and-egg" type of problem
encountered in dynamic scenes where multiple motions may be present, we propose
a low-rank constraint as well as a union-of-subspaces constraint for training.
Experimental results on various benchmarking datasets show that our method
achieves competitive performance compared with supervised methods and
outperforms state-of-the-art unsupervised deep-learning methods. Comment: CVPR 2019
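The epipolar constraint the abstract refers to can be turned into a training signal by scoring how far each flow-displaced point lies from its epipolar line. The sketch below is one illustrative way of doing this (the paper investigates several); the function name and the choice of the Sampson distance are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def epipolar_flow_loss(pts, flow, F):
    """Mean Sampson (first-order epipolar) distance of flow-displaced points.

    pts  : (N, 2) pixel coordinates in frame 1
    flow : (N, 2) estimated flow vectors
    F    : (3, 3) fundamental matrix, convention x2^T F x1 = 0
    """
    n = pts.shape[0]
    x1 = np.hstack([pts, np.ones((n, 1))])           # homogeneous coords, frame 1
    x2 = np.hstack([pts + flow, np.ones((n, 1))])    # points displaced by the flow
    Fx1 = x1 @ F.T                                   # epipolar lines in image 2
    Ftx2 = x2 @ F                                    # epipolar lines in image 1
    num = np.sum(x2 * Fx1, axis=1) ** 2              # (x2^T F x1)^2 per point
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return np.mean(num / den)
```

For a static scene the true flow satisfies the constraint exactly, so this term is zero; deviations (e.g. independently moving objects, which motivate the paper's low-rank and union-of-subspaces variants) are penalized.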
Deep Planar Parallax for Monocular Depth Estimation
Recent research has highlighted the utility of Planar Parallax Geometry in
monocular depth estimation. However, its potential has yet to be fully realized
because networks rely heavily on appearance for depth prediction. Our in-depth
analysis reveals that optical-flow pretraining helps the network better exploit
consecutive-frame modeling, leading to substantial performance gains.
Additionally, we propose Planar Position Embedding (PPE) to handle dynamic
objects that defy static scene assumptions and to tackle slope variations that
are challenging to differentiate. Comprehensive experiments on autonomous
driving datasets, namely KITTI and the Waymo Open Dataset (WOD), prove that our
Planar Parallax Network (PPNet) significantly surpasses existing learning-based
methods in performance.
A Detailed Analysis of Unsupervised Depth and Motion Estimation
Recent years have shown unprecedented success in depth estimation by jointly solving unsupervised depth estimation and pose estimation. In this study, we perform a thorough analysis of such an approach. First, the pose estimation performance of classical techniques, such as COLMAP, is compared against recent unsupervised learning-based techniques; simulation results indicate the superiority of the Bundle Adjustment step in the classical pipelines. Next, the effect of the number of input frames given to the pose estimator network is investigated in detail; the experiments at this step reveal that the state of the art can be improved by providing extra frames to the pose estimator network. Finally, the semantic labels of objects in the scene are utilized individually during the pose and depth estimation stages, using pre-trained semantic segmentation networks. The effects of computing losses from different regions of the scene and of averaging different pose estimates with learnable weights are investigated. Summing the poses and losses corresponding to different semantic classes with learnable weights yields results comparable to state-of-the-art methods. M.S. - Master of Science
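The final step above, combining per-semantic-class losses with learnable weights, can be sketched as below. The function name, and the use of a softmax to keep the weights positive and normalized, are illustrative assumptions rather than the thesis' exact scheme.

```python
import numpy as np

def weighted_semantic_loss(class_losses, logits):
    """Combine one loss per semantic class using learnable weights.

    class_losses : (C,) array, one photometric/pose loss per class
    logits       : (C,) learnable parameters; softmax-normalized so the
                   combination weights are positive and sum to one
    """
    w = np.exp(logits - logits.max())   # numerically stable softmax
    w /= w.sum()
    return float(np.dot(w, class_losses))
```

In a real training loop the logits would be optimized jointly with the depth and pose networks, letting the model learn which semantic regions supply reliable supervision.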
Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation
Unsupervised learning of optical flow, which leverages the supervision from
view synthesis, has emerged as a promising alternative to supervised methods.
However, the objective of unsupervised learning is likely to be unreliable in
challenging scenes. In this work, we present a framework to use more reliable
supervision from transformations. It simply twists the general unsupervised
learning pipeline by running another forward pass with transformed data from
augmentation, along with using transformed predictions of original data as the
self-supervision signal. In addition, we introduce a lightweight multi-frame
network built around a highly shared flow decoder. Our method consistently
improves performance on several benchmarks, achieving the best accuracy among
deep unsupervised methods, and obtains results competitive with recent fully
supervised methods while using far fewer parameters. Comment: Accepted to CVPR 2020, https://github.com/lliuz/ARFlow
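The transformation-based supervision described above can be sketched in a few lines: predict flow on the original pair, transform that prediction, and use it as the target for a second forward pass on transformed inputs. The function name is an assumption, and a horizontal flip stands in for the paper's richer augmentations; note that flipping a flow field also requires negating its horizontal component.

```python
import numpy as np

def self_supervised_loss(model, img1, img2):
    """One self-supervision signal from a horizontal-flip transformation.

    `model(img1, img2)` is a hypothetical flow estimator returning an
    (H, W, 2) flow field for the given image pair.
    """
    # First pass: flow on the original pair serves as the pseudo label
    # (it would be detached from the graph in a real framework).
    flow = model(img1, img2)
    # Second pass: flow on the horizontally flipped pair.
    flow_aug = model(img1[:, ::-1], img2[:, ::-1])
    # Transform the pseudo label consistently: flip it spatially and
    # negate the horizontal flow component.
    target = flow[:, ::-1].copy()
    target[..., 0] *= -1
    # Supervise the augmented prediction against the transformed label.
    return float(np.mean(np.abs(flow_aug - target)))
```

The key idea is that the transformed prediction of the original data is more reliable than a photometric loss in challenging regions, so it can safely supervise the prediction on the augmented inputs.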
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
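The event encoding described above, a stream of (time, location, sign) tuples, is often converted into a signed 2-D histogram before frame-based algorithms are applied. The sketch below illustrates that common preprocessing step; the function name and tuple layout are assumptions for illustration.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate an event stream into a signed per-pixel histogram.

    events : iterable of (t, x, y, polarity) tuples, polarity in {-1, +1}
             encoding the time, location and sign of a brightness change
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for t, x, y, p in events:
        frame[int(y), int(x)] += int(p)   # +1 brightness up, -1 down
    return frame
```

Richer representations (e.g. time surfaces or voxel grids that keep the microsecond timestamps) preserve more of the temporal information that makes these sensors attractive, at the cost of a larger input.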