
    Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes

    Unsupervised deep learning for optical flow computation has achieved promising results. Most existing deep-net-based methods rely on image brightness consistency and a local smoothness constraint to train the networks. Their performance degrades in regions with repetitive textures or occlusions. In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method which incorporates global geometric constraints into network learning. In particular, we investigate multiple ways of enforcing the epipolar constraint in flow estimation. To alleviate a "chicken-and-egg" type of problem encountered in dynamic scenes where multiple motions may be present, we propose a low-rank constraint as well as a union-of-subspaces constraint for training. Experimental results on various benchmark datasets show that our method achieves competitive performance compared with supervised methods and outperforms state-of-the-art unsupervised deep-learning methods. (Comment: CVPR 2019)
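    The core idea can be made concrete as a loss term: a flow vector is penalized when its endpoint strays from the epipolar line of its source pixel. Below is a minimal sketch of such a penalty (a Sampson-style epipolar distance), assuming the fundamental matrix F is obtained elsewhere, e.g., by robust fitting; the function name and setup are illustrative, not the paper's code.

```python
import numpy as np

def epipolar_flow_loss(flow, F, H, W):
    """Mean squared Sampson epipolar distance between each pixel x and its
    flow-displaced correspondence x' = x + flow(x). For a geometrically
    correct match, the epipolar constraint x'^T F x = 0 holds exactly."""
    ys, xs = np.mgrid[0:H, 0:W]
    x1 = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(np.float64)
    disp = np.concatenate([flow.reshape(-1, 2), np.zeros((H * W, 1))], axis=1)
    x2 = x1 + disp                          # displaced points, still homogeneous
    Fx1 = x1 @ F.T                          # epipolar lines in the second image
    Ftx2 = x2 @ F                           # epipolar lines in the first image
    num = np.sum(x2 * Fx1, axis=1) ** 2     # (x2^T F x1)^2, zero for perfect matches
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return np.mean(num / (den + 1e-8))      # averaged over all pixels
```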

    Deep Planar Parallax for Monocular Depth Estimation

    Recent research has highlighted the utility of planar parallax geometry in monocular depth estimation. However, its potential has yet to be fully realized, because networks rely heavily on appearance for depth prediction. Our in-depth analysis reveals that optical-flow pre-training improves the network's use of consecutive-frame modeling, leading to a substantial performance gain. Additionally, we propose Planar Position Embedding (PPE) to handle dynamic objects that violate the static-scene assumption and to tackle slope variations that are hard to differentiate. Comprehensive experiments on autonomous driving datasets, namely KITTI and the Waymo Open Dataset (WOD), demonstrate that our Planar Parallax Network (PPNet) significantly surpasses existing learning-based methods.
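    Planar parallax decomposes image motion into a homography induced by a reference plane (e.g., the road) plus a residual that depends only on structure off that plane. A minimal sketch of that decomposition, assuming a dense flow field and a plane homography G are already available; the names are illustrative, and this is not PPNet's code.

```python
import numpy as np

def planar_residual_flow(flow, G, H, W):
    """Subtract the motion explained by the plane homography G (3x3) from a
    dense flow field (H, W, 2), leaving the residual parallax component
    that encodes structure off the reference plane."""
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # homogeneous pixel grid
    warped = pts @ G.T                                    # plane-induced positions
    warped = warped[..., :2] / warped[..., 2:3]           # dehomogenize
    plane_flow = warped - pts[..., :2]                    # flow explained by the plane
    return flow - plane_flow                              # residual (parallax) flow
```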

    A Detailed Analysis of Unsupervised Depth and Motion Estimation (Güdümsüz derinlik ve hareket kestirimi üzerine detaylı bir analiz)

    Recent years have shown unprecedented success in depth estimation by jointly solving unsupervised depth estimation and pose estimation. In this study, we perform a thorough analysis of such an approach. First, the pose estimation performance of classical techniques, such as COLMAP, is compared against recent unsupervised learning-based techniques. Simulation results indicate the superiority of the bundle adjustment step in the classical techniques. Next, the effect of the number of input frames given to the pose estimator network is investigated in detail. The experiments performed at this step reveal that the state of the art can be improved by providing extra frames to the pose estimator network. Finally, the semantic labels of objects in the scene are utilized individually during the pose and depth estimation stages. For this purpose, pre-trained semantic segmentation networks are used. The effects of computing losses from different regions of the scene and of averaging different pose estimates with learnable weights are investigated; see the sketch after this abstract. The poses and losses corresponding to different semantic classes are summed with learnable weights, yielding results comparable to state-of-the-art methods. (M.S. - Master of Science)
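    The final step described above, summing per-class losses with learnable weights, fits in a few lines. A hypothetical PyTorch fragment, assuming the per-class losses are already computed; the class name and the number of classes are illustrative, not the thesis code.

```python
import torch
import torch.nn as nn

class SemanticWeightedLoss(nn.Module):
    """Combine per-semantic-class losses with learnable weights, so training
    can discount unreliable classes (e.g., dynamic objects)."""
    def __init__(self, num_classes):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_classes))  # learned per class

    def forward(self, per_class_losses):
        w = torch.softmax(self.logits, dim=0)   # convex combination over classes
        return (w * per_class_losses).sum()

# Usage: loss_fn = SemanticWeightedLoss(19); total = loss_fn(per_class_losses)
```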

    Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation

    Unsupervised learning of optical flow, which leverages supervision from view synthesis, has emerged as a promising alternative to supervised methods. However, the unsupervised objective is likely to be unreliable in challenging scenes. In this work, we present a framework that uses more reliable supervision from transformations. It simply twists the general unsupervised learning pipeline by running another forward pass with transformed data from augmentation, using the transformed predictions on the original data as the self-supervision signal. Besides, we further introduce a lightweight multi-frame network with a highly shared flow decoder. Our method consistently achieves a clear performance gain on several benchmarks, with the best accuracy among deep unsupervised methods, and obtains results competitive with recent fully supervised methods while using far fewer parameters. (Comment: Accepted to CVPR 2020, https://github.com/lliuz/ARFlow)
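    The transformation-based self-supervision loop can be sketched directly: predict flow on the original pair, transform that prediction, and use it to supervise a second forward pass on the transformed pair. A minimal PyTorch sketch using a horizontal flip as the transformation; `model` and the loss choice are assumptions, not the released ARFlow code.

```python
import torch
import torch.nn.functional as F

def transformation_consistency_loss(model, img1, img2):
    """Self-supervision from transformations: the (stopped-gradient) flow on
    the original pair, flipped, supervises the flow on the flipped pair."""
    with torch.no_grad():
        flow_orig = model(img1, img2)            # teacher pass, (B, 2, H, W)
    flow_t = model(torch.flip(img1, dims=[-1]),
                   torch.flip(img2, dims=[-1]))  # student pass on flipped pair
    target = torch.flip(flow_orig, dims=[-1])    # flip the teacher flow spatially ...
    target[:, 0] = -target[:, 0]                 # ... and negate its u component
    return F.l1_loss(flow_t, target)
```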

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
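    A common first step before reusing frame-based algorithms on this sensor output is to accumulate the asynchronous event stream into an image. A minimal sketch, assuming events arrive as (timestamp, x, y, polarity) rows; this representation choice is illustrative, and the survey discusses many richer alternatives.

```python
import numpy as np

def events_to_frame(events, H, W):
    """Accumulate events (N, 4) of [t, x, y, p], with p in {-1, +1}, into a
    signed image; np.add.at handles repeated events on the same pixel."""
    frame = np.zeros((H, W), dtype=np.float32)
    xs = events[:, 1].astype(int)
    ys = events[:, 2].astype(int)
    np.add.at(frame, (ys, xs), events[:, 3])   # sum polarities per pixel
    return frame
```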