
    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
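The event encoding the survey describes — each event carrying a timestamp, a pixel location, and the sign of the brightness change — can be illustrated with a minimal sketch. The `Event` type and the accumulation function here are hypothetical illustrations, not code from the paper:

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float       # timestamp (e.g. microseconds)
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 = brightness increase, -1 = decrease

def accumulate(events, width, height):
    """Sum event polarities per pixel over a time window --
    one common, simple representation for frame-based processing."""
    frame = [[0] * width for _ in range(height)]
    for e in events:
        frame[e.y][e.x] += e.polarity
    return frame

events = [Event(1.0, 0, 0, +1), Event(2.5, 0, 0, +1), Event(3.0, 1, 0, -1)]
print(accumulate(events, 2, 1))  # [[2, -1]]
```

Accumulating polarities is only one of many event representations discussed in the survey; time surfaces and voxel grids preserve more of the asynchronous timing information.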

    DynamicSurf: Dynamic Neural RGB-D Surface Reconstruction with an Optimizable Feature Grid

    We propose DynamicSurf, a model-free neural implicit surface reconstruction method for high-fidelity 3D modelling of non-rigid surfaces from monocular RGB-D video. To cope with the lack of multi-view cues in monocular sequences of deforming surfaces, one of the most challenging settings for 3D reconstruction, DynamicSurf exploits depth, surface normal, and RGB losses to improve reconstruction fidelity and optimisation time. DynamicSurf learns a neural deformation field that maps a canonical representation of the surface geometry to the current frame. We depart from current neural non-rigid surface reconstruction models by designing the canonical representation as a learned feature grid, which leads to faster and more accurate surface reconstruction than competing approaches that use a single MLP. We demonstrate DynamicSurf on public datasets and show that it can optimize sequences of varying frames with a 6× speedup over pure MLP-based approaches while achieving results comparable to the state-of-the-art methods. The project page is available at https://mirgahney.github.io//DynamicSurf.io/
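The speedup claimed for a learned feature grid over a single MLP comes largely from replacing a deep-network evaluation with a cheap grid lookup plus interpolation. A minimal 2D sketch of that lookup (DynamicSurf itself uses a learned 3D grid; the function and values here are generic illustrations):

```python
def bilerp(grid, x, y):
    """Bilinearly interpolate a 2D grid of scalar features at
    continuous coordinates (x, y); grid[row][col] indexing.
    Querying a grid is O(1) per point, versus a full forward
    pass for an MLP-based representation."""
    x0, y0 = int(x), int(y)
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bot * fy

grid = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilerp(grid, 0.5, 0.5))  # 1.5
```

In practice each grid cell stores a feature vector rather than a scalar, and a small decoder network maps interpolated features to geometry, but the cost argument is the same.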

    Traffic Scene Perception for Automated Driving with Top-View Grid Maps

    An automated vehicle must make safe, sensible, and fast decisions based on its environment. This requires an accurate and computationally efficient model of the traffic environment. This environment model should fuse and filter measurements from different sensors and provide them to subsequent subsystems as compact yet expressive information. This work addresses modelling the traffic scene on the basis of top-view grid maps. Compared to other environment models, they enable an early fusion of range measurements from different sources at low computational cost, as well as explicit modelling of free space. After presenting a method for ground-surface estimation, which forms the basis of the top-view modelling, methods for occupancy and elevation mapping on grid maps are treated, based on multiple range measurements that may be noisy, partially contradictory, or missing. On the resulting sensor-independent representation, models for detecting traffic participants and for estimating scene flow, odometry, and tracking features are then investigated. Experiments on publicly available datasets and a real vehicle show that top-view grid maps can be estimated from on-board LiDAR sensors and that safety-critical environment information such as observability and drivability can be reliably derived from them. Finally, traffic participants are determined as oriented bounding boxes with semantic classes, velocities, and tracking features, using a joint model for object detection and flow estimation based on the top-view grid maps.
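Occupancy mapping on grid maps commonly fuses noisy, partially contradictory range measurements per cell with a Bayesian log-odds update, which is one standard way to realize the fusion described above. This is a generic sketch (the inverse-sensor probabilities and function names are illustrative assumptions, not the thesis's model):

```python
import math

def prob_to_logodds(p):
    return math.log(p / (1.0 - p))

def logodds_to_prob(l):
    return 1.0 / (1.0 + math.exp(-l))

def update_cell(l, hit, p_hit=0.7, p_miss=0.4):
    """Fuse one range measurement into a cell's log-odds occupancy.
    Contradictory measurements partially cancel; a missing
    measurement simply leaves the cell unchanged."""
    p = p_hit if hit else p_miss
    return l + prob_to_logodds(p)

l = 0.0  # prior log-odds, i.e. p(occupied) = 0.5
for hit in [True, True, False, True]:  # three hits, one contradicting miss
    l = update_cell(l, hit)
print(round(logodds_to_prob(l), 3))  # 0.894
```

The additive log-odds form is what makes early fusion of measurements from several sensors cheap: each source contributes an independent additive update per cell.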

    Self-Supervised Learning for Geometry

    This thesis focuses on two fundamental problems in robotic vision: scene geometry understanding and camera tracking. Both tasks have long been the subject of research in robotic vision, and numerous geometric solutions have been proposed in past decades. In this thesis, we cast the geometric problems as machine learning problems, specifically deep learning problems. Unlike conventional supervised learning methods that use expensive annotations as the supervisory signal, we advocate the use of geometry as a supervisory signal to improve the perceptual capabilities of robots, namely Geometry Self-supervision. With geometry self-supervision, we allow robots to learn and infer the 3D structure of the scene and ego-motion by watching videos, instead of relying on the expensive ground-truth annotation of traditional supervised learning. Having shown the use of geometry for deep learning, we then show the possibility of integrating self-supervised models with traditional geometry-based methods as a hybrid solution to the mapping and tracking problem. In the first part of this thesis, we focus on an end-to-end mapping problem from stereo data, namely Deep Stereo Matching. Stereo matching is one of the oldest problems in computer vision. Classical approaches typically rely on handcrafted features and a multi-step pipeline, while recent deep learning methods use deep neural networks to achieve end-to-end trained approaches that significantly outperform classic methods. We propose a novel data acquisition pipeline using an untethered device (Microsoft HoloLens) with a Time-of-Flight (ToF) depth camera and stereo cameras to collect real-world data, and a novel semi-supervised method to train networks with both ground-truth supervision and self-supervision.
The large-scale real-world stereo dataset with semi-dense annotation and dense self-supervision allows our deep stereo matching network to generalize better than prior art. Mapping and tracking with a single (monocular) camera is a harder problem than with a stereo camera, due to various well-known challenges. In the second part of this thesis, we decouple the problem into single-view depth estimation (mapping) and two-view visual odometry (tracking) and propose a self-supervised framework, namely SelfTAM, which jointly learns the depth estimator and the odometry estimator. The self-supervised problem is usually formulated as an energy minimization problem consisting of a multi-view data-consistency energy (e.g. photometric) and a prior regularization energy (e.g. a depth smoothness prior). We strengthen the supervision signal with a deep feature consistency energy term and a surface normal regularization term. Although our method trains models on stereo sequences, so that a real-world scale factor is naturally incorporated, only monocular data is required at inference time. In the last part of this thesis, we revisit the basics of visual odometry and explore best practices for integrating deep learning models with geometry-based visual odometry methods. We propose a robust visual odometry system, DF-VO. We use deep networks to establish 2D-2D/3D-2D correspondences and pick the best correspondences from the dense predictions. Feeding these high-quality correspondences into traditional VO methods, e.g. epipolar geometry and Perspective-n-Point, lets us solve the visual odometry problem within a more robust framework. With the proposed self-supervised training, we can even let the models perform online adaptation at run time, taking a step toward a lifelong-learning visual odometry system. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
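The energy-minimization formulation the abstract describes — a photometric data term plus a smoothness prior on depth — can be sketched on toy 1D signals. The function names, weights, and values here are illustrative assumptions, not the thesis's SelfTAM implementation:

```python
def photometric_energy(warped, target):
    """Data term: per-pixel photometric consistency (L1) between
    the view warped via predicted depth/pose and the target view."""
    return sum(abs(a - b) for a, b in zip(warped, target))

def smoothness_energy(depth):
    """Prior term: penalize first-order differences of the depth map."""
    return sum(abs(depth[i + 1] - depth[i]) for i in range(len(depth) - 1))

def total_energy(warped, target, depth, lam=0.1):
    """Total self-supervised energy; lam trades data fit vs. smoothness."""
    return photometric_energy(warped, target) + lam * smoothness_energy(depth)

warped = [0.2, 0.5, 0.9]  # intensities reprojected from the other view
target = [0.2, 0.4, 1.0]  # observed intensities
depth  = [1.0, 1.1, 1.3]  # predicted depth values
print(round(total_energy(warped, target, depth), 3))  # 0.23
```

In the actual framework, both terms are differentiable losses minimized by gradient descent over network weights, and the thesis adds deep-feature consistency and surface-normal terms on top of this basic form.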