21 research outputs found

    EV-IMO: Motion Segmentation Dataset and Learning Pipeline for Event Cameras

    We present the first event-based learning approach for motion segmentation in indoor scenes and the first event-based dataset - EV-IMO - which includes accurate pixel-wise motion masks, egomotion and ground truth depth. Our approach is based on an efficient implementation of the SfM learning pipeline using a low-parameter neural network architecture on event data. In addition to camera egomotion and a dense depth map, the network estimates pixel-wise independently moving object segmentation and computes per-object 3D translational velocities for moving objects. We also train a shallow network with just 40k parameters, which is able to compute depth and egomotion. Our EV-IMO dataset features 32 minutes of indoor recording with up to 3 fast-moving objects simultaneously in the camera field of view. The objects and the camera are tracked by the VICON motion capture system. By 3D scanning the room and the objects, accurate depth map ground truth and pixel-wise object masks are obtained, which are reliable even in poor lighting conditions and during fast motion. We then train and evaluate our learning pipeline on EV-IMO and demonstrate that our approach far surpasses its rivals and is well suited for scene-constrained robotics applications. Comment: 8 pages, 6 figures. Submitted to the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019).
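    As a rough illustration of what a sample in such a dataset contains, the sketch below pairs an event array with depth, per-pixel motion masks and egomotion ground truth, and scores a predicted mask with intersection-over-union. The field names, array shapes and resolution are assumptions for illustration, not EV-IMO's actual file format.

```python
import numpy as np

# Hypothetical container for one EV-IMO-style sample; the real dataset's
# file layout and field names may differ.
def make_dummy_sample(height=260, width=346, n_events=1000):
    events = np.zeros((n_events, 4))                 # columns: t [s], x, y, polarity
    depth_gt = np.ones((height, width))              # depth from the scanned room/objects
    motion_mask_gt = np.zeros((height, width), int)  # 0 = static, k = moving object id
    egomotion_gt = np.zeros(6)                       # camera translation + rotation (VICON)
    return events, depth_gt, motion_mask_gt, egomotion_gt

def moving_pixel_iou(pred_mask, gt_mask):
    """Intersection-over-union of 'moving' pixels, a common score for
    pixel-wise motion segmentation."""
    pred = pred_mask > 0
    gt = gt_mask > 0
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```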

    BIO-INSPIRED MOTION PERCEPTION: FROM GANGLION CELLS TO AUTONOMOUS VEHICLES

    Animals are remarkable at navigation, even in extreme situations. Through motion perception, animals compute their own movements (egomotion) and find other objects (prey, predators, obstacles) and their motions in the environment. Analogous to animals, artificial systems such as robots also need to know where they are relative to structure and segment obstacles to avoid collisions. Even though substantial progress has been made in the development of artificial visual systems, they still struggle to achieve robust and generalizable solutions. To this end, I propose a bio-inspired framework that narrows the gap between natural and artificial systems. The standard approaches in robot motion perception seek to reconstruct a three-dimensional model of the scene and then use this model to estimate egomotion and object segmentation. However, the scene reconstruction process is data-heavy and computationally expensive and fails to deal with high-speed and dynamic scenarios. In contrast, biological visual systems excel in these difficult situations by extracting only the minimal information sufficient for motion perception tasks. I derive minimalist/purposive ideas from biological processes throughout this thesis and develop mathematical solutions for robot motion perception problems. In this thesis, I develop a full range of solutions that utilize bio-inspired motion representation and learning approaches for motion perception tasks. In particular, I focus on egomotion estimation and motion segmentation, and make four main contributions: 1. I introduce NFlowNet, a neural network that estimates normal flow (bio-inspired motion filters); normal flow estimation presents a new avenue for solving egomotion in a robust and qualitative framework. 2. Utilizing normal flow, I propose the DiffPoseNet framework to estimate egomotion by formulating the qualitative constraint as a differentiable optimization layer, which allows for end-to-end learning. 3. Further, utilizing a neuromorphic event camera, a retina-inspired vision sensor, I develop 0-MMS, a model-based optimization approach that employs event spikes to segment the scene into multiple moving parts in high-speed scenarios with dynamic lighting. 4. To improve the precision of event-based motion perception across time, I develop SpikeMS, a novel bio-inspired learning approach that fully capitalizes on the rich temporal information in event spikes.
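    For context, normal flow is the component of optical flow along the image brightness gradient. A minimal textbook-style sketch of gradient-based normal flow (not NFlowNet itself, which learns this quantity from data) might look like:

```python
import numpy as np

def normal_flow(frame_prev, frame_next, eps=1e-6):
    """Classical normal-flow estimate from two grayscale frames: the flow
    component along the brightness gradient, u_n = -I_t * grad(I) / |grad(I)|^2.
    This is only the textbook gradient-based approximation."""
    I_t = frame_next.astype(np.float64) - frame_prev.astype(np.float64)
    I_y, I_x = np.gradient(frame_next.astype(np.float64))   # spatial derivatives
    mag2 = I_x**2 + I_y**2 + eps                             # squared gradient magnitude
    un_x = -I_t * I_x / mag2
    un_y = -I_t * I_y / mag2
    return un_x, un_y
```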

    Object Detection with Deep Learning to Accelerate Pose Estimation for Automated Aerial Refueling

    Remotely piloted aircraft (RPAs) cannot currently refuel during flight because the latency between the pilot and the aircraft is too great to safely perform aerial refueling maneuvers. However, an automated aerial refueling (AAR) system removes this limitation by allowing the tanker to directly control the RPA. Quickly finding the relative position and orientation (pose) of the approaching aircraft from the tanker is the first step in creating an AAR system. Previous work at AFIT demonstrates that stereo camera systems provide robust pose estimation capability. This thesis first extends that work by examining the effects of the cameras' resolution on the quality of pose estimation. Next, it demonstrates a deep learning approach to accelerate the pose estimation process. The results show that this pose estimation process is precise and fast enough to safely perform AAR.
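    A hedged sketch of the general idea of using a detector to speed up pose estimation: crop the detected region, find 2D-3D correspondences there, and solve for the relative pose. It uses monocular PnP as a simplified stand-in for the stereo pipeline described above; match_keypoints and the argument names are hypothetical placeholders, not the thesis' actual implementation.

```python
import numpy as np
import cv2

def pose_from_detection(image, bbox, object_points, camera_matrix, match_keypoints):
    """Crop the detector's bounding box, find 2D image points of known 3D
    features inside it, and solve a PnP problem for the relative pose.
    `match_keypoints` is a hypothetical routine returning Nx2 pixel
    coordinates (in ROI coordinates) matched to `object_points` (Nx3)."""
    x, y, w, h = bbox
    roi = image[y:y + h, x:x + w]                    # restrict the search to the detection
    image_points = match_keypoints(roi) + np.array([x, y], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(object_points.astype(np.float64),
                                  image_points.astype(np.float64),
                                  camera_matrix.astype(np.float64),
                                  distCoeffs=None)
    return (rvec, tvec) if ok else None
```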

    Learned Inertial Odometry for Autonomous Drone Racing

    Inertial odometry is an attractive solution to the problem of state estimation for agile quadrotor flight. It is inexpensive, lightweight, and it is not affected by perceptual degradation. However, only relying on the integration of the inertial measurements for state estimation is infeasible. The errors and time-varying biases present in such measurements cause the accumulation of large drift in the pose estimates. Recently, inertial odometry has made significant progress in estimating the motion of pedestrians. State-of-the-art algorithms rely on learning a motion prior that is typical of humans but cannot be transferred to drones. In this work, we propose a learning-based odometry algorithm that uses an inertial measurement unit (IMU) as the only sensor modality for autonomous drone racing tasks. The core idea of our system is to couple a model-based filter, driven by the inertial measurements, with a learning-based module that has access to the control commands. We show that our inertial odometry algorithm is superior to the state-of-the-art filter-based and optimization-based visual-inertial odometry as well as the state-of-the-art learned inertial odometry. Additionally, we show that our system is comparable to a visual-inertial odometry solution that uses a camera and exploits the known gate location and appearance. We believe that the application in autonomous drone racing paves the way for novel research in inertial odometry for agile quadrotor flight. We will release the code upon acceptance.
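    The model-based part of such a filter propagates the state by integrating bias-corrected IMU measurements. A minimal sketch of one propagation step, under standard strapdown-kinematics assumptions (the paper's learned module that consumes control commands is not modeled here), could be:

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])   # world-frame gravity, z up

def propagate_imu(p, v, R, accel, gyro, accel_bias, gyro_bias, dt):
    """One Euler-integration step of an inertial odometry filter's prediction:
    rotate bias-corrected acceleration into the world frame, add gravity,
    and integrate position, velocity, and orientation."""
    omega = gyro - gyro_bias
    a_world = R @ (accel - accel_bias) + GRAVITY
    p_new = p + v * dt + 0.5 * a_world * dt**2
    v_new = v + a_world * dt
    # First-order rotation update via the skew-symmetric matrix of omega.
    skew = np.array([[0.0, -omega[2], omega[1]],
                     [omega[2], 0.0, -omega[0]],
                     [-omega[1], omega[0], 0.0]])
    R_new = R @ (np.eye(3) + skew * dt)
    return p_new, v_new, R_new
```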

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world
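    Concretely, an event stream is a list of tuples (timestamp, x, y, polarity), and one of the simplest ways to feed it to frame-based algorithms is to accumulate events into an image. A minimal sketch, with an assumed array layout:

```python
import numpy as np

def events_to_count_image(events, height, width):
    """Accumulate a stream of events (columns: t, x, y, polarity in {-1, +1})
    into a signed per-pixel count image, one of the simplest frame-like
    event representations."""
    img = np.zeros((height, width), dtype=np.float32)
    xs = events[:, 1].astype(int)
    ys = events[:, 2].astype(int)
    ps = events[:, 3]
    np.add.at(img, (ys, xs), ps)   # unbuffered accumulation handles repeated pixels
    return img
```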

    LEARNING OF DENSE OPTICAL FLOW, MOTION AND DEPTH, FROM SPARSE EVENT CAMERAS

    With recent advances in the field of autonomous driving, autonomous agents need to safely navigate around humans or other moving objects in unconstrained, highly dynamic environments. In this thesis, we demonstrate the feasibility of reconstructing dense depth, optical flow and motion information from a neuromorphic imaging device, called Dynamic Vision Sensor (DVS). The DVS only records sparse and asynchronous events when lighting changes occur at camera pixels. Our work is the first monocular pipeline that generates dense depth and optical flow from sparse event data only. To tackle this problem of reconstructing dense information from sparse information, we introduce the Evenly-Cascaded convolutional Network (ECN), a bio-inspired multi-level, multi-resolution neural network architecture. The network features an evenly-shaped design and utilizes both high- and low-level features. With just 150k parameters, our self-supervised pipeline is able to surpass pipelines that are 100x larger. We evaluate our pipeline on the MVSEC self-driving dataset and present results for depth, optical flow and egomotion estimation in wild outdoor scenes. Due to the lightweight design, the inference part of the network runs at 250 FPS on a single GPU, making the pipeline ready for real-time robotics applications. Our experiments demonstrate significant improvements upon previous works that used deep learning on event data, as well as the ability of our pipeline to perform well during both day and night. We also extend our pipeline to dynamic indoor scenes with independent moving objects. In addition to camera egomotion and a dense depth map, the network utilizes a mixture model to segment and compute per-object 3D translational velocities for moving objects. For this indoor task we are able to train a shallow network with just 40k parameters, which computes qualitative depth and egomotion. Our analysis of the training shows that modern neural networks are trained on tangled signals. This tangling effect can be imagined as a blurring introduced both by nature and by the training process. We propose to untangle the data with network deconvolution. We notice significantly better convergence without using any standard normalization techniques, which suggests that network deconvolution is what we need.
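    The "untangling" idea mentioned at the end can be illustrated with a toy channel-whitening step: decorrelate feature channels by the inverse square root of their covariance before the next layer. This is a simplified stand-in for the thesis' network deconvolution (which operates on im2col patches), not the actual implementation:

```python
import numpy as np

def channel_whitening(x, eps=1e-5):
    """Whiten feature channels by the inverse square root of their covariance,
    a simplified illustration of the 'untangling' (deconvolution) idea.
    x: array of shape (batch, channels, height, width)."""
    b, c, h, w = x.shape
    flat = x.transpose(1, 0, 2, 3).reshape(c, -1)           # channels x samples
    flat = flat - flat.mean(axis=1, keepdims=True)
    cov = flat @ flat.T / flat.shape[1] + eps * np.eye(c)
    # Inverse square root of the symmetric covariance via eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(vals**-0.5) @ vecs.T
    white = inv_sqrt @ flat
    return white.reshape(c, b, h, w).transpose(1, 0, 2, 3)
```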

    Motion Segmentation and Egomotion Estimation with Event-Based Cameras

    Get PDF
    Computer vision has been dominated by classical, CMOS frame-based imaging sensors for many years. Yet motion is not well represented in classical cameras and vision techniques - a consequence of traditional vision being frame-based and only existing 'in the moment', while motion is a continuous entity. With the introduction of neuromorphic hardware such as event-based cameras, we are ready to move beyond frame-based vision and develop a new concept - motion-based vision. The event-based sensor provides dense temporal information about changes in the scene - it can ‘see’ motion at the equivalent of an almost infinite frame rate - making it a perfect fit for creating dense, long-term motion trajectories and allowing for significantly more efficient, generic and at the same time accurate motion perception. By design, an event-based sensor accommodates a large dynamic range and provides high temporal resolution and low latency -- ideal properties for applications where high-quality motion estimation and tolerance towards challenging lighting conditions are desirable. The price for these properties is heavy: event-based sensors produce a lot of noise, their resolution is relatively low, and their data - typically referred to as an event cloud - is asynchronous and sparse. Event sensors thus offer new opportunities for the robust visual perception needed in autonomous robotics, but the challenges associated with their output call for different visual processing approaches. In this dissertation we develop methods and frameworks for motion segmentation and egomotion estimation on event-based data, starting with a simple optimization-based approach for camera motion compensation and object tracking and then developing several deep learning pipelines, while exploring the connection between the shapes of event clouds and scene motion. We collect EV-IMO - the first pixel-wise annotated motion segmentation dataset for event cameras - and propose a 3D graph-based learning approach for motion segmentation in the (x, y, t) domain. Finally, we develop a set of mathematical constraints for event streams which leverage their temporal density and connect the shape of the event cloud with camera and object motion.
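    The camera motion compensation step in such optimization-based pipelines typically warps events by a candidate motion and scores the sharpness of the resulting event image; the sketch below follows that general contrast-maximization idea, though the exact objective used in the dissertation may differ.

```python
import numpy as np

def motion_compensated_contrast(events, velocity, height, width):
    """Score a candidate image-plane velocity (vx, vy) by warping all events
    (columns: t, x, y) to a common reference time and measuring the variance
    of the resulting event image: sharper images indicate better compensation."""
    t, x, y = events[:, 0], events[:, 1], events[:, 2]
    t_ref = t[0]
    x_w = x - velocity[0] * (t - t_ref)
    y_w = y - velocity[1] * (t - t_ref)
    xs = np.clip(np.round(x_w).astype(int), 0, width - 1)
    ys = np.clip(np.round(y_w).astype(int), 0, height - 1)
    img = np.zeros((height, width), dtype=np.float64)
    np.add.at(img, (ys, xs), 1.0)
    return img.var()

# A candidate velocity can then be found by maximizing this score, e.g. with
# a coarse grid search or a generic optimizer on the negated score.
```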