529 research outputs found

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
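    As a rough illustration of the kind of data such a sensor produces, the sketch below (not from the survey) accumulates a stream of (timestamp, x, y, polarity) events into a signed frame over a short time window; the array layout, sensor resolution, and synthetic event stream are assumptions made purely for the example.

```python
import numpy as np

def accumulate_events(events, height, width, t_start, t_end):
    """Accumulate signed events into a 2D frame over a time window.

    `events` is an (N, 4) array of (t, x, y, polarity) rows, with
    polarity in {-1, +1}; this layout is an assumption for illustration.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    # Select events inside the time window [t_start, t_end).
    mask = (events[:, 0] >= t_start) & (events[:, 0] < t_end)
    for t, x, y, p in events[mask]:
        frame[int(y), int(x)] += p  # signed accumulation of brightness changes
    return frame

# Example: 1000 synthetic events over a 10 ms window on a 180x240 sensor.
rng = np.random.default_rng(0)
events = np.column_stack([
    np.sort(rng.uniform(0.0, 0.01, 1000)),  # timestamps (s)
    rng.integers(0, 240, 1000),             # x coordinates
    rng.integers(0, 180, 1000),             # y coordinates
    rng.choice([-1, 1], 1000),              # polarity
])
frame = accumulate_events(events, 180, 240, 0.0, 0.01)
```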

    Non-Parametric Learning for Monocular Visual Odometry

    This thesis addresses the problem of incremental localization from visual information, a scenario commonly known as visual odometry. Current visual odometry algorithms are heavily dependent on camera calibration, using a pre-established geometric model to provide the transformation between input (optical flow estimates) and output (vehicle motion estimates) information. A novel approach to visual odometry is proposed in this thesis where the need for camera calibration, or even for a geometric model, is circumvented by the use of machine learning principles and techniques. A non-parametric Bayesian regression technique, the Gaussian Process (GP), is used to select the most probable transformation function hypothesis from input to output, based on training data collected prior to and during navigation. In addition to eliminating the need for a geometric model and traditional camera calibration, this approach also allows for scale recovery even in a monocular configuration, and provides a natural treatment of uncertainties due to the probabilistic nature of GPs. Several extensions to the traditional GP framework are introduced and discussed in depth, and they constitute the core of the contributions of this thesis to the machine learning and robotics community. The proposed framework is tested in a wide variety of scenarios, ranging from urban and off-road ground vehicles to unconstrained 3D unmanned aircraft. The results show a significant improvement over traditional visual odometry algorithms, and also surpass results obtained using other sensors, such as laser scanners and IMUs. The incorporation of these results into a SLAM scenario, using an Exact Sparse Information Filter (ESIF), is shown to decrease global uncertainty by exploiting revisited areas of the environment. Finally, a technique for the automatic segmentation of dynamic objects is presented, as a way to increase the robustness of image information and further improve visual odometry results.
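    To make the central idea concrete, here is a minimal sketch, assuming synthetic data and scikit-learn's GaussianProcessRegressor, of regressing ego-motion directly from optical-flow summary features: no camera calibration or geometric model enters the mapping, and the GP also returns a predictive uncertainty. The feature construction and data are placeholders and do not reproduce the thesis's pipeline.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy training data: X holds per-frame optical-flow summary features
# (e.g. mean horizontal/vertical flow), y holds the corresponding vehicle
# motion (e.g. forward translation and yaw rate). Both are synthetic here.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 2))
y_train = np.column_stack([
    0.8 * X_train[:, 0] + 0.1 * rng.normal(size=200),  # translation
    0.5 * X_train[:, 1] + 0.1 * rng.normal(size=200),  # yaw rate
])

# RBF kernel plus a noise term; the GP learns the flow-to-motion mapping
# directly from data, with no camera calibration or geometric model.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, y_train)

# Predict motion for a new frame, with an uncertainty estimate.
X_new = rng.normal(size=(1, 2))
motion_mean, motion_std = gp.predict(X_new, return_std=True)
```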

    Fully Onboard AI-Powered Human-Drone Pose Estimation on Ultralow-Power Autonomous Flying Nano-UAVs

    Many emerging applications of nano-sized unmanned aerial vehicles (UAVs), with a few cm² form factor, revolve around safely interacting with humans in complex scenarios, for example, monitoring their activities or looking after people needing care. Such sophisticated autonomous functionality must be achieved while dealing with severe constraints in payload, battery, and power budget (~100 mW). In this work, we attack a complex task going from perception to control: to estimate and maintain the nano-UAV's relative 3-D pose with respect to a person while they freely move in the environment, a task that, to the best of our knowledge, has never previously been targeted with fully onboard computation on a nano-sized UAV. Our approach is centered around a novel vision-based deep neural network (DNN), called PULP-Frontnet, designed for deployment on top of a parallel ultra-low-power (PULP) processor aboard a nano-UAV. We present a vertically integrated approach starting from the DNN model design, training, and dataset augmentation down to 8-bit quantization and deployment in-field. PULP-Frontnet can operate in real-time (up to 135 frame/s), consuming less than 87 mW for processing at peak throughput and down to 0.43 mJ/frame in the most energy-efficient operating point. Field experiments demonstrate a closed-loop, top-notch autonomous navigation capability with a tiny 27-g Crazyflie 2.1 nano-UAV. Compared against an ideal sensing setup, onboard pose inference yields excellent drone behavior in terms of median absolute errors, such as positional (onboard: 41 cm, ideal: 26 cm) and angular (onboard: 3.7 degrees, ideal: 4.1 degrees). We publicly release videos and the source code of our work.
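    The sketch below illustrates one piece of such a deployment flow, symmetric post-training 8-bit quantization of a weight tensor, in plain NumPy; the per-tensor scheme, toy kernel bank, and helper names are assumptions for illustration and are not the actual PULP-Frontnet toolchain.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor 8-bit quantization: w is approximated by scale * q, q in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 for error checking."""
    return q.astype(np.float32) * scale

# Example: quantize a toy bank of 3x3 convolution kernels and check the error.
rng = np.random.default_rng(2)
weights = rng.normal(scale=0.05, size=(16, 3, 3, 3)).astype(np.float32)
q, scale = quantize_int8(weights)
max_error = np.max(np.abs(weights - dequantize(q, scale)))
```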

    Online Structured Learning for Real-Time Computer Vision Gaming Applications

    In recent years computer vision has played an increasingly important role in the development of computer games, and it now features as one of the core technologies for many gaming platforms. The work in this thesis addresses three problems in real-time computer vision, all of which are motivated by their potential application to computer games. We first present an approach for real-time 2D tracking of arbitrary objects. In common with recent research in this area we incorporate online learning to provide an appearance model which is able to adapt to the target object and its surrounding background during tracking. However, our approach moves beyond the standard framework of tracking using binary classification and instead integrates tracking and learning in a more principled way through the use of structured learning. As well as providing a more powerful framework for adaptive visual object tracking, our approach also outperforms state-of-the-art tracking algorithms on standard datasets. Next we consider the task of keypoint-based object tracking. We take the traditional pipeline of matching keypoints followed by geometric verification and show how this can be embedded into a structured learning framework in order to provide principled adaptivity to a given environment. We also propose an approximation method allowing us to take advantage of recently developed binary image descriptors, meaning our approach is suitable for real-time application even on low-powered portable devices. Experimentally, we clearly see the benefit that online adaptation using structured learning can bring to this problem. Finally, we present an approach for approximately recovering the dense 3D structure of a scene which has been mapped by a simultaneous localisation and mapping system. Our approach is guided by the constraints of the low-powered portable hardware we are targeting, and we develop a system which coarsely models the scene using a small number of planes. To achieve this, we frame the task as a structured prediction problem and introduce online learning into our approach to provide adaptivity to a given scene. This allows us to use relatively simple multi-view information coupled with online learning of appearance to efficiently produce coarse reconstructions of a scene.
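    For reference, here is a minimal OpenCV sketch of the traditional keypoint pipeline that the second contribution builds on (binary ORB descriptors, Hamming-distance matching, RANSAC-based geometric verification); it is the non-adaptive baseline, not the structured-learning formulation proposed in the thesis, and the parameter values are illustrative.

```python
import cv2
import numpy as np

def match_and_verify(img_ref, img_cur):
    """Baseline pipeline: binary keypoint matching followed by geometric verification."""
    orb = cv2.ORB_create(nfeatures=500)
    kp1, des1 = orb.detectAndCompute(img_ref, None)
    kp2, des2 = orb.detectAndCompute(img_cur, None)
    if des1 is None or des2 is None:
        return None, None

    # Hamming distance is the natural metric for binary (ORB) descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 4:
        return None, None

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Geometric verification: RANSAC rejects matches inconsistent with a homography.
    H, inliers = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
    return H, inliers

# Example on two synthetic grayscale images (a random texture and a shifted copy).
rng = np.random.default_rng(3)
img1 = (rng.random((240, 320)) * 255).astype(np.uint8)
img2 = np.roll(img1, 5, axis=1)
H, inliers = match_and_verify(img1, img2)
```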

    Combined Learned and Classical Methods for Real-Time Visual Perception in Autonomous Driving

    Autonomy, robotics, and Artificial Intelligence (AI) are among the main defining themes of next-generation societies. Among the most important applications of these technologies is driving automation, which spans from different Advanced Driver Assistance Systems (ADAS) to full self-driving vehicles. Driving automation promises to reduce accidents, increase safety, and increase access to mobility for more people, such as the elderly and the handicapped. However, one of the main challenges facing autonomous vehicles is robust perception, which can enable safe interaction and decision making. Of the many sensors used to perceive the environment, each with its own capabilities and limitations, vision is by far one of the main sensing modalities: cameras are cheap and can provide rich information of the observed scene. Therefore, this dissertation develops a set of visual perception algorithms with a focus on autonomous driving as the target application area. The dissertation starts by addressing the problem of real-time motion estimation of an agent using only the visual input from a camera attached to it, a problem known as visual odometry. The visual odometry algorithm can achieve low drift rates over long traveled distances, made possible through the innovative local mapping approach used. This visual odometry algorithm was then combined with my multi-object detection and tracking system. The tracking system operates in a tracking-by-detection paradigm where an object detector based on convolutional neural networks (CNNs) is used. Therefore, the combined system can detect and track other traffic participants both in the image domain and in the 3D world frame while simultaneously estimating vehicle motion, a necessary requirement for obstacle avoidance and safe navigation. Finally, the operational range of traditional monocular cameras was expanded with the capability to infer depth and thus replace stereo and RGB-D cameras. This is accomplished through a single-stream convolutional neural network which can output both depth prediction and semantic segmentation. Semantic segmentation is the process of classifying each pixel in an image and is an important step toward scene understanding. A literature survey, algorithm descriptions, and comprehensive evaluations on real-world datasets are presented. (Ph.D. dissertation, College of Engineering & Computer Science, University of Michigan: https://deepblue.lib.umich.edu/bitstream/2027.42/153989/1/Mohamed Aladem Final Dissertation.pdf)
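    A minimal PyTorch sketch of the single-stream, two-head idea (one shared encoder, one per-pixel depth head, one semantic-segmentation head) follows; the layer sizes, class count, and upsampling scheme are placeholders and do not reproduce the dissertation's architecture.

```python
import torch
import torch.nn as nn

class DepthSegNet(nn.Module):
    """Toy single-stream network: one shared encoder feeding two task heads."""

    def __init__(self, num_classes=19):
        super().__init__()
        # Shared encoder (illustrative; a real model would use a deeper backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Depth head: one channel per pixel.
        self.depth_head = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),
        )
        # Segmentation head: one logit per class per pixel.
        self.seg_head = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, num_classes, 1),
        )
        # Restore the input resolution lost in the two stride-2 convolutions.
        self.upsample = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)

    def forward(self, x):
        feats = self.encoder(x)
        depth = self.upsample(self.depth_head(feats))
        seg_logits = self.upsample(self.seg_head(feats))
        return depth, seg_logits

# Example forward pass on a dummy 256x512 RGB image.
model = DepthSegNet()
depth, seg = model(torch.randn(1, 3, 256, 512))
```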