
    Deep Forward and Inverse Perceptual Models for Tracking and Prediction

    We consider the problems of learning forward models that map state to high-dimensional images and inverse models that map high-dimensional images to state in robotics. Specifically, we present a perceptual model for generating video frames from state with deep networks, and provide a framework for its use in tracking and prediction tasks. We show that our proposed model greatly outperforms standard deconvolutional methods and GANs for image generation, producing clear, photo-realistic images. We also develop a convolutional neural network model for state estimation and compare its estimated robot trajectories to those of an Extended Kalman Filter. We validate all models on a real robotic system.
    Comment: 8 pages, International Conference on Robotics and Automation (ICRA) 201
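
    As a rough illustration of the forward model described above, here is a minimal PyTorch-style sketch that maps a low-dimensional robot state to an image with a stack of transposed convolutions. The 7-dimensional state, layer sizes, and 64x64 output are illustrative assumptions, not the paper's actual architecture or training setup.

```python
# Minimal sketch of a forward perceptual model: a deconvolutional
# generator from a low-dimensional robot state (e.g. joint angles)
# to an RGB frame. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ForwardPerceptualModel(nn.Module):
    def __init__(self, state_dim: int = 7):
        super().__init__()
        # Lift the state vector to a small spatial feature map.
        self.fc = nn.Linear(state_dim, 256 * 4 * 4)
        # Upsample 4x4 -> 64x64 with transposed convolutions.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = self.fc(state).view(-1, 256, 4, 4)
        return self.deconv(x)

# Render predicted frames for a batch of two states: (2, 3, 64, 64).
frames = ForwardPerceptualModel()(torch.randn(2, 7))
```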

    Structure and motion from scene registration

    We propose a method for estimating the 3D structure and the dense 3D motion (scene flow) of a dynamic nonrigid 3D scene using a camera array. The core idea is to use a dense multi-camera array to construct a novel, dense 3D volumetric representation of the space in which each voxel holds an estimated intensity value and a confidence measure for that value. The problem of estimating the 3D structure and 3D motion of a scene is thus reduced to a nonrigid registration of two volumes - hence the term "Scene Registration". Registering two dense 3D scalar volumes does not require recovering the 3D structure of the scene as a preprocessing step, nor does it require explicit reasoning about occlusions. From this nonrigid registration we accurately extract the 3D scene flow and the 3D structure of the scene, and successfully recover sharp discontinuities in both time and space. We demonstrate the advantages of our method on a number of challenging synthetic and real data sets.
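
    The voxel representation lends itself to a compact sketch. Below, each voxel's intensity is the mean of the per-camera samples and its confidence is their inverse variance, followed by a confidence-weighted data term for registering two such volumes; both choices are illustrative assumptions, not necessarily the paper's exact formulation.

```python
# Sketch of the volumetric representation: each voxel stores an
# estimated intensity and a confidence. Inverse sample variance is
# an assumed confidence measure, chosen here for illustration.
import numpy as np

def build_volume(samples: np.ndarray, eps: float = 1e-6):
    """samples: (num_cameras, X, Y, Z) intensities sampled at every
    voxel from each camera. Returns (intensity, confidence) volumes."""
    intensity = samples.mean(axis=0)
    confidence = 1.0 / (samples.var(axis=0) + eps)
    return intensity, confidence

def data_term(vol_t, vol_t1, conf_t, flow):
    """Confidence-weighted SSD between the volume at time t and the
    volume at t+1 warped by a candidate 3D displacement field `flow`
    of shape (X, Y, Z, 3). Nearest-neighbour warping keeps it short."""
    X, Y, Z = vol_t.shape
    grid = np.stack(np.meshgrid(np.arange(X), np.arange(Y),
                                np.arange(Z), indexing="ij"), axis=-1)
    tgt = np.clip((grid + flow).round().astype(int),
                  0, np.array([X - 1, Y - 1, Z - 1]))
    warped = vol_t1[tgt[..., 0], tgt[..., 1], tgt[..., 2]]
    return np.sum(conf_t * (vol_t - warped) ** 2)
```

    Minimizing this data term (plus a smoothness prior) over the displacement field is the nonrigid registration step; the recovered field is the 3D scene flow.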

    YOIO: You Only Iterate Once by mining and fusing multiple necessary global information in the optical flow estimation

    Occlusions pose a significant challenge to optical flow algorithms, even those that rely on global evidence. We consider an occluded point to be one that is imaged in the reference frame but not in the next. Estimating the motion of these points is extremely difficult, particularly in the two-frame setting. Previous work used only the current frame as input, which cannot guarantee correct global reference information for occluded points and suffers from long computation times and poor accuracy when predicting optical flow at occluded points. To achieve both high accuracy and efficiency, we fully mine and exploit the spatiotemporal information provided by the frame pair, design a loopback judgment algorithm to ensure that correct global reference information is obtained, mine multiple kinds of necessary global information, and design an efficient refinement module that fuses this global information. Specifically, we propose the YOIO framework, which consists of three main components: an initial flow estimator, a multiple-global-information extraction module, and a unified refinement module. We demonstrate that optical flow estimates in occluded regions can be significantly improved in only one iteration without degrading performance in non-occluded regions. Compared with GMA, our method improves optical flow prediction accuracy in occluded areas by more than 10% (by more than 15% in occ_out areas) while reducing computation time by 27%. Running at up to 18.9 fps at 436x1024 image resolution, this approach obtains new state-of-the-art results on the challenging Sintel dataset among all published and unpublished approaches that can run in real time, suggesting a new paradigm for accurate and efficient optical flow estimation.
    Comment: arXiv admin note: text overlap with arXiv:2104.02409 by other authors
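
    The three-component pipeline can be summarized as a skeleton. The module names and interfaces below (init_estimator, global_extractor, refiner) are assumptions made for illustration; the internals of each stage are what the paper actually contributes.

```python
# Skeleton of the three-stage pipeline described in the abstract.
# Only the data flow is fixed here; each stage is a black-box module.
import torch
import torch.nn as nn

class YOIO(nn.Module):
    def __init__(self, init_estimator: nn.Module,
                 global_extractor: nn.Module, refiner: nn.Module):
        super().__init__()
        self.init_estimator = init_estimator      # initial flow estimator
        self.global_extractor = global_extractor  # mines global reference cues
        self.refiner = refiner                    # fuses cues in a single pass

    def forward(self, frame1: torch.Tensor, frame2: torch.Tensor):
        # 1. Initial flow estimate from the frame pair.
        flow0 = self.init_estimator(frame1, frame2)
        # 2. Mine multiple kinds of global information, including
        #    reference cues for occluded points (loopback check).
        cues = self.global_extractor(frame1, frame2, flow0)
        # 3. One unified refinement pass: "you only iterate once".
        return self.refiner(flow0, cues)
```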

    Cycling near misses: A review of the current methods, challenges and the potential of an AI-embedded system

    Whether for commuting or leisure, cycling is a growing transport mode in many countries. However, cycling is still perceived by many as a dangerous activity. Because the mode share of cycling tends to be low, serious incidents related to cycling are rare. Nevertheless, the fear of getting hit or falling while cycling hinders its expansion as a transport mode, and it has been shown that focusing on killed and seriously injured casualties alone only touches the tip of the iceberg. Compared with reported incidents, there are many more incidents in which the person on the bike was destabilised or needed to take action to avoid a crash: so-called near misses. Because of their frequency, data related to near misses can provide much more information about the risk factors associated with cycling. The quality and coverage of this information depend on the method of data collection (from surveys to video) and of processing (from manual to automated). There remains a gap in our understanding of how best to identify and predict near misses and draw statistically significant conclusions, which may lead to better intervention measures and the creation of a safer environment for people on bikes. In this paper, we review the literature on cycling near misses, focusing on the data collection methods adopted, the scope, and the risk factors identified. In doing so, we demonstrate that, while many near misses result from a combination of factors that may or may not be transport-related, the current approach of tackling these factors separately may not be adequate for understanding the interconnections between all risk factors. To address this limitation, we highlight the potential of extracting data from a unified input (images/videos), relying on computer vision methods to automatically extract the wide spectrum of near-miss risk factors, in addition to detecting the types of events associated with near misses.
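
    As a hedged sketch of the kind of unified, vision-based pipeline the review points toward, the snippet below uses an off-the-shelf detector (torchvision's Faster R-CNN) to flag frames where a vehicle's bounding box comes within a pixel threshold of a cyclist's. The proximity heuristic, threshold, and score cutoff are illustrative assumptions; a deployable near-miss system would need calibrated distance estimation and proper event classification.

```python
# Flag candidate near-miss frames with a pretrained detector.
# The pixel-gap heuristic is an assumption for illustration only.
import torch
import torchvision

COCO_BICYCLE = 2
COCO_VEHICLES = {3, 4, 6, 8}  # car, motorcycle, bus, truck

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def near_miss_candidates(frame: torch.Tensor, gap_px: float = 30.0):
    """frame: (3, H, W) float image in [0, 1]. Returns vehicle boxes
    whose horizontal gap to any detected bicycle is below gap_px."""
    with torch.no_grad():
        det = model([frame])[0]
    keep = det["scores"] > 0.5
    boxes, labels = det["boxes"][keep], det["labels"][keep]
    bikes = boxes[labels == COCO_BICYCLE]
    flagged = []
    for box, lab in zip(boxes, labels):
        if lab.item() not in COCO_VEHICLES:
            continue
        for bike in bikes:
            # Horizontal gap between boxes [x1, y1, x2, y2]; 0 if overlapping.
            gap = max((box[0] - bike[2]).item(), (bike[0] - box[2]).item(), 0.0)
            if gap < gap_px:
                flagged.append(box)
                break
    return flagged
```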