Deep Forward and Inverse Perceptual Models for Tracking and Prediction
We consider the problems of learning forward models that map state to
high-dimensional images and inverse models that map high-dimensional images to
state in robotics. Specifically, we present a perceptual model for generating
video frames from state with deep networks, and provide a framework for its use
in tracking and prediction tasks. We show that our proposed model greatly
outperforms standard deconvolutional methods and GANs for image generation,
producing clear, photo-realistic images. We also develop a convolutional neural
network model for state estimation and compare the result to an Extended Kalman
Filter to estimate robot trajectories. We validate all models on a real robotic
system.

Comment: 8 pages, International Conference on Robotics and Automation (ICRA) 201
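The abstract compares a learned state estimator against an Extended Kalman Filter but gives no implementation details; as background, the generic EKF measurement update that such a tracker iterates once per frame can be sketched as follows (all symbols are generic textbook notation, not taken from the paper):

```python
import numpy as np

def ekf_update(x, P, z, h, H_jac, R):
    """One EKF measurement update.
    x: state estimate, P: state covariance, z: measurement,
    h: measurement function, H_jac: Jacobian of h at x, R: measurement noise."""
    H = H_jac(x)
    y = z - h(x)                      # innovation (measurement residual)
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

With a linear identity measurement model, equal prior and measurement covariances give a gain of 0.5, i.e. the update moves the state halfway toward the measurement.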
Structure and motion from scene registration
We propose a method for estimating the 3D structure and the dense 3D motion (scene flow) of a dynamic nonrigid 3D scene, using a camera array. The core idea is to use a dense multi-camera array to construct a novel, dense 3D volumetric representation of the 3D space where each voxel holds an estimated intensity value and a confidence measure of this value. The problem of 3D structure and 3D motion estimation of a scene is thus reduced to a nonrigid registration of two volumes - hence the term "Scene Registration". Registering two dense 3D scalar volumes does not require recovering the 3D structure of the scene as a preprocessing step, nor does it require explicit reasoning about occlusions. From this nonrigid registration we accurately extract the 3D scene flow and the 3D structure of the scene, and successfully recover the sharp discontinuities in both time and space. We demonstrate the advantages of our method on a number of challenging synthetic and real data sets.
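The volumetric representation described above (per-voxel intensity plus a confidence measure) can be illustrated with a minimal sketch. The array layout and the variance-based confidence below are assumptions for illustration only, not the paper's actual construction:

```python
import numpy as np

def build_volume(samples):
    """samples: array of shape (num_cameras, D, H, W) giving each camera's
    intensity hypothesis for every voxel (hypothetical layout).
    Returns a per-voxel intensity estimate and a confidence measure:
    cameras that agree on a voxel's intensity (low variance) suggest a
    photo-consistent voxel, so confidence is high there."""
    intensity = samples.mean(axis=0)
    confidence = 1.0 / (1.0 + samples.var(axis=0))  # in (0, 1]
    return intensity, confidence
```

Two such (intensity, confidence) volumes, built at consecutive time steps, are what a nonrigid volume registration would then align.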
YOIO: You Only Iterate Once by mining and fusing multiple necessary global information in the optical flow estimation
Occlusions pose a significant challenge to optical flow algorithms, even those that rely on global evidence. We consider an occluded point to be one that is imaged in the reference frame but not in the next. Estimating the motion of these points is extremely difficult, particularly in the two-frame setting. Previous work used only the current frame as input, which cannot guarantee correct global reference information for occluded points, and suffered from long computation times and poor accuracy when predicting optical flow at occluded points. To achieve both high accuracy and efficiency, we fully mine and exploit the spatiotemporal information provided by the frame pair: we design a loopback judgment algorithm to ensure that correct global reference information is obtained, mine multiple necessary pieces of global information, and design an efficient refinement module that fuses them. Specifically, we propose the YOIO framework, which consists of three main components: an initial flow estimator, a multiple-global-information extraction module, and a unified refinement module. We demonstrate that optical flow estimates in occluded regions can be significantly improved in only one iteration without degrading performance in non-occluded regions. Compared with GMA, our method improves optical flow prediction accuracy in occluded areas by more than 10%, and by more than 15% in occ_out areas, while reducing computation time by 27%. Running at up to 18.9 fps at 436×1024 image resolution, this approach obtains new state-of-the-art results on the challenging Sintel dataset among all published and unpublished approaches that can run in real time, suggesting a new paradigm for accurate and efficient optical flow estimation.

Comment: arXiv admin note: text overlap with arXiv:2104.02409 by other authors
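The loopback judgment algorithm is only named in the abstract. A classical forward-backward consistency check plays a similar role of flagging points that lack a reliable correspondence; the sketch below is that standard technique, not YOIO's actual algorithm:

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, tol=1.0):
    """flow_fw: forward flow frame1->frame2, shape (H, W, 2) in pixels;
    flow_bw: backward flow frame2->frame1, same shape.
    A pixel is flagged as occluded when following the forward flow and
    then the backward flow does not return near its starting position."""
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # positions reached in frame 2 (nearest-neighbour lookup for simplicity)
    xt = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    roundtrip = flow_fw + flow_bw[yt, xt]   # ~0 where flows are consistent
    return np.linalg.norm(roundtrip, axis=-1) > tol
```

Pixels where the round trip fails are exactly those for which a method like YOIO must fall back on global, rather than local, evidence.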
Cycling near misses: A review of the current methods, challenges and the potential of an AI-embedded system
Whether for commuting or leisure, cycling is a growing transport mode in many countries. However, cycling is still perceived by many as a dangerous activity. Because the mode share of cycling tends to be low, serious incidents related to cycling are rare. Nevertheless, the fear of getting hit or falling while cycling hinders its expansion as a transport mode, and it has been shown that focusing on killed and seriously injured casualties alone only touches the tip of the iceberg. Compared with reported incidents, there are many more incidents in which the person on the bike was destabilised or needed to take action to avoid a crash: so-called near misses. Because of their frequency, data related to near misses can provide much more information about the risk factors associated with cycling. The quality and coverage of this information depend on the method of data collection (from surveys to video) and of processing (from manual to automated). There remains a gap in our understanding of how best to identify and predict near misses and draw statistically significant conclusions, which may lead to better intervention measures and the creation of a safer environment for people on bikes. In this paper, we review the literature on cycling near misses, focusing on the data collection methods adopted, the scope, and the risk factors identified. In doing so, we demonstrate that, while many near misses result from a combination of factors that may or may not be transport-related, the current approach of tackling these factors may not be adequate for understanding the interconnections between all risk factors. To address this limitation, we highlight the potential of extracting data from a unified input (images/videos), relying on computer vision methods to automatically extract the wide spectrum of near-miss risk factors, in addition to detecting the types of events associated with near misses.
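As a toy illustration of the computer-vision direction advocated above, one low-level building block is a proximity measure between detector outputs (e.g. a 'bicycle' and a 'car' box in the same frame). The box format and the gap-as-near-miss-proxy heuristic below are assumptions for illustration, not taken from the review:

```python
from math import hypot

def min_gap(box_a, box_b):
    """Boxes as (x1, y1, x2, y2) pixel corners. Returns the smallest
    gap between the two boxes, 0.0 if they overlap. A persistently
    small gap between a cyclist and a vehicle detection across frames
    could serve as a crude near-miss proxy (hypothetical heuristic)."""
    dx = max(box_b[0] - box_a[2], box_a[0] - box_b[2], 0)
    dy = max(box_b[1] - box_a[3], box_a[1] - box_b[3], 0)
    return float(hypot(dx, dy))
```

A real system would of course need tracking, camera calibration, and context (speed, infrastructure) before such a signal became a usable risk factor.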