End-to-end Recurrent Multi-Object Tracking and Trajectory Prediction with Relational Reasoning
The majority of contemporary object-tracking approaches do not model
interactions between objects. This contrasts with the fact that objects' paths
are not independent: a cyclist might abruptly deviate from a previously planned
trajectory in order to avoid colliding with a car. Building upon HART, a neural
class-agnostic single-object tracker, we introduce MOHART, a multi-object
tracking method capable of relational reasoning. Importantly, the entire system,
including the understanding of interactions and relations between objects, is
class-agnostic and learned simultaneously in an end-to-end fashion. We explore
a number of relational reasoning architectures and show that
permutation-invariant models outperform non-permutation-invariant alternatives.
We also find that architectures using a single permutation-invariant operation,
such as DeepSets, despite being universal function approximators in theory, are
nonetheless outperformed by a more complex architecture based on multi-headed
attention. The latter better accounts for complex physical interactions in a
challenging toy experiment. Further, we find that modelling interactions leads
to consistent performance gains in tracking as well as future trajectory
prediction on three real-world datasets (MOTChallenge, UA-DETRAC, and Stanford
Drone dataset), particularly in the presence of ego-motion, occlusions, crowded
scenes, and faulty sensor inputs.
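The contrast the abstract draws between a single permutation-invariant pooling (DeepSets-style) and multi-headed attention can be sketched in a few lines. This is a minimal NumPy illustration of the two aggregation styles, not the MOHART architecture; all dimensions and weight matrices are placeholders.

```python
import numpy as np

def deepsets_relation(embeddings, W):
    """DeepSets-style aggregation: encode each object, sum, share the result.
    A single permutation-invariant pooling over all object embeddings."""
    encoded = np.tanh(embeddings @ W)            # per-object encoder
    pooled = encoded.sum(axis=0)                 # order-independent sum
    return np.broadcast_to(pooled, embeddings.shape).copy()

def attention_relation(embeddings, Wq, Wk, Wv):
    """Scaled dot-product self-attention: each object attends to every other,
    so pairwise interactions are weighted rather than uniformly summed."""
    Q, K, V = embeddings @ Wq, embeddings @ Wk, embeddings @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
objs = rng.normal(size=(4, 8))                   # 4 tracked objects, 8-dim states
W = rng.normal(size=(8, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))

# Shuffling the objects permutes attention outputs identically (equivariance).
perm = np.array([2, 0, 3, 1])
out = attention_relation(objs, Wq, Wk, Wv)
out_perm = attention_relation(objs[perm], Wq, Wk, Wv)
```

Attention keeps a distinct, interaction-weighted context per object, which is one intuition for why it can model physical interactions that a single sum struggles to separate.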
PnPNet: End-to-End Perception and Prediction with Tracking in the Loop
We tackle the problem of joint perception and motion forecasting in the
context of self-driving vehicles. Towards this goal we propose PnPNet, an
end-to-end model that takes as input sequential sensor data, and outputs at
each time step object tracks and their future trajectories. The key component
is a novel tracking module that generates object tracks online from detections
and exploits trajectory level features for motion forecasting. Specifically,
the object tracks get updated at each time step by solving both the data
association problem and the trajectory estimation problem. Importantly, the
whole model is end-to-end trainable and benefits from joint optimization of all
tasks. We validate PnPNet on two large-scale driving datasets, and show
significant improvements over the state-of-the-art with better occlusion
recovery and more accurate future prediction.
Comment: CVPR 202
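The per-step loop the abstract describes (associate new detections to tracks, then forecast from trajectory-level history) can be caricatured as follows. This is a hypothetical greedy sketch with a constant-velocity forecast, not PnPNet's learned tracking module; the distance gate and 2-D positions are illustrative assumptions.

```python
import numpy as np

def update_tracks(tracks, detections, gate=2.0):
    """One time step of online tracking: greedily match each detection to the
    nearest live track (within a distance gate), extend matched tracks, and
    spawn new tracks for unmatched detections.
    tracks: list of position histories (lists of 2-D points)."""
    free = set(range(len(tracks)))
    for det in detections:
        if free:
            i = min(free, key=lambda k: np.linalg.norm(det - tracks[k][-1]))
            if np.linalg.norm(det - tracks[i][-1]) <= gate:
                tracks[i].append(det)
                free.discard(i)
                continue
        tracks.append([det])            # unmatched detection starts a track
    return tracks

def forecast(track, horizon=3):
    """Constant-velocity forecast from the track's trajectory history."""
    v = track[-1] - track[-2] if len(track) > 1 else np.zeros(2)
    return [track[-1] + (h + 1) * v for h in range(horizon)]

tracks = [[np.array([0.0, 0.0])], [np.array([5.0, 5.0])]]
tracks = update_tracks(tracks, [np.array([0.9, 0.1]), np.array([5.1, 4.8])])
future = forecast(tracks[0])
```

The point of keeping the whole history per track is the same as in the paper: the forecast consumes trajectory-level features rather than a single-frame detection.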
FutureMapping: The Computational Structure of Spatial AI Systems
We discuss and predict the evolution of Simultaneous Localisation and Mapping
(SLAM) into a general geometric and semantic 'Spatial AI' perception capability
for intelligent embodied devices. A big gap remains between the visual
perception performance that devices such as augmented reality eyewear or
consumer robots will require and what is possible within the constraints
imposed by real products. Co-design of algorithms, processors and sensors will
be needed. We explore the computational structure of current and future Spatial
AI algorithms and consider this within the landscape of ongoing hardware
developments.
Self-Selective Correlation Ship Tracking Method for Smart Ocean System
In recent years, with the development of the marine industry, the navigation
environment has become more complicated. Artificial intelligence
technologies, such as computer vision, can recognize, track, and count
sailing ships to ensure maritime security and facilitate management
for the Smart Ocean System. Aiming at the scaling problem and boundary
effect problem of traditional correlation filtering methods, we propose a
self-selective correlation filtering method based on box regression (BRCF). The
proposed method mainly includes: 1) a self-selective model with a
negative-sample mining method, which effectively reduces the boundary effect
while strengthening the classification ability of the classifier; 2) a
bounding box regression method combined with a key-point matching method for
scale
prediction, leading to a fast and efficient calculation. The experimental
results show that the proposed method can effectively deal with the problem of
ship size changes and background interference. The success rates and precisions
were higher than Discriminative Scale Space Tracking (DSST) by over 8
percentage points on the marine traffic dataset of our laboratory. In terms of
processing speed, the proposed method is faster than DSST by nearly 22 frames
per second (FPS).
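The core operation behind any correlation-filter tracker — locating the target as the peak of a Fourier-domain cross-correlation response map — can be sketched as follows. This is a generic illustration of that operation, not the proposed BRCF method; the patch sizes and placement are made up.

```python
import numpy as np

def correlation_response(search_patch, template):
    """Circular cross-correlation computed in the Fourier domain.
    The peak of the response map gives the target's translation."""
    F = np.fft.fft2(search_patch)
    H = np.conj(np.fft.fft2(template, s=search_patch.shape))
    return np.real(np.fft.ifft2(F * H))

rng = np.random.default_rng(1)
template = rng.normal(size=(8, 8))
search = np.zeros((32, 32))
search[10:18, 14:22] = template          # target placed at row 10, col 14
resp = correlation_response(search, template)
peak = np.unravel_index(np.argmax(resp), resp.shape)
```

The FFT route is what makes these trackers fast; the paper's contribution sits on top of this, in how the scale and boundary effects of the filter are handled.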
Dynamic Environment Prediction in Urban Scenes using Recurrent Representation Learning
A key challenge for autonomous driving is safe trajectory planning in
cluttered, urban environments with dynamic obstacles, such as pedestrians,
bicyclists, and other vehicles. A reliable prediction of the future
environment, including the behavior of dynamic agents, would allow planning
algorithms to proactively generate a trajectory in response to a rapidly
changing environment. We present a novel framework that predicts the future
occupancy state of the local environment surrounding an autonomous agent by
learning a motion model from occupancy grid data using a neural network. We
take advantage of the temporal structure of the grid data by utilizing a
convolutional long short-term memory network in the form of the PredNet
architecture. This method is validated on the KITTI dataset and demonstrates
higher accuracy and better predictive power than baseline methods.
Comment: 8 pages, updated final draft, accepted into Intelligent
Transportation Systems Conference (ITSC) 201
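A single convolutional LSTM step, the building block that PredNet-style architectures stack, can be sketched in NumPy as follows. Single-channel grids, naive loop-based convolutions, no biases, and random kernels are simplifying assumptions made purely for illustration.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same' 2-D convolution (single channel, odd-sized kernel)."""
    kh, kw = k.shape
    pad = np.pad(x, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, kernels):
    """One ConvLSTM step: standard LSTM gating, but the input/state
    transforms are convolutions, so the cell state keeps the grid's
    spatial layout -- a natural fit for occupancy-grid sequences."""
    kxi, khi, kxf, khf, kxo, kho, kxg, khg = kernels
    i = sigmoid(conv2d_same(x, kxi) + conv2d_same(h, khi))   # input gate
    f = sigmoid(conv2d_same(x, kxf) + conv2d_same(h, khf))   # forget gate
    o = sigmoid(conv2d_same(x, kxo) + conv2d_same(h, kho))   # output gate
    g = np.tanh(conv2d_same(x, kxg) + conv2d_same(h, khg))   # candidate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(2)
kernels = [0.1 * rng.normal(size=(3, 3)) for _ in range(8)]
h = c = np.zeros((8, 8))
for _ in range(3):                       # roll over a short occupancy sequence
    grid = (rng.random((8, 8)) > 0.7).astype(float)
    h, c = convlstm_step(grid, h, c, kernels)
```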
End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Networks
In this work we present a novel end-to-end framework for tracking and
classifying a robot's surroundings in complex, dynamic and only partially
observable real-world environments. The approach deploys a recurrent neural
network to filter an input stream of raw laser measurements in order to
directly infer object locations, along with their identity in both visible and
occluded areas. To achieve this, we first train the network using unsupervised
Deep Tracking, a recently proposed theoretical framework for end-to-end space
occupancy prediction. We show that by learning to track on a large amount of
unsupervised data, the network creates a rich internal representation of its
environment, which we in turn exploit, through the principle of inductive
knowledge transfer, to perform its semantic classification. As a
result, we show that only a small amount of labelled data suffices to steer the
network towards mastering this additional task. Furthermore, we propose a novel
recurrent neural network architecture specifically tailored to tracking and
semantic classification in real-world robotics applications. We demonstrate the
tracking and classification performance of the method on real-world data
collected at a busy road junction. Our evaluation shows that the proposed
end-to-end framework compares favourably to a state-of-the-art, model-free
tracking solution and that it outperforms a conventional one-shot training
scheme for semantic classification.
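The inductive-transfer idea — freeze the representation learned by the unsupervised tracking task and fit only a small supervised head on scarce labels — can be illustrated schematically. The random "frozen" network and synthetic labels below are stand-ins for the paper's learned representation, not its model.

```python
import numpy as np

rng = np.random.default_rng(3)
W_frozen = rng.normal(size=(16, 8))      # stand-in for features learned by
                                         # unsupervised tracking (hypothetical)

def features(x):
    """Frozen representation; only the small head below is trained."""
    return np.tanh(x @ W_frozen)

X = rng.normal(size=(20, 16))            # a small labelled set suffices
y = (features(X) @ rng.normal(size=8) > 0).astype(float)

w = np.zeros(8)                          # linear classification head
for _ in range(500):                     # logistic regression by gradient descent
    p = 1 / (1 + np.exp(-(features(X) @ w)))
    w -= 0.5 * features(X).T @ (p - y) / len(y)

pred = (1 / (1 + np.exp(-(features(X) @ w))) > 0.5).astype(float)
acc = float(np.mean(pred == y))
```

Training only the 8 head weights, rather than the full network, is what lets a small labelled set go a long way.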
Underwater Multi-Robot Convoying using Visual Tracking by Detection
We present a robust multi-robot convoying approach that relies on visual
detection of the leading agent, thus enabling target following in unstructured
3-D environments. Our method is based on the idea of tracking-by-detection,
which interleaves efficient model-based object detection with temporal
filtering of image-based bounding box estimation. This approach has the
important advantage of mitigating tracking drift (i.e. drifting away from the
target object), which is a common symptom of model-free trackers and is
detrimental to sustained convoying in practice. To illustrate our solution, we
collected extensive footage of an underwater robot in ocean settings, and
hand-annotated its location in each frame. Based on this dataset, we present an
empirical comparison of multiple tracker variants, including the use of several
convolutional neural networks, both with and without recurrent connections, as
well as frequency-based model-free trackers. We also demonstrate the
practicality of this tracking-by-detection strategy in real-world scenarios by
successfully controlling a legged underwater robot in five degrees of freedom
to follow another robot's independent motion.
Comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), 201
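Temporal filtering of per-frame bounding-box estimates, which the abstract credits with suppressing tracking drift and jitter, can be as simple as a scalar Kalman filter per box coordinate. The sketch below is generic (static-state motion model, hand-picked noise parameters), not the paper's filter.

```python
import numpy as np

def kalman_1d(measurements, q=1e-3, r=9.0):
    """Scalar Kalman filter applied per box coordinate: smooths the noisy
    per-frame detector output so brief jitter does not drive the controller.
    q: process-noise variance, r: measurement-noise variance (assumed)."""
    x, p = measurements[0], 1.0
    out = [x]
    for z in measurements[1:]:
        p += q                           # predict (static model + process noise)
        k = p / (p + r)                  # Kalman gain
        x += k * (z - x)                 # update with the new detection
        p *= (1 - k)
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(4)
true_cx = np.full(50, 100.0)                       # stationary target centre
noisy = true_cx + rng.normal(0, 3.0, size=50)      # jittery detections
smoothed = kalman_1d(noisy)
```

In a full convoying stack, the same filter runs on each of the box's coordinates and feeds the smoothed estimate to the follower's controller.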
Machine Learning Methods for Data Association in Multi-Object Tracking
Data association is a key step within the multi-object tracking pipeline that
is notoriously challenging due to its combinatorial nature. A popular and
general way to formulate data association is as the NP-hard multidimensional
assignment problem (MDAP). Over the last few years, data-driven approaches to
assignment have become increasingly prevalent as these techniques have started
to mature. We focus this survey solely on learning algorithms for the
assignment step of multi-object tracking, and we attempt to unify various
methods by highlighting their connections to linear assignment as well as to
the MDAP. First, we review probabilistic and end-to-end optimization approaches
to data association, followed by methods that learn association affinities from
data. We then compare the performance of the methods presented in this survey,
and conclude by discussing future research directions.
Comment: Accepted for publication in ACM Computing Surveys
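Whatever learns the affinities, most of the surveyed methods ultimately feed their scores into a linear assignment solver. A minimal sketch, using SciPy's Hungarian-style solver and a hand-written affinity matrix as a stand-in for a network's predictions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical learned affinities: affinity[i, j] scores how well
# detection j matches track i (in practice a network predicts these).
affinity = np.array([
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.7],
])

# Linear assignment maximises total affinity (minimise the negated cost).
rows, cols = linear_sum_assignment(-affinity)
matches = [(int(i), int(j)) for i, j in zip(rows, cols)]
```

The MDAP generalises this to assignments across many frames at once, which is what makes the full problem NP-hard.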
Monocular Plan View Networks for Autonomous Driving
Convolutions on monocular dash cam videos capture spatial invariances in the
image plane but do not explicitly reason about distances and depth. We propose
a simple transformation of observations into a bird's eye view, also known as
plan view, for end-to-end control. We detect vehicles and pedestrians in the
first person view and project them into an overhead plan view. This
representation provides an abstraction of the environment from which a deep
network can easily deduce the positions and directions of entities.
Additionally, the plan view enables us to leverage advances in 3D object
detection in conjunction with deep policy learning. We evaluate our monocular
plan view network on the photo-realistic Grand Theft Auto V simulator. A
network using both a plan view and front view causes less than half as many
collisions as previous detection-based methods and an order of magnitude fewer
collisions than pure pixel-based policies.
Comment: 8 pages, 9 figures
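Under a flat-ground assumption, projecting a first-person detection into the plan view reduces to inverting the pinhole model at the box's ground-contact pixel. The sketch below additionally assumes a camera looking horizontally at a known height; the intrinsics and pixel values are made up for illustration.

```python
def image_to_plan_view(u, v, fx, fy, cx, cy, cam_height):
    """Map a detection's ground-contact pixel (u, v) to overhead plan-view
    coordinates, assuming a flat ground plane and a horizontal camera at
    cam_height metres (simplifying assumptions for this sketch)."""
    z = fy * cam_height / (v - cy)       # depth from the ground-plane ray
    x = (u - cx) * z / fx                # lateral offset
    return x, z

# A pixel right of and below the principal point lands right of and in
# front of the camera in the overhead view.
x, z = image_to_plan_view(u=700, v=500, fx=1000.0, fy=1000.0,
                          cx=640.0, cy=360.0, cam_height=1.5)
```

This is the kind of abstraction the paper argues for: once entities live in metric overhead coordinates, distances and headings are directly readable by the policy network.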
CARRADA Dataset: Camera and Automotive Radar with Range-Angle-Doppler Annotations
High quality perception is essential for autonomous driving (AD) systems. To
reach the accuracy and robustness that are required by such systems, several
types of sensors must be combined. Currently, mostly cameras and laser scanners
(lidar) are deployed to build a representation of the world around the vehicle.
While radar sensors have been used for a long time in the automotive industry,
they are still under-used for AD despite their appealing characteristics
(notably, their ability to measure the relative speed of obstacles and to
operate even in adverse weather conditions). To a large extent, this situation
is due to the relative lack of automotive datasets with real radar signals that
are both raw and annotated. In this work, we introduce CARRADA, a dataset of
synchronized camera and radar recordings with range-angle-Doppler annotations.
We also present a semi-automatic annotation approach, which was used to
annotate the dataset, and a radar semantic segmentation baseline, which we
evaluate on several metrics. Both our code and dataset are available online.
Comment: 8 pages, 5 figures. Accepted at ICPR 2020. Erratum: results in Table
III have been updated since the ICPR proceedings; models are selected using
the PP metric instead of the previously used PR metric.
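Relating a range-angle annotation cell to Cartesian vehicle-frame coordinates (e.g. to cross-reference radar annotations with camera detections) is a small polar-to-Cartesian conversion. The boresight-referenced angle convention below is an assumption for illustration, not necessarily CARRADA's.

```python
import numpy as np

def range_angle_to_cartesian(r, theta_deg):
    """Convert a range-angle cell to (lateral, forward) coordinates, with the
    angle measured from the radar boresight (assumed convention)."""
    theta = np.deg2rad(theta_deg)
    return r * np.sin(theta), r * np.cos(theta)

# A target 20 m away, 30 degrees off boresight.
lat, fwd = range_angle_to_cartesian(20.0, 30.0)
```

The Doppler axis adds the relative radial speed per cell, which is the measurement radar uniquely contributes to the sensor suite.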