Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty
Progress towards advanced systems for assisted and autonomous driving is
leveraging recent advances in recognition and segmentation methods. Yet, we are
still facing challenges in bringing reliable driving to inner cities, as those
are composed of highly dynamic scenes observed from a moving platform at
considerable speeds. Anticipation becomes a key element for reacting in time
and preventing accidents. In this paper we argue that it is necessary to
predict at least 1 second into the future, and we thus propose a new model
that jointly predicts ego motion and people trajectories over such large time
horizons. We pay
particular attention to modeling the uncertainty of our estimates arising from
the non-deterministic nature of natural traffic scenes. Our experimental
results show that it is indeed possible to predict people trajectories at the
desired time horizons and that our uncertainty estimates are informative of the
prediction error. We also show that both sequence modeling of trajectories as
well as our novel method of long term odometry prediction are essential for
best performance.
Comment: CVPR 201
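As a concrete illustration of the uncertainty modeling described above, here is a minimal PyTorch sketch of a trajectory predictor that emits a per-step mean and log-variance and is trained with a Gaussian negative log-likelihood, so the predicted variance can be informative of the prediction error. Module names, dimensions, and the same-length decoding are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UncertainTrajectoryLSTM(nn.Module):
    """Encodes an observed track and emits a per-step mean and log-variance."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)  # mean_x, mean_y, log_var_x, log_var_y

    def forward(self, past):                     # past: (B, T, 2) positions
        h, _ = self.encoder(past)
        out = self.head(h)                       # (B, T, 4)
        return out[..., :2], out[..., 2:]        # mean, log_var

def gaussian_nll(mean, log_var, target):
    # Heteroscedastic Gaussian NLL: the variance term lets the model flag
    # hard-to-predict steps, making the uncertainty informative of error.
    return (0.5 * (log_var + (target - mean) ** 2 / log_var.exp())).mean()

model = UncertainTrajectoryLSTM()
past, future = torch.randn(8, 10, 2), torch.randn(8, 10, 2)  # toy data
mean, log_var = model(past)                      # same-length decoding for brevity
gaussian_nll(mean, log_var, future).backward()
```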
VIENA2: A Driving Anticipation Dataset
Action anticipation is critical in scenarios where one needs to react before
the action is finalized. This is, for instance, the case in automated driving,
where a car needs to, e.g., avoid hitting pedestrians and respect traffic
lights. While solutions have been proposed to tackle subsets of the driving
anticipation tasks, by making use of diverse, task-specific sensors, there is
no single dataset or framework that addresses them all in a consistent manner.
In this paper, we therefore introduce a new, large-scale dataset, called
VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct
action classes. It contains more than 15K full-HD, 5-second-long videos
acquired under various driving conditions, weather, times of day, and
environments, complemented
with a common and realistic set of sensor measurements. This amounts to more
than 2.25M frames, each annotated with an action label, corresponding to 600
samples per action class. We discuss our data acquisition strategy and the
statistics of our dataset, and benchmark state-of-the-art action anticipation
techniques, including a new multi-modal LSTM architecture with an effective
loss function for action anticipation in driving scenarios.
Comment: Accepted at ACCV 201
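To make the multi-modal anticipation setup concrete, the following is a minimal PyTorch sketch of an LSTM that fuses per-frame visual features with sensor readings and is trained with a time-weighted cross-entropy, one common family of anticipation objectives. The architecture, dimensions, and weighting scheme are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalAnticipationLSTM(nn.Module):
    """Fuses per-frame visual features with sensor readings, then classifies."""
    def __init__(self, vis_dim=512, sensor_dim=4, hidden=128, n_actions=25):
        super().__init__()
        self.fuse = nn.Linear(vis_dim + sensor_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_actions)

    def forward(self, vis, sensors):     # (B, T, vis_dim), (B, T, sensor_dim)
        x = torch.relu(self.fuse(torch.cat([vis, sensors], dim=-1)))
        h, _ = self.lstm(x)
        return self.cls(h)               # per-step logits: (B, T, n_actions)

def anticipation_loss(logits, labels):
    # Cross-entropy at every step, weighted more heavily as more of the
    # video is observed -- one common anticipation objective (assumption).
    B, T, _ = logits.shape
    weights = torch.linspace(0.5, 1.0, T)
    ce = F.cross_entropy(logits.reshape(B * T, -1),
                         labels.repeat_interleave(T), reduction="none")
    return (ce.view(B, T) * weights).mean()

model = MultiModalAnticipationLSTM()
logits = model(torch.randn(2, 16, 512), torch.randn(2, 16, 4))
anticipation_loss(logits, torch.tensor([3, 7])).backward()
```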
Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets
In this work, we explore the correlation between people trajectories and
their head orientations. We argue that people trajectory and head pose
forecasting can be modelled as a joint problem. Recent approaches on trajectory
forecasting leverage short-term trajectories (aka tracklets) of pedestrians to
predict their future paths. In addition, sociological cues, such as expected
destination or pedestrian interaction, are often combined with tracklets. In
this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between
positions and head orientations (vislets) thanks to a joint unconstrained
optimization of full covariance matrices during the LSTM backpropagation. We
additionally exploit head orientations as a proxy for visual attention when
modeling social interactions. MX-LSTM predicts future pedestrian locations and
head poses, extending the capabilities of current approaches to long-term
trajectory forecasting. Compared to the state of the art, our approach shows
better performance on an extensive set of public benchmarks. MX-LSTM is
particularly effective when people move slowly, i.e., the most challenging
scenario for all other models. The proposed approach also allows
for accurate predictions on a longer time horizon.
Comment: Accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019. arXiv admin note: text overlap with arXiv:1805.0065
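The key technical point, an unconstrained optimization of full covariance matrices during backpropagation, can be sketched by parameterizing the covariance through its Cholesky factor: any unconstrained network output yields a valid positive-definite covariance, so plain gradient descent applies. The snippet below is a hypothetical PyTorch illustration of that idea, not the authors' code.

```python
import torch
import torch.nn.functional as F

def bivariate_nll(mean, chol_params, target):
    """mean: (B, 2); chol_params: (B, 3) -> (l11, l21, l22); target: (B, 2)."""
    l11 = F.softplus(chol_params[:, 0])          # diagonal entries kept positive
    l21 = chol_params[:, 1]                      # off-diagonal is unconstrained
    l22 = F.softplus(chol_params[:, 2])
    d = target - mean
    z1 = d[:, 0] / l11                           # solve triangular system L z = d
    z2 = (d[:, 1] - l21 * z1) / l22
    log_det = 2 * (torch.log(l11) + torch.log(l22))  # log |Sigma|, Sigma = L L^T
    return (0.5 * (z1 ** 2 + z2 ** 2) + 0.5 * log_det).mean()

mean = torch.zeros(4, 2, requires_grad=True)     # toy check that gradients flow
chol = torch.zeros(4, 3, requires_grad=True)
bivariate_nll(mean, chol, torch.randn(4, 2)).backward()
```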
Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving
Concept bottleneck models have been successfully used for explainable machine
learning by encoding information within the model with a set of human-defined
concepts. In the context of human-assisted or autonomous driving,
explainability can improve user acceptance and understanding of the decisions
made by the vehicle, and can be used to rationalize and explain driver or
vehicle behavior. We propose a new approach using concept bottlenecks
as visual features for control command predictions and explanations of user and
vehicle behavior. We learn a human-understandable concept layer that we use to
explain sequential driving scenes while learning vehicle control commands. This
approach can then be used to determine whether a change in preferred gap or
steering commands from a human (or autonomous vehicle) is driven by an
external stimulus or by a change in preferences. We achieve performance
competitive with latent visual features while gaining interpretability within
our model setup.
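A minimal sketch of the concept-bottleneck idea for driving control follows: visual features are forced through a small layer of human-defined concept scores before the control head, so every command can be traced back to interpretable activations. The concept count, dimensions, and joint loss are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConceptBottleneckController(nn.Module):
    """Backbone features -> concept scores -> control commands."""
    def __init__(self, feat_dim=512, n_concepts=8, n_controls=2):
        super().__init__()
        self.to_concepts = nn.Linear(feat_dim, n_concepts)    # the bottleneck
        self.to_controls = nn.Linear(n_concepts, n_controls)  # e.g. steering, gap

    def forward(self, features):
        concepts = torch.sigmoid(self.to_concepts(features))  # interpretable scores
        return concepts, self.to_controls(concepts)

# Joint training: supervise both the concept scores and the commands.
model = ConceptBottleneckController()
feats = torch.randn(4, 512)
concept_labels = torch.randint(0, 2, (4, 8)).float()
control_labels = torch.randn(4, 2)
concepts, controls = model(feats)
loss = (nn.functional.binary_cross_entropy(concepts, concept_labels)
        + nn.functional.mse_loss(controls, control_labels))
loss.backward()
```

Because the control head only sees the concept layer, inspecting the concept activations directly explains each predicted command.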
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.
Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages
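Among the performance metrics covered in such surveys, the average and final displacement errors (ADE/FDE) are the most widely used for trajectory prediction; a minimal NumPy sketch follows for reference.

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: (T, 2) arrays of predicted and ground-truth positions.
    ADE averages the Euclidean error over all steps; FDE keeps only the
    error at the final step of the prediction horizon."""
    errors = np.linalg.norm(pred - gt, axis=-1)   # per-step Euclidean distance
    return errors.mean(), errors[-1]

pred = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.3]])
gt   = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(ade_fde(pred, gt))   # (mean error over steps, error at last step)
```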
Unsupervised Event-based Learning of Optical Flow, Depth, and Egomotion
In this work, we propose a novel framework for unsupervised learning for
event cameras that learns motion information from only the event stream. In
particular, we propose an input representation of the events in the form of a
discretized volume that maintains the temporal distribution of the events,
which we pass through a neural network to predict the motion of the events.
This motion is used to attempt to remove motion blur from the event image. We
then propose a loss function, applied to the motion-compensated event image,
that measures the remaining motion blur. We train two networks with this
framework, one to predict optical flow, and one to predict egomotion and
depths, and evaluate these networks on the Multi Vehicle Stereo Event Camera
dataset, along with qualitative results from a variety of different scenes.
Comment: 9 pages, 7 figures
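The discretized event volume can be sketched as follows: each event's timestamp is linearly split between its two nearest temporal bins, preserving the temporal distribution of events instead of collapsing them into a single frame. Variable names and the signed-polarity accumulation below are illustrative assumptions.

```python
import numpy as np

def event_volume(events, H, W, bins=9):
    """events: (N, 4) array of (x, y, t, polarity in {-1, +1})."""
    vol = np.zeros((bins, H, W), dtype=np.float32)
    x, y = events[:, 0].astype(int), events[:, 1].astype(int)
    t, p = events[:, 2], events[:, 3]
    t = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (bins - 1)  # scale to [0, bins-1]
    t0 = np.floor(t).astype(int)
    for b_idx in (t0, t0 + 1):                   # split each event over 2 nearest bins
        w = np.maximum(0.0, 1.0 - np.abs(t - b_idx))
        valid = (b_idx >= 0) & (b_idx < bins)
        np.add.at(vol, (b_idx[valid], y[valid], x[valid]), (p * w)[valid])
    return vol

ev = np.array([[3, 2, 0.00, 1], [3, 2, 0.05, -1], [4, 2, 0.10, 1]])
print(event_volume(ev, H=8, W=8).shape)          # (9, 8, 8)
```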