CAR-Net: Clairvoyant Attentive Recurrent Network
We present an interpretable framework for path prediction that leverages
dependencies between agents' behaviors and their spatial navigation
environment. We exploit two sources of information: the past motion trajectory
of the agent of interest and a wide top-view image of the navigation scene. We
propose a Clairvoyant Attentive Recurrent Network (CAR-Net) that learns where
to look in a large image of the scene when solving the path prediction task.
Our method can attend to any area, or combination of areas, within the raw
image (e.g., road intersections) when predicting the trajectory of the agent.
This allows us to visualize fine-grained semantic elements of navigation scenes
that influence the prediction of trajectories. To study the impact of space on
agents' trajectories, we build a new dataset made of top-view images of
hundreds of scenes (Formula One racing tracks) where agents' behaviors are
heavily influenced by known areas in the images (e.g., upcoming turns). CAR-Net
successfully attends to these salient regions. Additionally, CAR-Net reaches
state-of-the-art accuracy on the standard trajectory forecasting benchmark,
Stanford Drone Dataset (SDD). Finally, we show CAR-Net's ability to generalize
to unseen scenes.
Comment: The 2nd and 3rd authors contributed equally
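The idea of "learning where to look" in a scene image can be illustrated with a generic soft-attention step. The following is a minimal sketch, not CAR-Net's actual architecture: it assumes the image has already been encoded into N region feature vectors, and the function name `soft_attention` and the parameters `W_f`, `W_h`, and `v` are hypothetical names introduced here for illustration.

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """Generic additive soft attention (illustrative assumption, not the paper's model).

    features: (N, D) array, one feature vector per image region
    hidden:   (H,) array, current recurrent state of the trajectory encoder
    W_f:      (D, A) projection of region features
    W_h:      (H, A) projection of the recurrent state
    v:        (A,) scoring vector
    Returns the attended context vector (D,) and the attention weights (N,).
    """
    # Score each region against the current recurrent state.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # shape (N,)
    # Numerically stable softmax over regions.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Context is the attention-weighted average of region features.
    context = weights @ features
    return context, weights
```

The weights form a distribution over image regions, so they can be reshaped to the region grid and overlaid on the top-view image to see which areas influence a given prediction.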
Context-Aware Trajectory Prediction
Human motion and behaviour in crowded spaces are influenced by several
factors, such as the dynamics of other moving agents in the scene, as well as
the static elements that might be perceived as points of attraction or
obstacles. In this work, we present a new model for human trajectory prediction
which is able to take advantage of both human-human and human-space
interactions. The future trajectories of humans are generated by observing their
past positions and interactions with the surroundings. To this end, we propose
a "context-aware" recurrent neural network LSTM model, which can learn and
predict human motion in crowded spaces such as a sidewalk, a museum or a
shopping mall. We evaluate our model on public pedestrian datasets, and we
contribute a new challenging dataset that collects videos of humans that
navigate in a (real) crowded space such as a big museum. Results show that our
approach can predict human trajectories better when compared to previous
state-of-the-art forecasting models.
Comment: Submitted to BMVC 201
Modeling Cooperative Navigation in Dense Human Crowds
For robots to be a part of our daily life, they need to be able to navigate
among crowds not only safely but also in a socially compliant fashion. This is
a challenging problem because humans tend to navigate by implicitly cooperating
with one another to avoid collisions, while heading toward their respective
destinations. Previous approaches have used hand-crafted functions based on
proximity to model human-human and human-robot interactions. However, these
approaches can only model simple interactions and fail to generalize for
complex crowded settings. In this paper, we develop an approach that models the
joint distribution over future trajectories of all interacting agents in the
crowd, through a local interaction model that we train using real human
trajectory data. The interaction model infers the velocity of each agent based
on the spatial orientation of other agents in its vicinity. During prediction,
our approach infers the goal of the agent from its past trajectory and uses the
learned model to predict its future trajectory. We demonstrate the performance
of our method against a state-of-the-art approach on a public dataset and show
that our model outperforms it when predicting future trajectories over longer
horizons.
Comment: Accepted at ICRA 201
Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets
In this work, we explore the correlation between people's trajectories and
their head orientations. We argue that trajectory and head pose
forecasting can be modelled as a joint problem. Recent approaches to trajectory
forecasting leverage short-term trajectories (aka tracklets) of pedestrians to
predict their future paths. In addition, sociological cues, such as expected
destination or pedestrian interaction, are often combined with tracklets. In
this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between
positions and head orientations (vislets) thanks to a joint unconstrained
optimization of full covariance matrices during the LSTM backpropagation. We
additionally exploit the head orientations as a proxy for the visual attention,
when modeling social interactions. MX-LSTM predicts pedestrians' future locations
and head poses, extending the capabilities of current approaches
to long-term trajectory forecasting. Compared to the state of the art, our
approach shows better performance on an extensive set of public benchmarks.
MX-LSTM is particularly effective when people move slowly, i.e. the most
challenging scenario for all other models. The proposed approach also allows
for accurate predictions on a longer time horizon.
Comment: Accepted at IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE
INTELLIGENCE 2019. arXiv admin note: text overlap with arXiv:1805.0065
Social and Scene-Aware Trajectory Prediction in Crowded Spaces
Mimicking the human ability to forecast future positions or interpret complex
interactions in urban scenarios, such as streets, shopping malls or squares, is
essential for developing socially compliant robots or self-driving cars. Autonomous
systems may gain an advantage by anticipating human motion to avoid collisions or
to behave naturally alongside people. To foresee plausible trajectories, we
construct an LSTM (long short-term memory)-based model considering three
fundamental factors: people interactions, past observations in terms of
previously crossed areas and semantics of surrounding space. Our model
encompasses several pooling mechanisms to join the above elements defining
multiple tensors, namely social, navigation and semantic tensors. The network
is tested in unstructured environments where complex paths emerge according to
both internal (intentions) and external (other people, inaccessible areas)
motivations. As demonstrated, modeling paths without awareness of social
interactions or context information is insufficient to correctly predict future positions.
Experimental results corroborate the effectiveness of the proposed framework in
comparison to LSTM-based models for human path prediction.
Comment: Accepted to ICCV 2019 Workshop on Assistive Computer Vision and
Robotics (ACVR
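To make the notion of a pooled "social tensor" concrete, here is a minimal sketch of grid-based social pooling in the general style of earlier LSTM trajectory models. It is an assumption for illustration only and not the pooling defined in the paper above: the function name `social_tensor`, the grid size, and the cell size are all hypothetical choices.

```python
import numpy as np

def social_tensor(positions, hidden, idx, grid=4, cell=1.0):
    """Pool neighbours' hidden states into a grid centred on agent `idx`.

    positions: (N, 2) array of agent (x, y) positions
    hidden:    (N, H) array of per-agent recurrent hidden states
    idx:       index of the agent of interest
    grid:      number of cells per side (hypothetical default)
    cell:      side length of one cell, in position units (hypothetical default)
    Returns a (grid, grid, H) tensor.
    """
    tensor = np.zeros((grid, grid, hidden.shape[1]))
    half = grid * cell / 2.0
    for j, (pos, h) in enumerate(zip(positions, hidden)):
        if j == idx:
            continue  # the agent does not pool its own state
        dx, dy = pos - positions[idx]
        # Only neighbours inside the square neighbourhood contribute.
        if -half <= dx < half and -half <= dy < half:
            cx = int((dx + half) // cell)
            cy = int((dy + half) // cell)
            tensor[cx, cy] += h  # sum hidden states that fall in the same cell
    return tensor
```

Navigation or semantic tensors could be built analogously by pooling other per-cell quantities over the same grid, though the exact definitions are given only in the paper itself.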