Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums
With the rapid development of AI-related techniques, applications of
trajectory prediction are no longer limited to simple scenes and trajectories.
More and more heterogeneous trajectories with different representation forms,
such as 2D or 3D coordinates, 2D or 3D bounding boxes, and even
high-dimensional human skeletons, need to be analyzed and forecasted. In
these heterogeneous trajectories, the interactions between different elements
within a single trajectory frame, which we call "dimension-wise
interactions", are more complex and challenging. However, most previous
approaches focus on one specific form of trajectory, so they cannot be
used to forecast heterogeneous trajectories, let alone model the
dimension-wise interactions. Besides, previous methods mostly treat
trajectory prediction as a normal time sequence generation task, so these
methods may require more work to directly analyze agents' behaviors and
social interactions at different temporal scales. In this paper, we bring a
new "view" to trajectory prediction: we model and forecast trajectories
hierarchically, according to different frequency portions of the spectral
domain, so that the model learns to forecast trajectories from their
frequency responses. Moreover, we expand the current trajectory prediction
task by introducing the dimension as "another view", thus extending its
application scenarios vertically to heterogeneous trajectories. Finally, we
adopt a bilinear structure to fuse the two factors, the frequency
response and the dimension-wise interaction, to forecast heterogeneous
trajectories via spectrums hierarchically in a generic way. Experiments show
that the proposed model outperforms most state-of-the-art methods on ETH-UCY,
the Stanford Drone Dataset, and nuScenes with heterogeneous trajectories,
including 2D coordinates and 2D and 3D bounding boxes.
BGM: Building a Dynamic Guidance Map without Visual Images for Trajectory Prediction
Visual images usually contain informative context about the environment,
thereby helping to predict agents' behaviors. However, because their semantics
are relatively fixed, they can hardly capture the dynamic effects on agents'
actual behaviors. To solve this problem, we propose a deterministic model
named BGM that constructs a guidance map to represent the dynamic semantics.
The map avoids using visual images for each agent while still reflecting the
differences in activities across different periods. We first record all
agents' activities in the scene within a period close to the current moment to
construct a guidance map, and then feed it to a Context CNN to obtain the
context features. We adopt a Historical Trajectory Encoder to extract the
trajectory features and then combine them with the context features as the
input of the social-energy-based trajectory decoder, thus obtaining
predictions that comply with social rules. Experiments demonstrate that BGM
achieves state-of-the-art prediction accuracy on the two widely used ETH and
UCY datasets and handles more complex scenarios.
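The map-construction step described in the BGM abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes the guidance map is simply a normalized count of recently observed agent positions on a square grid, and the function name `build_guidance_map`, the grid size, and the coordinate ranges are all hypothetical choices made for illustration.

```python
def build_guidance_map(positions, x_range, y_range, grid_size):
    """Count recent agent positions per grid cell, then scale counts to [0, 1].

    Assumption: "recording all agents' activities" is approximated here by
    a plain occupancy count; the actual BGM construction may differ.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    for x, y in positions:
        # Ignore positions that fall outside the mapped area.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        col = min(int((x - x_min) / (x_max - x_min) * grid_size), grid_size - 1)
        row = min(int((y - y_min) / (y_max - y_min) * grid_size), grid_size - 1)
        grid[row][col] += 1.0
    peak = max(max(row_values) for row_values in grid)
    if peak > 0:
        grid = [[value / peak for value in row_values] for row_values in grid]
    return grid


# Hypothetical positions (in meters) observed in a recent time window.
recent_positions = [(0.5, 0.5), (0.6, 0.4), (3.5, 3.5), (9.0, 9.0)]
guidance_map = build_guidance_map(recent_positions, (0.0, 4.0), (0.0, 4.0), 4)
```

Because the map is rebuilt from whichever window of positions is passed in, it changes over time, which is the property the abstract contrasts with the fixed semantics of a scene image.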
View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums
Understanding and forecasting future trajectories of agents are critical for
behavior analysis, robot navigation, autonomous cars, and other related
applications. Previous methods mostly treat trajectory prediction as time
sequence generation. Different from them, this work studies agents'
trajectories in a "vertical" view, i.e., modeling and forecasting trajectories
from the spectral domain. Different frequency bands in the trajectory spectrums
could hierarchically reflect agents' motion preferences at different scales.
The low-frequency and high-frequency portions could represent their coarse
motion trends and fine motion variations, respectively. Accordingly, we propose
a hierarchical network V-Net, which contains two sub-networks, to
hierarchically model and predict agents' trajectories with trajectory
spectrums. The coarse-level keypoints estimation sub-network first predicts the
"minimal" spectrums of agents' trajectories on several "key" frequency
portions. Then the fine-level spectrum interpolation sub-network interpolates
the spectrums to reconstruct the final predictions. Experimental results
demonstrate the competitiveness and superiority of V-Net on both the ETH-UCY
benchmark and the Stanford Drone Dataset.
Comment: Accepted to ECCV 202
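The idea that low-frequency and high-frequency portions of a trajectory spectrum carry coarse trends and fine variations can be illustrated with a small sketch. It is not V-Net itself: it only applies a plain discrete Fourier transform to one coordinate axis and splits the spectrum at a hypothetical `cutoff` index. The helper names and the sample coordinates are assumptions made for illustration.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_by_frequency(signal, cutoff):
    """Split a real sequence into low- and high-frequency parts.

    Bins whose frequency index min(k, n - k) is at most `cutoff` count as
    "low"; all remaining bins count as "high".
    """
    n = len(signal)
    spectrum = dft(signal)
    low_spec = [c if min(k, n - k) <= cutoff else 0 for k, c in enumerate(spectrum)]
    high_spec = [c - l for c, l in zip(spectrum, low_spec)]
    low = [value.real for value in idft(low_spec)]
    high = [value.real for value in idft(high_spec)]
    return low, high


# Hypothetical x-coordinates of one agent over 8 observed frames.
xs = [0.0, 0.9, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0]
low, high = split_by_frequency(xs, cutoff=1)
```

The two parts sum back to the original sequence, so the split loses no information. A hierarchical model can then handle the coarse part first and refine it with the remaining bins.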
MSN: Multi-Style Network for Trajectory Prediction
Trajectory prediction aims to forecast agents' possible future locations
by considering their observations along with the video context. It is
required by many autonomous platforms and computer vision applications, such
as tracking, detection, robot navigation, and self-driving cars.
Agents' internal personality factors, their interactions with
neighbors, and the influence of their surroundings may all
affect their future plans. However, many previous methods model and
predict agents' behaviors with the same strategy or a "single" feature
distribution, making it difficult for them to give predictions with sufficient
style differences. This manuscript proposes the Multi-Style Network (MSN),
which uses two sub-networks, style hypothesis and stylized prediction, to
adaptively give agents multi-style predictions in a novel categorical way. We use
agents' end-point plans and their interaction context as the basis for
behavior classification, so as to adaptively learn multiple diverse behavior
styles through a series of style channels in the network. Then, we assume in
turn that the target agents will plan their future behaviors according to
each of these categorized styles, and use the different style channels to
give a series of predictions with significant style differences in parallel.
Experiments show that the proposed MSN outperforms current state-of-the-art
methods by up to 10%-20% quantitatively on two widely used datasets, and
presents better multi-style characteristics qualitatively.
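The classification of behaviors by end-point can be illustrated with a minimal sketch. It is not the MSN architecture: it assumes each style is represented by a fixed end-point prototype and ignores the interaction context that the abstract also mentions. The function `assign_styles` and the prototype coordinates are hypothetical.

```python
import math


def assign_styles(endpoints, prototypes):
    """Return, for each end-point, the index of its nearest style prototype."""
    labels = []
    for ex, ey in endpoints:
        distances = [math.hypot(ex - px, ey - py) for px, py in prototypes]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical prototypes relative to the last observed position:
# index 0 = keep going straight, 1 = turn right, 2 = turn left.
prototypes = [(0.0, 5.0), (5.0, 0.0), (-5.0, 0.0)]
endpoints = [(0.3, 4.1), (4.2, 0.5), (-3.9, -0.2), (0.1, 6.0)]
labels = assign_styles(endpoints, prototypes)
```

Each label would select one style channel. Producing one prediction per channel is what yields several stylistically different forecasts for the same agent.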