Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums
With the fast development of AI-related techniques, the applications of
trajectory prediction are no longer limited to easier scenes and trajectories.
More and more heterogeneous trajectories with different representation forms,
such as 2D or 3D coordinates, 2D or 3D bounding boxes, and even
high-dimensional human skeletons, need to be analyzed and forecasted. Among
these heterogeneous trajectories, interactions between different elements
within a trajectory frame, which we call "Dimension-Wise
Interactions", are more complex and challenging. However, most previous
approaches focus mainly on a specific form of trajectories, which means these
methods could not be used to forecast heterogeneous trajectories, not to
mention the dimension-wise interaction. Besides, previous methods mostly treat
trajectory prediction as a normal time sequence generation task, indicating
that these methods may require more work to directly analyze agents' behaviors
and social interactions at different temporal scales. In this paper, we bring a
new "view" to trajectory prediction: we model and forecast trajectories
hierarchically according to different frequency portions of the spectral
domain, so that the model learns to forecast trajectories by considering their
frequency responses. Moreover, we expand the current trajectory prediction task by
introducing the dimension from "another view", thus extending its
application scenarios to heterogeneous trajectories vertically. Finally, we
adopt a bilinear structure to fuse two factors, the frequency
response and the dimension-wise interaction, to forecast heterogeneous
trajectories via spectrums hierarchically in a generic way. Experiments show
that the proposed model outperforms most state-of-the-art methods on ETH-UCY,
Stanford Drone Dataset and nuScenes with heterogeneous trajectories, including
2D coordinates, 2D and 3D bounding boxes.
BGM: Building a Dynamic Guidance Map without Visual Images for Trajectory Prediction
Visual images usually contain the informative context of the environment,
thereby helping to predict agents' behaviors. However, they can hardly capture
the dynamic effects on agents' actual behaviors because their semantics are
relatively fixed. To solve this problem, we propose a deterministic model named
BGM that constructs a guidance map to represent the dynamic semantics. The map
avoids using visual images and reflects, for each agent, how activities differ
across different periods. We first record all agents' activities in the scene within a
period close to the current moment to construct a guidance map and then feed it to a
Context CNN to obtain their context features. We adopt a Historical Trajectory
Encoder to extract the trajectory features and then combine them with the
context feature as the input of the social-energy-based trajectory decoder,
thus obtaining predictions that follow social rules. Experiments
demonstrate that BGM achieves state-of-the-art prediction accuracy on the two
widely used ETH and UCY datasets and handles more complex scenarios.
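The BGM abstract above describes recording all agents' recent activities to build a guidance map, but it does not say how the map is computed. One plausible reading, sketched below purely as an assumption, is a grid that counts how often agents visited each cell within a recent time window. The function name `build_guidance_map` and its parameters (`window`, `cell`, `width`, `height`) are hypothetical and do not come from the paper.

```python
def build_guidance_map(records, now, window, cell, width, height):
    """Count recent agent visits per grid cell.

    records: iterable of (time, x, y) observations for all agents.
    now, window: only observations with now - window <= time <= now are used.
    cell: side length of one grid cell, in the same unit as x and y.
    width, height: grid size in cells (assumed layout, not from the paper).
    """
    grid = [[0] * width for _ in range(height)]
    for t, x, y in records:
        if now - window <= t <= now:
            col, row = int(x // cell), int(y // cell)
            # Ignore observations that fall outside the mapped area.
            if 0 <= row < height and 0 <= col < width:
                grid[row][col] += 1
    return grid


records = [(0, 0.5, 0.5), (8, 1.5, 0.5), (9, 1.6, 0.4), (10, 2.5, 1.5)]
guidance = build_guidance_map(records, now=10, window=3, cell=1.0, width=3, height=2)
```

In this sketch the observation at time 0 is too old to be counted, so the map changes as the window moves. That is the "dynamic" property the abstract attributes to the guidance map. A map like this could then be passed to a convolutional encoder, which is the role the abstract gives to the Context CNN.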
A Comprehensive Survey on Data-Efficient GANs in Image Generation
Generative Adversarial Networks (GANs) have achieved remarkable results
in image synthesis. These successes rely on large-scale datasets, which are
costly to collect. With limited training data, how to stabilize the training
process of GANs and generate realistic images has attracted increasing attention.
The challenges of Data-Efficient GANs (DE-GANs) mainly arise from three
aspects: (i) Mismatch Between Training and Target Distributions, (ii)
Overfitting of the Discriminator, and (iii) Imbalance Between Latent and Data
Spaces. Although many augmentation and pre-training strategies have been
proposed to alleviate these issues, a systematic survey that
summarizes the properties, challenges, and solutions of DE-GANs is lacking. In this paper,
we revisit and define DE-GANs from the perspective of distribution
optimization. We summarize and analyze the challenges of DE-GANs. Meanwhile, we
propose a taxonomy, which classifies the existing methods into three
categories: Data Selection, GANs Optimization, and Knowledge Sharing. Last but
not least, we attempt to highlight the current problems and the future
directions. Comment: Under revie
TODE-Trans: Transparent Object Depth Estimation with Transformer
Transparent objects are widely used in industrial automation and daily life.
However, robust visual recognition and perception of transparent objects have
always been a major challenge. Currently, most commercial-grade depth cameras
are still not good at sensing the surfaces of transparent objects due to the
refraction and reflection of light. In this work, we present a
transformer-based transparent object depth estimation approach from a single
RGB-D input. We observe that the global characteristics of the transformer make
it easier to extract contextual information to perform depth estimation of
transparent areas. In addition, to better enhance the fine-grained features, a
feature fusion module (FFM) is designed to assist coherent prediction. Our
empirical evidence demonstrates that our model delivers significant
improvements on recent popular datasets, e.g., a 25% gain on RMSE and a 21% gain on
REL compared to previous state-of-the-art convolution-based counterparts on the
ClearGrasp dataset. Extensive results show that our transformer-based model
enables better aggregation of the object's RGB and inaccurate depth information
to obtain a better depth representation. Our code and the pre-trained model
will be available at https://github.com/yuchendoudou/TODE. Comment: Submitted to ICRA202
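The TODE-Trans abstract reports gains on RMSE and REL without defining them. The sketch below uses the definitions that are common in depth estimation: root mean squared error, and mean absolute relative error. It is an assumption that the paper uses exactly these forms. The function names are illustrative only, and the sample depth values are made up for the example.

```python
import math


def rmse(pred, gt):
    """Root mean squared error between predicted and ground-truth depths."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def rel(pred, gt):
    """Mean absolute relative error; ground-truth depths must be positive."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


# Illustrative depth values in metres (not taken from any dataset).
pred_depth = [1.0, 2.0, 4.0]
true_depth = [1.0, 2.0, 2.0]
rmse_value = rmse(pred_depth, true_depth)
rel_value = rel(pred_depth, true_depth)
```

Both metrics are errors, so lower is better. A "gain" on RMSE or REL in the abstract is therefore best read as a reduction in these values.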
View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums
Understanding and forecasting future trajectories of agents are critical for
behavior analysis, robot navigation, autonomous cars, and other related
applications. Previous methods mostly treat trajectory prediction as time
sequence generation. Different from them, this work studies agents'
trajectories in a "vertical" view, i.e., modeling and forecasting trajectories
from the spectral domain. Different frequency bands in the trajectory spectrums
could hierarchically reflect agents' motion preferences at different scales.
The low-frequency and high-frequency portions could represent their coarse
motion trends and fine motion variations, respectively. Accordingly, we propose
a hierarchical network V-Net, which contains two sub-networks, to
hierarchically model and predict agents' trajectories with trajectory
spectrums. The coarse-level keypoints estimation sub-network first predicts the
"minimal" spectrums of agents' trajectories on several "key" frequency
portions. Then the fine-level spectrum interpolation sub-network interpolates
the spectrums to reconstruct the final predictions. Experimental results
demonstrate the competitiveness and superiority of V-Net on both the ETH-UCY
benchmark and the Stanford Drone Dataset. Comment: Accepted to ECCV 202
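The V-Net abstract above says that low-frequency portions of a trajectory spectrum capture coarse motion trends and high-frequency portions capture fine variations. The sketch below illustrates that idea on one coordinate of a made-up trajectory, using a plain discrete Fourier transform and a low-pass filter. It is not the V-Net architecture. The helper names `dft`, `idft`, and `low_pass` and the sample values are assumptions made for illustration.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def low_pass(signal, keep):
    """Rebuild the signal from its `keep` lowest frequency bins.

    Bins k < keep are kept together with their conjugate mirrors k > n - keep,
    so the reconstruction stays real-valued.
    """
    n = len(signal)
    spectrum = dft(signal)
    filtered = [c if (k < keep or k > n - keep) else 0 for k, c in enumerate(spectrum)]
    return idft(filtered)


# x-coordinates of one agent over 8 frames (illustrative values only).
xs = [0.0, 1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1]
coarse = low_pass(xs, 2)                     # coarse trend from the lowest bins
fine = [x - c for x, c in zip(xs, coarse)]   # remaining high-frequency detail
```

Keeping only bin 0 returns the mean position, and keeping every bin returns the original sequence. The bins in between add progressively finer motion detail, which is the hierarchy the abstract refers to.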
MSN: Multi-Style Network for Trajectory Prediction
Trajectory prediction aims at forecasting agents' possible future locations
considering their observations along with the video context. It is strongly
required by many autonomous platforms and tasks, such as tracking, detection, robot
navigation, self-driving cars, and many other computer vision applications.
Agents' internal personality factors, interactive behaviors with
the neighborhood, and the influence of surroundings may all
affect agents' future plans. However, many previous methods model and
predict agents' behaviors with the same strategy or a "single" feature
distribution, making it difficult for them to give predictions with sufficient style
differences. This manuscript proposes the Multi-Style Network (MSN), which
uses two sub-networks, style hypothesis and stylized prediction, to give
agents multi-style predictions adaptively in a novel categorical way. We use
agents' end-point planning and their interaction context as the basis for the
behavior classification, so as to adaptively learn multiple diverse behavior
styles through a series of style channels in the network. Then, we assume one
by one that the target agents will plan their future behaviors according to
each of these categorized styles, thus utilizing different style channels to
give a series of predictions with significant style differences in parallel.
Experiments show that the proposed MSN outperforms current state-of-the-art
methods by up to 10%-20% quantitatively on two widely used datasets, and
presents better multi-style characteristics qualitatively.