Kernelized Similarity Learning and Embedding for Dynamic Texture Synthesis
Dynamic texture (DT) exhibits statistical stationarity in the spatial domain
and stochastic repetitiveness in the temporal dimension, indicating that
different frames of DT possess a high similarity correlation that is critical
prior knowledge. However, existing methods cannot effectively learn a promising
synthesis model for high-dimensional DT from a small amount of training data.
In this paper, we propose a novel DT synthesis method, which makes full use of
similarity prior knowledge to address this issue. Our method is based on the
proposed kernel similarity embedding, which not only mitigates the
high-dimensionality and small-sample issues but also models nonlinear feature
relationships. Specifically, we first raise two hypotheses that are essential
for a DT model to generate new frames using similarity correlation. Then, we
integrate kernel learning and an extreme learning machine into a unified
synthesis model to learn a kernel similarity embedding for
representing DT. Extensive experiments on DT videos collected from the internet
and two benchmark datasets, i.e., Gatech Graphcut Textures and Dyntex,
demonstrate that the learned kernel similarity embedding effectively yields a
discriminative representation for DT. Accordingly, our method is
capable of preserving the long-term temporal continuity of the synthesized DT
sequences with excellent sustainability and generalization. Meanwhile, it
generates realistic DT videos at higher speed and lower computational cost
than state-of-the-art methods. The code and more synthesized videos are
available at our project page:
https://shiming-chen.github.io/Similarity-page/Similarit.html
Comment: 13 pages, 12 figures, 2 tables
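The core idea above — a kernel similarity embedding of frames plus an ELM-style closed-form regressor that maps each frame to its successor — can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the toy frame data, the RBF bandwidth `gamma`, and the ridge term are all placeholder assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1e-2):
    # Pairwise RBF similarities between flattened frames
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
frames = rng.standard_normal((30, 64))   # 30 toy "frames", 64 pixels each

X, Y = frames[:-1], frames[1:]           # learn a frame-to-next-frame map
K = rbf_kernel(X, X)                     # kernel similarity embedding of the frames
# ELM-style closed-form output weights via ridge regression
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(K)), Y)

# Synthesize new frames by rolling the learned map forward
f = frames[-1]
synthesized = []
for _ in range(5):
    f = (rbf_kernel(f[None, :], X) @ alpha).ravel()
    synthesized.append(f)
```

Because the regressor has a closed-form solution, training is a single linear solve, which is consistent with the fast, low-computation synthesis the abstract claims.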
Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums
With the fast development of AI-related techniques, applications of
trajectory prediction are no longer limited to simple scenes and trajectories.
More and more heterogeneous trajectories with different representation forms,
such as 2D or 3D coordinates, 2D or 3D bounding boxes, and even
high-dimensional human skeletons, need to be analyzed and forecasted. Among
these heterogeneous trajectories, interactions between different elements
within a frame of trajectory, which we call the ``Dimension-Wise
Interactions'', are more complex and challenging. However, most previous
approaches focus on a specific form of trajectory, which means these
methods cannot be used to forecast heterogeneous trajectories, let alone
model the dimension-wise interactions. Besides, previous methods mostly treat
trajectory prediction as an ordinary time-sequence generation task, which
makes it difficult for them to directly analyze agents' behaviors and social
interactions at different temporal scales. In this paper, we bring a new
``view'' to trajectory prediction: we model and forecast trajectories
hierarchically according to different frequency portions in the spectral
domain, learning to forecast trajectories by considering their frequency
responses. Moreover, we expand the current trajectory prediction task by
introducing the dimension from ``another view'', thus extending its
application scenarios to heterogeneous trajectories vertically. Finally, we
adopt the bilinear structure to fuse two factors, including the frequency
response and the dimension-wise interaction, to forecast heterogeneous
trajectories via spectrums hierarchically in a generic way. Experiments show
that the proposed model outperforms most state-of-the-art methods on ETH-UCY,
the Stanford Drone Dataset, and nuScenes with heterogeneous trajectories,
including 2D coordinates and 2D and 3D bounding boxes.
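The hierarchical "frequency portions" view can be illustrated with a simple spectral split: transform an observed trajectory into the frequency domain, keep the lowest coefficients as the coarse motion trend, and treat the residual as the fine portion. This is only a sketch of the decomposition idea under assumed settings (toy 2D path, DFT as the transform, a cutoff `n_low`); the paper's actual hierarchical network is more involved.

```python
import numpy as np

def split_spectrum(traj, n_low=3):
    # traj: (T, D) trajectory; take the DFT of each dimension along time
    spec = np.fft.rfft(traj, axis=0)
    low = spec.copy()
    low[n_low:] = 0                      # keep only the coarse (low-frequency) portion
    high = spec - low                    # remaining fine (high-frequency) portion
    coarse = np.fft.irfft(low, n=traj.shape[0], axis=0)
    fine = np.fft.irfft(high, n=traj.shape[0], axis=0)
    return coarse, fine

t = np.linspace(0.0, 1.0, 16)
traj = np.stack([t, np.sin(2 * np.pi * 2.0 * t)], axis=1)  # toy 2D path
coarse, fine = split_spectrum(traj)
```

The two portions sum back to the original trajectory exactly, so a model can forecast the coarse trend first and refine it with the high-frequency residual.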
Semantic-visual Guided Transformer for Few-shot Class-incremental Learning
Few-shot class-incremental learning (FSCIL) has recently attracted extensive
attention in various areas. Existing FSCIL methods highly depend on the
robustness of the feature backbone pre-trained on base classes. In recent
years, different Transformer variants have made significant progress in
feature representation learning across many fields. Nevertheless, Transformers
in FSCIL scenarios have so far not realized the potential they have shown
elsewhere. In this paper, we develop a semantic-visual
guided Transformer (SV-T) to enhance the feature-extraction capacity of the
pre-trained feature backbone on incremental classes. Specifically, we first
utilize the visual (image) labels provided by the base classes to supervise the
optimization of the Transformer. Then, a text encoder is introduced to
automatically generate the corresponding semantic (text) labels for each image
from the base classes. Finally, the constructed semantic labels are further
applied to the Transformer to guide its hyperparameter updates. Our SV-T
can take full advantage of more supervision information from base classes and
further enhance the training robustness of the feature backbone. More
importantly, our SV-T is an independent method that can be directly applied to
existing FSCIL architectures for acquiring embeddings of various incremental
classes. Extensive experiments on three benchmarks, two FSCIL architectures,
and two Transformer variants show that our proposed SV-T obtains a significant
improvement in comparison to the existing state-of-the-art FSCIL methods.
Comment: Accepted by IEEE International Conference on Multimedia and Expo
(ICME 2023)
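One step in the pipeline above — automatically attaching a semantic (text) label to each image — can be sketched as nearest-neighbor matching between image embeddings and class-name text embeddings. Everything here is illustrative: the random embeddings stand in for outputs of the (unspecified) visual backbone and text encoder, and the cosine-similarity matching is an assumed mechanism, not necessarily the paper's.

```python
import numpy as np

def assign_semantic_labels(img_emb, txt_emb):
    # For each image embedding, pick the most similar class-name embedding
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)      # index of the nearest text label

rng = np.random.default_rng(1)
txt_emb = rng.standard_normal((5, 32))                # 5 class-name embeddings
labels = rng.integers(0, 5, size=20)                  # ground-truth classes
img_emb = txt_emb[labels] + 0.1 * rng.standard_normal((20, 32))  # noisy images
pred = assign_semantic_labels(img_emb, txt_emb)
```

The predicted semantic labels can then serve as an extra supervision signal alongside the visual labels when updating the backbone.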
BGM: Building a Dynamic Guidance Map without Visual Images for Trajectory Prediction
Visual images usually contain the informative context of the environment,
thereby helping to predict agents' behaviors. However, they can hardly capture
dynamic effects on agents' actual behaviors, since their semantics are fixed.
To solve this problem, we propose a deterministic model named BGM that
constructs a guidance map to represent the dynamic semantics, which avoids
using visual images while still reflecting, for each agent, how activities
differ across periods. We first record all agents' activities in the scene
within a recent period to construct a guidance map and then feed it to a
Context CNN to obtain their context features. We adopt a Historical Trajectory
Encoder to extract the trajectory features and then combine them with the
context features as the input of a social-energy-based trajectory decoder,
thus obtaining predictions that comply with social rules. Experiments
demonstrate that BGM achieves state-of-the-art prediction accuracy on the two
widely used ETH and UCY datasets and handles more complex scenarios.
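The guidance-map construction described above — accumulating recent agent positions into a scene-sized activity grid — can be sketched as a normalized 2D histogram. The grid size, coordinate bounds, and uniform toy positions are assumptions for illustration; the actual map in the paper may encode richer activity statistics.

```python
import numpy as np

def build_guidance_map(points, grid=(32, 32), bounds=[[0.0, 1.0], [0.0, 1.0]]):
    # Accumulate recently observed agent positions into a 2D activity histogram
    H, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                             bins=grid, range=bounds)
    return H / max(H.max(), 1.0)         # normalize activity to [0, 1]

rng = np.random.default_rng(2)
recent_positions = rng.random((200, 2))  # toy positions inside the unit square
gmap = build_guidance_map(recent_positions)
```

Such a map is rebuilt from the most recent observation window, so it tracks time-varying activity instead of the fixed semantics of a static scene image, and it can be fed to a CNN like any single-channel image.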
Towards Unsupervised Graph Completion Learning on Graphs with Features and Structure Missing
In recent years, graph neural networks (GNN) have achieved significant
developments in a variety of graph analytical tasks. Nevertheless, the
superior performance of GNNs degrades seriously when the collected node
features or structure relationships are partially missing owing to numerous
unpredictable factors. Recently emerged graph completion learning (GCL) has
received increasing attention, which aims to reconstruct the missing node
features or structure relationships under the guidance of a specifically
supervised task. Although these GCL methods have achieved great success, they
still suffer from the following problems: reliance on labels, and bias in the
reconstructed node features and structure relationships. Besides, the
generalization ability of the existing GCL still faces a huge challenge when
both collected node features and structure relationships are partially missing
at the same time. To solve the above issues, we propose a more general GCL
framework with the aid of self-supervised learning for improving the task
performance of the existing GNN variants on graphs with features and structure
missing, termed unsupervised GCL (UGCL). Specifically, to avoid the mismatch
between missing node features and structure during the message-passing process
of GNNs, we separate feature reconstruction from structure reconstruction and
design a personalized model for each in turn. Then, a dual contrastive loss on
the
structure level and feature level is introduced to maximize the mutual
information of node representations from feature reconstructing and structure
reconstructing paths for providing more supervision signals. Finally, the
reconstructed node features and structure can be applied to the downstream node
classification task. Extensive experiments on eight datasets, three GNN
variants and five missing rates demonstrate the effectiveness of our proposed
method.
Comment: Accepted by the 23rd IEEE International Conference on Data Mining
(ICDM 2023)
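The dual contrastive loss described above — maximizing mutual information between node representations from the feature-reconstruction and structure-reconstruction paths — is typically instantiated as an InfoNCE-style objective where the same node's two views form the positive pair. The sketch below makes that assumption explicit; the toy representations, temperature `tau`, and the symmetric averaging are placeholders, not the paper's exact formulation.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    # Contrast node representations from the feature- and structure-
    # reconstruction paths; matching nodes sit on the diagonal.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau
    sim -= sim.max(axis=1, keepdims=True)             # numerical stability
    p = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    return -np.log(np.diag(p) + 1e-12).mean()

rng = np.random.default_rng(3)
z_feat = rng.standard_normal((64, 16))                          # feature-path reps
z_struct = z_feat + 0.05 * rng.standard_normal((64, 16))        # aligned structure-path reps
# Symmetrize over both directions ("dual" in the sense of two views)
dual_loss = 0.5 * (info_nce(z_feat, z_struct) + info_nce(z_struct, z_feat))
loss_random = info_nce(z_feat, rng.standard_normal((64, 16)))
```

Aligned views produce a markedly lower loss than unrelated representations, which is exactly the extra supervision signal the abstract describes.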