Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation
We propose a novel method based on teacher-student learning framework for 3D
human pose estimation without any 3D annotation or side information. To solve
this unsupervised-learning problem, the teacher network adopts
pose-dictionary-based modeling for regularization to estimate a physically
plausible 3D pose. To handle the decomposition ambiguity in the teacher
network, we propose a cycle-consistent architecture promoting a 3D
rotation-invariant property to train the teacher network. To further improve
the estimation accuracy, the student network adopts a novel graph convolution
network for flexibility to directly estimate the 3D coordinates. Another
cycle-consistent architecture promoting 3D rotation-equivariant property is
adopted to exploit geometry consistency, together with knowledge distillation
from the teacher network to improve the pose estimation performance. We conduct
extensive experiments on Human3.6M and MPI-INF-3DHP. Our method reduces the 3D
joint prediction error by 11.4% compared to state-of-the-art unsupervised
methods and also outperforms many weakly-supervised methods that use side
information on Human3.6M. Code will be available at
https://github.com/sjtuxcx/ITES.
Comment: Accepted in AAAI 202
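The rotation-invariance objective described above can be illustrated with a toy cycle-consistency check: rotate an estimated 3D pose, re-project it to 2D, lift it again, and compare. The helpers below (`random_rotation`, the orthographic `project`, and the `lifter` callback) are hypothetical stand-ins, not the paper's actual architecture:

```python
import numpy as np

def random_rotation(rng):
    # Random 3D rotation via QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1  # ensure a proper rotation (det = +1)
    return q

def project(pose3d):
    # Orthographic projection onto the xy-plane.
    return pose3d[:, :2]

def cycle_consistency_loss(pose3d, lifter, rng):
    # Rotate the estimated pose, re-project, lift again, and compare:
    # a rotation-invariant lifter should reproduce the rotated pose.
    R = random_rotation(rng)
    rotated = pose3d @ R.T
    relifted = lifter(project(rotated))
    return float(np.mean((relifted - rotated) ** 2))
```

A lifter that is consistent under arbitrary rotations drives this loss to zero, which is the property the cycle-consistent teacher training promotes.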
FedDisco: Federated Learning with Discrepancy-Aware Collaboration
This work considers the category distribution heterogeneity in federated
learning. This issue is due to biased labeling preferences at multiple clients
and is a typical setting of data heterogeneity. To alleviate this issue, most
previous works consider either regularizing local models or fine-tuning the
global model, while they ignore the adjustment of aggregation weights and
simply assign weights based on the dataset size. However, based on our
empirical observations and theoretical analysis, we find that dataset size
alone is not optimal, and that the discrepancy between local and global category
distributions can be a beneficial, complementary indicator for determining
aggregation weights. We thus propose a novel aggregation method, Federated
Learning with Discrepancy-aware Collaboration (FedDisco). Its aggregation
weights incorporate both the dataset size and the discrepancy value, and they
lead to a tighter theoretical upper bound on the optimization error.
FedDisco also promotes privacy-preservation, communication and computation
efficiency, as well as modularity. Extensive experiments show that our FedDisco
outperforms several state-of-the-art methods and can be easily incorporated
with many existing methods to further enhance the performance. Our code will be
available at https://github.com/MediaBrain-SJTU/FedDisco.
Comment: Accepted by International Conference on Machine Learning (ICML 2023)
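The discrepancy-aware weighting can be sketched as below. The linear form and the constants `a` and `b` are illustrative assumptions, and the paper's exact formulation may differ:

```python
import numpy as np

def disco_weights(sizes, discrepancies, a=0.5, b=0.1):
    """Aggregation weights from dataset size and category-distribution
    discrepancy: a hypothetical linear form relu(n_k - a*d_k + b),
    normalized to sum to one."""
    sizes = np.asarray(sizes, dtype=float)
    sizes = sizes / sizes.sum()                 # normalized dataset sizes
    d = np.asarray(discrepancies, dtype=float)  # e.g. gap between local and
    w = np.maximum(sizes - a * d + b, 0.0)      # global label histograms
    return w / w.sum()
```

Clients whose local label distribution deviates more from the global one receive smaller weights, while equally-sized, equally-discrepant clients are weighted uniformly.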
ADD: An Automatic Desensitization Fisheye Dataset for Autonomous Driving
Autonomous driving systems require many images to analyze the surrounding
environment. However, private information in these captured images, such as
pedestrian faces and vehicle license plates, receives little protection, which
has become a significant issue. In this paper, in response to the call of data
security laws and regulations, and leveraging the large field of view (FoV) of
fisheye cameras, we build the first Autopilot Desensitization Dataset, called
ADD, and formulate the first
deep-learning-based image desensitization framework, to promote the study of
image desensitization in autonomous driving scenarios. The compiled dataset
consists of 650K images, including different face and vehicle license plate
information captured by the surround-view fisheye camera. It covers various
autonomous driving scenarios, including diverse facial characteristics and
license plate colors. Then, we propose an efficient multitask desensitization
network called DesCenterNet as a benchmark on the ADD dataset, which can
perform face and vehicle license plate detection and desensitization tasks.
Based on ADD, we further provide an evaluation criterion for desensitization
performance, and extensive comparison experiments verify the effectiveness and
superiority of our method on image desensitization.
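A minimal sketch of the desensitization step itself, assuming bounding boxes come from a detector such as the paper's DesCenterNet; the mosaic kernel and `(x0, y0, x1, y1)` box format are assumptions:

```python
import numpy as np

def desensitize(image, boxes, ksize=8):
    """Mosaic-blur each detected region (face or license plate).
    `boxes` are (x0, y0, x1, y1) pixel boxes from a hypothetical detector."""
    out = image.copy()
    for x0, y0, x1, y1 in boxes:
        patch = out[y0:y1, x0:x1]
        h, w = patch.shape[:2]
        for i in range(0, h, ksize):       # replace each ksize x ksize
            for j in range(0, w, ksize):   # block with its mean value
                block = patch[i:i + ksize, j:j + ksize]
                block[...] = block.mean(axis=(0, 1), keepdims=True)
    return out
```

The mosaic destroys identifying detail inside each box while leaving the rest of the frame untouched, which is the behavior an evaluation criterion for desensitization would measure.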
Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning
Decentralized and lifelong-adaptive multi-agent collaborative learning aims
to enhance collaboration among multiple agents without a central server, with
each agent solving varied tasks over time. To achieve efficient collaboration,
agents should: i) autonomously identify beneficial collaborative relationships
in a decentralized manner; and ii) adapt to dynamically changing task
observations. In this paper, we propose DeLAMA, a decentralized multi-agent
lifelong collaborative learning algorithm with dynamic collaboration graphs. To
promote autonomous collaboration relationship learning, we propose a
decentralized graph structure learning algorithm, eliminating the need for
external priors. To facilitate adaptation to dynamic tasks, we design a memory
unit to capture the agents' accumulated learning history and knowledge, while
preserving finite storage consumption. To further augment the system's
expressive capabilities and computational efficiency, we apply algorithm
unrolling, leveraging the advantages of both mathematical optimization and
neural networks. This allows the agents to "learn to collaborate" through the
supervision of training tasks. Our theoretical analysis verifies that
inter-agent collaboration is communication efficient under a small number of
communication rounds. The experimental results verify its ability to facilitate
the discovery of collaboration strategies and adaptation to dynamic learning
scenarios, achieving a 98.80% reduction in MSE and a 188.87% improvement in
classification accuracy. We expect our work can serve as a foundational
technique to facilitate future works towards an intelligent, decentralized, and
dynamic multi-agent system. Code is available at
https://github.com/ShuoTang123/DeLAMA.
Comment: 23 pages, 15 figures
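One way to picture the collaboration-graph idea is a similarity-based graph plus one round of graph-weighted model mixing. Both functions below are simplified stand-ins for DeLAMA's unrolled optimization, with `temperature` and `alpha` as assumed hyperparameters:

```python
import numpy as np

def learn_graph(features, temperature=1.0):
    """Infer a collaboration graph from task-feature similarity, a
    hypothetical stand-in for the decentralized graph learner."""
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    np.fill_diagonal(sim, -np.inf)              # no self-edges before softmax
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)     # row-stochastic edge weights

def collaborate(models, graph, alpha=0.5):
    """One round of decentralized aggregation: each agent mixes its own
    model with a graph-weighted average of its neighbours' models."""
    models = np.asarray(models, dtype=float)
    return alpha * models + (1 - alpha) * graph @ models
```

Agents with similar task features end up with stronger edges, so repeated rounds pull their models together without any central server.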
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
Exploring spatial-temporal dependencies from observed motions is one of the
core challenges of human motion prediction. Previous methods mainly focus on
dedicated network structures to model the spatial and temporal dependencies.
This paper considers a new direction by introducing a model learning framework
with auxiliary tasks. In our auxiliary tasks, the coordinates of a subset of
body joints are corrupted by either masking or adding noise, and the goal is to
recover the corrupted coordinates from the remaining ones. To work with auxiliary
tasks, we propose a novel auxiliary-adapted transformer, which can handle
incomplete, corrupted motion data and achieve coordinate recovery via capturing
spatial-temporal dependencies. Through auxiliary tasks, the auxiliary-adapted
transformer is promoted to capture more comprehensive spatial-temporal
dependencies among body joints' coordinates, leading to better feature
learning. Extensive experimental results have shown that our method outperforms
state-of-the-art methods by remarkable margins of 7.2%, 3.7%, and 9.4% in terms
of 3D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap, and
3DPW datasets, respectively. We also demonstrate that our method is more
robust to missing and noisy data. Code is available at
https://github.com/MediaBrain-SJTU/AuxFormer.
Comment: Accepted to ICCV 202
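The two auxiliary corruptions can be sketched as follows; the ratios, noise level, and exact sampling scheme are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def corrupt_joints(motion, mask_ratio=0.2, noise_std=0.05, rng=None):
    """Build the two auxiliary tasks: mask a random subset of joint
    coordinates to zero, and add Gaussian noise to a disjoint subset.
    `motion` has shape (frames, joints, 3)."""
    if rng is None:
        rng = np.random.default_rng()
    t, j, _ = motion.shape
    corrupted = motion.copy()
    masked = rng.random((t, j)) < mask_ratio
    noised = (~masked) & (rng.random((t, j)) < mask_ratio)
    corrupted[masked] = 0.0                            # masking task
    corrupted[noised] += rng.normal(scale=noise_std,
                                    size=(noised.sum(), 3))  # noising task
    return corrupted, masked, noised
```

Training the transformer to recover `motion` from `corrupted` is what forces it to capture spatial-temporal dependencies across the uncorrupted joints.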
EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
Learning to predict agent motions with relationship reasoning is important
for many applications. In motion prediction tasks, maintaining motion
equivariance under Euclidean geometric transformations and invariance of agent
interaction is a critical and fundamental principle. However, such equivariance
and invariance properties are overlooked by most existing methods. To fill this
gap, we propose EqMotion, an efficient equivariant motion prediction model with
invariant interaction reasoning. To achieve motion equivariance, we propose an
equivariant geometric feature learning module to learn a Euclidean
transformable feature through dedicated designs of equivariant operations. To
reason about agents' interactions, we propose an invariant interaction reasoning
module to achieve more stable interaction modeling. To further promote more
comprehensive motion features, we propose an invariant pattern feature learning
module to learn an invariant pattern feature, which cooperates with the
equivariant geometric feature to enhance network expressiveness. We conduct
experiments for the proposed model on four distinct scenarios: particle
dynamics, molecule dynamics, human skeleton motion prediction and pedestrian
trajectory prediction. Experimental results show that our method is not only
generally applicable, but also achieves state-of-the-art prediction
performance on all four tasks, with improvements of 24.0%, 30.1%, 8.6%, and
9.2%, respectively. Code is available at
https://github.com/MediaBrain-SJTU/EqMotion.
Comment: Accepted to CVPR 202
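The core equivariance property, f(x Rᵀ) = f(x) Rᵀ for any rotation R, can be checked numerically. The linear layer below is a minimal illustration of a map that satisfies it (mixing only across the joint/agent axis commutes with coordinate rotations); it is not EqMotion's actual feature-learning module:

```python
import numpy as np

def equivariant_layer(x, weights):
    """A minimal rotation-equivariant linear layer: (W x) R^T = W (x R^T),
    since W mixes rows (agents/joints) and R acts on coordinate columns.
    Shapes: x is (n, 3), weights is (m, n)."""
    return weights @ x

def is_equivariant(layer, x, R, **kw):
    # f(x R^T) == f(x) R^T must hold for an equivariant map.
    return np.allclose(layer(x @ R.T, **kw), layer(x, **kw) @ R.T)
```

A pointwise nonlinearity applied to raw coordinates, by contrast, fails this check, which is why equivariant networks need dedicated operation designs.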
Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting
In multi-modal multi-agent trajectory forecasting, two major challenges have
not been fully tackled: 1) how to measure the uncertainty brought by the
interaction module that causes correlations among the predicted trajectories of
multiple agents; 2) how to rank the multiple predictions and select the optimal
predicted trajectory. In order to handle these challenges, this work first
proposes a novel concept, collaborative uncertainty (CU), which models the
uncertainty resulting from interaction modules. Then we build a general
CU-aware regression framework with an original permutation-equivariant
uncertainty estimator to do both tasks of regression and uncertainty
estimation. Further, we apply the proposed framework to current SOTA
multi-agent multi-modal forecasting systems as a plugin module, which enables
the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal
trajectory forecasting task; 2) rank the multiple predictions and select the
optimal one based on the estimated uncertainty. We conduct extensive
experiments on a synthetic dataset and two public large-scale multi-agent
trajectory forecasting benchmarks. Experiments show that: 1) on the synthetic
dataset, the CU-aware regression framework allows the model to appropriately
approximate the ground-truth Laplace distribution; 2) on the multi-agent
trajectory forecasting benchmarks, the CU-aware regression framework steadily
helps SOTA systems improve their performances. Specifically, the proposed
framework helps VectorNet reduce the Final Displacement Error of the chosen
optimal prediction by 262 cm on the nuScenes dataset; 3) for
multi-agent multi-modal trajectory forecasting systems, prediction uncertainty
is positively correlated with future stochasticity; and 4) the estimated CU
values are highly related to the interactive information among agents.
Comment: arXiv admin note: text overlap with arXiv:2110.1394
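The uncertainty-aware regression objective can be sketched with a per-coordinate Laplace negative log-likelihood; this is the single-agent simplification, whereas collaborative uncertainty additionally models the cross-agent correlations introduced by the interaction module:

```python
import numpy as np

def laplace_nll(pred, target, log_scale):
    """Laplace negative log-likelihood: predict both the trajectory and a
    scale b = exp(log_scale); minimizing log(2b) + |y - mu| / b trains the
    scale as an uncertainty estimate (large errors push b upward)."""
    b = np.exp(log_scale)
    return float(np.mean(np.log(2.0 * b) + np.abs(target - pred) / b))
```

The scale term is what lets the system rank its multiple predictions: lower predicted uncertainty identifies the candidate trajectory to select.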