80 research outputs found

    Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations

    Contrastive learning has been proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods utilize carefully designed augmentations to generate different movement patterns of skeletons for the same semantics. However, applying strong augmentations, which distort the structure of images or skeletons and cause semantic loss, remains an open problem because they make training unstable. In this paper, we investigate the potential of adopting strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradually growing augmentation policy to generate multiple ordered positive pairs, which guide the learned representations toward consistency across different views. Then, an asymmetric loss is proposed to enforce the hierarchical consistency via a directional clustering operation in the feature space, pulling the representations from strongly augmented views closer to those from weakly augmented views for better generalizability. Meanwhile, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms state-of-the-art methods on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD. Comment: Accepted by AAAI 2023. Project page: https://jhang2020.github.io/Projects/HiCLR/HiCLR.htm

    Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning

    Self-supervised learning has proved effective for skeleton-based human action understanding, which is an important yet challenging topic. Previous works mainly rely on the contrastive learning or masked motion modeling paradigms to model the skeleton relations. However, these methods cannot handle sequence-level and joint-level representation learning effectively and simultaneously. As a result, the learned representations fail to generalize to different downstream tasks. Moreover, combining these two paradigms in a naive manner leaves the synergy between them untapped and can lead to interference in training. To address these problems, we propose Prompted Contrast with Masked Motion Modeling, PCM$^{\rm 3}$, for versatile 3D action representation learning. Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner, which substantially boosts the generalization capacity for various downstream tasks. Specifically, masked prediction provides novel training views for contrastive learning, which in turn guides the masked prediction training with high-level semantic information. Moreover, we propose a dual-prompted multi-task pretraining strategy, which further improves model representations by reducing the interference caused by learning the two different pretext tasks. Extensive experiments on five downstream tasks under three large-scale datasets are conducted, demonstrating the superior generalization capacity of PCM$^{\rm 3}$ compared to state-of-the-art works. Our project is publicly available at: https://jhang2020.github.io/Projects/PCM3/PCM3.html. Comment: Accepted by ACM Multimedia 202

    A deep learning framework based on Koopman operator for data-driven modeling of vehicle dynamics

    Autonomous vehicles and driving technologies have received notable attention in the past decades. In autonomous driving systems, information on vehicle dynamics is required in most cases for designing motion planning and control algorithms. However, identifying a global model of vehicle dynamics is nontrivial due to strong nonlinearity and uncertainty. Many efforts have resorted to machine learning techniques for building data-driven models, but these may suffer from poor interpretability and result in complex nonlinear representations. In this paper, we propose a deep learning framework relying on an interpretable Koopman operator to build a data-driven predictor of the vehicle dynamics. The main idea is to use the Koopman operator to represent the nonlinear dynamics in a linear lifted feature space. The approach results in a global model that integrates the dynamics in both longitudinal and lateral directions. As the core contribution, we propose a deep learning-based extended dynamic mode decomposition (Deep EDMD) algorithm to learn a finite approximation of the Koopman operator. Different from other machine learning-based approaches, deep neural networks play the role of learning feature representations for EDMD in the framework of the Koopman operator. Simulation results in a high-fidelity CarSim environment are reported, which show the capability of the Deep EDMD approach in multi-step prediction of vehicle dynamics over a wide operating range. Also, the proposed approach outperforms the EDMD method, the multi-layer perceptron (MLP) method, and the Extreme Learning Machines-based EDMD (ELM-EDMD) method in terms of modeling performance. Finally, we design a linear MPC with Deep EDMD (DE-MPC) for reference tracking and test the controller in the CarSim environment. Comment: 12 pages, 10 figures, 1 table, and 2 algorithms
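
    The lifting idea in the abstract above can be illustrated with plain EDMD, the baseline that Deep EDMD extends. The following is only a minimal sketch under stated assumptions: the dictionary `lift` is hand-picked rather than learned by a neural network, the toy map `step` and its coefficients are hypothetical and unrelated to vehicle dynamics, and the small solver exists only to avoid external dependencies.

```python
import random

def lift(x):
    """Hypothetical hand-picked dictionary: psi(x) = [x1, x2, x1^2]."""
    x1, x2 = x
    return [x1, x2, x1 * x1]

def step(x, a=0.9, b=0.5, c=0.2):
    """Toy nonlinear map (an assumption) used only to generate data."""
    x1, x2 = x
    return (a * x1, b * x2 + c * x1 * x1)

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def edmd(pairs):
    """Least-squares K such that psi(y) ~ K psi(x) over all (x, y) pairs."""
    n = len(lift(pairs[0][0]))
    G = [[0.0] * n for _ in range(n)]  # sum of psi(x) psi(x)^T
    A = [[0.0] * n for _ in range(n)]  # sum of psi(x) psi(y)^T
    for x, y in pairs:
        px, py = lift(x), lift(y)
        for i in range(n):
            for j in range(n):
                G[i][j] += px[i] * px[j]
                A[i][j] += px[i] * py[j]
    Kt = solve(G, A)  # normal equations: G K^T = A
    return [[Kt[j][i] for j in range(n)] for i in range(n)]

def predict(K, x0, steps):
    """Multi-step prediction: iterate the linear model in the lifted space."""
    z = lift(x0)
    for _ in range(steps):
        z = [sum(K[i][j] * z[j] for j in range(len(z))) for i in range(len(z))]
    return z[:2]  # the first two observables are the state itself

random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
K = edmd([(x, step(x)) for x in samples])
```

    For this toy map the chosen dictionary is closed under the dynamics, so the fitted `K` matches the exact linear lifted model up to rounding and `predict` reproduces repeated calls to `step`. For real vehicle data no such closed dictionary is known in advance, which is the gap that learned features are meant to fill.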

    S$^{5}$Mars: Semi-Supervised Learning for Mars Semantic Segmentation

    Deep learning has become a powerful tool for Mars exploration. Mars terrain semantic segmentation is an important Martian vision task, which is the basis of rover autonomous planning and safe driving. However, there is a lack of sufficiently detailed and high-confidence data annotations, which most deep learning methods require to obtain a good model. To address this problem, we propose our solution from the perspective of joint data and method design. We first present a new dataset S$^{5}$Mars for Semi-SuperviSed learning on Mars Semantic Segmentation, which contains 6K high-resolution images and is sparsely annotated based on confidence, ensuring the high quality of labels. Then, to learn from this sparse data, we propose a semi-supervised learning (SSL) framework for Mars image semantic segmentation that learns representations from limited labeled data. Unlike existing SSL methods, which mostly target Earth image data, our method takes into account Mars data characteristics. Specifically, we first investigate the impact of widely used natural image augmentations on Mars images. Based on the analysis, we then propose two novel and effective augmentations for SSL of Mars segmentation, AugIN and SAM-Mix, which serve as strong augmentations to boost the model performance. Meanwhile, to fully leverage the unlabeled data, we introduce a soft-to-hard consistency learning strategy, learning from different targets based on prediction confidence. Experimental results show that our method outperforms state-of-the-art SSL approaches by a remarkable margin. Our proposed dataset is available at https://jhang2020.github.io/S5Mars.github.io/

    Learning-based Predictive Control for Nonlinear Systems with Unknown Dynamics Subject to Safety Constraints

    Model predictive control (MPC) has been widely employed as an effective method for model-based constrained control. For systems with unknown dynamics, reinforcement learning (RL) and adaptive dynamic programming (ADP) have received notable attention for solving adaptive optimal control problems. Recently, works on the use of RL in the framework of MPC have emerged, which can enhance the ability of MPC for data-driven control. However, safety under state constraints and closed-loop robustness are difficult to verify due to the approximation errors of RL with function approximation structures. To address this problem, we propose a data-driven robust MPC solution based on incremental RL, called data-driven robust learning-based predictive control (dr-LPC), for perturbed unknown nonlinear systems subject to safety constraints. A data-driven robust MPC (dr-MPC) is first formulated with a learned predictor. The incremental Dual Heuristic Programming (DHP) algorithm using an actor-critic architecture is then utilized to solve the online optimization problem of dr-MPC. In each prediction horizon, the actor and critic learn time-varying laws for approximating the optimal control policy and costate, respectively, which is different from classical MPCs. The state and control constraints are enforced in the learning process by building a Hamilton-Jacobi-Bellman (HJB) equation and a regularized actor-critic learning structure using logarithmic barrier functions. The closed-loop robustness and safety of the dr-LPC are proven under function approximation errors. Simulation results on two control examples are reported, which show that the dr-LPC can outperform the DHP and dr-MPC in terms of state regulation, and that its average computational time is much smaller than that of the dr-MPC in both examples. Comment: The paper has been submitted to an IEEE journal for possible publication

    Overestimation of thermal emittance in solenoid scans due to coupled transverse motion

    The solenoid scan is a widely used method for the in-situ measurement of the thermal emittance in a photocathode gun. The popularity of this method is due to its simplicity and convenience, since all rf photocathode guns are equipped with an emittance compensation solenoid. This paper shows that the solenoid scan measurement overestimates the thermal emittance in the ordinary measurement configuration due to a weak quadrupole field (present in either the rf gun or the gun solenoid) followed by a rotation in the solenoid. This coupled transverse dynamics aberration introduces a correlation between the beam's horizontal and vertical motion, leading to an increase in the measured 2D transverse emittance and thus to an overestimation of the thermal emittance. This effect was systematically studied using both analytic expressions and numerical simulations. These studies were experimentally verified using an L-band 1.6-cell rf photocathode gun with a cesium telluride cathode, which shows a thermal emittance overestimation of 35% with an rms laser spot size of 2.7 mm. The paper concludes by showing that the accuracy of the solenoid scan can be improved by using a quadrupole magnet corrector, consisting of a pair of normal and skew quadrupole magnets. Comment: 12 pages, 13 figures
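
    The coupling mechanism described above (a quadrupole kick followed by a rotation) can be checked with a toy calculation. This is a sketch under simplifying assumptions, not the paper's analysis: the beam is round and initially uncorrelated, the quadrupole is a thin lens of strength `k`, the solenoid is reduced to a pure rotation of the transverse plane by `theta`, and the function name and all numbers are hypothetical.

```python
import math

def projected_emittance(sigma, sigma_p, k, theta):
    """rms x-emittance after a thin quadrupole kick followed by a rotation.

    Assumes a round, uncorrelated beam with rms size `sigma` and rms
    divergence `sigma_p` (arbitrary units, illustrative only).
    """
    # Second moments after the kick x' -> x' - k*x and y' -> y' + k*y.
    xx, xxp, xpxp = sigma**2, -k * sigma**2, sigma_p**2 + (k * sigma)**2
    yy, yyp, ypyp = sigma**2, +k * sigma**2, sigma_p**2 + (k * sigma)**2
    # The rotation mixes the planes: x_new = cos(theta)*x + sin(theta)*y,
    # and likewise for the angles. x and y are uncorrelated, so the
    # second moments add with weights cos^2 and sin^2.
    c2, s2 = math.cos(theta)**2, math.sin(theta)**2
    a = c2 * xx + s2 * yy
    b = c2 * xxp + s2 * yyp
    d = c2 * xpxp + s2 * ypyp
    # The projected 2D emittance is the square root of the determinant.
    return math.sqrt(a * d - b * b)

eps_no_rotation = projected_emittance(2.0, 0.5, 0.1, 0.0)
eps_rotated = projected_emittance(2.0, 0.5, 0.1, math.pi / 4)
```

    In this toy model the result reduces to sqrt(sigma^2 sigma_p^2 + k^2 sigma^4 sin^2(2 theta)), so the projected emittance equals the true value only when the kick or the rotation vanishes, and the excess grows with the beam size. That is qualitatively consistent with the overestimation the abstract reports for a large laser spot.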