23 research outputs found

    Do we really need temporal convolutions in action segmentation?

    Action classification has made great progress, but segmenting and recognizing actions in long untrimmed videos remains a challenging problem. Most state-of-the-art methods focus on designing temporal convolution-based models, but the inflexibility of temporal convolutions and the difficulty of modeling long-term temporal dependencies restrict the potential of these models. Transformer-based models with adaptable sequence modeling capabilities have recently been used in various tasks. However, the lack of inductive bias and the inefficiency of handling long video sequences limit the application of Transformers in action segmentation. In this paper, we design a pure Transformer-based model without temporal convolutions by incorporating temporal sampling, called Temporal U-Transformer (TUT). The U-Transformer architecture reduces complexity while introducing an inductive bias that adjacent frames are more likely to belong to the same class, but the introduction of coarse resolutions results in the misclassification of boundaries. We observe that the similarity distribution between a boundary frame and its neighboring frames depends on whether the boundary frame is the start or end of an action segment. Therefore, we further propose a boundary-aware loss based on the distribution of similarity scores between frames from attention modules to enhance the ability to recognize boundaries. Extensive experiments show the effectiveness of our model.

    Learning to Sample Tasks for Meta Learning

    Through experiments on various meta-learning methods, task samplers, and few-shot learning tasks, this paper arrives at three conclusions. Firstly, there is no universal task sampling strategy that guarantees the performance of meta-learning models. Secondly, task diversity can cause the models to either underfit or overfit during training. Lastly, the generalization performance of the models is influenced by task divergence, task entropy, and task difficulty. In response to these findings, we propose a novel task sampler called Adaptive Sampler (ASr). ASr is a plug-and-play task sampler that uses task divergence, task entropy, and task difficulty to sample tasks. To optimize ASr, we rethink and propose a simple and general meta-learning algorithm. Finally, a large number of empirical experiments demonstrate the effectiveness of the proposed ASr. Comment: 10 pages, 7 tables, 3 figures

    Introducing Expertise Logic into Graph Representation Learning from A Causal Perspective

    Benefiting from the injection of human prior knowledge, graphs, as derived discrete data, are semantically dense, so models can efficiently learn semantic information from such data. Accordingly, graph neural networks (GNNs) have achieved impressive success in various fields. Revisiting the GNN learning paradigms, we discover that the relationship between human expertise and the knowledge modeled by GNNs still confuses researchers. To this end, we introduce motivating experiments and derive an empirical observation that human expertise is gradually learned by GNNs in general domains. By further observing the ramifications of introducing expertise logic into graph representation learning, we conclude that leading GNNs to learn human expertise can improve model performance. By exploring the intrinsic mechanism behind such observations, we elaborate the Structural Causal Model for the graph representation learning paradigm. Following the theoretical guidance, we introduce the auxiliary causal logic learning paradigm to help the model learn the expertise logic causally related to the graph representation learning task. In practice, a counterfactual technique is further applied to tackle the insufficient training issue during optimization. Extensive experiments on crafted and real-world domains support the consistent effectiveness of the proposed method.

    A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning

    Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks. On this basis, we express DS using deep channel-wise correlation and intermediate spatial distribution, and propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance. The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy. This is especially useful in specific applications where the performance gap between dual modalities is relatively large. Furthermore, we collect a real-world CML dataset to promote community development. The dataset contains more than 10,000 paired optical and radar images and is continuously being updated. Experimental results on real-world and benchmark datasets validate the effectiveness of the proposed method.

    Bootstrapping Informative Graph Augmentation via A Meta Learning Approach

    Recent works explore learning graph representations in a self-supervised manner. In graph contrastive learning, benchmark methods apply various graph augmentation approaches. However, most of the augmentation methods are non-learnable, which can produce unbeneficial augmented graphs. Such augmentation may degrade the representation ability of graph contrastive learning methods. Therefore, we propose to generate augmented graphs with a learnable graph augmenter, called MEta Graph Augmentation (MEGA). We then clarify that a "good" graph augmentation must have uniformity at the instance level and informativeness at the feature level. To this end, we propose a novel approach to learning a graph augmenter that can generate an augmentation with uniformity and informativeness. The objective of the graph augmenter is to promote our feature extraction network to learn a more discriminative feature representation, which motivates us to propose a meta-learning paradigm. Empirically, experiments across multiple benchmark datasets demonstrate that MEGA outperforms the state-of-the-art methods in graph self-supervised learning tasks. Further experimental studies demonstrate the effectiveness of the different terms of MEGA. Comment: Accepted by International Joint Conference on Artificial Intelligence (IJCAI) 202

    Using images rendered by PBRT to train faster R-CNN for UAV detection

    Deep neural networks, such as Faster R-CNN, have been widely used in object detection. However, deep neural networks usually require a large-scale dataset to achieve desirable performance. For the specific application of UAV detection, training data is extremely limited in practice. Since manually annotating a large number of UAV images can be very resource intensive and time consuming, we instead use PBRT to render a large number of photorealistic UAV images of high variation within a reasonable time. Using PBRT ensures the realism of the rendered images, which means they are indistinguishable from real photographs to some extent. Trained with our rendered images, the Faster R-CNN achieves an AP of 80.69% on a manually annotated UAV image test set, much higher than the model trained only on the COCO 2014 and PASCAL VOC 2012 datasets (43.36%). Moreover, our rendered image dataset contains not only bounding boxes of all UAVs, but also the locations of some important parts of UAVs and of all pixels covered by UAVs, which can be used for more complicated applications, such as mask detection or keypoint detection.

    Robust Causal Graph Representation Learning against Confounding Effects

    The prevailing graph neural network models have achieved significant progress in graph representation learning. However, in this paper, we uncover an ever-overlooked phenomenon: the pre-trained graph representation learning model tested with full graphs underperforms the model tested with well-pruned graphs. This observation reveals that there exist confounders in graphs, which may interfere with the model learning semantic information, and that current graph representation learning methods have not eliminated their influence. To tackle this issue, we propose Robust Causal Graph Representation Learning (RCGRL) to learn robust graph representations against confounding effects. RCGRL introduces an active approach to generate instrumental variables under unconditional moment restrictions, which empowers the graph representation learning model to eliminate confounders, thereby capturing discriminative information that is causally related to downstream predictions. We offer theorems and proofs to guarantee the theoretical effectiveness of the proposed approach. Empirically, we conduct extensive experiments on a synthetic dataset and multiple benchmark datasets. Experimental results demonstrate the effectiveness and generalization ability of RCGRL. Our code is available at https://github.com/hang53/RCGRL

    A Polarimetric Scattering Characteristics-Guided Adversarial Learning Approach for Unsupervised PolSAR Image Classification

    Highly accurate supervised deep learning-based classifiers for polarimetric synthetic aperture radar (PolSAR) images require large amounts of data with manual annotations. Unfortunately, the complex echo imaging mechanism results in a high labeling cost for PolSAR images. Extracting and transferring knowledge to utilize the existing labeled data to the fullest extent is a viable approach in such circumstances. To this end, we introduce unsupervised deep adversarial domain adaptation (ADA) into PolSAR image classification for the first time. In contrast to the standard learning paradigm, in this study, the deep learning model is trained on labeled data from a source domain and unlabeled data from a related but distinct target domain. The purpose of this is to extract domain-invariant features and generalize them to the target domain. Although the feature transferability of ADA methods can be ensured through adversarial training to align the feature distributions of source and target domains, improving feature discriminability remains a crucial issue. In this paper, we propose a novel polarimetric scattering characteristics-guided adversarial network (PSCAN) for unsupervised PolSAR image classification. Compared with classical ADA methods, we design an auxiliary task for PSCAN based on the polarimetric scattering characteristics-guided pseudo-label construction. This approach utilizes the rich information contained in the PolSAR data itself, without the need for expensive manual annotations or complex automatic labeling mechanisms. During the training of PSCAN, the auxiliary task receives category semantic information from pseudo-labels and helps promote the discriminability of the learned domain-invariant features, thereby enabling the model to have a better target prediction function. The effectiveness of the proposed method was demonstrated using data captured with different PolSAR systems in the San Francisco and Qingdao areas. Experimental results show that the proposed method can obtain satisfactory unsupervised classification results.

    Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from a Conditional Causal Perspective

    Few-shot learning models learn representations with limited human annotations, and such a learning paradigm demonstrates practicability in various tasks, e.g., image classification and object detection. However, few-shot object detection methods suffer from an intrinsic defect: the limited training data prevents the model from sufficiently exploring semantic information. To tackle this, we introduce knowledge distillation to the few-shot object detection learning paradigm. We further run a motivating experiment, which demonstrates that in the process of knowledge distillation, the empirical error of the teacher model degrades the prediction performance of the few-shot object detection model as the student. To understand the reasons behind this phenomenon, we revisit the learning paradigm of knowledge distillation on the few-shot object detection task from the causal theoretic standpoint, and accordingly develop a Structural Causal Model. Following the theoretical guidance, we propose a backdoor adjustment-based knowledge distillation method for the few-shot object detection task, namely Disentangle and Remerge (D&R), to perform conditional causal intervention toward the corresponding Structural Causal Model. Empirically, the experiments on benchmarks demonstrate that D&R can yield significant performance boosts in few-shot object detection. Code is available at https://github.com/ZYN-1101/DandR.git

    Integrated Proteomics and Lipidomics Investigation of the Mechanism Underlying the Neuroprotective Effect of N-benzylhexadecanamide

    Macamides are very important secondary metabolites produced by Lepidium meyenii Walp, which possess multiple bioactivities, especially in the neuronal system. In a previous study, we observed that macamides exhibited excellent effects in the recovery of injured nerves after 1-methyl-4-phenylpyridinium (MPP+)-induced dopaminergic neuronal damage in zebrafish. However, the mechanism underlying this effect remains unclear. In the present study, we observed that N-benzylhexadecanamide (XA), which is a typical constituent of macamides, improved the survival rate of neurons in vitro. We determined the concentration of neurotransmitters in MN9D cells and used it in conjunction with an integrated proteomics and lipidomics approach to investigate the mechanism underlying the neuroprotective effects of XA in an MPP+-induced neurodegeneration cell model using QqQ MS, Q-TOF MS, and Orbitrap MS. The statistical analysis of the results led to the identification of differentially-expressed biomarkers, including 11 proteins and 22 lipids, which may be responsible for the neuron-related activities of XA. All these potential biomarkers were closely related to the pathogenesis of neurodegenerative diseases, and their levels approached those in the normal group after treatment with XA. Furthermore, seven lipids, including five phosphatidylcholines, one lysophosphatidylcholine, and one phosphatidylethanolamine, were verified by a relative quantitative approach. Moreover, four proteins (Scarb2, Csnk2a2, Vti1b, and Bnip2) were validated by ELISA. The neurotransmitters taurine and norepinephrine, and the cholinergic constituents, correlated closely with the neuroprotective effects of XA. Finally, the protein-lipid interaction network was analyzed. Based on our results, the regulation of sphingolipid metabolism and mitochondrial function were determined to be the main mechanisms underlying the neuroprotective effect of XA. The present study should help us to better understand the multiple effects of macamides and their use in neurodegenerative diseases.