23 research outputs found
Do we really need temporal convolutions in action segmentation?
Action classification has made great progress, but segmenting and recognizing
actions from long untrimmed videos remains a challenging problem. Most
state-of-the-art methods focus on designing temporal convolution-based models,
but the inflexibility of temporal convolutions and the difficulties in modeling
long-term temporal dependencies restrict the potential of these models.
Transformer-based models, with their flexible sequence-modeling capabilities,
have recently been applied to various tasks. However, the lack of inductive
bias and the inefficiency of handling long video sequences limit the
application of Transformers to action segmentation. In this paper, we design a pure
Transformer-based model without temporal convolutions by incorporating temporal
sampling, called Temporal U-Transformer (TUT). The U-Transformer architecture
reduces complexity while introducing an inductive bias that adjacent frames are
more likely to belong to the same class, but the introduction of coarse
resolutions results in the misclassification of boundaries. We observe that the
similarity distribution between a boundary frame and its neighboring frames
depends on whether the boundary frame is the start or end of an action segment.
Therefore, we further propose a boundary-aware loss based on the distribution
of attention-derived similarity scores between frames to enhance the model's
ability to recognize boundaries. Extensive experiments show the effectiveness
of our model.
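As an editorial aside, the boundary observation above can be made concrete with a small sketch: a frame at the start of a segment is dissimilar to its left neighbors but similar to its right neighbors, and an end frame shows the mirrored pattern. The cosine-similarity measure and all names below are illustrative assumptions, not the paper's exact formulation (the paper works with attention-derived similarity scores).

```python
import numpy as np

def neighbor_similarities(features, t, window=2):
    """Cosine similarity between frame t and its neighbors in [t-window, t+window]."""
    f = features[t]
    sims = []
    for dt in range(-window, window + 1):
        if dt == 0:
            continue
        g = features[t + dt]
        sims.append(float(f @ g / (np.linalg.norm(f) * np.linalg.norm(g))))
    return sims

# Toy sequence: frames 0-3 belong to action A, frames 4-7 to action B,
# so frame 4 is the *start* boundary of segment B.
rng = np.random.default_rng(0)
a, b = rng.normal(size=8), rng.normal(size=8)
feats = np.stack([a + 0.05 * rng.normal(size=8) for _ in range(4)]
                 + [b + 0.05 * rng.normal(size=8) for _ in range(4)])

# A start-of-segment frame is dissimilar to its left neighbors and similar
# to its right neighbors.
sims = neighbor_similarities(feats, 4)
```

A boundary-aware loss can then encourage this asymmetric similarity pattern at predicted segment starts and ends.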
Learning to Sample Tasks for Meta Learning
Through experiments on various meta-learning methods, task samplers, and
few-shot learning tasks, this paper arrives at three conclusions. Firstly,
there are no universal task sampling strategies to guarantee the performance of
meta-learning models. Secondly, task diversity can cause the models to either
underfit or overfit during training. Lastly, the generalization performance of
the models is influenced by task divergence, task entropy, and task
difficulty. In response to these findings, we propose a novel task sampler
called Adaptive Sampler (ASr). ASr is a plug-and-play task sampler that takes
task divergence, task entropy, and task difficulty into account when sampling
tasks. To optimize
ASr, we rethink and propose a simple and general meta-learning algorithm.
Finally, extensive empirical experiments demonstrate the effectiveness of the
proposed ASr.
Comment: 10 pages, 7 tables, 3 figures
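A plug-and-play sampler in the spirit of ASr can be sketched as a softmax over weighted per-task scores. The fixed weights below are illustrative assumptions; ASr itself learns how the three signals are combined.

```python
import numpy as np

def adaptive_sample(divergence, entropy, difficulty,
                    weights=(1.0, 1.0, 1.0), n_tasks=2, seed=0):
    """Sample tasks with probability given by a softmax over weighted scores.

    divergence/entropy/difficulty: per-task score arrays. The fixed `weights`
    are an illustrative assumption -- ASr learns how to combine the signals."""
    score = (weights[0] * np.asarray(divergence, dtype=float)
             + weights[1] * np.asarray(entropy, dtype=float)
             + weights[2] * np.asarray(difficulty, dtype=float))
    prob = np.exp(score - score.max())   # stable softmax over task scores
    prob /= prob.sum()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(prob), size=n_tasks, replace=False, p=prob)
    return idx, prob

# Three candidate tasks; task 1 has the highest combined score.
idx, prob = adaptive_sample([0.2, 0.9, 0.1], [0.5, 0.8, 0.3], [0.4, 0.7, 0.2])
```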
Introducing Expertise Logic into Graph Representation Learning from A Causal Perspective
Benefiting from the injection of human prior knowledge, graphs, as derived
discrete data, are semantically dense so that models can efficiently learn the
semantic information from such data. Accordingly, graph neural networks (GNNs)
indeed achieve impressive success in various fields. Revisiting the GNN
learning paradigms, we find that the relationship between human expertise and
the knowledge modeled by GNNs remains unclear. To this end, we conduct
motivating experiments and derive the empirical observation that human
expertise is gradually learned by GNNs in general domains. By further
observing the ramifications of introducing expertise logic into graph
representation learning, we conclude that guiding GNNs to learn human
expertise can improve model performance. By exploring the intrinsic
mechanism behind such observations, we elaborate the Structural Causal Model
for the graph representation learning paradigm. Following the theoretical
guidance, we introduce an auxiliary causal logic learning paradigm that
improves the model's ability to learn the expertise logic causally related to
the graph representation learning task. In practice, the counterfactual
technique is further performed to tackle the insufficient training issue during
optimization. Extensive experiments on crafted and real-world domains
support the consistent effectiveness of the proposed method.
A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning
Due to limitations in data quality, some essential visual tasks are difficult
to perform independently. Introducing previously unavailable information to
transfer informative dark knowledge has been a common way to solve such hard
tasks. However, why transferred knowledge works has not been extensively
explored. To address this issue, in this paper, we discover the
correlation between feature discriminability and dimensional structure (DS) by
analyzing and observing features extracted from simple and hard tasks. On this
basis, we express DS using deep channel-wise correlation and intermediate
spatial distribution, and propose a novel cross-modal knowledge distillation
(CMKD) method for better supervised cross-modal learning (CML) performance. The
proposed method enforces output features to be channel-wise independent and
intermediate ones to be uniformly distributed, thereby learning semantically
irrelevant features from the hard task to boost its accuracy. This is
especially useful in applications where the performance gap between the two
modalities is relatively large. Furthermore, we collect a real-world CML
dataset to promote community development. The dataset contains more than 10,000
paired optical and radar images and is continuously being updated. Experimental
results on real-world and benchmark datasets validate the effectiveness of the
proposed method.
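The two constraints the method enforces can be sketched as simple regularizers: channel-wise independence can be measured by off-diagonal channel correlations, and spatial uniformity by deviation from a flat distribution. This is a minimal illustration under our own assumptions, not the paper's exact losses.

```python
import numpy as np

def channel_independence_loss(feats):
    """Mean squared off-diagonal entry of the channel correlation matrix.

    feats: (N, C) output features; the loss is near zero when channels are
    uncorrelated and large when they are redundant."""
    z = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)
    corr = (z.T @ z) / len(feats)          # (C, C) correlation matrix
    off = corr - np.diag(np.diag(corr))    # keep only cross-channel terms
    return float((off ** 2).mean())

def uniformity_loss(spatial):
    """Squared deviation of a (flattened) spatial distribution from uniform."""
    p = spatial / spatial.sum()
    return float(((p - 1.0 / len(p)) ** 2).sum())

rng = np.random.default_rng(0)
independent = rng.normal(size=(256, 8))                    # decorrelated channels
copied = np.repeat(rng.normal(size=(256, 1)), 8, axis=1)   # redundant channels
```

Minimizing both terms pushes the student toward decorrelated channels and evenly spread intermediate activations.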
Bootstrapping Informative Graph Augmentation via A Meta Learning Approach
Recent works explore learning graph representations in a self-supervised
manner. In graph contrastive learning, benchmark methods apply various graph
augmentation approaches. However, most of these augmentation methods are
non-learnable and may generate unbeneficial augmented graphs, which can
degrade the representation ability of graph contrastive learning methods.
Therefore, we propose generating augmented graphs with a learnable graph
augmenter, called MEta Graph Augmentation
(MEGA). We then clarify that a "good" graph augmentation must have uniformity
at the instance level and informativeness at the feature level. To this end, we
propose a novel approach to learning a graph augmenter that can generate an
augmentation with uniformity and informativeness. The objective of the graph
augmenter is to promote our feature extraction network to learn a more
discriminative feature representation, which motivates us to propose a
meta-learning paradigm. Empirically, the experiments across multiple benchmark
datasets demonstrate that MEGA outperforms the state-of-the-art methods in
graph self-supervised learning tasks. Further experimental studies prove the
effectiveness of different terms of MEGA.
Comment: Accepted by International Joint Conference on Artificial Intelligence (IJCAI) 202
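The meta-learning paradigm behind a learnable augmenter can be sketched as a bilevel loop: an inner step trains the encoder on the augmented view, and an outer step updates the augmenter so that the updated encoder does better on a clean objective. The scalar toy problem below is entirely our assumption and stands in for graph encoders and contrastive losses.

```python
def inner_update(w, x, strength, lr=0.1):
    """One encoder step on the augmented view (squared-error toy objective)."""
    x_aug = x * (1.0 + strength)            # learnable augmentation (illustrative)
    grad = 2.0 * (w * x_aug - 1.0) * x_aug  # d/dw of (w * x_aug - 1)^2
    return w - lr * grad

def meta_loss(w, x):
    """Quality of the encoder on the clean objective."""
    return (w * x - 1.0) ** 2

def meta_step(w, x, strength, meta_lr=0.05, eps=1e-4):
    """Finite-difference meta-gradient of the augmenter through the inner update."""
    up = meta_loss(inner_update(w, x, strength + eps), x)
    dn = meta_loss(inner_update(w, x, strength - eps), x)
    return strength - meta_lr * (up - dn) / (2.0 * eps)

w, strength, x = 0.0, 0.5, 2.0
for _ in range(50):
    strength = meta_step(w, x, strength)    # outer: update the augmenter
    w = inner_update(w, x, strength)        # inner: update the encoder
```

The augmenter is rewarded only insofar as its augmentations make the *updated* encoder better, which mirrors the meta-learning objective described above.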
Using images rendered by PBRT to train faster R-CNN for UAV detection
Deep neural networks, such as Faster R-CNN, have been widely used in object detection. However, deep neural
networks usually require a large-scale dataset to achieve desirable performance. For the specific application, UAV
detection, training data is extremely limited in practice. Since manually annotating large numbers of UAV images is
resource-intensive and time-consuming, we instead use PBRT to render a large number of photorealistic, highly varied UAV
images within a reasonable time. PBRT ensures the realism of the rendered images, making them
largely indistinguishable from real photographs. Trained with our rendered images, the
Faster R-CNN achieves an AP of 80.69% on a manually annotated UAV test set, much higher than a model
trained only with the COCO 2014 and PASCAL VOC 2012 datasets (43.36%). Moreover, our rendered image dataset
contains not only bounding boxes for all UAVs, but also the locations of important UAV parts and of
all pixels covered by UAVs, which can be used for more complicated applications such as mask or
keypoint detection.
Robust Causal Graph Representation Learning against Confounding Effects
The prevailing graph neural network models have achieved significant progress in graph representation learning. However, in this paper, we uncover a long-overlooked phenomenon: a pre-trained graph representation learning model tested with full graphs underperforms the same model tested with well-pruned graphs. This observation reveals that there exist confounders in graphs, which may interfere with the model's learning of semantic information, and current graph representation learning methods have not eliminated their influence. To tackle this issue, we propose Robust Causal Graph Representation Learning (RCGRL) to learn robust graph representations against confounding effects. RCGRL introduces an active approach to generate instrumental variables under unconditional moment restrictions, which empowers the graph representation learning model to eliminate confounders, thereby capturing discriminative information that is causally related to downstream predictions. We offer theorems and proofs to guarantee the theoretical effectiveness of the proposed approach. Empirically, we conduct extensive experiments on a synthetic dataset and multiple benchmark datasets. Experimental results demonstrate the effectiveness and generalization ability of RCGRL. Our codes are available at https://github.com/hang53/RCGRL
A Polarimetric Scattering Characteristics-Guided Adversarial Learning Approach for Unsupervised PolSAR Image Classification
Highly accurate supervised deep learning-based classifiers for polarimetric synthetic aperture radar (PolSAR) images require large amounts of data with manual annotations. Unfortunately, the complex echo imaging mechanism results in a high labeling cost for PolSAR images. Extracting and transferring knowledge to utilize the existing labeled data to the fullest extent is a viable approach in such circumstances. To this end, we are introducing unsupervised deep adversarial domain adaptation (ADA) into PolSAR image classification for the first time. In contrast to the standard learning paradigm, in this study, the deep learning model is trained on labeled data from a source domain and unlabeled data from a related but distinct target domain. The purpose of this is to extract domain-invariant features and generalize them to the target domain. Although the feature transferability of ADA methods can be ensured through adversarial training to align the feature distributions of source and target domains, improving feature discriminability remains a crucial issue. In this paper, we propose a novel polarimetric scattering characteristics-guided adversarial network (PSCAN) for unsupervised PolSAR image classification. Compared with classical ADA methods, we designed an auxiliary task for PSCAN based on the polarimetric scattering characteristics-guided pseudo-label construction. This approach utilizes the rich information contained in the PolSAR data itself, without the need for expensive manual annotations or complex automatic labeling mechanisms. During the training of PSCAN, the auxiliary task receives category semantic information from pseudo-labels and helps promote the discriminability of the learned domain-invariant features, thereby enabling the model to have a better target prediction function. The effectiveness of the proposed method was demonstrated using data captured with different PolSAR systems in the San Francisco and Qingdao areas. 
Experimental results show that the proposed method achieves satisfactory unsupervised classification results.
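The pseudo-label construction can be sketched generically as clustering per-pixel scattering characteristics (e.g., entropy, alpha angle, anisotropy) into pseudo-classes. The k-means clustering below is an illustrative assumption, not PSCAN's actual scattering-guided construction.

```python
import numpy as np

def scattering_pseudo_labels(scatter_feats, n_classes=4, n_iter=20):
    """Pseudo-labels from k-means over per-pixel scattering features.

    scatter_feats: (N, D) polarimetric scattering characteristics per pixel --
    an illustrative stand-in for PSCAN's pseudo-label construction."""
    # Farthest-point initialization, then plain Lloyd iterations.
    centers = [scatter_feats[0]]
    for _ in range(n_classes - 1):
        d = np.min([((scatter_feats - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(scatter_feats[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((scatter_feats[:, None] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)                       # nearest-center assignment
        for k in range(n_classes):
            if (labels == k).any():
                centers[k] = scatter_feats[labels == k].mean(0)
    return labels

# Two well-separated toy clusters in scattering-feature space.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.1, size=(50, 3)),
               rng.normal(5.0, 0.1, size=(50, 3))])
labels = scattering_pseudo_labels(x, n_classes=2)
```

Such labels cost nothing to produce from the PolSAR data itself, which is what lets the auxiliary task inject category semantics without manual annotation.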
Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from a Conditional Causal Perspective
Few-shot learning models learn representations with limited human annotations, and such a learning paradigm demonstrates practicability in various tasks, e.g., image classification, object detection, etc. However, few-shot object detection methods suffer from an intrinsic defect: the limited training data prevents the model from sufficiently exploring semantic information. To tackle this, we introduce knowledge distillation to the few-shot object detection learning paradigm. We further run a motivating experiment, which demonstrates that in the process of knowledge distillation, the empirical error of the teacher model degrades the prediction performance of the student few-shot object detection model. To understand the reasons behind this phenomenon, we revisit the learning paradigm of knowledge distillation on the few-shot object detection task from a causal-theoretic standpoint, and accordingly, develop a Structural Causal Model. Following the theoretical guidance, we propose a backdoor adjustment-based knowledge distillation method for the few-shot object detection task, namely Disentangle and Remerge (D&R), to perform conditional causal intervention toward the corresponding Structural Causal Model. Empirically, the experiments on benchmarks demonstrate that D&R can yield significant performance boosts in few-shot object detection. Code is available at https://github.com/ZYN-1101/DandR.git
Integrated Proteomics and Lipidomics Investigation of the Mechanism Underlying the Neuroprotective Effect of N-benzylhexadecanamide
Macamides are very important secondary metabolites produced by Lepidium meyenii Walp, which possess multiple bioactivities, especially in the neuronal system. In a previous study, we observed that macamides exhibited excellent effects in the recovery of injured nerves after 1-methyl-4-phenylpyridinium (MPP+)-induced dopaminergic neuronal damage in zebrafish. However, the mechanism underlying this effect remains unclear. In the present study, we observed that N-benzylhexadecanamide (XA), which is a typical constituent of macamides, improved the survival rate of neurons in vitro. We determined the concentration of neurotransmitters in MN9D cells and used it in conjunction with an integrated proteomics and lipidomics approach to investigate the mechanism underlying the neuroprotective effects of XA in an MPP+-induced neurodegeneration cell model using QqQ MS, Q-TOF MS, and Orbitrap MS. The statistical analysis of the results led to the identification of differentially expressed biomarkers, including 11 proteins and 22 lipids, which may be responsible for the neuron-related activities of XA. All these potential biomarkers were closely related to the pathogenesis of neurodegenerative diseases, and their levels approached those in the normal group after treatment with XA. Furthermore, seven lipids, including five phosphatidylcholines, one lysophosphatidylcholine, and one phosphatidylethanolamine, were verified by a relative quantitative approach. Moreover, four proteins (Scarb2, Csnk2a2, Vti1b, and Bnip2) were validated by ELISA. The neurotransmitters taurine and norepinephrine, and the cholinergic constituents, correlated closely with the neuroprotective effects of XA. Finally, the protein–lipid interaction network was analyzed. Based on our results, the regulation of sphingolipid metabolism and mitochondrial function were determined to be the main mechanisms underlying the neuroprotective effect of XA.
The present study should help us to better understand the multiple effects of macamides and their use in neurodegenerative diseases.