FLatten Transformer: Vision Transformer using Focused Linear Attention
The quadratic computation complexity of self-attention has been a persistent
challenge when applying Transformer models to vision tasks. Linear attention,
on the other hand, offers a much more efficient alternative with its linear
complexity by approximating the Softmax operation through carefully designed
mapping functions. However, current linear attention approaches either suffer
from significant performance degradation or introduce additional computation
overhead from the mapping functions. In this paper, we propose a novel Focused
Linear Attention module to achieve both high efficiency and expressiveness.
Specifically, we first analyze the factors contributing to the performance
degradation of linear attention from two perspectives: the focus ability and
feature diversity. To overcome these limitations, we introduce a simple yet
effective mapping function and an efficient rank restoration module to enhance
the expressiveness of self-attention while maintaining low computation
complexity. Extensive experiments show that our linear attention module is
applicable to a variety of advanced vision Transformers and achieves
consistently improved performance on multiple benchmarks. Code is available at
https://github.com/LeapLabTHU/FLatten-Transformer.
Comment: ICCV 202
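The contrast between quadratic softmax attention and kernelized linear attention, together with a norm-preserving "focusing" map, can be sketched in a few lines of NumPy. This is a minimal single-head sketch, not the paper's implementation; the function names and the simplified mapping are illustrative:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard self-attention: the N x N score matrix makes this O(N^2).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

def focused_map(x, p=3.0, eps=1e-6):
    # Simplified focusing function: sharpen the (non-negative) feature
    # distribution with a power, then rescale to preserve the norm.
    x = np.maximum(x, 0.0) + eps
    xp = x ** p
    return xp * (np.linalg.norm(x, axis=-1, keepdims=True)
                 / np.linalg.norm(xp, axis=-1, keepdims=True))

def linear_attention(Q, K, V, phi=focused_map):
    # Kernelized attention: computing phi(K)^T V first costs O(N d^2),
    # i.e. linear in the sequence length N.
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                # (d, d_v), independent of N
    Z = Qp @ Kp.sum(axis=0)      # per-query normalizer, shape (N,)
    return (Qp @ KV) / Z[:, None]
```

Because phi is non-negative, each output row is a convex combination of the rows of V, mirroring the behavior of softmax attention at linear cost.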
Fine-grained Recognition with Learnable Semantic Data Augmentation
Fine-grained image recognition is a longstanding computer vision challenge
that focuses on differentiating objects belonging to multiple subordinate
categories within the same meta-category. Since images belonging to the same
meta-category usually share similar visual appearances, mining discriminative
visual cues is the key to distinguishing fine-grained categories. Although
commonly used image-level data augmentation techniques have achieved great
success in generic image classification problems, they are rarely applied in
fine-grained scenarios, because their random region-editing behavior tends to
destroy the discriminative visual cues residing in subtle regions. In
this paper, we propose diversifying the training data at the feature-level to
alleviate the discriminative region loss problem. Specifically, we produce
diversified augmented samples by translating image features along semantically
meaningful directions. The semantic directions are estimated with a covariance
prediction network, which predicts a sample-wise covariance matrix to adapt to
the large intra-class variation inherent in fine-grained images. Furthermore,
the covariance prediction network is jointly optimized with the classification
network in a meta-learning manner to alleviate the degenerate solution problem.
Experiments on four competitive fine-grained recognition benchmarks
(CUB-200-2011, Stanford Cars, FGVC-Aircraft, NABirds) demonstrate that our
method significantly improves the generalization performance on several popular
classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and
ViT). Combined with a recently proposed method, our semantic data augmentation
approach achieves state-of-the-art performance on the CUB-200-2011 dataset. The
source code will be released.
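The feature-level augmentation idea (translating deep features along directions drawn from a per-sample Gaussian) can be sketched as follows. Here the per-sample covariance is supplied directly as a diagonal, standing in for the paper's covariance prediction network; the function name and arguments are illustrative:

```python
import numpy as np

def semantic_augment(features, cov_diag, n_aug=4, strength=0.5, seed=0):
    """Translate each feature vector along semantically meaningful
    directions sampled from N(0, diag(cov_diag)).

    features: (N, d) deep features; cov_diag: (N, d) per-sample variances
    (in the paper these are predicted by a covariance network).
    Returns an (n_aug, N, d) array of augmented feature sets.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_aug,) + features.shape) * np.sqrt(cov_diag)
    return features[None, :, :] + strength * directions
```

Because the translation happens in feature space rather than pixel space, the discriminative regions of the image itself are never edited away.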
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
Dynamic computation has emerged as a promising avenue to enhance the
inference efficiency of deep networks. It allows selective activation of
computational units, leading to a reduction in unnecessary computations for
each input sample. However, the actual efficiency of these dynamic models can
deviate from theoretical predictions. This mismatch arises from: 1) the lack of
a unified approach due to fragmented research; 2) the focus on algorithm design
over critical scheduling strategies, especially in CUDA-enabled GPU contexts;
and 3) challenges in measuring practical latency, given that most libraries
cater to static operations. Addressing these issues, we unveil the
Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates
three primary dynamic paradigms: spatially adaptive computation, dynamic layer
skipping, and dynamic channel skipping. To bridge the theoretical and practical
efficiency gap, LAUDNet merges algorithmic design with scheduling optimization,
guided by a latency predictor that accurately gauges dynamic operator latency.
We have tested LAUDNet across multiple vision tasks, demonstrating its capacity
to notably reduce the latency of models like ResNet-101 by over 50% on
platforms such as V100, RTX3090, and TX2 GPUs. Notably, LAUDNet stands out in
balancing accuracy and efficiency. Code is available at
https://www.github.com/LeapLabTHU/LAUDNet.
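Of the three paradigms, dynamic channel skipping is the simplest to sketch: a per-sample gate decides which output channels of a layer are actually computed. The minimal NumPy illustration below uses a hypothetical gate vector and a plain linear layer; a real implementation needs fused kernels and a latency predictor, as in the paper, for the theoretical savings to show up in measured latency:

```python
import numpy as np

def channel_skip_linear(x, W, gate_logits, tau=0.0):
    """Compute y = x @ W.T, but only for output channels whose gate
    logit exceeds tau; skipped channels remain zero.

    x: (B, C_in), W: (C_out, C_in), gate_logits: (C_out,).
    Returns (y, macs): the output and the multiply-accumulates spent,
    showing how theoretical cost scales with the kept-channel ratio.
    """
    keep = np.nonzero(gate_logits > tau)[0]
    y = np.zeros((x.shape[0], W.shape[0]))
    y[:, keep] = x @ W[keep].T        # only the kept rows of W are used
    macs = x.shape[0] * x.shape[1] * keep.size
    return y, macs
```

In practice, measured latency tracks these theoretical MACs only after scheduling optimization, which is precisely the gap a latency-aware design aims to close.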
Dynamic Perceiver for Efficient Visual Recognition
Early exiting has become a promising approach to improving the inference
efficiency of deep networks. By structuring models with multiple classifiers
(exits), predictions for "easy" samples can be generated at earlier exits,
negating the need for executing deeper layers. Current multi-exit networks
typically implement linear classifiers at intermediate layers, compelling
low-level features to encapsulate high-level semantics. This sub-optimal design
invariably undermines the performance of later exits. In this paper, we propose
Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure
and the early classification task with a novel dual-branch architecture. A
feature branch serves to extract image features, while a classification branch
processes a latent code assigned for classification tasks. Bi-directional
cross-attention layers are established to progressively fuse the information of
both branches. Early exits are placed exclusively within the classification
branch, thus eliminating the need for linear separability in low-level
features. Dyn-Perceiver constitutes a versatile and adaptable framework that
can be built upon various architectures. Experiments on image classification,
action recognition, and object detection demonstrate that our method
significantly improves the inference efficiency of different backbones,
outperforming numerous competitive approaches across a broad range of
computational budgets. Evaluation on both CPU and GPU platforms substantiates
the superior practical efficiency of Dyn-Perceiver. Code is available at
https://www.github.com/LeapLabTHU/Dynamic_Perceiver.
Comment: Accepted at ICCV 202
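The dual-branch idea can be sketched as follows: a latent code is refined by cross-attending to the feature branch's output at each stage, and a classifier on the latent code exits early once its confidence clears a threshold. This is a minimal single-head NumPy sketch; all names, shapes, and the confidence rule are illustrative, not the paper's exact design:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attend(z, x):
    # Latent tokens z (L, d) query the image features x (N, d).
    A = softmax(z @ x.T / np.sqrt(z.shape[-1]))
    return z + A @ x                 # residual fusion of the two branches

def early_exit_predict(z0, stage_feats, classifiers, threshold=0.7):
    """Run the classification branch over the feature-branch stages,
    exiting at the first classifier whose max confidence >= threshold.
    Returns (predicted class, index of the exit taken)."""
    z = z0
    last = len(stage_feats) - 1
    for i, (x, Wc) in enumerate(zip(stage_feats, classifiers)):
        z = cross_attend(z, x)
        probs = softmax(z.mean(axis=0) @ Wc)
        if probs.max() >= threshold or i == last:
            return int(probs.argmax()), i
```

Because the exits live only on the latent code, the low-level features of the feature branch are never forced to be linearly separable.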
Learning to Weight Samples for Dynamic Early-exiting Networks
Early exiting is an effective paradigm for improving the inference efficiency
of deep networks. By constructing classifiers with varying resource demands
(the exits), such networks allow easy samples to be output at early exits,
removing the need for executing deeper layers. While existing works mainly
focus on the architectural design of multi-exit networks, the training
strategies for such models are largely left unexplored. The current
state-of-the-art models treat all samples equally during training, ignoring
the early-exiting behavior at test time and thus creating a gap between
training and testing. In this paper, we propose to bridge this gap through
sample weighting. Intuitively, easy samples, which generally exit early in the
network during inference, should contribute more to training the early
classifiers, while hard samples, which mostly exit from deeper layers, should
be emphasized in training the later classifiers. We propose to adopt a weight
prediction network to weight the loss of different training samples at each
exit. This weight prediction network and the backbone model are jointly
optimized under a meta-learning framework with a novel optimization objective.
By bringing the adaptive behavior during inference into the training phase, we
show that the proposed weighting mechanism consistently improves the trade-off
between classification accuracy and inference efficiency. Code is available at
https://github.com/LeapLabTHU/L2W-DEN.
Comment: ECCV 202
Adaptive Rotated Convolution for Rotated Object Detection
Rotated object detection aims to identify and locate objects in images with
arbitrary orientation. In this scenario, the oriented directions of objects
vary considerably across different images, while multiple orientations of
objects exist within an image. This intrinsic characteristic makes it
challenging for standard backbone networks to extract high-quality features of
these arbitrarily oriented objects. In this paper, we present the Adaptive
Rotated Convolution (ARC) module to handle the aforementioned challenges. In
our ARC module, the convolution kernels rotate adaptively to extract object
features with varying orientations in different images, and an efficient
conditional computation mechanism is introduced to accommodate the large
orientation variations of objects within an image. The two designs work
seamlessly in the rotated object detection problem. Moreover, ARC can
conveniently
serve as a plug-and-play module in various vision backbones to boost their
representation ability to detect oriented objects accurately. Experiments on
commonly used benchmarks (DOTA and HRSC2016) demonstrate that equipped with our
proposed ARC module in the backbone network, the performance of multiple
popular oriented object detectors is significantly improved (e.g. +3.03% mAP on
Rotated RetinaNet and +4.16% on CFA). Combined with the highly competitive
method Oriented R-CNN, the proposed approach achieves state-of-the-art
performance on the DOTA dataset with 81.77% mAP.
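The core mechanism, rotating convolution kernels by an input-dependent angle, can be illustrated by resampling a k x k kernel on a rotated grid. The sketch below uses nearest-neighbor resampling for brevity; a real, differentiable version like the paper's would use bilinear interpolation and predict the angle from the input:

```python
import numpy as np

def rotate_kernel(kernel, theta):
    """Resample a square conv kernel on a grid rotated by theta (radians).
    Nearest-neighbor version; samples falling outside the kernel are zero."""
    k = kernel.shape[0]
    c = (k - 1) / 2.0                      # rotate about the kernel center
    cos, sin = np.cos(theta), np.sin(theta)
    out = np.zeros_like(kernel)
    for i in range(k):
        for j in range(k):
            # Inverse-rotate each output coordinate into the source kernel.
            y, x = i - c, j - c
            ys, xs = cos * y + sin * x, -sin * y + cos * x
            yi, xi = int(np.rint(ys + c)), int(np.rint(xs + c))
            if 0 <= yi < k and 0 <= xi < k:
                out[i, j] = kernel[yi, xi]
    return out
```

A backbone equipped with such a module can convolve each image with kernels rotated to match that image's dominant object orientations, rather than a fixed axis-aligned kernel.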
The Impact of Magnetic Field and Gibberellin Treatment on the Release of Dormancy and Internal Nutrient Transformation in <i>Tilia miqueliana</i> Maxim. Seeds
The seeds of Tilia miqueliana Maxim. exhibit deep dormancy, categorized as combinational dormancy. This study applied a comprehensive treatment involving magnetic fields, gibberellin (GA₃), and cold stratification to promote the release of seed physiological dormancy and enhance germination rates. After being soaked in 98% H₂SO₄ for 15 min, mature Tilia seeds were exposed to magnetic field treatments (150 mT, 250 mT) for different durations (25 min, 45 min, 65 min, and 85 min), as well as soaking in GA₃ solution (concentrations: 0 μmol·L⁻¹ and 1443 μmol·L⁻¹). Subsequently, cold stratification (0–5 °C) was applied to investigate the effects of these treatments on seed dormancy release and nutrient transformation. The results indicated that the comprehensive treatment combining a magnetic field, GA₃ solution soaking, and cold stratification effectively released the physiological dormancy of Tilia seeds and improved germination rates. Among the treatments, M150T85G1443 (magnetic field intensity: 150 mT, magnetic field treatment time: 85 min, GA₃ soaking concentration: 1443 μmol·L⁻¹) produced the most favorable outcome. After 75 days of cold stratification following the comprehensive treatments, the germination rate of M150T85G1443 seeds reached 89%. Additionally, the levels of storage substances such as starches and crude fats within the seeds decreased, while the utilization of soluble sugars and soluble proteins increased. The M150T85G1443 treatment exhibited the greatest degree of change, leading to a gradual increase in the metabolic activity of the seeds and a transition from dormancy to germination.
Emerging role of ubiquitination/deubiquitination modification of PD-1/PD-L1 in cancer immunotherapy
As members of the immune checkpoint family, PD-1 and its ligand PD-L1 play critical roles in maintaining the balance between autoimmunity and tolerance. The PD-1/PD-L1 interaction is also involved in tumor immune evasion within the tumor microenvironment, caused by reduced T cell activation, proliferation, cytotoxic secretion, and survival. Previous research has shown that the expression levels of PD-1/PD-L1 may be regulated by ubiquitin-mediated proteasomal degradation, an important mode of post-translational modification (PTM). This review assesses the most recent developments in research on ubiquitin modification of PD-1/PD-L1 in tumor immunotherapy. We offer a brief overview of PD-1/PD-L1, provide background on the ubiquitin-proteasome system (UPS), and discuss the pathways through which E3 ligases and deubiquitinases (DUBs) regulate PD-1/PD-L1 in tumor immunotherapy. In addition, we outline promising directions for future research, along with novel immunotherapy concepts and ideas. Taken together, the information compiled herein should serve as a comprehensive repository of currently available knowledge on this aspect of tumor immunotherapy and should prove useful for designing future studies and for developing potential targets and strategies for future tumor immunotherapy.