290 research outputs found
Class-Incremental Exemplar Compression for Class-Incremental Learning
Exemplar-based class-incremental learning (CIL) finetunes the model with all
samples of new classes but few-shot exemplars of old classes in each
incremental phase, where the "few-shot" abides by the limited memory budget. In
this paper, we break this "few-shot" limit based on a simple yet surprisingly
effective idea: compressing exemplars by downsampling non-discriminative pixels
and saving "many-shot" compressed exemplars in the memory. Without needing any
manual annotation, we achieve this compression by generating 0-1 masks on
discriminative pixels from class activation maps (CAM). We propose an adaptive
mask generation model called class-incremental masking (CIM) to explicitly
resolve two difficulties of using CAM: 1) transforming the heatmaps of CAM to
0-1 masks with an arbitrary threshold leads to a trade-off between the coverage
on discriminative pixels and the quantity of exemplars, as the total memory is
fixed; and 2) optimal thresholds vary for different object classes, which is
particularly obvious in the dynamic environment of CIL. We optimize the CIM
model alternately with the conventional CIL model through a bilevel
optimization problem. We conduct extensive experiments on high-resolution CIL
benchmarks including Food-101, ImageNet-100, and ImageNet-1000, and show that
using the compressed exemplars by CIM can achieve a new state-of-the-art CIL
accuracy, e.g., 4.8 percentage points higher than FOSTER on 10-Phase
ImageNet-1000. Our code is available at https://github.com/xfflzl/CIM-CIL.
Comment: Accepted to CVPR 202
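The threshold trade-off described in this abstract can be illustrated with a small sketch. This is not the authors' CIM implementation: the function names, the toy heatmap, the 2x downsampling factor, and the "pixel unit" cost model are all assumptions made for illustration. It only shows why a higher CAM threshold keeps fewer full-resolution pixels per exemplar and therefore fits more exemplars into a fixed memory budget.

```python
def cam_to_mask(heatmap, threshold):
    """Binarize a CAM heatmap: 1 marks a pixel treated as discriminative."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def compressed_cost(mask, factor=2):
    """Approximate storage cost of one compressed exemplar, in pixel units.

    Assumption: masked pixels are stored at full resolution, and all other
    pixels are downsampled by `factor` along each side.
    """
    kept = sum(sum(row) for row in mask)
    total = sum(len(row) for row in mask)
    return kept + (total - kept) / (factor * factor)


def exemplars_in_budget(budget, mask, factor=2):
    """Number of compressed exemplars that fit into a fixed pixel budget."""
    return int(budget // compressed_cost(mask, factor))


# Hypothetical 4x4 heatmap with a bright 2x2 object in the middle.
heatmap = [
    [0.1, 0.2, 0.2, 0.1],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.2],
    [0.0, 0.1, 0.2, 0.1],
]
budget = 16 * 20  # room for 20 uncompressed 4x4 exemplars

for threshold in (0.5, 0.75):
    mask = cam_to_mask(heatmap, threshold)
    print(threshold, exemplars_in_budget(budget, mask))
```

In this toy setting the lower threshold covers the whole object but stores fewer exemplars, while the higher threshold stores more exemplars with less coverage, which is the tension that a per-class adaptive threshold is meant to resolve.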
Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying
reward function based on collected expert demonstrations. Considering that
obtaining expert demonstrations can be costly, the focus of current IRL
techniques is on learning a better-than-demonstrator policy using a reward
function derived from sub-optimal demonstrations. However, existing IRL
algorithms primarily tackle the challenge of trajectory ranking ambiguity when
learning the reward function. They overlook the crucial role of considering the
degree of difference between trajectories in terms of their returns, which is
essential for further removing reward ambiguity. Additionally, it is important
to note that the reward of a single transition is heavily influenced by the
context information within the trajectory. To address these issues, we
introduce the Distance-rank Aware Sequential Reward Learning (DRASRL)
framework. Unlike existing approaches, DRASRL takes into account both the
ranking of trajectories and the degrees of dissimilarity between them to
collaboratively eliminate reward ambiguity when learning a sequence of
contextually informed reward signals. Specifically, we leverage the distance
between policies, from which the trajectories are generated, as a measure to
quantify the degree of differences between traces. This distance-aware
information is then used to infer embeddings in the representation space for
reward learning, employing the contrastive learning technique. Meanwhile, we
integrate the pairwise ranking loss function to incorporate ranking information
into the latent features. Moreover, we resort to the Transformer architecture
to capture the contextual dependencies within the trajectories in the latent
space, leading to more accurate reward estimation. Through extensive
experimentation, our DRASRL framework demonstrates significant performance
improvements over previous SOTA methods
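The pairwise ranking loss mentioned in this abstract is not specified further, so the following is only a sketch under a stated assumption: it uses the common Bradley-Terry (softmax over predicted returns) form, in which the trajectory ranked higher should receive the larger predicted return. The function names are hypothetical, and the distance-aware contrastive term and the Transformer encoder of DRASRL are not modeled here.

```python
import math


def predicted_return(rewards):
    """Sum of per-step predicted rewards along one trajectory."""
    return sum(rewards)


def pairwise_ranking_loss(rewards_worse, rewards_better):
    """Bradley-Terry style loss for one ranked pair of trajectories.

    Equals -log(exp(R_better) / (exp(R_worse) + exp(R_better))),
    computed as a numerically stable softplus of (R_worse - R_better).
    """
    diff = predicted_return(rewards_worse) - predicted_return(rewards_better)
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The loss is small when the predicted rewards agree with the ranking
# and large when they contradict it.
agree = pairwise_ranking_loss([0.1, 0.2], [1.0, 1.5])
contradict = pairwise_ranking_loss([1.0, 1.5], [0.1, 0.2])
print(agree < contradict)
```

A ranking-only loss like this constrains the order of predicted returns but not how far apart they are, which is the gap the abstract says the distance-aware information is meant to fill.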
Robust tracking with discriminative ranking middle-level patches
The appearance model is essential for robust visual tracking because it is the basic criterion for locating targets in video sequences. Although existing tracking-by-detection algorithms have shown great promise, they still suffer from the drift problem, which is caused by updating appearance models. In this paper, we propose a new appearance model composed of ranking middle-level patches to capture more object distinctiveness than traditional tracking-by-detection models. Targets and backgrounds are represented by both low-level bottom-up features and high-level top-down patches, which complement each other. Bottom-up features are defined at the pixel level, and each feature obtains its discrimination score through a selective feature attention mechanism. In top-down feature extraction, rectangular patches are ranked according to their bottom-up discrimination scores and then clustered into irregular patches, named ranking middle-level patches. In addition, at the classifier training stage, the online random forests algorithm is refined to reduce drift. Experiments on challenging public datasets and our test videos demonstrate that our approach effectively prevents tracker drift and obtains competitive performance in visual tracking
Exploring Driving Behavior for Autonomous Vehicles Based on Gramian Angular Field Vision Transformer
Effective classification of autonomous vehicle (AV) driving behavior emerges
as a critical area for diagnosing AV operation faults, enhancing autonomous
driving algorithms, and reducing accident rates. This paper presents the
Gramian Angular Field Vision Transformer (GAF-ViT) model, designed to analyze
AV driving behavior. The proposed GAF-ViT model consists of three key
components: GAF Transformer Module, Channel Attention Module, and Multi-Channel
ViT Module. These modules collectively convert representative sequences of
multivariate behavior into multi-channel images and employ image recognition
techniques for behavior classification. A channel attention mechanism is
applied to multi-channel images to discern the impact of various driving
behavior features. Experimental evaluation on the Waymo Open Dataset of
trajectories demonstrates that the proposed model achieves state-of-the-art
performance. Furthermore, an ablation study effectively substantiates the
efficacy of individual modules within the model
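The conversion of behavior sequences into multi-channel images can be sketched with the standard Gramian Angular Field construction. The abstract does not say whether the summation or difference variant is used, so the summation field (GASF) below is an assumption, as are the function names and the toy speed and acceleration values. The channel attention and ViT stages are not shown.

```python
import math


def rescale(series):
    """Min-max rescale a sequence to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series cannot be rescaled")
    return [(2 * x - hi - lo) / (hi - lo) for x in series]


def gasf(series):
    """Gramian Angular Summation Field: G[i][j] = cos(phi_i + phi_j),
    where phi_k = arccos of the rescaled value at step k."""
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in rescale(series)]
    return [[math.cos(a + b) for b in phi] for a in phi]


def multichannel_gaf(features):
    """One GAF image per behavior feature, stacked as image channels."""
    return [gasf(series) for series in features]


# Hypothetical two-feature behavior sequence (e.g. speed, acceleration).
speed = [0.0, 1.0, 2.0]
accel = [0.5, -0.5, 0.0]
image = multichannel_gaf([speed, accel])
print(len(image), len(image[0]), len(image[0][0]))  # channels, height, width
```

Each feature of length T yields a T-by-T channel, so a sequence with C features becomes a C-channel image that an image classifier can consume.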
Single Satellite Imagery Simultaneous Super-resolution and Colorization using Multi-task Deep Neural Networks
Satellite imagery is a typical kind of remote sensing data, with advantages in large-area imaging and strong macro integrity. However, for most commercial space uses, such as virtual display of urban traffic flow and virtual interaction with environmental resources, one drawback of satellite imagery is its low spatial resolution, which fails to provide clear image details. Moreover, in recent years, synthesizing color for grayscale satellite imagery or recovering the original color of camouflage sensitive regions has become an urgent requirement for virtual reality interaction with large spatial objects. In this work, unlike existing works that solve these two problems separately, we focus on achieving image super-resolution (SR) and image colorization simultaneously. Based on multi-task learning, we provide a novel deep neural network model that performs single satellite imagery SR and colorization at the same time. By feeding the color feature representations back into the SR network and jointly optimizing the two tasks, our deep model achieves mutual cooperation between imagery reconstruction and image colorization. To avoid color bias, we not only adopt non-satellite imagery to enrich the color diversity of satellite images, but also recalculate the prior color distribution and the valid color range based on the mixed data. We evaluate the proposed model on satellite images from different data sets, such as RSSCN7 and AID. Both the evaluations and comparisons show that the proposed multi-task deep learning approach outperforms state-of-the-art methods, accomplishing image SR and colorization simultaneously and efficiently
Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark
Despite the recent emergence of video captioning models, generating text
descriptions with specific entity names and fine-grained actions remains far
from solved, even though it has valuable applications such as basketball live
text broadcast. In this paper, a new multimodal knowledge graph supported
basketball benchmark for video captioning is proposed. Specifically, we
construct a multimodal basketball game knowledge graph (KG_NBA_2022) to provide
additional knowledge beyond videos. Then, a multimodal basketball game video
captioning (VC_NBA_2022) dataset that contains 9 types of fine-grained shooting
events and 286 players' knowledge (i.e., images and names) is constructed based
on KG_NBA_2022. We develop a knowledge guided entity-aware video captioning
network (KEANet) based on a candidate player list in encoder-decoder form for
basketball live text broadcast. The temporal contextual information in video is
encoded by a bi-directional GRU (Bi-GRU) module, and an
entity-aware module is designed to model the relationships among the players
and highlight the key players. Extensive experiments on multiple sports
benchmarks demonstrate that KEANet effectively leverages extra knowledge and
outperforms advanced video captioning models. The proposed dataset and
corresponding code will be publicly available soon
Recommended from our members
Synergistic effects of neck circumference and metabolic risk factors on insulin resistance: the Cardiometabolic Risk in Chinese (CRC) study
Objectives: Recent studies have associated neck circumference (NC) with insulin resistance (IR). We examined whether this relation was modified by other metabolic risk factors. Methods: The study sample was drawn from a community-based health examination survey in central China. A total of 2588 apparently healthy Chinese men and women were included. Results: Plasma levels of total cholesterol (TC), HDL-C and uric acid (UA), as well as diastolic blood pressure (DBP), were independently associated with NC after adjustment for age, sex, body mass index (BMI), waist circumference (WC) and hip circumference (HC) (P = 0.009, 0.001, 0.015 and 0.015, respectively). We observed significant interactions of NC with triglyceride (TG) and UA (all P for interaction = 0.001) in relation to HOMA-IR. The associations between NC and HOMA-IR appeared more evident in those with higher UA or TG levels. Conclusions: Our data indicate that in apparently healthy Chinese adults, UA, TG and neck circumference have synergistic effects on insulin resistance