290 research outputs found

Class-Incremental Exemplar Compression for Class-Incremental Learning

Authors: Liu Yaoyao; Luo Zilin; Schiele Bernt; Sun Qianru
Publication date: 07/04/2023

Exemplar-based class-incremental learning (CIL) finetunes the model with all samples of new classes but few-shot exemplars of old classes in each incremental phase, where the "few-shot" abides by the limited memory budget. In this paper, we break this "few-shot" limit based on a simple yet surprisingly effective idea: compressing exemplars by downsampling non-discriminative pixels and saving "many-shot" compressed exemplars in the memory. Without needing any manual annotation, we achieve this compression by generating 0-1 masks on discriminative pixels from class activation maps (CAM). We propose an adaptive mask generation model called class-incremental masking (CIM) to explicitly resolve two difficulties of using CAM: 1) transforming the heatmaps of CAM to 0-1 masks with an arbitrary threshold leads to a trade-off between the coverage on discriminative pixels and the quantity of exemplars, as the total memory is fixed; and 2) optimal thresholds vary for different object classes, which is particularly obvious in the dynamic environment of CIL. We optimize the CIM model alternately with the conventional CIL model through a bilevel optimization problem. We conduct extensive experiments on high-resolution CIL benchmarks including Food-101, ImageNet-100, and ImageNet-1000, and show that using the compressed exemplars by CIM can achieve a new state-of-the-art CIL accuracy, e.g., 4.8 percentage points higher than FOSTER on 10-Phase ImageNet-1000. Our code is available at https://github.com/xfflzl/CIM-CIL. Comment: Accepted to CVPR 202

arXiv.org e-Print Archive
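
The trade-off described in the abstract above (mask coverage versus number of exemplars under a fixed memory budget) can be illustrated with a small sketch. This is not the authors' CIM model, which learns its masks through bilevel optimization. The fixed threshold, the 2x2 tile averaging used as "downsampling", the storage-cost estimate, and all function names are assumptions made for illustration only.

```python
def cam_to_mask(heatmap, threshold):
    """Binarize a CAM-style heatmap: 1 marks a discriminative pixel."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def compress(image, mask, block=2):
    """Keep masked pixels at full resolution; replace every unmasked pixel
    with the mean of its block x block tile (a crude form of downsampling)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, h, block):
        for left in range(0, w, block):
            cells = [(r, c)
                     for r in range(top, min(top + block, h))
                     for c in range(left, min(left + block, w))]
            mean = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                if mask[r][c] == 0:
                    out[r][c] = mean
    return out


def stored_values(mask, block=2):
    """Assumed storage cost: one value per masked pixel, plus one tile mean
    for every tile that contains at least one unmasked pixel."""
    h, w = len(mask), len(mask[0])
    cost = sum(sum(row) for row in mask)
    for top in range(0, h, block):
        for left in range(0, w, block):
            tile = [mask[r][c]
                    for r in range(top, min(top + block, h))
                    for c in range(left, min(left + block, w))]
            if 0 in tile:
                cost += 1
    return cost


def exemplars_within_budget(mask, budget, block=2):
    """How many exemplars of this cost fit into a fixed memory budget."""
    return budget // stored_values(mask, block)


heatmap = [[0.9, 0.8, 0.1, 0.0],
           [0.7, 0.6, 0.2, 0.1],
           [0.1, 0.2, 0.0, 0.1],
           [0.0, 0.1, 0.1, 0.0]]
image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]

tight = cam_to_mask(heatmap, 0.5)    # covers fewer pixels, cheaper to store
loose = cam_to_mask(heatmap, 0.15)   # covers more pixels, costlier to store
compressed = compress(image, tight)
```

In this toy setting, lowering the threshold raises the per-exemplar cost, so fewer exemplars fit in the same budget. That is the tension a per-class adaptive threshold is meant to resolve.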

Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations

Authors: Chen Ruobing; Li Lu; Li Zhiheng; Liu Jie; Liu Yu; Pan Yuxin; Wang Zilin
Publication date: 12/10/2023

Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function based on collected expert demonstrations. Considering that obtaining expert demonstrations can be costly, the focus of current IRL techniques is on learning a better-than-demonstrator policy using a reward function derived from sub-optimal demonstrations. However, existing IRL algorithms primarily tackle the challenge of trajectory ranking ambiguity when learning the reward function. They overlook the crucial role of considering the degree of difference between trajectories in terms of their returns, which is essential for further removing reward ambiguity. Additionally, the reward of a single transition is heavily influenced by the context information within the trajectory. To address these issues, we introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework. Unlike existing approaches, DRASRL takes into account both the ranking of trajectories and the degrees of dissimilarity between them to collaboratively eliminate reward ambiguity when learning a sequence of contextually informed reward signals. Specifically, we leverage the distance between the policies from which the trajectories are generated as a measure of the degree of difference between trajectories. This distance-aware information is then used to infer embeddings in the representation space for reward learning, employing the contrastive learning technique. Meanwhile, we integrate the pairwise ranking loss function to incorporate ranking information into the latent features. Moreover, we resort to the Transformer architecture to capture the contextual dependencies within the trajectories in the latent space, leading to more accurate reward estimation. Through extensive experimentation, our DRASRL framework demonstrates significant performance improvements over previous SOTA methods.

arXiv.org e-Print Archive
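
The abstract above mentions a pairwise ranking loss without giving its form. A minimal sketch follows, assuming the logistic (Bradley-Terry style) form that is common in preference-based reward learning. It is not taken from the DRASRL paper, and the function names are hypothetical.

```python
import math


def trajectory_return(step_rewards):
    """Predicted return of a trajectory: the sum of its per-step rewards."""
    return sum(step_rewards)


def pairwise_ranking_loss(rewards_worse, rewards_better):
    """Negative log-probability that the better trajectory is ranked higher,
    with P(better > worse) = exp(R_b) / (exp(R_b) + exp(R_w)).
    This equals softplus(R_w - R_b), computed here in a numerically stable way."""
    diff = trajectory_return(rewards_worse) - trajectory_return(rewards_better)
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Predicted per-step rewards for two trajectories; the second is ranked higher.
worse = [0.1, 0.2, 0.2]    # return 0.5
better = [0.9, 0.8, 0.8]   # return 2.5

loss_correct = pairwise_ranking_loss(worse, better)   # small: ordering agrees
loss_flipped = pairwise_ranking_loss(better, worse)   # large: ordering violated
```

The loss depends only on the difference of returns, which is why a ranking loss alone cannot say how far apart two trajectories are. The abstract's distance-aware term is aimed at that gap.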

Robust tracking with discriminative ranking middle-level patches

Authors: LIANG Zilin; LIU Hong; SUN Qianru
Publication venue: IntechOpen
Publication date: 01/04/2014

The appearance model has been shown to be essential for robust visual tracking since it is the basic criterion for locating targets in video sequences. Though existing tracking-by-detection algorithms have shown great promise, they still suffer from the drift problem, which is caused by updating appearance models. In this paper, we propose a new appearance model composed of ranking middle-level patches to capture more object distinctiveness than traditional tracking-by-detection models. Targets and backgrounds are represented by both low-level bottom-up features and high-level top-down patches, which complement each other. Bottom-up features are defined at the pixel level, and each feature gets its discrimination score through a selective feature attention mechanism. In top-down feature extraction, rectangular patches are ranked according to their bottom-up discrimination scores, by which all of them are clustered into irregular patches, named ranking middle-level patches. In addition, at the stage of classifier training, the online random forests algorithm is specially refined to reduce drifting problems. Experiments on challenging public datasets and our test videos demonstrate that our approach can effectively prevent the tracker drifting problem and obtain competitive performance in visual tracking.

Crossref

Institutional Knowledge at Singapore Management University

Directory of Open Access Journals

Exploring Driving Behavior for Autonomous Vehicles Based on Gramian Angular Field Vision Transformer

Authors: Chen Ying; Ding Yifeng; Huang Zilin; Jiang Zhuoyu; Liu Zhangchi; Ran Bin; You Junwei
Publication date: 21/10/2023

Effective classification of autonomous vehicle (AV) driving behavior is a critical area for diagnosing AV operation faults, enhancing autonomous driving algorithms, and reducing accident rates. This paper presents the Gramian Angular Field Vision Transformer (GAF-ViT) model, designed to analyze AV driving behavior. The proposed GAF-ViT model consists of three key components: GAF Transformer Module, Channel Attention Module, and Multi-Channel ViT Module. These modules collectively convert representative sequences of multivariate behavior into multi-channel images and employ image recognition techniques for behavior classification. A channel attention mechanism is applied to multi-channel images to discern the impact of various driving behavior features. Experimental evaluation on the Waymo Open Dataset of trajectories demonstrates that the proposed model achieves state-of-the-art performance. Furthermore, an ablation study substantiates the efficacy of the individual modules within the model.

arXiv.org e-Print Archive
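
The conversion of a behavior sequence into an image, as described in the abstract above, can be sketched with the standard Gramian Angular Summation Field. The abstract does not say whether the summation or the difference variant is used, so the choice here is an assumption, as are the function names and the example feature values.

```python
import math


def rescale(series):
    """Min-max rescale a series to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    return [(2.0 * v - hi - lo) / (hi - lo) for v in series]


def gasf(series):
    """Gramian Angular Summation Field: G[i][j] = cos(phi_i + phi_j),
    where phi_i = arccos(x_i) and x is the series rescaled to [-1, 1]."""
    # Clamp to guard against tiny floating-point overshoot before acos.
    x = [max(-1.0, min(1.0, v)) for v in rescale(series)]
    phi = [math.acos(v) for v in x]
    return [[math.cos(a + b) for b in phi] for a in phi]


def to_multichannel(features):
    """One GASF image per feature sequence, stacked as channels."""
    return [gasf(seq) for seq in features]


# Hypothetical driving features sampled at three time steps.
speed = [0.0, 1.0, 2.0]
accel = [0.5, 0.1, 0.9]
channels = to_multichannel([speed, accel])   # shape: 2 channels x 3 x 3
```

Each channel is a symmetric T x T matrix for a sequence of length T, so a set of multivariate sequences becomes a multi-channel image that an image classifier can consume.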

Single Satellite Imagery Simultaneous Super-resolution and Colorization using Multi-task Deep Neural Networks

Authors: Fu Zilin; Han Jungong; Liu Heng; Liu Hongshen; Shao Ling
Publication venue: Elsevier BV
Publication date: 01/05/2018

Satellite imagery is a typical kind of remote sensing data, with advantages in large-area imaging and strong macro integrity. However, for most commercial space usages, such as virtual display of urban traffic flow and virtual interaction with environmental resources, one drawback of satellite imagery is its low spatial resolution, which fails to provide clear image details. Moreover, in recent years, synthesizing color for grayscale satellite imagery or recovering the original color of camouflaged sensitive regions has become an urgent requirement for virtual reality interaction with large spatial objects. In this work, unlike existing works which solve these two problems separately, we focus on achieving image super-resolution (SR) and image colorization simultaneously. Based on multi-task learning, we provide a novel deep neural network model that performs single satellite imagery SR and colorization at the same time. By feeding back the color feature representations into the SR network and jointly optimizing the two tasks, our deep model achieves mutual cooperation between imagery reconstruction and image colorization. To avoid color bias, we not only adopt non-satellite imagery to enrich the color diversity of satellite images, but also recalculate the prior color distribution and the valid color range based on the mixed data. We evaluate the proposed model on satellite images from different data sets, such as RSSCN7 and AID. Both the evaluations and comparisons reveal that the proposed multi-task deep learning approach is superior to the state-of-the-art methods, and that image SR and colorization can be accomplished simultaneously and efficiently.

Crossref

Lancaster E-Prints

Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark

Authors: Li Xuefen; Li Zun; Liu Zilin; Shi Ge; Wang Liang; Wu Lifang; Xi Zeyu; Yan Junchi
Publication date: 27/02/2024

Despite the recent emergence of video captioning models, how to generate text descriptions with specific entity names and fine-grained actions is far from being solved, even though it has valuable applications such as basketball live text broadcast. In this paper, a new multimodal knowledge graph supported basketball benchmark for video captioning is proposed. Specifically, we construct a multimodal basketball game knowledge graph (KG_NBA_2022) to provide additional knowledge beyond videos. Then, a multimodal basketball game video captioning (VC_NBA_2022) dataset that contains 9 types of fine-grained shooting events and 286 players' knowledge (i.e., images and names) is constructed based on KG_NBA_2022. We develop a knowledge guided entity-aware video captioning network (KEANet) based on a candidate player list in encoder-decoder form for basketball live text broadcast. The temporal contextual information in video is encoded by introducing the bi-directional GRU (Bi-GRU) module. The entity-aware module is designed to model the relationships among the players and highlight the key players. Extensive experiments on multiple sports benchmarks demonstrate that KEANet effectively leverages extra knowledge and outperforms advanced video captioning models. The proposed dataset and corresponding codes will be publicly available soon.

arXiv.org e-Print Archive

Synergistic effects of neck circumference and metabolic risk factors on insulin resistance: the Cardiometabolic Risk in Chinese (CRC) study

Authors: Dou Lianjun; Liang Jun; Liu Xuekui; Qi Lu; Sun Zilin; Teng Fei; Wang Yu; Zou Caiyan
Publication venue: Springer Science and Business Media LLC
Publication date: 02/12/2014

Objectives: Recent studies have associated neck circumference (NC) with insulin resistance (IR). We examined whether this relation was modified by other metabolic risk factors. Methods: The study samples were from a community-based health examination survey in central China. A total of 2588 apparently healthy Chinese men and women were included. Results: Plasma levels of total cholesterol (TC), HDL-C, uric acid (UA) and diastolic blood pressure (DBP) were independently associated with NC after adjustment for age, sex, body mass index (BMI), waist circumference (WC) and hip circumference (HC) (P = 0.009, 0.001, 0.015 and 0.015, respectively). We observed significant interactions of NC with triglyceride (TG) and UA (P for interaction = 0.001 for both) in relation to HOMA-IR. The associations between NC and HOMA-IR appeared more evident in those with higher UA or TG levels. Conclusions: Our data indicate that in apparently healthy Chinese adults, there were synergistic effects of UA, TG and neck circumference on insulin resistance.

Harvard University - DASH