Efficient Multi-Grained Knowledge Reuse for Class Incremental Segmentation
Class Incremental Semantic Segmentation (CISS) has been a trend recently due
to its great significance in real-world applications. Although the existing
CISS methods demonstrate remarkable performance, they either leverage only the high-level features while neglecting the rich and diverse knowledge in the low-level features, which leads to poor preservation of old knowledge and weak exploration of new knowledge, or they use multi-level features for knowledge distillation by retraining a heavy backbone, which is computationally intensive. In this paper, we propose, for the first time, to efficiently reuse multi-grained knowledge for CISS by fusing multi-level features from a frozen backbone, and we show that a simple aggregation of varying-level features, i.e., a naive feature pyramid, can boost performance significantly. We further
introduce a novel densely-interactive feature pyramid (DEFY) module that
enhances the fusion of high- and low-level features by enabling their dense
interaction. Specifically, DEFY establishes a per-pixel relationship between
pairs of feature maps, allowing for multi-pair outputs to be aggregated. This
results in improved semantic segmentation by leveraging the complementary
information from multi-level features. We show that DEFY can be effortlessly
integrated into three representative methods for performance enhancement. Our
method yields new state-of-the-art performance when combined with the current SOTA, with notable average mIoU gains on two widely used benchmarks: 2.5% on PASCAL VOC 2012 and 2.3% on ADE20K. Comment: Technical Report. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
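To make the fusion idea concrete, here is a minimal NumPy sketch of fusing a low-level and a high-level feature map from a frozen backbone. The sigmoid gating in `dense_interaction` is our own illustrative stand-in for establishing a per-pixel relationship between a feature-map pair; the actual DEFY module is formulated differently and aggregates multiple pairs.

```python
import numpy as np

def upsample_nearest(feat, scale):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(scale, axis=1).repeat(scale, axis=2)

def naive_pyramid(low, high):
    """Naive feature pyramid: upsample the high-level map and average."""
    scale = low.shape[1] // high.shape[1]
    return 0.5 * (low + upsample_nearest(high, scale))

def dense_interaction(low, high):
    """Illustrative per-pixel interaction between a feature-map pair: the
    cosine similarity at each pixel gates how much high-level context is
    mixed back into the low-level map (a hypothetical stand-in for DEFY,
    not its actual formulation)."""
    scale = low.shape[1] // high.shape[1]
    high_up = upsample_nearest(high, scale)
    num = (low * high_up).sum(axis=0)                                # (H, W)
    den = np.linalg.norm(low, axis=0) * np.linalg.norm(high_up, axis=0) + 1e-8
    gate = 1.0 / (1.0 + np.exp(-num / den))                          # sigmoid in (0, 1)
    return low + gate[None] * high_up                                # gated residual fusion

rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))   # low-level features, frozen backbone
high = rng.standard_normal((8, 4, 4))    # high-level features, coarser grid
fused = dense_interaction(low, high)
```

Since the backbone stays frozen, only such lightweight fusion modules would need gradients, which is where the efficiency claim comes from.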
A Dive into SAM Prior in Image Restoration
The goal of image restoration (IR), a fundamental issue in computer vision,
is to restore a high-quality (HQ) image from its degraded low-quality (LQ)
observation. In this ill-posed problem, multiple HQ solutions may correspond to a single LQ input, creating an ambiguous solution space. This motivates the
investigation and incorporation of prior knowledge in order to effectively
constrain the solution space and enhance the quality of the restored images. In
spite of the pervasive use of hand-crafted and learned priors in IR, limited
attention has been paid to the incorporation of knowledge from large-scale
foundation models. In this paper, we leverage, for the first time, the prior knowledge of the state-of-the-art Segment Anything Model (SAM) to boost the performance of existing IR networks in a parameter-efficient tuning manner. In
particular, the choice of SAM is based on its robustness to image degradations,
such that HQ semantic masks can be extracted from it. In order to leverage
semantic priors and enhance restoration quality, we propose a lightweight SAM
prior tuning (SPT) unit. This plug-and-play component allows us to effectively
integrate semantic priors into existing IR networks, resulting in significant
improvements in restoration quality. As the only trainable module in our
method, the SPT unit has the potential to improve both efficiency and
scalability. We demonstrate the effectiveness of the proposed method in
enhancing a variety of methods across multiple tasks, such as image
super-resolution and color image denoising. Comment: Technical Report.
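As an illustration of the plug-and-play idea, the sketch below implements a hypothetical SPT-style unit in NumPy: SAM semantic masks are projected into a per-pixel scale and shift that modulates the frozen IR network's features. The SFT-style modulation and the 1x1 projections are assumptions made for illustration, not the paper's exact design.

```python
import numpy as np

class SPTUnit:
    """Hypothetical sketch of a SAM Prior Tuning (SPT) unit: the only
    trainable component, it turns SAM semantic masks into a per-pixel
    scale/shift that modulates a frozen IR network's features."""

    def __init__(self, mask_ch, feat_ch, rng):
        # two trainable 1x1 projections; everything else stays frozen
        self.w_scale = rng.standard_normal((feat_ch, mask_ch)) * 0.01
        self.w_shift = rng.standard_normal((feat_ch, mask_ch)) * 0.01

    def __call__(self, feat, mask):
        # feat: (C, H, W) frozen IR features; mask: (M, H, W) SAM masks
        scale = np.einsum("cm,mhw->chw", self.w_scale, mask)
        shift = np.einsum("cm,mhw->chw", self.w_shift, mask)
        return feat * (1.0 + scale) + shift

rng = np.random.default_rng(0)
unit = SPTUnit(mask_ch=4, feat_ch=16, rng=rng)
out = unit(rng.standard_normal((16, 32, 32)), rng.standard_normal((4, 32, 32)))
```

With all-zero masks the unit reduces to the identity, so an untrained SPT unit does not perturb the frozen network's behaviour.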
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Fine-tuning pre-trained vision-language models (VLMs), e.g., CLIP, for the
open-world generalization has gained increasing popularity due to its practical
value. However, performance advancements are limited when relying solely on
intricate algorithmic designs for a single model, even one exhibiting strong
performance, e.g., CLIP-ViT-B/16. This paper, for the first time, explores the
collaborative potential of leveraging much weaker VLMs to enhance the
generalization of a robust single model. The affirmative findings motivate us
to address the generalization problem from a novel perspective, i.e., ensemble
of pre-trained VLMs. We introduce three customized ensemble strategies, each
tailored to one specific scenario. Firstly, we introduce the zero-shot
ensemble, automatically adjusting the logits of different models based on their
confidence when only pre-trained VLMs are available. Furthermore, for scenarios
with extra few-shot samples, we propose the training-free ensemble and the tuning ensemble,
offering flexibility based on the availability of computing resources. The
proposed ensemble strategies are evaluated on zero-shot, base-to-new, and
cross-dataset generalization, achieving new state-of-the-art performance.
Notably, this work represents an initial stride toward enhancing the
generalization performance of VLMs via ensemble. The code is available at
https://github.com/zhiheLu/Ensemble_VLM.git. Comment: Technical report.
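One plausible reading of the zero-shot ensemble can be sketched in NumPy: each model's logits are weighted by its own confidence (max softmax probability) before summation, so more confident models contribute more. The exact adjustment rule in the paper may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def zero_shot_ensemble(logits_list):
    """Fuse logits from several pre-trained VLMs: scale each model's
    logits by its per-sample confidence (max softmax probability).
    This is one plausible instantiation, not the paper's exact rule."""
    fused = np.zeros_like(logits_list[0])
    for logits in logits_list:
        conf = softmax(logits).max(axis=-1, keepdims=True)  # (N, 1)
        fused = fused + conf * logits
    return fused

# a strong and a weaker model voting over 2 classes for one sample
strong = np.array([[2.0, 0.0]])
weak = np.array([[0.5, 0.2]])
fused = zero_shot_ensemble([strong, weak])
```

Because the weighting is computed per sample, a generally weak model can still dominate on inputs where it happens to be confident.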
Task Residual for Tuning Vision-Language Models
Large-scale vision-language models (VLMs) pre-trained on billion-level data
have learned general visual representations and broad visual concepts. In
principle, the well-learned knowledge structure of the VLMs should be inherited
appropriately when being transferred to downstream tasks with limited data.
However, most existing efficient transfer learning (ETL) approaches for VLMs
either damage or are excessively biased towards the prior knowledge, e.g.,
prompt tuning (PT) discards the pre-trained text-based classifier and builds a
new one while adapter-style tuning (AT) fully relies on the pre-trained
features. To address this, we propose a new efficient tuning approach for VLMs
named Task Residual Tuning (TaskRes), which performs directly on the text-based
classifier and explicitly decouples the prior knowledge of the pre-trained
models and new knowledge regarding a target task. Specifically, TaskRes keeps
the original classifier weights from the VLMs frozen and obtains a new
classifier for the target task by tuning a set of prior-independent parameters
as a residual to the original one, which enables reliable prior knowledge
preservation and flexible task-specific knowledge exploration. The proposed
TaskRes is simple yet effective, which significantly outperforms previous ETL
methods (e.g., PT and AT) on 11 benchmark datasets while requiring minimal
effort for the implementation. Our code is available at
https://github.com/geekyutao/TaskRes. Comment: Accepted to CVPR 2023.
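The core mechanism can be sketched in a few lines of NumPy: the pre-trained text-based classifier (one embedding per class) stays frozen, and only a same-shaped residual is tuned. The scaling factor `alpha` and the zero initialisation are illustrative choices, not the paper's prescribed values.

```python
import numpy as np

class TaskRes:
    """Sketch of Task Residual Tuning: classifier = frozen prior weights
    plus a tuned, prior-independent residual."""

    def __init__(self, base_weights, alpha=0.5):
        self.base = base_weights                     # frozen (K, D) class embeddings
        self.residual = np.zeros_like(base_weights)  # trainable (K, D) residual
        self.alpha = alpha

    def classifier(self):
        # prior knowledge (frozen) + task-specific knowledge (residual)
        return self.base + self.alpha * self.residual

    def logits(self, image_features):
        # image_features: (N, D), e.g. L2-normalised CLIP image embeddings
        return image_features @ self.classifier().T

base = np.eye(3)                      # toy 3-class text-based classifier
model = TaskRes(base)
x = np.array([[1.0, 0.0, 0.0]])
out = model.logits(x)                 # residual is zero, so out == x @ base.T
```

Zero-initialising the residual means tuning starts exactly from the pre-trained classifier, which is what enables reliable prior preservation.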
Prediction Calibration for Generalized Few-shot Semantic Segmentation
Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image
pixel into either base classes with abundant training examples or novel classes
with only a handful of (e.g., 1-5) training images per class. Compared to the
widely studied Few-shot Semantic Segmentation (FSS), which is limited to segmenting novel classes only, GFSS is much under-studied despite being more practical. The existing approach to GFSS is based on classifier parameter fusion
whereby a newly trained novel class classifier and a pre-trained base class
classifier are combined to form a new classifier. As the training data is
dominated by base classes, this approach is inevitably biased towards the base
classes. In this work, we propose a novel Prediction Calibration Network (PCN) to
address this problem. Instead of fusing the classifier parameters, we fuse the
scores produced separately by the base and novel classifiers. To ensure that
the fused scores are not biased to either the base or novel classes, a new
Transformer-based calibration module is introduced. It is known that lower-level features are more useful than higher-level features for detecting edge information in an input image. Thus, we build a cross-attention module that guides
the classifier's final prediction using the fused multi-level features.
However, transformers are computationally demanding. Crucially, to make the
proposed cross-attention module training tractable at the pixel level, this
module is designed based on feature-score cross-covariance and episodically
trained to be generalizable at inference time. Extensive experiments on
PASCAL-5i and COCO-20i show that our PCN outperforms the state-of-the-art alternatives by large margins. Comment: Technical Report.
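The shift from parameter fusion to score fusion can be sketched as below. The fixed scalar weights stand in for the learned Transformer-based calibration; this is a deliberate simplification of the method.

```python
import numpy as np

def fuse_scores(base_scores, novel_scores, w_base=1.0, w_novel=1.0):
    """Score-level fusion for GFSS: instead of merging classifier
    parameters, concatenate the per-pixel class scores produced
    separately by the base and novel classifiers.  The scalar weights
    are a hypothetical stand-in for the learned calibration module."""
    return np.concatenate([w_base * base_scores, w_novel * novel_scores], axis=-1)

# per-pixel scores over 3 base classes and 2 novel classes on a 4x4 image
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 4, 3))
novel = rng.standard_normal((4, 4, 2))
fused = fuse_scores(base, novel)
prediction = fused.argmax(axis=-1)   # per-pixel label over all 5 classes
```

Calibrating scores rather than parameters keeps the two classifiers independent, which is what lets the calibration counteract the base-class bias.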
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Adapter-style efficient transfer learning (ETL) has shown excellent
performance in the tuning of vision-language models (VLMs) under the low-data
regime, where only a few additional parameters are introduced to excavate the
task-specific knowledge based on the general and powerful representation of
VLMs. However, most adapter-style works face two limitations: (i) modeling
task-specific knowledge with a single modality only; and (ii) overlooking the
exploitation of the inter-class relationships in downstream tasks, thereby
leading to sub-optimal solutions. To mitigate this, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which builds the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the
correlation of different semantics/classes in textual and visual modalities)
with a dual knowledge graph. In particular, the dual knowledge graph is
established with two sub-graphs, i.e., a textual knowledge sub-graph, and a
visual knowledge sub-graph, where the nodes and edges represent the
semantics/classes and their correlations in two modalities, respectively. This
enables the textual feature of each prompt to leverage the task-specific
structure knowledge from both textual and visual modalities, yielding a more
effective classifier for downstream tasks. Extensive experimental results on 11
benchmark datasets reveal that our GraphAdapter significantly outperforms
previous adapter-based methods. The code will be released at
https://github.com/lixinustc/GraphAdapter. Comment: Accepted by NeurIPS 2023. The manuscript will be further revised based on the reviews.
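A rough NumPy sketch of the dual-graph idea: build one sub-graph from textual class embeddings and one from visual class embeddings, then let each class's text feature aggregate structure knowledge from both. The softmax edge normalisation and the single propagation step are our simplifications, not the paper's exact formulation.

```python
import numpy as np

def normalized_adjacency(nodes):
    """Sub-graph from class embeddings: edge weights are softmax-normalised
    cosine similarities, so each node aggregates over its neighbours."""
    n = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    e = np.exp(n @ n.T)
    return e / e.sum(axis=1, keepdims=True)

def graph_adapter(text_feats, visual_feats, beta=0.3):
    """One propagation step over the dual (textual + visual) knowledge
    graph: refined text features blend the originals with structure-aware
    aggregates from both sub-graphs (a simplified illustration)."""
    a_text = normalized_adjacency(text_feats)    # (C, C) textual sub-graph
    a_vis = normalized_adjacency(visual_feats)   # (C, C) visual sub-graph
    propagated = 0.5 * (a_text @ text_feats + a_vis @ text_feats)
    return (1.0 - beta) * text_feats + beta * propagated

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 8))     # 5 classes, text embedding dim 8
visual = rng.standard_normal((5, 16))  # same 5 classes, visual dim 16
refined = graph_adapter(text, visual)
```

Note that both adjacency matrices are class-by-class, so the visual sub-graph can steer the text features even though the two modalities have different embedding dimensions.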
An exploratory survey of money boys and HIV transmission risk in Jilin Province, PR China
This report represents the first exploratory study of Chinese men who provide commercial sex services to other men ("money boys") in Jilin Province, People's Republic of China, through a convenience sample drawn from Changchun and Jilin City. A total of 86 active money boy participants (Changchun, n = 49; Jilin City, n = 37) were surveyed concerning background and demographics, basic HIV transmission knowledge, and sexual practices. The survey indicated that while Jilin Province money boy behavior matches other studies concerning propensity to high-risk behavior and significant bridging potential, the Jilin money boys, unlike those in previous studies, exhibited a high level of basic HIV/AIDS transmission knowledge. In spite of this level of knowledge, none of the participants reported always using a condom in their sexual activities. They also exhibited a high level of awareness of the voluntary counseling and testing available in the province, yet relatively few had availed themselves of these services. These preliminary findings will be used as a baseline and springboard for continuing study in the Jilin Province money boy community. Even now, however, it is becoming clear that the dynamics of male commercial sex work may vary greatly depending upon local influences, and will necessitate that future interventions be highly tailored to area-specific circumstances.
Fast clustering algorithm based on MST of representative points
Minimum spanning tree (MST)-based clustering algorithms are widely used to detect clusters with diverse densities and irregular shapes. However, most algorithms require the entire dataset to construct an MST, which leads to significant computational overhead. To alleviate this issue, our proposed algorithm, R-MST, utilizes representative points instead of all sample points for constructing the MST. Additionally, based on density and nearest-neighbor distance, we improve the representative point selection strategy to enhance the uniform distribution of representative points in sparse areas, enabling the algorithm to perform well on datasets with varying densities. Furthermore, traditional methods for eliminating inconsistent edges generally require prior knowledge about the number of clusters, which is not always readily available in practical applications. Therefore, we propose an adaptive method that employs mutual neighbors to identify inconsistent edges and determine the optimal number of clusters automatically. The experimental results indicate that the R-MST algorithm not only improves the efficiency of clustering but also enhances its accuracy.
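The classic MST-cut clustering that R-MST builds on can be sketched in pure Python/NumPy: build the tree with Kruskal's algorithm, cut the longest edges, and label the connected components. Two simplifications relative to R-MST: the tree here is built over all points rather than density-selected representative points, and the number of clusters is given rather than determined adaptively via mutual neighbors.

```python
import numpy as np

def kruskal_mst(points):
    """Minimum spanning tree over points (rows) via Kruskal's algorithm;
    returns edges as (weight, i, j) in increasing weight order."""
    n = len(points)
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge joins two components
            parent[ri] = rj
            mst.append((w, i, j))
    return mst

def mst_clusters(points, n_clusters):
    """Cut the n_clusters - 1 longest MST edges and label the resulting
    connected components."""
    mst = kruskal_mst(points)
    keep = mst[: len(mst) - (n_clusters - 1)]   # drop the longest edges
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, i, j in keep:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = mst_clusters(pts, n_clusters=2)   # two well-separated pairs
```

Building the full MST is O(n^2) in edges here, which is exactly the overhead that restricting the tree to representative points is meant to reduce.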
Ultrasound-guided median nerve electrical stimulation to promote upper limb function recovery after stroke
Peripheral electrical nerve stimulation enhances hand function during stroke rehabilitation. Here, we proposed a percutaneous direct median nerve stimulation guided by ultrasound (ultrasound-guided median nerve electrical stimulation, UG-MNES) and evaluated its feasibility and effectiveness in the treatment of stroke patients with upper extremity impairments. Sixty-three stroke patients (2-3 months after onset) were randomly divided into control and UG-MNES groups. Both groups received routine rehabilitation, and the UG-MNES group received an additional ultrasound-guided electrical stimulation of the median nerve at 2 Hz with a 0.2 ms pulse width for 20 minutes, with gradual intensity enhancement. The Fugl-Meyer Assessment for upper extremity motor function (FMA-UE) was used as the primary outcome. The secondary outcomes were the Functional Test for the Hemiplegic Upper Extremity (FTHUE-HK), Hand Function Rating Scale, Brunnstrom Stages, and Barthel Index scores for motor and daily functions. All participants completed the trial without any side effects or adverse events during the intervention. After 4 weeks of intervention, the functions of the upper limbs on the hemiplegic side in both groups achieved significant recovery. Compared to the control group, all evaluation indices used in this trial improved significantly in the UG-MNES group after 2 and 4 weeks of intervention; in particular, the first UG-MNES intervention immediately and significantly improved all the assessment items. In conclusion, UG-MNES is a safe and feasible treatment for stroke patients with upper extremity impairments and could significantly improve the motor function of the affected upper limb, especially in the first intervention. UG-MNES could be an effective alternative intervention for stroke patients with upper extremity impairments.