Efficient Multi-Grained Knowledge Reuse for Class Incremental Segmentation
Class Incremental Semantic Segmentation (CISS) has been a trend recently due
to its great significance in real-world applications. Although the existing
CISS methods demonstrate remarkable performance, they either leverage only the high-level features while neglecting the rich and diverse knowledge in the low-level features, which leads to poor preservation of old knowledge and weak exploration of new knowledge, or they use multi-level features for knowledge distillation by retraining a heavy backbone, which is computationally intensive. In this paper, we propose, for the first time, to efficiently reuse multi-grained knowledge for CISS by fusing multi-level features from a frozen backbone, and we show that a simple aggregation of varying-level features, i.e., a naive feature pyramid, can boost performance significantly. We further
introduce a novel densely-interactive feature pyramid (DEFY) module that
enhances the fusion of high- and low-level features by enabling their dense
interaction. Specifically, DEFY establishes a per-pixel relationship between
pairs of feature maps, allowing for multi-pair outputs to be aggregated. This
results in improved semantic segmentation by leveraging the complementary
information from multi-level features. We show that DEFY can be effortlessly
integrated into three representative methods for performance enhancement. Our
method yields new state-of-the-art performance when combined with the current SOTA, with notable average mIoU gains on two widely used benchmarks: 2.5% on PASCAL VOC 2012 and 2.3% on ADE20K. Comment: Technical Report. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
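To make the fusion idea concrete, here is a minimal NumPy sketch of fusing a low-level and a high-level feature map from a frozen backbone. The sigmoid gating in `dense_interaction` is our own illustrative stand-in for establishing a per-pixel relationship between a feature-map pair; the actual DEFY module is formulated differently and aggregates multiple pairs.

```python
import numpy as np

def upsample_nearest(feat, scale):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(scale, axis=1).repeat(scale, axis=2)

def naive_pyramid(low, high):
    """Naive feature pyramid: upsample the high-level map and average."""
    scale = low.shape[1] // high.shape[1]
    return 0.5 * (low + upsample_nearest(high, scale))

def dense_interaction(low, high):
    """Illustrative per-pixel interaction between a feature-map pair: the
    cosine similarity at each pixel gates how much high-level context is
    mixed back into the low-level map (a hypothetical stand-in for DEFY,
    not its actual formulation)."""
    scale = low.shape[1] // high.shape[1]
    high_up = upsample_nearest(high, scale)
    num = (low * high_up).sum(axis=0)                                # (H, W)
    den = np.linalg.norm(low, axis=0) * np.linalg.norm(high_up, axis=0) + 1e-8
    gate = 1.0 / (1.0 + np.exp(-num / den))                          # sigmoid in (0, 1)
    return low + gate[None] * high_up                                # gated residual fusion

rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))   # low-level features, frozen backbone
high = rng.standard_normal((8, 4, 4))    # high-level features, coarser grid
fused = dense_interaction(low, high)
```

Since the backbone stays frozen, only such lightweight fusion modules would need gradients, which is where the efficiency claim comes from.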
A Dive into SAM Prior in Image Restoration
The goal of image restoration (IR), a fundamental issue in computer vision,
is to restore a high-quality (HQ) image from its degraded low-quality (LQ)
observation. In this ill-posed problem, multiple HQ solutions may correspond to a single LQ input, creating an ambiguous solution space. This motivates the
investigation and incorporation of prior knowledge in order to effectively
constrain the solution space and enhance the quality of the restored images. In
spite of the pervasive use of hand-crafted and learned priors in IR, limited
attention has been paid to the incorporation of knowledge from large-scale
foundation models. In this paper, we leverage, for the first time, the prior knowledge of the state-of-the-art Segment Anything Model (SAM) to boost the performance of existing IR networks in a parameter-efficient tuning manner. In
particular, the choice of SAM is based on its robustness to image degradations,
such that HQ semantic masks can be extracted from it. In order to leverage
semantic priors and enhance restoration quality, we propose a lightweight SAM
prior tuning (SPT) unit. This plug-and-play component allows us to effectively
integrate semantic priors into existing IR networks, resulting in significant
improvements in restoration quality. As the only trainable module in our
method, the SPT unit has the potential to improve both efficiency and
scalability. We demonstrate the effectiveness of the proposed method in
enhancing a variety of methods across multiple tasks, such as image
super-resolution and color image denoising. Comment: Technical Report.
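As an illustration of the plug-and-play idea, the sketch below implements a hypothetical SPT-style unit in NumPy: SAM semantic masks are projected into a per-pixel scale and shift that modulates the frozen IR network's features. The SFT-style modulation and the 1x1 projections are assumptions made for illustration, not the paper's exact design.

```python
import numpy as np

class SPTUnit:
    """Hypothetical sketch of a SAM Prior Tuning (SPT) unit: the only
    trainable component, it turns SAM semantic masks into a per-pixel
    scale/shift that modulates a frozen IR network's features."""

    def __init__(self, mask_ch, feat_ch, rng):
        # two trainable 1x1 projections; everything else stays frozen
        self.w_scale = rng.standard_normal((feat_ch, mask_ch)) * 0.01
        self.w_shift = rng.standard_normal((feat_ch, mask_ch)) * 0.01

    def __call__(self, feat, mask):
        # feat: (C, H, W) frozen IR features; mask: (M, H, W) SAM masks
        scale = np.einsum("cm,mhw->chw", self.w_scale, mask)
        shift = np.einsum("cm,mhw->chw", self.w_shift, mask)
        return feat * (1.0 + scale) + shift

rng = np.random.default_rng(0)
unit = SPTUnit(mask_ch=4, feat_ch=16, rng=rng)
out = unit(rng.standard_normal((16, 32, 32)), rng.standard_normal((4, 32, 32)))
```

With all-zero masks the unit reduces to the identity, so an untrained SPT unit does not perturb the frozen network's behaviour.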
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Fine-tuning pre-trained vision-language models (VLMs), e.g., CLIP, for the
open-world generalization has gained increasing popularity due to its practical
value. However, performance advancements are limited when relying solely on
intricate algorithmic designs for a single model, even one exhibiting strong
performance, e.g., CLIP-ViT-B/16. This paper, for the first time, explores the
collaborative potential of leveraging much weaker VLMs to enhance the
generalization of a robust single model. The affirmative findings motivate us
to address the generalization problem from a novel perspective, i.e., ensemble
of pre-trained VLMs. We introduce three customized ensemble strategies, each
tailored to one specific scenario. Firstly, we introduce the zero-shot
ensemble, automatically adjusting the logits of different models based on their
confidence when only pre-trained VLMs are available. Furthermore, for scenarios
with extra few-shot samples, we propose the training-free ensemble and the tuning ensemble,
offering flexibility based on the availability of computing resources. The
proposed ensemble strategies are evaluated on zero-shot, base-to-new, and
cross-dataset generalization, achieving new state-of-the-art performance.
Notably, this work represents an initial stride toward enhancing the
generalization performance of VLMs via ensemble. The code is available at
https://github.com/zhiheLu/Ensemble_VLM.git. Comment: Technical report.
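One plausible reading of the zero-shot ensemble can be sketched in NumPy: each model's logits are weighted by its own confidence (max softmax probability) before summation, so more confident models contribute more. The exact adjustment rule in the paper may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def zero_shot_ensemble(logits_list):
    """Fuse logits from several pre-trained VLMs: scale each model's
    logits by its per-sample confidence (max softmax probability).
    This is one plausible instantiation, not the paper's exact rule."""
    fused = np.zeros_like(logits_list[0])
    for logits in logits_list:
        conf = softmax(logits).max(axis=-1, keepdims=True)  # (N, 1)
        fused = fused + conf * logits
    return fused

# a strong and a weaker model voting over 2 classes for one sample
strong = np.array([[2.0, 0.0]])
weak = np.array([[0.5, 0.2]])
fused = zero_shot_ensemble([strong, weak])
```

Because the weighting is computed per sample, a generally weak model can still dominate on inputs where it happens to be confident.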
Task Residual for Tuning Vision-Language Models
Large-scale vision-language models (VLMs) pre-trained on billion-level data
have learned general visual representations and broad visual concepts. In
principle, the well-learned knowledge structure of the VLMs should be inherited
appropriately when being transferred to downstream tasks with limited data.
However, most existing efficient transfer learning (ETL) approaches for VLMs
either damage or are excessively biased towards the prior knowledge, e.g.,
prompt tuning (PT) discards the pre-trained text-based classifier and builds a
new one while adapter-style tuning (AT) fully relies on the pre-trained
features. To address this, we propose a new efficient tuning approach for VLMs
named Task Residual Tuning (TaskRes), which performs directly on the text-based
classifier and explicitly decouples the prior knowledge of the pre-trained
models and new knowledge regarding a target task. Specifically, TaskRes keeps
the original classifier weights from the VLMs frozen and obtains a new
classifier for the target task by tuning a set of prior-independent parameters
as a residual to the original one, which enables reliable prior knowledge
preservation and flexible task-specific knowledge exploration. The proposed
TaskRes is simple yet effective, which significantly outperforms previous ETL
methods (e.g., PT and AT) on 11 benchmark datasets while requiring minimal
effort for the implementation. Our code is available at
https://github.com/geekyutao/TaskRes. Comment: Accepted to CVPR 2023.
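The core mechanism can be sketched in a few lines of NumPy: the pre-trained text-based classifier (one embedding per class) stays frozen, and only a same-shaped residual is tuned. The scaling factor `alpha` and the zero initialisation are illustrative choices, not the paper's prescribed values.

```python
import numpy as np

class TaskRes:
    """Sketch of Task Residual Tuning: classifier = frozen prior weights
    plus a tuned, prior-independent residual."""

    def __init__(self, base_weights, alpha=0.5):
        self.base = base_weights                     # frozen (K, D) class embeddings
        self.residual = np.zeros_like(base_weights)  # trainable (K, D) residual
        self.alpha = alpha

    def classifier(self):
        # prior knowledge (frozen) + task-specific knowledge (residual)
        return self.base + self.alpha * self.residual

    def logits(self, image_features):
        # image_features: (N, D), e.g. L2-normalised CLIP image embeddings
        return image_features @ self.classifier().T

base = np.eye(3)                      # toy 3-class text-based classifier
model = TaskRes(base)
x = np.array([[1.0, 0.0, 0.0]])
out = model.logits(x)                 # residual is zero, so out == x @ base.T
```

Zero-initialising the residual means tuning starts exactly from the pre-trained classifier, which is what enables reliable prior preservation.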
Prediction Calibration for Generalized Few-shot Semantic Segmentation
Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image
pixel into either base classes with abundant training examples or novel classes
with only a handful of (e.g., 1-5) training images per class. Compared to the
widely studied Few-shot Semantic Segmentation (FSS), which is limited to segmenting novel classes only, GFSS is much under-studied despite being more practical. The existing approach to GFSS is based on classifier parameter fusion
whereby a newly trained novel class classifier and a pre-trained base class
classifier are combined to form a new classifier. As the training data is
dominated by base classes, this approach is inevitably biased towards the base
classes. In this work, we propose a novel Prediction Calibration Network (PCN) to
address this problem. Instead of fusing the classifier parameters, we fuse the
scores produced separately by the base and novel classifiers. To ensure that
the fused scores are not biased to either the base or novel classes, a new
Transformer-based calibration module is introduced. It is known that lower-level features are more useful than higher-level features for detecting edge information in an input image. Thus, we build a cross-attention module that guides
the classifier's final prediction using the fused multi-level features.
However, transformers are computationally demanding. Crucially, to make the
proposed cross-attention module training tractable at the pixel level, this
module is designed based on feature-score cross-covariance and episodically
trained to be generalizable at inference time. Extensive experiments on
PASCAL-5i and COCO-20i show that our PCN outperforms the state-of-the-art alternatives by large margins. Comment: Technical Report.
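The shift from parameter fusion to score fusion can be sketched as below. The fixed scalar weights stand in for the learned Transformer-based calibration; this is a deliberate simplification of the method.

```python
import numpy as np

def fuse_scores(base_scores, novel_scores, w_base=1.0, w_novel=1.0):
    """Score-level fusion for GFSS: instead of merging classifier
    parameters, concatenate the per-pixel class scores produced
    separately by the base and novel classifiers.  The scalar weights
    are a hypothetical stand-in for the learned calibration module."""
    return np.concatenate([w_base * base_scores, w_novel * novel_scores], axis=-1)

# per-pixel scores over 3 base classes and 2 novel classes on a 4x4 image
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 4, 3))
novel = rng.standard_normal((4, 4, 2))
fused = fuse_scores(base, novel)
prediction = fused.argmax(axis=-1)   # per-pixel label over all 5 classes
```

Calibrating scores rather than parameters keeps the two classifiers independent, which is what lets the calibration counteract the base-class bias.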
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Adapter-style efficient transfer learning (ETL) has shown excellent
performance in the tuning of vision-language models (VLMs) under the low-data
regime, where only a few additional parameters are introduced to excavate the
task-specific knowledge based on the general and powerful representation of
VLMs. However, most adapter-style works face two limitations: (i) modeling
task-specific knowledge with a single modality only; and (ii) overlooking the
exploitation of the inter-class relationships in downstream tasks, thereby
leading to sub-optimal solutions. To mitigate this, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which builds the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the
correlation of different semantics/classes in textual and visual modalities)
with a dual knowledge graph. In particular, the dual knowledge graph is
established with two sub-graphs, i.e., a textual knowledge sub-graph, and a
visual knowledge sub-graph, where the nodes and edges represent the
semantics/classes and their correlations in two modalities, respectively. This
enables the textual feature of each prompt to leverage the task-specific
structure knowledge from both textual and visual modalities, yielding a more
effective classifier for downstream tasks. Extensive experimental results on 11
benchmark datasets reveal that our GraphAdapter significantly outperforms
previous adapter-based methods. The code will be released at
https://github.com/lixinustc/GraphAdapter. Comment: Accepted by NeurIPS 2023. The manuscript will be further revised based on the reviews.
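A rough NumPy sketch of the dual-graph idea: build one sub-graph from textual class embeddings and one from visual class embeddings, then let each class's text feature aggregate structure knowledge from both. The softmax edge normalisation and the single propagation step are our simplifications, not the paper's exact formulation.

```python
import numpy as np

def normalized_adjacency(nodes):
    """Sub-graph from class embeddings: edge weights are softmax-normalised
    cosine similarities, so each node aggregates over its neighbours."""
    n = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    e = np.exp(n @ n.T)
    return e / e.sum(axis=1, keepdims=True)

def graph_adapter(text_feats, visual_feats, beta=0.3):
    """One propagation step over the dual (textual + visual) knowledge
    graph: refined text features blend the originals with structure-aware
    aggregates from both sub-graphs (a simplified illustration)."""
    a_text = normalized_adjacency(text_feats)    # (C, C) textual sub-graph
    a_vis = normalized_adjacency(visual_feats)   # (C, C) visual sub-graph
    propagated = 0.5 * (a_text @ text_feats + a_vis @ text_feats)
    return (1.0 - beta) * text_feats + beta * propagated

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 8))     # 5 classes, text embedding dim 8
visual = rng.standard_normal((5, 16))  # same 5 classes, visual dim 16
refined = graph_adapter(text, visual)
```

Note that both adjacency matrices are class-by-class, so the visual sub-graph can steer the text features even though the two modalities have different embedding dimensions.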
An exploratory survey of money boys and HIV transmission risk in Jilin Province, PR China
This report represents the first exploratory study of Chinese men who provide commercial sex services to other men ("money boys") in Jilin Province, People's Republic of China, through a convenience sample drawn from Changchun and Jilin City. A total of 86 active money boy participants (Changchun, n = 49; Jilin City, n = 37) were surveyed concerning background and demographics, basic HIV transmission knowledge, and sexual practices. The survey indicated that while Jilin Province money boy behavior matches other studies concerning propensity to high-risk behavior and significant bridging potential, the Jilin money boys, unlike those in previous studies, exhibited a high level of basic HIV/AIDS transmission knowledge. In spite of this level of knowledge, none of the participants reported always using a condom in their sexual activities. They also exhibited a high level of awareness of the voluntary counseling and testing available in the province, yet relatively few had availed themselves of these services. These preliminary findings will be used as a baseline and springboard for continuing study in the Jilin Province money boy community. Even now, however, it is becoming clear that the dynamics of male commercial sex work may vary greatly depending upon local influences, and will necessitate that future interventions be highly tailored to area-specific circumstances.
Fast clustering algorithm based on MST of representative points
Minimum spanning tree (MST)-based clustering algorithms are widely used to detect clusters with diverse densities and irregular shapes. However, most algorithms require the entire dataset to construct an MST, which leads to significant computational overhead. To alleviate this issue, our proposed algorithm, R-MST, utilizes representative points instead of all sample points for constructing the MST. Additionally, based on density and nearest-neighbor distance, we improve the representative point selection strategy to enhance the uniform distribution of representative points in sparse areas, enabling the algorithm to perform well on datasets with varying densities. Furthermore, traditional methods for eliminating inconsistent edges generally require prior knowledge about the number of clusters, which is not always readily available in practical applications. Therefore, we propose an adaptive method that employs mutual neighbors to identify inconsistent edges and determine the optimal number of clusters automatically. The experimental results indicate that the R-MST algorithm not only improves the efficiency of clustering but also enhances its accuracy.
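The classic MST-cut clustering that R-MST builds on can be sketched in pure Python/NumPy: build the tree with Kruskal's algorithm, cut the longest edges, and label the connected components. Two simplifications relative to R-MST: the tree here is built over all points rather than density-selected representative points, and the number of clusters is given rather than determined adaptively via mutual neighbors.

```python
import numpy as np

def kruskal_mst(points):
    """Minimum spanning tree over points (rows) via Kruskal's algorithm;
    returns edges as (weight, i, j) in increasing weight order."""
    n = len(points)
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge joins two components
            parent[ri] = rj
            mst.append((w, i, j))
    return mst

def mst_clusters(points, n_clusters):
    """Cut the n_clusters - 1 longest MST edges and label the resulting
    connected components."""
    mst = kruskal_mst(points)
    keep = mst[: len(mst) - (n_clusters - 1)]   # drop the longest edges
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, i, j in keep:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = mst_clusters(pts, n_clusters=2)   # two well-separated pairs
```

Building the full MST is O(n^2) in edges here, which is exactly the overhead that restricting the tree to representative points is meant to reduce.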
Ultrasound-guided median nerve electrical stimulation to promote upper limb function recovery after stroke
Peripheral electrical nerve stimulation enhances hand function during stroke rehabilitation. Here, we proposed a percutaneous direct median nerve stimulation guided by ultrasound (ultrasound-guided median nerve electrical stimulation, UG-MNES) and evaluated its feasibility and effectiveness in the treatment of stroke patients with upper extremity impairments. Sixty-three stroke patients (2-3 months after onset) were randomly divided into control and UG-MNES groups. Both groups received routine rehabilitation, and the UG-MNES group received an additional ultrasound-guided electrical stimulation of the median nerve at 2 Hz with a 0.2 ms pulse width for 20 minutes, with gradual intensity enhancement. The Fugl-Meyer Assessment for upper extremity motor function (FMA-UE) was used as the primary outcome. The secondary outcomes were the Functional Test for the Hemiplegic Upper Extremity (FTHUE-HK), Hand Function Rating Scale, Brunnstrom Stages, and Barthel Index scores for motor and daily functions. All participants completed the trial without any side effects or adverse events during the intervention. After 4 weeks of intervention, the functions of the upper limbs on the hemiplegic side in both groups achieved significant recovery. Compared to the control group, all evaluation indices used in this trial improved significantly in the UG-MNES group after 2 and 4 weeks of intervention; in particular, the first UG-MNES intervention immediately and significantly improved all the assessment items. In conclusion, UG-MNES is a safe and feasible treatment for stroke patients with upper extremity impairments and could significantly improve the motor function of the affected upper limb, especially in the first intervention. UG-MNES could be an effective alternative intervention for stroke patients with upper extremity impairments.