
    Efficient Multi-Grained Knowledge Reuse for Class Incremental Segmentation

    Class Incremental Semantic Segmentation (CISS) has attracted growing attention because of its significance in real-world applications. Although existing CISS methods demonstrate remarkable performance, they either leverage only high-level knowledge (features) while neglecting the rich and diverse knowledge in low-level features, leading to poor preservation of old knowledge and weak exploration of new knowledge; or use multi-level features for knowledge distillation by retraining a heavy backbone, which is computationally intensive. In this paper, we propose, for the first time, to efficiently reuse multi-grained knowledge for CISS by fusing multi-level features with a frozen backbone, and show that a simple aggregation of varying-level features, i.e., a naive feature pyramid, can boost performance significantly. We further introduce a novel densely-interactive feature pyramid (DEFY) module that enhances the fusion of high- and low-level features by enabling their dense interaction. Specifically, DEFY establishes a per-pixel relationship between pairs of feature maps, allowing the multi-pair outputs to be aggregated. This improves semantic segmentation by leveraging the complementary information in multi-level features. We show that DEFY can be effortlessly integrated into three representative methods to enhance their performance. Combined with the current SOTA, our method yields a new state-of-the-art performance, with notable average mIoU gains on two widely used benchmarks, i.e., 2.5% on PASCAL VOC 2012 and 2.3% on ADE20K. Comment: Technical Report. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

    A Dive into SAM Prior in Image Restoration

    The goal of image restoration (IR), a fundamental issue in computer vision, is to restore a high-quality (HQ) image from its degraded low-quality (LQ) observation. Multiple HQ solutions may correspond to a single LQ input in this ill-posed problem, creating an ambiguous solution space. This motivates the investigation and incorporation of prior knowledge in order to effectively constrain the solution space and enhance the quality of the restored images. In spite of the pervasive use of hand-crafted and learned priors in IR, limited attention has been paid to the incorporation of knowledge from large-scale foundation models. In this paper, we leverage, for the first time, the prior knowledge of the state-of-the-art segment anything model (SAM) to boost the performance of existing IR networks in a parameter-efficient tuning manner. In particular, SAM is chosen for its robustness to image degradations, which allows HQ semantic masks to be extracted from it. To leverage semantic priors and enhance restoration quality, we propose a lightweight SAM prior tuning (SPT) unit. This plug-and-play component allows us to effectively integrate semantic priors into existing IR networks, resulting in significant improvements in restoration quality. As the only trainable module in our method, the SPT unit has the potential to improve both efficiency and scalability. We demonstrate the effectiveness of the proposed method in enhancing a variety of methods across multiple tasks, such as image super-resolution and color image denoising. Comment: Technical Report

    Task Residual for Tuning Vision-Language Models

    Large-scale vision-language models (VLMs) pre-trained on billion-level data have learned general visual representations and broad visual concepts. In principle, the well-learned knowledge structure of the VLMs should be inherited appropriately when they are transferred to downstream tasks with limited data. However, most existing efficient transfer learning (ETL) approaches for VLMs either damage the prior knowledge or are excessively biased towards it, e.g., prompt tuning (PT) discards the pre-trained text-based classifier and builds a new one, while adapter-style tuning (AT) fully relies on the pre-trained features. To address this, we propose a new efficient tuning approach for VLMs named Task Residual Tuning (TaskRes), which operates directly on the text-based classifier and explicitly decouples the prior knowledge of the pre-trained models from new knowledge regarding a target task. Specifically, TaskRes keeps the original classifier weights from the VLMs frozen and obtains a new classifier for the target task by tuning a set of prior-independent parameters as a residual to the original one, which enables reliable prior knowledge preservation and flexible task-specific knowledge exploration. TaskRes is simple yet effective, significantly outperforming previous ETL methods (e.g., PT and AT) on 11 benchmark datasets while requiring minimal implementation effort. Our code is available at https://github.com/geekyutao/TaskRes. Comment: Accepted to CVPR 202
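
The residual idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scaling factor `alpha`, and the toy 2-D features are assumptions made for illustration. The frozen text-classifier weights stay untouched, and only a separate residual (here set by hand rather than trained) is added to them.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def task_residual_logits(image_feat, text_weights, residual, alpha=0.5):
    """Logits of a classifier built as frozen weights + alpha * residual.

    text_weights: frozen per-class weights (the prior), never modified here.
    residual:     per-class task parameters, the only part that would be tuned.
    alpha:        hypothetical scaling factor for the residual (an assumption).
    """
    img = l2_normalize(image_feat)
    logits = []
    for w, r in zip(text_weights, residual):
        row = [wi + alpha * ri for wi, ri in zip(w, r)]  # new classifier row
        logits.append(sum(a * b for a, b in zip(img, row)))
    return logits

# Toy example: two classes, 2-D features.
text_weights = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
image_feat = [3.0, 4.0]

# A zero residual reproduces the frozen (prior) classifier exactly.
zero_residual = [[0.0, 0.0], [0.0, 0.0]]
prior_logits = task_residual_logits(image_feat, text_weights, zero_residual)

# A non-zero residual on class 0 changes only that class's logit.
tuned_residual = [[0.0, 1.0], [0.0, 0.0]]
tuned_logits = task_residual_logits(image_feat, text_weights, tuned_residual)
```

Initializing the residual at zero means tuning starts from the pre-trained classifier, which is one plausible reading of how the prior knowledge is preserved while task-specific knowledge is learned separately.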

    Prediction Calibration for Generalized Few-shot Semantic Segmentation

    Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class. Compared to the widely studied Few-shot Semantic Segmentation (FSS), which is limited to segmenting novel classes only, GFSS is much under-studied despite being more practical. The existing approach to GFSS is based on classifier parameter fusion, whereby a newly trained novel class classifier and a pre-trained base class classifier are combined to form a new classifier. As the training data is dominated by base classes, this approach is inevitably biased towards the base classes. In this work, we propose a novel Prediction Calibration Network (PCN) to address this problem. Instead of fusing the classifier parameters, we fuse the scores produced separately by the base and novel classifiers. To ensure that the fused scores are not biased towards either the base or novel classes, a new Transformer-based calibration module is introduced. It is known that lower-level features are more useful for detecting edge information in an input image than higher-level features. Thus, we build a cross-attention module that guides the classifier's final prediction using the fused multi-level features. However, transformers are computationally demanding. Crucially, to make training of the proposed cross-attention module tractable at the pixel level, this module is designed based on feature-score cross-covariance and episodically trained to be generalizable at inference time. Extensive experiments on PASCAL-$5^{i}$ and COCO-$20^{i}$ show that our PCN outperforms the state-of-the-art alternatives by large margins. Comment: Technical Report

    GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

    Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs) under the low-data regime, where only a few additional parameters are introduced to excavate the task-specific knowledge based on the general and powerful representation of VLMs. However, most adapter-style works face two limitations: (i) modeling task-specific knowledge with a single modality only; and (ii) overlooking the exploitation of the inter-class relationships in downstream tasks, thereby leading to sub-optimal solutions. To mitigate these limitations, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph. In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in the two modalities, respectively. This enables the textual feature of each prompt to leverage the task-specific structure knowledge from both textual and visual modalities, yielding a more effective classifier for downstream tasks. Extensive experimental results on 11 benchmark datasets reveal that our GraphAdapter significantly outperforms previous adapter-based methods. The code will be released at https://github.com/lixinustc/GraphAdapter. Comment: Accepted by NeurIPS 2023. The manuscript will be further revised based on the review

    An exploratory survey of money boys and HIV transmission risk in Jilin Province, PR China

    This report represents the first exploratory study of Chinese men who provide commercial sex services to other men ("money boys") in Jilin Province, People's Republic of China, through a convenience sample drawn from Changchun and Jilin City. A total of 86 active money boy participants (Changchun, n = 49; Jilin City, n = 37) were surveyed concerning background and demographics, basic HIV transmission knowledge, and sexual practices. The survey indicated that while Jilin Province money boy behavior matches other studies concerning propensity to high risk behavior and significant bridging potential, the Jilin money boys, unlike previous studies, exhibited a high level of basic HIV/AIDS transmission knowledge. In spite of this level of knowledge, none of the participants reported always using a condom in their sexual activities. They also exhibited a high level of awareness of voluntary counseling and testing available in the province, yet relatively few had availed themselves of these services. These preliminary findings will be used as a baseline and springboard for continuing study in the Jilin Province money boy community. Even now, however, it is becoming clear that the dynamics of male commercial sex work may vary greatly depending upon local influences, and will necessitate that future interventions are highly tailored to area-specific circumstances

    Fast clustering algorithm based on MST of representative points

    Minimum spanning tree (MST)-based clustering algorithms are widely used to detect clusters with diverse densities and irregular shapes. However, most algorithms require the entire dataset to construct an MST, which leads to significant computational overhead. To alleviate this issue, our proposed algorithm R-MST utilizes representative points instead of all sample points for constructing MST. Additionally, based on the density and nearest neighbor distance, we improved the representative point selection strategy to enhance the uniform distribution of representative points in sparse areas, enabling the algorithm to perform well on datasets with varying densities. Furthermore, traditional methods for eliminating inconsistent edges generally require prior knowledge about the number of clusters, which is not always readily available in practical applications. Therefore, we propose an adaptive method that employs mutual neighbors to identify inconsistent edges and determine the optimal number of clusters automatically. The experimental results indicate that the R-MST algorithm not only improves the efficiency of clustering but also enhances its accuracy
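
For orientation, the classical MST clustering baseline that the abstract above builds on can be sketched as follows. This is not the R-MST algorithm: the representative points are simply given (the density-based selection is omitted), and inconsistent edges are removed by cutting the `k - 1` longest edges, which requires the cluster count `k` that the paper's mutual-neighbor method is designed to avoid. All names here are hypothetical.

```python
import math

def prim_mst(points):
    """Return MST edges (weight, i, j) over points via Prim's algorithm, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def cut_longest_edges(n, edges, k):
    """Drop the k-1 longest MST edges and label the connected components."""
    kept = sorted(edges)[: len(edges) - (k - 1)]
    root = list(range(n))   # union-find forest

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(n)]

# Hypothetical representative points: one tight group near the origin,
# another near (10, 10).
reps = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
labels = cut_longest_edges(len(reps), prim_mst(reps), k=2)
```

Because the tree is built over representatives rather than every sample, the quadratic cost of Prim's algorithm applies to a much smaller set, which is the efficiency argument the abstract makes.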

    Ultrasound-guided median nerve electrical stimulation to promote upper limb function recovery after stroke

    Peripheral electrical nerve stimulation enhances hand function during stroke rehabilitation. Here, we proposed a percutaneous direct median nerve stimulation guided by ultrasound (ultrasound‐guided median nerve electrical stimulation, UG-MNES) and evaluated its feasibility and effectiveness in the treatment of stroke patients with upper limb extremity impairments. Sixty-three stroke patients (2-3 months of onset) were randomly divided into control and UG-MNES groups. Both groups received routine rehabilitation and the UG-MNES group received an additional ultrasound-guided electrical stimulation of the median nerve at 2 Hz, 0.2 ms pulse-width for 20 minutes with gradual intensity enhancement. The Fugl-Meyer Assessment for upper extremity motor function (FMA-UE) was used as the primary outcome. The secondary outcomes were the Functional Test for the Hemiplegic Upper Extremity (FTHUE-HK), Hand Function Rating Scale, Brunnstrom Stages, and Barthel Index scores for motor and daily functions. All the participants completed the trial without any side effects or adverse events during the intervention. After 4 weeks of intervention, the functions of the upper limbs on the hemiplegic side in both groups achieved significant recovery. Compared to the control group, all evaluation indices used in this trial were improved significantly in the UG-MNES group after 2 and 4 weeks of intervention; particularly, the first intervention of UG-MNES immediately improved all the assessment items significantly. In conclusion, the UG-MNES is a safe and feasible treatment for stroke patients with upper limb extremity impairments and could significantly improve the motor function of the affected upper limb, especially in the first intervention. The UG-MNES could be an effective alternative intervention for stroke with upper limb extremity impairments

    Improved Point Dipole Model for Subwavelength Resolution Scattering Near-Field Optical Microscopy (SNOM)

    High-resolution microscopy techniques are of significant importance for studying nanomaterials. To resolve the fine structure of a nanomaterial at the subwavelength scale, it is necessary to understand the near-field interaction between the probe and the substrate material. Numerical methods such as FDTD, FEM, and MoM are inefficient for SNOM problems because the impedance matrix is ill-conditioned. Analytic methods can only be used for simple objects such as a sphere. Here, a quasi-analytical method is developed, in which the analytic formula is refined to adapt to various probe shapes and to the approach curve of SNOM. The method makes it possible to compare the performance of different probes and gives direction for designing new types of SNOM probes. As an application, the developed method is used to study the contrast in SNOM at the interface between two surfaces that differ in material and topography
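
For context, the abstract above does not state its refined formula. The textbook point-dipole model that such refinements usually start from treats the probe tip as a small sphere and the sample as a flat half-space. The symbols below are assumptions for illustration: $a$ is the sphere radius, $z$ the tip-sample gap, and $\varepsilon_p$ and $\varepsilon_s$ the probe and sample permittivities. The paper's improved model may differ from this form.

```latex
\alpha = 4\pi a^{3}\,\frac{\varepsilon_p - 1}{\varepsilon_p + 2}, \qquad
\beta = \frac{\varepsilon_s - 1}{\varepsilon_s + 1}, \qquad
\alpha_{\mathrm{eff}} = \frac{\alpha\,(1+\beta)}{1 - \dfrac{\alpha\beta}{16\pi\,(a+z)^{3}}}
```

Here $\alpha$ is the polarizability of the isolated sphere, $\beta$ the quasi-static reflection factor of the sample, and $\alpha_{\mathrm{eff}}$ the effective polarizability of the coupled tip-sample system. Material contrast arises because $\beta$, and hence $\alpha_{\mathrm{eff}}$, changes with $\varepsilon_s$.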