237 research outputs found

Self-Paced Multi-Task Learning

Authors: Dong Weishan, Li Changsheng, Liu Qingshan, Wei Fan, Yan Junchi, Zha Hongyuan
Publication date: 13/02/2017

In this paper, we propose a novel multi-task learning (MTL) framework, called Self-Paced Multi-Task Learning (SPMTL). Unlike previous works, which treat all tasks and instances equally during training, SPMTL attempts to learn the tasks jointly by taking into account the complexities of both tasks and instances. This is inspired by the cognitive process of the human brain, which often learns from the easy to the hard. We construct a compact SPMTL formulation by proposing a new task-oriented regularizer that can jointly prioritize the tasks and the instances. Thus it can be interpreted as a self-paced learner for MTL. A simple yet effective algorithm is designed for optimizing the proposed objective function. An error bound for a simplified formulation is also analyzed theoretically. Experimental results on toy and real-world datasets demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods.
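The abstract does not give the SPMTL objective, so the sketch below is not the authors' method. It only illustrates the generic easy-to-hard idea using the classic hard-threshold self-paced weighting, in which an instance is admitted to training once its loss falls below a pace parameter. The function name `self_paced_weights`, the pace values, and the loss values are all assumptions made for illustration.

```python
import numpy as np


def self_paced_weights(losses, lam):
    """Hard self-paced weights (illustrative, not the SPMTL regularizer).

    An instance gets weight 1.0 if its current loss is below the pace
    parameter `lam`, and 0.0 otherwise.
    """
    losses = np.asarray(losses, dtype=float)
    return (losses < lam).astype(float)


if __name__ == "__main__":
    # Hypothetical per-instance losses from some current model.
    losses = [0.1, 0.9, 0.4]
    # Raising the pace parameter admits progressively harder instances.
    for lam in (0.2, 0.5, 1.0):
        print(lam, self_paced_weights(losses, lam))
```

In a full self-paced loop, these weights would alternate with a weighted model update while the pace parameter is gradually increased; a multi-task variant would additionally need some notion of per-task difficulty, which the abstract attributes to its task-oriented regularizer.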

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

Authors: Cao Xiaochun, Liang Xiaodan, Lin Liang, Liu Luoqi, Liu Si, Shen Xiaohui, Xu Changsheng, Yan Shuicheng, Yang Jianchao
Publication date: 06/04/2015

Both parametric and non-parametric approaches have demonstrated encouraging performance in the human parsing task, namely segmenting a human image into several semantic regions (e.g., hat, bag, left arm, face). In this work, we aim to develop a new solution with the advantages of both methodologies, namely supervision from annotated data and the flexibility to use newly annotated (possibly uncommon) images, and present a quasi-parametric human parsing model. Under the classic K Nearest Neighbor (KNN)-based non-parametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image. Given a testing image, we first retrieve its KNN images from the annotated/manually-parsed human image corpus. Then each semantic region in each KNN image is matched with confidence to the testing image using M-CNN, and the matched regions from all KNN images are further fused, followed by a superpixel smoothing procedure to obtain the final human parsing result. The M-CNN differs from the classic CNN in that tailored cross-image matching filters are introduced to characterize the matching between the testing image and the semantic region of a KNN image. The cross-image matching filters are defined at different convolutional layers, each aiming to capture a particular range of displacements. Comprehensive evaluations over a large dataset with 7,700 annotated human images demonstrate the significant performance gain of the quasi-parametric model over the state of the art for the human parsing task. Comment: This manuscript is the accepted version for CVPR 201

arXiv.org e-Print Archive

Crossref

Construction of a dense genetic linkage map and mapping quantitative trait loci for economic traits of a doubled haploid population of Pyropia haitanensis (Bangiales, Rhodophyta)

Authors: Changsheng Chen, Chaotian Xie, Dehua Ji, Hongkun Zheng, Long Huang, Yan Xu
Publication venue: Springer Nature
Publication date: 01/01/2015

The genotypes of the 4550 LP markers that were mapped onto the genetic map. (XLSX 1645 kb)

Springer - Publisher Connector

The Francis Crick Institute

CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding

Authors: Peng Fang, Wang Yaowei, Xiao Linhui, Xu Changsheng, Yan Ming, Yang Xiaoshan
Publication date: 09/10/2023

Visual Grounding (VG) is a crucial topic in the field of vision and language, which involves locating a specific region described by expressions within an image. To reduce the reliance on manually labeled data, unsupervised methods have been developed to locate regions using pseudo-labels. However, the performance of existing unsupervised methods is highly dependent on the quality of pseudo-labels, and these methods always encounter issues with limited diversity. In order to utilize vision and language pre-trained models to address the grounding problem, and to take reasonable advantage of pseudo-labels, we propose CLIP-VG, a novel method that can conduct self-paced curriculum adapting of CLIP with pseudo-language labels. We propose a simple yet efficient end-to-end network architecture to realize the transfer of CLIP to visual grounding. Based on the CLIP-based architecture, we further propose single-source and multi-source curriculum adapting algorithms, which can progressively find more reliable pseudo-labels to learn an optimal model, thereby achieving a balance between reliability and diversity for the pseudo-language labels. Our method outperforms the current state-of-the-art unsupervised method by a significant margin on RefCOCO/+/g datasets in both single-source and multi-source scenarios, with improvements ranging from 6.78% to 10.67% and 11.39% to 14.87%, respectively. Furthermore, our approach even outperforms existing weakly supervised methods. The code and models are available at https://github.com/linhuixiao/CLIP-VG. Comment: Accepted by IEEE Transactions on Multimedia (2023), Paper page: https://ieeexplore.ieee.org/abstract/document/10269126. Code will be released at https://github.com/linhuixiao/CLIP-V

arXiv.org e-Print Archive