Self-Paced Multi-Task Learning
In this paper, we propose a novel multi-task learning (MTL) framework,
called Self-Paced Multi-Task Learning (SPMTL). Unlike previous works that
treat all tasks and instances equally during training, SPMTL attempts to
learn the tasks jointly by taking into consideration the complexities of
both tasks and instances. This is inspired by the cognitive process of the
human brain, which often learns from the easy to the hard. We construct a
compact SPMTL formulation by proposing a new task-oriented regularizer
that can jointly prioritize the tasks and the instances, so the model can
be interpreted as a self-paced learner for MTL. A simple yet effective
algorithm is designed for optimizing the proposed objective function, and
an error bound for a simplified formulation is analyzed theoretically.
Experimental results on toy and real-world datasets demonstrate the
effectiveness of the proposed approach compared to state-of-the-art
methods.
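
As a concrete illustration of the easy-to-hard training loop the abstract
describes, here is a minimal Python sketch of self-paced instance
selection for multiple linear regression tasks. The hard 0/1 instance
weights, the ridge solver, and the geometric pace schedule are
illustrative assumptions, not the paper's exact task-oriented regularizer
(which additionally prioritizes whole tasks):

    import numpy as np

    def self_paced_mtl(tasks, lam=0.5, growth=1.3, n_rounds=10, ridge=1e-2):
        """tasks: list of (X, y) pairs, one per task."""
        models = [np.zeros(X.shape[1]) for X, _ in tasks]
        for _ in range(n_rounds):
            for t, (X, y) in enumerate(tasks):
                # Per-instance squared losses under the current model.
                losses = (X @ models[t] - y) ** 2
                # Self-paced selection: keep instances easier than the pace.
                v = losses < lam
                if v.sum() == 0:  # nothing selected; take the easiest one
                    v[np.argmin(losses)] = True
                Xs, ys = X[v], y[v]
                # Ridge-regularized least squares on the selected subset.
                A = Xs.T @ Xs + ridge * np.eye(X.shape[1])
                models[t] = np.linalg.solve(A, Xs.T @ ys)
            lam *= growth  # relax the pace: admit harder instances next
        return models

    # Toy usage: two related regression tasks with a few injected outliers.
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)
    tasks = []
    for _ in range(2):
        X = rng.normal(size=(100, 5))
        y = X @ w_true + 0.1 * rng.normal(size=100)
        y[:5] += 5.0  # hard/outlier instances, admitted last (or never)
        tasks.append((X, y))
    print([np.round(m, 2) for m in self_paced_mtl(tasks)])

Each round the threshold grows, so harder instances (here, the injected
outliers) enter the training set only after the easy ones have shaped the
model.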
Matching-CNN Meets KNN: Quasi-Parametric Human Parsing
Both parametric and non-parametric approaches have demonstrated
encouraging performance on the human parsing task, namely segmenting a
human image into several semantic regions (e.g., hat, bag, left arm,
face). In this work, we aim to develop a new solution with the advantages
of both methodologies, namely supervision from annotated data and the
flexibility to use newly annotated (possibly uncommon) images, and present
a quasi-parametric human parsing model. Under the classic K Nearest
Neighbor (KNN)-based non-parametric framework, a parametric Matching
Convolutional Neural Network (M-CNN) is proposed to predict the matching
confidence and displacement of the best-matched region in the testing
image for a particular semantic region in one KNN image. Given a testing
image, we first retrieve its KNN images from the annotated/manually-parsed
human image corpus. Each semantic region in each KNN image is then matched
with confidence to the testing image using M-CNN, and the matched regions
from all KNN images are fused, followed by a superpixel smoothing
procedure, to obtain the final human parsing result. The M-CNN differs
from the classic CNN in that tailored cross-image matching filters are
introduced to characterize the matching between the testing image and the
semantic region of a KNN image. These cross-image matching filters are
defined at different convolutional layers, each aiming to capture a
particular range of displacements. Comprehensive evaluations over a large
dataset with 7,700 annotated human images demonstrate a significant
performance gain of the quasi-parametric model over the state of the art
for the human parsing
task.

Comment: This manuscript is the accepted version for CVPR 2015.
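To make the fusion and smoothing stages concrete, here is a minimal
Python sketch operating on outputs the pipeline is assumed to have
already produced (region masks displaced into the testing image's frame,
plus their M-CNN matching confidences); the confidence-weighted vote and
the per-superpixel argmax rule are illustrative assumptions rather than
the paper's exact procedure:

    import numpy as np

    def fuse_and_smooth(masks, confs, labels, superpixels, n_labels):
        """masks: (n, H, W) displaced region masks; confs: (n,) M-CNN
        confidences; labels: (n,) semantic ids; superpixels: (H, W) ids."""
        H, W = masks.shape[1:]
        score = np.zeros((n_labels, H, W))
        # Confidence-weighted vote: each matched region adds its
        # confidence to its label wherever its mask is on.
        for m, c, l in zip(masks, confs, labels):
            score[l] += c * m
        # Superpixel smoothing: give every pixel in a superpixel the
        # label with the highest summed score inside that superpixel.
        smoothed = np.empty((H, W), dtype=int)
        for sp in np.unique(superpixels):
            region = superpixels == sp
            smoothed[region] = score[:, region].sum(axis=1).argmax()
        return smoothed

    # Toy usage with random stand-ins for the M-CNN matching outputs.
    rng = np.random.default_rng(0)
    masks = rng.random((6, 32, 32)) > 0.7
    confs = rng.random(6)
    labels = rng.integers(0, 4, size=6)
    superpixels = (np.arange(32)[:, None] // 8) * 4 + np.arange(32) // 8
    print(fuse_and_smooth(masks, confs, labels, superpixels, n_labels=4))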
Construction of a dense genetic linkage map and mapping quantitative trait loci for economic traits of a doubled haploid population of Pyropia haitanensis (Bangiales, Rhodophyta)
The genotypes of the 4550 LP markers that were mapped onto the genetic
map. (XLSX 1645 kb)
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
Visual Grounding (VG) is a crucial topic in the field of vision and
language, which involves locating a specific region within an image
described by a language expression. To reduce the reliance on manually
labeled data, unsupervised methods have been developed that locate regions
using pseudo-labels. However, the performance of existing unsupervised
methods is highly dependent on the quality of the pseudo-labels, and these
methods often suffer from limited diversity. To utilize vision-and-language
pre-trained models for the grounding problem while making reasonable use
of pseudo-labels, we propose CLIP-VG, a novel method that conducts
self-paced curriculum adapting of CLIP with pseudo-language labels. We
propose a simple yet efficient end-to-end network architecture to transfer
CLIP to visual grounding. Building on this CLIP-based architecture, we
further propose single-source and multi-source curriculum adapting
algorithms that progressively find more reliable pseudo-labels with which
to learn an optimal model, thereby achieving a balance between reliability
and diversity for the pseudo-language labels. Our method outperforms the
current state-of-the-art unsupervised method by a significant margin on
the RefCOCO/+/g datasets in both single-source and multi-source scenarios,
with improvements ranging from 6.78% to 10.67% and from 11.39% to 14.87%,
respectively. Furthermore, our approach even outperforms existing weakly
supervised methods. The code and models are
available at https://github.com/linhuixiao/CLIP-VG.

Comment: Accepted by IEEE Transactions on Multimedia (2023). Paper page:
https://ieeexplore.ieee.org/abstract/document/10269126. Code will be
released at https://github.com/linhuixiao/CLIP-VG.
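
A minimal sketch of the single-source self-paced curriculum idea,
assuming a pluggable per-sample reliability score and training step; the
growing-fraction schedule and top-k selection are illustrative
assumptions, not the paper's exact algorithm:

    import numpy as np

    def curriculum_adapt(pseudo_samples, reliability, train_step,
                         start_frac=0.3, growth=0.2, n_rounds=4):
        for r in range(n_rounds):
            # Score every pseudo-labeled sample with the *current* model,
            # so the reliability estimate improves as adapting proceeds.
            scores = np.array([reliability(s) for s in pseudo_samples])
            frac = min(1.0, start_frac + r * growth)
            k = max(1, int(frac * len(pseudo_samples)))
            # Keep only the currently most reliable pseudo-labels...
            chosen = np.argsort(scores)[::-1][:k]
            # ...and adapt the grounding model on that subset.
            train_step([pseudo_samples[i] for i in chosen])

    # Toy usage with stand-in scoring and training callables.
    samples = list(range(20))
    curriculum_adapt(samples,
                     reliability=lambda s: -abs(s - 10),  # fake score
                     train_step=lambda b: print(len(b), "samples"))

Starting from a small, high-reliability subset and widening it each round
is what trades off the reliability and diversity of the pseudo-labels
that the abstract describes.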