Weakly Supervised Semantic Segmentation via Progressive Patch Learning
Most existing semantic segmentation approaches that use image-level class
labels as supervision rely heavily on the initial class activation map (CAM)
generated by a standard classification network. In this paper, a novel
"Progressive Patch Learning" approach is proposed to improve the classifier's
extraction of local details, producing CAMs that cover the whole object rather
than only the most discriminative regions, as CAMs obtained from conventional
classification models do. "Patch Learning" destructs the feature maps
into patches and independently processes each local patch in parallel before
the final aggregation. Such a mechanism forces the network to find weak
information in the scattered discriminative local parts, making it more
sensitive to local details. "Progressive Patch Learning" further extends the
feature destruction and patch learning to multi-level granularities in a
progressive manner. Combined with a multi-stage optimization strategy, this
"Progressive Patch Learning" mechanism implicitly gives the model the ability
to extract features across different locality granularities. As an
alternative to the implicit multi-granularity progressive fusion approach, we
additionally propose an explicit method to simultaneously fuse features from
different granularities in a single model, further improving how fully the CAM
covers the object. Our proposed method achieves outstanding performance on the
PASCAL VOC 2012 dataset (e.g., 69.6% mIoU on the test set), which
surpasses most existing weakly supervised semantic segmentation methods. Code
will be made publicly available at https://github.com/TyroneLi/PPL_WSSS.
Comment: TMM2022 accepted
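The split, process, aggregate idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-channel 2D map stored as nested lists, and the helper names (`split_into_patches`, `merge_patches`, `patch_apply`, `normalize_patch`) as well as the per-patch operation are hypothetical stand-ins for whatever the network actually applies to each patch.

```python
from typing import Callable, List

Grid = List[List[float]]


def split_into_patches(fmap: Grid, n: int) -> List[List[Grid]]:
    """Split an H x W map into an n x n grid of non-overlapping patches."""
    h, w = len(fmap), len(fmap[0])
    if h % n or w % n:
        raise ValueError("H and W must be divisible by n")
    ph, pw = h // n, w // n
    return [
        [[row[j * pw:(j + 1) * pw] for row in fmap[i * ph:(i + 1) * ph]]
         for j in range(n)]
        for i in range(n)
    ]


def merge_patches(patches: List[List[Grid]]) -> Grid:
    """Inverse of split_into_patches: stitch the patch grid back together."""
    merged: Grid = []
    for patch_row in patches:
        for r in range(len(patch_row[0])):
            merged.append([v for patch in patch_row for v in patch[r]])
    return merged


def patch_apply(fmap: Grid, n: int, fn: Callable[[Grid], Grid]) -> Grid:
    """Apply fn to every patch independently, then aggregate the results."""
    patches = split_into_patches(fmap, n)
    return merge_patches([[fn(p) for p in row] for row in patches])


def normalize_patch(patch: Grid) -> Grid:
    """Hypothetical per-patch operation: rescale by the patch's own maximum."""
    peak = max(max(row) for row in patch) or 1.0
    return [[v / peak for v in row] for row in patch]


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

# Coarse-to-fine granularities, loosely mirroring the "progressive" idea.
outputs = {n: patch_apply(fmap, n, normalize_patch) for n in (1, 2, 4)}
```

Because each patch is rescaled by its own maximum, values that are small relative to the whole map can still become large within their patch, which is one simple way to picture how patch-wise processing draws attention to weaker local responses.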
Deep Structure Inference Network for Facial Action Unit Recognition
Facial expressions are combinations of basic components called Action Units
(AU). Recognizing AUs is key for developing general facial expression analysis.
In recent years, most efforts in automatic AU recognition have been dedicated
to learning combinations of local features and to exploiting correlations
between Action Units. In this paper, we propose a deep neural architecture that
tackles both problems by combining learned local and global features in its
initial stages and replicating a message passing algorithm between classes
similar to a graphical model inference approach in later stages. We show that
by training the model end-to-end with increased supervision, we improve on the
state of the art by 5.3% and 8.2% on the BP4D and DISFA datasets,
respectively.
Attention-Aware Face Hallucination via Deep Reinforcement Learning
Face hallucination is a domain-specific super-resolution problem whose
goal is to generate high-resolution (HR) faces from low-resolution (LR) input
images. In contrast to existing methods that often learn a single
patch-to-patch mapping from LR to HR images and disregard the
contextual interdependency between patches, we propose a novel Attention-aware
Face Hallucination (Attention-FH) framework that uses deep reinforcement
learning to sequentially discover attended patches and then enhance the
corresponding facial parts by fully exploiting the global interdependency of
the image. Specifically, at each time step, a recurrent policy network
dynamically specifies a new attended region by incorporating what happened in
previous steps. The state (i.e., face hallucination result for the whole
image) can thus be exploited and updated by the local enhancement network on
the selected region. The Attention-FH approach jointly learns the recurrent
policy network and local enhancement network through maximizing the long-term
reward that reflects the hallucination performance over the whole image.
Therefore, our proposed Attention-FH is capable of adaptively personalizing an
optimal search path for each face image according to its own characteristics.
Extensive experiments show that our approach significantly surpasses the
state of the art on in-the-wild faces with large pose and illumination
variations.