Interpretable Convolutional Neural Networks
This paper proposes a method to modify traditional convolutional neural
networks (CNNs) into interpretable CNNs, in order to clarify knowledge
representations in high conv-layers of CNNs. In an interpretable CNN, each
filter in a high conv-layer represents a certain object part. We do not need
any annotations of object parts or textures to supervise the learning process.
Instead, the interpretable CNN automatically assigns an object part to each
filter in a high conv-layer during the learning process. Our method can be
applied to different types of CNNs with different structures. The clear
knowledge representation in an interpretable CNN can help people understand the
logic inside a CNN, i.e., which patterns the CNN bases its decisions on.
Experiments showed that filters in an interpretable CNN were more semantically
meaningful than those in traditional CNNs.
Comment: In this version, we release the website of the code. Compared to the
previous version, we have corrected all values of location instability in
Table 3--6 by dividing the values by sqrt(2), i.e., a=a/sqrt(2). Such
revisions do NOT decrease the significance of the superior performance of our
method, because we make the same correction to location-instability values of
all baseline
Unsupervised Landmark Discovery Using Consistency Guided Bottleneck
We study a challenging problem of unsupervised discovery of object landmarks.
Many recent methods rely on bottlenecks to generate 2D Gaussian heatmaps;
however, these are limited in generating informed heatmaps while training,
presumably due to the lack of effective structural cues. Also, it is assumed
that all predicted landmarks are semantically relevant despite having no ground
truth supervision. In the current work, we introduce a consistency-guided
bottleneck in an image reconstruction-based pipeline that leverages landmark
consistency, a measure of compatibility with the pseudo-ground truth, to
generate adaptive heatmaps. We propose obtaining pseudo-supervision by forming
landmark correspondence across images. The consistency then modulates the
uncertainty of the discovered landmarks in the generation of adaptive heatmaps
which rank consistent landmarks above their noisy counterparts, providing
effective structural information for improved robustness. Evaluations on five
diverse datasets including MAFL, AFLW, LS3D, Cats, and Shoes demonstrate
excellent performance of the proposed approach compared to the existing
state-of-the-art methods. Our code is publicly available at
https://github.com/MamonaAwan/CGB_ULD.
Comment: Accepted ORAL at BMVC 2023; Code:
https://github.com/MamonaAwan/CGB_UL
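
The abstract above describes adaptive 2D Gaussian heatmaps whose uncertainty is modulated by a landmark consistency score. As a rough illustration only, the sketch below renders a Gaussian heatmap whose spread shrinks as consistency grows. The function names, the linear mapping from consistency to sigma, and the `sigma_min`/`sigma_max` defaults are all assumptions made for this example and are not taken from the paper or its code.

```python
import math

def gaussian_heatmap(cx, cy, size, sigma):
    """Render an unnormalised 2D Gaussian centred at (cx, cy) on a size x size grid."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]

def adaptive_heatmap(cx, cy, consistency, size=32, sigma_min=1.0, sigma_max=4.0):
    """Hypothetical mapping: consistency in [0, 1], higher consistency gives a sharper peak."""
    c = min(max(consistency, 0.0), 1.0)  # clamp to [0, 1]
    sigma = sigma_max - c * (sigma_max - sigma_min)  # assumed linear interpolation
    return gaussian_heatmap(cx, cy, size, sigma)

# A consistent landmark yields a concentrated heatmap, a noisy one a diffuse heatmap.
sharp = adaptive_heatmap(16, 16, consistency=1.0)
broad = adaptive_heatmap(16, 16, consistency=0.0)
```

Under these assumptions, a landmark with low consistency contributes a blurrier, less informative heatmap to the reconstruction bottleneck than a consistent one.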
Self-supervised learning of a facial attribute embedding from video
We propose a self-supervised framework for learning facial attributes by
simply watching videos of a human face speaking, laughing, and moving over
time. To perform this task, we introduce a network, Facial Attributes-Net
(FAb-Net), that is trained to embed multiple frames from the same video
face-track into a common low-dimensional space. With this approach, we make
three contributions: first, we show that the network can leverage information
from multiple source frames by predicting confidence/attention masks for each
frame; second, we demonstrate that using a curriculum learning regime improves
the learned embedding; finally, we demonstrate that the network learns a
meaningful face embedding that encodes information about head pose, facial
landmarks and facial expression, i.e. facial attributes, without having been
supervised with any labelled data. Our method is comparable or superior to
state-of-the-art self-supervised methods on these tasks and approach the
performance of supervised methods.
Comment: To appear in BMVC 2018. Supplementary material can be found at
http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/fabnet.htm
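
The first contribution listed in the abstract above is combining information from multiple source frames through per-frame confidence/attention masks. One plausible reading, sketched below, is a per-position softmax over the frames' confidence values followed by a weighted sum. The function names, the flat list layout, and the choice of softmax normalisation are assumptions for illustration and may differ from what FAb-Net actually does.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_frames(frame_features, frame_logits):
    """Fuse per-frame features using per-frame confidence logits.

    frame_features[f][p] is the feature of frame f at position p;
    frame_logits[f][p] is the (hypothetical) confidence logit for that entry.
    """
    num_positions = len(frame_features[0])
    fused = []
    for p in range(num_positions):
        # Normalise confidences across frames at this position.
        weights = softmax([logits[p] for logits in frame_logits])
        fused.append(sum(w * feats[p] for w, feats in zip(weights, frame_features)))
    return fused

# Two frames, two positions: equal confidence averages the frames,
# while a strongly preferred frame dominates the fused value.
features = [[1.0, 2.0], [3.0, 4.0]]
averaged = fuse_frames(features, [[0.0, 0.0], [0.0, 0.0]])
dominated = fuse_frames(features, [[10.0, 0.0], [-10.0, 0.0]])
```

In this sketch, positions where one frame is much more confident take their value almost entirely from that frame, which is the intuition behind letting the network choose which source frame to trust.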