Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels
Graph Convolutional Networks (GCNs) play a crucial role in graph learning
tasks; however, learning graph embeddings with few supervised signals is still a
difficult problem. In this paper, we propose a novel training algorithm for
Graph Convolutional Networks, called the Multi-Stage Self-Supervised (M3S)
Training Algorithm, which incorporates a self-supervised learning approach and focuses on
improving the generalization performance of GCNs on graphs with few labeled
nodes. First, a Multi-Stage Training Framework is provided as the basis of the
M3S training method. Then we leverage the DeepCluster technique, a popular form of
self-supervised learning, and design a corresponding aligning mechanism on the
embedding space to refine the Multi-Stage Training Framework, resulting in the M3S
Training Algorithm. Finally, extensive experimental results verify the superior
performance of our algorithm on graphs with few labeled nodes under different
label rates compared with other state-of-the-art approaches.
Comment: AAAI Conference on Artificial Intelligence (AAAI 2020)
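A minimal sketch of the multi-stage self-training loop described above, assuming hypothetical train_gcn and predict helpers and using confidence-based pseudo-labelling in place of the paper's DeepCluster-based aligning mechanism:

    import numpy as np

    def multi_stage_training(features, adj, labels, train_mask,
                             num_stages=4, k=50):
        # After each stage, promote the k most confident unlabeled
        # nodes per class to the labeled set and retrain.
        labels, train_mask = labels.copy(), train_mask.copy()
        model = None
        for stage in range(num_stages):
            model = train_gcn(features, adj, labels, train_mask)  # assumed helper
            probs = predict(model, features, adj)                 # (N, C) softmax scores
            conf, pred = probs.max(axis=1), probs.argmax(axis=1)
            for c in range(probs.shape[1]):
                cand = np.where((pred == c) & ~train_mask)[0]
                cand = cand[np.argsort(-conf[cand])][:k]          # most confident first
                labels[cand] = c                                  # pseudo-label
                train_mask[cand] = True                           # enlarge labeled set
        return model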
Transitive Invariance for Self-supervised Visual Representation Learning
Learning visual representations with self-supervised learning has become
popular in computer vision. The idea is to design auxiliary tasks where labels
are free to obtain. Most of these tasks end up providing data to learn specific
kinds of invariance useful for recognition. In this paper, we propose to
exploit different self-supervised approaches to learn representations invariant
to (i) inter-instance variations (two objects in the same class should have
similar features) and (ii) intra-instance variations (viewpoint, pose,
deformations, illumination, etc). Instead of combining two approaches with
multi-task learning, we argue for organizing and reasoning over the data with multiple
variations. Specifically, we propose to generate a graph with millions of
objects mined from hundreds of thousands of videos. The objects are connected
by two types of edges which correspond to two types of invariance: "different
instances but a similar viewpoint and category" and "different viewpoints of
the same instance". By applying simple transitivity on the graph with these
edges, we can obtain pairs of images exhibiting richer visual invariance. We
use this data to train a Triplet-Siamese network with VGG16 as the base
architecture and apply the learned representations to different recognition
tasks. For object detection, we achieve 63.2% mAP on PASCAL VOC 2007 using Fast
R-CNN (compared to 67.3% with ImageNet pre-training). For the challenging COCO
dataset, our method is surprisingly close (23.5%) to the ImageNet-supervised
counterpart (24.4%) using the Faster R-CNN framework. We also show that our
network can perform significantly better than the ImageNet network in the
surface normal estimation task.
Comment: ICCV 2017
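The transitivity step lends itself to a short sketch: composing the two edge types yields new positive pairs exhibiting both kinds of invariance. A minimal illustration, with hypothetical edge lists as input:

    def transitive_pairs(inter_edges, intra_edges):
        # inter_edges: (A, B) pairs -- different instances, similar
        #   viewpoint and category; intra_edges: (B, B') pairs -- same
        #   instance, different viewpoints. Transitivity gives (A, B').
        intra = {}
        for u, v in intra_edges:
            intra.setdefault(u, set()).add(v)
            intra.setdefault(v, set()).add(u)
        pairs = set()
        for a, b in inter_edges:
            for b2 in intra.get(b, ()):
                pairs.add((a, b2))   # new pair: different instance AND viewpoint
            for a2 in intra.get(a, ()):
                pairs.add((b, a2))
        return pairs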
RetinaFace: Single-stage Dense Face Localisation in the Wild
Though tremendous strides have been made in uncontrolled face detection,
accurate and efficient face localisation in the wild remains an open challenge.
This paper presents a robust single-stage face detector, named RetinaFace,
which performs pixel-wise face localisation on various scales of faces by
taking advantage of joint extra-supervised and self-supervised multi-task
learning. Specifically, we make contributions in the following five aspects:
(1) We manually annotate five facial landmarks on the WIDER FACE dataset and
observe significant improvement in hard face detection with the assistance of
this extra supervision signal. (2) We further add a self-supervised mesh
decoder branch for predicting pixel-wise 3D face shape information in
parallel with the existing supervised branches. (3) On the WIDER FACE hard test
set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1%
(achieving an AP of 91.4%). (4) On the IJB-C test set, RetinaFace enables
state-of-the-art methods (ArcFace) to improve their results in face
verification (TAR=89.59% for FAR=1e-6). (5) By employing light-weight backbone
networks, RetinaFace can run in real time on a single CPU core for a
VGA-resolution image. Extra annotations and code have been made available at:
https://github.com/deepinsight/insightface/tree/master/RetinaFace
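As a rough illustration of the joint multi-task objective, the per-anchor loss can be written as a weighted sum of the face classification, box regression, landmark regression, and dense 3D (mesh decoder) terms; the weights below are placeholders, not the paper's values:

    def multi_task_loss(cls_loss, box_loss, lmk_loss, mesh_loss,
                        w_box=0.25, w_lmk=0.1, w_mesh=0.01):
        # Classification applies to all anchors; the regression terms
        # are assumed to be computed on positive anchors only.
        return cls_loss + w_box * box_loss + w_lmk * lmk_loss + w_mesh * mesh_loss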
Every Node Counts: Self-Ensembling Graph Convolutional Networks for Semi-Supervised Learning
Graph convolutional network (GCN) provides a powerful means for graph-based
semi-supervised tasks. However, as a localized first-order approximation of
spectral graph convolution, the classic GCN cannot take full advantage of
unlabeled data, especially when unlabeled nodes are far from labeled ones. To
capitalize on the information from unlabeled nodes to boost the training for
GCN, we propose a novel framework named Self-Ensembling GCN (SEGCN), which
marries GCN with Mean Teacher - another powerful model in semi-supervised
learning. SEGCN contains a student model and a teacher model. As a student, it
not only learns to correctly classify the labeled nodes, but also tries to be
consistent with the teacher on unlabeled nodes in more challenging situations,
such as a high dropout rate and graph collapse. As a teacher, it averages the
student model weights and generates more accurate predictions to lead the
student. In such a mutual-promoting process, both labeled and unlabeled samples
can be fully utilized for backpropagating effective gradients to train GCN. In
three article classification tasks, i.e., Citeseer, Cora, and Pubmed, we validate
that the proposed method matches the state of the art in classification
accuracy.
Comment: 9 pages, 4 figures
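A minimal sketch of the Mean Teacher machinery that SEGCN builds on, in PyTorch with assumed model interfaces: the teacher is an exponential moving average (EMA) of the student, and the student is trained with a supervised term plus a consistency term:

    import torch
    import torch.nn.functional as F

    def ema_update(teacher, student, alpha=0.99):
        # Teacher weights = EMA of student weights.
        with torch.no_grad():
            for t, s in zip(teacher.parameters(), student.parameters()):
                t.mul_(alpha).add_(s, alpha=1 - alpha)

    def segcn_style_loss(student_logits, teacher_logits, y, labeled_mask):
        # Supervised loss on labeled nodes + consistency with the
        # teacher's predictions on all (labeled and unlabeled) nodes.
        sup = F.cross_entropy(student_logits[labeled_mask], y[labeled_mask])
        cons = F.mse_loss(F.softmax(student_logits, dim=1),
                          F.softmax(teacher_logits, dim=1).detach())
        return sup + cons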
Neural Graph Machines: Learning Neural Networks Using Graphs
Label propagation is a powerful and flexible semi-supervised learning
technique on graphs. Neural networks, on the other hand, have proven track
records in many supervised learning tasks. In this work, we propose a training
framework with a graph-regularised objective, namely "Neural Graph Machines",
that can combine the power of neural networks and label propagation. This work
generalises previous literature on graph-augmented training of neural networks,
enabling it to be applied to multiple neural architectures (Feed-forward NNs,
CNNs and LSTM RNNs) and a wide range of graphs. The new objective allows the
neural networks to harness both labeled and unlabeled data by: (a) allowing the
network to train using labeled data as in the supervised setting, (b) biasing
the network to learn similar hidden representations for neighboring nodes on a
graph, in the same vein as label propagation. Such architectures with the
proposed objective can be trained efficiently using stochastic gradient descent
and scaled to large graphs, with a runtime that is linear in the number of
edges. The proposed joint training approach convincingly outperforms many
existing methods on a wide range of tasks (multi-label classification on social
graphs, news categorization, document classification and semantic intent
classification), with multiple forms of graph inputs (including graphs with and
without node-level features) and using different types of neural networks.
Comment: 9 pages
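One possible instantiation of the graph-regularised objective, sketched in PyTorch with an assumed model that returns both logits and hidden representations: the supervised term is the usual cross-entropy, and the graph term penalises distance between the hidden states of neighbouring nodes, summed over edges (hence the linear-in-edges runtime):

    import torch
    import torch.nn.functional as F

    def ngm_objective(model, x, y, labeled_idx, edges, edge_w, lam=0.1):
        logits, h = model(x)                       # assumed (logits, hidden) output
        sup = F.cross_entropy(logits[labeled_idx], y[labeled_idx])
        u, v = edges[:, 0], edges[:, 1]            # edge endpoints
        reg = (edge_w * (h[u] - h[v]).pow(2).sum(dim=1)).mean()
        return sup + lam * reg                     # label-propagation-style bias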
Mix-and-Match Tuning for Self-Supervised Semantic Segmentation
Deep convolutional networks for semantic image segmentation typically require
large-scale labeled data, e.g. ImageNet and MS COCO, for network pre-training.
To reduce annotation efforts, self-supervised semantic segmentation is recently
proposed to pre-train a network without any human-provided labels. The key to
this new form of learning is to design a proxy task (e.g. image colorization),
from which a discriminative loss can be formulated on unlabeled data. Many
proxy tasks, however, lack the critical supervision signals that could induce
discriminative representation for the target image segmentation task. Thus
self-supervision's performance is still far from that of supervised
pre-training. In this study, we overcome this limitation by incorporating a
"mix-and-match" (M&M) tuning stage in the self-supervision pipeline. The
proposed approach is readily pluggable to many self-supervision methods and
does not use more annotated samples than the original process. Yet it is
capable of boosting the performance of the target image segmentation task to
surpass its fully-supervised pre-trained counterpart. The improvement is made
possible by better harnessing the limited pixel-wise annotations in the target
dataset. Specifically, we first introduce the "mix" stage, which sparsely
samples and mixes patches from the target set to reflect rich and diverse local
patch statistics of target images. A "match" stage then forms a class-wise
connected graph, which can be used to derive a strong triplet-based
discriminative loss for fine-tuning the network. Our paradigm follows the
standard practice in existing self-supervised studies, and no extra data or
labels are required. With the proposed M&M approach, for the first time, a
self-supervision method can achieve comparable or even better performance
compared to its ImageNet pre-trained counterpart on both the PASCAL VOC2012
and CityScapes datasets.
Comment: To appear in AAAI 2018 as a spotlight paper. More details at the project page: http://mmlab.ie.cuhk.edu.hk/projects/M%26M
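The triplet-based discriminative loss derived from the class-wise connected graph admits a short sketch (PyTorch, hypothetical patch-feature tensors): patches connected in the graph act as positives, unconnected ones as negatives:

    import torch
    import torch.nn.functional as F

    def mm_triplet_loss(anchor, positive, negative, margin=0.2):
        # anchor/positive share a class in the connected graph;
        # negative comes from a different class.
        d_pos = (anchor - positive).pow(2).sum(dim=1)
        d_neg = (anchor - negative).pow(2).sum(dim=1)
        return F.relu(d_pos - d_neg + margin).mean()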
Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization
Image clustering is one of the most important computer vision applications and
has been extensively studied in the literature. However, current clustering
methods mostly suffer from lack of efficiency and scalability when dealing with
large-scale and high-dimensional data. In this paper, we propose a new
clustering model, called DEeP Embedded RegularIzed ClusTering (DEPICT), which
efficiently maps data into a discriminative embedding subspace and precisely
predicts cluster assignments. DEPICT generally consists of a multinomial
logistic regression function stacked on top of a multi-layer convolutional
autoencoder. We define a clustering objective function using relative entropy
(KL divergence) minimization, regularized by a prior for the frequency of
cluster assignments. An alternating strategy is then derived to optimize the
objective by updating parameters and estimating cluster assignments.
Furthermore, we employ the reconstruction loss functions in our autoencoder, as
a data-dependent regularization term, to prevent the deep embedding function
from overfitting. In order to benefit from end-to-end optimization and
eliminate the necessity for layer-wise pretraining, we introduce a joint
learning framework to minimize the unified clustering and reconstruction loss
functions together and train all network layers simultaneously. Experimental
results indicate the superiority and faster running time of DEPICT in
real-world clustering tasks, where no labeled data is available for
hyper-parameter tuning.
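A compact sketch of a KL-based clustering objective of this flavour (a DEC-style target distribution is used here as a stand-in for DEPICT's frequency-regularised objective; PyTorch):

    import torch

    def target_distribution(q):
        # Sharpen soft assignments q (N, K) and normalise by cluster
        # frequency so large clusters do not dominate.
        w = q.pow(2) / q.sum(dim=0)
        return w / w.sum(dim=1, keepdim=True)

    def clustering_loss(q):
        # KL(p || q) between the target and current soft assignments.
        p = target_distribution(q).detach()
        return (p * (p.add(1e-10).log() - q.add(1e-10).log())).sum(dim=1).mean()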
Deep graph learning for semi-supervised classification
Graph learning (GL) can dynamically capture the distribution structure (graph
structure) of data based on graph convolutional networks (GCN), and the
learning quality of the graph structure directly influences GCN for
semi-supervised classification. Existing methods mostly combine the
computational layer and the related losses into GCN for exploring the global
graph (measuring graph structure from all data samples) or the local graph
(measuring graph structure from local data samples). The global graph emphasises
the whole-structure description of the inter-class data, while the local graph
tends toward the neighborhood-structure representation of intra-class data.
However, it is difficult to balance these graphs simultaneously during
learning for semi-supervised classification because of their interdependence.
To model this interdependence, deep graph learning (DGL) is proposed to find a
better graph representation for semi-supervised classification. DGL not only
learns the global structure by updating the metric computation in the previous
layer, but also mines the local structure by reassigning local weights in the
next layer. Furthermore, DGL can fuse the different
structures by dynamically encoding the interdependence of these structures, and
deeply mine the relationships among the different structures through
hierarchical progressive learning to improve the performance of semi-supervised
classification. Experiments demonstrate that DGL outperforms state-of-the-art
methods on three benchmark datasets (Citeseer, Cora, and Pubmed) for citation
networks and two benchmark datasets (MNIST and Cifar10) for images.
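A generic graph-learning layer of the kind this line of work builds on can be sketched as follows (PyTorch; not the paper's exact formulation): edge scores come from a learned metric on projected features and are row-normalised into an adjacency:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphLearningLayer(nn.Module):
        def __init__(self, in_dim, proj_dim):
            super().__init__()
            self.proj = nn.Linear(in_dim, proj_dim, bias=False)

        def forward(self, x):
            z = self.proj(x)                 # (N, d) projected features
            scores = F.relu(z @ z.t())       # non-negative pairwise similarities
            return F.softmax(scores, dim=1)  # row-normalised learned adjacency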
Deep Convolutional Networks on Graph-Structured Data
Deep Learning's recent successes have mostly relied on Convolutional
Networks, which exploit fundamental statistical properties of images, sounds
and video data: the local stationarity and multi-scale compositional structure
that allow expressing long-range interactions in terms of shorter, localized
interactions. However, there exist other important examples, such as text
documents or bioinformatic data, that may lack some or all of these strong
statistical regularities.
In this paper we consider the general question of how to construct deep
architectures with small learning complexity on general non-Euclidean domains,
which are typically unknown and need to be estimated from the data. In
particular, we develop an extension of Spectral Networks which incorporates a
Graph Estimation procedure, which we test on large-scale classification
problems, matching or improving over Dropout Networks with far fewer parameters
to estimate.
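For reference, a bare-bones spectral graph convolution (the operation Spectral Networks generalise) can be sketched as follows, with theta a vector of learned per-frequency filter coefficients:

    import numpy as np

    def spectral_conv(x, adj, theta):
        # Normalised Laplacian L = I - D^{-1/2} A D^{-1/2}.
        d = adj.sum(axis=1)
        d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-10)))
        lap = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
        _, U = np.linalg.eigh(lap)               # graph Fourier basis
        return U @ (theta[:, None] * (U.T @ x))  # filter in the spectral domain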
Co-salient Object Detection Based on Deep Saliency Networks and Seed Propagation over an Integrated Graph
This paper presents a co-salient object detection method to find common
salient regions in a set of images. We utilize deep saliency networks to
transfer co-saliency prior knowledge and better capture high-level semantic
information, and the resulting initial co-saliency maps are enhanced by seed
propagation steps over an integrated graph. The deep saliency networks are
trained in a supervised manner to avoid online weakly supervised learning, and we
exploit them not only to extract high-level features but also to produce both
intra- and inter-image saliency maps. Through a refinement step, the initial
co-saliency maps can uniformly highlight co-salient regions and locate accurate
object boundaries. To handle input image groups inconsistent in size, we
propose to pool multi-regional descriptors including both within-segment and
within-group information. In addition, the integrated multilayer graph is
constructed to find the regions that the previous steps may not detect by seed
propagation with low-level descriptors. In this work, we utilize the useful
complementary components of high- and low-level information, and several
learning-based steps. Our experiments have demonstrated that the proposed
approach outperforms comparable co-saliency detection methods on widely used
public databases and can also be directly applied to co-segmentation tasks.
Comment: 13 pages, 10 figures, 3 tables
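The seed-propagation step can be illustrated with a manifold-ranking style diffusion over the affinity graph (a common instantiation, not necessarily the paper's exact update):

    import numpy as np

    def seed_propagation(W, seeds, alpha=0.99, iters=50):
        # W: (N, N) non-negative affinity matrix over regions;
        # seeds: (N,) initial co-saliency scores.
        d = W.sum(axis=1)
        S = W / (np.sqrt(np.outer(d, d)) + 1e-10)    # symmetric normalisation
        f = seeds.astype(float).copy()
        for _ in range(iters):
            f = alpha * S @ f + (1 - alpha) * seeds  # diffuse, keep seed prior
        return f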