5,835 research outputs found
The Deep Weight Prior
Bayesian inference is known to provide a general framework for incorporating
prior knowledge or specific properties into machine learning models via
carefully choosing a prior distribution. In this work, we propose a new type of
prior distributions for convolutional neural networks, deep weight prior (DWP),
that exploit generative models to encourage a specific structure of trained
convolutional filters e.g., spatial correlations of weights. We define DWP in
the form of an implicit distribution and propose a method for variational
inference with such type of implicit priors. In experiments, we show that DWP
improves the performance of Bayesian neural networks when training data are
limited, and initialization of weights with samples from DWP accelerates
training of conventional convolutional neural networks.Comment: TL;DR: The deep weight prior learns a generative model for kernels of
convolutional neural networks, that acts as a prior distribution while
training on new dataset
BayesNAS: A Bayesian Approach for Neural Architecture Search
One-Shot Neural Architecture Search (NAS) is a promising method to
significantly reduce search time without any separate training. It can be
treated as a Network Compression problem on the architecture parameters from an
over-parameterized network. However, there are two issues associated with most
one-shot NAS methods. First, dependencies between a node and its predecessors
and successors are often disregarded which result in improper treatment over
zero operations. Second, architecture parameters pruning based on their
magnitude is questionable. In this paper, we employ the classic Bayesian
learning approach to alleviate these two issues by modeling architecture
parameters using hierarchical automatic relevance determination (HARD) priors.
Unlike other NAS methods, we train the over-parameterized network for only one
epoch then update the architecture. Impressively, this enabled us to find the
architecture on CIFAR-10 within only 0.2 GPU days using a single GPU.
Competitive performance can be also achieved by transferring to ImageNet. As a
byproduct, our approach can be applied directly to compress convolutional
neural networks by enforcing structural sparsity which achieves extremely
sparse networks without accuracy deterioration.Comment: International Conference on Machine Learning 201
Context-Dependent Diffusion Network for Visual Relationship Detection
Visual relationship detection can bridge the gap between computer vision and
natural language for scene understanding of images. Different from pure object
recognition tasks, the relation triplets of subject-predicate-object lie on an
extreme diversity space, such as \textit{person-behind-person} and
\textit{car-behind-building}, while suffering from the problem of combinatorial
explosion. In this paper, we propose a context-dependent diffusion network
(CDDN) framework to deal with visual relationship detection. To capture the
interactions of different object instances, two types of graphs, word semantic
graph and visual scene graph, are constructed to encode global context
interdependency. The semantic graph is built through language priors to model
semantic correlations across objects, whilst the visual scene graph defines the
connections of scene objects so as to utilize the surrounding scene
information. For the graph-structured data, we design a diffusion network to
adaptively aggregate information from contexts, which can effectively learn
latent representations of visual relationships and well cater to visual
relationship detection in view of its isomorphic invariance to graphs.
Experiments on two widely-used datasets demonstrate that our proposed method is
more effective and achieves the state-of-the-art performance.Comment: 8 pages, 3 figures, 2018 ACM Multimedia Conference (MM'18
Bayesian Compression for Deep Learning
Compression and computational efficiency in deep learning have become a
problem of great significance. In this work, we argue that the most principled
and effective way to attack this problem is by adopting a Bayesian point of
view, where through sparsity inducing priors we prune large parts of the
network. We introduce two novelties in this paper: 1) we use hierarchical
priors to prune nodes instead of individual weights, and 2) we use the
posterior uncertainties to determine the optimal fixed point precision to
encode the weights. Both factors significantly contribute to achieving the
state of the art in terms of compression rates, while still staying competitive
with methods designed to optimize for speed or energy efficiency.Comment: Published as a conference paper at NIPS 201
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning
- …