Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields
The visual world is vast and varied, but its variations divide into
structured and unstructured factors. We compose free-form filters and
structured Gaussian filters, optimized end-to-end, to factorize deep
representations and learn both local features and their degree of locality. Our
semi-structured composition is strictly more expressive than free-form
filtering, and changes in its structured parameters would require changes in
free-form architecture. In effect this optimizes over receptive field size and
shape, tuning locality to the data and task. Dynamic inference, in which the
Gaussian structure varies with the input, adapts receptive field size to
compensate for local scale variation. Optimizing receptive field size improves
semantic segmentation accuracy on Cityscapes by 1-2 points for strong dilated
and skip architectures and by up to 10 points for suboptimal designs. Adapting
receptive fields by dynamic Gaussian structure further improves results,
equaling the accuracy of free-form deformation while improving efficiency.
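A minimal sketch of the composition, assuming a PyTorch-style layer (the class name, kernel sizes, and initialization below are illustrative, not the authors' code): a free-form convolution is applied after a depthwise Gaussian blur whose standard deviation is a learnable parameter, so receptive field size is optimized end-to-end along with the filter weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianComposedConv(nn.Module):
    """Free-form conv composed with a learnable Gaussian blur (sketch)."""
    def __init__(self, in_ch, out_ch, ksize=3, init_sigma=1.0, blur_size=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, ksize, padding=ksize // 2)
        # sigma is optimized end-to-end; log-parameterization keeps it positive
        self.log_sigma = nn.Parameter(torch.tensor(float(init_sigma)).log())
        self.in_ch, self.blur_size = in_ch, blur_size

    def forward(self, x):
        sigma = self.log_sigma.exp()
        half = self.blur_size // 2
        xs = torch.arange(-half, half + 1, dtype=torch.float32, device=x.device)
        g1d = torch.exp(-0.5 * (xs / sigma) ** 2)
        g1d = g1d / g1d.sum()
        # separable 2-D Gaussian, applied depthwise to every channel
        g2d = torch.outer(g1d, g1d).repeat(self.in_ch, 1, 1, 1)
        blurred = F.conv2d(x, g2d, padding=half, groups=self.in_ch)
        return self.conv(blurred)
```

Dynamic inference, in the paper's sense, would further predict sigma from the input rather than learning a single global value.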
Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
We develop a general duality between neural networks and compositional
kernels, striving towards a better understanding of deep learning. We show that
initial representations generated by common random initializations are
sufficiently rich to express all functions in the dual kernel space. Hence,
though the training objective is hard to optimize in the worst case, the
initial weights form a good starting point for optimization. Our dual view also
reveals a pragmatic and aesthetic perspective of neural networks and
underscores their expressive power.
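To make the duality concrete: each activation induces a "dual activation" acting on input correlations, and composing layers composes these maps. For the suitably normalized ReLU the dual is known in closed form (a standard result, quoted here for illustration):

```latex
\hat{\sigma}(\rho) = \frac{\sqrt{1-\rho^{2}} + \left(\pi - \arccos\rho\right)\rho}{\pi},
\qquad
k^{(\ell+1)}(x, x') = \hat{\sigma}\!\left(k^{(\ell)}(x, x')\right),
```

so the compositional kernel of a deep ReLU network is an iterated application of \hat{\sigma} to the input correlation k^{(0)}(x, x') = \langle x, x' \rangle (for unit-norm inputs).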
Improving Efficiency in Convolutional Neural Network with Multilinear Filters
The excellent performance of deep neural networks has enabled us to solve
several automation problems, opening an era of autonomous devices. However,
current deep net architectures are heavy with millions of parameters and
require billions of floating point operations. Several works have been
developed to compress a pre-trained deep network to reduce memory footprint
and, possibly, computation. Instead of compressing a pre-trained network, in
this work, we propose a generic neural network layer structure employing
multilinear projection as the primary feature extractor. The proposed
architecture requires several times less memory than traditional
Convolutional Neural Networks (CNNs), while inheriting the design
principles of a CNN. In addition, the proposed architecture is equipped with
two computation schemes that enable computation reduction or scalability.
Experimental results show the effectiveness of our compact projection, which
outperforms traditional CNNs while requiring far fewer parameters.
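A toy illustration of the parameter saving, assuming a rank-1 (CP) factorization of each filter; the class name and the exact factorization are hypothetical, and the paper's multilinear projection may differ in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Rank1Conv2d(nn.Module):
    """Each filter is an outer product u x v x w, so parameters per filter
    scale as k + k + c instead of k * k * c (illustrative sketch)."""
    def __init__(self, in_ch, out_ch, ksize=3):
        super().__init__()
        self.u = nn.Parameter(torch.randn(out_ch, ksize) * 0.1)   # height profile
        self.v = nn.Parameter(torch.randn(out_ch, ksize) * 0.1)   # width profile
        self.w = nn.Parameter(torch.randn(out_ch, in_ch) * 0.1)   # channel mixing
        self.pad = ksize // 2

    def forward(self, x):
        # assemble full (out, in, k, k) kernels on the fly from the factors
        kernel = torch.einsum('oi,oh,ow->oihw', self.w, self.u, self.v)
        return F.conv2d(x, kernel, padding=self.pad)
```

For a 3x3 filter over 256 input channels this stores 3 + 3 + 256 numbers per output channel instead of 3 * 3 * 256, which is where a several-fold memory reduction would come from.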
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
There is a previously identified equivalence between wide fully connected
neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables,
for instance, test set predictions that would have resulted from a fully
Bayesian, infinitely wide trained FCN to be computed without ever instantiating
the FCN, but by instead evaluating the corresponding GP. In this work, we
derive an analogous equivalence for multi-layer convolutional neural networks
(CNNs), both with and without pooling layers, and achieve state-of-the-art
results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte
Carlo method to estimate the GP corresponding to a given neural network
architecture, even in cases where the analytic form has too many terms to be
computationally feasible.
Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs
with and without weight sharing are identical. As a consequence, translation
equivariance, beneficial in finite channel CNNs trained with stochastic
gradient descent (SGD), is guaranteed to play no role in the Bayesian treatment
of the infinite channel limit - a qualitative difference between the two
regimes that is not present in the FCN case. We confirm experimentally that,
while in some scenarios the performance of SGD-trained finite CNNs approaches
that of the corresponding GPs as the channel count increases, with careful
tuning SGD-trained CNNs can significantly outperform their corresponding GPs,
suggesting advantages from SGD training compared to fully Bayesian parameter
estimation.
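The Monte Carlo estimator can be illustrated with a one-hidden-layer ReLU network (a minimal sketch under assumed He-style scaling; the function name and defaults are not from the paper): draw random weights, and average the empirical covariance of the hidden activations across draws.

```python
import numpy as np

def mc_nngp_kernel(x1, x2, width=2048, n_samples=64, seed=0):
    """Monte Carlo estimate of the GP kernel induced by a random
    one-hidden-layer ReLU network (illustrative stand-in)."""
    rng = np.random.default_rng(seed)
    d = x1.shape[0]
    acc = 0.0
    for _ in range(n_samples):
        # He-style scaling keeps activations O(1) as width grows
        W = rng.normal(0.0, np.sqrt(2.0 / d), size=(width, d))
        h1 = np.maximum(W @ x1, 0.0)
        h2 = np.maximum(W @ x2, 0.0)
        acc += (h1 @ h2) / width   # empirical covariance of post-activations
    return acc / n_samples
```

As the width and the number of samples grow, this estimate concentrates on the analytic arc-cosine kernel for this toy architecture; the paper's method targets deeper architectures where the analytic form is too unwieldy to evaluate.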
A Gaussian Process perspective on Convolutional Neural Networks
In this paper we cast the well-known convolutional neural network in a
Gaussian process perspective. In this way we hope to gain additional insights
into the performance of convolutional networks, in particular understand under
what circumstances they tend to perform well and what assumptions are
implicitly made in the network. While for fully-connected networks the
properties of convergence to Gaussian processes have been studied extensively,
little is known about situations in which the output from a convolutional
network approaches a multivariate normal distribution.
Mellin-Meijer-kernel density estimation on $\mathbb{R}^+$
Nonparametric kernel density estimation is a very natural procedure which
simply makes use of the smoothing power of the convolution operation. Yet, it
performs poorly when the density of a positive variable is to be estimated
(boundary issues, spurious bumps in the tail). So various extensions of the
basic kernel estimator allegedly suitable for $\mathbb{R}^+$-supported
densities, such as those using Gamma or other asymmetric kernels, abound in the
literature. Those, however, are not based on any valid smoothing operation
analogous to the convolution, which typically leads to inconsistencies. By
contrast, in this paper a kernel estimator for $\mathbb{R}^+$-supported
densities is defined by making use of the Mellin convolution, the natural
analogue of the usual convolution on $\mathbb{R}$. From there, a very
transparent theory flows and leads to a new type of asymmetric kernels strongly
related to Meijer's $G$-functions. The numerous pleasant properties of this
`Mellin-Meijer-kernel density estimator' are demonstrated in the paper. Its
pointwise and $L_2$-consistency (with optimal rate of convergence) is
established for a large class of densities, including densities unbounded at 0
and showing power-law decay in their right tail. Its practical behaviour is
investigated further through simulations and some real data analyses.
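The valid smoothing operation in question is the Mellin convolution, the multiplicative analogue on $\mathbb{R}^+$ of the usual additive convolution:

```latex
(f \overset{M}{*} g)(x) = \int_0^\infty f\!\left(\frac{x}{t}\right) g(t)\,\frac{dt}{t},
\qquad x \in \mathbb{R}^+.
```

Smoothing the empirical distribution through this operation never leaks mass below 0 (it is the density operation for products of independent positive variables), which is how the boundary issues of the standard estimator are avoided by construction.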
Feature Weight Tuning for Recursive Neural Networks
This paper addresses how a recursive neural network model can automatically
leave out useless information and emphasize important evidence, in other words,
to perform "weight tuning" for higher-level representation acquisition. We
propose two models, Weighted Neural Network (WNN) and Binary-Expectation Neural
Network (BENN), which automatically control how much one specific unit
contributes to the higher-level representation. The proposed model can be
viewed as incorporating a more powerful compositional function for embedding
acquisition in recursive neural networks. Experimental results demonstrate the
significant improvement over standard neural models.
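A hedged sketch of what such unit-level weighting could look like (the gating form, names, and nonlinearities here are assumptions, not the paper's exact WNN/BENN parameterization):

```python
import torch
import torch.nn as nn

class WeightedComposition(nn.Module):
    """Recursive composition where learned gates control how much each
    child vector contributes to the parent representation (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)       # one contribution weight per child
        self.compose = nn.Linear(2 * dim, dim)  # standard recursive combination

    def forward(self, left, right):
        pair = torch.cat([left, right], dim=-1)
        a = torch.sigmoid(self.gate(pair))      # contribution weights in (0, 1)
        gated = torch.cat([a[..., :1] * left, a[..., 1:] * right], dim=-1)
        return torch.tanh(self.compose(gated))
```

Binarizing such gates, or taking their expectations, is one plausible reading of the "Binary-Expectation" variant; the exact mechanism is specified in the paper.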
Differentiable Compositional Kernel Learning for Gaussian Processes
The generalization properties of Gaussian processes depend heavily on the
choice of kernel, and this choice remains a dark art. We present the Neural
Kernel Network (NKN), a flexible family of kernels represented by a neural
network. The NKN architecture is based on the composition rules for kernels, so
that each unit of the network corresponds to a valid kernel. It can compactly
approximate compositional kernel structures such as those used by the Automatic
Statistician (Lloyd et al., 2014), but because the architecture is
differentiable, it is end-to-end trainable with gradient-based optimization. We
show that the NKN is universal for the class of stationary kernels. Empirically
we demonstrate pattern discovery and extrapolation abilities of NKN on several
tasks that depend crucially on identifying the underlying structure, including
time series and texture extrapolation, as well as Bayesian optimization.
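The closure rules behind the architecture are classical: nonnegative weighted sums and products of positive-definite kernels are again positive definite, so "linear" units take weighted sums and "product" units take products. A tiny sketch with illustrative kernel choices and weights:

```python
import numpy as np

def rbf(x1, x2, ls=1.0):
    return np.exp(-0.5 * ((x1 - x2) / ls) ** 2)

def periodic(x1, x2, period=1.0, ls=1.0):
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x1 - x2) / period) ** 2 / ls ** 2)

def nkn_unit(x1, x2, w=(0.5, 0.5)):
    """One 'layer' of kernel composition: a nonnegative-weighted sum
    (linear unit) followed by a product (product unit). Both rules
    preserve positive definiteness, so the result is a valid kernel."""
    k_sum = w[0] * rbf(x1, x2) + w[1] * periodic(x1, x2)
    return k_sum * rbf(x1, x2, ls=5.0)
```

In the NKN the weights of such units are trained by gradient-based optimization (e.g., on the GP marginal likelihood), which is what makes the kernel search differentiable end-to-end.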
Towards Interpretable R-CNN by Unfolding Latent Structures
This paper first proposes a method of formulating model interpretability in
visual understanding tasks based on the idea of unfolding latent structures. It
then presents a case study in object detection using popular two-stage
region-based convolutional network (i.e., R-CNN) detection systems. We focus on
weakly-supervised extractive rationale generation, that is, learning to unfold
latent discriminative part configurations of object instances automatically and
simultaneously in detection without using any supervision for part
configurations. We utilize a top-down hierarchical and compositional grammar
model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold
the space of latent part configurations of regions of interest (RoIs). We
propose an AOGParsing operator to substitute the RoIPooling operator widely
used in R-CNN. In detection, a bounding box is interpreted by the best parse
tree derived from the AOG on-the-fly, which is treated as the qualitatively
extractive rationale generated for interpreting detection. We propose a
folding-unfolding method to train the AOG and convolutional networks
end-to-end. In experiments, we build on R-FCN and test our method on the PASCAL
VOC 2007 and 2012 datasets. We show that the method can unfold promising latent
structures without hurting the performance.
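The parse-tree scoring can be caricatured in a few lines: OR-nodes select their best child, AND-nodes compose all of their children, and terminal nodes read off part scores (a schematic sketch only; the node structure and scoring are hypothetical, not the paper's AOGParsing operator):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                   # 'and', 'or', or 'terminal'
    part_id: int = -1           # index into part scores, for terminals
    children: list = field(default_factory=list)

def parse_score(node, part_scores):
    """OR-nodes pick the best alternative (max), AND-nodes compose all
    children (sum), terminals score a latent part configuration."""
    if node.kind == 'terminal':
        return part_scores[node.part_id]
    scores = [parse_score(c, part_scores) for c in node.children]
    return max(scores) if node.kind == 'or' else sum(scores)
```

Keeping the argmax choices at the OR-nodes yields the best parse tree, which is what gets read out as the extractive rationale for a detection.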
Monotone Increment Processes, Classical Markov Processes, and Loewner Chains
We prove one-to-one correspondences between certain decreasing Loewner chains
in the upper half-plane, a special class of real-valued Markov processes, and
quantum stochastic processes with monotonically independent additive
increments. This leads us to a detailed investigation of probability measures
on $\mathbb{R}$ with univalent Cauchy transform. We discuss several subclasses
of such measures and obtain characterizations in terms of analytic and
geometric properties of the corresponding Cauchy transforms.
Furthermore, we obtain analogous results for the setting of decreasing
Loewner chains in the unit disk, which correspond to quantum stochastic
processes of unitary operators with monotonically independent multiplicative
increments.
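For reference, the Cauchy transform whose univalence is at issue is the standard one,

```latex
G_\mu(z) = \int_{\mathbb{R}} \frac{\mathrm{d}\mu(x)}{z - x}, \qquad z \in \mathbb{C}^+,
```

a map from the upper half-plane to the lower half-plane whose injectivity singles out the subclass of measures studied in the paper.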