Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to find good minimizers
without getting stuck in local critical points, and such minimizers are often
satisfactory at avoiding overfitting. How these two features can be kept under
control in nonlinear devices composed of millions of tunable connections is a
profound and far-reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
presents a few extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective, we derive entropy-driven greedy and message-passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessians and their generalization performance on real data.
Comment: 37 pages (16 main text), 10 figures (7 main text).
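The flatness notion in this abstract can be illustrated in miniature. The sketch below (not the authors' algorithm; all names and parameter choices are hypothetical) trains a single-layer perceptron on random patterns with SGD on a cross-entropy loss, then estimates a crude flatness proxy: the fraction of random weight perturbations that keep the training error at zero, i.e. an empirical probe of the local volume of the minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 50, 60                       # weights and random +-1 patterns (alpha = 1.2)
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

def train_sgd(X, y, lr=0.1, steps=20000):
    """SGD on the cross-entropy (logistic) loss for a single-layer perceptron."""
    w = rng.normal(size=X.shape[1]) / np.sqrt(X.shape[1])
    for _ in range(steps):
        i = rng.integers(len(y))
        margin = y[i] * (X[i] @ w)
        w += lr * y[i] * X[i] / (1.0 + np.exp(margin))  # grad of log(1 + e^-m)
    return w

def flatness(w, X, y, sigma=0.05, trials=500):
    """Fraction of Gaussian weight perturbations (scaled relative to |w|) that
    keep zero training error: a crude proxy for local flatness."""
    ok = 0
    scale = sigma * np.linalg.norm(w) / np.sqrt(len(w))
    for _ in range(trials):
        wp = w + scale * rng.normal(size=w.shape)
        ok += np.all(np.sign(X @ wp) == y)
    return ok / trials

w = train_sgd(X, y)
train_err = np.mean(np.sign(X @ w) != y)
print(train_err, flatness(w, X, y))
```

Repeating the probe at increasing `sigma` traces how quickly the zero-error region closes around the minimizer, which is the kind of local-volume information the paper's entropy-driven methods exploit.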
A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines
Restricted Boltzmann machines (RBMs) are energy-based neural networks which
are commonly used as building blocks for deep neural architectures. In this
work, we derive a deterministic framework for the
training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer
(TAP) mean-field approximation of widely-connected systems with weak
interactions coming from spin-glass theory. While the TAP approach has been
extensively studied for fully-visible binary spin systems, our construction is
generalized to latent-variable models, as well as to arbitrarily distributed
real-valued spin systems with bounded support. In our numerical experiments, we
demonstrate the effective deterministic training of our proposed models and are
able to show interesting features of unsupervised learning which could not be
directly observed with sampling. Additionally, we demonstrate how our
TAP-based framework can be used to leverage trained RBMs as joint priors in
denoising problems.
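The deterministic core of such a framework is a self-consistent fixed point for the magnetizations. A minimal sketch, assuming a small binary (0/1) RBM with weak couplings: the second-order TAP equations, i.e. the naive mean-field term plus the Onsager reaction correction, iterated with damping until convergence. Sizes and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
nv, nh = 8, 4
W = 0.1 * rng.normal(size=(nv, nh))   # small couplings: weak-interaction regime
a = 0.1 * rng.normal(size=nv)         # visible biases
b = 0.1 * rng.normal(size=nh)         # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tap_magnetizations(W, a, b, iters=200, damp=0.5, tol=1e-10):
    """Damped fixed-point iteration of the second-order TAP equations for a
    0/1-valued RBM: mean-field term plus Onsager reaction correction."""
    mv = np.full(W.shape[0], 0.5)
    mh = np.full(W.shape[1], 0.5)
    for _ in range(iters):
        mh_new = sigmoid(b + W.T @ mv - (mh - 0.5) * (W.T**2 @ (mv * (1 - mv))))
        mh = damp * mh + (1 - damp) * mh_new
        mv_new = sigmoid(a + W @ mh - (mv - 0.5) * (W**2 @ (mh * (1 - mh))))
        delta = np.max(np.abs(mv_new - mv))
        mv = damp * mv + (1 - damp) * mv_new
        if delta < tol:
            break
    return mv, mh

mv, mh = tap_magnetizations(W, a, b)
print(mv.round(3), mh.round(3))
```

The resulting magnetizations enter a TAP free energy whose gradients with respect to `W`, `a`, `b` would replace the Monte Carlo estimates used in standard contrastive-divergence training, which is what makes the scheme deterministic.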
Cycle-based Cluster Variational Method for Direct and Inverse Inference
We elaborate on the idea that loop corrections to belief propagation could be
dealt with in a systematic way on pairwise Markov random fields, by using the
elements of a cycle basis to define regions in a generalized belief propagation
setting. The region graph is specified so as to avoid dual loops as far as
possible, by discarding redundant Lagrange multipliers, in order to facilitate
convergence while avoiding the instabilities associated with a minimal factor
graph construction. We end up with a two-level algorithm, in which a belief
propagation algorithm is run alternately at the level of each cycle and at
the inter-region level. The inverse problem of finding the couplings of a
Markov random field from empirical covariances can be addressed region by
region. It turns out that this can be done efficiently, in particular in the
Ising context, where fixed-point equations can be derived along with a
one-parameter log-likelihood function to minimize. Numerical experiments
confirm the effectiveness of these considerations for both direct and inverse
MRF inference.
Comment: 47 pages, 16 figures.
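The cycle-based region machinery is too involved for a short sketch, but the baseline it corrects can be shown concretely: plain loopy belief propagation on a single Ising cycle, the elementary loop that a cycle basis is built from, compared against exact marginals by enumeration. All sizes and couplings below are illustrative choices, not taken from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a single 4-node cycle
J, h = 0.2, 0.1 * rng.normal(size=4)       # weak coupling, random fields
states = np.array([-1.0, 1.0])

def exact_marginals():
    """Exact single-site marginals by enumerating all 2^4 spin configurations."""
    p = np.zeros((4, 2))
    for conf in itertools.product([-1, 1], repeat=4):
        s = np.array(conf, dtype=float)
        w = np.exp(J * sum(s[i] * s[j] for i, j in edges) + h @ s)
        for i in range(4):
            p[i, int(s[i] > 0)] += w
    return p / p.sum(axis=1, keepdims=True)

def bp_marginals(iters=100):
    """Loopy BP: messages m[(i, j)] are functions over the two states of x_j."""
    m = {(i, j): np.ones(2) / 2 for i, j in edges + [(j, i) for i, j in edges]}
    nbrs = {i: [j for a, j in m if a == i] for i in range(4)}
    psi = np.exp(J * np.outer(states, states))          # psi[x_i, x_j]
    for _ in range(iters):
        for (i, j) in list(m):
            phi = np.exp(h[i] * states)
            prod = phi * np.prod([m[(k, i)] for k in nbrs[i] if k != j], axis=0)
            new = prod @ psi
            m[(i, j)] = new / new.sum()
    b = np.exp(h[:, None] * states[None, :])
    for i in range(4):
        for k in nbrs[i]:
            b[i] *= m[(k, i)]
    return b / b.sum(axis=1, keepdims=True)

print(np.abs(bp_marginals() - exact_marginals()).max())
```

On a single weakly coupled loop the BP error is already small; the point of the cycle-based region construction is to absorb exactly this kind of loop correction systematically when many cycles interact.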
Belief Consensus Algorithms for Fast Distributed Target Tracking in Wireless Sensor Networks
In distributed target tracking for wireless sensor networks, agreement on the
target state can be achieved by the construction and maintenance of a
communication path, in order to exchange information regarding local likelihood
functions. Such an approach lacks robustness to failures and is not easily
applicable to ad-hoc networks. To address this, several methods have been
proposed that allow agreement on the global likelihood through fully
distributed belief consensus (BC) algorithms, operating on local likelihoods in
distributed particle filtering (DPF). However, a unified comparison of the
convergence speed and communication cost has not been performed. In this paper,
we provide such a comparison and propose a novel BC algorithm based on belief
propagation (BP). According to our study, DPF based on Metropolis belief
consensus (MBC) is the fastest in loopy graphs, while DPF based on BP consensus
is the fastest in tree graphs. Moreover, we found that BC-based DPF methods
have lower communication overhead than data flooding when the network is
sufficiently sparse.
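The Metropolis-weights consensus update at the heart of MBC can be sketched in a few lines: each node repeatedly averages its local statistic (e.g. a local log-likelihood value) with its neighbors' using weights `1 / (1 + max(d_i, d_j))`, and all nodes converge to the network-wide average without any routing structure. The topology and values below are hypothetical.

```python
import numpy as np

# Undirected sensor-network graph as an adjacency list (hypothetical topology).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}
deg = {i: len(v) for i, v in adj.items()}
n = len(adj)

def metropolis_step(x, adj, deg):
    """One synchronous Metropolis-weights consensus update:
    w_ij = 1 / (1 + max(d_i, d_j)) for neighbors; the self-weight is implicit.
    The weight matrix is symmetric and doubly stochastic, so repeated
    application drives every node to the global average."""
    x_new = x.copy()
    for i, nbrs in adj.items():
        for j in nbrs:
            w = 1.0 / (1 + max(deg[i], deg[j]))
            x_new[i] += w * (x[j] - x[i])
    return x_new

rng = np.random.default_rng(2)
loglik = rng.normal(size=n)        # each node's local log-likelihood value
x = loglik.copy()
for _ in range(200):
    x = metropolis_step(x, adj, deg)
print(x, loglik.mean())
```

In the DPF setting the scalar here becomes a vector of per-particle log-likelihoods, and the number of consensus iterations per time step is precisely the communication cost being traded off against convergence speed.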
Kernel Belief Propagation
We propose a nonparametric generalization of belief propagation, Kernel
Belief Propagation (KBP), for pairwise Markov random fields. Messages are
represented as functions in a reproducing kernel Hilbert space (RKHS), and
message updates are simple linear operations in the RKHS. KBP makes none of the
assumptions commonly required in classical BP algorithms: the variables need
not arise from a finite domain or a Gaussian distribution, nor must their
relations take any particular parametric form. Rather, the relations between
variables are represented implicitly, and are learned nonparametrically from
training data. KBP has the advantage that it may be used on any domain where
kernels are defined (R^d, strings, groups), even where explicit parametric
models are not known or closed-form expressions for the BP updates do not
exist. The computational cost of message updates in KBP is polynomial in the
training data size. We also propose a constant time approximate message update
procedure by representing messages using a small number of basis functions. In
experiments, we apply KBP to image denoising, depth prediction from still
images, and protein configuration prediction: KBP is faster than competing
classical and nonparametric approaches (by orders of magnitude, in some cases),
while providing significantly more accurate results.
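The linear-algebraic flavor of a KBP message update can be conveyed by its core ingredient: a conditional expectation estimated nonparametrically from samples via kernel ridge regression (a conditional mean embedding). The toy below, with hypothetical data and parameters and deliberately not the full KBP algorithm, learns the relation X1 = X2 + noise purely from pairs and evaluates E[X1 | X2 = x] as a linear operation on a Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam, sigma = 200, 1e-3, 0.5

# Training pairs from an unknown relation X1 = X2 + noise; no parametric
# model of the pairwise potential is ever written down.
x2 = rng.uniform(-2, 2, size=n)
x1 = x2 + 0.1 * rng.normal(size=n)

def rbf(a, b, sigma):
    """Gaussian RBF kernel matrix between two 1-D sample sets."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))

# The "message update" is the linear map g -> k(x, X2) (K22 + n*lam*I)^{-1} g(X1),
# here applied with g = identity, so cond_mean(x) estimates E[X1 | X2 = x].
K22 = rbf(x2, x2, sigma)
alpha = np.linalg.solve(K22 + n * lam * np.eye(n), x1)

def cond_mean(xq):
    return rbf(np.atleast_1d(xq), x2, sigma) @ alpha

print(cond_mean(np.array([-1.0, 0.0, 1.0])))
```

In full KBP a message is such a weighted kernel expansion over training points, and passing it onward composes these linear operations, which is why each update costs only polynomial time in the training set size.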