38 research outputs found
Uniform estimation in stochastic block models is slow
We explicitly quantify the empirically observed phenomenon that estimation under a stochastic block model (SBM) is hard if the model contains classes that are similar. More precisely, we consider estimation of certain functionals of random graphs generated by an SBM. The SBM may or may not be sparse, and the number of classes may be fixed or grow with the number of vertices. Minimax lower and upper bounds for estimation along specific submodels are derived. The results are nonasymptotic and imply that uniform estimation of a single connectivity parameter is much slower than the expected asymptotic pointwise rate. Specifically, the uniform quadratic rate does not scale as the number of edges, but only as the number of vertices. The lower bounds are local around any possible SBM. An analogous result is derived for functionals of a class of smooth graphons.
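To make concrete the kind of quantity these rates concern, the following is a minimal sketch, under assumed block probabilities and with class labels treated as known, of the plug-in estimate of a single connectivity parameter in a two-class SBM with deliberately similar classes; it illustrates the estimation target only, not the paper's submodels or bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class SBM with similar classes (assumed values, not from the paper).
n = 200                      # number of vertices
z = rng.integers(0, 2, n)    # class labels, treated as known for this plug-in estimate
B = np.array([[0.30, 0.28],  # block connectivity matrix; the two classes are deliberately similar
              [0.28, 0.29]])

# Sample a symmetric adjacency matrix without self-loops.
P = B[z][:, z]
upper = np.triu(rng.random((n, n)) < P, k=1)
A = upper | upper.T

# Plug-in estimate of the connectivity parameter B[0, 0]: average the observed
# edges among class-0 vertices.
idx0 = np.where(z == 0)[0]
m0 = len(idx0)
sub = A[np.ix_(idx0, idx0)]
B00_hat = sub[np.triu_indices(m0, k=1)].mean()
print(f"true B00 = {B[0, 0]:.3f}, plug-in estimate = {B00_hat:.3f}")
```

With the labels known, the estimator averages on the order of n^2 edge indicators; the abstract's point is that uniformly over models with similar classes this oracle behaviour cannot be achieved, and the quadratic rate scales only with the number of vertices.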
Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Modern neural networks are highly overparameterized, with the capacity to substantially overfit to training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be "compressed" to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization bound for compressed networks based on the compressed size. Combined with off-the-shelf compression algorithms, the bound leads to state-of-the-art generalization guarantees; in particular, we provide the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem. As additional evidence connecting compression and generalization, we show that the compressibility of models that tend to overfit is limited: we establish an absolute limit on expected compressibility as a function of expected generalization error, where the expectations are over the random choice of training examples. The bounds are complemented by empirical results showing that an increase in overfitting implies an increase in the number of bits required to describe a trained network.
Representing and Learning Functions Invariant Under Crystallographic Groups
Crystallographic groups describe the symmetries of crystals and other repetitive structures encountered in nature and the sciences. These groups include the wallpaper and space groups. We derive linear and nonlinear representations of functions that are (1) smooth and (2) invariant under such a group. The linear representation generalizes the Fourier basis to crystallographically invariant basis functions. We show that such a basis exists for each crystallographic group, that it is orthonormal in the relevant space, and that the standard Fourier basis is recovered as a special case for pure shift groups. The nonlinear representation embeds the orbit space of the group into a finite-dimensional Euclidean space. We show that such an embedding exists for every crystallographic group, and that it factors functions through a generalization of a manifold called an orbifold. We describe algorithms that, given a standardized description of the group, compute the Fourier basis and an embedding map. As examples, we construct crystallographically invariant neural networks, kernel machines, and Gaussian processes.
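As a toy illustration of the pure-shift special case mentioned above, the sketch below builds features invariant under integer translations of the real line by embedding the orbit space into the circle and working with a kernel on the embedded points; the kernel, lengthscale, and test grid are assumptions, and the paper's general constructions for wallpaper and space groups are substantially richer.

```python
import numpy as np

def shift_invariant_embedding(x):
    """Embed R into R^2 so that points differing by an integer translation coincide.

    This is the orbit-space (circle) embedding for the group of integer shifts,
    a one-dimensional analogue of the orbifold embeddings described above."""
    return np.stack([np.cos(2 * np.pi * x), np.sin(2 * np.pi * x)], axis=-1)

def rbf_kernel(u, v, lengthscale=0.5):
    """Squared-exponential kernel on the embedded points (illustrative choice)."""
    d2 = np.sum((u[:, None, :] - v[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

x = np.linspace(0.0, 1.0, 5)
K_shifted = rbf_kernel(shift_invariant_embedding(x), shift_invariant_embedding(x + 3.0))
K_plain = rbf_kernel(shift_invariant_embedding(x), shift_invariant_embedding(x))

# The kernel is unchanged when one argument is shifted by an integer, so any kernel
# machine or Gaussian process built on the embedding defines a shift-invariant function.
print(np.allclose(K_shifted, K_plain))
```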
Borel liftings of graph limits
The cut pseudo-metric on the space of graph limits induces an equivalence relation. The quotient space obtained by collapsing each equivalence class to a point is a metric space with appealing analytic properties. We show that the equivalence relation admits a Borel lifting: there exists a Borel-measurable mapping that maps each equivalence class to one of its elements. The result yields a general framework for proving measurability properties on the space of graph limits. We give several examples, including Borel-measurability of the set of isomorphism classes of random-free graphons.
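For reference, the pseudo-metric in question is the standard cut distance on graphons (standard definitions, not specific to this paper): with the cut norm

\[
\|W\|_\square = \sup_{S,T \subseteq [0,1]} \left| \int_{S \times T} W(x,y)\,dx\,dy \right|,
\]

the cut distance is \(\delta_\square(W,U) = \inf_\varphi \|W^\varphi - U\|_\square\), where the infimum runs over measure-preserving bijections \(\varphi\) of \([0,1]\) and \(W^\varphi(x,y) = W(\varphi(x),\varphi(y))\). Graph limits at cut distance zero form the equivalence classes from which the Borel lifting selects representatives.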
Location Dependent Dirichlet Processes
Dirichlet processes (DPs) are widely applied in Bayesian nonparametric modeling. However, in their basic form they do not directly integrate dependency information among data arising from space and time. In this paper, we propose location dependent Dirichlet processes (LDDPs), which incorporate nonparametric Gaussian processes into the DP modeling framework to model such dependencies. We develop the LDDP in the context of mixture modeling and derive a mean field variational inference algorithm for this mixture model. The effectiveness of the proposed modeling framework is shown on an image segmentation task.
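As a rough illustration of how location dependence can enter a Dirichlet-process-style mixture, the sketch below modulates stick-breaking weights with smooth random functions of location; this is a generic spatially dependent stick-breaking construction under assumed forms (sigmoid-transformed Gaussian process draws with a squared-exponential kernel), not the LDDP specification from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def gp_draws(x, lengthscale=0.2, n_draws=1):
    """Draw smooth random functions on locations x from a squared-exponential GP prior."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-0.5 * d2 / lengthscale ** 2) + 1e-8 * np.eye(len(x))
    return rng.multivariate_normal(np.zeros(len(x)), K, size=n_draws)

def location_dependent_weights(x, n_components=10):
    """Stick-breaking mixture weights that vary smoothly with location.

    Each stick is v_k(x) = sigmoid(g_k(x)) for a GP draw g_k; an illustrative
    spatially dependent scheme, not the paper's exact construction."""
    g = gp_draws(x, n_draws=n_components)
    v = 1.0 / (1.0 + np.exp(-g))                        # sticks, shape (n_components, len(x))
    leftover = np.cumprod(1.0 - v, axis=0)
    leftover = np.vstack([np.ones(len(x)), leftover[:-1]])
    return v * leftover                                 # weights pi_k(x)

x = np.linspace(0.0, 1.0, 50)
pi = location_dependent_weights(x)
# Nearby locations receive similar mixture weights; the truncated sums approach 1
# as the number of components grows.
print(pi.shape, pi.sum(axis=0).min())
```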
Probabilistic machine learning and artificial intelligence.
How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.
Construction of nonparametric Bayesian models from parametric Bayes equations
We consider the general problem of constructing nonparametric Bayesian models on infinite-dimensional random objects, such as functions, infinite graphs or infinite permutations. The problem has generated much interest in machine learning, where it is treated heuristically, but has not been studied in full generality in nonparametric Bayesian statistics, which tends to focus on models over probability distributions. Our approach applies a standard tool of stochastic process theory, the construction of stochastic processes from their finite-dimensional marginal distributions. The main contribution of the paper is a generalization of the classic Kolmogorov extension theorem to conditional probabilities. This extension allows a rigorous construction of nonparametric Bayesian models from systems of finite-dimensional, parametric Bayes equations. Using this approach, we show (i) how existence of a conjugate posterior for the nonparametric model can be guaranteed by choosing conjugate finite-dimensional models in the construction, (ii) how the mapping to the posterior parameters of the nonparametric model can be explicitly determined, and (iii) that the construction of conjugate models in essence requires the finite-dimensional models to be in the exponential family. As an application of our constructive framework, we derive a model on infinite permutations, the nonparametric Bayesian analogue of a model recently proposed for the analysis of rank data.
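For context, the classical Kolmogorov extension theorem referenced above requires the finite-dimensional marginals to form a projective (consistent) family: for finite index sets \(I \subseteq J\),

\[
\mu_I = \mu_J \circ \pi_{JI}^{-1},
\]

where \(\pi_{JI}\) is the coordinate projection from the \(J\)-indexed product space onto the \(I\)-indexed one; under this condition a unique stochastic process with the prescribed marginals exists. The generalization described in the abstract plays the analogous role for systems of finite-dimensional conditional distributions, the Bayes equations from which the nonparametric model is assembled.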
Projective Limit Random Probabilities on Polish Spaces
A pivotal problem in Bayesian nonparametrics is the construction of prior distributions on the space M(V) of probability measures on a given domain V. In principle, such distributions on the infinite-dimensional space M(V) can be constructed from their finite-dimensional marginals, the most prominent example being the construction of the Dirichlet process from finite-dimensional Dirichlet distributions. This approach is both intuitive and applicable to the construction of arbitrary distributions on M(V), but it is also hamstrung by a number of technical difficulties. We show how these difficulties can be resolved if the domain V is a Polish topological space, and give a representation theorem directly applicable to the construction of any probability distribution on M(V) whose first moment measure is well-defined. The proof draws on a projective limit theorem of Bochner, and on properties of set functions on Polish spaces to establish countable additivity of the resulting random probabilities.
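The Dirichlet process example above can be made concrete with its standard finite-dimensional marginals (a textbook definition, not specific to this paper): for a base probability measure H on V, concentration parameter \(\alpha > 0\), and any finite measurable partition \(A_1, \ldots, A_k\) of V,

\[
\bigl(G(A_1), \ldots, G(A_k)\bigr) \sim \mathrm{Dirichlet}\bigl(\alpha H(A_1), \ldots, \alpha H(A_k)\bigr),
\]

and the projective-limit construction assembles the random probability measure \(G \in M(V)\) from this consistent family of Dirichlet distributions.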