Central Limit Theorem with Exchangeable Summands and Mixtures of Stable Laws as Limits
The problem of convergence in law of normed sums of exchangeable random
variables is examined. First, the problem is studied w.r.t. arrays of
exchangeable random variables, and the special role played by mixtures of
products of stable laws - as limits in law of normed sums in different rows of
the array - is emphasized. Necessary and sufficient conditions for convergence
to a specific form in the above class of measures are then given. Moreover,
sufficient conditions for convergence of sums in a single row are proved.
Finally, a potentially useful variant of the formulation of the results just
summarized is briefly sketched; a more complete study of it is deferred to
future work.
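To fix ideas, here is a minimal simulation sketch of the phenomenon in a standard special case of our own choosing (not the paper's general setting): an exchangeable sequence that is conditionally i.i.d. Gaussian given a random variance has normed sums whose limit law is a scale mixture of centred Gaussians, i.e. a mixture of stable laws of index 2. All numerical choices below are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_rows = 2_000, 2_000   # summands per row, number of exchangeable rows

    # Each row is conditionally i.i.d. N(0, V) given a random variance V,
    # hence exchangeable by de Finetti's representation.
    V = rng.choice([0.25, 4.0], size=n_rows)
    X = rng.normal(0.0, np.sqrt(V)[:, None], size=(n_rows, n))
    S = X.sum(axis=1) / np.sqrt(n)   # normed row sums

    # The law of S is close to the two-component Gaussian scale mixture.
    mix = rng.normal(0.0, np.sqrt(rng.choice([0.25, 4.0], size=n_rows)))
    print("quantiles of normed sums:", np.quantile(S, [0.05, 0.5, 0.95]).round(2))
    print("quantiles of the mixture:", np.quantile(mix, [0.05, 0.5, 0.95]).round(2))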
Exchangeability, prediction and predictive modeling in Bayesian statistics
Prediction is a central problem in Statistics, and there is currently renewed
interest in the so-called predictive approach in Bayesian statistics. What is
the latter about? One has to return to foundational concepts, which we do in
this paper, starting from the role of exchangeability and reviewing forms of
partial exchangeability for more structured data, with the aim of discussing
their use and implications in Bayesian statistics. We highlight the underlying
idea that, in Bayesian statistics, a predictive rule is meant as a learning
rule - a way of turning past information into information on future events.
This concept has implications for the use of exchangeability and more generally
pervades all statistical problems, including inference. It applies to classic contexts and
to less explored situations, such as the use of predictive algorithms that can
be read as Bayesian learning rules. The paper offers a historical overview, but
also includes a few new results, presents some recent developments and poses
some open questions.
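As a concrete, classical instance of a predictive rule acting as a learning rule (our illustration, not an example taken from the paper): for an exchangeable binary sequence with a Beta(a, b) prior on the success probability, the predictive probability of the next success is (a + number of past successes) / (a + b + n), so past observations are carried forward only through the counts.

    # Sequential Beta-Bernoulli predictive rule; prior parameters and data are
    # illustrative.
    a, b = 1.0, 1.0              # uniform Beta prior
    data = [1, 0, 1, 1, 0, 1, 1]

    successes, n = 0, 0
    for x in data:
        pred = (a + successes) / (a + b + n)   # prediction before observing x
        print(f"n={n}: P(next = 1 | past) = {pred:.3f}, then observe {x}")
        successes += x
        n += 1
    print(f"n={n}: P(next = 1 | past) = {(a + successes) / (a + b + n):.3f}")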
Non-asymptotic approximations of Gaussian neural networks via second-order Poincar\'e inequalities
There is growing interest in the large-width asymptotic properties of Gaussian
neural networks (NNs), namely NNs whose weights are initialized according to
Gaussian distributions. A well-established result is that, as the width goes to
infinity, a Gaussian NN converges in distribution to a Gaussian stochastic
process, which provides an asymptotic or qualitative Gaussian approximation of
the NN. In this paper, we introduce some non-asymptotic or quantitative
Gaussian approximations of Gaussian NNs, quantifying the approximation error
with respect to some popular distances for (probability) distributions, e.g.
the $1$-Wasserstein distance, the total variation distance and the
Kolmogorov-Smirnov distance. Our results rely on the use of second-order
Gaussian Poincar\'e inequalities, which provide tight estimates of the
approximation error, with optimal rates. This is a novel application of
second-order Gaussian Poincar\'e inequalities, which are well-known in the
probabilistic literature for being a powerful tool to obtain Gaussian
approximations of general functionals of Gaussian stochastic processes. A
generalization of our results to deep Gaussian NNs is discussed.
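A minimal simulation sketch of the approximation being quantified (our illustration under assumed hyperparameters, not the paper's proof technique): sample many Gaussian initializations of a one-hidden-layer tanh network, evaluate the output at a fixed input, and track its Kolmogorov-Smirnov distance to a fitted Gaussian as the width grows.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = np.array([1.0, -0.5, 2.0])   # fixed input
    sigma_w, sigma_b = 1.0, 1.0      # assumed weight / bias standard deviations

    def nn_output(width, n_draws=10_000):
        """f(x) = b + sum_j w_j * tanh(u_j . x + c_j) / sqrt(width), over n_draws inits."""
        U = rng.normal(0.0, sigma_w, size=(n_draws, width, x.size))
        c = rng.normal(0.0, sigma_b, size=(n_draws, width))
        w = rng.normal(0.0, sigma_w, size=(n_draws, width))
        b = rng.normal(0.0, sigma_b, size=n_draws)
        return b + (w * np.tanh(U @ x + c)).sum(axis=1) / np.sqrt(width)

    for width in (2, 8, 32, 128, 512):
        f = nn_output(width)
        ks = stats.kstest((f - f.mean()) / f.std(), "norm").statistic
        print(f"width={width:4d}  KS distance to a fitted Gaussian ~ {ks:.3f}")

The KS statistic shrinks with the width, which is the qualitative Gaussian approximation; the paper's contribution is to bound such distances explicitly at finite width.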
Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions
There is a growing literature on the study of large-width properties of deep
Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed
parameters or weights, and Gaussian stochastic processes. Motivated by some
empirical and theoretical studies showing the potential of replacing Gaussian
distributions with Stable distributions, namely distributions with heavy tails,
in this paper we investigate large-width properties of deep Stable NNs, i.e.
deep NNs with Stable-distributed parameters. For sub-linear activation
functions, a recent work has characterized the infinitely wide limit of a
suitable rescaled deep Stable NN in terms of a Stable stochastic process, both
under the assumption of a ``joint growth" and under the assumption of a
``sequential growth" of the width over the NN's layers. Here, assuming a
``sequential growth" of the width, we extend such a characterization to a
general class of activation functions, which includes sub-linear,
asymptotically linear and super-linear functions. As a novelty with respect to
previous works, our results rely on the use of a generalized central limit
theorem for heavy-tailed distributions, which allows for an interesting unified
treatment of infinitely wide limits for deep Stable NNs. Our study shows that
the scaling of Stable NNs and the stability of their infinitely wide limits may
depend on the choice of the activation function, bringing out a critical
difference with respect to the Gaussian setting.
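A minimal simulation sketch, under assumptions of our own choosing (one hidden layer, symmetric alpha-stable weights, a bounded and hence sub-linear tanh activation, no biases, and the 1/n^{1/alpha} output rescaling): the tails of the rescaled output remain heavy and its quantiles stabilise as the width grows, consistent with a Stable, non-Gaussian limit.

    import numpy as np
    from scipy.stats import levy_stable

    rng = np.random.default_rng(1)
    alpha = 1.5                    # assumed stability index
    x = np.array([0.3, -1.2])      # fixed input

    def stable_layer_output(width, n_draws=5_000):
        """Rescaled output of a one-hidden-layer NN with symmetric alpha-stable weights."""
        U = levy_stable.rvs(alpha, 0.0, size=(n_draws, width, x.size), random_state=rng)
        w = levy_stable.rvs(alpha, 0.0, size=(n_draws, width), random_state=rng)
        return (w * np.tanh(U @ x)).sum(axis=1) / width ** (1.0 / alpha)

    for width in (8, 64, 256):
        out = stable_layer_output(width)
        q = np.quantile(np.abs(out), [0.5, 0.99])
        print(f"width={width:3d}  median |output| ~ {q[0]:.2f}, 99% quantile ~ {q[1]:.2f}")

With a super-linear activation the appropriate rescaling changes, which is precisely the sensitivity to the activation function discussed above.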
Stable behaviour of infinitely wide deep neural networks
We consider fully connected feed-forward deep neural networks (NNs) where
weights and biases are independent and identically distributed according to
symmetric centered stable distributions. Then, we show that the infinitely wide
limit of
the NN, under suitable scaling on the weights, is a stochastic process whose
finite-dimensional distributions are multivariate stable distributions. The
limiting process is referred to as the stable process, and it generalizes the
class of Gaussian processes recently obtained as infinitely wide limits of NNs
(Matthews et al., 2018b). Parameters of the stable process can be computed via
an explicit recursion over the layers of the network. Our result contributes to
the theory of fully connected feed-forward deep NNs, and it paves the way to
expand recent lines of research that rely on Gaussian infinitely wide limits.
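A Monte Carlo sketch of how such a layer-wise recursion could be evaluated, under our own reading of the setup (symmetric alpha-stable weights and biases with scales sigma_w and sigma_b, a 1/n^{1/alpha} rescaling, and a tanh activation). The specific recursion below, scale_{l+1}^alpha = sigma_b^alpha + sigma_w^alpha * E|phi(f_l)|^alpha, follows from conditional stability of sums of stable random variables and is not necessarily the paper's exact formulation.

    import numpy as np
    from scipy.stats import levy_stable

    rng = np.random.default_rng(2)
    alpha, sigma_w, sigma_b = 1.8, 1.0, 0.5   # assumed values
    phi = np.tanh
    n_mc = 100_000

    scale = sigma_b   # scale of the first pre-activation (input term omitted for brevity)
    for layer in range(1, 6):
        f = levy_stable.rvs(alpha, 0.0, scale=scale, size=n_mc, random_state=rng)
        moment = np.mean(np.abs(phi(f)) ** alpha)   # Monte Carlo E|phi(f_l)|^alpha
        scale = (sigma_b ** alpha + sigma_w ** alpha * moment) ** (1.0 / alpha)
        print(f"layer {layer}: stable scale parameter ~ {scale:.3f}")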
Deep Stable neural networks: large-width asymptotics and convergence rates
In modern deep learning, there is a recent and growing literature on the interplay between large-width asymptotic
properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed weights, and Gaussian stochastic processes (SPs). Motivated by empirical analyses that show the potential of replacing Gaussian
distributions with Stable distributions for the NN’s weights, in this paper we present a rigorous analysis of the
large-width asymptotic behaviour of (fully connected) feed-forward deep Stable NNs, i.e. deep NNs with Stable-distributed
weights. We show that as the width goes to infinity jointly over the NN’s layers, i.e. the “joint growth”
setting, a rescaled deep Stable NN converges weakly to a Stable SP whose distribution is characterized recursively
through the NN’s layers. Because of the non-triangular structure of the NN, this is a non-standard asymptotic
problem, for which we propose an inductive approach of independent interest. Then, we establish sup-norm convergence rates of the rescaled deep Stable NN to the Stable SP, under the “joint growth” and a “sequential growth” of the width over the NN’s layers. Such a result quantifies the difference between the “joint growth” and the “sequential growth” settings, showing that the former leads to a slower rate than the latter, depending on the depth of the layer and the number of inputs of the NN. Our work extends some recent results on infinitely wide limits for deep Gaussian NNs to the more general deep Stable NNs, providing the first result on convergence rates in the “joint growth” setting.
Large-width functional asymptotics for deep Gaussian neural networks
In this paper, we consider fully connected feed-forward deep neural networks
where weights and biases are independent and identically distributed according
to Gaussian distributions. Extending previous results (Matthews et al.,
2018a;b; Yang, 2019) we adopt a function-space perspective, i.e. we look at
neural networks as infinite-dimensional random elements on the input space
$\mathbb{R}^I$ (with $I$ the input dimension). Under suitable assumptions on
the activation function we show that: i) a network defines a continuous
Gaussian process on the input space $\mathbb{R}^I$; ii) a network with
re-scaled weights converges weakly to a continuous Gaussian process in the
large-width limit; iii) the limiting Gaussian process has almost surely locally
$\gamma$-H\"older continuous paths, for $0 < \gamma < 1$. Our results
contribute to recent theoretical studies on
the interplay between infinitely wide deep neural networks and Gaussian
processes by establishing weak convergence in function-space with respect to a
stronger metric.
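For background (this is the standard large-width covariance recursion from the cited line of work, estimated here by Monte Carlo with illustrative hyperparameters and a tanh activation, not this paper's functional-convergence argument): the covariance of the limiting Gaussian process can be computed layer by layer as K^{l+1}(x, x') = sigma_b^2 + sigma_w^2 E[phi(u) phi(v)], with (u, v) centred Gaussian with covariance K^l.

    import numpy as np

    rng = np.random.default_rng(3)
    sigma_w2, sigma_b2 = 1.0, 0.1    # assumed weight / bias variances
    phi = np.tanh
    x1, x2 = np.array([1.0, 0.0]), np.array([0.8, 0.6])

    # Input-layer kernel for the pair of inputs (one common parameterization).
    K = sigma_b2 + sigma_w2 * np.array([[x1 @ x1, x1 @ x2],
                                        [x2 @ x1, x2 @ x2]]) / x1.size

    for layer in range(1, 4):
        z = rng.multivariate_normal(np.zeros(2), K, size=200_000)
        E = phi(z).T @ phi(z) / z.shape[0]     # Monte Carlo E[phi(u) phi(v)]
        K = sigma_b2 + sigma_w2 * E
        print(f"layer {layer} limiting kernel:\n{np.round(K, 3)}")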
Predictive Constructions Based on Measure-Valued Pólya Urn Processes
Measure-valued Pólya urn processes (MVPP) are Markov chains with an additive
structure that serve as an extension of the generalized k-color Pólya urn model
towards a continuum of possible colors. We prove that, for any MVPP
$(\mu_n)_{n \geq 0}$ on a Polish space $\mathbb{X}$, the normalized sequence
$(\mu_n / \mu_n(\mathbb{X}))_{n \geq 0}$ agrees with the marginal predictive
distributions of some random process $(X_n)_{n \geq 1}$. Moreover,
$\mu_n = \mu_{n-1} + R_{X_n}$, where $x \mapsto R_x$ is a random transition
kernel on $\mathbb{X}$; thus, if $\mu_{n-1}$ represents the contents of an urn,
then $X_n$ denotes the color of the ball drawn with distribution
$\mu_{n-1} / \mu_{n-1}(\mathbb{X})$ and $R_{X_n}$ the subsequent reinforcement.
In the case $R_{X_n} = W_n \delta_{X_n}$, for some non-negative random weights
$W_1, W_2, \ldots$, the process $(X_n)_{n \geq 1}$ is better understood as a
randomly reinforced extension of Blackwell and MacQueen's Pólya sequence. We
study the asymptotic properties of the predictive distributions and the
empirical frequencies of $(X_n)_{n \geq 1}$ under different assumptions on the
weights. We also investigate a generalization of the above models via a
randomization of the law of the reinforcement.
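A simulation sketch of the randomly reinforced Pólya sequence described above, in the simplest case of finitely many colors (all concrete choices, such as the exponential weights, are ours): draw X_n from the normalized urn composition, then reinforce the drawn color by a random non-negative weight W_n, so that mu_n = mu_{n-1} + W_n * delta_{X_n}.

    import numpy as np

    rng = np.random.default_rng(4)
    colors = ["red", "green", "blue"]
    mu = np.array([1.0, 1.0, 1.0])        # initial urn contents mu_0

    draws = []
    for n in range(2_000):
        probs = mu / mu.sum()             # predictive distribution mu_{n-1} / mu_{n-1}(X)
        i = rng.choice(len(colors), p=probs)
        W = rng.exponential(1.0)          # non-negative random reinforcement W_n
        mu[i] += W                        # mu_n = mu_{n-1} + W_n * delta_{X_n}
        draws.append(i)

    freq = np.bincount(draws, minlength=len(colors)) / len(draws)
    print("empirical color frequencies  :", dict(zip(colors, freq.round(3))))
    print("final predictive distribution:", dict(zip(colors, (mu / mu.sum()).round(3))))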