Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems
A new class of non-homogeneous state-affine systems is introduced for use in
reservoir computing. Sufficient conditions are identified that guarantee,
first, that the associated reservoir computers with linear readouts are causal,
time-invariant, and satisfy the fading memory property and, second, that a
subset of this class is universal in the category of fading memory filters with
stochastic almost surely uniformly bounded inputs. This means that any
discrete-time filter that satisfies the fading memory property with random
inputs of that type can be uniformly approximated by elements in the
non-homogeneous state-affine family.
Comment: 41 pages
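As a concrete illustration of the recursion behind such systems, the following is a minimal sketch, assuming a degree-one non-homogeneous state-affine system $x_t = p(z_t)\,x_{t-1} + q(z_t)$ driven by a bounded stochastic input, with a ridge-fitted linear readout; the matrices, contraction scaling, and moving-average target are illustrative choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 1000                        # state dimension, time steps

# Degree-1 matrix/vector polynomials: p(z) = A0 + z*A1, q(z) = b0 + z*b1.
A0, A1 = rng.normal(size=(N, N)), rng.normal(size=(N, N))
for A in (A0, A1):
    A *= 0.45 / np.linalg.norm(A, 2)   # ||p(z)|| < 1 for |z| <= 1: fading memory
b0, b1 = rng.normal(size=N), rng.normal(size=N)

# Stochastic, almost surely uniformly bounded input.
z = rng.uniform(-1.0, 1.0, size=T)

# Run the non-homogeneous state-affine recursion x_t = p(z_t) x_{t-1} + q(z_t).
X, x = np.zeros((T, N)), np.zeros(N)
for t in range(T):
    x = (A0 + z[t] * A1) @ x + (b0 + z[t] * b1)
    X[t] = x

# Target: a causal fading-memory filter (5-step moving average of the input).
y = np.convolve(z, np.ones(5) / 5)[:T]

# Linear readout w fitted by ridge regression.
w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
print("training MSE:", np.mean((X @ w - y) ** 2))
```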
A Transfer Principle: Universal Approximators Between Metric Spaces From Euclidean Universal Approximators
We build universal approximators of continuous maps between arbitrary Polish
metric spaces $\mathcal{X}$ and $\mathcal{Y}$ using universal approximators
between Euclidean spaces as building blocks. Earlier results assume that the
output space $\mathcal{Y}$ is a topological vector space. We overcome this
limitation by "randomization": our approximators output discrete probability
measures over $\mathcal{Y}$. When $\mathcal{X}$ and $\mathcal{Y}$ are Polish
without additional structure, we prove very general qualitative guarantees;
when they have suitable combinatorial structure, we prove quantitative
guarantees for H\"older-like maps, including maps between finite graphs,
solution operators to rough differential equations between certain Carnot
groups, and continuous non-linear operators between Banach spaces arising in
inverse problems. In particular, we show that the required number of Dirac
measures is determined by the combinatorial structure of $\mathcal{X}$ and
$\mathcal{Y}$. For barycentric $\mathcal{Y}$, including Banach spaces,
$\mathbb{R}$-trees, Hadamard manifolds, or Wasserstein spaces on Polish metric
spaces, our approximators reduce to $\mathcal{Y}$-valued functions. When the
Euclidean approximators are neural networks, our constructions generalize
transformer networks, providing a new probabilistic viewpoint of geometric deep
learning.
Comment: 14 Figures, 3 Tables, 78 Pages (Main 40, Proofs 26, Acknowledgments and References 12)
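The "randomization" step can be pictured with a toy sketch: a Euclidean network produces softmax weights over a fixed finite set of atoms in $\mathcal{Y}$, yielding a discrete probability measure; the random-feature map and the atoms below are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative atoms y_1, ..., y_N in the output space (here Y = R^2).
N_atoms = 8
atoms = rng.normal(size=(N_atoms, 2))

# Euclidean building block: any universal approximator R -> R^{N_atoms};
# a random-feature map stands in for a trained network here.
D = 64
w, b = rng.normal(size=D), rng.uniform(0, 2 * np.pi, size=D)
V = rng.normal(size=(N_atoms, D)) / np.sqrt(D)

def measure_valued_model(x):
    """Return the weights p(x) of the discrete measure sum_n p_n(x) * delta_{y_n}."""
    s = V @ np.cos(w * x + b)          # Euclidean approximator output
    p = np.exp(s - s.max())
    return p / p.sum()                 # softmax: a point in the simplex

p = measure_valued_model(0.3)
print("mixture weights:", np.round(p, 3))
# For barycentric Y (e.g. a Banach space) the measure can be collapsed
# back to a Y-valued prediction via its barycenter:
print("barycenter:", p @ atoms)
```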
The Universal Approximation Property
The universal approximation property of various machine learning models is
currently only understood on a case-by-case basis, limiting the rapid
development of new theoretically justified neural network architectures and
blurring our understanding of our current models' potential. This paper works
towards overcoming these challenges by presenting a characterization, a
representation, a construction method, and an existence result, each of which
applies to any universal approximator on most function spaces of practical
interest. Our characterization result is used to describe which activation
functions allow the feed-forward architecture to maintain its universal
approximation capabilities when multiple constraints are imposed on its final
layers and its remaining layers are only sparsely connected. These include a
rescaled and shifted Leaky ReLU activation function but not the ReLU activation
function. Our construction and representation result is used to exhibit a
simple modification of the feed-forward architecture, which can approximate any
continuous function with non-pathological growth, uniformly on the entire
Euclidean input space. This improves the known capabilities of the feed-forward
architecture.
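To make the activation-function distinction tangible, here is a small sketch, assuming specific illustrative constants for the rescaling and shift; one structural difference it displays is that the rescaled, shifted Leaky ReLU is injective, while ReLU collapses the entire negative half-line to zero.

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    return np.where(x >= 0, x, alpha * x)

def rescaled_shifted_leaky_relu(x, a=2.0, c=0.5, alpha=0.1):
    # Illustrative constants a, c, alpha: one member of the admissible family.
    return a * leaky_relu(x, alpha) + c

x = np.linspace(-2.0, 2.0, 5)
print(rescaled_shifted_leaky_relu(x))  # strictly increasing, hence injective
print(np.maximum(x, 0.0))              # plain ReLU maps all x < 0 to 0
```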
Universal Regular Conditional Distributions
We introduce a general framework for approximating regular conditional
distributions (RCDs). Our approximations of these RCDs are implemented by a new
class of geometric deep learning models with inputs in $\mathbb{R}^d$ and
outputs in the Wasserstein-1 space $\mathcal{P}_1(\mathbb{R}^D)$. We find
that the models built using our framework can approximate any continuous
function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on
compacts, and quantitative rates are obtained. We identify two methods for
avoiding the "curse of dimensionality"; i.e.: the number of parameters
determining the approximating neural network depends only polynomially on the
involved dimension and the approximation error. The first solution describes
functions in which can be
efficiently approximated on any compact subset of . Conversely,
the second approach describes sets in , on which any function in
can be efficiently approximated.
Our framework is used to obtain an affirmative answer to the open conjecture of
Bishop (1994); namely: mixture density networks are universal regular
conditional distributions. The predictive performance of the proposed models is
evaluated against comparable learning models on various probabilistic
prediction tasks in the context of ELMs, model uncertainty, and
heteroscedastic regression. All the results are obtained for more general input
and output spaces and thus apply to geometric deep learning contexts.
Comment: Keywords: Universal Regular Conditional Distributions, Geometric Deep Learning, Measure-Valued Neural Networks, Conditional Expectation, Uncertainty Quantification. Additional Information: 27 Pages + 22 Page Appendix, 7 Tables
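Since the Bishop (1994) conjecture concerns mixture density networks, a minimal sketch of such a network may help fix ideas; the layer sizes and Gaussian-mixture parametrization below are generic illustrative choices, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Mixture density network: x in R^d -> parameters of a K-component
# Gaussian mixture on R^D, approximating the RCD P(Y | X = x).
d, D, K, H = 3, 1, 5, 32
W1, b1 = rng.normal(size=(H, d)) * 0.5, np.zeros(H)
W2, b2 = rng.normal(size=(K * (2 * D + 1), H)) * 0.5, np.zeros(K * (2 * D + 1))

def mdn(x):
    """Map x to mixture weights, means, and scales of sum_k pi_k N(mu_k, sigma_k^2)."""
    h = np.tanh(W1 @ x + b1)
    out = W2 @ h + b2
    logits, mu, log_sig = np.split(out, [K, K + K * D])
    pi = np.exp(logits - logits.max())
    return pi / pi.sum(), mu.reshape(K, D), np.exp(log_sig).reshape(K, D)

pi, mu, sigma = mdn(rng.normal(size=d))
print("mixture weights:", np.round(pi, 3))   # a discrete distribution over components
print("component means:", np.round(mu.ravel(), 3))
```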