Approximation results for Gradient Descent trained Shallow Neural Networks in 1d
Two aspects of neural networks that have been extensively studied in the
recent literature are their function approximation properties and their
training by gradient descent methods. The approximation problem seeks accurate
approximations with a minimal number of weights. In most of the current
literature these weights are fully or partially hand-crafted, showing the
capabilities of neural networks but not necessarily their practical
performance. In contrast, optimization theory for neural networks heavily
relies on an abundance of weights in over-parametrized regimes.
This paper balances these two demands and provides an approximation result
for shallow networks in 1d with non-convex weight optimization by gradient
descent. We consider finite-width networks and infinite-sample limits, which is
the typical setup in approximation theory. Technically, this problem is not
over-parametrized; however, some form of redundancy reappears as a loss in
approximation rate compared to the best possible rates.
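A minimal numpy sketch of the setup this abstract describes, assuming nothing beyond the text: a finite-width shallow ReLU network fit by plain gradient descent, with a dense grid standing in for the infinite-sample limit. The target function, width, and step size are illustrative choices, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 32                                   # finite network width
    x = np.linspace(0.0, 1.0, 512)           # dense grid ~ infinite-sample limit
    y = np.sin(2 * np.pi * x)                # illustrative smooth target

    w = rng.normal(size=m)                   # inner weights
    b = rng.normal(size=m)                   # inner biases
    a = np.zeros(m)                          # outer weights

    lr = 1e-2
    for _ in range(20_000):                  # full-batch gradient descent
        pre = np.outer(x, w) + b             # (n, m) pre-activations
        act = np.maximum(pre, 0.0)           # ReLU
        err = act @ a - y                    # residual of the squared loss
        grad_pre = (pre > 0) * np.outer(err, a)
        a -= lr * act.T @ err / len(x)
        w -= lr * (grad_pre * x[:, None]).mean(axis=0)
        b -= lr * grad_pre.mean(axis=0)

    pred = np.maximum(np.outer(x, w) + b, 0.0) @ a
    print("RMSE:", float(np.sqrt(np.mean((pred - y) ** 2))))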
Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks
Convolutional neural networks (CNNs) have been shown to achieve optimal
approximation and estimation error rates (in minimax sense) in several function
classes. However, previously analyzed optimal CNNs are unrealistically wide and
difficult to obtain via optimization due to sparsity constraints in important
function classes, including the H\"older class. We show that a ResNet-type CNN can
attain the minimax optimal error rates in these classes in more plausible
situations -- it can be dense, and its width, channel size, and filter size are
constant with respect to the sample size. The key idea is that we can replicate the
learning ability of fully-connected neural networks (FNNs) with tailored CNNs, as
long as the FNNs have \textit{block-sparse} structures. Our theory is general
in the sense that we can automatically translate any approximation rate achieved
by block-sparse FNNs into one achieved by CNNs. As an application, we derive
approximation and estimation error rates of the aforementioned type of CNNs for
the Barron and H\"older classes with the same strategy.
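A toy numpy check of the block-sparse idea, under the simplifying assumption (mine, not the paper's) that all blocks share one weight matrix: a block-diagonal FNN layer then coincides with a stride-k convolution, which is the flavor of FNN-to-CNN translation the abstract refers to.

    import numpy as np

    rng = np.random.default_rng(1)
    k, blocks = 4, 3                         # block size, number of blocks
    W = rng.normal(size=(k, k))              # one shared dense block
    x = rng.normal(size=k * blocks)          # input vector

    # FNN view: a block-sparse (block-diagonal) weight matrix on the full input.
    fnn_out = np.kron(np.eye(blocks), W) @ x

    # CNN view: the same block swept across the input with stride k.
    cnn_out = np.concatenate([W @ x[i * k:(i + 1) * k] for i in range(blocks)])

    assert np.allclose(fnn_out, cnn_out)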
On the Universal Approximation Property and Equivalence of Stochastic Computing-based Neural Networks and Binary Neural Networks
Large-scale deep neural networks are both memory-intensive and
computation-intensive, thereby posing stringent requirements on the computing
platforms. Hardware acceleration of deep neural networks has been extensively
investigated in both industry and academia. Specific forms of binary neural
networks (BNNs) and stochastic computing based neural networks (SCNNs) are
particularly appealing to hardware implementations since they can be
implemented almost entirely with binary operations. Despite the obvious
advantages in hardware implementation, these approximate computing techniques
are questioned by researchers in terms of accuracy and universal applicability.
It is also important to understand the relative pros and cons of SCNNs and BNNs
in theory and in actual hardware implementations. To address these
concerns, in this paper we prove that the "ideal" SCNNs and BNNs satisfy the
universal approximation property with probability 1 (due to the stochastic
behavior). The proof is conducted by first proving the property for SCNNs from
the strong law of large numbers, and then using SCNNs as a "bridge" to prove
for BNNs. Based on the universal approximation property, we further prove that
SCNNs and BNNs exhibit the same energy complexity. In other words, they have
the same asymptotic energy consumption as the network size grows. We
also provide a detailed analysis of the pros and cons of SCNNs and BNNs for
hardware implementations and conclude that SCNNs are more suitable for
hardware.
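The strong-law argument for SCNNs is easy to see numerically. In stochastic computing, a value in [0, 1] is carried by a Bernoulli bit stream and a single AND gate multiplies two independent streams; the sketch below (illustrative, not taken from the paper) shows the empirical bit frequency converging to the exact product as the stream length grows.

    import numpy as np

    rng = np.random.default_rng(2)
    p, q = 0.6, 0.3
    for n in (10**2, 10**4, 10**6):
        s1 = rng.random(n) < p               # bit stream encoding p
        s2 = rng.random(n) < q               # independent stream encoding q
        est = (s1 & s2).mean()               # AND gate + counter decodes p*q
        print(f"n={n:>7}  estimate={est:.4f}  error={abs(est - p * q):.4f}")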
A Comprehensive Survey on Functional Approximation
The theory of functional approximation has numerous applications in sciences and industry. This thesis focuses on possible approaches to approximating a continuous function on a compact subset of $\mathbb{R}^2$ using a variety of constructions. The results are presented under four general topics: polynomials, Fourier series, wavelets, and neural networks. Approximation with polynomials on subsets of $\mathbb{R}$ leads to a discussion of the Stone-Weierstrass theorem. Convergence of Fourier series is characterized on the unit circle. Wavelets are introduced following the Fourier transform, and their construction as well as their ability to approximate functions in $L^2(\mathbb{R})$ is discussed. Finally, the universal approximation theorem for artificial neural networks is presented, and the representation and approximation of functions on $\mathbb{R}^2$ with single- and multilayer neural networks is constructed.
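As a small companion to the polynomial part of the survey, here is a sketch of the constructive side of the Weierstrass approximation theorem via Bernstein polynomials on [0, 1]; the example function and degrees are my choices, not the thesis's.

    import numpy as np
    from scipy.stats import binom

    def bernstein(f, n, x):
        # B_n(f)(x) = sum_k f(k/n) * C(n,k) x^k (1-x)^(n-k); binom.pmf is a
        # numerically stable way to evaluate the Bernstein basis.
        k = np.arange(n + 1)
        return binom.pmf(k, n, x[:, None]) @ f(k / n)

    f = lambda t: np.abs(t - 0.5)            # continuous but not differentiable
    x = np.linspace(0.0, 1.0, 201)
    for n in (4, 16, 64, 256):
        print(n, np.max(np.abs(bernstein(f, n, x) - f(x))))  # sup error shrinks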