Scalable Data Augmentation for Deep Learning
Scalable Data Augmentation (SDA) provides a framework for training deep
learning models using auxiliary hidden layers. Scalable MCMC is available for
network training and inference. SDA provides a number of computational
advantages over traditional algorithms, such as avoiding backtracking and local
modes, and it can perform optimization with stochastic gradient descent (SGD) in
TensorFlow. Standard deep neural networks with logit, ReLU and SVM activation
functions are straightforward to implement. To illustrate our architectures and
methodology, we use Pólya-Gamma logit data augmentation for a number of
standard datasets. Finally, we conclude with directions for future research.
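Pólya-Gamma logit augmentation is easiest to see in its simplest setting: a Gibbs sampler for Bayesian logistic regression (a single logit unit). The sketch below is an illustration under stated assumptions, not the authors' SDA implementation. It assumes a Gaussian prior N(0, prior_var·I) on the coefficients and approximates PG(1, c) draws by truncating the infinite sum-of-gammas representation after n_terms = 200 terms (a choice made here). The names sample_pg_approx and gibbs_logistic_pg are hypothetical. Given the latent ω, the coefficients are conditionally Gaussian with covariance V = (XᵀΩX + B⁻¹)⁻¹ and mean V·Xᵀκ, where κ = y − 1/2.

```python
import numpy as np


def sample_pg_approx(b, c, rng, n_terms=200):
    """Approximate draws from PG(b, c), one per entry of c.

    Uses the series representation
        omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1), truncated after n_terms terms (an assumption
    made for brevity; truncation biases draws slightly downward).
    """
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=c.shape + (n_terms,))
    denom = (k - 0.5) ** 2 + c[..., None] ** 2 / (4 * np.pi ** 2)
    return (g / denom).sum(axis=-1) / (2 * np.pi ** 2)


def gibbs_logistic_pg(X, y, n_iter=500, prior_var=10.0, seed=0):
    """Gibbs sampler for Bayesian logistic regression via Polya-Gamma augmentation.

    Assumed prior: beta ~ N(0, prior_var * I). Returns an (n_iter, p) array of draws.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    kappa = y - 0.5                      # kappa_i = y_i - 1/2 for y_i in {0, 1}
    prior_prec = np.eye(p) / prior_var   # B^{-1}
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i^T beta)
        omega = sample_pg_approx(1.0, X @ beta, rng)
        # Step 2: beta | omega, y ~ N(m, V), V = (X^T Omega X + B^{-1})^{-1}, m = V X^T kappa
        precision = X.T @ (omega[:, None] * X) + prior_prec
        V = np.linalg.inv(precision)
        V = 0.5 * (V + V.T)              # symmetrize against round-off
        m = V @ (X.T @ kappa)
        L = np.linalg.cholesky(V)
        beta = m + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```

To use it, pass a design matrix X with an intercept column and a 0/1 response y, discard an initial burn-in, and average the remaining draws to estimate the posterior mean. The truncated series keeps the sketch short; exact rejection samplers for the Pólya-Gamma distribution exist and would remove the small downward bias.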
Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks
Convolutional neural networks (CNNs) have been shown to achieve optimal
approximation and estimation error rates (in minimax sense) in several function
classes. However, previously analyzed optimal CNNs are unrealistically wide and
difficult to obtain via optimization due to sparsity constraints in important
function classes, including the Hölder class. We show that a ResNet-type CNN can
attain the minimax optimal error rates in these classes in more plausible
situations -- it can be dense, and its width, channel size, and filter size are
constant with respect to sample size. The key idea is that we can replicate the
learning ability of fully-connected neural networks (FNNs) by tailored CNNs, as
long as the FNNs have block-sparse structures. Our theory is general
in the sense that we can automatically translate any approximation rate achieved
by block-sparse FNNs into one achieved by CNNs. As an application, we derive
approximation and estimation error rates of the aforementioned type of CNNs for
the Barron and Hölder classes with the same strategy.
Comment: 8 pages + References 2 pages + Supplemental material 18 pages
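To make the architecture concrete, here is a minimal NumPy forward pass of a ResNet-type 1D CNN: residual blocks of the form y = x + conv(ReLU(conv(x))) followed by a linear read-out. The channel count C, filter size K and signal length L are fixed numbers, mirroring the abstract's point that width, channel size and filter size can stay constant with respect to sample size. This is an illustrative sketch under assumptions, not the paper's construction. The block layout (two convolutions per block, "same" zero padding, a dense read-out to a scalar) and the names conv1d_same, residual_block and resnet_cnn are hypothetical choices made here.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1D convolution (cross-correlation, as in most deep-learning libraries).

    x: (C_in, L) input signal, w: (C_out, C_in, K) filters with odd K,
    b: (C_out,) biases. Zero 'same' padding keeps the length L unchanged.
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    length = x.shape[1]
    out = np.empty((w.shape[0], length))
    for i in range(length):
        window = xp[:, i:i + k]                      # (C_in, K) slice at position i
        out[:, i] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, params):
    """Hypothetical block layout: y = x + conv(relu(conv(x))), identity skip."""
    (w1, b1), (w2, b2) = params
    h = relu(conv1d_same(x, w1, b1))
    return x + conv1d_same(h, w2, b2)


def resnet_cnn(x, blocks, w_out, b_out):
    """Stack residual blocks with fixed C and K, then a linear read-out to a scalar."""
    for params in blocks:
        x = residual_block(x, params)
    return float(np.sum(w_out * x) + b_out)
```

With all block weights set to zero, the identity skip makes each block return its input unchanged, which is easy to check directly. Depth can grow by appending blocks while C, K and L stay fixed.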