5 research outputs found
A Gradient Boosting Approach for Training Convolutional and Deep Neural Networks
Deep learning has revolutionized the computer vision and image classification
domains. In this context, architectures based on Convolutional Neural Networks (CNNs)
are the most widely applied models. In this article, we introduce two procedures for
training CNNs and Deep Neural Networks based on Gradient Boosting (GB), namely GB-CNN
and GB-DNN.
These models are trained to fit the gradient of the loss function or
pseudo-residuals of previous models. At each iteration, the proposed method
adds one dense layer to an exact copy of the previous deep NN model. The
weights of the dense layers trained on previous iterations are frozen to
prevent over-fitting, permitting the model to fit the new dense layer as well as
fine-tune the convolutional layers (for GB-CNN) while still utilizing the
information already learned. Through extensive experimentation on different
2D-image classification and tabular datasets, the presented models show
superior performance in terms of classification accuracy with respect to
standard CNNs and Deep-NNs with the same architectures.
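The boosting procedure described in the abstract lends itself to a short illustration. The following is a minimal PyTorch sketch of a GB-DNN-style training loop on synthetic tabular data; the squared-error pseudo-residuals, the additive combination of stage outputs, and the layer sizes are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn

class GBDNN(nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.hidden_layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())]
        )
        self.out = nn.Linear(hidden, out_dim)

    def add_layer(self, hidden):
        # Freeze the dense layers trained in previous iterations.
        for layer in self.hidden_layers:
            for p in layer.parameters():
                p.requires_grad = False
        self.hidden_layers.append(nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()))
        self.out = nn.Linear(hidden, self.out.out_features)  # fresh output head

    def forward(self, x):
        for layer in self.hidden_layers:
            x = layer(x)
        return self.out(x)

def fit_stage(model, x, residuals, epochs=200, lr=1e-2):
    # Fit only the trainable (newly added) parameters to the pseudo-residuals.
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), residuals)
        loss.backward()
        opt.step()

x = torch.randn(256, 20)                   # synthetic tabular features
y = torch.randn(256, 1)                    # synthetic regression target
model = GBDNN(in_dim=20, hidden=64, out_dim=1)
pred = torch.zeros_like(y)
for stage in range(3):
    if stage > 0:
        model.add_layer(64)                # grow the network, freezing earlier dense layers
    residuals = y - pred                   # pseudo-residuals (negative gradient of squared error)
    fit_stage(model, x, residuals)
    with torch.no_grad():
        pred = pred + model(x)             # additive, gradient-boosting-style update

The GB-CNN variant described in the abstract would additionally keep the convolutional stem trainable at every stage rather than freezing it with the earlier dense layers.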
Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks
Convolutional neural networks (CNNs) have been shown to achieve optimal
approximation and estimation error rates (in minimax sense) in several function
classes. However, previously analyzed optimal CNNs are unrealistically wide and
difficult to obtain via optimization due to sparsity constraints in important
function classes, including the Hölder class. We show that a ResNet-type CNN can
attain the minimax optimal error rates in these classes in more plausible
situations -- it can be dense, and its width, channel size, and filter size are
constant with respect to sample size. The key idea is that we can replicate the
learning ability of fully-connected neural networks (FNNs) by tailored CNNs, as
long as the FNNs have block-sparse structures. Our theory is general in the sense
that we can automatically translate any approximation rate achieved
by block-sparse FNNs into that by CNNs. As an application, we derive
approximation and estimation error rates of the aforementioned type of CNNs for
the Barron and Hölder classes with the same strategy.
Comment: 8 pages + references 2 pages + supplemental material 18 pages
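For concreteness, the following is a minimal PyTorch sketch of the architecture family the result concerns: a ResNet-type CNN that is dense (no sparsity constraint) and whose channel size and filter size stay fixed while depth can grow. The 1D input, channel count, and depth below are illustrative assumptions, not the construction used in the paper's proofs.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels=4, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )

    def forward(self, x):
        return x + self.body(x)            # identity skip connection (ResNet-type)

class ConstantWidthResNetCNN(nn.Module):
    def __init__(self, in_channels=1, channels=4, depth=8, kernel_size=3):
        super().__init__()
        self.stem = nn.Conv1d(in_channels, channels, kernel_size, padding=kernel_size // 2)
        # Depth can grow with the problem, but channel and filter sizes stay constant.
        self.blocks = nn.Sequential(*[ResBlock(channels, kernel_size) for _ in range(depth)])
        self.head = nn.Linear(channels, 1)

    def forward(self, x):
        h = self.blocks(self.stem(x))
        return self.head(h.mean(dim=-1))   # global average pooling, then linear readout

x = torch.randn(16, 1, 32)                 # (batch, channels, length), 1D input for illustration
print(ConstantWidthResNetCNN()(x).shape)   # torch.Size([16, 1])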
Nonparametric Teaching for Multiple Learners
We study the problem of teaching multiple learners simultaneously in the
nonparametric iterative teaching setting, where the teacher iteratively
provides examples to the learner for accelerating the acquisition of a target
concept. This problem is motivated by the gap between the current single-learner
teaching setting and the real-world scenario of human instruction, where a
teacher typically imparts knowledge to multiple students. Under the new problem
formulation, we introduce a novel framework -- Multi-learner Nonparametric
Teaching (MINT). In MINT, the teacher aims to instruct multiple learners, with
each learner focusing on learning a scalar-valued target model. To achieve
this, we frame the problem as teaching a vector-valued target model and extend
the target model space from a scalar-valued reproducing kernel Hilbert space
used in single-learner scenarios to a vector-valued space. Furthermore, we
demonstrate that MINT offers significant teaching speed-up over repeated
single-learner teaching, particularly when the multiple learners can
communicate with each other. Lastly, we conduct extensive experiments to
validate the practicality and efficiency of MINT.
Comment: NeurIPS 2023 (31 pages, 20 figures)
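As a rough illustration of the iterative nonparametric teaching loop the abstract describes, the NumPy sketch below teaches two scalar-valued learners with a greedy example-selection rule and kernel-based functional-gradient updates. It is a simplification closer to the repeated single-learner baseline than to MINT's vector-valued formulation; the RBF kernel, step size, and selection rule are assumptions made for illustration.

import numpy as np

def rbf(x, y, gamma=20.0):
    # RBF kernel; the target functions live in the induced RKHS.
    return np.exp(-gamma * (x - y) ** 2)

pool = np.linspace(-1.0, 1.0, 50)                        # candidate teaching examples
targets = [np.sin(np.pi * pool), np.cos(np.pi * pool)]   # one scalar-valued target per learner

# Each learner's estimate is a kernel expansion; start from the zero function.
coeffs = [np.zeros(len(pool)) for _ in targets]
lr = 0.5

def predict(c, xs):
    return np.array([np.sum(c * rbf(pool, x)) for x in xs])

for t in range(200):
    for i, target in enumerate(targets):
        errors = predict(coeffs[i], pool) - target
        j = int(np.argmax(np.abs(errors)))               # teacher: pick the most informative example
        coeffs[i][j] -= lr * errors[j]                   # learner: functional-gradient step at that point

for i, target in enumerate(targets):
    err = np.max(np.abs(predict(coeffs[i], pool) - target))
    print(f"learner {i}: max error {err:.3f}")

In MINT the two learners would instead share a vector-valued target model and could communicate, which is where the paper's reported teaching speed-up over repeated single-learner teaching comes from.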