A Modern Take on the Bias-Variance Tradeoff in Neural Networks
The bias-variance tradeoff tells us that as model complexity increases, bias
falls and variance increases, leading to a U-shaped test error curve. However,
recent empirical results with over-parameterized neural networks are marked by
a striking absence of the classic U-shaped test error curve: test error keeps
decreasing in wider networks. This suggests that there might not be a
bias-variance tradeoff in neural networks with respect to network width,
contrary to what was originally claimed by, e.g., Geman et al. (1992).
Motivated by the shaky
evidence used to support this claim in neural networks, we measure bias and
variance in the modern setting. We find that both bias and variance can
decrease as the number of parameters grows. To better understand this, we
introduce a new decomposition of the variance to disentangle the effects of
optimization and data sampling. We also provide theoretical analysis in a
simplified setting that is consistent with our empirical findings.
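To make the measurement concrete, the following is a minimal sketch of how bias and variance can be estimated empirically by retraining a network many times; the toy task, network size, and trial count are illustrative assumptions, not the paper's setup. Because each trial redraws both the training sample and the random initialization, the measured variance mixes the data-sampling and optimization effects that the decomposition above is designed to disentangle.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def f(x):
    return np.sin(3 * x)  # true target function (illustrative)

x_test = np.linspace(-1, 1, 200).reshape(-1, 1)
y_true = f(x_test).ravel()

n_trials, n_train = 50, 100
preds = np.empty((n_trials, len(x_test)))
for t in range(n_trials):
    # Fresh training sample and fresh initialization each trial, so the
    # variance below mixes data-sampling and optimization randomness.
    x_tr = rng.uniform(-1, 1, (n_train, 1))
    y_tr = f(x_tr).ravel() + rng.normal(0.0, 0.1, n_train)
    net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                       random_state=t).fit(x_tr, y_tr)
    preds[t] = net.predict(x_test)

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - y_true) ** 2)  # squared bias
variance = np.mean(preds.var(axis=0))         # variance across retrainings
print(f"bias^2 = {bias_sq:.4f}  variance = {variance:.4f}")
```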
Sharp analysis of low-rank kernel matrix approximations
We consider supervised learning problems within the positive-definite kernel
framework, such as kernel ridge regression, kernel logistic regression or the
support vector machine. With kernels leading to infinite-dimensional feature
spaces, a common practical limiting difficulty is the necessity of computing
the kernel matrix, which most frequently leads to algorithms with running time
at least quadratic in the number of observations n, i.e., O(n^2). Low-rank
approximations of the kernel matrix are often considered as they allow the
reduction of running time complexities to O(p^2 n), where p is the rank of the
approximation. The practicality of such methods thus depends on the required
rank p. In this paper, we show that in the context of kernel ridge regression,
for approximations based on a random subset of columns of the original kernel
matrix, the rank p may be chosen to be linear in the degrees of freedom
associated with the problem, a quantity which is classically used in the
statistical analysis of such methods, and is often seen as the implicit number
of parameters of non-parametric estimators. This result enables simple
algorithms that have sub-quadratic running time complexity, but provably
exhibit the same predictive performance as existing algorithms, for any given
problem instance, and not only in worst-case situations.
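For concreteness, here is a minimal sketch of a column-sampling (Nyström-type) low-rank approximation applied to kernel ridge regression, the setting the analysis above covers; the RBF kernel, the function names, and the regularization scaling are assumptions for illustration, not the paper's exact estimator. Forming and solving the p x p system costs O(p^2 n) rather than the at-least-quadratic cost of working with the full kernel matrix.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_krr(X, y, p, lam=1e-2, gamma=1.0, seed=0):
    """Kernel ridge regression restricted to a random subset of p kernel
    columns; cost is O(p^2 n) instead of at least O(n^2)."""
    n = len(X)
    idx = np.random.default_rng(seed).choice(n, size=p, replace=False)
    K_np = rbf(X, X[idx], gamma)        # n x p block of kernel columns
    K_pp = rbf(X[idx], X[idx], gamma)   # p x p block on sampled points
    # Regularized normal equations in the p-dimensional subspace
    # (small jitter added for numerical stability).
    A = K_np.T @ K_np + lam * n * K_pp + 1e-10 * np.eye(p)
    alpha = np.linalg.solve(A, K_np.T @ y)
    return idx, alpha

def nystrom_predict(X_train, idx, alpha, X_new, gamma=1.0):
    return rbf(X_new, X_train[idx], gamma) @ alpha
```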
OCReP: An Optimally Conditioned Regularization for Pseudoinversion Based Neural Training
In this paper we consider the training of single hidden layer neural networks
by pseudoinversion, which, in spite of its popularity, is sometimes affected by
numerical instability issues. Regularization is known to be effective in such
cases, so that we introduce, in the framework of Tikhonov regularization, a
matricial reformulation of the problem which allows us to use the condition
number as a diagnostic tool for identification of instability. By imposing
well-conditioning requirements on the relevant matrices, our theoretical
analysis allows the identification of an optimal value for the regularization
parameter from the standpoint of stability. We compare with the value derived
by cross-validation for overfitting control and optimisation of the
generalization performance. We test our method for both regression and
classification tasks. The proposed method is quite effective in terms of
predictivity, often with some improvement in performance relative to the
reference cases considered. This approach, due to analytical determination of
the regularization parameter, dramatically reduces the computational load
required by many other techniques.
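For orientation, here is a minimal sketch of pseudoinversion-based training of a single hidden layer network with Tikhonov regularization. Unlike OCReP, which determines the regularization parameter analytically from well-conditioning requirements, the sketch below simply takes it as an input and reports the condition number used as the diagnostic; all names are illustrative.

```python
import numpy as np

def train_pinv(X, y, n_hidden=100, lam=1e-3, seed=0):
    """Single hidden layer network trained by regularized pseudoinversion.
    OCReP derives lam analytically from conditioning requirements; here
    lam is just a parameter (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # fixed random input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    R = H.T @ H + lam * np.eye(n_hidden)         # Tikhonov-regularized matrix
    beta = np.linalg.solve(R, H.T @ y)           # output weights via pseudoinversion
    kappa = np.linalg.cond(R)                    # conditioning diagnostic
    return (W, b, beta), kappa

def predict_pinv(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```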
Popular Ensemble Methods: An Empirical Study
An ensemble consists of a set of individually trained classifiers (such as
neural networks or decision trees) whose predictions are combined when
classifying novel instances. Previous research has shown that an ensemble is
often more accurate than any of the single classifiers in the ensemble. Bagging
(Breiman, 1996c) and Boosting (Freund and Schapire, 1996; Schapire, 1990) are two
relatively new but popular methods for producing ensembles. In this paper we
evaluate these methods on 23 data sets using both neural networks and decision
trees as our classification algorithms. Our results clearly indicate a number of
conclusions. First, while Bagging is almost always more accurate than a single
classifier, it is sometimes much less accurate than Boosting. On the other
hand, Boosting can create ensembles that are less accurate than a single
classifier -- especially when using neural networks. Analysis indicates that
the performance of the Boosting methods is dependent on the characteristics of
the data set being examined. In fact, further results show that Boosting
ensembles may overfit noisy data sets, thus decreasing their performance.
Finally, consistent with previous studies, our work suggests that most of the
gain in an ensemble's performance comes in the first few classifiers combined;
however, relatively large gains can be seen up to 25 classifiers when Boosting
decision trees.
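As a point of reference, a minimal sketch of the Bagging procedure evaluated above (bootstrap resampling plus majority vote), with decision trees as the base learner; the default of 25 members echoes the figure mentioned above, while the rest of the setup is an illustrative assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_members=25, seed=0):
    """Train each ensemble member on a bootstrap resample of the data."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        members.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return members

def bagging_predict(members, X):
    """Combine members by majority vote (assumes integer class labels)."""
    votes = np.stack([m.predict(X) for m in members]).astype(int)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```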
Neural network ensembles: Evaluation of aggregation algorithms
Ensembles of artificial neural networks show improved generalization
capabilities that outperform those of single networks. However, for aggregation
to be effective, the individual networks must be as accurate and diverse as
possible. An important problem is, then, how to tune the aggregate members in
order to have an optimal compromise between these two conflicting conditions.
We present here an extensive evaluation of several algorithms for ensemble
construction, including new proposals and comparing them with standard methods
in the literature. We also discuss a potential problem with sequential
aggregation algorithms: the infrequent but damaging selection, through their
heuristics, of particularly bad ensemble members. We introduce modified
algorithms that cope with this problem by allowing individual weighting of
aggregate members. Our algorithms and their weighted modifications are
favorably tested against other methods in the literature, producing an
appreciable improvement in performance on most of the standard statistical
databases used
as benchmarks.
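To illustrate what individual weighting of aggregate members can look like, here is a minimal sketch in which each member's weight is derived from its held-out validation error; this simple inverse-error scheme is an assumption chosen for illustration, not one of the paper's algorithms.

```python
import numpy as np

def weighted_aggregate(preds_val, y_val, preds_test, eps=1e-12):
    """Weight each member inversely to its validation MSE, then average.

    preds_val:  (n_members, n_val) member predictions on validation data
    preds_test: (n_members, n_test) member predictions on test data
    """
    mse = np.mean((preds_val - y_val) ** 2, axis=1)  # one MSE per member
    w = 1.0 / (mse + eps)                            # worse members get less weight
    w /= w.sum()                                     # normalize to a convex combination
    return w @ preds_test                            # weighted ensemble prediction
```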