A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication
In recent years, randomized methods for numerical linear algebra have
received growing interest as a general approach to large-scale problems.
Typically, the essential ingredient of these methods is some form of randomized
dimension reduction, which accelerates computations, but also creates random
approximation error. In this way, the dimension reduction step encodes a
tradeoff between cost and accuracy. However, the exact numerical relationship
between cost and accuracy is typically unknown, and consequently, it may be
difficult for the user to precisely know (1) how accurate a given solution is,
or (2) how much computation is needed to achieve a given level of accuracy. In
the current paper, we study randomized matrix multiplication (sketching) as a
prototype setting for addressing these general problems. As a solution, we
develop a bootstrap method for \emph{directly estimating} the accuracy as a
function of the reduced dimension (as opposed to deriving worst-case bounds on
the accuracy in terms of the reduced dimension). From a computational
standpoint, the proposed method does not substantially increase the cost of
standard sketching methods, and this is made possible by an "extrapolation"
technique. In addition, we provide both theoretical and empirical results to
demonstrate the effectiveness of the proposed method.
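The idea described above can be illustrated with a minimal sketch. The code below assumes a uniform row-sampling sketch of the product $A^\top B$ and measures error in the entrywise max-norm; the function name and these specific choices are illustrative assumptions, not details taken from the paper. The bootstrap resamples the already-sketched rows, so no further passes over $A$ or $B$ are needed:

```python
import numpy as np

def bootstrap_sketch_error(A, B, t, n_boot=200, q=0.95, seed=0):
    """Illustrative sketch (not the paper's exact algorithm): uniform
    row-sampling estimate of A.T @ B from t sampled rows, plus a bootstrap
    estimate of the q-quantile of the resulting sketching error."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    idx = rng.integers(0, n, size=t)   # sample t row indices with replacement
    s = np.sqrt(n / t)                 # scaling that makes the sketch unbiased
    SA, SB = s * A[idx], s * B[idx]
    est = SA.T @ SB                    # sketched estimate of A.T @ B

    errs = np.empty(n_boot)
    for b in range(n_boot):
        # resample the t sketched rows (not the full matrix) with replacement
        j = rng.integers(0, t, size=t)
        boot = SA[j].T @ SB[j]
        errs[b] = np.max(np.abs(boot - est))   # max-norm deviation
    return est, np.quantile(errs, q)
```

The returned quantile serves as a data-driven error estimate for the sketched product; since each bootstrap replicate touches only the t sketched rows, the added cost is modest relative to the sketch itself.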
Error Estimation for Sketched SVD via the Bootstrap
In order to compute fast approximations to the singular value decompositions
(SVD) of very large matrices, randomized sketching algorithms have become a
leading approach. However, a key practical difficulty of sketching an SVD is
that the user does not know how far the sketched singular vectors/values are
from the exact ones. Indeed, the user may be forced to rely on analytical
worst-case error bounds, which do not account for the unique structure of a
given problem. As a result, the lack of tools for error estimation often leads
to much more computation than is really necessary. To overcome these
challenges, this paper develops a fully data-driven bootstrap method that
numerically estimates the actual error of sketched singular vectors/values. In
particular, this allows the user to inspect the quality of a rough initial
sketched SVD, and then adaptively predict how much extra work is needed to
reach a given error tolerance. Furthermore, the method is computationally
inexpensive, because it operates only on sketched objects, and it requires no
passes over the full matrix being factored. Lastly, the method is supported by
theoretical guarantees and a very encouraging set of experimental results.
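The key property claimed above, that the method operates only on sketched objects, can be sketched as follows. This is a simplified illustration under assumed choices (a row sketch already in hand, error on the singular values measured in the max-norm), not the paper's exact procedure:

```python
import numpy as np

def bootstrap_sv_error(A_sketch, n_boot=200, q=0.95, seed=0):
    """Illustrative sketch: bootstrap estimate of the q-quantile of the
    error in the singular values computed from a row sketch of a matrix.
    Only the sketch is touched; no passes over the full matrix."""
    rng = np.random.default_rng(seed)
    t = A_sketch.shape[0]
    sv = np.linalg.svd(A_sketch, compute_uv=False)  # sketched singular values
    errs = np.empty(n_boot)
    for b in range(n_boot):
        j = rng.integers(0, t, size=t)              # resample sketched rows
        sv_b = np.linalg.svd(A_sketch[j], compute_uv=False)
        errs[b] = np.max(np.abs(sv_b - sv))
    return np.quantile(errs, q)
```

In the adaptive workflow the abstract describes, an estimate like this could be computed for a rough initial sketch and then used to judge whether a larger sketch is needed.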
Estimating the Algorithmic Variance of Randomized Ensembles via the Bootstrap
Although the methods of bagging and random forests are some of the most
widely used prediction methods, relatively little is known about their
algorithmic convergence. In particular, there are not many theoretical
guarantees for deciding when an ensemble is "large enough" --- so that its
accuracy is close to that of an ideal infinite ensemble. Due to the fact that
bagging and random forests are randomized algorithms, the choice of ensemble
size is closely related to the notion of "algorithmic variance" (i.e. the
variance of prediction error due only to the training algorithm). In the
present work, we propose a bootstrap method to estimate this variance for
bagging, random forests, and related methods in the context of classification.
To be specific, suppose the training dataset is fixed, and let the random
variable $\mathrm{Err}_t$ denote the prediction error of a randomized ensemble of size
$t$. Working under a "first-order model" for randomized ensembles, we prove
that the centered law of $\mathrm{Err}_t$ can be consistently approximated via the
proposed method as $t\to\infty$. Meanwhile, the computational cost of the
method is quite modest, by virtue of an extrapolation technique. As a
consequence, the method offers a practical guideline for deciding when the
algorithmic fluctuations of $\mathrm{Err}_t$ are negligible.
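The core idea, resampling the ensemble members themselves while holding the training data fixed, can be sketched in a few lines. The representation of the ensemble as a 0/1 vote matrix and the function name below are assumptions made for illustration:

```python
import numpy as np

def algorithmic_sd(votes, y, n_boot=200, seed=0):
    """Illustrative sketch: bootstrap estimate of the algorithmic standard
    deviation of a majority-vote ensemble's test error, obtained by
    resampling the base learners (the training data is held fixed).
    `votes` is a (t, n) array of 0/1 predictions from t base learners
    on n test points; `y` holds the 0/1 test labels."""
    rng = np.random.default_rng(seed)
    t, n = votes.shape
    errs = np.empty(n_boot)
    for b in range(n_boot):
        j = rng.integers(0, t, size=t)            # resample ensemble members
        maj = votes[j].mean(axis=0) > 0.5         # majority vote of resample
        errs[b] = np.mean(maj.astype(int) != y)   # error of resampled ensemble
    return errs.std()
```

A small estimated standard deviation suggests the ensemble is already "large enough" in the sense above, i.e. its error is close to that of the ideal infinite ensemble built from the same training set.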