10 research outputs found
A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers
This paper establishes a precise high-dimensional asymptotic theory for
boosting on separable data, taking statistical and computational perspectives.
We consider a high-dimensional setting where the number of features (weak
learners) $p$ scales with the sample size $n$, in an overparametrized regime.
Under a class of statistical models, we provide an exact analysis of the
generalization error of boosting when the algorithm interpolates the training
data and maximizes the empirical $\ell_1$-margin. Further, we explicitly pin
down the relation between the boosting test error and the optimal Bayes error,
as well as the proportion of active features at interpolation (with zero
initialization). In turn, these precise characterizations answer certain
questions raised in Breiman (1999) and Schapire et al. (1998)
surrounding boosting, under assumed data generating processes. At the heart of
our theory lies an in-depth study of the maximum-$\ell_1$-margin, which can be
accurately described by a new system of non-linear equations; to analyze this
margin, we rely on Gaussian comparison techniques and develop a novel uniform
deviation argument. Our statistical and computational arguments can handle (1)
any finite-rank spiked covariance model for the feature distribution and (2)
variants of boosting corresponding to general $\ell_q$-geometry, $q \in [1, 2]$. As a final component, via the Lindeberg principle, we establish a
universality result showcasing that the scaled $\ell_1$-margin (asymptotically)
remains the same, whether the covariates used for boosting arise from a
non-linear random feature model or an appropriately linearized model with
matching moments.
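As a rough illustration of the central object above, the following minimal sketch (not the paper's code) computes the maximum-$\ell_1$-margin interpolating classifier on synthetic separable data by solving the equivalent linear program $\max_{\|\theta\|_1 \le 1} \min_i y_i \langle x_i, \theta\rangle$, and reports the margin together with the fraction of active features; the sample size, dimension, and spiked signal below are illustrative assumptions.

# Minimal sketch: max-l1-margin interpolating classifier via a linear program.
# Boosting with small step sizes is known to approach this solution on separable data.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p = 50, 200                       # overparametrized regime: p > n, data a.s. separable
beta = np.zeros(p)
beta[:5] = 2.0                       # illustrative low-dimensional signal
X = rng.standard_normal((n, p))
y = np.sign(X @ beta + rng.standard_normal(n))

# Variables: theta = u - v with u, v >= 0 (encodes ||theta||_1), plus the margin t.
# Maximize t  <=>  minimize -t.
c = np.zeros(2 * p + 1)
c[-1] = -1.0
# Margin constraints: y_i x_i^T (u - v) >= t  <=>  -y_i x_i^T u + y_i x_i^T v + t <= 0.
A_margin = np.hstack([-(y[:, None] * X), (y[:, None] * X), np.ones((n, 1))])
# l1 constraint: sum(u) + sum(v) <= 1.
A_l1 = np.hstack([np.ones(2 * p), [0.0]])[None, :]
A_ub = np.vstack([A_margin, A_l1])
b_ub = np.concatenate([np.zeros(n), [1.0]])
bounds = [(0, None)] * (2 * p) + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
theta = res.x[:p] - res.x[p:2 * p]
print("max l1-margin:", -res.fun)                          # positive iff the data are separable
print("fraction of active features:", np.mean(np.abs(theta) > 1e-8))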
Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control
In high-dimensional variable selection problems, statisticians often seek to
design multiple testing procedures that control the false discovery rate (FDR)
while simultaneously discovering as many relevant variables as possible. Model-X methods, such
as Knockoffs and conditional randomization tests, achieve the first goal of
finite-sample FDR control under the assumption of a known covariate
distribution. However, it is not clear whether these methods can concurrently
achieve the second goal of maximizing the number of discoveries. In fact,
designing procedures to discover more relevant variables with finite-sample FDR
control is a largely open question, even in arguably the simplest setting of linear
models.
In this paper, we derive near-optimal testing procedures in high dimensional
Bayesian linear models with isotropic covariates. We propose a Model-X multiple
testing procedure, PoEdCe, which provably controls the frequentist FDR in
finite samples even under model misspecification, and conjecturally achieves
near-optimal power when the data follow the Bayesian linear model with a known
prior. PoEdCe has three important ingredients: Posterior Expectation, distilled
Conditional randomization test (dCRT), and the Benjamini-Hochberg procedure
with e-values (eBH). The optimality conjecture of PoEdCe is based on a
heuristic calculation of its asymptotic true positive proportion (TPP) and
false discovery proportion (FDP), which is supported by methods from
statistical physics as well as extensive numerical simulations. Furthermore,
when the prior is unknown, we show that an empirical Bayes variant of PoEdCe
still has finite-sample FDR control and achieves near-optimal power.
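Of the three ingredients, only the e-BH step has a short self-contained description, so the minimal sketch below implements just that step under the standard e-BH rule: reject the hypotheses with the $k^*$ largest e-values, where $k^* = \max\{k : e_{[k]} \ge m/(\alpha k)\}$. How PoEdCe builds its e-values from posterior expectations and the dCRT is not reproduced here, and the e-values below are placeholders chosen only to exercise the code.

import numpy as np

def ebh(e_values, alpha=0.1):
    """Benjamini-Hochberg with e-values: return the indices of rejected hypotheses."""
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e)                     # indices sorted by decreasing e-value
    ks = np.arange(1, m + 1)
    passed = e[order] >= m / (alpha * ks)      # condition e_[k] >= m / (alpha * k)
    if not passed.any():
        return np.array([], dtype=int)
    k_star = ks[passed].max()
    return np.sort(order[:k_star])

# Placeholder e-values: the first 10 variables play the role of signals.
rng = np.random.default_rng(1)
e_vals = np.concatenate([rng.exponential(1000.0, size=10),   # "signal" variables
                         rng.exponential(1.0, size=90)])     # "null" variables
print("rejected variables:", ebh(e_vals, alpha=0.1))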
Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning
Wasserstein distributionally robust optimization has recently emerged as a
powerful framework for robust estimation, enjoying good out-of-sample
performance guarantees, well-understood regularization effects, and
computationally tractable reformulations. In this framework, the estimator is
obtained by minimizing the worst-case expected loss over all probability
distributions which are close, in a Wasserstein sense, to the empirical
distribution. In this paper, we propose a Wasserstein distributionally robust
estimation framework to estimate an unknown parameter from noisy linear
measurements, and we focus on the task of analyzing the squared error
performance of such estimators. Our study is carried out in the modern
high-dimensional proportional regime, where both the ambient dimension and the
number of samples go to infinity at a proportional rate which encodes the
under/over-parametrization of the problem. Under an isotropic Gaussian features
assumption, we show that the squared error can be recovered as the solution of
a convex-concave optimization problem which, surprisingly, involves at most
four scalar variables. Importantly, the precise quantification of the squared
error allows us to accurately and efficiently compare different ambiguity radii
and to understand the effect of the under/over-parametrization on the
estimation error. We conclude the paper with a list of exciting research
directions enabled by our results.
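For intuition about the estimator whose error is being characterized, the sketch below works in one commonly studied special case (squared loss, perturbations of the covariates only, a type-2 Wasserstein ball with Euclidean cost), where the worst-case risk takes the closed form $(\sqrt{\widehat{\mathrm{MSE}}(\beta)} + \rho \|\beta\|_2)^2$ and the distributionally robust estimator reduces to a square-root-ridge problem; it then traces the squared estimation error across ambiguity radii in a proportional regime. This is an assumption-laden illustration, not the paper's exact setting or its four-variable characterization.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 300, 150                               # proportional regime: d / n = 0.5
beta_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))               # isotropic Gaussian features
y = X @ beta_star + 0.5 * rng.standard_normal(n)

def wdro_estimator(rho):
    """Minimize the worst-case risk, i.e. ||y - X beta||_2 / sqrt(n) + rho * ||beta||_2."""
    beta = cp.Variable(d)
    objective = cp.norm(y - X @ beta, 2) / np.sqrt(n) + rho * cp.norm(beta, 2)
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value

for rho in [0.0, 0.05, 0.1, 0.3, 1.0]:        # ambiguity radii to compare
    err = np.sum((wdro_estimator(rho) - beta_star) ** 2)
    print(f"rho = {rho:4.2f}   squared estimation error = {err:.4f}")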
The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression
Successful deep learning models often involve training neural network
architectures that contain more parameters than the number of training samples.
Such overparametrized models have been extensively studied in recent years, and
the virtues of overparametrization have been established from both the
statistical perspective, via the double-descent phenomenon, and the
computational perspective, via the structural properties of the optimization
landscape.
Despite the remarkable success of deep learning architectures in the
overparametrized regime, it is also well known that these models are highly
vulnerable to small adversarial perturbations in their inputs. Even when
adversarially trained, their performance on perturbed inputs (robust
generalization) is considerably worse than their best attainable performance on
benign inputs (standard generalization). It is thus imperative to understand
how overparametrization fundamentally affects robustness.
In this paper, we provide a precise characterization of the role of
overparametrization in robustness by focusing on random features regression
models (two-layer neural networks with random first layer weights). We consider
a regime where the sample size, the input dimension and the number of
parameters grow in proportion to each other, and derive an asymptotically exact
formula for the robust generalization error when the model is adversarially
trained. Our developed theory reveals the nontrivial effect of
overparametrization on robustness and indicates that for adversarially trained
random features models, high overparametrization can hurt robust
generalization.
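As a purely empirical companion to the result described above, the sketch below adversarially trains the second layer of a random features model $f(x) = a^\top \mathrm{relu}(Wx)$ under $\ell_2$-bounded input perturbations, approximating the inner maximization with a few projected gradient ascent (PGD) steps, and reports the robust test error at several overparametrization ratios $N/n$. The PGD heuristic, problem sizes, and step sizes are illustrative assumptions rather than the paper's exact formulation or its asymptotically exact formula.

import numpy as np

rng = np.random.default_rng(0)
d, n, n_test, eps = 30, 200, 400, 0.5          # input dimension, samples, test size, attack radius

def features(X, W):
    """Random features map: relu(W x) / sqrt(N) with fixed first-layer weights W."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

def pgd_perturb(X, y, W, a, eps, steps=10, lr=0.2):
    """l2-bounded PGD on the inputs, ascending the squared loss of f(x) = a^T relu(W x)."""
    delta = np.zeros_like(X)
    for _ in range(steps):
        Z = (X + delta) @ W.T
        resid = features(X + delta, W) @ a - y
        # Gradient of the squared loss w.r.t. the input (up to a factor of 2).
        grad = (resid[:, None] * ((Z > 0) * a)) @ W / np.sqrt(W.shape[0])
        norms = np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12
        delta = delta + lr * eps * grad / norms                 # normalized ascent step
        dnorm = np.linalg.norm(delta, axis=1, keepdims=True)
        delta = delta * np.minimum(1.0, eps / (dnorm + 1e-12))  # project onto the l2 ball
    return X + delta

beta = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ beta + 0.1 * rng.standard_normal(n)
X_test = rng.standard_normal((n_test, d))
y_test = X_test @ beta + 0.1 * rng.standard_normal(n_test)

for N in [50, 200, 800]:                                        # number of random features
    W = rng.standard_normal((N, d))
    a = np.zeros(N)
    for _ in range(300):                                        # adversarial training loop
        Phi = features(pgd_perturb(X, y, W, a, eps, steps=5), W)
        a -= 0.2 * Phi.T @ (Phi @ a - y) / n                    # gradient step on the squared loss
    robust_err = np.mean((features(pgd_perturb(X_test, y_test, W, a, eps), W) @ a - y_test) ** 2)
    print(f"N/n = {N / n:4.1f}   robust test error = {robust_err:.3f}")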