11 research outputs found
A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers
This paper establishes a precise high-dimensional asymptotic theory for
boosting on separable data, taking statistical and computational perspectives.
We consider a high-dimensional setting where the number of features $p$ (weak
learners) scales with the sample size $n$, in an overparametrized regime.
Under a class of statistical models, we provide an exact analysis of the
generalization error of boosting when the algorithm interpolates the training
data and maximizes the empirical $\ell_1$-margin. Further, we explicitly pin
down the relation between the boosting test error and the optimal Bayes error,
as well as the proportion of active features at interpolation (with zero
initialization). In turn, these precise characterizations answer certain
questions raised in \cite{breiman1999prediction, schapire1998boosting}
surrounding boosting, under assumed data generating processes. At the heart of
our theory lies an in-depth study of the maximum $\ell_1$-margin, which can be
accurately described by a new system of non-linear equations; to analyze this
margin, we rely on Gaussian comparison techniques and develop a novel uniform
deviation argument. Our statistical and computational arguments can handle (1)
any finite-rank spiked covariance model for the feature distribution and (2)
variants of boosting corresponding to general $\ell_q$-geometry, $q \in [1, 2]$.
As a final component, via the Lindeberg principle, we establish a
universality result showcasing that the scaled $\ell_1$-margin (asymptotically)
remains the same, whether the covariates used for boosting arise from a
non-linear random feature model or an appropriately linearized model with
matching moments.
Comment: 68 pages, 4 figures
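As a concrete companion to the abstract above: on linearly separable data, the maximum $\ell_1$-margin it studies has an exact finite-sample definition as the value of the linear program $\max_{\|\theta\|_1 \le 1} \min_i y_i \langle x_i, \theta \rangle$. A minimal sketch of that LP (an illustration of the quantity itself, not of the paper's asymptotic analysis):

```python
import numpy as np
from scipy.optimize import linprog

def max_l1_margin(X, y):
    """Value and direction of max_{||theta||_1 <= 1} min_i y_i <x_i, theta>,
    written as an LP by splitting theta = u - v with u, v >= 0."""
    n, p = X.shape
    c = np.zeros(2 * p + 1)          # variables: [u (p), v (p), t]; minimize -t
    c[-1] = -1.0
    Yx = y[:, None] * X
    A_ub = np.vstack([
        # Margin constraints: t - y_i <x_i, u - v> <= 0 for every sample i.
        np.hstack([-Yx, Yx, np.ones((n, 1))]),
        # l1-ball constraint: sum(u) + sum(v) <= 1.
        np.concatenate([np.ones(2 * p), [0.0]])[None, :],
    ])
    b_ub = np.concatenate([np.zeros(n), [1.0]])
    bounds = [(0, None)] * (2 * p) + [(None, None)]   # u, v >= 0; t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    u, v, t = res.x[:p], res.x[p:2 * p], res.x[-1]
    return t, u - v
```

For separable data the optimal value is positive, and boosting-type procedures with small step sizes are known to approach this max-$\ell_1$-margin direction, which is why the margin governs the interpolating solution.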
Spectrum-Aware Adjustment: A New Debiasing Framework with Applications to Principal Components Regression
We introduce a new debiasing framework for high-dimensional linear regression
that bypasses the restrictions on covariate distributions imposed by modern
debiasing technology. We study the prevalent setting where the number of
features and samples are both large and comparable. In this context,
state-of-the-art debiasing technology uses a degrees-of-freedom correction to
remove shrinkage bias of regularized estimators and conduct inference. However,
this method requires that the observed samples are i.i.d., the covariates
follow a mean zero Gaussian distribution, and reliable covariance matrix
estimates for observed features are available. This approach struggles when (i)
covariates are non-Gaussian with heavy tails or asymmetric distributions, (ii)
rows of the design exhibit heterogeneity or dependencies, and (iii) reliable
feature covariance estimates are lacking.
To address these, we develop a new strategy where the debiasing correction is
a rescaled gradient descent step (suitably initialized) with step size
determined by the spectrum of the sample covariance matrix. Unlike prior work,
we assume that eigenvectors of this matrix are uniform draws from the
orthogonal group. We show this assumption remains valid in diverse situations
where traditional debiasing fails, including designs with complex row-column
dependencies, heavy tails, asymmetric properties, and latent low-rank
structures. We establish asymptotic normality of our proposed estimator
(centered and scaled) under various convergence notions. Moreover, we develop a
consistent estimator for its asymptotic variance. Lastly, we introduce a
debiased Principal Component Regression (PCR) technique using our
Spectrum-Aware approach. In varied simulations and real data experiments, we
observe that our method outperforms degrees-of-freedom debiasing by a margin.
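The correction described above has a simple algorithmic shape: one rescaled gradient-descent step on the squared loss, added to an initial estimate. The sketch below shows that shape only; the paper's actual step size is derived from the spectrum of the sample covariance, and the spectral average used here is purely a placeholder assumption:

```python
import numpy as np

def gradient_step_debias(X, y, theta_hat, tau):
    """Debiasing correction of the general form
        theta_hat + tau * X^T (y - X theta_hat) / n,
    i.e., one gradient step on the squared loss, rescaled by tau."""
    n = X.shape[0]
    return theta_hat + tau * X.T @ (y - X @ theta_hat) / n

def placeholder_tau(X):
    """Illustrative spectrum-based step size (NOT the paper's rule):
    reciprocal of the average eigenvalue of the sample covariance."""
    eigs = np.linalg.eigvalsh(X.T @ X / X.shape[0])
    return 1.0 / eigs.mean()
```

When the spectrum is flat and the response is noiseless, a single such step from theta_hat = 0 already recovers the least-squares solution, which is the intuition behind tying the step size to the spectrum.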
Abstracting Fairness: Oracles, Metrics, and Interpretability
It is well understood that classification algorithms, for example, for
deciding on loan applications, cannot be evaluated for fairness without taking
context into account. We examine what can be learned from a fairness oracle
equipped with an underlying understanding of ``true'' fairness. The oracle
takes as input a (context, classifier) pair satisfying an arbitrary fairness
definition, and accepts or rejects the pair according to whether the classifier
satisfies the underlying fairness truth. Our principal conceptual result is an
extraction procedure that learns the underlying truth; moreover, the procedure
can learn an approximation to this truth given access to a weak form of the
oracle. Since every ``truly fair'' classifier induces a coarse metric, in which
those receiving the same decision are at distance zero from one another and
those receiving different decisions are at distance one, this extraction
process provides the basis for ensuring a rough form of metric fairness, also
known as individual fairness. Our principal technical result is a higher
fidelity extractor under a mild technical constraint on the weak oracle's
conception of fairness. Our framework permits the scenario in which many
classifiers, with differing outcomes, may all be considered fair. Our results
have implications for interpretability -- a highly desired but poorly defined
property of classification systems that endeavors to permit a human arbiter to
reject classifiers deemed to be ``unfair'' or illegitimately derived.
Comment: 17 pages, 1 figure
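The coarse metric mentioned above is direct to transcribe: it is the pseudometric a classifier induces on individuals, with distance 0 within a decision class and 1 across classes. A minimal sketch:

```python
def induced_metric(classifier):
    """Coarse (pseudo)metric induced by a classifier: individuals receiving
    the same decision are at distance 0, different decisions at distance 1."""
    def d(x1, x2):
        return 0.0 if classifier(x1) == classifier(x2) else 1.0
    return d
```

A ``truly fair'' classifier extracted from the oracle thus certifies a rough form of individual (metric) fairness with respect to this induced metric.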
A modern maximum-likelihood theory for high-dimensional logistic regression
Every student in statistics or data science learns early on that when the
sample size largely exceeds the number of variables, fitting a logistic model
produces estimates that are approximately unbiased. Every student also learns
that there are formulas to predict the variability of these estimates which are
used for the purpose of statistical inference; for instance, to produce
p-values for testing the significance of regression coefficients. Although
these formulas come from large sample asymptotics, we are often told that we
are on reasonably safe grounds when $n$ is large in such a way that $n \geq 5p$
or $n \geq 10p$. This paper shows that this is far from the case, and
consequently, inferences routinely produced by common software packages are
often unreliable.
Consider a logistic model with independent features in which $n$ and $p$
become increasingly large in a fixed ratio. Then we show that (1) the MLE is
biased, (2) the variability of the MLE is far greater than classically
predicted, and (3) the commonly used likelihood-ratio test (LRT) is not
distributed as a chi-square. The bias of the MLE is extremely problematic as it
yields completely wrong predictions for the probability of a case based on
observed values of the covariates. We develop a new theory, which
asymptotically predicts (1) the bias of the MLE, (2) the variability of the
MLE, and (3) the distribution of the LRT. We empirically also demonstrate that
these predictions are extremely accurate in finite samples. Further, an
appealing feature is that these novel predictions depend on the unknown
sequence of regression coefficients only through a single scalar, the overall
strength of the signal. This suggests very concrete procedures to adjust
inference; we describe one such procedure learning a single parameter from data
and producing accurate inference.
Comment: 29 pages, 14 figures, 4 tables
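The upward bias reported above is easy to see in simulation. The sketch below fits an unregularized logistic MLE by Newton's method on synthetic data with $p/n = 0.2$; the signal sizes and the slope diagnostic are illustrative choices, not the paper's exact experiment:

```python
import numpy as np

def logistic_mle(X, y, iters=50):
    """Unregularized logistic MLE via Newton's method (y in {0, 1})."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        grad = X.T @ (y - mu)                       # score vector
        H = X.T @ (X * (mu * (1.0 - mu))[:, None])  # observed information
        beta += np.linalg.solve(H, grad)
    return beta

rng = np.random.default_rng(0)
n, p = 2000, 400                          # fixed ratio p/n = 0.2
beta_true = np.zeros(p)
beta_true[: p // 8] = 6.0 / np.sqrt(n)    # illustrative signal strength
X = rng.standard_normal((n, p))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta_hat = logistic_mle(X, y)
# Slope of beta_hat on beta_true; values above 1 reveal the upward bias.
alpha_hat = beta_hat @ beta_true / (beta_true @ beta_true)
```

Classical theory would put this slope near 1; in the $p/n = 0.2$ regime it comes out noticeably larger, matching the inflation the abstract describes.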
The Asymptotic Distribution of the MLE in High-dimensional Logistic Models: Arbitrary Covariance
We study the distribution of the maximum likelihood estimate (MLE) in
high-dimensional logistic models, extending the recent results from Sur (2019)
to the case where the Gaussian covariates may have an arbitrary covariance
structure. We prove that in the limit of large problems holding the ratio $p/n$
between the number of covariates and the sample size constant, every
finite list of MLE coordinates follows a multivariate normal distribution.
Concretely, the $j$-th coordinate of the MLE is asymptotically
normally distributed with mean $\alpha_\star \beta_j$ and standard deviation
$\sigma_\star / \tau_j$; here, $\beta_j$ is the value of the true regression
coefficient, and $\tau_j$ the standard deviation of the $j$-th predictor
conditional on all the others. The numerical parameters $\alpha_\star$ and
$\sigma_\star$ only depend upon the problem dimensionality and the
overall signal strength, and can be accurately estimated. Our results imply
that the MLE's magnitude is biased upwards and that the MLE's standard
deviation is greater than that predicted by classical theory. We present a
series of experiments on simulated and real data showing excellent agreement
with the theory.
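For Gaussian covariates, the conditional standard deviation $\tau_j$ in the result above can be read off the precision matrix: $\tau_j = 1/\sqrt{\Theta_{jj}}$ with $\Theta = \Sigma^{-1}$, a standard Gaussian identity. A short sketch (the covariance below is an arbitrary example):

```python
import numpy as np

def conditional_stds(Sigma):
    """tau_j: sd of the j-th Gaussian predictor given all the others,
    equal to 1/sqrt of the j-th diagonal entry of the precision matrix."""
    Theta = np.linalg.inv(Sigma)
    return 1.0 / np.sqrt(np.diag(Theta))

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
taus = conditional_stds(Sigma)   # both entries equal sqrt(1 - 0.25) = sqrt(0.75)
```

These $\tau_j$ are exactly the per-coordinate scalings that convert the universal pair $(\alpha_\star, \sigma_\star)$ into coordinate-wise confidence intervals.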