Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach
Empirical process theory for i.i.d. observations has emerged as a ubiquitous
tool for understanding the generalization properties of various statistical
problems. However, in many applications where the data exhibit temporal
dependence (e.g., finance, medical imaging, and weather forecasting), the
corresponding empirical processes are much less understood. Motivated by this
observation, we present a general bound on the expected supremum of empirical
processes under standard β-mixing assumptions. Unlike most prior
work, our results cover both the long- and the short-range regimes of
dependence. Our main result shows that a non-trivial trade-off between the
complexity of the underlying function class and the dependence among the
observations characterizes the learning rate in a large class of nonparametric
problems. This trade-off reveals a new phenomenon, namely that even under
long-range dependence, it is possible to attain the same rates as in the i.i.d.
setting, provided the underlying function class is complex enough. We
demonstrate the practical implications of our findings by analyzing various
statistical estimators in both fixed and growing dimensions. Our main examples
include a comprehensive case study of generalization error bounds in
nonparametric regression over smoothness classes in fixed as well as growing
dimension using neural nets, shape-restricted multivariate convex regression,
estimating the optimal transport (Wasserstein) distance between two probability
distributions, and classification under the Mammen-Tsybakov margin condition --
all under appropriate mixing assumptions. In the process, we also develop
bounds on localized empirical processes with dependent observations, which we
then leverage to obtain faster rates for (a) tuning-free adaptation, and (b)
set-structured learning problems.
Comment: 94 pages, 1 figure
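A minimal numerical sketch of the phenomenon the abstract describes: the expected supremum of an empirical process grows when the same marginal distribution is sampled with temporal dependence. The AR(1) chain below is a standard example of a β-mixing sequence; the sinusoidal function class, ρ = 0.9, and the sample sizes are illustrative choices, not the paper's setup.

```python
# Illustrative Monte Carlo: expected supremum of an empirical process
# over a small function class, under i.i.d. vs. beta-mixing (AR(1)) data.
# The AR(1) chain is scaled so both settings share a N(0, 1) marginal,
# isolating the effect of dependence. The function class, rho, and
# sample sizes are hypothetical choices for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def ar1_sample(n, rho):
    """Stationary AR(1) with N(0, 1) marginals (a beta-mixing sequence)."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    noise = np.sqrt(1 - rho**2) * rng.standard_normal(n - 1)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + noise[t - 1]
    return x

def sup_emp_process(x, ks):
    """sup_k |n^{-1} sum_i sin(k x_i)|; note E[sin(k X)] = 0 for X ~ N(0, 1)."""
    return np.max(np.abs(np.mean(np.sin(np.outer(ks, x)), axis=1)))

n, reps, ks = 2000, 300, np.arange(1, 21)
sup_iid = np.mean([sup_emp_process(rng.standard_normal(n), ks) for _ in range(reps)])
sup_mix = np.mean([sup_emp_process(ar1_sample(n, rho=0.9), ks) for _ in range(reps)])
print(f"E[sup] i.i.d.:  {sup_iid:.4f}")
print(f"E[sup] AR(0.9): {sup_mix:.4f}")  # larger: dependence slows concentration
```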
UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation
Uncertainty quantification for prediction is an intriguing problem with
significant applications in various fields, such as biomedical science,
economic studies, and weather forecasts. Numerous methods are available for
constructing prediction intervals, such as quantile regression and conformal
predictions, among others. Nevertheless, model misspecification (especially in
high dimensions) or sub-optimal constructions can frequently result in biased
or unnecessarily wide prediction intervals. In this paper, we propose
Universally Trainable Optimal Prediction Intervals Aggregation (UTOPIA), a
novel and widely applicable technique for aggregating multiple prediction
intervals that minimizes the average width of the prediction band while
maintaining a coverage guarantee. The method also allows us to directly construct
predictive bands based on elementary basis functions. Our approach is based on
linear or convex programming, which is easy to implement. All of our proposed
methodologies are supported by theoretical guarantees on the coverage
probability and optimal average length, which are detailed in this paper. The
effectiveness of our approach is convincingly demonstrated by applying it to
synthetic data and two real datasets on finance and macroeconomics.
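As a rough sketch of the linear-programming flavor of such an aggregation step (a simplified toy, not UTOPIA's actual program): given candidate half-width functions around a point predictor, one can choose nonnegative aggregation weights that minimize the average band width plus a penalty on calibration points left uncovered. The point predictor m(x) = x, the two candidate widths, and the penalty lam below are all hypothetical.

```python
# Simplified sketch of width-minimizing interval aggregation via an LP,
# loosely inspired by the abstract; NOT the exact UTOPIA program.
# Band: m(x) +/- sum_j w_j * h_j(x), w_j >= 0. We minimize average width
# plus a hinge penalty on calibration points falling outside the band.
import numpy as np
from scipy.optimize import linprog

def aggregate_intervals(H, residuals, lam=10.0):
    """H: (n, J) candidate half-widths at calibration points.
    residuals: (n,) values |y_i - m(x_i)|. Returns weights w of shape (J,)."""
    n, J = H.shape
    # Variables: [w_1..w_J, xi_1..xi_n]; xi_i = coverage slack at point i.
    c = np.concatenate([H.mean(axis=0), np.full(n, lam / n)])
    # Constraint: sum_j w_j H_ij + xi_i >= residual_i (coverage up to slack).
    A_ub = np.hstack([-H, -np.eye(n)])
    b_ub = -residuals
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (J + n))
    return res.x[:J]

# Toy usage: two hypothetical candidates, constant and |x|-shaped half-widths.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 400)
y = x + (0.3 + 0.5 * np.abs(x)) * rng.standard_normal(400)
H = np.column_stack([np.ones_like(x), np.abs(x)])  # candidate half-widths
w = aggregate_intervals(H, np.abs(y - x))          # m(x) = x is known here
print("weights:", np.round(w, 3))  # |x| component picks up heteroskedasticity
```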
Representation Learning Dynamics of Self-Supervised Models
Self-Supervised Learning (SSL) is an important paradigm for learning
representations from unlabelled data, and SSL with neural networks has been
highly successful in practice. However, current theoretical analyses of SSL are
mostly restricted to generalisation error bounds. In contrast, learning
dynamics often provide a precise characterisation of the behaviour of
neural-network-based models but, so far, are mainly known in supervised settings. In
this paper, we study the learning dynamics of SSL models, specifically
representations obtained by minimising contrastive and non-contrastive losses.
We show that a naive extension of the dynamics of multivariate regression to
SSL leads to learning trivial scalar representations, which demonstrates
dimension collapse in SSL. Consequently, we formulate SSL objectives with
orthogonality constraints on the weights, and derive the exact (network width
independent) learning dynamics of the SSL models trained using gradient descent
on the Grassmannian manifold. We also argue that the infinite width
approximation of SSL models deviates significantly from the neural tangent
kernel approximations of supervised models. We numerically illustrate the
validity of our theoretical findings, and discuss how the presented results
provide a framework for further theoretical analysis of contrastive and
non-contrastive SSL.
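A minimal sketch of the kind of orthogonality-constrained dynamics the abstract refers to: Riemannian gradient ascent over orthonormal frames (a Grassmannian-style problem) with a QR retraction, which keeps the representation full-rank instead of collapsing. The quadratic objective tr(WᵀCW), the step size, and the dimensions are illustrative assumptions, not the paper's exact SSL losses.

```python
# Minimal sketch: why orthogonality constraints prevent dimension collapse.
# We maximize tr(W^T C W) over k-dim orthonormal frames W via Riemannian
# gradient ascent with a QR retraction. The objective and step size are
# illustrative assumptions, not the paper's exact SSL dynamics.
import numpy as np

rng = np.random.default_rng(0)
d, k, lr = 20, 4, 0.1

# Data covariance with a few dominant directions.
A = rng.standard_normal((d, d))
C = A @ A.T / d

W, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal init
for _ in range(500):
    G = 2 * C @ W                         # Euclidean gradient of tr(W^T C W)
    G_riem = G - W @ (W.T @ G)            # project onto tangent space at W
    W, _ = np.linalg.qr(W + lr * G_riem)  # ascent step + QR retraction

# The learned frame aligns with the top-k eigenspace of C: k distinct
# directions are retained instead of collapsing to a single scalar mode.
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]
print("top-k eigenvalues: ", np.round(eigvals[:k], 3))
print("captured variance: ", np.round(np.trace(W.T @ C @ W), 3))
```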
Analysis of High Dimensional Statistical Models with Discontinuity
This dissertation focuses on analyzing certain statistical models, with roots in fields like economics and psychometry, that can be put under the umbrella of threshold estimation problems.

The first chapter involves the analysis of Manski's celebrated maximum score estimator in a stochastic utility model in high dimensions, both for p/n → 0 and p ≫ n. We establish upper and lower bounds for the minimax ℓ_2 error in the utility model with binary responses that coincide up to a logarithmic factor, and construct a minimax-optimal estimator in the slow-growth regime. Some extensions to the multinomial response model are also considered.

The second chapter analyzes the canonical change point/plane model in fixed and growing dimensions in the presence of heavy-tailed errors. In fixed dimensions, we establish that using a robust loss function (e.g., one based on the Huber estimating equation) leads to smaller asymptotic confidence intervals at the usual levels compared to the standard least-squares criterion. We also derive minimax-optimal rates for the changing-plane estimation problem in growing dimensions and demonstrate that Huber estimation attains the optimal rate while least-squares estimation gives a sub-optimal rate.

The third and final chapter proposes and analyzes a new methodology for efficient estimation of a homogeneous treatment effect in the presence of endogeneity, both in fixed and growing dimensions. In the fixed-dimension regime, our method yields a √n-consistent, asymptotically normal, and semiparametrically efficient estimator. In the growing/high-dimensional regime, a debiased-lasso-type estimator is shown to be asymptotically normal with a √n convergence rate.

PHD; Statistics; University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/174585/1/mdeb_1.pd
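A toy simulation in the spirit of the second chapter: estimating a change point under heavy-tailed errors, comparing the least-squares criterion with a Huber-type robust criterion. The one-dimensional model, the grid search, and the use of the median as a stand-in for the Huber location estimate are simplifying assumptions for this sketch.

```python
# Toy illustration: change point estimation under heavy-tailed errors,
# least squares vs. a Huber-type robust criterion. The 1-D model, grid
# search, and tuning delta = 1.0 are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(0)

def huber(r, delta=1.0):
    """Elementwise Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def fit_change_point(x, y, loss, loc):
    """Grid search for the split minimizing total segment loss.
    loc: per-segment location estimate (mean for LS, median for Huber)."""
    grid = np.quantile(x, np.linspace(0.05, 0.95, 181))
    costs = []
    for th in grid:
        left, right = y[x <= th], y[x > th]
        costs.append(loss(left - loc(left)).sum() + loss(right - loc(right)).sum())
    return grid[int(np.argmin(costs))]

theta, n, errs = 0.0, 500, []
for _ in range(200):
    x = rng.uniform(-1, 1, n)
    # Jump of size 2 at theta, plus heavy-tailed t(1.5) noise.
    y = np.where(x <= theta, -1.0, 1.0) + rng.standard_t(df=1.5, size=n)
    ls = fit_change_point(x, y, lambda r: r**2, np.mean)
    hb = fit_change_point(x, y, huber, np.median)  # median ~ robust location
    errs.append((abs(ls - theta), abs(hb - theta)))
errs = np.array(errs)
print(f"mean |error|  LS: {errs[:, 0].mean():.4f}  Huber: {errs[:, 1].mean():.4f}")
```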