    Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach

    Empirical process theory for i.i.d. observations has emerged as a ubiquitous tool for understanding the generalization properties of various statistical problems. However, in many applications where the data exhibit temporal dependencies (e.g., in finance, medical imaging, weather forecasting, etc.), the corresponding empirical processes are much less understood. Motivated by this observation, we present a general bound on the expected supremum of empirical processes under standard β/ρ-mixing assumptions. Unlike most prior work, our results cover both the long- and the short-range regimes of dependence. Our main result shows that a non-trivial trade-off between the complexity of the underlying function class and the dependence among the observations characterizes the learning rate in a large class of nonparametric problems. This trade-off reveals a new phenomenon, namely that even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting, provided the underlying function class is complex enough. We demonstrate the practical implications of our findings by analyzing various statistical estimators in both fixed and growing dimensions. Our main examples include a comprehensive case study of generalization error bounds in nonparametric regression over smoothness classes in fixed as well as growing dimension using neural nets, shape-restricted multivariate convex regression, estimating the optimal transport (Wasserstein) distance between two probability distributions, and classification under the Mammen-Tsybakov margin condition -- all under appropriate mixing assumptions. In the process, we also develop bounds on L_r (1 ≤ r ≤ 2)-localized empirical processes with dependent observations, which we then leverage to get faster rates for (a) tuning-free adaptation, and (b) set-structured learning problems. Comment: 94 pages, 1 figure
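    For context, a standard textbook definition of the β-mixing (absolute regularity) coefficient under which such bounds are typically stated -- not necessarily the exact variant used in the paper -- is:

        \[
          \beta(k) \;=\; \sup_{t}\,
          \mathbb{E}\!\left[\,\sup_{A \,\in\, \sigma(X_{t+k},\, X_{t+k+1},\, \dots)}
          \bigl|\,\mathbb{P}\bigl(A \mid \sigma(\dots,\, X_{t-1},\, X_t)\bigr) - \mathbb{P}(A)\,\bigr|\right].
        \]

    Roughly, summable β(k) corresponds to the short-range regime of dependence, while slowly decaying, non-summable β(k) corresponds to the long-range regime.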

    UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation

    Uncertainty quantification for prediction is an intriguing problem with significant applications in various fields, such as biomedical science, economic studies, and weather forecasts. Numerous methods are available for constructing prediction intervals, such as quantile regression and conformal prediction, among others. Nevertheless, model misspecification (especially in high dimensions) or sub-optimal constructions can frequently result in biased or unnecessarily wide prediction intervals. In this paper, we propose a novel and widely applicable technique for aggregating multiple prediction intervals to minimize the average width of the prediction band along with a coverage guarantee, called Universally Trainable Optimal Prediction Intervals Aggregation (UTOPIA). The method also allows us to directly construct predictive bands based on elementary basis functions. Our approach is based on linear or convex programming, which is easy to implement. All of our proposed methodologies are supported by theoretical guarantees on the coverage probability and optimal average length, which are detailed in this paper. The effectiveness of our approach is convincingly demonstrated by applying it to synthetic data and to two real datasets on finance and macroeconomics.
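    As a rough illustration of the aggregation idea, the sketch below combines K candidate intervals via a nonnegative linear combination whose average width is minimized subject to covering every calibration point. This is a conservative toy relaxation (covering all points rather than a (1-α) fraction), not the paper's UTOPIA procedure; the function name and setup are hypothetical.

        import numpy as np
        from scipy.optimize import linprog

        def aggregate_intervals(lowers, uppers, y_cal):
            """Toy LP aggregation of K prediction intervals (hypothetical
            helper, not UTOPIA itself).

            lowers, uppers : (n, K) candidate interval endpoints evaluated
                             at the n calibration inputs.
            y_cal          : (n,) calibration responses.
            Returns a weight vector w >= 0.
            """
            n, K = lowers.shape
            # Objective: average width of the aggregated band,
            # i.e. sum_k w_k * (u_k - l_k) averaged over calibration points.
            c = (uppers - lowers).mean(axis=0)
            # Coverage on every calibration point, written as A_ub @ w <= b_ub:
            #   sum_k w_k * l_k(x_i) <= y_i   and   sum_k w_k * u_k(x_i) >= y_i.
            A_ub = np.vstack([lowers, -uppers])
            b_ub = np.concatenate([y_cal, -y_cal])
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
            return res.x

    Requiring coverage of all calibration points is stricter than the (1-α) coverage the paper targets; relaxing it to a fraction of points would need integer variables or the paper's own construction.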

    Representation Learning Dynamics of Self-Supervised Models

    Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However, current theoretical analyses of SSL are mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural-network-based models but, so far, are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically the representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dynamics of multivariate regression to SSL leads to learning trivial scalar representations, exhibiting the dimension collapse phenomenon in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights, and derive the exact (network-width-independent) learning dynamics of SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite-width approximation of SSL models deviates significantly from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.
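    To make the constrained-training setup concrete, here is a minimal NumPy sketch of one Riemannian gradient step for a weight matrix with orthonormal columns (a generic projection-plus-QR-retraction step, not the paper's exact dynamics; the function name is hypothetical).

        import numpy as np

        def grassmann_gd_step(W, euclid_grad, lr=0.1):
            """One gradient step constrained to matrices with orthonormal
            columns (points on the Stiefel/Grassmann manifold)."""
            # Project the Euclidean gradient onto the tangent space:
            # remove the component lying in the current subspace.
            rgrad = euclid_grad - W @ (W.T @ euclid_grad)
            # Take the step, then retract back to the manifold via QR.
            Q, R = np.linalg.qr(W - lr * rgrad)
            # Resolve the QR sign ambiguity so the step is deterministic.
            return Q * np.sign(np.diag(R))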

    Analysis of High Dimensional Statistical Models with Discontinuity

    This dissertation focuses on analyzing certain statistical models, with roots in fields like economics and psychometry, that can be put under the umbrella of threshold estimation problems. The first chapter involves the analysis of Manski's celebrated maximum score estimator in a stochastic utility model in high dimensions, both for p/n → 0 and p ≫ n. We establish upper and lower bounds for the minimax l_2 error in the utility model with binary responses that coincide up to a logarithmic factor, and construct a minimax-optimal estimator in the slow-growth regime. Some extensions to the multinomial response model are also considered. The second chapter analyzes the canonical change point/plane model in fixed and growing dimensions in the presence of heavy-tailed errors. In fixed dimensions, we establish that using a robust loss function (e.g., one based on the Huber estimating equation) leads to smaller asymptotic confidence intervals at the usual levels compared to the standard least-squares criterion. We also derive minimax optimal rates for the changing-plane estimation problem in growing dimensions and demonstrate that Huber estimation attains the optimal rate while least-squares estimation gives a sub-optimal rate. The third and final chapter proposes and analyzes a new methodology for efficient estimation of a homogeneous treatment effect in the presence of endogeneity, both in fixed and growing dimensions. In the fixed-dimension regime, our method yields a √n-consistent, asymptotically normal, and semi-parametrically efficient estimator. In the growing/high-dimensional regime, a debiased-lasso-type estimator is shown to be asymptotically normal with a √n convergence rate. PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/174585/1/mdeb_1.pd
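    To illustrate the robust-loss idea from the second chapter's setting, here is a toy sketch that fits a single change point in a mean model by minimizing a Huber criterion over split locations (an illustration of why a Huber-type loss is preferred under heavy tails, not the dissertation's estimator; names are hypothetical).

        import numpy as np

        def huber(r, delta=1.345):
            """Huber loss: quadratic near zero, linear in the tails."""
            a = np.abs(r)
            return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

        def change_point_huber(y):
            """Grid search for one change point in the mean of y.
            The median is used as a cheap robust proxy for the Huber
            location estimate of each segment."""
            n = len(y)
            best_t, best_loss = 1, np.inf
            for t in range(1, n):
                left, right = y[:t], y[t:]
                loss = (huber(left - np.median(left)).sum()
                        + huber(right - np.median(right)).sum())
                if loss < best_loss:
                    best_t, best_loss = t, loss
            return best_t

    Unlike the squared-error criterion, the Huber criterion grows only linearly in large residuals, so a few heavy-tailed errors cannot drag the estimated split location far from the truth.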