A U-statistic estimator for the variance of resampling-based error estimators

Authors: Boulesteix Anne-Laure; De Bin Riccardo; Fuchs Mathias; Hornung Roman
Publication date: 01/01/2013

We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. Specifically, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic, so several standard theorems on the properties of U-statistics apply. In particular, the estimator has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, an unbiased estimator of this minimal variance exists if the total sample size is at least twice the learning set size plus two. In this case, we exhibit such an estimator, which is itself a U-statistic. It again enjoys various optimality properties and yields an asymptotically exact hypothesis test of the equality of error rates when two learning algorithms are compared. Our statements apply to any deterministic learning algorithm under weak non-degeneracy assumptions. In an application to tuning parameter choice in lasso regression on a gene expression data set, the test does not reject the null hypothesis of equal error rates for two different parameter values.
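The idea of an error estimator that averages over all learning-testing splits can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes a kernel of degree g + 1 that trains on g points and tests on the single remaining point, and the names `nearest_mean_classifier`, `kernel`, and `u_statistic_error` are hypothetical. The variance estimator described in the abstract is not reproduced here.

```python
from itertools import combinations


def nearest_mean_classifier(train):
    """Hypothetical deterministic learner on 1-D data with labels 0/1."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    if not xs0 or not xs1:
        # Only one class was seen: always predict that class.
        only = 0 if xs0 else 1
        return lambda x: only
    m0 = sum(xs0) / len(xs0)
    m1 = sum(xs1) / len(xs1)
    # Predict the class whose training mean is nearer (ties go to class 0).
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1


def kernel(block, learner):
    """Symmetric kernel (assumed form): hold out each point of the block in
    turn, train on the remaining g points, and average the 0/1 losses."""
    losses = []
    for i, (x, y) in enumerate(block):
        train = block[:i] + block[i + 1:]
        predict = learner(train)
        losses.append(int(predict(x) != y))
    return sum(losses) / len(losses)


def u_statistic_error(data, g, learner):
    """Average the kernel over all subsets of size g + 1 (complete U-statistic)."""
    blocks = list(combinations(data, g + 1))
    return sum(kernel(b, learner) for b in blocks) / len(blocks)


if __name__ == "__main__":
    data = [(0.0, 0), (0.1, 0), (0.2, 0), (1.0, 1), (1.1, 1), (1.2, 1)]
    # Learning set size g = 2; n = 6 meets the n >= 2g + 2 condition.
    print(u_statistic_error(data, 2, nearest_mean_classifier))
```

The number of subsets grows combinatorially with the sample size, so in practice one would presumably average over a random sample of subsets instead of enumerating them all; that choice is outside the scope of this sketch.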

arXiv.org e-Print Archive

CiteSeerX

Open Access LMU

Self-Supervised Learning with Lie Symmetries for Partial Differential Equations

Authors: Garrido Quentin; Kiani Bobak T.; Lawrence Hannah; LeCun Yann; Mialon Grégoire; Rehman Danyal
Publication date: 11/07/2023

Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from observations of real dynamical systems that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.

arXiv.org e-Print Archive