
    Generalization error bounds for kernel matrix completion and extrapolation

    © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Prior information can be incorporated in matrix completion to improve estimation accuracy and extrapolate the missing entries. Reproducing kernel Hilbert spaces provide tools to leverage the said prior information and derive more reliable algorithms. This paper analyzes the generalization error of such approaches, and presents numerical tests confirming the theoretical results. This work is supported by ERDF funds (TEC2013-41315-R and TEC2016-75067-C4-2), the Catalan Government (2017 SGR 578), and NSF grants (1500713, 1514056, 1711471 and 1509040). Peer reviewed. Postprint (published version).
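
    The abstract above does not spell out how the row/column kernels enter the completion problem. As a rough illustrative sketch (not the paper's algorithm), one common RKHS-flavoured formulation models the matrix as M ≈ K_r W K_c, with K_r and K_c kernel matrices built from row and column side information, and fits W on the observed entries; the RBF kernels, regularizer, step size, and toy data below are assumptions made for the example.

import numpy as np

def rbf_kernel(F, gamma=1.0):
    """RBF kernel matrix over row/column feature vectors F (n x d)."""
    sq = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_matrix_completion(M_obs, mask, K_r, K_c, lam=1e-2, iters=3000):
    """Fit M_hat = K_r @ W @ K_c to the observed entries (mask == 1) by gradient descent."""
    W = np.zeros(M_obs.shape)
    step = 1.0 / (np.linalg.norm(K_r, 2) ** 2 * np.linalg.norm(K_c, 2) ** 2 + lam)
    for _ in range(iters):
        R = mask * (K_r @ W @ K_c - M_obs)        # residual on observed entries only
        W -= step * (K_r @ R @ K_c + lam * W)     # gradient of squared loss plus ridge on W
    return K_r @ W @ K_c                          # kernels propagate values to unobserved entries

# toy example: a matrix generated from smooth functions of row/column features
rng = np.random.default_rng(0)
Fr, Fc = rng.normal(size=(30, 2)), rng.normal(size=(25, 2))
M = np.sin(Fr @ rng.normal(size=(2, 3))) @ np.cos(Fc @ rng.normal(size=(2, 3))).T
mask = (rng.random(M.shape) < 0.4).astype(float)  # observe roughly 40% of entries
M_hat = kernel_matrix_completion(M * mask, mask, rbf_kernel(Fr), rbf_kernel(Fc))
print("RMSE on held-out entries:",
      np.sqrt((((M_hat - M) * (1 - mask)) ** 2).sum() / (1 - mask).sum()))

    Because K_r and K_c couple every row and column, the fitted model also produces values for entries that were never observed, which is the extrapolation effect the paper's generalization bounds concern.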

    Carleson measures, trees, extrapolation, and T(b) theorems

    The theory of Carleson measures, stopping time arguments, and atomic decompositions has been well-established in harmonic analysis. More recent is the theory of phase space analysis from the point of view of wave packets on tiles, tree selection algorithms, and tree size estimates. The purpose of this paper is to demonstrate that the two theories are in fact closely related, by taking existing results and reproving them in a unified setting. In particular we give a dyadic version of extrapolation for Carleson measures, with two separate proofs, as well as a two-sided local dyadic T(b) theorem which generalizes earlier T(b) theorems of David, Journe, Semmes, and Christ. Comment: 50 pages, 3 figures, to appear, Publications Matematiques Barcelona. A new proof of the extrapolation lemma (due to John Garnett) is now included.
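
    For orientation, the dyadic objects referred to above can be made precise as follows; this is the standard textbook definition of a dyadic Carleson sequence, stated here as background rather than as the paper's exact formulation.

% Standard dyadic Carleson condition over the dyadic grid \mathcal{D},
% stated as background for the abstract above.
\[
  \{a_Q\}_{Q \in \mathcal{D}} \subset [0,\infty) \text{ is a Carleson sequence}
  \quad\Longleftrightarrow\quad
  \sup_{R \in \mathcal{D}} \frac{1}{|R|} \sum_{\substack{Q \in \mathcal{D} \\ Q \subseteq R}} a_Q < \infty .
\]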

    Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective

    Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x, y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x, y)$ that is not covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x, y] = \langle f_{\star}(x), g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test distribution domain that is not covered in training, i.e., achieving bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices to be either exactly low-rank, or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results that enable bilinear combinatorial extrapolation under gradual spectral decay as observed in typical high-dimensional data, including novel algorithms, generalization guarantees, and linear-algebraic results. A key tool is a novel perturbation bound for the rank-$k$ singular value decomposition approximations between two matrices that depends on the relative spectral gap rather than the absolute spectral gap, a result that may be of broader independent interest. Comment: The 36th Annual Conference on Learning Theory (COLT 2023).
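
    To make the coverage structure concrete, here is a minimal sketch (a toy construction, not the paper's method): if training covers the three blocks formed by seen rows and seen columns, and the matrix is exactly rank k, the never-observed block can be recovered Nystrom-style as M22 ≈ M21 M11^+ M12. The paper's contribution is precisely to go beyond this exactly-low-rank case to gradual spectral decay; the rank, block sizes, and Gaussian features below are placeholders.

import numpy as np

rng = np.random.default_rng(0)

# Ground truth: z = <f(x), g(y)> for finite-dimensional feature maps, i.e. a rank-k matrix M = F @ G.T
k, n_x, n_y = 5, 40, 30
F, G = rng.normal(size=(n_x, k)), rng.normal(size=(n_y, k))
M = F @ G.T

# "Combinatorial" coverage: training sees rows 0:25 and columns 0:20 (three blocks),
# but never the product of held-out rows with held-out columns (block M22).
rx, cy = 25, 20
M11, M12, M21 = M[:rx, :cy], M[:rx, cy:], M[rx:, :cy]
M22_true = M[rx:, cy:]

# Rank-k truncated pseudo-inverse of the fully observed block M11.
U, s, Vt = np.linalg.svd(M11, full_matrices=False)
M11_pinv_k = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T

# Nystrom-style extrapolation of the never-observed test block.
M22_hat = M21 @ M11_pinv_k @ M12
print("relative error on unseen block:",
      np.linalg.norm(M22_hat - M22_true) / np.linalg.norm(M22_true))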

    Can neural networks extrapolate? Discussion of a theorem by Pedro Domingos

    Neural networks trained on large datasets by minimizing a loss have become the state-of-the-art approach for resolving data science problems, particularly in computer vision, image processing and natural language processing. In spite of their striking results, our theoretical understanding of how neural networks operate is limited. In particular, what are the interpolation capabilities of trained neural networks? In this paper we discuss a theorem of Domingos stating that "every machine learned by continuous gradient descent is approximately a kernel machine". According to Domingos, this fact leads to the conclusion that all machines trained on data are mere kernel machines. We first extend Domingos' result to the discrete case and to networks with vector-valued output. We then study its relevance and significance on simple examples. We find that in simple cases, the "neural tangent kernel" arising in Domingos' theorem does provide understanding of the networks' predictions. Furthermore, when the task given to the network grows in complexity, the interpolation capability of the network can be effectively explained by Domingos' theorem, and therefore is limited. We illustrate this fact on a classic perception theory problem: recovering a shape from its boundary.
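
    For readers who want to see what the kernel in question looks like, below is a small sketch of the empirical neural tangent kernel K(x, x') = <grad_theta f(x), grad_theta f(x')> for a one-hidden-layer tanh network. The architecture, width, and random weights are illustrative choices, and Domingos' theorem actually involves a path-averaged version of this kernel along the gradient-descent trajectory rather than the kernel at fixed parameters.

import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 16                       # input dimension, hidden width
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)

def f(x):
    """Scalar output of a one-hidden-layer tanh network."""
    return w2 @ np.tanh(W1 @ x)

def grad_theta(x):
    """Gradient of f(x) with respect to all parameters (W1, w2), flattened."""
    a = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1.0 - a ** 2), x)   # chain rule through tanh
    dw2 = a
    return np.concatenate([dW1.ravel(), dw2])

def ntk(x, xp):
    """Empirical neural tangent kernel at the current parameters."""
    return grad_theta(x) @ grad_theta(xp)

x1, x2 = rng.normal(size=d), rng.normal(size=d)
print("K(x1, x1) =", ntk(x1, x1))
print("K(x1, x2) =", ntk(x1, x2))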

    Fourier neural operator for learning solutions to macroscopic traffic flow models: Application to the forward and inverse problems

    Deep learning methods are emerging as popular computational tools for solving forward and inverse problems in traffic flow. In this paper, we study a neural operator framework for learning solutions to nonlinear hyperbolic partial differential equations with applications in macroscopic traffic flow models. In this framework, an operator is trained to map heterogeneous and sparse traffic input data to the complete macroscopic traffic state in a supervised learning setting. We chose a physics-informed Fourier neural operator ($\pi$-FNO) as the operator, where an additional physics loss based on a discrete conservation law regularizes the problem during training to improve the shock predictions. We also propose to use training data generated from random piecewise constant input data to systematically capture the shock and rarefied solutions. From experiments using the LWR traffic flow model, we found superior accuracy in predicting the density dynamics of a ring-road network and urban signalized road. We also found that the operator can be trained using simple traffic density dynamics, e.g., consisting of 2-3 vehicle queues and 1-2 traffic signal cycles, and it can predict density dynamics for heterogeneous vehicle queue distributions and multiple traffic signal cycles ($\geq 2$) with an acceptable error. The extrapolation error grew sub-linearly with input complexity for a proper choice of the model architecture and training data. Adding a physics regularizer aided in learning long-term traffic density dynamics, especially for problems with periodic boundary data.
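
    As a hedged sketch of the two ingredients named above, the snippet below implements (i) a single untrained 1-D spectral-convolution layer of the kind an FNO stacks, and (ii) a discrete residual of the LWR conservation law rho_t + (rho v(rho))_x = 0 with a Greenshields speed function, which is the sort of quantity a physics loss could penalize. The flux model, grid, finite-difference stencil, and random weights are assumptions, not the paper's trained pi-FNO.

import numpy as np

rng = np.random.default_rng(0)

def fourier_layer(u, weights, modes):
    """One 1-D spectral convolution: FFT, keep low modes, multiply by learned weights, inverse FFT."""
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = weights * u_hat[:modes]     # learned complex multipliers on low frequencies
    return np.fft.irfft(out_hat, n=u.size)

def conservation_residual(rho, dx, dt, v_free=1.0, rho_max=1.0):
    """Discrete residual of rho_t + (rho * v(rho))_x = 0 with Greenshields speed
    v(rho) = v_free * (1 - rho / rho_max); a physics loss could penalize its squared norm."""
    q = rho * v_free * (1.0 - rho / rho_max)      # flux at each (t, x) grid point
    drho_dt = (rho[1:, :] - rho[:-1, :]) / dt
    dq_dx = (np.roll(q, -1, axis=1) - np.roll(q, 1, axis=1))[:-1, :] / (2 * dx)
    return drho_dt + dq_dx

# toy usage: push a density profile through one (untrained) spectral layer on a ring road
x = np.linspace(0, 1, 128, endpoint=False)
rho0 = 0.5 + 0.3 * np.sin(2 * np.pi * x)
modes = 16
weights = rng.normal(size=modes) + 1j * rng.normal(size=modes)
rho1 = fourier_layer(rho0, weights, modes)
res = conservation_residual(np.stack([rho0, rho1]), dx=x[1] - x[0], dt=0.01)
print("mean squared conservation residual:", np.mean(res ** 2))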

    Advances in Probabilistic Meta-Learning and the Neural Process Family

    A natural progression in machine learning research is to automate and learn from data increasingly many components of our learning agents. Meta-learning is a paradigm that fully embraces this perspective, and can be intuitively described as embodying the idea of learning to learn. A goal of meta-learning research is the development of models to assist users in navigating the intricate space of design choices associated with specifying machine learning solutions. This space is particularly formidable when considering deep learning approaches, which involve myriad design choices interacting in complex fashions to affect the performance of the resulting agents. Despite the impressive successes of deep learning in recent years, this challenge remains a significant bottleneck in deploying neural network based solutions in several important application domains. But how can we reason about and design solutions to this daunting task?

    This thesis is concerned with a particular perspective for meta-learning in supervised settings. We view supervised learning algorithms as mappings that take data sets to predictive models, and consider meta-learning as learning to approximate functions of this form. In particular, we are interested in meta-learners that (i) employ neural networks to approximate these functions in an end-to-end manner, and (ii) provide predictive distributions rather than single predictors. The former is motivated by the success of neural networks as function approximators, and the latter by our interest in the few-shot learning scenario. The introductory chapters of this thesis formalise this notion, and use it to provide a tutorial introducing the Neural Process Family (NPF), a class of models introduced by Garnelo et al. (2018) satisfying the above-mentioned modelling desiderata.

    We then present our own technical contributions to the NPF. First, we focus on fundamental properties of the model class, such as expressivity and limiting behaviours of the associated training procedures. Next, we study the role of translation equivariance in the NPF. Considering the intimate relationship between the NPF and the representation of functions operating on sets, we extend the underlying theory of DeepSets to include translation equivariance. We then develop novel members of the NPF endowed with this important inductive bias. Through extensive empirical evaluation, we demonstrate that, in many settings, they significantly outperform their non-equivariant counterparts.

    Finally, we turn our attention to the development of Neural Processes for few-shot image classification. We introduce models that navigate the important tradeoffs associated with this setting, and describe the specification of their central components. We demonstrate that the resulting models, CNAPs, achieve state-of-the-art performance on a challenging benchmark called Meta-Dataset, while adapting faster and with less computational overhead than their best-performing competitors.
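
    As a concrete anchor for the "data set to predictive distribution" view described above, here is a minimal forward-pass sketch of a Conditional Neural Process, the simplest member of the NPF: each context pair is encoded by an MLP, the encodings are mean-pooled (the DeepSets-style permutation-invariant aggregation the thesis extends to translation equivariance), and a decoder maps the pooled representation plus a target input to a predictive mean and standard deviation. Layer sizes and the untrained random weights are placeholders, and no training loop is shown.

import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random (untrained) MLP parameters as a list of (W, b) pairs."""
    return [(rng.normal(size=(m, n)) / np.sqrt(n), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def forward(params, z):
    """Apply an MLP with ReLU on hidden layers and a linear output layer."""
    for i, (W, b) in enumerate(params):
        z = W @ z + b
        if i < len(params) - 1:
            z = np.maximum(z, 0.0)
    return z

encoder = mlp([2, 64, 64, 32])       # one (x_c, y_c) context pair -> representation
decoder = mlp([33, 64, 64, 2])       # [pooled representation; x_t] -> (mean, log-variance)

def cnp_predict(x_context, y_context, x_target):
    """Conditional Neural Process prediction at a single target input."""
    reps = [forward(encoder, np.array([xc, yc])) for xc, yc in zip(x_context, y_context)]
    r = np.mean(reps, axis=0)                     # permutation-invariant DeepSets aggregation
    out = forward(decoder, np.concatenate([r, [x_target]]))
    mean, log_var = out[0], out[1]
    return mean, np.exp(0.5 * log_var)            # predictive mean and standard deviation

xc = np.linspace(-1, 1, 10)
yc = np.sin(3 * xc)
print(cnp_predict(xc, yc, 0.25))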