Generalization error bounds for kernel matrix completion and extrapolation
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Prior information can be incorporated in matrix completion to improve estimation accuracy and to extrapolate the missing entries. Reproducing kernel Hilbert spaces provide tools to leverage this prior information and to derive more reliable algorithms. This paper analyzes the generalization error of such approaches and presents numerical tests confirming the theoretical results.
This work is supported by ERDF funds (TEC2013-41315-R and TEC2016-75067-C4-2), the Catalan Government (2017 SGR 578), and NSF grants (1500713, 1514056, 1711471 and 1509040).
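As a toy illustration of how an RKHS prior lets one complete and extrapolate missing entries (a minimal sketch, not the algorithm analyzed in the paper; the kernel, sizes, and regularizer are all assumptions), each column of a partially observed matrix can be filled by kernel ridge regression under a row kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 40, 1e-3  # number of rows and ridge regularizer (assumed values)
t = np.linspace(0, 1, n)[:, None]
K = np.exp(-((t - t.T) ** 2) / 0.05)  # hypothetical RBF prior over rows

# Smooth ground-truth matrix (columns lie in the span of K) and a mask
# revealing roughly 60% of the entries.
M = K @ rng.normal(size=(n, 8)) * 0.2
mask = rng.random(M.shape) < 0.6

# Fill every column by kernel ridge regression on its observed rows O:
#   x_hat = K[:, O] (K[O, O] + lam I)^{-1} M[O, j]
X = np.empty_like(M)
for j in range(M.shape[1]):
    O = np.flatnonzero(mask[:, j])
    alpha = np.linalg.solve(K[np.ix_(O, O)] + lam * np.eye(len(O)), M[O, j])
    X[:, j] = K[:, O] @ alpha

rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

Because the columns are smooth in the sense encoded by the kernel, the unobserved entries are recovered with small relative error in this toy setting.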
Carleson measures, trees, extrapolation, and theorems
The theory of Carleson measures, stopping time arguments, and atomic
decompositions has been well-established in harmonic analysis. More recent is
the theory of phase space analysis from the point of view of wave packets on
tiles, tree selection algorithms, and tree size estimates. The purpose of this
paper is to demonstrate that the two theories are in fact closely related, by
taking existing results and reproving them in a unified setting. In particular
we give a dyadic version of extrapolation for Carleson measures, with two
separate proofs, as well as a two-sided local dyadic theorem which
generalizes earlier theorems of David, Journé, Semmes, and Christ.
Comment: 50 pages, 3 figures, to appear in Publicacions Matemàtiques
(Barcelona). A new proof of the extrapolation lemma (due to John Garnett) is
now included.
Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective
Obtaining rigorous statistical guarantees for generalization under
distribution shift remains an open and active research area. We study a setting
we call combinatorial distribution shift, where (a) under the test and
training distributions, the labels are determined by pairs of features
(x_1, x_2), (b) the training distribution has coverage of certain marginal
distributions over x_1 and x_2 separately, but (c) the test distribution
involves examples from a product distribution over (x_1, x_2) that is not
covered by the training distribution. Focusing on the special case where the
labels are given by bilinear embeddings into a Hilbert space H, we aim to
extrapolate to a test distribution domain that is not covered in training,
i.e., achieving bilinear combinatorial extrapolation.
Our setting generalizes a special case of matrix completion from
missing-not-at-random data, for which all existing results require the
ground-truth matrices to be either exactly low-rank, or to exhibit very sharp
spectral cutoffs. In this work, we develop a series of theoretical results that
enable bilinear combinatorial extrapolation under gradual spectral decay as
observed in typical high-dimensional data, including novel algorithms,
generalization guarantees, and linear-algebraic results. A key tool is a novel
perturbation bound for the rank-k singular value decomposition approximations
between two matrices that depends on the relative spectral gap rather than the
absolute spectral gap, a result that may be of broader independent interest.
Comment: The 36th Annual Conference on Learning Theory (COLT 2023)
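To see why the relative gap is the meaningful quantity under gradual spectral decay, consider a synthetic matrix with polynomially decaying singular values. This numpy sketch is an illustration only and does not implement the paper's perturbation bound; the size, rank, and decay rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 5  # matrix size and truncation rank (assumed values)

# Random orthogonal factors with gradual (polynomial) spectral decay,
# mimicking typical high-dimensional data with no sharp cutoff.
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
sigma = 1.0 / np.arange(1, n + 1) ** 1.5
A = U @ np.diag(sigma) @ V.T

# Rank-k truncated SVD; by Eckart-Young its spectral-norm error is sigma_{k+1}.
Uf, s, Vft = np.linalg.svd(A)
A_k = Uf[:, :k] @ np.diag(s[:k]) @ Vft[:k, :]
err = np.linalg.norm(A - A_k, 2)

# Under gradual decay the absolute gap at index k shrinks with k, while the
# relative gap stays bounded away from zero.
abs_gap = s[k - 1] - s[k]
rel_gap = abs_gap / s[k - 1]
```

With this decay profile the absolute gap is already an order of magnitude smaller than the relative gap at k = 5, which is why bounds phrased in terms of the absolute gap degenerate while relative-gap bounds remain informative.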
Can neural networks extrapolate? Discussion of a theorem by Pedro Domingos
Neural networks trained on large datasets by minimizing a loss have become
the state-of-the-art approach for resolving data science problems, particularly
in computer vision, image processing and natural language processing. In spite
of their striking results, our theoretical understanding about how neural
networks operate is limited. In particular, what are the extrapolation
capabilities of trained neural networks? In this paper we discuss a theorem of
Domingos stating that "every machine learned by continuous gradient descent is
approximately a kernel machine". According to Domingos, this fact leads to
conclude that all machines trained on data are mere kernel machines. We first
extend Domingos' result to the discrete case and to networks with vector-valued
output. We then study its relevance and significance on simple examples. We
find that in simple cases, the "neural tangent kernel" arising in Domingos'
theorem does provide understanding of the networks' predictions. Furthermore,
when the task given to the network grows in complexity, the extrapolation
capability of the network can be effectively explained by Domingos' theorem,
and therefore is limited. We illustrate this fact on a classic perception
theory problem: recovering a shape from its boundary.
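The empirical neural tangent kernel appearing in Domingos' theorem can be computed directly for a toy one-hidden-layer network at initialization. This is a minimal sketch; the architecture, width, and inputs are assumptions for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 512  # hidden width; wider networks track the kernel regime more closely
W = rng.normal(size=m)               # input-to-hidden weights (scalar input)
a = rng.normal(size=m) / np.sqrt(m)  # hidden-to-output weights

def grads(x):
    """Parameter gradient of f(x) = a . tanh(W x) at initialization."""
    h = np.tanh(W * x)
    return np.concatenate([h, a * (1 - h ** 2) * x])  # [df/da, df/dW]

def ntk(x1, x2):
    """Empirical neural tangent kernel: inner product of parameter gradients."""
    return grads(x1) @ grads(x2)

xs = np.linspace(-1, 1, 9)
K = np.array([[ntk(u, v) for v in xs] for u in xs])
# K is a Gram matrix of gradients, hence symmetric positive semidefinite.
eigs = np.linalg.eigvalsh(K)
```

In the wide-network limit, gradient-descent training keeps this kernel approximately fixed, so the trained network's predictions behave like kernel regression with K, which is the sense in which the network is "approximately a kernel machine".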
Fourier neural operator for learning solutions to macroscopic traffic flow models: Application to the forward and inverse problems
Deep learning methods are emerging as popular computational tools for solving
forward and inverse problems in traffic flow. In this paper, we study a neural
operator framework for learning solutions to nonlinear hyperbolic partial
differential equations with applications in macroscopic traffic flow models. In
this framework, an operator is trained to map heterogeneous and sparse traffic
input data to the complete macroscopic traffic state in a supervised learning
setting. We chose a physics-informed Fourier neural operator (π-FNO) as the
operator, where an additional physics loss based on a discrete conservation law
regularizes the problem during training to improve the shock predictions. We
also propose to use training data generated from random piecewise-constant
input data to systematically capture the shock and rarefaction solutions. From
experiments using the LWR traffic flow model, we found superior accuracy in
predicting the density dynamics of a ring-road network and urban signalized
road. We also found that the operator can be trained using simple traffic
density dynamics, e.g., consisting of vehicle queues and traffic
signal cycles, and it can predict density dynamics for heterogeneous vehicle
queue distributions and multiple traffic signal cycles with an
acceptable error. The extrapolation error grew sub-linearly with input
complexity for a proper choice of the model architecture and training data.
Adding a physics regularizer aided in learning long-term traffic density
dynamics, especially for problems with periodic boundary data.
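The core building block of a Fourier neural operator is a spectral convolution: transform the input function to Fourier space, weight a few low-frequency modes, and transform back. A minimal numpy sketch follows; the grid size, mode count, and random weights standing in for learned parameters are all assumptions, not the trained operator of the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n_grid, modes = 64, 12  # grid points and retained Fourier modes (assumed)

# Random complex spectral weights standing in for learned parameters.
R = rng.normal(size=modes) + 1j * rng.normal(size=modes)

def spectral_conv(v):
    """One FNO spectral layer: FFT, weight the low modes, inverse FFT."""
    v_hat = np.fft.rfft(v)
    out_hat = np.zeros_like(v_hat)
    out_hat[:modes] = R * v_hat[:modes]  # learned multiplier on low modes
    return np.fft.irfft(out_hat, n=n_grid)

# Toy density profile on a ring road (periodic domain): a smooth bump.
x = np.linspace(0, 2 * np.pi, n_grid, endpoint=False)
rho = 0.5 + 0.3 * np.sin(x)
out = spectral_conv(rho)
```

Truncating to low modes is what makes the layer resolution-independent: the same weights R act on the leading Fourier coefficients regardless of the discretization of the input.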
Advances in Probabilistic Meta-Learning and the Neural Process Family
A natural progression in machine learning research is to automate and learn from data increasingly many components of our learning agents. Meta-learning is a paradigm that fully embraces this perspective, and can be intuitively described as embodying the idea of learning to learn. A goal of meta-learning research is the development of models to assist users in navigating the intricate space of design choices associated with specifying machine learning solutions. This space is particularly formidable when considering deep learning approaches, which involve myriad design choices interacting in complex fashions to affect the performance of the resulting agents. Despite the impressive successes of deep learning in recent years, this challenge remains a significant bottleneck in deploying neural network based solutions in several important application domains. But how can we reason about and design solutions to this daunting task?
This thesis is concerned with a particular perspective for meta-learning in supervised settings. We view supervised learning algorithms as mappings that take data sets to predictive models, and consider meta-learning as learning to approximate functions of this form. In particular, we are interested in meta-learners that (i) employ neural networks to approximate these functions in an end-to-end manner, and (ii) provide predictive distributions rather than single predictors. The former is motivated by the success of neural networks as function approximators, and the latter by our interest in the few-shot learning scenario. The introductory chapters of this thesis formalise this notion, and use it to provide a tutorial introducing the Neural Process Family (NPF), a class of models introduced by Garnelo et al. (2018) satisfying the above-mentioned modelling desiderata. We then present our own technical contributions to the NPF.
First, we focus on fundamental properties of the model-class, such as expressivity and limiting behaviours of the associated training procedures. Next, we study the role of translation equivariance in the NPF. Considering the intimate relationship between the NPF and the representation of functions operating on sets, we extend the underlying theory of DeepSets to include translation equivariance. We then develop novel members of the NPF endowed with this important inductive bias. Through extensive empirical evaluation, we demonstrate that, in many settings, they significantly outperform their non-equivariant counterparts.
Finally, we turn our attention to the development of Neural Processes for few-shot image-classification. We introduce models that navigate the important tradeoffs associated with this setting, and describe the specification of their central components. We demonstrate that the resulting models---CNAPs---achieve state-of-the-art performance on a challenging benchmark called Meta-Dataset, while adapting faster and with less computational overhead than their best-performing competitors.
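The sum-decomposition underlying DeepSets, which the thesis extends with translation equivariance, can be checked in a few lines of numpy. This is a toy model with random weights; the sizes and the two-layer form are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical small DeepSets model: f(X) = rho(sum_i phi(x_i)).
W1 = rng.normal(size=(3, 8))  # phi: per-element encoder weights
W2 = rng.normal(size=(8, 1))  # rho: decoder on the pooled representation

def deepset(X):
    phi = np.tanh(X @ W1)     # encode each set element independently
    pooled = phi.sum(axis=0)  # sum pooling gives permutation invariance
    return float(np.tanh(pooled) @ W2)

X = rng.normal(size=(5, 3))   # a set of 5 elements with 3 features each
perm = rng.permutation(5)
# The output is unchanged (up to float rounding) under any reordering
# of the set elements, because sum pooling is order-independent.
same = np.isclose(deepset(X), deepset(X[perm]))
```

Sum pooling is what makes the mapping a function of the set rather than the sequence; the thesis's equivariant extensions add translation structure on top of exactly this kind of decomposition.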