A Unified Framework for Multiscale Modeling using the Mori-Zwanzig Formalism and the Variational Multiscale Method
We describe a paradigm for multiscale modeling that combines the Mori-Zwanzig
(MZ) formalism of Statistical Mechanics with the Variational Multiscale (VMS)
method. The MZ-VMS approach leverages both VMS scale-separation projectors as
well as phase-space projectors to provide a systematic modeling approach that
is applicable to non-linear partial differential equations. Spectral as well as
continuous and discontinuous finite element methods are considered. The
framework leads to a formally closed equation in which the effect of the
unresolved scales on the resolved scales is non-local in time and appears as a
convolution or memory integral. The resulting non-Markovian system is used as a
starting point for model development. We discover that unresolved scales lead
to memory effects that are driven by an orthogonal projection of the
coarse-scale residual and inter-element jumps. It is further shown that an
MZ-based finite memory model is a variant of the well-known
adjoint-stabilization method. For hyperbolic equations, this stabilization is
shown to have the form of an artificial viscosity term. We further establish
connections between the memory kernel and approximate Riemann solvers. It is
demonstrated that, in the case of one-dimensional linear advection, the
assumption of a finite memory and a linear quadrature leads to a closure term
that is formally equivalent to an upwind flux correction.
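To make the closure structure concrete, the following is a schematic of the MZ-type evolution equation for the resolved scales; the projector P, kernel K, and residual R below are placeholder notation, not the paper's exact operators.

```latex
% Schematic MZ-VMS equation for the resolved (coarse-scale) solution \tilde{u} (assumed notation):
% P = VMS coarse-scale projector, Q = I - P, R(\tilde{u}) = coarse-scale residual, K = memory kernel.
\frac{\partial \tilde{u}}{\partial t}
  \;=\; P\,\mathcal{N}(\tilde{u})
  \;+\; \underbrace{\int_{0}^{t} K\bigl(t-s,\; Q\,R(\tilde{u}(s))\bigr)\, ds}_{\text{non-Markovian memory closure}} .
```

Truncating the integral to a finite memory window of length τ is what yields the adjoint-stabilization-like model and, for hyperbolic problems, the artificial-viscosity interpretation described above.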
Online Learning: Sufficient Statistics and the Burkholder Method
We uncover a fairly general principle in online learning: If regret can be
(approximately) expressed as a function of certain "sufficient statistics" for
the data sequence, then there exists a special Burkholder function that 1) can
be used algorithmically to achieve the regret bound and 2) only depends on
these sufficient statistics, not the entire data sequence, so that the online
strategy is only required to keep the sufficient statistics in memory. This
characterization is achieved by bringing the full power of the Burkholder
Method --- originally developed for certifying probabilistic martingale
inequalities --- to bear on the online learning setting.
To demonstrate the scope and effectiveness of the Burkholder method, we
develop a novel online strategy for matrix prediction that attains a regret
bound corresponding to the variance term in matrix concentration inequalities.
We also present a linear-time/space prediction strategy for parameter-free
supervised learning with linear classes and general smooth norms.
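As a rough sketch of the mechanism (with assumed notation not taken from the abstract): suppose the regret bound can be written through a sufficient statistic s_t that is updated recursively as s_t = Φ(s_{t-1}, z_t). The Burkholder method then asks for a function U with three properties, and the online strategy simply plays so that U never increases in expectation.

```latex
% Schematic Burkholder-function conditions (assumed notation):
\begin{align*}
&\text{(i) Domination:}             && U(s) \;\ge\; \text{(regret gap expressed through } s\text{)}, \\
&\text{(ii) Initial value:}         && U(s_0) \;\le\; 0, \\
&\text{(iii) Restricted concavity:} && \mathbb{E}_{\epsilon}\, U\bigl(\Phi(s, \epsilon d)\bigr) \;\le\; U(s)
   \quad \text{for admissible directions } d \text{ and Rademacher } \epsilon .
\end{align*}
```

Chaining (iii) from round to round and using (ii) gives U(s_T) ≤ 0, so (i) certifies the regret bound while the learner stores only the sufficient statistic.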
Inference in Sparse Graphs with Pairwise Measurements and Side Information
We consider the statistical problem of recovering a hidden "ground truth"
binary labeling for the vertices of a graph up to low Hamming error from noisy
edge and vertex measurements. We present new algorithms and a sharp
finite-sample analysis for this problem on trees and sparse graphs with poor
expansion properties such as hypergrids and ring lattices. Our method
generalizes and improves over that of Globerson et al. (2015), who introduced
the problem for two-dimensional grid lattices.
For trees we provide a simple, efficient algorithm that infers the ground
truth with optimal Hamming error, has optimal sample complexity, and implies
recovery results for all connected graphs. Here, the presence of side
information is critical to obtain a non-trivial recovery rate. We then show how
to adapt this algorithm to tree decompositions of edge-subgraphs of certain
graph families such as lattices, resulting in optimal recovery error rates that
can be obtained efficiently.
The thrust of our analysis is to 1) use the tree decomposition along with
edge measurements to produce a small class of viable vertex labelings and 2)
apply an analysis influenced by statistical learning theory to show that we can
infer the ground truth from this class using vertex measurements. We show the
power of our method in several examples including hypergrids, ring lattices,
and the Newman-Watts model for small world graphs. For two-dimensional grids,
our results improve over Globerson et al. (2015) by obtaining optimal recovery
in the constant-height regime.
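A toy version of the two-step strategy on a tree can be sketched directly: propagate the noisy edge measurements from a root to obtain a labeling determined up to a global sign flip (the small class of candidate labelings), then use the noisy vertex measurements (the side information) to pick the sign by majority vote. The function and variable names below are illustrative assumptions, and this simplification is not the paper's algorithm or its analysis.

```python
import networkx as nx  # assumed dependency for the tree structure

def infer_tree_labels(tree: nx.Graph, edge_obs: dict, vertex_obs: dict, root=None):
    """Toy inference of +/-1 vertex labels on a tree.

    edge_obs[(u, v)] : noisy observation of label(u) * label(v) (+1 = "same", -1 = "differ")
    vertex_obs[u]    : noisy observation of label(u) itself (the side information)
    """
    root = root if root is not None else next(iter(tree.nodes))
    # Step 1: edge measurements pin labels down relative to the root, leaving
    # exactly two candidate labelings (global sign +1 or -1).
    relative = {root: 1}
    for u, v in nx.bfs_edges(tree, root):
        obs = edge_obs.get((u, v), edge_obs.get((v, u), 1))
        relative[v] = relative[u] * obs
    # Step 2: vertex side information breaks the global sign ambiguity by majority vote.
    agreement = sum(relative[u] * vertex_obs[u] for u in tree.nodes)
    sign = 1 if agreement >= 0 else -1
    return {u: sign * relative[u] for u in tree.nodes}

if __name__ == "__main__":
    # Tiny usage example on a path graph with noiseless measurements.
    t = nx.path_graph(4)
    truth = {0: 1, 1: 1, 2: -1, 3: -1}
    edges = {(u, v): truth[u] * truth[v] for u, v in t.edges}
    verts = {u: truth[u] for u in t.nodes}
    print(infer_tree_labels(t, edges, verts))  # recovers `truth`
```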
ZigZag: A new approach to adaptive online learning
We develop a novel family of algorithms for the online learning setting with
regret against any data sequence bounded by the empirical Rademacher complexity
of that sequence. To develop a general theory of when this type of adaptive
regret bound is achievable, we establish a connection to the theory of
decoupling inequalities for martingales in Banach spaces. When the hypothesis
class is a set of linear functions bounded in some norm, such a regret bound is
achievable if and only if the norm satisfies certain decoupling inequalities
for martingales. Donald Burkholder's celebrated geometric characterization of
decoupling inequalities (1984) states that such an inequality holds if and only
if there exists a special function called a Burkholder function satisfying
certain restricted concavity properties. Our online learning algorithms are
efficient in terms of queries to this function.
We realize our general theory by giving novel efficient algorithms for
classes including lp norms, Schatten p-norms, group norms, and reproducing
kernel Hilbert spaces. The empirical Rademacher complexity regret bound implies
--- when used in the i.i.d. setting --- a data-dependent complexity bound for
excess risk after online-to-batch conversion. To showcase the power of the
empirical Rademacher complexity regret bound, we derive improved rates for a
supervised learning generalization of the online learning with low rank experts
task and for the online matrix prediction task.
In addition to obtaining tight data-dependent regret bounds, our algorithms
enjoy improved efficiency over previous techniques based on Rademacher
complexity, automatically work in the infinite horizon setting, and are
scale-free. To obtain such adaptive methods, we introduce novel machinery, and
the resulting algorithms are not based on the standard tools of online convex
optimization.
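For concreteness, the adaptive guarantee being targeted can be written schematically as follows (linear class, assumed notation); the dual-norm structure and constants are exactly what the decoupling/Burkholder machinery controls.

```latex
% Schematic empirical-Rademacher-complexity regret bound (assumed notation):
\mathrm{Reg}_T
  \;=\; \sum_{t=1}^{T} \langle f_t, x_t\rangle \;-\; \inf_{f\in\mathcal{F}} \sum_{t=1}^{T} \langle f, x_t\rangle
  \;\lesssim\;
  \widehat{\mathfrak{R}}_T(\mathcal{F}; x_{1:T})
  \;=\; \mathbb{E}_{\epsilon} \sup_{f\in\mathcal{F}} \sum_{t=1}^{T} \epsilon_t \langle f, x_t\rangle .
```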
Uniform Convergence of Gradients for Non-Convex Learning and Optimization
We investigate 1) the rate at which refined properties of the empirical
risk---in particular, gradients---converge to their population counterparts in
standard non-convex learning tasks, and 2) the consequences of this convergence
for optimization. Our analysis follows the tradition of norm-based capacity
control. We propose vector-valued Rademacher complexities as a simple,
composable, and user-friendly tool to derive dimension-free uniform convergence
bounds for gradients in non-convex learning problems. As an application of our
techniques, we give a new analysis of batch gradient descent methods for
non-convex generalized linear models and non-convex robust regression, showing
how to use any algorithm that finds approximate stationary points to obtain
optimal sample complexity, even when dimension is high or possibly infinite and
multiple passes over the dataset are allowed.
Moving to non-smooth models, we show that, in contrast to the smooth case,
even for a single ReLU it is not possible to obtain dimension-independent
convergence rates for gradients in the worst case. On the positive side, it is
still possible to obtain dimension-independent rates under a new type of
distributional assumption.
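The central quantity can be sketched as follows (assumed notation): a vector-valued Rademacher complexity measures the size of the gradient class, and the uniform convergence statement bounds the worst-case gap between empirical and population gradients in a dual norm.

```latex
% Schematic definitions (assumed notation): \ell(w; z) is the per-example loss, n the sample size.
\widehat{\mathfrak{R}}_n\bigl(\nabla\ell \circ \mathcal{W}\bigr)
  \;=\; \mathbb{E}_{\epsilon} \sup_{w\in\mathcal{W}}
        \Bigl\| \tfrac{1}{n}\sum_{i=1}^{n} \epsilon_i\, \nabla_w \ell(w; z_i) \Bigr\|_{*},
\qquad
\sup_{w\in\mathcal{W}} \bigl\| \nabla \widehat{L}_n(w) - \nabla L(w) \bigr\|_{*}
  \;\lesssim\; \widehat{\mathfrak{R}}_n\bigl(\nabla\ell \circ \mathcal{W}\bigr) \;+\; \text{lower-order terms}.
```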
Adaptive Online Learning
We propose a general framework for studying adaptive regret bounds in the
online learning framework, including model selection bounds and data-dependent
bounds. Given a data- or model-dependent bound we ask, "Does there exist some
algorithm achieving this bound?" We show that modifications to recently
introduced sequential complexity measures can be used to answer this question
by providing sufficient conditions under which adaptive rates can be achieved.
In particular each adaptive rate induces a set of so-called offset complexity
measures, and obtaining small upper bounds on these quantities is sufficient to
demonstrate achievability. A cornerstone of our analysis technique is the use
of one-sided tail inequalities to bound suprema of offset random processes.
Our framework recovers and improves a wide variety of adaptive bounds
including quantile bounds, second-order data-dependent bounds, and small loss
bounds. In addition we derive a new type of adaptive bound for online linear
optimization based on the spectral norm, as well as a new online PAC-Bayes
theorem that holds for countably infinite sets.
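Schematically, achievability of a candidate adaptive rate B_T reduces to controlling an offset process (assumed notation, suppressing the tree/martingale structure); one-sided tail inequalities are what bound the supremum of this offset process.

```latex
% Schematic offset-complexity sufficient condition (assumed notation):
\mathbb{E}_{\epsilon}\, \sup_{f\in\mathcal{F}}
  \Bigl[ \sum_{t=1}^{T} \epsilon_t\, f(x_t) \;-\; B_T(f;\, x_{1:T}) \Bigr] \;\le\; 0
\quad\Longrightarrow\quad
B_T \text{ is achievable as an adaptive regret bound, up to constants.}
```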
The Adjoint Petrov-Galerkin Method for Non-Linear Model Reduction
We formulate a new projection-based reduced-order modeling technique for
non-linear dynamical systems. The proposed technique, which we refer to as the
Adjoint Petrov-Galerkin (APG) method, is derived by decomposing the generalized
coordinates of a dynamical system into a resolved coarse-scale set and an
unresolved fine-scale set. A Markovian finite memory assumption within the
Mori-Zwanzig formalism is then used to develop a reduced-order representation
of the coarse-scales. This procedure leads to a closed reduced-order model that
displays commonalities with the adjoint stabilization method used in finite
elements. The formulation is shown to be equivalent to a Petrov-Galerkin method
with a non-linear, time-varying test basis, thus sharing some similarities with
the least-squares Petrov-Galerkin method. Theoretical analysis examining a
priori error bounds and computational cost is presented. Numerical experiments
on the compressible Navier-Stokes equations demonstrate that the proposed
method can lead to improvements in numerical accuracy, robustness, and
computational efficiency over the Galerkin method on problems of practical
interest. Improvements in numerical accuracy and computational efficiency over
the least-squares Petrov-Galerkin method are observed in most cases.
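The construction can be sketched as follows (assumed notation; signs, scalings, and the exact closure are in the paper): split the state with a trial basis V into coarse and fine parts, evolve the coarse part with a Galerkin term, and add a finite-memory MZ closure built from the Jacobian and the residual projected onto the unresolved scales.

```latex
% Schematic APG-style reduced-order model (assumed notation, not the paper's exact statement):
x \;\approx\; V\tilde{x} + x', \qquad \Pi = V V^{\top}, \quad \Pi' = I - \Pi,
\qquad
\frac{d\tilde{x}}{dt}
  \;=\; \underbrace{V^{\top} f(V\tilde{x})}_{\text{Galerkin (coarse scales)}}
  \;+\; \underbrace{\tau\, V^{\top}\Bigl(\tfrac{\partial f}{\partial x}\Bigr)^{\!\top} \Pi'\, f(V\tilde{x})}_{\text{finite-memory MZ closure, memory length } \tau} .
```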
Parameter-free online learning via model selection
We introduce an efficient algorithmic framework for model selection in online
learning, also known as parameter-free online learning. Departing from previous
work, which has focused on highly structured function classes such as nested
balls in Hilbert space, we propose a generic meta-algorithm framework that
achieves online model selection oracle inequalities under minimal structural
assumptions. We give the first computationally efficient parameter-free
algorithms that work in arbitrary Banach spaces under mild smoothness
assumptions; previous results applied only to Hilbert spaces. We further derive
new oracle inequalities for matrix classes, non-nested convex sets, and R^d
with generic regularizers. Finally, we generalize these
results by providing oracle inequalities for arbitrary non-linear classes in
the online supervised learning model. These results are all derived through a
unified meta-algorithm scheme using a novel "multi-scale" algorithm for
prediction with expert advice based on random playout, which may be of
independent interest.
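The meta-algorithm flavor can be illustrated with a heavily simplified sketch: run one sub-algorithm per scale (say, comparator-norm bound 2^k) and aggregate their predictions with an experts-style update whose prior penalizes larger scales. The class names and the plain exponential-weights aggregator below are assumptions made for illustration; the paper's multi-scale, random-playout algorithm is more involved.

```python
import numpy as np

class ScaledSubAlgorithm:
    """Placeholder sub-algorithm: online gradient descent on a ball of radius `scale`."""
    def __init__(self, dim, scale, lr=0.1):
        self.w, self.scale, self.lr = np.zeros(dim), scale, lr

    def predict(self, x):
        return float(self.w @ x)

    def update(self, x, grad):
        self.w -= self.lr * grad * x
        norm = np.linalg.norm(self.w)
        if norm > self.scale:            # project back onto the norm ball
            self.w *= self.scale / norm

class MultiScaleAggregator:
    """Toy meta-algorithm: exponential weights over sub-algorithms at scales 2^0, ..., 2^(K-1)."""
    def __init__(self, dim, num_scales=5, eta=0.5):
        self.subs = [ScaledSubAlgorithm(dim, 2.0 ** k) for k in range(num_scales)]
        self.logw = np.array([-np.log(k + 2.0) for k in range(num_scales)])  # assumed prior
        self.eta = eta

    def predict(self, x):
        self._preds = np.array([s.predict(x) for s in self.subs])
        p = np.exp(self.logw - self.logw.max())
        return float((p / p.sum()) @ self._preds)

    def update(self, x, y):
        losses = 0.5 * (self._preds - y) ** 2        # squared loss for illustration
        self.logw -= self.eta * losses               # experts-style multiplicative update
        for s, pred in zip(self.subs, self._preds):
            s.update(x, grad=(pred - y))

if __name__ == "__main__":
    # Usage sketch: online regression against a comparator of unknown norm.
    rng = np.random.default_rng(0)
    agg, w_star = MultiScaleAggregator(dim=3), np.array([4.0, -2.0, 1.0])
    for _ in range(200):
        x = rng.normal(size=3)
        agg.predict(x)
        agg.update(x, float(w_star @ x))
```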
Private Causal Inference
Causal inference deals with identifying which random variables "cause" or
control other random variables. Recent advances in causal
inference based on tools from statistical estimation and machine learning have
resulted in practical algorithms for causal inference. Causal inference has the
potential to have a significant impact on medical research, the prevention and
control of diseases, and the identification of factors that drive economic
change, to name just a few areas. However, these promising applications for causal inference are
kept private (e.g., medical records, personal finances, etc). Therefore, there
is a need for the development of causal inference methods that preserve data
privacy. We study the problem of inferring causality using the current, popular
causal inference framework, the additive noise model (ANM), while simultaneously
ensuring privacy of the users. Our framework provides differential privacy
guarantees for a variety of ANM variants. We run extensive experiments, and
demonstrate that our techniques are practical and easy to implement.
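A minimal sketch of the overall idea, under strong simplifying assumptions: score each causal direction by fitting an additive noise model and checking how much the residuals depend on the putative cause, then privatize the comparison by adding Laplace noise to the score gap. The polynomial fit, the crude heteroscedasticity-based dependence score, and the assumed sensitivity bound below are illustrative stand-ins, not the paper's mechanism or its privacy analysis.

```python
import numpy as np

def residual_dependence(cause, effect, degree=5):
    """Crude ANM-style score: fit effect ~ poly(cause), then measure how strongly the squared
    residuals vary with the cause. A real ANM test would pair a flexible regressor with an
    independence test such as HSIC."""
    coeffs = np.polyfit(cause, effect, degree)
    resid = effect - np.polyval(coeffs, cause)
    return abs(np.corrcoef(resid ** 2, cause ** 2)[0, 1])

def private_anm_direction(x, y, epsilon=1.0, sensitivity=0.1, rng=None):
    """Return 'X->Y' or 'Y->X' from a noisy score comparison (Laplace mechanism).
    `sensitivity` is an assumed bound on how much one record can move the score gap."""
    rng = rng or np.random.default_rng()
    gap = residual_dependence(y, x) - residual_dependence(x, y)   # > 0 favors X -> Y
    noisy_gap = gap + rng.laplace(scale=sensitivity / epsilon)
    return "X->Y" if noisy_gap > 0 else "Y->X"

if __name__ == "__main__":
    # Usage sketch on synthetic data where X causes Y through an additive noise model.
    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    y = np.tanh(x) + 0.2 * rng.normal(size=2000)
    print(private_anm_direction(x, y, epsilon=1.0, rng=rng))
```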
Learning in Games: Robustness of Fast Convergence
We show that learning algorithms satisfying a low approximate regret property
experience fast convergence to approximate optimality in a large class of
repeated games. Our property, which simply requires that each learner has small
regret compared to a (1+ε)-multiplicative approximation to the best action in
hindsight, is ubiquitous among learning algorithms; it is satisfied even by the
vanilla Hedge forecaster. Our results
improve upon recent work of Syrgkanis et al. [SALS15] in a number of ways. We
require only that players observe payoffs under other players' realized
actions, as opposed to expected payoffs. We further show that convergence
occurs with high probability, and show convergence under bandit feedback.
Finally, we improve upon the speed of convergence by a factor equal to the
number of players. Both the scope of settings and the class of algorithms for
which our analysis provides fast convergence are considerably broader than in
previous work.
Our framework applies to dynamic population games via a low approximate
regret property for shifting experts. Here we strengthen the results of
Lykouris et al. [LST16] in two ways: We allow players to select learning
algorithms from a larger class, which includes a minor variant of the basic
Hedge algorithm, and we increase the maximum churn in players for which
approximate optimality is achieved.
In the bandit setting we present a new algorithm which provides a "small
loss"-type bound with improved dependence on the number of actions in utility
settings, and is both simple and efficient. This result may be of independent
interest.
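The property in question can be written schematically as follows (assumed notation and constants): with loss vectors ℓ_t, learner distributions p_t over N actions, and approximation level ε, low approximate regret means competing with a (1+ε)-multiplicative relaxation of the best fixed action while paying only a small additive term.

```latex
% Schematic low-approximate-regret property (assumed notation and constants):
\sum_{t=1}^{T} \langle p_t, \ell_t \rangle
  \;\le\; (1+\epsilon) \min_{i \in [N]} \sum_{t=1}^{T} \ell_t(i)
  \;+\; O\!\Bigl(\tfrac{\log N}{\epsilon}\Bigr),
\qquad
\text{e.g., vanilla Hedge with learning rate } \eta \asymp \epsilon \text{ satisfies a bound of this form.}
```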