    A Unified Framework for Multiscale Modeling using the Mori-Zwanzig Formalism and the Variational Multiscale Method

    We describe a paradigm for multiscale modeling that combines the Mori-Zwanzig (MZ) formalism of statistical mechanics with the Variational Multiscale (VMS) method. The MZ-VMS approach leverages both VMS scale-separation projectors and phase-space projectors to provide a systematic modeling approach that is applicable to non-linear partial differential equations. Spectral as well as continuous and discontinuous finite element methods are considered. The framework leads to a formally closed equation in which the effect of the unresolved scales on the resolved scales is non-local in time and appears as a convolution or memory integral. The resulting non-Markovian system is used as a starting point for model development. We discover that unresolved scales lead to memory effects that are driven by an orthogonal projection of the coarse-scale residual and inter-element jumps. It is further shown that an MZ-based finite memory model is a variant of the well-known adjoint-stabilization method. For hyperbolic equations, this stabilization is shown to have the form of an artificial viscosity term. We further establish connections between the memory kernel and approximate Riemann solvers. It is demonstrated that, in the case of one-dimensional linear advection, the assumption of a finite memory and a linear quadrature leads to a closure term that is formally equivalent to an upwind flux correction. Comment: 28 page
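
    For orientation, the notion of an "upwind flux correction" for one-dimensional linear advection can be written out explicitly. The block below is the standard textbook identity, not the paper's derivation; the symbols $a$ (advection speed), $u^{-}$ and $u^{+}$ (left and right interface states), and $\hat{f}$ (numerical flux) are assumed notation that does not appear in the abstract.

```latex
% Linear advection: u_t + a u_x = 0, with interface states u^- (left) and u^+ (right).
% The upwind flux equals the central flux plus a dissipative correction:
\hat{f}_{\mathrm{upwind}}(u^{-}, u^{+})
  = \underbrace{\frac{a}{2}\left(u^{-} + u^{+}\right)}_{\text{central flux}}
  \;-\; \underbrace{\frac{|a|}{2}\left(u^{+} - u^{-}\right)}_{\text{upwind correction}}
% Check: for a > 0 this reduces to a u^-, and for a < 0 to a u^+.
```

    The second term is proportional to the interface jump, which is consistent with the abstract's statement that the memory effects are driven by inter-element jumps and that the stabilization acts like an artificial viscosity.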

    Online Learning: Sufficient Statistics and the Burkholder Method

    We uncover a fairly general principle in online learning: If regret can be (approximately) expressed as a function of certain "sufficient statistics" for the data sequence, then there exists a special Burkholder function that 1) can be used algorithmically to achieve the regret bound and 2) only depends on these sufficient statistics, not the entire data sequence, so that the online strategy is only required to keep the sufficient statistics in memory. This characterization is achieved by bringing the full power of the Burkholder method (originally developed for certifying probabilistic martingale inequalities) to bear on the online learning setting. To demonstrate the scope and effectiveness of the Burkholder method, we develop a novel online strategy for matrix prediction that attains a regret bound corresponding to the variance term in matrix concentration inequalities. We also present a linear-time/space prediction strategy for parameter-free supervised learning with linear classes and general smooth norms.

    Inference in Sparse Graphs with Pairwise Measurements and Side Information

    We consider the statistical problem of recovering a hidden "ground truth" binary labeling for the vertices of a graph up to low Hamming error from noisy edge and vertex measurements. We present new algorithms and a sharp finite-sample analysis for this problem on trees and sparse graphs with poor expansion properties such as hypergrids and ring lattices. Our method generalizes and improves over that of Globerson et al. (2015), who introduced the problem for two-dimensional grid lattices. For trees we provide a simple, efficient algorithm that infers the ground truth with optimal Hamming error, has optimal sample complexity, and implies recovery results for all connected graphs. Here, the presence of side information is critical to obtain a non-trivial recovery rate. We then show how to adapt this algorithm to tree decompositions of edge-subgraphs of certain graph families such as lattices, resulting in optimal recovery error rates that can be obtained efficiently. The thrust of our analysis is to 1) use the tree decomposition along with edge measurements to produce a small class of viable vertex labelings and 2) apply an analysis influenced by statistical learning theory to show that we can infer the ground truth from this class using vertex measurements. We show the power of our method in several examples including hypergrids, ring lattices, and the Newman-Watts model for small world graphs. For two-dimensional grids, our results improve over Globerson et al. (2015) by obtaining optimal recovery in the constant-height regime. Comment: AISTATS 201

    Uniform Convergence of Gradients for Non-Convex Learning and Optimization

    We investigate 1) the rate at which refined properties of the empirical risk, in particular gradients, converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization. Our analysis follows the tradition of norm-based capacity control. We propose vector-valued Rademacher complexities as a simple, composable, and user-friendly tool to derive dimension-free uniform convergence bounds for gradients in non-convex learning problems. As an application of our techniques, we give a new analysis of batch gradient descent methods for non-convex generalized linear models and non-convex robust regression, showing how to use any algorithm that finds approximate stationary points to obtain optimal sample complexity, even when dimension is high or possibly infinite and multiple passes over the dataset are allowed. Moving to non-smooth models we show, in contrast to the smooth case, that even for a single ReLU it is not possible to obtain dimension-independent convergence rates for gradients in the worst case. On the positive side, it is still possible to obtain dimension-independent rates under a new type of distributional assumption. Comment: To appear in Neural Information Processing Systems (NIPS) 201

    ZigZag: A new approach to adaptive online learning

    We develop a novel family of algorithms for the online learning setting with regret against any data sequence bounded by the empirical Rademacher complexity of that sequence. To develop a general theory of when this type of adaptive regret bound is achievable we establish a connection to the theory of decoupling inequalities for martingales in Banach spaces. When the hypothesis class is a set of linear functions bounded in some norm, such a regret bound is achievable if and only if the norm satisfies certain decoupling inequalities for martingales. Donald Burkholder's celebrated geometric characterization of decoupling inequalities (1984) states that such an inequality holds if and only if there exists a special function called a Burkholder function satisfying certain restricted concavity properties. Our online learning algorithms are efficient in terms of queries to this function. We realize our general theory by giving novel efficient algorithms for classes including $\ell_p$ norms, Schatten p-norms, group norms, and reproducing kernel Hilbert spaces. The empirical Rademacher complexity regret bound implies, when used in the i.i.d. setting, a data-dependent complexity bound for excess risk after online-to-batch conversion. To showcase the power of the empirical Rademacher complexity regret bound, we derive improved rates for a supervised learning generalization of the online learning with low rank experts task and for the online matrix prediction task. In addition to obtaining tight data-dependent regret bounds, our algorithms enjoy improved efficiency over previous techniques based on Rademacher complexity, automatically work in the infinite horizon setting, and are scale-free. To obtain such adaptive methods, we introduce novel machinery, and the resulting algorithms are not based on the standard tools of online convex optimization. Comment: 49 page

    Adaptive Online Learning

    We propose a general framework for studying adaptive regret bounds in the online learning setting, including model selection bounds and data-dependent bounds. Given a data- or model-dependent bound we ask, "Does there exist some algorithm achieving this bound?" We show that modifications to recently introduced sequential complexity measures can be used to answer this question by providing sufficient conditions under which adaptive rates can be achieved. In particular, each adaptive rate induces a set of so-called offset complexity measures, and obtaining small upper bounds on these quantities is sufficient to demonstrate achievability. A cornerstone of our analysis technique is the use of one-sided tail inequalities to bound suprema of offset random processes. Our framework recovers and improves a wide variety of adaptive bounds including quantile bounds, second-order data-dependent bounds, and small loss bounds. In addition, we derive a new type of adaptive bound for online linear optimization based on the spectral norm, as well as a new online PAC-Bayes theorem that holds for countably infinite sets.

    The Adjoint Petrov-Galerkin Method for Non-Linear Model Reduction

    We formulate a new projection-based reduced-order modeling technique for non-linear dynamical systems. The proposed technique, which we refer to as the Adjoint Petrov-Galerkin (APG) method, is derived by decomposing the generalized coordinates of a dynamical system into a resolved coarse-scale set and an unresolved fine-scale set. A Markovian finite memory assumption within the Mori-Zwanzig formalism is then used to develop a reduced-order representation of the coarse scales. This procedure leads to a closed reduced-order model that displays commonalities with the adjoint stabilization method used in finite elements. The formulation is shown to be equivalent to a Petrov-Galerkin method with a non-linear, time-varying test basis, thus sharing some similarities with the least-squares Petrov-Galerkin method. Theoretical analysis examining a priori error bounds and computational cost is presented. Numerical experiments on the compressible Navier-Stokes equations demonstrate that the proposed method can lead to improvements in numerical accuracy, robustness, and computational efficiency over the Galerkin method on problems of practical interest. Improvements in numerical accuracy and computational efficiency over the least-squares Petrov-Galerkin method are observed in most cases. Comment: preprint, 50 page

    Parameter-free online learning via model selection

    We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and $\mathbb{R}^{d}$ with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel "multi-scale" algorithm for prediction with expert advice based on random playout, which may be of independent interest. Comment: NIPS 201

    Private Causal Inference

    Causal inference deals with identifying which random variables "cause" or control other random variables. Recent advances on the topic of causal inference based on tools from statistical estimation and machine learning have resulted in practical algorithms for causal inference. Causal inference has the potential to have significant impact on medical research, prevention and control of diseases, and identifying factors that impact economic changes, to name just a few. However, these promising applications for causal inference are often ones that involve sensitive or personal data of users that need to be kept private (e.g., medical records, personal finances, etc.). Therefore, there is a need for the development of causal inference methods that preserve data privacy. We study the problem of inferring causality using the current, popular causal inference framework, the additive noise model (ANM), while simultaneously ensuring privacy of the users. Our framework provides differential privacy guarantees for a variety of ANM variants. We run extensive experiments, and demonstrate that our techniques are practical and easy to implement.

    Learning in Games: Robustness of Fast Convergence

    We show that learning algorithms satisfying a low approximate regret property experience fast convergence to approximate optimality in a large class of repeated games. Our property, which simply requires that each learner has small regret compared to a $(1+\epsilon)$-multiplicative approximation to the best action in hindsight, is ubiquitous among learning algorithms; it is satisfied even by the vanilla Hedge forecaster. Our results improve upon recent work of Syrgkanis et al. [SALS15] in a number of ways. We require only that players observe payoffs under other players' realized actions, as opposed to expected payoffs. We further show that convergence occurs with high probability, and show convergence under bandit feedback. Finally, we improve upon the speed of convergence by a factor of $n$, the number of players. Both the scope of settings and the class of algorithms for which our analysis provides fast convergence are considerably broader than in previous work. Our framework applies to dynamic population games via a low approximate regret property for shifting experts. Here we strengthen the results of Lykouris et al. [LST16] in two ways: We allow players to select learning algorithms from a larger class, which includes a minor variant of the basic Hedge algorithm, and we increase the maximum churn in players for which approximate optimality is achieved. In the bandit setting we present a new algorithm which provides a "small loss"-type bound with improved dependence on the number of actions in utility settings, and is both simple and efficient. This result may be of independent interest. Comment: 27 pages. NIPS 201
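
    The "vanilla Hedge forecaster" mentioned in the abstract is the classical exponential-weights algorithm. The following is a minimal sketch of it, assuming full-information feedback and per-action losses in [0, 1]; the function name `hedge`, its parameters, and the toy loss sequence are illustrative choices, not taken from the paper, and the paper's game-theoretic setting and bandit variants are not modeled here.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge (exponential weights) on a sequence of loss vectors.

    loss_rounds: list of rounds, each a list of per-action losses in [0, 1].
    eta: learning rate (> 0); an assumed, hand-picked parameter.
    Returns (learner's expected cumulative loss, best action's cumulative
    loss in hindsight, distribution played in the final round).
    """
    n_actions = len(loss_rounds[0])
    cum_losses = [0.0] * n_actions
    learner_loss = 0.0
    probs = [1.0 / n_actions] * n_actions
    for losses in loss_rounds:
        # Weights are proportional to exp(-eta * cumulative loss so far).
        # Subtracting the minimum keeps the exponentials numerically stable.
        shift = min(cum_losses)
        weights = [math.exp(-eta * (c - shift)) for c in cum_losses]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of sampling an action from probs this round.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Full-information update: every action's loss is observed.
        cum_losses = [c + l for c, l in zip(cum_losses, losses)]
    return learner_loss, min(cum_losses), probs


if __name__ == "__main__":
    # Toy sequence (hypothetical): action 0 never incurs loss.
    rounds = [[0.0, 1.0, 0.5] for _ in range(50)]
    alg, best, final_probs = hedge(rounds, eta=0.5)
    print("learner loss:", alg, "best action loss:", best)
    print("final distribution:", final_probs)
```

    Comparing the learner's cumulative loss with that of the best action in hindsight is the quantity that regret-type properties, including the low approximate regret property described above, are stated in terms of.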