
    A* Sampling

    The problem of drawing samples from a discrete distribution can be converted into a discrete optimization problem. In this work, we show how sampling from a continuous distribution can be converted into an optimization problem over continuous space. Central to the method is a stochastic process recently described in mathematical statistics that we call the Gumbel process. We present a new construction of the Gumbel process and A* sampling, a practical generic sampling algorithm that searches for the maximum of a Gumbel process using A* search. We analyze the correctness and convergence time of A* sampling and demonstrate empirically that it makes more efficient use of bound and likelihood evaluations than the most closely related adaptive rejection sampling-based algorithms.
    Comment: V2: reworded the last paragraph of Section 2 to clarify that the argmax is a sample from the normalized measure; fixed notation in Algorithm 1; fixed a typo in paragraph 2 of Section
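    The discrete case mentioned in the first sentence is the classic Gumbel-max trick: perturb each unnormalized log-probability with independent Gumbel noise and take the argmax. A minimal NumPy sketch of that discrete case (illustrative only; this is not the paper's A* algorithm for continuous spaces):

    ```python
    import numpy as np

    def gumbel_max_sample(log_weights, rng):
        # Add independent Gumbel(0, 1) noise to each unnormalized log-probability;
        # the argmax is an exact sample from the normalized distribution.
        gumbels = rng.gumbel(size=len(log_weights))
        return int(np.argmax(log_weights + gumbels))

    rng = np.random.default_rng(0)
    log_w = np.log([0.1, 0.3, 0.6])
    draws = [gumbel_max_sample(log_w, rng) for _ in range(100_000)]
    counts = np.bincount(draws, minlength=3)
    print(counts / counts.sum())  # empirical frequencies near [0.1, 0.3, 0.6]
    ```

    Sampling is thus reduced to optimization: each draw is one maximization over the perturbed scores, which is the viewpoint the paper extends to continuous measures via the Gumbel process.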

    A framework for time-dependent Ice Sheet Uncertainty Quantification, applied to three West Antarctic ice streams

    Ice sheet models are the main tool used to generate forecasts of ice sheet mass loss, a significant contributor to sea-level rise, so knowing the likelihood of such projections is of critical societal importance. However, to capture the complete range of possible projections of mass loss, ice sheet models need efficient methods to quantify the forecast uncertainty. Uncertainties originate from the model structure, from the climate and ocean forcing used to run the model, and from model calibration. Here we quantify the latter, applying an error propagation framework to a realistic setting in West Antarctica. As in many other ice-sheet modelling studies, we use a control method to calibrate grid-scale flow parameters (parameters describing the basal drag and ice stiffness) with remotely sensed observations. Yet our framework augments the control method with a Hessian-based Bayesian approach that estimates the posterior covariance of the inverted parameters. This enables us to quantify the impact of the calibration uncertainty on forecasts of sea-level rise contribution or volume above flotation (VAF), due to the choice of different regularisation strengths (prior strengths), sliding laws and velocity inputs. We find that by choosing different satellite ice velocity products our model leads to different estimates of VAF after 40 years. We use this difference in model output to quantify the variance that projections of VAF are expected to have after 40 years and identify prior strengths that can reproduce that variability. We demonstrate that if we use prior strengths suggested by L-curve analysis, as is typically done in ice-sheet calibration studies, our uncertainty quantification is not able to reproduce that same variability. The regularisation suggested by the L-curves is too strong, and thus propagating the observational error through to VAF uncertainties under this choice of prior leads to errors that are smaller than those suggested by our 2-member "sample" of observed velocity fields. Additionally, our experiments suggest that large amounts of data points may be redundant, with implications for the error propagation of VAF.
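    The Hessian-based error-propagation step can be sketched in a toy linear-Gaussian setting: build the Gauss-Newton Hessian of the data misfit plus the prior precision, invert it for the posterior covariance, then propagate that covariance to a scalar quantity of interest such as VAF via a linearized sensitivity. All dimensions and names below are illustrative assumptions, not the paper's actual ice-sheet model:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n_params, n_obs = 5, 20
    J = rng.normal(size=(n_obs, n_params))  # Jacobian of forward model w.r.t. parameters
    sigma = 0.1                             # observational error standard deviation
    alpha = 1.0                             # regularisation (prior) strength
    R = alpha * np.eye(n_params)            # prior precision matrix

    # Posterior covariance from the Gauss-Newton Hessian of the regularised misfit:
    H = J.T @ J / sigma**2 + R
    cov_post = np.linalg.inv(H)

    # Propagate to a scalar quantity of interest (e.g. VAF) with sensitivity vector g:
    g = rng.normal(size=n_params)
    var_qoi = g @ cov_post @ g
    print(var_qoi)  # forecast variance induced by calibration uncertainty
    ```

    In this toy setting, increasing `alpha` (a stronger prior, analogous to the L-curve choice discussed above) shrinks `cov_post` and hence the propagated forecast variance, which is the mechanism behind the under-estimation the study reports.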

    Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions

    Contrastive learning is a powerful framework for learning self-supervised representations that generalize well to downstream supervised tasks. We show that multiple existing contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel. We then prove that a simple representation obtained by combining this kernel with PCA provably minimizes the worst-case approximation error of linear predictors, under a straightforward assumption that positive pairs have similar labels. Our analysis is based on a decomposition of the target function in terms of the eigenfunctions of a positive-pair Markov chain, and a surprising equivalence between these eigenfunctions and the output of Kernel PCA. We give generalization bounds for downstream linear prediction using our Kernel PCA representation, and show empirically on a set of synthetic tasks that applying Kernel PCA to contrastive learning models can indeed approximately recover the Markov chain eigenfunctions, although the accuracy depends on the kernel parameterization as well as on the augmentation strength.
    Comment: Published at ICLR 202
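    The Kernel PCA step described above can be sketched as follows: treat inner products of learned representations as an estimate of the positive-pair kernel, center the kernel matrix, and take its top eigenvectors. This is a generic Kernel PCA sketch under assumed stand-in representations, not the paper's specific construction:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    reps = rng.normal(size=(50, 8))  # stand-in for learned representations f(x)
    K = reps @ reps.T                # kernel estimate K(x, x') = <f(x), f(x')>

    # Center the kernel matrix (the standard Kernel PCA step):
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    Kc = C @ K @ C

    # Top-k eigenvectors, scaled by sqrt(eigenvalue), give the Kernel PCA
    # representation; these approximate the Markov chain eigenfunctions.
    evals, evecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    k = 4
    top_evals = np.maximum(evals[::-1][:k], 0)  # guard tiny negative round-off
    top = evecs[:, ::-1][:, :k] * np.sqrt(top_evals)
    print(top.shape)  # (50, 4)
    ```

    Downstream linear prediction would then fit a linear model on `top`, which is the representation the paper's generalization bounds concern.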

    Dual Space Preconditioning for Gradient Descent

    The conditions of relative smoothness and relative strong convexity were recently introduced for the analysis of Bregman gradient methods for convex optimization. We introduce a generalized left-preconditioning method for gradient descent, and show that its convergence on an essentially smooth convex objective function can be guaranteed via an application of relative smoothness in the dual space. Our relative smoothness assumption is between the designed preconditioner and the convex conjugate of the objective, and it generalizes the typical Lipschitz gradient assumption. Under dual relative strong convexity, we obtain linear convergence with a generalized condition number that is invariant under horizontal translations, distinguishing it from Bregman gradient methods. Thus, in principle our method is capable of improving the conditioning of gradient descent on problems with non-Lipschitz gradient or non-strongly convex structure. We demonstrate our method on p-norm regression and exponential penalty function minimization.
    Comment: SIAM J. Optim, accepted
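    The left-preconditioning idea can be illustrated in one dimension: instead of stepping along the raw gradient, map the gradient through the conjugate's gradient, x_{k+1} = x_k - lam * grad_h_conj(grad_f(x_k)). Below is a minimal sketch under the illustrative choice h = f for f(x) = cosh(x) - 1, whose gradient sinh(x) is not globally Lipschitz; the choice of h here is an assumption for demonstration, not the paper's general scheme:

    ```python
    import numpy as np

    # Objective f(x) = cosh(x) - 1; its gradient sinh(x) grows exponentially,
    # so plain gradient descent with a fixed step diverges from large x.
    def f_grad(x):
        return np.sinh(x)

    # Illustrative preconditioner h = f, so grad_h_conj = (grad h)^{-1} = arcsinh.
    def h_conj_grad(y):
        return np.arcsinh(y)

    x, lam = 10.0, 0.5
    for _ in range(50):
        # Dually preconditioned update: the gradient is "undone" by arcsinh,
        # reducing each step to x <- (1 - lam) * x here.
        x = x - lam * h_conj_grad(f_grad(x))
    print(abs(x))  # converges linearly to the minimizer at 0
    ```

    With a fixed step, plain gradient descent from x = 10 would blow up (sinh(10) is about 1.1e4), while the preconditioned update contracts geometrically; this is the conditioning improvement for non-Lipschitz gradients described above.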