
    Conditional Density Estimations from Privacy-Protected Data

    Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid inferences on the parameters underlying the confidential data. In this work, we propose simulation-based inference methods for privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and on ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
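
    As a concrete illustration of "injecting calibrated noise", here is a minimal sketch of the classical Laplace mechanism applied to a counting query. The data, the query, and the privacy budget `epsilon` are hypothetical; the paper's actual mechanisms and models may differ.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(data, predicate, epsilon, rng):
    # Counting queries have sensitivity 1: changing one record moves
    # the count by at most 1, so the Laplace scale is 1 / epsilon.
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(7)
data = [23, 31, 45, 52, 61, 38, 29, 47]  # hypothetical ages
noisy = private_count(data, lambda a: a >= 40, 1.0, rng)
```

    The released `noisy` value is unbiased for the true count but deliberately perturbed, which is exactly the bias/variance the paper's inference procedures must account for downstream.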

    Transporting Higher-Order Quadrature Rules: Quasi-Monte Carlo Points and Sparse Grids for Mixture Distributions

    Integration against, and hence sampling from, high-dimensional probability distributions is of essential importance in many application areas and has been an active research area for decades. One approach that has drawn increasing attention in recent years is the generation of samples from a target distribution $\mathbb{P}_{\mathrm{tar}}$ using transport maps: if $\mathbb{P}_{\mathrm{tar}} = T_\# \mathbb{P}_{\mathrm{ref}}$ is the pushforward of an easily sampled probability distribution $\mathbb{P}_{\mathrm{ref}}$ under the transport map $T$, then applying $T$ to $\mathbb{P}_{\mathrm{ref}}$-distributed samples yields $\mathbb{P}_{\mathrm{tar}}$-distributed samples. This paper proposes applying transport maps not just to random samples, but also to quasi-Monte Carlo points, higher-order nets, and sparse grids, so that the transformed samples inherit the original convergence rates, which are often better than $N^{-1/2}$, $N$ being the number of samples/quadrature nodes. Our main result is the derivation of an explicit transport map for the case that $\mathbb{P}_{\mathrm{tar}}$ is a mixture of simple distributions, e.g. a Gaussian mixture, in which case applying the transport map $T$ requires solving an explicit ODE with closed-form right-hand side. Mixture distributions are of particular applicability and interest since many methods proceed by first approximating $\mathbb{P}_{\mathrm{tar}}$ by a mixture and then sampling from that mixture (often using importance reweighting). Hence, this paper allows the sampling step to achieve a better convergence rate than $N^{-1/2}$ for all such methods.
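
    The pushforward idea can be sketched in one dimension. In this toy version, which is only an analogue of the paper's construction, the reference distribution is Uniform(0,1), the QMC points are a van der Corput sequence, and the transport map is the inverse CDF of an Exp(1) target; the paper's actual ODE-based map for mixtures is more elaborate.

```python
import math

def van_der_corput(n, base=2):
    # First n points of the base-b van der Corput sequence in [0, 1):
    # the digits of i are mirrored about the radix point.
    points = []
    for i in range(n):
        q, denom, x = i, 1.0, 0.0
        while q > 0:
            q, r = divmod(q, base)
            denom *= base
            x += r / denom
        points.append(x)
    return points

def pushforward(points, transport):
    # Apply the transport map T to each reference-distributed point.
    return [transport(u) for u in points]

# T = inverse CDF of Exp(1): pushes Uniform(0,1) forward to Exp(1).
T = lambda u: -math.log(1.0 - u)
u_qmc = van_der_corput(1024)
x_qmc = pushforward(u_qmc, T)
est_mean = sum(x_qmc) / len(x_qmc)  # QMC estimate of E[X] = 1
```

    Because the low-discrepancy structure of the points survives the (smooth) transport map, the quadrature error of `est_mean` decays faster than the Monte Carlo rate would suggest for the same sample size.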

    Discrepancy-based Inference for Intractable Generative Models using Quasi-Monte Carlo

    Intractable generative models are models for which the likelihood is unavailable but sampling is possible. Most approaches to parameter inference in this setting require computing some discrepancy between the data and the generative model; this is, for example, the case for minimum distance estimation and approximate Bayesian computation. These approaches require sampling a large number of realisations from the model for different parameter values, which can be a significant challenge when simulation is an expensive operation. In this paper, we propose to enhance this approach by enforcing "sample diversity" in simulations of our models, implemented through the use of quasi-Monte Carlo (QMC) point sets. Our key results are sample complexity bounds which demonstrate that, under smoothness conditions on the generator, QMC can significantly reduce the number of samples required to obtain a given level of accuracy when using three of the most common discrepancies: the maximum mean discrepancy, the Wasserstein distance, and the Sinkhorn divergence. This is complemented by a simulation study which highlights that improved accuracy is also possible in some settings not covered by the theory.
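
    To make one of these discrepancies concrete, here is the standard biased (V-statistic) estimator of the squared maximum mean discrepancy with a Gaussian kernel. The samples and the bandwidth are illustrative, not taken from the paper.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-(x - y) ** 2 / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    # Biased (V-statistic) estimator of the squared maximum mean
    # discrepancy between the two samples xs and ys.
    k = lambda a, b: gaussian_kernel(a, b, bandwidth)
    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

xs = [0.1 * i for i in range(50)]         # "observed data"
ys = [0.1 * i + 2.0 for i in range(50)]   # simulator output, shifted
gap = mmd_squared(xs, ys)
```

    In a minimum-distance or ABC loop, `ys` would be regenerated from the simulator at each candidate parameter value; the paper's point is that drawing those realisations from QMC inputs reduces how many are needed for a given accuracy in `gap`.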

    Quasi-Monte Carlo for Efficient Fourier Pricing of Multi-Asset Options

    Efficiently pricing multi-asset options poses a significant challenge in quantitative finance. The Monte Carlo (MC) method remains the prevalent choice for pricing engines; however, its slow convergence rate impedes its practical application. Fourier methods leverage knowledge of the characteristic function to accurately and rapidly value options with up to two assets. Nevertheless, they face hurdles in high-dimensional settings due to the tensor product (TP) structure of commonly employed quadrature techniques. This work advocates using randomized quasi-MC (RQMC) quadrature to improve the scalability of Fourier methods to high dimensions. The RQMC technique benefits from the smoothness of the integrand and alleviates the curse of dimensionality while providing practical error estimates. Nonetheless, applying RQMC on the unbounded domain $\mathbb{R}^d$ requires a domain transformation to $[0,1]^d$, which may introduce singularities in the transformed integrand at the corners of the hypercube and deteriorate the convergence rate of RQMC. To circumvent this difficulty, we design an efficient domain transformation procedure based on derived boundary growth conditions of the integrand. This transformation preserves sufficient regularity of the integrand and hence improves the convergence rate of RQMC. To validate this analysis, we demonstrate the efficiency of employing RQMC with an appropriate transformation to evaluate options in the Fourier space for various pricing models, payoffs, and dimensions. Finally, we highlight the computational advantage of applying RQMC over MC or TP in the Fourier domain, and over MC in the physical domain, for options with up to 15 assets.
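
    The domain-transformation step can be illustrated in one dimension. This sketch assumes a standard-normal weight and uses the inverse CDF from Python's `statistics.NormalDist` to map $\mathbb{R}$ onto $(0,1)$; midpoint-shifted equispaced nodes stand in for an actual RQMC rule, and nothing here reflects the paper's specific growth-condition-based transformation.

```python
from statistics import NormalDist

def qmc_normal_expectation(f, n=1024):
    # Transform the integral of f against the standard normal on R
    # into an integral over (0, 1) via the inverse CDF, then apply a
    # midpoint rule (a stand-in for a randomized QMC point set).
    inv_cdf = NormalDist().inv_cdf
    nodes = [(i + 0.5) / n for i in range(n)]  # stays away from 0 and 1
    return sum(f(inv_cdf(u)) for u in nodes) / n

second_moment = qmc_normal_expectation(lambda x: x * x)  # E[X^2] = 1
```

    Keeping the nodes away from 0 and 1 matters because the transformed integrand blows up at the endpoints, which is precisely the corner-singularity issue the paper's transformation is designed to control in $d$ dimensions.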

    Randomized quasi-Monte Carlo methods with applications to quantitative risk management

    We use randomized quasi-Monte Carlo (RQMC) techniques to construct computational tools for working with normal mixture models, which include automatic integration routines for density and distribution function evaluation, as well as fitting algorithms. We also provide open source software with all our methods implemented. In many practical problems, combining RQMC with importance sampling (IS) gives further variance reduction. However, the optimal IS density is typically not known, nor can it be sampled from. We solve this problem in the setting of single index models by finding a near-optimal location-scale transform of the original density that approximates the optimal IS density for the univariate index. Sampling from complicated multivariate models, such as generalized inverse Gaussian mixtures, often involves sampling from a multivariate normal by inversion and from another univariate distribution, say W, whose quantile function is neither known nor easily approximated. We explore how we can still use RQMC in this setting and propose several methods for when sampling of W is only possible via a black-box random variate generator. We also study different ways to feed acceptance-rejection (AR) algorithms for W with quasi-random numbers. RQMC methods on triangles have recently been developed by K. Basu and A. Owen. We show that one of the proposed sequences has suboptimal projection properties and address this issue by proposing to use their sequence to construct a stratified sampling scheme. Furthermore, we provide an extensible lattice construction for triangles and perform a simulation study.
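
    The "randomized" in RQMC can be sketched in one dimension as a Cranley-Patterson rotation of a rank-1 lattice: each replicate shifts the whole point set by an independent uniform offset modulo 1, and the spread across replicates yields a practical error estimate. The integrand and parameters below are illustrative only.

```python
import math
import random

def shifted_lattice_estimates(f, n=128, shifts=20, seed=11):
    # Cranley-Patterson rotation: each replicate shifts the 1-D
    # rank-1 lattice {i/n} by an independent Uniform(0,1) offset
    # mod 1, giving unbiased estimates plus an error estimate.
    rng = random.Random(seed)
    estimates = []
    for _ in range(shifts):
        delta = rng.random()
        points = [((i / n) + delta) % 1.0 for i in range(n)]
        estimates.append(sum(f(u) for u in points) / n)
    mean = sum(estimates) / shifts
    var = sum((e - mean) ** 2 for e in estimates) / (shifts - 1)
    std_err = math.sqrt(var / shifts)
    return mean, std_err

est, se = shifted_lattice_estimates(lambda u: u * u)  # true integral 1/3
```

    Each shifted replicate is an unbiased estimator on its own, so `std_err` is an honest, assumption-light uncertainty measure, which plain (unrandomized) QMC does not provide.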

    Data-driven parameter and model order reduction for industrial optimisation problems with applications in naval engineering

    In this work we study data-driven reduced order models, with a specific focus on reduction in parameter space to fight the curse of dimensionality, especially for functions with low intrinsic structure, in the context of digital twins. To this end we propose two different methods to improve the accuracy of response surfaces built using Active Subspaces (AS): a kernel-based approach which maps the inputs onto a higher-dimensional space before applying AS, and a local approach in which a clustering induced by the presence of a global active subspace is exploited to construct localized regressors. We also use AS within a multi-fidelity nonlinear autoregressive scheme to reduce the approximation error of high-dimensional scalar functions using only high-fidelity data. This multi-fidelity approach has also been integrated within a non-intrusive Proper Orthogonal Decomposition (POD) based framework in which every modal coefficient is reconstructed with greater precision. Moving to optimization algorithms, we devise an extension of the classical genetic algorithm exploiting AS to accelerate convergence, especially for high-dimensional optimization problems. We apply different combinations of such methods to a diverse range of engineering problems, such as structural optimization of cruise ships, shape optimization of a combatant hull and a NACA airfoil profile, and the prediction of hydroacoustic noise. Specific attention has been devoted to naval engineering applications, and many of the methodological advances in this work have been inspired by them. This work has been conducted within the framework of the IRONTH project, an industrial Ph.D. grant financed by Fincantieri S.p.A.
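
    The basic Active Subspaces step can be sketched on a 2-D toy ridge function, assuming gradients are available: form the matrix C = E[grad f grad f^T] from gradient samples and take its dominant eigenvector as the one-dimensional active direction, here via a hand-rolled power iteration. The thesis's kernel-based and localized variants go well beyond this sketch.

```python
import math
import random

def dominant_eigvec(C, iters=100):
    # Power iteration for the dominant eigenvector of a small
    # symmetric positive semi-definite matrix (list of lists).
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def active_direction(grad_f, points):
    # Monte Carlo estimate of C = E[grad f grad f^T], then its
    # dominant eigenvector spans the 1-D active subspace.
    d = len(points[0])
    C = [[0.0] * d for _ in range(d)]
    for x in points:
        g = grad_f(x)
        for i in range(d):
            for j in range(d):
                C[i][j] += g[i] * g[j] / len(points)
    return dominant_eigvec(C)

# Toy ridge function f(x) = sin(a . x): every gradient is parallel
# to a, so the active subspace is exactly span{a}.
a = [3.0 / 5.0, 4.0 / 5.0]
grad_f = lambda x: [math.cos(a[0] * x[0] + a[1] * x[1]) * ai for ai in a]
rng = random.Random(3)
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
v = active_direction(grad_f, points)
```

    On this rank-one example the recovered direction `v` aligns with `a` up to sign; on real problems the eigenvalue decay of C tells you how many active directions a response surface actually needs.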