Conditional Density Estimations from Privacy-Protected Data
Many modern statistical analysis and machine learning applications require
training models on sensitive user data. Differential privacy provides a formal
guarantee that individual-level information about users does not leak. In this
framework, randomized algorithms inject calibrated noise into the confidential
data, resulting in privacy-protected datasets or queries. However, restricting
access to only privatized data during statistical analysis makes it
computationally challenging to make valid inferences on the parameters
underlying the confidential data. In this work, we propose simulation-based
inference methods from privacy-protected datasets. In addition to sequential
Monte Carlo approximate Bayesian computation, we use neural conditional density
estimators as a flexible family of distributions to approximate the posterior
distribution of model parameters given the observed private query results. We
illustrate our methods on discrete time-series data under an infectious disease
model and with ordinary linear regression models. Illustrating the
privacy-utility trade-off, our experiments and analysis demonstrate the
necessity and feasibility of designing valid statistical inference procedures
to correct for biases introduced by the privacy-protection mechanisms.
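The correction described above can be illustrated with a toy simulation-based-inference loop. This is a hedged sketch, not the paper's models: the Bernoulli data model, the Laplace mechanism, and all names are hypothetical. The key point it shows is that the privacy mechanism is treated as part of the generative model, so ABC rejection automatically accounts for the injected noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: confidential data are n Bernoulli(theta) responses;
# the released query is the privatized sum under the Laplace mechanism
# with privacy budget epsilon (sensitivity of a counting query is 1).
n, epsilon = 100, 1.0
theta_true = 0.3
confidential = rng.binomial(1, theta_true, size=n)
observed_private = confidential.sum() + rng.laplace(scale=1.0 / epsilon)

def simulate_private_query(theta, rng):
    """Simulate the full generative process: data model + privacy mechanism."""
    data = rng.binomial(1, theta, size=n)
    return data.sum() + rng.laplace(scale=1.0 / epsilon)

# ABC rejection: keep parameters whose simulated privatized query is close
# to the observed one; because the noise mechanism is simulated too, the
# resulting posterior corrects for the privacy-induced bias.
tol = 3.0
draws = rng.uniform(0, 1, size=20000)          # Uniform(0, 1) prior on theta
sims = np.array([simulate_private_query(t, rng) for t in draws])
posterior = draws[np.abs(sims - observed_private) < tol]
print(f"posterior mean ~ {posterior.mean():.3f} (true theta = {theta_true})")
```

The paper's sequential Monte Carlo and neural conditional density estimators replace this plain rejection step with far more sample-efficient machinery, but the target object is the same posterior.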
Transporting Higher-Order Quadrature Rules: Quasi-Monte Carlo Points and Sparse Grids for Mixture Distributions
Integration against, and hence sampling from, high-dimensional probability
distributions is of essential importance in many application areas and has been
an active research area for decades. One approach that has drawn increasing
attention in recent years has been the generation of samples from a target
distribution using transport maps: if the target distribution $\pi = T_{\#}\rho$
is the pushforward of an easily-sampled probability distribution $\rho$ under
the transport map $T$, then the application of $T$ to $\rho$-distributed
samples yields $\pi$-distributed samples. This paper proposes the
application of transport maps not just to random samples, but also to
quasi-Monte Carlo points, higher-order nets, and sparse grids in order for the
transformed samples to inherit the original convergence rates that are often
better than $N^{-1/2}$, with $N$ being the number of samples/quadrature nodes. Our
main result is the derivation of an explicit transport map for the case that
the target distribution $\pi$ is a mixture of simple distributions, e.g.\ a
Gaussian mixture, in which case application of the transport map requires
the solution of an \emph{explicit} ODE with \emph{closed-form} right-hand side.
Mixture distributions are of particular applicability and interest since many
methods proceed by first approximating $\pi$ by a mixture
and then sampling from that mixture (often using importance reweighting).
Hence, this paper allows for the sampling step to provide a better convergence
rate than $N^{-1/2}$ for all such methods.
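The pushforward idea can be sketched in a few lines. This is a hedged illustration of the general principle only: it uses the componentwise inverse-CDF map to a Gaussian target rather than the paper's ODE-based map for mixtures, with `scipy.stats.qmc` supplying the scrambled Sobol' points.

```python
import numpy as np
from scipy.stats import norm, qmc

# Transport a low-discrepancy point set so that the transformed points
# target the desired distribution. Here the map is the componentwise
# inverse CDF, T(u) = mu + sigma * Phi^{-1}(u), whose pushforward of
# Uniform[0,1]^d is N(mu, sigma^2 I) -- a simple stand-in for the
# paper's explicit ODE-based map for mixtures.
d, n = 2, 1024                           # n a power of 2, as Sobol' prefers
mu, sigma = 1.0, 0.5

sobol = qmc.Sobol(d=d, scramble=True, seed=7)
u = np.clip(sobol.random(n), 1e-12, 1 - 1e-12)   # guard against ppf(0) = -inf
x = mu + sigma * norm.ppf(u)             # transported: ~ N(mu, sigma^2 I)

# The transformed QMC points typically retain a convergence rate better
# than the plain Monte Carlo rate N^{-1/2} for smooth integrands.
print("sample mean per coordinate:", x.mean(axis=0))
```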
Discrepancy-based Inference for Intractable Generative Models using Quasi-Monte Carlo
Intractable generative models are models for which the likelihood is
unavailable but sampling is possible. Most approaches to parameter inference in
this setting require the computation of some discrepancy between the data and
the generative model. This is for example the case for minimum distance
estimation and approximate Bayesian computation. These approaches require
sampling a high number of realisations from the model for different parameter
values, which can be a significant challenge when simulating is an expensive
operation. In this paper, we propose to enhance this approach by enforcing
"sample diversity" in simulations of our models. This will be implemented
through the use of quasi-Monte Carlo (QMC) point sets. Our key results are
sample complexity bounds which demonstrate that, under smoothness conditions on
the generator, QMC can significantly reduce the number of samples required to
obtain a given level of accuracy when using three of the most common
discrepancies: the maximum mean discrepancy, the Wasserstein distance, and the
Sinkhorn divergence. This is complemented by a simulation study which
highlights that improved accuracy is sometimes also possible in settings not
covered by the theory.
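A minimal sketch of the "sample diversity" idea, under assumptions made for illustration only: a hypothetical one-parameter location generator driven by uniform seeds, with i.i.d. seeds replaced by a scrambled Sobol' set and candidate parameters compared via the maximum mean discrepancy.

```python
import numpy as np
from scipy.stats import norm, qmc

def mmd2(x, y, bw=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bw**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Hypothetical intractable generator: theta is a location parameter and
# the simulator consumes uniform seeds u via the inverse normal CDF.
def generator(theta, u):
    return theta + norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))

rng = np.random.default_rng(1)
data = generator(0.5, rng.uniform(size=(256, 1)))   # "observed" data

# Replace i.i.d. seeds with a scrambled Sobol' point set: the simulated
# sample is more evenly spread, which tightens the discrepancy estimate.
u_qmc = qmc.Sobol(d=1, scramble=True, seed=3).random(256)
sims_good = generator(0.5, u_qmc)    # correct parameter value
sims_bad = generator(2.0, u_qmc)     # wrong parameter value

# Minimum distance estimation would pick the theta with the smaller MMD.
print(mmd2(data, sims_good), mmd2(data, sims_bad))
```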
Quasi-Monte Carlo for Efficient Fourier Pricing of Multi-Asset Options
Efficiently pricing multi-asset options poses a significant challenge in quantitative finance. The Monte Carlo (MC) method remains the prevalent choice for pricing engines; however, its slow convergence rate impedes its practical application. Fourier methods leverage the knowledge of the characteristic function to accurately and rapidly value options with up to two assets. Nevertheless, they face hurdles in high-dimensional settings due to the tensor product (TP) structure of commonly employed quadrature techniques. This work advocates using the randomized quasi-MC (RQMC) quadrature to improve the scalability of Fourier methods in high dimensions. The RQMC technique benefits from the smoothness of the integrand and alleviates the curse of dimensionality while providing practical error estimates. Nonetheless, the applicability of RQMC on the unbounded domain, $\mathbb{R}^d$, requires a domain transformation to $[0,1]^d$, which may result in singularities of the transformed integrand at the corners of the hypercube, and deteriorate the rate of convergence of RQMC. To circumvent this difficulty, we design an efficient domain transformation procedure based on the derived boundary growth conditions of the integrand. This transformation preserves the sufficient regularity of the integrand and hence improves the rate of convergence of RQMC. To validate this analysis, we demonstrate the efficiency of employing RQMC with an appropriate transformation to evaluate options in the Fourier space for various pricing models, payoffs, and dimensions. Finally, we highlight the computational advantage of applying RQMC over MC or TP in the Fourier domain, and over MC in the physical domain for options with up to 15 assets.
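As an illustrative companion (not the paper's Fourier-domain method), the sketch below prices a basket call by RQMC in the physical domain under independent Black-Scholes assets; the model, parameter values, and the inverse-CDF transformation are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm, qmc

# Hypothetical contract and model: European call on the arithmetic mean
# of d independent Black-Scholes assets (illustrative only).
d = 5                                      # number of assets
s0, K, r, sigma, T = 100.0, 100.0, 0.01, 0.2, 1.0
n = 2**12                                  # points per RQMC replicate

def price(points):
    """Map uniform points to terminal prices, discount the basket payoff."""
    z = norm.ppf(np.clip(points, 1e-12, 1 - 1e-12))
    sT = s0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    payoff = np.maximum(sT.mean(axis=1) - K, 0.0)
    return np.exp(-r * T) * payoff.mean()

# Randomization (scrambling) is what gives the practical error estimates:
# repeat with independent scramblings and use the spread of the replicates.
estimates = [price(qmc.Sobol(d=d, scramble=True, seed=s).random(n))
             for s in range(8)]
print(f"price ~ {np.mean(estimates):.3f} +/- {np.std(estimates):.4f}")
```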
Randomized quasi-Monte Carlo methods with applications to quantitative risk management
We use randomized quasi-Monte Carlo (RQMC) techniques to construct computational tools for working with normal mixture models, which include automatic integration routines for density and distribution function evaluation, as well as fitting algorithms. We also provide open source software with all our methods implemented.
In many practical problems, combining RQMC with importance sampling (IS) gives further variance reduction. However, the optimal IS density is typically not known, nor can it be sampled from. We solve this problem in the setting of single index models by finding a near optimal location-scale transform of the original density that approximates the optimal IS density for the univariate index.
Sampling from complicated multivariate models, such as generalized inverse Gaussian mixtures, often involves sampling from a multivariate normal by inversion and from another univariate distribution, say W, whose quantile function is neither known nor easily approximated. We explore how we can still use RQMC in this setting and propose several methods when sampling of W is only possible via a black-box random variate generator. We also study different ways to feed acceptance-rejection (AR) algorithms for W with quasi-random numbers.
RQMC methods on triangles have recently been developed by K. Basu and A. Owen. We show that one of the proposed sequences has suboptimal projection properties and address this issue by proposing to use their sequence to construct a stratified sampling scheme. Furthermore, we provide an extensible lattice construction for triangles and perform a simulation study.
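A minimal sketch of QMC-type sampling on a triangle, using a simple diagonal fold of the unit square. The fold is a uniformity-preserving map chosen for brevity; it is not the Basu-Owen construction or the lattice construction discussed above.

```python
import numpy as np
from scipy.stats import qmc

# Map points from the unit square onto the standard simplex
# T = {(x, y) : x, y >= 0, x + y <= 1} by folding the square along the
# diagonal. The fold is an area-preserving bijection between the upper
# and lower triangles, so uniform points stay uniform on T.
def to_triangle(uv):
    uv = uv.copy()
    mask = uv.sum(axis=1) > 1.0          # points above the diagonal
    uv[mask] = 1.0 - uv[mask]            # reflect them back into T
    return uv

pts = to_triangle(qmc.Sobol(d=2, scramble=True, seed=11).random(1024))

# Sanity check via quadrature: the mean of the first coordinate under
# the uniform distribution on T is exactly 1/3.
print("mean of x over the triangle ~", pts[:, 0].mean())
```

One caveat worth noting: such folding maps can damage the low-discrepancy structure of the input points, which is precisely the kind of projection issue the abstract's stratified scheme is designed to avoid.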
Data-driven parameter and model order reduction for industrial optimisation problems with applications in naval engineering
In this work we study data-driven reduced order models with a specific focus on reduction in parameter space to fight the curse of dimensionality, especially for functions with low intrinsic structure, in the context of digital twins. To this end we proposed two different methods to improve the accuracy of response surfaces built using Active Subspaces (AS): a kernel-based approach which maps the inputs onto a higher-dimensional space before applying AS, and a local approach in which a clustering induced by the presence of a global active subspace is exploited to construct localized regressors. We also used AS within a multi-fidelity nonlinear autoregressive scheme to reduce the approximation error of high-dimensional scalar functions using only high-fidelity data. This multi-fidelity approach has also been integrated within a non-intrusive Proper Orthogonal Decomposition (POD) based framework in which every modal coefficient is reconstructed with greater precision.
Moving to optimization algorithms, we devised an extension of the classical genetic algorithm exploiting AS to accelerate the convergence, especially for high-dimensional optimization problems. We applied different combinations of such methods to a diverse range of engineering problems such as structural optimization of cruise ships, shape optimization of a combatant hull and a NACA airfoil profile, and the prediction of hydroacoustic noises. Specific attention has been devoted to the naval engineering applications, and many of the methodological advances in this work have been inspired by them. This work has been conducted within the framework of the IRONTH project, an industrial Ph.D. grant financed by Fincantieri S.p.A.
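The Active Subspaces technique at the core of these methods can be sketched compactly: estimate the gradient outer-product matrix from samples and look for a spectral gap in its eigenvalues. The test function and all names below are hypothetical.

```python
import numpy as np

# Hypothetical scalar function with a 1-dimensional active subspace:
# f(x) = sin(w . x), so grad f(x) = cos(w . x) * w and f varies only
# along the direction w.
w = np.array([1.0, 2.0, 0.1, 0.05])

def f_grad(x):
    return np.cos(x @ w) * w

# Monte Carlo estimate of C = E[grad f grad f^T] over the input domain.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 4))
G = np.array([f_grad(x) for x in X])
C = G.T @ G / len(G)

# Eigendecomposition of the symmetric matrix C, sorted descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], eigvecs[:, order]

# A large spectral gap after the first eigenvalue indicates a 1-d active
# subspace; the reduced variable is y = W[:, 0]^T x, and a response
# surface can then be built over y instead of the full input space.
print("eigenvalues:", eigvals)
```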