Computationally Efficient Nonparametric Importance Sampling
The variance reduction achieved by importance sampling depends strongly on the choice of the importance sampling distribution. A good choice is often hard to achieve, especially for high-dimensional integration problems. Nonparametric estimation of the optimal importance sampling distribution (known as nonparametric importance sampling) is a reasonable alternative to parametric approaches. In this article, nonparametric variants of both the self-normalized and the unnormalized importance sampling estimators are proposed and investigated. A common critique of nonparametric importance sampling is its increased computational burden compared to parametric methods. We solve this problem to a large degree by using the linear blend frequency polygon estimator instead of a kernel estimator. Mean square error convergence properties are investigated, leading to recommendations for the efficient application of nonparametric importance sampling. In particular, we show that nonparametric importance sampling asymptotically attains the optimal importance sampling variance. The efficiency of nonparametric importance sampling algorithms relies heavily on the computational efficiency of the employed nonparametric estimator. The linear blend frequency polygon outperforms kernel estimators with respect to criteria such as efficient sampling and evaluation. Furthermore, it is compatible with the inversion method for sample generation, which allows our algorithms to be combined with other variance reduction techniques such as stratified sampling. Empirical evidence for the usefulness of the suggested algorithms is obtained from three benchmark integration problems. As an application, we estimate the queue-length distribution of a spam filter queueing system based on real data.
Comment: 29 pages, 7 figures
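The two estimator families named in the abstract have simple parametric-proposal analogues. The sketch below (our own illustration with a fixed Gaussian proposal, not the paper's frequency-polygon method; target, integrand, and proposal are assumptions) contrasts the unnormalized and self-normalized forms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from the paper): target density p = N(0,1),
# integrand phi(x) = x^2, so the exact value of E_p[phi] is 1.
p = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
phi = lambda x: x**2

# Proposal q = N(0, 2^2): heavier-tailed than p, so the weights stay bounded.
sigma_q = 2.0
q = lambda x: np.exp(-x**2 / (2 * sigma_q**2)) / (sigma_q * np.sqrt(2 * np.pi))

x = rng.normal(0.0, sigma_q, size=100_000)
w = p(x) / q(x)                                    # importance weights

unnormalized = np.mean(w * phi(x))                 # unbiased; needs p normalized
self_normalized = np.sum(w * phi(x)) / np.sum(w)   # consistent; works when p is
                                                   # known only up to a constant
```

Both estimates land close to the exact value 1; the self-normalized form is the one to reach for when the target density's normalizing constant is unknown.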
Concentration inequalities for order statistics
This note establishes non-asymptotic variance and tail bounds for order statistics of samples of independent identically distributed random variables. These bounds are shown to be asymptotically tight when the sampling distribution belongs to a maximum domain of attraction. If the sampling distribution has a non-decreasing hazard rate (this includes the Gaussian distribution), we derive an exponential Efron-Stein inequality for order statistics: an inequality connecting the logarithmic moment generating function of centered order statistics with exponential moments of Efron-Stein (jackknife) estimates of variance. We use this general connection to derive variance and tail bounds for order statistics of a Gaussian sample. These bounds are not within the scope of the Tsirelson-Ibragimov-Sudakov Gaussian concentration inequality. The proofs are elementary and combine Rényi's representation of order statistics with the so-called entropy approach to concentration inequalities popularized by M. Ledoux.
Comment: 13 pages
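Rényi's representation, the key tool mentioned above, writes the order statistics of an Exp(1) sample as cumulative sums of independent rescaled exponential spacings. A quick numerical check of that standard identity (the sample size and repetition count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 8, 200_000

# Renyi's representation: X_(k) = sum_{i=1}^{k} E_i / (n - i + 1),
# with E_1, ..., E_n i.i.d. Exp(1).
E = rng.exponential(size=(reps, n))
spacings = E / (n - np.arange(n))          # denominators n, n-1, ..., 1
order_stats = np.cumsum(spacings, axis=1)  # each row: (X_(1), ..., X_(n))

# One consequence: E[X_(n)] equals the harmonic number H_n = 1 + 1/2 + ... + 1/n.
harmonic = np.sum(1.0 / np.arange(1, n + 1))
emp_mean_max = order_stats[:, -1].mean()
```

The empirical mean of the maximum matches H_n, and the representation exposes the maximum as a sum of independent terms, which is what makes Efron-Stein-type arguments tractable for order statistics.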
Latin hypercube sampling with dependence and applications in finance
In Monte Carlo simulation, Latin hypercube sampling (LHS) [McKay et al. (1979)] is a well-known variance reduction technique for vectors of independent random variables. The method presented here, Latin hypercube sampling with dependence (LHSD), extends LHS to vectors of dependent random variables. The resulting estimator is shown to be consistent and asymptotically unbiased. For the bivariate case, and under some conditions on the joint distribution, a central limit theorem together with a closed-form expression for the limit variance is derived. It is shown that, for a class of estimators satisfying a monotonicity condition, the LHSD limit variance is never greater than the corresponding Monte Carlo limit variance. In valuation examples of financial payoffs, variance reduction by factors of up to 200 is achieved compared to standard Monte Carlo simulation. LHSD is suited to problems with rare events and to high-dimensional problems, and it may be combined with quasi-Monte Carlo methods.
Keywords: Monte Carlo simulation, variance reduction, Latin hypercube sampling, stratified sampling
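The LHSD construction can be sketched in a few lines: draw dependent vectors, then, coordinate by coordinate, replace each value with a stratified uniform that preserves its rank before applying the marginal quantile function. The dependence model (a common exponential shock), the Exp(1) marginals, and the additive monotone estimator below are our own illustrative choices, not the paper's examples:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64        # sample size per estimate
reps = 400    # repeated estimates, to compare variances
F_inv = lambda u: -np.log1p(-u)   # Exp(1) quantile function (assumed marginal)

def lhsd_estimate(rng):
    # Draw n i.i.d. dependent pairs via a common exponential shock
    # (an illustrative dependence model, not from the paper).
    shock = rng.exponential(size=n)
    raw = np.column_stack([shock + rng.exponential(size=n),
                           shock + rng.exponential(size=n)])
    # Per coordinate: replace each value by a stratified uniform with the
    # same rank, then map through the marginal quantile function.
    ranks = raw.argsort(axis=0).argsort(axis=0)   # ranks 0 .. n-1
    u = (ranks + rng.random((n, 2))) / n          # exactly one point per stratum
    x = F_inv(u)
    return np.mean(x[:, 0] + x[:, 1])             # monotone estimator, mean 2

def mc_estimate(rng):
    # Plain Monte Carlo baseline; for this additive estimator the mean is
    # unaffected by the dependence, so independent marginals suffice.
    x = F_inv(rng.random((n, 2)))
    return np.mean(x[:, 0] + x[:, 1])

ests = np.array([lhsd_estimate(rng) for _ in range(reps)])
var_lhsd = ests.var()
var_mc = np.var([mc_estimate(rng) for _ in range(reps)])
```

Because the estimator is monotone in each coordinate, the variance comparison in the abstract applies, and `var_lhsd` comes out well below `var_mc`: the rank-preserving stratification keeps the dependence structure while nearly eliminating the marginal sampling noise.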
Quantitative estimation of sampling uncertainties for mycotoxins in cereal shipments
Many countries receive shipments of bulk cereals from primary producers. A considerable body of ongoing work seeks to establish appropriate standards for the quality of these shipments and the means to assess them as they are out-loaded. Of concern are mycotoxin and heavy metal levels, pesticide and herbicide residue levels, and contamination by genetically modified organisms (GMOs). As the ability to quantify these contaminants improves through better analytical techniques, the sampling methodologies applied to the shipments must keep pace to ensure that the uncertainties attached to the sampling procedures do not overwhelm the analytical uncertainties. There is a need to understand and quantify sampling uncertainties under varying conditions of contamination. The required analysis is statistical and is challenging, as the distribution of contaminants within a shipment is not well understood; very limited data exist. Little work has been undertaken to quantify the variability of contaminant concentrations in the flow of grain coming from a ship and its impact on the sampling variance. Relatively recent work by Paoletti et al. in 2006 [Paoletti C, Heissenberger A, Mazzara M, Larcher S, Grazioli E, Corbisier P, Hess N, Berben G, Lubeck PS, De Loose M, et al. 2006. Kernel lot distribution assessment (KeLDA): a study on the distribution of GMO in large soybean shipments. Eur Food Res Tech. 224:129-139] provides some insight into the variation in GMO concentrations in soybeans at cargo out-turn. Paoletti et al. analysed the data using correlogram analysis with the objective of quantifying the sampling uncertainty (variance) attached to the final cargo analysis, but this is only one possible means of quantifying sampling uncertainty.
It is possible that in many cases the levels of contamination passing the sampler on out-loading are essentially random, negating the value of variographic quantitation of the sampling variance. GMOs and mycotoxins appear to have a highly heterogeneous distribution in a cargo, depending on how the ship was loaded (the grain may have come from more than one terminal and set of storage silos), and mycotoxin growth may have occurred in transit. This paper examines a statistical model based on random contamination that can be used to calculate the sampling uncertainty arising from primary sampling of a cargo; it deals with what is thought to be a worst-case scenario. The determination of the sampling variance is treated both analytically and by Monte Carlo simulation. The latter approach provides the entire sampling distribution and not just the sampling variance. The sampling procedure is based on rules provided by the Canadian Grain Commission (CGC), and the levels of contamination considered are those relating to allowable levels of ochratoxin A (OTA) in wheat. The results of the calculations indicate that at a loading rate of 1000 tonnes h^-1, primary sample increment masses of 10.6 kg, a 2000-tonne lot and a primary composite sample mass of 1900 kg, the relative standard deviation (RSD) is about 1.05 (105%) and the distribution of the mycotoxin (MT) level in the primary composite samples is highly skewed. This result applies to a mean MT level of 2 ng g^-1. The rate of false-negative results under these conditions is estimated to be 16.2%. The corresponding contamination is based on initial average concentrations of MT of 4000 ng g^-1 within average spherical volumes of 0.3 m diameter, which are then diluted by a factor of 2 each time they pass through a handling stage; four stages of handling are assumed. The Monte Carlo calculations allow for variation in the initial volume of the MT-bearing grain, the average concentration and the dilution factor.
The Monte Carlo studies examine the effect of varying the sampling frequency while maintaining a primary composite sample mass of 1900 kg. The overall results are presented in terms of operating characteristic curves that relate only to the sampling uncertainties in the primary sampling of the grain. It is concluded that cross-stream sampling is intrinsically unsuited to sampling for mycotoxins and that better sampling methods and equipment are needed to control sampling uncertainties. At the same time, it is shown that some combinations of cross-cutting sampling conditions may, for a given shipment mass and MT content, yield acceptable sampling performance.
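A stripped-down random-contamination calculation of this kind can be sketched as follows. All modelling choices here (a fixed pocket mass, a one-dimensional grain stream, point-like increments) are our own simplifications; they follow the abstract's figures only loosely and do not reproduce the paper's CGC-based model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative parameters, loosely following the abstract (assumptions):
lot_mass    = 2_000_000.0    # kg (2000-tonne lot)
n_inc       = 179            # ~1900 kg composite / 10.6 kg per increment
mean_level  = 2.0            # ng/g target mean mycotoxin level
c_hot       = 4000.0 / 2**4  # ng/g after four 2x dilutions = 250 ng/g
pocket_mass = 170.0          # kg of grain one diluted pocket occupies (assumed)

frac_hot  = mean_level / c_hot                     # contaminated mass fraction
n_pockets = int(frac_hot * lot_mass / pocket_mass)

def composite_concentration(rng):
    # Pocket centres uniform along the grain stream; increments jittered-regular.
    pockets = np.sort(rng.random(n_pockets)) * lot_mass
    incs = (np.arange(n_inc) + rng.random(n_inc)) / n_inc * lot_mass
    # An increment is "hot" if it falls inside some pocket's mass interval
    # (increments treated as points: a deliberate simplification).
    idx = np.searchsorted(pockets, incs)
    near = np.minimum(np.abs(incs - pockets[np.clip(idx - 1, 0, n_pockets - 1)]),
                      np.abs(pockets[np.clip(idx, 0, n_pockets - 1)] - incs))
    hot = near < pocket_mass / 2
    return np.mean(np.where(hot, c_hot, 0.0))

estimates = np.array([composite_concentration(rng) for _ in range(2000)])
rsd = estimates.std() / estimates.mean()
false_neg = np.mean(estimates == 0.0)  # composites missing every pocket entirely
```

With these toy numbers the relative standard deviation is large (of order 1) and a sizeable fraction of composites contain no contaminated material at all, which illustrates why cross-stream sampling struggles with rare, localized contamination.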
Variance Reduction For A Discrete Velocity Gas
We extend a variance reduction technique developed by Baker and Hadjiconstantinou [1] to a discrete velocity gas. In our previous work, the collision integral was evaluated by importance sampling of collision partners [2]. Significant computational effort may be wasted by evaluating the collision integral in regions where the flow is in equilibrium. In the current approach, substantial computational savings are obtained by solving only for the deviations from equilibrium. In the near-continuum regime, the deviations from equilibrium are small, and low-noise evaluation of the collision integral can be achieved with very coarse statistical sampling. Spatially homogeneous relaxation of the Bobylev-Krook-Wu distribution [3,4] was used as a test case to verify that the method predicts the correct evolution of a highly non-equilibrium distribution to equilibrium. When variance reduction is not used, the noise causes the entropy to undershoot, whereas the method with variance reduction matches the analytic curve for the same number of collisions. We then extend the work to travelling shock waves and compare the accuracy and computational savings of the variance reduction method against DSMC over Mach numbers ranging from 1.2 to 10.
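The deviational idea — add the equilibrium contribution analytically and spend samples only on f - f_eq — can be illustrated with a one-dimensional toy problem (our own analogue, not the paper's discrete-velocity scheme; the mixture distribution, the moment v^2, and the sampling density are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

eps = 0.05            # small departure from equilibrium (near-continuum analogue)
phi = lambda v: v**2  # moment to estimate

feq = lambda v: np.exp(-v**2 / 2) / np.sqrt(2 * np.pi)    # "equilibrium" N(0,1)
fpert = lambda v: np.exp(-v**2 / 8) / np.sqrt(8 * np.pi)  # N(0,4) component
f = lambda v: (1 - eps) * feq(v) + eps * fpert(v)         # near-equilibrium state

exact = (1 - eps) * 1.0 + eps * 4.0   # E_f[v^2] = 1.15
n, reps = 2000, 300

def direct(rng):
    # Plain Monte Carlo: sample v from the full mixture f.
    comp = rng.random(n) < eps
    v = np.where(comp, rng.normal(0, 2, n), rng.normal(0, 1, n))
    return phi(v).mean()

def deviational(rng):
    # Add the equilibrium moment E_eq[v^2] = 1 analytically; spend samples
    # only on the small deviation f - feq, weighted by sampling density N(0,4).
    v = rng.normal(0, 2, n)
    return 1.0 + np.mean(phi(v) * (f(v) - feq(v)) / fpert(v))

d = np.array([direct(rng) for _ in range(reps)])
dv = np.array([deviational(rng) for _ in range(reps)])
var_direct, var_dev = d.var(), dv.var()
```

Because the deviational integrand is proportional to the small departure `eps`, its variance is orders of magnitude below that of direct sampling for the same sample count, mirroring the low-noise near-continuum behaviour the abstract describes.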