Parallel Weighted Random Sampling
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups for both construction and queries.
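The alias table mentioned above supports O(1) weighted sampling after linear-time construction. As a point of reference, here is a minimal sequential sketch of the classic construction (Vose's method), not the paper's improved or parallel variant:

```python
import random

def build_alias_table(weights):
    """Build an alias table (Vose's method) for O(1) weighted sampling.

    Each of the n buckets holds at most two items: itself with probability
    prob[i], and alias[i] otherwise.
    """
    n = len(weights)
    total = sum(weights)
    # Scale weights so the average bucket mass is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]          # bucket s keeps its remaining mass
        alias[s] = l                 # and is topped up by item l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:          # leftovers have mass ~1 up to rounding
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a bucket, then flip its biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

A query touches a single random bucket, which is what makes the structure attractive as a parallel building block: queries are embarrassingly parallel once the table exists.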
wsrf: An R Package for Classification with Scalable Weighted Subspace Random Forests
We describe a parallel implementation in R of the weighted subspace random forest algorithm (Xu, Huang, Williams, Wang, and Ye 2012) available as the wsrf package. A novel variable weighting method is used for variable subspace selection in place of the traditional approach of random variable sampling. This new approach is particularly useful in building models for high-dimensional data - often consisting of thousands of variables. Parallel computation is used to take advantage of multi-core machines and clusters of machines to build random forest models from high-dimensional data in considerably shorter times. A series of experiments presented in this paper demonstrates that wsrf is faster than existing packages whilst retaining and often improving on the classification performance, particularly for high-dimensional data.
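The core idea, weighted rather than uniform selection of the variable subspace at each tree node, can be sketched as follows. This is an illustrative Python sketch, not the wsrf implementation (which is R/C++); the weights here stand in for the per-variable importance scores the package derives from each variable's association with the class:

```python
import random

def weighted_subspace(weights, m, rng=random):
    """Pick m distinct feature indices, each drawn with probability
    proportional to its weight (sequential weighted sampling without
    replacement). Illustrative sketch of the weighted-subspace idea.
    """
    idx = list(range(len(weights)))  # remaining candidate features
    w = list(weights)                # their (mutable) weights
    chosen = []
    for _ in range(min(m, len(idx))):
        total = sum(w)
        r = rng.random() * total
        acc = 0.0
        for j, wj in enumerate(w):   # roulette-wheel selection
            acc += wj
            if r <= acc:
                break
        chosen.append(idx.pop(j))    # remove the winner so it can't repeat
        w.pop(j)
    return chosen
```

Informative variables are then proposed more often at each split, while every variable retains a nonzero chance of selection, which is what distinguishes this from simply taking the top-m variables.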
Sampling Arborescences in Parallel
We study the problem of sampling a uniformly random directed rooted spanning tree, also known as an arborescence, from a possibly weighted directed graph. Classically, this problem has long been known to be polynomial-time solvable; the exact number of arborescences can be computed by a determinant [Tutte, 1948], and sampling can be reduced to counting [Jerrum et al., 1986; Jerrum and Sinclair, 1996]. However, the classic reduction from sampling to counting seems to be inherently sequential. This raises the question of designing efficient parallel algorithms for sampling. We show that sampling arborescences can be done in RNC.
For several well-studied combinatorial structures, counting can be reduced to the computation of a determinant, which is known to be in NC [Csanky, 1975]. These include arborescences, planar graph perfect matchings, Eulerian tours in digraphs, and determinantal point processes. However, not much is known about efficient parallel sampling of these structures. Our work is a step towards resolving this mystery.
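The determinant-based counting step referenced here is Tutte's directed matrix-tree theorem: the number (or total weight) of arborescences rooted at r equals a minor of the directed Laplacian. A minimal sequential sketch of that classical counting step (the paper's contribution is the parallel sampling, not this):

```python
from fractions import Fraction

def count_arborescences(n, edges, root):
    """Count spanning arborescences rooted at `root` (all edges directed
    away from the root) via Tutte's matrix-tree theorem: the count is the
    determinant of the directed Laplacian with the root's row and column
    deleted. `edges` holds (u, v) or (u, v, w) directed edges u -> v.
    """
    L = [[Fraction(0)] * n for _ in range(n)]
    for e in edges:
        u, v = e[0], e[1]
        w = Fraction(e[2]) if len(e) > 2 else Fraction(1)
        L[v][v] += w          # weighted in-degree on the diagonal
        L[u][v] -= w          # minus the edge weight off the diagonal
    # Minor: delete the root's row and column.
    M = [[L[i][j] for j in range(n) if j != root]
         for i in range(n) if i != root]
    # Exact determinant by Gaussian elimination over the rationals.
    m = len(M)
    det = Fraction(1)
    for c in range(m):
        p = next((r for r in range(c, m) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m):
                M[r][j] -= f * M[c][j]
    return det
```

With edge weights, the same determinant returns the sum of edge-weight products over all arborescences, which is the partition function a sampler must respect.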
xTRAM: Estimating Equilibrium Expectations from Time-Correlated Simulation Data at Multiple Thermodynamic States
Computing the equilibrium properties of complex systems, such as free energy
differences, is often hampered by rare events in the dynamics. Enhanced
sampling methods may be used in order to speed up sampling by, for example,
using high temperatures, as in parallel tempering, or simulating with a
biasing potential such as in the case of umbrella sampling. The equilibrium
properties of the thermodynamic state of interest (e.g., lowest temperature or
unbiased potential) can be computed using reweighting estimators such as the
weighted histogram analysis method or the multistate Bennett acceptance ratio
(MBAR). For the weighted histogram analysis method and MBAR to produce
unbiased estimates, the simulations must sample from the global equilibria at
their respective thermodynamic states—a requirement that can be prohibitively
expensive for some simulations such as a large parallel tempering ensemble of
an explicitly solvated biomolecule. Here, we introduce the transition-based
reweighting analysis method (TRAM)—a class of estimators that exploit ideas
from Markov modeling and only require the simulation data to be in local
equilibrium within subsets of the configuration space. We formulate the
expanded TRAM (xTRAM) estimator that is shown to be asymptotically unbiased
and a generalization of MBAR. Using four exemplary systems of varying
complexity, we demonstrate the improved convergence (ranging from a twofold
improvement to several orders of magnitude) of xTRAM in comparison to a direct
counting estimator and MBAR, with respect to the invested simulation effort.
Lastly, we introduce a random-swapping simulation protocol that can be used
with xTRAM, gaining orders-of-magnitude advantages over simulation protocols
that require the constraint of sampling from a global equilibrium.
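The reweighting idea behind estimators like WHAM and MBAR can be illustrated in the simplest single-state case: samples drawn in equilibrium at one temperature are reweighted by Boltzmann factors to estimate an average at another. This sketch is for intuition only; it assumes global equilibrium at the simulated state, which is precisely the requirement TRAM relaxes:

```python
import math

def reweighted_average(energies, observables, beta_sim, beta_target):
    """Single-state Boltzmann reweighting: estimate <A> at beta_target
    from samples drawn in equilibrium at beta_sim.

    <A>_target = sum_i A_i w_i / sum_i w_i,  w_i = exp(-(b_t - b_s) U_i)
    """
    # Work with log-weights and shift by the max for numerical stability.
    logw = [-(beta_target - beta_sim) * u for u in energies]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)
    return sum(a * wi for a, wi in zip(observables, w)) / z
```

When the two temperatures are far apart the weights degenerate onto a few samples, which is one reason multi-state estimators (MBAR, and TRAM-class methods under weaker assumptions) pool data across many thermodynamic states instead.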
mfEGRA: Multifidelity Efficient Global Reliability Analysis through Active Learning for Failure Boundary Location
This paper develops mfEGRA, a multifidelity active learning method using
data-driven adaptively refined surrogates for failure boundary location in
reliability analysis. This work addresses the issue of prohibitive cost of
reliability analysis using Monte Carlo sampling for expensive-to-evaluate
high-fidelity models by using cheaper-to-evaluate approximations of the
high-fidelity model. The method builds on the Efficient Global Reliability
Analysis (EGRA) method, which is a surrogate-based method that uses adaptive
sampling for refining Gaussian process surrogates for failure boundary location
using a single-fidelity model. Our method introduces a two-stage adaptive
sampling criterion that uses a multifidelity Gaussian process surrogate to
leverage multiple information sources with different fidelities. The method
combines expected feasibility criterion from EGRA with one-step lookahead
information gain to refine the surrogate around the failure boundary. The
computational savings from mfEGRA depends on the discrepancy between the
different models, and the relative cost of evaluating the different models as
compared to the high-fidelity model. We show that accurate estimation of
reliability using mfEGRA leads to computational savings of 46% for an
analytic multimodal test problem and 24% for a three-dimensional acoustic horn
problem, when compared to single-fidelity EGRA. We also show the effect of
using a priori drawn Monte Carlo samples in the implementation for the acoustic
horn problem, where mfEGRA leads to computational savings of 45% for the
three-dimensional case and 48% for a rarer event four-dimensional case as
compared to single-fidelity EGRA.
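The expected feasibility criterion inherited from EGRA scores a candidate point by how likely the Gaussian process prediction is to fall within a band around the failure threshold. A pointwise sketch of the single-fidelity criterion (Bichon et al.'s closed form, using the common choice of band half-width ε = 2σ; mfEGRA's two-stage multifidelity criterion builds on this):

```python
import math

def expected_feasibility(mu, sigma, zbar=0.0):
    """Expected feasibility of a GP prediction N(mu, sigma^2) at one point:
    E[eps - min(|G - zbar|, eps)], large when the prediction is likely
    near the limit state g(x) = zbar. Closed form per Bichon et al.
    """
    phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)  # N(0,1) pdf
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2)))         # N(0,1) cdf
    eps = 2.0 * sigma                      # band half-width (common choice)
    t0 = (zbar - mu) / sigma
    tm = (zbar - eps - mu) / sigma
    tp = (zbar + eps - mu) / sigma
    return ((mu - zbar) * (2 * Phi(t0) - Phi(tm) - Phi(tp))
            - sigma * (2 * phi(t0) - phi(tm) - phi(tp))
            + eps * (Phi(tp) - Phi(tm)))
```

Adaptive sampling then evaluates the (expensive) model where this score is highest, concentrating effort on the failure boundary rather than on the whole input space; mfEGRA additionally weighs which fidelity to query via a one-step lookahead information gain.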