15,374 research outputs found
Fast Deterministic Selection
The Median of Medians (also known as BFPRT) algorithm, although a landmark
theoretical achievement, is seldom used in practice because it and its variants
are slower than simple approaches based on sampling. The main contribution of
this paper is a fast linear-time deterministic selection algorithm
QuickselectAdaptive based on a refined definition of MedianOfMedians. The
algorithm's performance brings deterministic selection---along with its
desirable properties of reproducible runs, predictable run times, and immunity
to pathological inputs---in the range of practicality. We demonstrate results
on independent and identically distributed random inputs and on
normally-distributed inputs. Measurements show that QuickselectAdaptive is
faster than state-of-the-art baselines.Comment: Pre-publication draf
Modern Approaches to Exact Diagonalization and Selected Configuration Interaction with the Adaptive Sampling CI Method.
Recent advances in selected configuration interaction methods have made them competitive with the most accurate techniques available and, hence, creating an increasingly powerful tool for solving quantum Hamiltonians. In this work, we build on recent advances from the adaptive sampling configuration interaction (ASCI) algorithm. We show that a useful paradigm for generating efficient selected CI/exact diagonalization algorithms is driven by fast sorting algorithms, much in the same way iterative diagonalization is based on the paradigm of matrix vector multiplication. We present several new algorithms for all parts of performing a selected CI, which includes new ASCI search, dynamic bit masking, fast orbital rotations, fast diagonal matrix elements, and residue arrays. The ASCI search algorithm can be used in several different modes, which includes an integral driven search and a coefficient driven search. The algorithms presented here are fast and scalable, and we find that because they are built on fast sorting algorithms they are more efficient than all other approaches we considered. After introducing these techniques, we present ASCI results applied to a large range of systems and basis sets to demonstrate the types of simulations that can be practically treated at the full-CI level with modern methods and hardware, presenting double- and triple-ζ benchmark data for the G1 data set. The largest of these calculations is Si2H6 which is a simulation of 34 electrons in 152 orbitals. We also present some preliminary results for fast deterministic perturbation theory simulations that use hash functions to maintain high efficiency for treating large basis sets
Parallel resampling in the particle filter
Modern parallel computing devices, such as the graphics processing unit
(GPU), have gained significant traction in scientific and statistical
computing. They are particularly well-suited to data-parallel algorithms such
as the particle filter, or more generally Sequential Monte Carlo (SMC), which
are increasingly used in statistical inference. SMC methods carry a set of
weighted particles through repeated propagation, weighting and resampling
steps. The propagation and weighting steps are straightforward to parallelise,
as they require only independent operations on each particle. The resampling
step is more difficult, as standard schemes require a collective operation,
such as a sum, across particle weights. Focusing on this resampling step, we
analyse two alternative schemes that do not involve a collective operation
(Metropolis and rejection resamplers), and compare them to standard schemes
(multinomial, stratified and systematic resamplers). We find that, in certain
circumstances, the alternative resamplers can perform significantly faster on a
GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover,
in single precision, the standard approaches are numerically biased for upwards
of hundreds of thousands of particles, while the alternatives are not. This is
particularly important given greater single- than double-precision throughput
on modern devices, and the consequent temptation to use single precision with a
greater number of particles. Finally, we provide auxiliary functions useful for
implementation, such as for the permutation of ancestry vectors to enable
in-place propagation.Comment: 21 pages, 6 figure
- …