Recursive Algorithms for Distributed Forests of Octrees

Author: Burstedde Carsten
Ghattas Omar
Isaac Tobin
Wilcox Lucas C.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 18/11/2014

The forest-of-octrees approach to parallel adaptive mesh refinement and coarsening (AMR) has recently been demonstrated in the context of a number of large-scale PDE-based applications. Although linear octrees, which store only leaf octants, have an underlying tree structure by definition, it is not often exploited in previously published mesh-related algorithms. This is because the branches are not explicitly stored, and because the topological relationships in meshes, such as the adjacency between cells, introduce dependencies that do not respect the octree hierarchy. In this work we combine hierarchical and topological relationships between octree branches to design efficient recursive algorithms. We present three important algorithms with recursive implementations. The first is a parallel search for leaves matching any of a set of multiple search criteria. The second is a ghost layer construction algorithm that handles arbitrarily refined octrees that are not covered by previous algorithms, which require a 2:1 condition between neighboring leaves. The third is a universal mesh topology iterator. This iterator visits every cell in a domain partition, as well as every interface (face, edge and corner) between these cells. The iterator calculates the local topological information for every interface that it visits, taking into account the nonconforming interfaces that increase the complexity of describing the local topology. To demonstrate the utility of the topology iterator, we use it to compute the numbering and encoding of higher-order $C^0$ nodal basis functions. We analyze the complexity of the new recursive algorithms theoretically, and assess their performance, both in terms of single-processor efficiency and in terms of parallel scalability, demonstrating good weak and strong scaling up to 458k cores of the JUQUEEN supercomputer. Comment: 35 pages, 15 figures, 3 tables
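The idea of a recursive search over a linear tree that stores only leaves can be illustrated with a small sketch. This is not the authors' implementation: it is a serial, 2D (quadtree) simplification, and the names `MAX_LEVEL`, `morton_key` and `find_leaves` are hypothetical. It assumes leaves are stored as `(key, level)` pairs sorted by the Morton key of their lower-left corner, so the leaves below any branch form a contiguous slice of the array. The recursion descends only into branches that still contain at least one query point.

```python
from bisect import bisect_left

MAX_LEVEL = 2  # finest refinement level of this toy quadtree (assumption)


def morton_key(x, y):
    """Interleave the bits of integer coordinates (x, y) into a z-order key."""
    key = 0
    for b in range(MAX_LEVEL):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key


def find_leaves(leaves, queries):
    """Map each query key to the index of the leaf that contains it.

    leaves  : list of (key, level) pairs, sorted by key
    queries : Morton keys of the points being searched for
    """
    keys = [k for k, _ in leaves]
    found = {}

    def recurse(lo, hi, branch_key, level, active):
        # leaves[lo:hi] are the leaves below the branch (branch_key, level).
        if not active or lo == hi:
            return  # prune: nothing to find, or no leaves stored here
        if hi - lo == 1 and leaves[lo] == (branch_key, level):
            for q in active:  # the branch itself is a leaf
                found[q] = lo
            return
        span = 4 ** (MAX_LEVEL - level - 1)  # keys covered by one child
        for c in range(4):
            first = branch_key + c * span
            last = first + span
            c_lo = bisect_left(keys, first, lo, hi)
            c_hi = bisect_left(keys, last, lo, hi)
            sub = [q for q in active if first <= q < last]
            recurse(c_lo, c_hi, first, level + 1, sub)

    recurse(0, len(leaves), 0, 0, list(queries))
    return found


# A 4x4 domain whose lower-left quadrant is refined once more.
leaves = [(0, 2), (1, 2), (2, 2), (3, 2), (4, 1), (8, 1), (12, 1)]
points = [morton_key(0, 0), morton_key(1, 1), morton_key(3, 0), morton_key(3, 3)]
print(find_leaves(leaves, points))
```

Because the branches are never stored, each child range is recovered with a binary search on the sorted leaf keys; an octree version would use three interleaved coordinates and eight children per branch.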

arXiv.org e-Print Archive

CiteSeerX

Crossref

Juelich Shared Electronic Resources

Calhoun, Institutional Archive of the Naval Postgraduate School

Equal-time correlation function for directed percolation

Author: Cardy J
H Hinrichsen
Henkel M
Hinrichsen H
I Beljakov
Polyakov A M
Publication venue: 'IOP Publishing'
Publication date: 12/08/2010

We suggest an equal-time n-point correlation function for systems in the directed percolation universality class which is well defined in all phases and independent of initial conditions. It is defined as the probability that all points are connected with a common ancestor in the past by directed paths. Comment: LaTeX, 12 pages, 8 eps figures

arXiv.org e-Print Archive

Crossref

Optimal Resource Allocation in Random Networks with Transportation Bandwidths

Author: C H Yeung
Challet D
Hertz J
Ho Y C
Jordan M I
K Y Michael Wong
Kabashima Y
Mackey D J C
Mézard M
Nishimori H
Opper M
Peterson L
Rardin R L
Shenker S
Publication venue: 'IOP Publishing'
Publication date: 01/01/2009

We apply statistical physics to study the task of resource allocation in random sparse networks with limited bandwidths for the transportation of resources along the links. Useful algorithms are obtained from recursive relations. Bottlenecks emerge when the bandwidths are small, causing an increase in the fraction of idle links. For a given total bandwidth per node, the efficiency of allocation increases with the network connectivity. In the high connectivity limit, we find a phase transition at a critical bandwidth, above which clusters of balanced nodes appear, characterised by a profile of homogenized resource allocation similar to Maxwell's construction. Comment: 28 pages, 11 figures

arXiv.org e-Print Archive

Crossref

Hong Kong University of Science and Technology Institutional Repository

Path storage in the particle filter

Author: Jacob Pierre E.
Murray Lawrence
Rubenthaler Sylvain
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/01/2014

This article considers the problem of storing the paths generated by a particle filter and more generally by a sequential Monte Carlo algorithm. It provides a theoretical result bounding the expected memory cost by $T + C N \log N$, where $T$ is the time horizon, $N$ is the number of particles and $C$ is a constant, as well as an efficient algorithm to realise this. The theoretical result and the algorithm are illustrated with numerical experiments. Comment: 9 pages, 5 figures. To appear in Statistics and Computing
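One way to see why the stored paths can stay small is to keep them as an ancestry tree and discard every state that no longer has a surviving descendant. The sketch below illustrates that idea only; it is a dictionary-based simplification, not the algorithm from the article, and the class `PathStore` and its methods are hypothetical names.

```python
class PathStore:
    """Ancestry tree that keeps only states with a surviving descendant."""

    def __init__(self, initial_states):
        self.parent = {}     # node id -> parent id (None for generation 0)
        self.state = {}      # node id -> stored particle state
        self.offspring = {}  # node id -> number of stored children
        self._next_id = 0
        self.leaves = [self._add(None, s) for s in initial_states]

    def _add(self, parent, state):
        node = self._next_id
        self._next_id += 1
        self.parent[node] = parent
        self.state[node] = state
        self.offspring[node] = 0
        if parent is not None:
            self.offspring[parent] += 1
        return node

    def _prune(self, node):
        # Walk towards the root, deleting nodes that have no children left.
        while node is not None and self.offspring[node] == 0:
            parent = self.parent.pop(node)
            del self.state[node], self.offspring[node]
            if parent is not None:
                self.offspring[parent] -= 1
            node = parent

    def step(self, ancestors, new_states):
        """Append one generation; ancestors[i] indexes the previous leaves."""
        old = self.leaves
        self.leaves = [self._add(old[a], s)
                       for a, s in zip(ancestors, new_states)]
        for node in old:
            self._prune(node)

    def path(self, i):
        """Return the full state history of current particle i."""
        node, states = self.leaves[i], []
        while node is not None:
            states.append(self.state[node])
            node = self.parent[node]
        return states[::-1]


store = PathStore(["a0", "b0", "c0"])
store.step([0, 0, 1], ["a1", "b1", "c1"])  # particle c0 dies out
store.step([0, 0, 0], ["a2", "b2", "c2"])  # all paths now share a0 -> a1
print(len(store.state), store.path(2))
```

After the second step the three surviving paths share their first two states, so only five states are stored rather than nine.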

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

HAL-UNICE

Parallel resampling in the particle filter

Author: Jacob Pierre E.
Lee Anthony
Murray Lawrence M.
Publication venue
Publication date: 11/06/2015

Modern parallel computing devices, such as the graphics processing unit (GPU), have gained significant traction in scientific and statistical computing. They are particularly well-suited to data-parallel algorithms such as the particle filter, or more generally Sequential Monte Carlo (SMC), which are increasingly used in statistical inference. SMC methods carry a set of weighted particles through repeated propagation, weighting and resampling steps. The propagation and weighting steps are straightforward to parallelise, as they require only independent operations on each particle. The resampling step is more difficult, as standard schemes require a collective operation, such as a sum, across particle weights. Focusing on this resampling step, we analyse two alternative schemes that do not involve a collective operation (Metropolis and rejection resamplers), and compare them to standard schemes (multinomial, stratified and systematic resamplers). We find that, in certain circumstances, the alternative resamplers can perform significantly faster on a GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover, in single precision, the standard approaches are numerically biased for upwards of hundreds of thousands of particles, while the alternatives are not. This is particularly important given greater single- than double-precision throughput on modern devices, and the consequent temptation to use single precision with a greater number of particles. Finally, we provide auxiliary functions useful for implementation, such as for the permutation of ancestry vectors to enable in-place propagation. Comment: 21 pages, 6 figures
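The contrast between a standard scheme and one that avoids a collective operation can be sketched in a few lines. The following is a serial Python illustration under stated assumptions, not the authors' GPU code: `systematic_resample` needs a prefix sum over all weights, while `metropolis_resample` only ever compares pairs of weights. The function names and the `steps` parameter (the number of Metropolis iterations per particle, a tuning choice) are assumptions made for this example.

```python
import random
from itertools import accumulate


def systematic_resample(weights, rng):
    """Standard scheme: one uniform draw plus a prefix sum over all weights."""
    n = len(weights)
    cumulative = list(accumulate(weights))  # the collective operation
    total = cumulative[-1]
    u = 1.0 - rng.random()                  # uniform on (0, 1]
    ancestors, j = [], 0
    for i in range(n):
        target = (i + u) / n * total
        while j < n - 1 and cumulative[j] < target:
            j += 1
        ancestors.append(j)
    return ancestors


def metropolis_resample(weights, steps, rng):
    """Alternative scheme: each particle runs its own short Markov chain.

    Only ratios of two weights are used, so no sum over all weights is
    needed. The result is approximate unless `steps` is large enough.
    """
    n = len(weights)
    ancestors = []
    for i in range(n):
        k = i
        for _ in range(steps):
            j = rng.randrange(n)
            # Accept the proposal j with probability min(1, w[j] / w[k]).
            if rng.random() * weights[k] <= weights[j]:
                k = j
        ancestors.append(k)
    return ancestors


rng = random.Random(1)
w = [0.1, 0.2, 0.3, 0.4]
print(systematic_resample(w, rng))
print(metropolis_resample(w, 20, rng))
```

Each chain in `metropolis_resample` is independent of the others, which is what makes the scheme attractive on data-parallel hardware; the price is a bias that shrinks as `steps` grows.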

arXiv.org e-Print Archive

FigShare