Expansion around half-integer values, binomial sums and inverse binomial sums
I consider the expansion of transcendental functions in a small parameter
around rational numbers. This includes in particular the expansion around
half-integer values. I present algorithms which are suitable for an
implementation within a symbolic computer algebra system. The method is an
extension of the technique of nested sums. The algorithms allow in addition the
evaluation of binomial sums, inverse binomial sums and generalizations thereof. Comment: 21 page
The infrared structure of e+ e- --> 3 jets at NNLO reloaded
This paper gives detailed information on the structure of the infrared
singularities for the process e+ e- --> 3 jets at next-to-next-to-leading order
in perturbation theory. Particular emphasis is put on singularities associated
to soft gluons. The knowledge of the singularity structure allows the
construction of appropriate subtraction terms, which in turn can be implemented
into a numerical Monte Carlo program. Comment: 59 pages, additional comments added, version to be publishe
SFC-based Communication Metadata Encoding for Adaptive Mesh
This volume of the series “Advances in Parallel Computing” contains the proceedings of the International Conference on Parallel Programming – ParCo 2013 – held from 10 to 13 September 2013 in Garching, Germany. The conference was hosted by the Technische Universität München (Department of Informatics) and the Leibniz Supercomputing Centre.
The present paper studies two adaptive mesh refinement (AMR) codes
whose grids rely on recursive subdivision in combination with space-filling curves
(SFCs). A non-overlapping domain decomposition based upon these SFCs yields
several well-known advantageous properties with respect to communication demands,
balancing, and partition connectivity. However, the administration of the
meta data, i.e. tracking which partitions exchange data and in which cardinality, is nontrivial
due to the SFC’s fractal meandering and the dynamic adaptivity. We introduce
an analysed tree grammar for the meta data that restricts it without loss of
information hierarchically along the subdivision tree and applies run length encoding.
Hence, its meta data memory footprint is very small, and it can be computed
and maintained on-the-fly even for permanently changing grids. It facilitates a fork-join
pattern for shared data parallelism. It also facilitates replicated data parallelism
that tackles latency and bandwidth constraints through communication in
the background, and it reduces memory requirements by avoiding adjacency information
stored per element. We demonstrate this by means of shared and distributed
parallelized domain decompositions. This work was supported by the German Research Foundation (DFG) as part of the
Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). It is
partially based on work supported by Award No. UK-c0020, made by the King Abdullah
University of Science and Technology (KAUST)
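
The run length encoding mentioned in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the meta data is a list of neighbour partition ranks recorded for the boundary faces in the order in which the space-filling curve visits them; the function names and the sample data are hypothetical and are not taken from the paper, and the hierarchical tree grammar itself is not modelled here.

```python
from itertools import groupby


def run_length_encode(neighbour_ranks):
    """Collapse consecutive equal ranks into (rank, count) pairs."""
    return [(rank, sum(1 for _ in group)) for rank, group in groupby(neighbour_ranks)]


def run_length_decode(runs):
    """Expand (rank, count) pairs back into the flat list of ranks."""
    return [rank for rank, count in runs for _ in range(count)]


# Hypothetical boundary of one partition: each entry is the rank of the
# partition on the other side of a boundary face, in SFC order.
boundary = [1, 1, 1, 2, 2, 1, 3, 3, 3, 3]
runs = run_length_encode(boundary)
print(runs)
```

Because a space-filling curve tends to keep neighbouring faces of the same partition contiguous, long runs are plausible, which is why such an encoding can keep the footprint small. Each pair also states how many items go to which partition, i.e. the cardinality of the exchange.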
On-the-fly memory compression for multibody algorithms.
Memory and bandwidth demands challenge developers of particle-based codes that have to scale on new architectures, as the growth of concurrency outperforms improvements in memory access facilities, as the memory per core tends to stagnate, and as communication networks cannot increase bandwidth arbitrarily. We propose to analyse each particle of such a code to find out whether a hierarchical data representation storing data with reduced precision caps the memory demands without exceeding given error bounds. For admissible candidates, we perform this compression and thus reduce the pressure on the memory subsystem, lower the total memory footprint and reduce the data to be exchanged via MPI. Notably, our analysis and transformation change the data compression dynamically, i.e. the choice of data format follows the solution characteristics, and they do not require us to alter the core simulation code
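
The idea of a reduced-precision hierarchical representation under an error bound can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scheme: it assumes that a particle coordinate is stored as an offset relative to a reference value (for example a cell centre), and that the candidate formats are IEEE half, single and double precision. The names `compress`, `decompress` and `FORMATS` are hypothetical.

```python
import struct

# Candidate storage formats, cheapest first: half, single, double precision.
FORMATS = ("e", "f", "d")


def compress(value, reference, tolerance):
    """Store value as an offset from reference in the smallest admissible format."""
    offset = value - reference
    for fmt in FORMATS:
        try:
            payload = struct.pack("<" + fmt, offset)
        except OverflowError:
            continue  # offset does not fit into this format at all
        restored = reference + struct.unpack("<" + fmt, payload)[0]
        if abs(restored - value) <= tolerance:
            return fmt, payload
    # Fall back to double precision if no format met the tolerance.
    return "d", struct.pack("<d", offset)


def decompress(fmt, payload, reference):
    """Reconstruct the value from its stored offset."""
    return reference + struct.unpack("<" + fmt, payload)[0]


fmt, payload = compress(0.6234567, reference=0.5, tolerance=1e-2)
print(fmt, len(payload), decompress(fmt, payload, 0.5))
```

The format is chosen per value at run time, so the choice can follow the solution: values close to their reference and with loose error bounds end up in two bytes, while others keep the full eight.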
Efficient GPU Offloading with OpenMP for a Hyperbolic Finite Volume Solver on Dynamically Adaptive Meshes
We identify and show how to overcome an OpenMP bottleneck in the administration of GPU memory. It arises for a wave equation solver on dynamically adaptive block-structured Cartesian meshes, which keeps all CPU threads busy and allows all of them to offload sets of patches to the GPU. Our studies show that multithreaded, concurrent, non-deterministic access to the GPU leads to performance breakdowns, since the GPU memory bookkeeping as offered through OpenMP’s map clause, i.e., the allocation and freeing, becomes another runtime challenge besides expensive data transfer and actual computation. We, therefore, propose to retain the memory management responsibility on the host: A caching mechanism acquires memory on the accelerator for all CPU threads, keeps hold of this memory and hands it out to the offloading threads upon demand. We show that this user-managed, CPU-based memory administration helps us to overcome the GPU memory bookkeeping bottleneck and speeds up the time-to-solution of Finite Volume kernels by more than an order of magnitude
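
The host-side caching mechanism described above can be mimicked in a few lines. The sketch below only models the bookkeeping: the accelerator allocation is replaced by a plain `bytearray`, and the class `BufferCache` with its `acquire` and `release` methods is a hypothetical name, not an OpenMP call or the authors' code. It assumes that buffers are reused only for requests of exactly the same size.

```python
import threading
from collections import defaultdict


class BufferCache:
    """Hands out buffers to threads and keeps released ones for reuse."""

    def __init__(self, allocate):
        self._allocate = allocate          # stand-in for the device allocation
        self._free = defaultdict(list)     # size -> list of idle buffers
        self._lock = threading.Lock()
        self.allocations = 0               # number of real allocations so far

    def acquire(self, size):
        """Return an idle buffer of this size, allocating only if none exists."""
        with self._lock:
            if self._free[size]:
                return self._free[size].pop()
            self.allocations += 1
        return self._allocate(size)

    def release(self, buffer):
        """Keep the buffer for later reuse instead of freeing it."""
        with self._lock:
            self._free[len(buffer)].append(buffer)


cache = BufferCache(bytearray)
first = cache.acquire(1024)
cache.release(first)
second = cache.acquire(1024)   # served from the cache, no new allocation
print(cache.allocations)
```

The point of the design is that the expensive allocate and free operations happen once per buffer size rather than once per offloaded set of patches, while the lock only protects a short list operation.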
Fully differential QCD corrections to single top quark final states
A new next-to-leading order Monte Carlo program for calculation of fully
differential single top quark final states is described and first results
presented. Both the s- and t-channel contributions are included. Comment: 3 pages, 3 figures, talk presented at DPF2000, August 9-12, 2000. To
appear in International Journal of Modern Physics