Afivo: a framework for quadtree/octree AMR with shared-memory parallelization and geometric multigrid methods
Afivo is a framework for simulations with adaptive mesh refinement (AMR) on
quadtree (2D) and octree (3D) grids. The framework comes with a geometric
multigrid solver, shared-memory (OpenMP) parallelism and it supports output in
Silo and VTK file formats. Afivo can be used to efficiently simulate AMR
problems with large numbers of unknowns on desktops, workstations, or single
compute nodes. For larger problems, existing distributed-memory frameworks are
better suited. The framework has no built-in functionality for specific physics
applications, so users have to implement their own numerical methods. The
included multigrid solver can be used to efficiently solve elliptic partial
differential equations such as Poisson's equation. Afivo's design was kept
simple, which in combination with the shared-memory parallelism facilitates
modification and experimentation with AMR algorithms. The framework was already
used to perform 3D simulations of streamer discharges, which required tens of
millions of cells.
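The geometric multigrid idea at the heart of such a framework can be illustrated with a minimal 1D V-cycle for Poisson's equation. This is a hedged Python sketch, not Afivo's actual implementation; the injection restriction, linear-interpolation prolongation, and function names are choices made here for brevity.

```python
import numpy as np

def smooth(u, f, h, iters=3):
    """Gauss-Seidel smoothing sweeps for -u'' = f on a uniform grid."""
    for _ in range(iters):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def v_cycle(u, f, h):
    """One V-cycle: smooth, restrict the residual, recurse, correct, smooth."""
    u = smooth(u, f, h)
    if len(u) <= 3:
        return u
    # residual of -u'' = f
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] + (u[:-2] - 2.0 * u[1:-1] + u[2:]) / (h * h)
    # restrict by injection to every other point; solve the error equation
    coarse = v_cycle(np.zeros(len(u) // 2 + 1), r[::2], 2.0 * h)
    # prolong the correction by linear interpolation and apply it
    u += np.interp(np.arange(len(u)), np.arange(0, len(u), 2), coarse)
    return smooth(u, f, h)
```

A handful of V-cycles reduces the algebraic error to below the discretization error, which is what makes multigrid an optimal-complexity elliptic solver.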
DISPATCH: A Numerical Simulation Framework for the Exa-scale Era. I. Fundamentals
We introduce a high-performance simulation framework that permits the
semi-independent, task-based solution of sets of partial differential
equations, typically manifesting as updates to a collection of `patches' in
space-time. A hybrid MPI/OpenMP execution model is adopted, where work tasks
are controlled by a rank-local `dispatcher' which selects, from a set of tasks
generally much larger than the number of physical cores (or hardware threads),
tasks that are ready for updating. The definition of a task can vary, for
example, with some solving the equations of ideal magnetohydrodynamics (MHD),
others non-ideal MHD, radiative transfer, or particle motion, and yet others
applying particle-in-cell (PIC) methods. Tasks do not have to be grid-based,
while tasks that are, may use either Cartesian or orthogonal curvilinear
meshes. Patches may be stationary or moving. Mesh refinement can be static or
dynamic. A feature of decisive importance for the overall performance of the
framework is that time steps are determined and applied locally; this allows
potentially large reductions in the total number of updates required in cases
when the signal speed varies greatly across the computational domain, and
therefore a corresponding reduction in computing time. Another feature is a
load balancing algorithm that operates `locally' and aims to simultaneously
minimise load and communication imbalance. The framework generally relies on
already existing solvers, whose performance is augmented when run under the
framework, due to more efficient cache usage, vectorisation, local
time-stepping, plus near-linear and, in principle, unlimited OpenMP and MPI
scaling. Comment: 17 pages, 8 figures. Accepted by MNRAS.
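The dispatcher idea, in which each task carries its own local time step and the task furthest behind in time is updated next, can be sketched as follows. This is a hypothetical serial Python illustration only; DISPATCH itself is a hybrid MPI/OpenMP framework, and the `Task`/`dispatch` names are invented here.

```python
import heapq

class Task:
    """A work item (e.g. a patch update) with its own local time step."""
    def __init__(self, name, dt):
        self.name, self.dt, self.t = name, dt, 0.0

def dispatch(tasks, t_end):
    """Repeatedly update the task that lags furthest behind in time,
    advancing it by its own local step; returns the total update count."""
    heap = [(task.t, i) for i, task in enumerate(tasks)]
    heapq.heapify(heap)
    updates = 0
    while heap:
        _, i = heapq.heappop(heap)
        task = tasks[i]
        task.t = min(task.t + task.dt, t_end)  # one local time step
        updates += 1
        if task.t < t_end:
            heapq.heappush(heap, (task.t, i))
    return updates
```

Tasks in regions with large signal speeds (small `dt`) are updated often, while slow regions are touched rarely, which is the source of the reduction in total updates the abstract describes.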
Hydra: A Parallel Adaptive Grid Code
We describe the first parallel implementation of an adaptive
particle-particle, particle-mesh code with smoothed particle hydrodynamics.
Parallelisation of the serial code, "Hydra", is achieved using CRAFT, a Cray
proprietary language that enables rapid implementation of a serial code on a
parallel machine through global addressing of distributed memory.
The collisionless variant of the code has already completed several 16.8
million particle cosmological simulations on a 128 processor Cray T3D whilst
the full hydrodynamic code has completed several 4.2 million particle combined
gas and dark matter runs. The efficiency of the code now allows parameter-space
explorations to be performed routinely with large numbers of particles of each
species.
A complete run including gas cooling, from high redshift to the present epoch
requires approximately 10 hours on 64 processors.
In this paper we present implementation details and results of the
performance and scalability of the CRAFT version of Hydra under varying degrees
of particle clustering. Comment: 23 pages, LaTeX plus encapsulated figures.
A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws
We report on the development of a computational framework for the parallel,
mesh-adaptive solution of systems of hyperbolic conservation laws like the
time-dependent Euler equations in compressible gas dynamics or
Magneto-Hydrodynamics (MHD) and similar models in plasma physics. Local mesh
refinement is realized by the recursive bisection of grid blocks along each
spatial dimension, implemented numerical schemes include standard
finite-differences as well as shock-capturing central schemes, both in
connection with Runge-Kutta type integrators. Parallel execution is achieved
through a configurable hybrid of POSIX-multi-threading and MPI-distribution
with dynamic load balancing. One-, two-, and three-dimensional test computations
for the Euler equations have been carried out and show good parallel scaling
behavior. The Racoon framework is currently used to study the formation of
singularities in plasmas and fluids. Comment: late submission.
Asynchronous and corrected-asynchronous numerical solutions of parabolic PDEs on MIMD multiprocessors
A major problem in achieving significant speed-up on parallel machines is the overhead involved in synchronizing the concurrent processes. Removing the synchronization constraint has the potential of speeding up the computation. The authors present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although the discussion concentrates on the Euler scheme for the solution of the heat equation, it has the potential to be extended to other schemes and other parabolic partial differential equations (PDEs). These schemes are analyzed and implemented on the shared-memory multi-user Sequent Balance machine. Numerical results for one- and two-dimensional problems are presented. It is shown experimentally that the synchronization penalty can be about 50 percent of run time; in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time level, and with the problem dimension. The efficiency of the AS scheme may reach 90 percent or more, but it provides accurate results only for steady-state values. The CA scheme, on the other hand, is less efficient but provides more accurate results for intermediate (non-steady-state) values.
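The contrast between a synchronous explicit Euler step and an asynchronous-style update for the 1D heat equation can be sketched as follows. This is a serial Python illustration of the idea only: the in-place sweep stands in for processors reading whatever neighbour values are currently available, and it is not the authors' exact AS/CA schemes.

```python
import numpy as np

def sync_step(u, r):
    """Synchronous explicit Euler step for u_t = u_xx:
    every point reads neighbour values from the same time level."""
    un = u.copy()
    un[1:-1] = u[1:-1] + r * (u[:-2] - 2.0 * u[1:-1] + u[2:])
    return un

def async_step(u, r, order):
    """Asynchronous-style step: points are updated in place, in an
    arbitrary order, so a neighbour may already hold a new-time value
    (modelling processors that proceed without synchronising)."""
    for i in order:
        u[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return u
```

With zero boundary values both variants relax toward the steady state; the asynchronous sweep avoids the barrier implied by `u.copy()`, which is where the synchronization penalty measured in the paper comes from.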
A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations
We discuss the scalable parallel solution of the Poisson equation within a
Particle-In-Cell (PIC) code for the simulation of electron beams in particle
accelerators of irregular shape. The problem is discretized by Finite
Differences. Depending on the treatment of the Dirichlet boundary the resulting
system of equations is symmetric or `mildly' nonsymmetric positive definite. In
all cases, the system is solved by the preconditioned conjugate gradient
algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG)
preconditioning. We investigate variants of the implementation of SA-AMG that
lead to considerable improvements in the execution times. We demonstrate good
scalability of the solver on a distributed-memory parallel computer with up to
2048 processors. We also compare our SA-AMG PCG solver with an FFT-based solver
that is more commonly used for applications in beam dynamics.
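The preconditioned conjugate gradient iteration at the core of such a solver can be written down generically. In this hedged Python sketch a simple diagonal (Jacobi) preconditioner stands in for the paper's SA-AMG preconditioner, and the function names are invented for illustration.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, maxit=200):
    """Preconditioned conjugate gradients for a symmetric positive
    definite matrix A; M_inv applies the preconditioner to a residual."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p                 # update the iterate
        r -= alpha * Ap                # update the residual
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)                   # apply the preconditioner
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # new search direction
        rz = rz_new
    return x
```

Swapping `M_inv` for an AMG V-cycle is exactly what turns this textbook iteration into the SA-AMG PCG solver the abstract benchmarks.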
Quantitative analysis of electric vehicle flexibility: a data-driven approach
Electric vehicle (EV) flexibility indicates to what extent the charging load can be coordinated (i.e., to flatten the load curve or to utilize renewable energy resources). However, such flexibility is neither well analyzed nor effectively quantified in the literature. In this paper we fill this gap and offer an extensive analysis of the flexibility characteristics of 390k EV charging sessions, and propose measures to quantify their flexibility exploitation. Our contributions include: (1) characterization of EV charging behavior by clustering the arrival and departure time combinations, leading to the identification of types of EV charging behavior, (2) in-depth analysis of the characteristics of the charging sessions in each behavioral cluster, and investigation of the influence of weekdays and seasonal changes on those characteristics, including arrival, sojourn, and idle times, and (3) proposal of measures and an algorithm to quantitatively analyze how much flexibility (in terms of duration and amount) is used at various times of day, for two representative scenarios. Understanding the characteristics of that flexibility (e.g., amount, time, and duration of availability) and when it is used (in terms of both duration and amount) helps to develop more realistic price and incentive schemes in DR algorithms to efficiently exploit the offered flexibility, or to estimate when to stimulate additional flexibility.
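The first contribution, clustering sessions by their arrival/departure time combinations, can be illustrated with a naive k-means sketch on synthetic data. This is hypothetical: the paper's actual clustering method, features, and data differ, and the deterministic initialisation below is a simplification chosen for reproducibility.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Naive k-means sketch: group (arrival, departure) time pairs of
    charging sessions into behavioural clusters. Centres are initialised
    with k evenly spaced samples for determinism."""
    centers = X[:: max(len(X) // k, 1)][:k].astype(float)
    for _ in range(iters):
        # assign each session to the nearest centre (squared distance)
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        # recompute each centre as the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

On session data, clusters found this way would correspond to behavioural types such as "charge at work" (morning arrival, late-afternoon departure) versus "charge overnight at home".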
Flow-level performance analysis of data networks using processor sharing models
Most telecommunication systems are dynamic in nature. The state of the network changes constantly as new transmissions appear and depart. In order to capture the behavior of such systems and to realistically evaluate their performance, it is essential to use dynamic models in the analysis. In this thesis, we model and analyze networks carrying elastic data traffic at flow level using stochastic queueing systems. We develop performance analysis methodology, as well as model and analyze example systems.
The exact analysis of stochastic models is difficult and usually becomes computationally intractable when the size of the network increases, and hence efficient approximative methods are needed. In this thesis, we use two performance approximation methods. Value extrapolation is a novel approximative method developed during this work and based on the theory of Markov decision processes. It can be used to approximate the performance measures of Markov processes. When applied to queueing systems, value extrapolation makes possible heavy state space truncation while providing accurate results without significant computational penalties. Balanced fairness is a capacity allocation scheme recently introduced by Bonald and Proutière that simplifies performance analysis and requires less restrictive assumptions about the traffic than other capacity allocation schemes. We introduce an approximation method based on balanced fairness and the Monte Carlo method for evaluating large sums that can be used to estimate the performance of systems of moderate size with low or medium loads.
The performance analysis methods are applied in two settings: load balancing in fixed networks and the analysis of wireless networks. The aim of load balancing is to divide the traffic load efficiently between the network resources in order to improve the performance. On the basis of the insensitivity results of Bonald and Proutière, we study both packet- and flow-level balancing in fixed data networks. We also study load balancing between multiple parallel discriminatory processor sharing queues and compare different balancing policies.
In the final part of the thesis, we analyze the performance of wireless networks carrying elastic data traffic. Wireless networks are gaining more and more popularity, as their advantages, such as easier deployment and mobility, outweigh their downsides. First, we discuss a simple cellular network with link adaptation consisting of two base stations and customers located on a line between them. We model the system and analyze the performance using different capacity allocation policies. Wireless multihop networks are analyzed using two different MAC schemes. On the basis of earlier work by Penttinen et al., we analyze the performance of networks using the STDMA MAC protocol. We also study multihop networks with random access, assuming that the transmission probabilities can be adapted upon flow arrivals and departures. We compare the throughput behavior of flow-optimized random access against the throughput obtained by optimal scheduling assuming balanced fairness capacity allocation.
Efficient hierarchical approximation of high-dimensional option pricing problems
A major challenge in computational finance is the pricing of options that depend on a large number of risk factors. Prominent examples are basket or index options, where dozens or even hundreds of stocks constitute the underlying asset and determine the dimensionality of the corresponding degenerate parabolic equation. The objective of this article is to show how an efficient discretisation can be achieved by hierarchical approximation as well as asymptotic expansions of the underlying continuous problem. The relation to a number of state-of-the-art methods is highlighted.
A New Parallel N-body Gravity Solver: TPM
We have developed a gravity solver based on combining the well developed
Particle-Mesh (PM) method and TREE methods. It is designed for and has been
implemented on parallel computer architectures. The new code can deal with tens
of millions of particles on current computers, with the calculation done on a
parallel supercomputer or a group of workstations. Typically, the spatial
resolution is enhanced by more than a factor of 20 over the pure PM code with
mass resolution retained at nearly the PM level. This code runs much faster
than a pure TREE code with the same number of particles and maintains almost
the same resolution in high density regions. Multiple time step integration has
also been implemented with the code, with second order time accuracy. The
performance of the code has been checked on several kinds of parallel computer
configurations, including the IBM SP1, the SGI Challenge, and a group of workstations,
with the speedup of the parallel code on a 32 processor IBM SP2 supercomputer
nearly linear in the number of processors. The computation/communication ratio
is also very high, which means the code spends most of its CPU time in
computation. Comment: 21 pages, LaTeX file. Figures available from anonymous
ftp to astro.princeton.edu under /xu/tpm.ps, POP-57.