
    Afivo: a framework for quadtree/octree AMR with shared-memory parallelization and geometric multigrid methods

    Afivo is a framework for simulations with adaptive mesh refinement (AMR) on quadtree (2D) and octree (3D) grids. The framework comes with a geometric multigrid solver, shared-memory (OpenMP) parallelism, and support for output in the Silo and VTK file formats. Afivo can be used to efficiently simulate AMR problems with up to about $10^8$ unknowns on desktops, workstations or single compute nodes. For larger problems, existing distributed-memory frameworks are better suited. The framework has no built-in functionality for specific physics applications, so users have to implement their own numerical methods. The included multigrid solver can be used to efficiently solve elliptic partial differential equations such as Poisson's equation. Afivo's design was kept simple, which in combination with the shared-memory parallelism facilitates modification and experimentation with AMR algorithms. The framework has already been used to perform 3D simulations of streamer discharges, which required tens of millions of cells.
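
    To make the multigrid component concrete, below is a minimal sketch of a geometric multigrid V-cycle for the 1D Poisson equation, the kind of elliptic solve Afivo's solver performs on its AMR grids. All names and the smoother/transfer choices are illustrative assumptions, not Afivo's actual (Fortran) API.

        # Illustrative sketch of a geometric multigrid V-cycle for -u'' = f;
        # not Afivo's API.
        import numpy as np

        def smooth(u, f, h, iters=2):
            """Gauss-Seidel smoothing for -u'' = f on a uniform grid."""
            for _ in range(iters):
                for i in range(1, len(u) - 1):
                    u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            return u

        def residual(u, f, h):
            r = np.zeros_like(u)
            r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
            return r

        def v_cycle(u, f, h):
            if len(u) <= 3:                      # coarsest level: just relax
                return smooth(u, f, h, iters=20)
            u = smooth(u, f, h)                  # pre-smoothing
            r = residual(u, f, h)
            r_c = r[::2].copy()                  # restrict by full weighting
            r_c[1:-1] = 0.25 * (r[1:-2:2] + r[3::2]) + 0.5 * r[2:-1:2]
            e_c = v_cycle(np.zeros_like(r_c), r_c, 2 * h)
            e = np.zeros_like(u)                 # prolong by linear interpolation
            e[::2] = e_c
            e[1::2] = 0.5 * (e_c[:-1] + e_c[1:])
            return smooth(u + e, f, h)           # post-smoothing

        n = 129
        h = 1.0 / (n - 1)
        x = np.linspace(0.0, 1.0, n)
        f = np.pi**2 * np.sin(np.pi * x)         # exact solution: sin(pi x)
        u = np.zeros(n)
        for _ in range(10):                      # a few V-cycles reduce the error fast
            u = v_cycle(u, f, h)
        print("max error:", np.abs(u - np.sin(np.pi * x)).max())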

    DISPATCH: A Numerical Simulation Framework for the Exa-scale Era. I. Fundamentals

    We introduce a high-performance simulation framework that permits the semi-independent, task-based solution of sets of partial differential equations, typically manifesting as updates to a collection of 'patches' in space-time. A hybrid MPI/OpenMP execution model is adopted, where work tasks are controlled by a rank-local 'dispatcher' which selects, from a set of tasks generally much larger than the number of physical cores (or hardware threads), tasks that are ready for updating. The definition of a task can vary, for example, with some solving the equations of ideal magnetohydrodynamics (MHD), others non-ideal MHD, radiative transfer, or particle motion, and yet others applying particle-in-cell (PIC) methods. Tasks do not have to be grid-based, while tasks that are may use either Cartesian or orthogonal curvilinear meshes. Patches may be stationary or moving. Mesh refinement can be static or dynamic. A feature of decisive importance for the overall performance of the framework is that time steps are determined and applied locally; this allows potentially large reductions in the total number of updates required when the signal speed varies greatly across the computational domain, with a corresponding reduction in computing time. Another feature is a load balancing algorithm that operates 'locally' and aims to simultaneously minimise load and communication imbalance. The framework generally relies on already existing solvers, whose performance is augmented when run under the framework, due to more efficient cache usage, vectorisation, local time-stepping, plus near-linear and, in principle, unlimited OpenMP and MPI scaling. Comment: 17 pages, 8 figures. Accepted by MNRAS.
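
    The core dispatch idea, local time steps with the least-advanced ready task updated first, can be sketched in a few lines. The Patch class and CFL-based step below are illustrative assumptions in Python, not DISPATCH's Fortran interfaces, and neighbour dependencies and MPI are omitted.

        # Illustrative sketch of rank-local dispatching with local time steps,
        # in the spirit of DISPATCH; not its actual implementation.
        import heapq

        class Patch:
            def __init__(self, pid, signal_speed, dx=1.0, cfl=0.5):
                self.pid = pid
                self.time = 0.0
                self.dt = cfl * dx / signal_speed   # local CFL time step
                self.updates = 0

            def update(self):
                # Stand-in for an MHD/PIC/radiative-transfer solver step.
                self.time += self.dt
                self.updates += 1

        def dispatch(patches, t_end):
            # Priority queue keyed on local time: the least-advanced task is
            # always 'ready' in this serial sketch (no neighbour dependencies).
            heap = [(p.time, p.pid, p) for p in patches]
            heapq.heapify(heap)
            while heap:
                t, pid, p = heapq.heappop(heap)
                if t >= t_end:
                    continue
                p.update()
                heapq.heappush(heap, (p.time, pid, p))

        # Signal speed varying 100x across the domain: local stepping lets the
        # slow patches take far fewer updates than a global time step would.
        patches = [Patch(0, 100.0), Patch(1, 10.0), Patch(2, 1.0)]
        dispatch(patches, t_end=1.0)
        for p in patches:
            print(f"patch {p.pid}: {p.updates} updates")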

    Hydra: A Parallel Adaptive Grid Code

    We describe the first parallel implementation of an adaptive particle-particle, particle-mesh code with smoothed particle hydrodynamics. Parallelisation of the serial code, "Hydra", is achieved using CRAFT, a Cray proprietary language which allows rapid implementation of a serial code on a parallel machine by allowing global addressing of distributed memory. The collisionless variant of the code has already completed several 16.8 million particle cosmological simulations on a 128 processor Cray T3D, whilst the full hydrodynamic code has completed several 4.2 million particle combined gas and dark matter runs. The efficiency of the code now allows parameter-space explorations to be performed routinely using $64^3$ particles of each species. A complete run including gas cooling, from high redshift to the present epoch, requires approximately 10 hours on 64 processors. In this paper we present implementation details and results on the performance and scalability of the CRAFT version of Hydra under varying degrees of particle clustering. Comment: 23 pages, LaTeX plus encapsulated figures.
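
    As a rough illustration of the particle-mesh half of such a P3M code, the sketch below deposits particle mass onto a mesh with cloud-in-cell weighting and solves the periodic Poisson equation by FFT, in 1D for brevity. It is a toy stand-in, not Hydra's 3D adaptive implementation.

        # Illustrative 1D particle-mesh (PM) step: CIC deposit + FFT Poisson
        # solve; not Hydra itself.
        import numpy as np

        def pm_potential(positions, masses, n_cells, box=1.0):
            dx = box / n_cells
            rho = np.zeros(n_cells)
            # Cloud-in-cell (CIC): share each particle's mass between the two
            # nearest cells, with periodic wrap-around.
            cell = positions / dx - 0.5
            left = np.floor(cell).astype(int)
            frac = cell - left
            np.add.at(rho, left % n_cells, masses * (1.0 - frac))
            np.add.at(rho, (left + 1) % n_cells, masses * frac)
            rho /= dx
            # Solve  phi'' = 4 pi G rho  (G = 1) in Fourier space.
            k = 2.0 * np.pi * np.fft.fftfreq(n_cells, d=dx)
            rho_k = np.fft.fft(rho - rho.mean())   # remove mean (periodic box)
            phi_k = np.zeros_like(rho_k)
            phi_k[1:] = -4.0 * np.pi * rho_k[1:] / k[1:]**2
            return np.real(np.fft.ifft(phi_k))

        phi = pm_potential(np.random.rand(10000), np.ones(10000), n_cells=64)
        print("potential range:", phi.min(), phi.max())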

    A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws

    We report on the development of a computational framework for the parallel, mesh-adaptive solution of systems of hyperbolic conservation laws such as the time-dependent Euler equations of compressible gas dynamics, magnetohydrodynamics (MHD), and similar models in plasma physics. Local mesh refinement is realized by the recursive bisection of grid blocks along each spatial dimension; the implemented numerical schemes include standard finite differences as well as shock-capturing central schemes, both in connection with Runge-Kutta type integrators. Parallel execution is achieved through a configurable hybrid of POSIX multi-threading and MPI distribution with dynamic load balancing. One-, two- and three-dimensional test computations for the Euler equations have been carried out and show good parallel scaling behavior. The Racoon framework is currently used to study the formation of singularities in plasmas and fluids. Comment: late submission.
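
    The refinement strategy, recursive bisection of grid blocks along each spatial dimension, can be sketched as follows. The Block class and the refinement criterion are illustrative assumptions, not Racoon's data structures.

        # Illustrative sketch of block refinement by recursive bisection;
        # not Racoon's implementation.
        import numpy as np

        class Block:
            def __init__(self, lo, hi, level=0):
                self.lo, self.hi, self.level = np.asarray(lo), np.asarray(hi), level
                self.children = []

            def refine(self, needs_refinement, max_level=5):
                if self.level >= max_level or not needs_refinement(self.lo, self.hi):
                    return
                # Bisect along every dimension: 2^d children (4 in 2D, 8 in 3D).
                mid = 0.5 * (self.lo + self.hi)
                dim = len(self.lo)
                for corner in range(2 ** dim):
                    lo = self.lo.copy()
                    hi = mid.copy()
                    for d in range(dim):
                        if corner >> d & 1:
                            lo[d], hi[d] = mid[d], self.hi[d]
                    child = Block(lo, hi, self.level + 1)
                    child.refine(needs_refinement, max_level)
                    self.children.append(child)

        def leaves(block):
            if not block.children:
                return [block]
            return [l for c in block.children for l in leaves(c)]

        # Refine wherever a block touches a "shock" at x = 0.5.
        criterion = lambda lo, hi: lo[0] <= 0.5 <= hi[0]
        root = Block([0.0, 0.0], [1.0, 1.0])
        root.refine(criterion)
        print(len(leaves(root)), "leaf blocks")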

    Asynchronous and corrected-asynchronous numerical solutions of parabolic PDES on MIMD multiprocessors

    A major problem in achieving significant speed-up on parallel machines is the overhead involved with synchronizing the concurrent processes. Removing the synchronization constraint has the potential of speeding up the computation. The authors present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although the discussion concentrates on the Euler scheme for the solution of the heat equation, it has the potential for being extended to other schemes and other parabolic partial differential equations (PDEs). These schemes are analyzed and implemented on the shared-memory multi-user Sequent Balance machine. Numerical results for one- and two-dimensional problems are presented. It is shown experimentally that the synchronization penalty can be about 50 percent of run time; in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time level, and with the problem dimension. The efficiency of the AS scheme may reach 90 percent or more, but it provides accurate results only for steady-state values. The CA scheme, on the other hand, is less efficient, but provides more accurate results for intermediate (non-steady-state) values.
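
    The following serial emulation conveys the asynchronous idea: grid points of the explicit Euler scheme for the heat equation are advanced one at a time in random order, so neighbours may sit at older time levels, whereas the synchronous scheme advances all points in lock-step. It is an illustration of the scheme, not the paper's Sequent Balance implementation.

        # Illustrative serial emulation of the asynchronous (AS) scheme for
        # u_t = u_xx; not the paper's multiprocessor code.
        import numpy as np

        n, r, sweeps = 64, 0.25, 400          # r = dt/dx^2 (stability: r <= 1/2)
        rng = np.random.default_rng(0)

        def async_heat(u):
            u = u.copy()
            for _ in range(sweeps * (n - 2)):
                # Points are updated at their own pace; neighbours may be at
                # older time levels.
                i = rng.integers(1, n - 1)
                u[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
            return u

        def sync_heat(u):
            u = u.copy()
            for _ in range(sweeps):            # all points advance in lock-step
                u[1:-1] = u[1:-1] + r * (u[:-2] - 2 * u[1:-1] + u[2:])
            return u

        u0 = np.sin(np.pi * np.linspace(0, 1, n))
        # Both relax toward the steady state u = 0; as the abstract notes, the
        # AS scheme is accurate for steady-state values but not for
        # intermediate time levels.
        print("async max |u|:", np.abs(async_heat(u0)).max())
        print("sync  max |u|:", np.abs(sync_heat(u0)).max())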

    A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations

    We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by finite differences. Depending on the treatment of the Dirichlet boundary, the resulting system of equations is symmetric or 'mildly' nonsymmetric positive definite. In all cases, the system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in the execution times. We demonstrate good scalability of the solver on a distributed-memory parallel machine with up to 2048 processors. We also compare our SA-AMG PCG solver with an FFT-based solver that is more commonly used for applications in beam dynamics.
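
    A minimal sketch of the solver combination, conjugate gradients preconditioned by smoothed-aggregation AMG, is given below using pyamg and SciPy as stand-ins for the paper's parallel implementation; the regular 2D Poisson matrix replaces the irregular beam-pipe geometry.

        # Illustrative serial SA-AMG-preconditioned CG; not the paper's
        # parallel solver.
        import numpy as np
        import pyamg
        import scipy.sparse.linalg as spla

        # 2D Poisson matrix on a 200x200 grid (5-point finite differences), a
        # regular-domain stand-in for the irregular accelerator geometry.
        A = pyamg.gallery.poisson((200, 200), format='csr')
        b = np.random.default_rng(0).standard_normal(A.shape[0])

        ml = pyamg.smoothed_aggregation_solver(A)   # build the SA-AMG hierarchy
        M = ml.aspreconditioner(cycle='V')          # one V-cycle per CG iteration

        iters = 0
        def count(xk):
            global iters
            iters += 1

        x, info = spla.cg(A, b, M=M, callback=count)
        print("converged:", info == 0, "after", iters, "iterations")
        print("residual norm:", np.linalg.norm(b - A @ x))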

    Quantitative analysis of electric vehicle flexibility: a data-driven approach

    Electric vehicle (EV) flexibility indicates to what extent the charging load can be coordinated (i.e., to flatten the load curve or to utilize renewable energy resources). However, such flexibility is neither well analyzed nor effectively quantified in the literature. In this paper we fill this gap and offer an extensive analysis of the flexibility characteristics of 390k EV charging sessions and propose measures to quantify their flexibility exploitation. Our contributions include: (1) characterization of the EV charging behavior by clustering the arrival and departure time combinations, leading to the identification of types of EV charging behavior, (2) in-depth analysis of the characteristics of the charging sessions in each behavioral cluster and investigation of the influence of weekdays and seasonal changes on those characteristics, including arrival, sojourn and idle times, and (3) proposal of measures and an algorithm to quantitatively analyze how much flexibility (in terms of duration and amount) is used at various times of day, for two representative scenarios. Understanding the characteristics of that flexibility (e.g., amount, time and duration of availability) and when it is used helps to develop more realistic price and incentive schemes in demand response (DR) algorithms to efficiently exploit the offered flexibility, or to estimate when to stimulate additional flexibility.
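
    A minimal sketch of contribution (1) follows: clustering sessions by their (arrival, departure) combinations and computing per-session idle time, the raw material of flexibility. The synthetic sessions and the choice of two clusters are illustrative assumptions, not the paper's dataset or method details.

        # Illustrative clustering of synthetic EV charging sessions; not the
        # paper's 390k-session dataset or exact method.
        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        n = 1000
        arrival = np.concatenate([rng.normal(8, 1, n),     # workplace chargers
                                  rng.normal(18, 1, n)])   # overnight home chargers
        departure = np.concatenate([rng.normal(17, 1, n),
                                    rng.normal(7, 1, n)]) % 24
        charge_hours = rng.uniform(1, 4, 2 * n)            # time actually charging

        # (1) Cluster sessions on their (arrival, departure) combination.
        X = np.column_stack([arrival, departure])
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

        # Idle time = sojourn minus charging time: the shiftable flexibility.
        sojourn = (departure - arrival) % 24
        idle = np.maximum(sojourn - charge_hours, 0)
        for c in range(2):
            m = labels == c
            print(f"cluster {c}: {m.sum():4d} sessions, "
                  f"mean arrival {arrival[m].mean():5.1f} h, "
                  f"mean idle {idle[m].mean():4.1f} h")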

    Flow-level performance analysis of data networks using processor sharing models

    Most telecommunication systems are dynamic in nature. The state of the network changes constantly as new transmissions appear and depart. In order to capture the behavior of such systems and to realistically evaluate their performance, it is essential to use dynamic models in the analysis. In this thesis, we model and analyze networks carrying elastic data traffic at flow level using stochastic queueing systems. We develop performance analysis methodology, as well as model and analyze example systems.

    The exact analysis of stochastic models is difficult and usually becomes computationally intractable when the size of the network increases, and hence efficient approximation methods are needed. In this thesis, we use two performance approximation methods. Value extrapolation is a novel approximation method developed during this work and based on the theory of Markov decision processes. It can be used to approximate the performance measures of Markov processes. When applied to queueing systems, value extrapolation enables heavy state-space truncation while providing accurate results without significant computational penalties. Balanced fairness is a capacity allocation scheme recently introduced by Bonald and Proutière that simplifies performance analysis and requires less restrictive assumptions about the traffic than other capacity allocation schemes. We introduce an approximation method based on balanced fairness and the Monte Carlo method for evaluating large sums that can be used to estimate the performance of systems of moderate size with low or medium loads.

    The performance analysis methods are applied in two settings: load balancing in fixed networks and the analysis of wireless networks. The aim of load balancing is to divide the traffic load efficiently between the network resources in order to improve the performance. On the basis of the insensitivity results of Bonald and Proutière, we study both packet- and flow-level balancing in fixed data networks. We also study load balancing between multiple parallel discriminatory processor sharing queues and compare different balancing policies.

    In the final part of the thesis, we analyze the performance of wireless networks carrying elastic data traffic. Wireless networks are gaining more and more popularity, as their advantages, such as easier deployment and mobility, outweigh their downsides. First, we discuss a simple cellular network with link adaptation consisting of two base stations and customers located on a line between them. We model the system and analyze the performance using different capacity allocation policies. Wireless multihop networks are analyzed using two different MAC schemes. On the basis of earlier work by Penttinen et al., we analyze the performance of networks using the STDMA MAC protocol. We also study multihop networks with random access, assuming that the transmission probabilities can be adapted upon flow arrivals and departures. We compare the throughput behavior of flow-optimized random access against the throughput obtained by optimal scheduling assuming balanced fairness capacity allocation.
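
    As a small concrete example of the flow-level models involved, the sketch below simulates an M/M/1 processor-sharing queue, in which n concurrent elastic flows each receive 1/n of the link capacity, and compares the simulated mean sojourn time with the known result E[T] = 1/(mu - lambda). It is a toy illustration, not the thesis's value extrapolation or balanced fairness machinery.

        # Illustrative event-driven M/M/1-PS simulation; not the thesis's
        # analysis methods.
        import random

        def simulate_ps(lam, mu, n_flows=100_000, seed=1):
            """M/M/1-PS: unit capacity shared equally by all active flows."""
            rng = random.Random(seed)
            t = 0.0
            next_arrival = rng.expovariate(lam)
            active, starts = [], []          # remaining work and arrival times
            total_sojourn, done, to_go = 0.0, 0, n_flows
            while done < n_flows:
                if active:
                    # With equal sharing, the smallest remaining flow ends first.
                    i = min(range(len(active)), key=lambda j: active[j])
                    t_finish = t + active[i] * len(active)
                else:
                    t_finish = float('inf')
                if to_go > 0 and next_arrival < t_finish:
                    dt = next_arrival - t
                    for j in range(len(active)):
                        active[j] -= dt / len(active)
                    t = next_arrival
                    active.append(rng.expovariate(mu))   # exponential flow size
                    starts.append(t)
                    to_go -= 1
                    next_arrival = t + rng.expovariate(lam)
                else:
                    dt = t_finish - t
                    for j in range(len(active)):
                        active[j] -= dt / len(active)
                    t = t_finish
                    active.pop(i)
                    total_sojourn += t - starts.pop(i)
                    done += 1
            return total_sojourn / n_flows

        lam, mu = 0.8, 1.0                   # offered load rho = 0.8
        print("simulated E[T]:", simulate_ps(lam, mu))
        print("theoretical E[T] = 1/(mu - lam):", 1.0 / (mu - lam))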

    Efficient hierarchical approximation of high-dimensional option pricing problems

    A major challenge in computational finance is the pricing of options that depend on a large number of risk factors. Prominent examples are basket or index options, where dozens or even hundreds of stocks constitute the underlying asset and determine the dimensionality of the corresponding degenerate parabolic equation. The objective of this article is to show how an efficient discretisation can be achieved by hierarchical approximation as well as asymptotic expansions of the underlying continuous problem. The relation to a number of state-of-the-art methods is highlighted.
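
    For scale, the sketch below prices a European basket call on d = 50 correlated Black-Scholes assets by plain Monte Carlo, the standard alternative to solving the d-dimensional parabolic equation that the article's hierarchical approach targets. All parameters are illustrative assumptions, and this is not the article's method.

        # Illustrative Monte Carlo pricing of a high-dimensional basket call;
        # not the article's hierarchical discretisation.
        import numpy as np

        def basket_call_mc(d=50, s0=100.0, K=100.0, r=0.05, sigma=0.2,
                           rho=0.3, T=1.0, n_paths=100_000, seed=0):
            rng = np.random.default_rng(seed)
            # Constant-correlation covariance and its Cholesky factor.
            C = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)
            L = np.linalg.cholesky(C)
            Z = rng.standard_normal((n_paths, d)) @ L.T
            # Terminal prices under the risk-neutral measure.
            ST = s0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
            payoff = np.maximum(ST.mean(axis=1) - K, 0.0)
            disc = np.exp(-r * T) * payoff
            return disc.mean(), disc.std(ddof=1) / np.sqrt(n_paths)

        price, stderr = basket_call_mc()
        print(f"basket call price: {price:.3f} +/- {stderr:.3f}")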

    A New Parallel N-body Gravity Solver: TPM

    We have developed a gravity solver based on combining the well-developed particle-mesh (PM) method with tree methods. It is designed for, and has been implemented on, parallel computer architectures. The new code can deal with tens of millions of particles on current computers, with the calculation done on a parallel supercomputer or a group of workstations. Typically, the spatial resolution is enhanced by more than a factor of 20 over the pure PM code, with mass resolution retained at nearly the PM level. This code runs much faster than a pure tree code with the same number of particles and maintains almost the same resolution in high density regions. Multiple time step integration has also been implemented in the code, with second order time accuracy. The performance of the code has been checked in several kinds of parallel computer configurations, including an IBM SP1, an SGI Challenge and a group of workstations; the speedup of the parallel code on a 32 processor IBM SP2 supercomputer is nearly linear (efficiency $\approx 80\%$) in the number of processors. The computation/communication ratio is also very high ($\sim 50$), which means the code spends $95\%$ of its CPU time in computation. Comment: 21 pages, LaTeX file. Figures available from anonymous ftp to astro.princeton.edu under /xu/tpm.ps, POP-57.
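
    The tree half of such a TPM scheme can be sketched with a 2D Barnes-Hut quadtree: distant cells are approximated by their monopole (total mass at the centre of mass) whenever they subtend less than an opening angle theta. Everything below is an illustrative toy, not the TPM code itself.

        # Illustrative 2D Barnes-Hut tree force; not the TPM implementation.
        import numpy as np

        class Node:
            def __init__(self, center, size):
                self.center, self.size = center, size
                self.mass, self.com = 0.0, np.zeros(2)
                self.children, self.body = None, None

            def insert(self, pos, m):
                if self.mass == 0.0 and self.children is None:
                    self.body, self.mass, self.com = pos, m, pos.copy()
                    return
                if self.children is None:                 # split the leaf
                    self.children = [Node(self.center + 0.25 * self.size *
                                          np.array([dx, dy]), 0.5 * self.size)
                                     for dx in (-1, 1) for dy in (-1, 1)]
                    old_body, old_m, self.body = self.body, self.mass, None
                    self._child_for(old_body).insert(old_body, old_m)
                self.com = (self.com * self.mass + pos * m) / (self.mass + m)
                self.mass += m
                self._child_for(pos).insert(pos, m)

            def _child_for(self, pos):
                idx = 2 * (pos[0] > self.center[0]) + (pos[1] > self.center[1])
                return self.children[idx]

        def force(node, pos, theta=0.5, eps=1e-3):
            d = node.com - pos
            r = np.hypot(*d) + eps
            if node.children is None or node.size / r < theta:
                return node.mass * d / r**3            # monopole approximation
            return sum((force(c, pos, theta, eps) for c in node.children
                        if c.mass > 0.0), np.zeros(2))

        rng = np.random.default_rng(0)
        pts = rng.random((500, 2))
        root = Node(np.array([0.5, 0.5]), 1.0)
        for p in pts:
            root.insert(p, 1.0)
        print("force on first particle:", force(root, pts[0]))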