15,183 research outputs found

    Parallel Space Decomposition of the Mesh Adaptive Direct Search Algorithm

    Get PDF
    This paper describes a Parallel Space Decomposition (PSD) technique for the Mesh Adaptive Direct Search (MADS) algorithm. MADS extends Generalized Pattern Search for constrained nonsmooth optimization problems. The objective here is to solve larger problems more efficiently. The new method (PSD-MADS) is an asynchronous parallel algorithm in which the processes solve problems over subsets of variables. The convergence analysis based on the Clarke calculus is essentially the same as for the MADS algorithm. A practical implementation is described and some numerical results on problems with up to 500 variables illustrate advantages and limitations of PSD-MADS

    Extensions Ă  l'algorithme de recherche directe mads pour l'optimisation non lisse

    Get PDF
    Revue de la littérature sur les méthodes de recherche directe pour l'optimisation non lisse -- Démarche et organisation de la thèse -- Nonsmooth optimization through mesh adaptive direct search and variable neighborhood search -- Parallel space decomposition of the mesh adaptive direct search algorithm -- Orthomads : a deterministic mads instance with orthogonal directions

    A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing

    Get PDF
    This work introduces an innovative parallel, fully-distributed finite element framework for growing geometries and its application to metal additive manufacturing. It is well-known that virtual part design and qualification in additive manufacturing requires highly-accurate multiscale and multiphysics analyses. Only high performance computing tools are able to handle such complexity in time frames compatible with time-to-market. However, efficiency, without loss of accuracy, has rarely held the centre stage in the numerical community. Here, in contrast, the framework is designed to adequately exploit the resources of high-end distributed-memory machines. It is grounded on three building blocks: (1) Hierarchical adaptive mesh refinement with octree-based meshes; (2) a parallel strategy to model the growth of the geometry; (3) state-of-the-art parallel iterative linear solvers. Computational experiments consider the heat transfer analysis at the part scale of the printing process by powder-bed technologies. After verification against a 3D benchmark, a strong-scaling analysis assesses performance and identifies major sources of parallel overhead. A third numerical example examines the efficiency and robustness of (2) in a curved 3D shape. Unprecedented parallelism and scalability were achieved in this work. Hence, this framework contributes to take on higher complexity and/or accuracy, not only of part-scale simulations of metal or polymer additive manufacturing, but also in welding, sedimentation, atherosclerosis, or any other physical problem where the physical domain of interest grows in time

    Strict lower bounds with separation of sources of error in non-overlapping domain decomposition methods

    Get PDF
    This article deals with the computation of guaranteed lower bounds of the error in the framework of finite element (FE) and domain decomposition (DD) methods. In addition to a fully parallel computation, the proposed lower bounds separate the algebraic error (due to the use of a DD iterative solver) from the discretization error (due to the FE), which enables the steering of the iterative solver by the discretization error. These lower bounds are also used to improve the goal-oriented error estimation in a substructured context. Assessments on 2D static linear mechanic problems illustrate the relevance of the separation of sources of error and the lower bounds' independence from the substructuring. We also steer the iterative solver by an objective of precision on a quantity of interest. This strategy consists in a sequence of solvings and takes advantage of adaptive remeshing and recycling of search directions.Comment: International Journal for Numerical Methods in Engineering, Wiley, 201

    The cosmological simulation code GADGET-2

    Full text link
    We discuss the cosmological simulation code GADGET-2, a new massively parallel TreeSPH code, capable of following a collisionless fluid with the N-body method, and an ideal gas by means of smoothed particle hydrodynamics (SPH). Our implementation of SPH manifestly conserves energy and entropy in regions free of dissipation, while allowing for fully adaptive smoothing lengths. Gravitational forces are computed with a hierarchical multipole expansion, which can optionally be applied in the form of a TreePM algorithm, where only short-range forces are computed with the `tree'-method while long-range forces are determined with Fourier techniques. Time integration is based on a quasi-symplectic scheme where long-range and short-range forces can be integrated with different timesteps. Individual and adaptive short-range timesteps may also be employed. The domain decomposition used in the parallelisation algorithm is based on a space-filling curve, resulting in high flexibility and tree force errors that do not depend on the way the domains are cut. The code is efficient in terms of memory consumption and required communication bandwidth. It has been used to compute the first cosmological N-body simulation with more than 10^10 dark matter particles, reaching a homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has also been used to carry out very large cosmological SPH simulations that account for radiative cooling and star formation, reaching total particle numbers of more than 250 million. We present the algorithms used by the code and discuss their accuracy and performance using a number of test problems. GADGET-2 is publicly released to the research community.Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code available at http://www.mpa-garching.mpg.de/gadge

    A Parallel Adaptive P3M code with Hierarchical Particle Reordering

    Full text link
    We discuss the design and implementation of HYDRA_OMP a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important optimization being hierarchical reordering of particles within chaining cells, which greatly improves data locality thereby removing the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future.Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communication

    Scalable parallel simulation of small-scale structures in cold dark matter

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Physics, 2005.Includes bibliographical references (p. 179-181).We present a parallel implementation of the particle-particle/particle-mesh (PÂłM) algorithm for distributed memory clusters. The llp3m-hc code uses a hybrid method for both computation and domain decomposition. Long-range forces are computed using a Fourier transform gravity solver on a regular mesh; the mesh is distributed across parallel processes using a static one-dimensional slab domain decomposition. Short-range forces are computed by direct summation of close pairs; particles are distributed using a dynamic domain decomposition based on a space-filling Hilbert curve. A nearly-optimal method was devised to dynamically repartition the particle distribution so as to maintain load balance even for extremely inhomogeneous mass distributions. Tests using 800Âł simulations on a 40-processor Beowulf cluster showed good load balance and scalability up to 80 processes. We discuss the limits on scalability imposed by communication and extreme clustering and suggest how they may be removed by extending our algorithm to include a new adaptive PÂłM technique, which we then introduce and present as a new llap3m-hc code. We optimize free parameters of adaptive PÂłM to minimize force errors and the timing required to compute short range forces. We apply our codes to simulate small scale structure of the universe at redshift z > 50. We observe and analyze the formation of caustics in the structure and compare it with the predictions of semi-analytic models of structure formation. The current limits on neutralino detection experiments assume a Maxwell-Boltzmann velocity distribution and smooth spatial distribution of dark matter.(cont.) It is shown in this thesis that inhomogeneous distribution of dark matter on small scales significantly changes the predicted event rates in direct detection dark matter experiments. The effect of spatial inhomogeneity weakens the upper limits on neutralino cross section produced in the Cryogenic Dark Matter Search Experiment.by Alexander V. Shirokov.Ph.D

    A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations

    Full text link
    We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by Finite Differences. Depending on the treatment of the Dirichlet boundary the resulting system of equations is symmetric or `mildly' nonsymmetric positive definite. In all cases, the system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in the execution times. We demonstrate good scalability of the solver on distributed memory parallel processor with up to 2048 processors. We also compare our SAAMG-PCG solver with an FFT-based solver that is more commonly used for applications in beam dynamics
    • …
    corecore