
    ColDICE: a parallel Vlasov-Poisson solver using moving adaptive simplicial tessellation

    Numerically solving the Vlasov-Poisson equations for initially cold systems can be reduced to following the evolution of a three-dimensional sheet evolving in six-dimensional phase space. We describe a public parallel numerical algorithm that represents the phase-space sheet with a conforming, self-adaptive simplicial tessellation whose vertices follow the Lagrangian equations of motion. The algorithm is implemented in both six- and four-dimensional phase space. Refinement of the tessellation mesh is performed using the bisection method and a local second-order representation of the phase-space sheet, relying on additional tracers created when needed at runtime. To best preserve the Hamiltonian nature of the system, refinement is anisotropic and constrained by measurements of local Poincaré invariants. The Poisson equation is solved using the fast Fourier method on a regular rectangular grid, as in particle-in-cell codes. To compute the density projected onto this grid, the intersection of the tessellation and the grid is calculated using the method of Franklin and Kankanhalli (1993), generalised to linear order. As preliminary tests of the code, we study in four-dimensional phase space the evolution of an initially small patch in a chaotic potential and the cosmological collapse of a fluctuation composed of two sinusoidal waves. We also perform a "warm" dark matter simulation in six-dimensional phase space that we use to check the parallel scaling of the code.
    Comment: Code and illustration movies available at: http://www.vlasix.org/index.php?n=Main.ColDICE - Article submitted to Journal of Computational Physics
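    The sketch below illustrates, in greatly simplified form, the Lagrangian vertex update at the heart of such a scheme: a kick-drift-kick leapfrog step for the tessellation vertices. A fixed harmonic potential stands in for ColDICE's self-consistent FFT-based Poisson solve, and the Vertex type is a hypothetical stand-in for the code's actual tessellation data structures.

        // Minimal kick-drift-kick leapfrog for phase-space sheet vertices.
        // Illustrative only: a fixed harmonic potential replaces the
        // self-consistent Poisson solve performed by ColDICE.
        #include <array>
        #include <vector>

        struct Vertex {
            std::array<double, 3> x; // position
            std::array<double, 3> v; // velocity
        };

        // Acceleration in a fixed harmonic potential (stand-in for the
        // FFT-based gravitational field).
        std::array<double, 3> accel(const std::array<double, 3>& x) {
            const double omega2 = 1.0;
            return { -omega2 * x[0], -omega2 * x[1], -omega2 * x[2] };
        }

        void leapfrog_step(std::vector<Vertex>& sheet, double dt) {
            for (Vertex& vtx : sheet) {
                auto a = accel(vtx.x);
                for (int d = 0; d < 3; ++d) vtx.v[d] += 0.5 * dt * a[d]; // half kick
                for (int d = 0; d < 3; ++d) vtx.x[d] += dt * vtx.v[d];   // drift
                a = accel(vtx.x);
                for (int d = 0; d < 3; ++d) vtx.v[d] += 0.5 * dt * a[d]; // half kick
            }
        }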

    Molecular-dynamics simulations using spatial decomposition and task-based parallelism

    Molecular Dynamics (MD) simulations are an integral method in the computational study of materials. This thesis discusses an algorithm for large-scale MD simulations using modern multi- and many-core systems on distributed computing networks. To utilize the full processing power of these systems, algorithms must be updated to account for newer hardware, such as the many-core Intel Xeon Phi coprocessor. The hybrid method is a data-parallel approach that combines spatial decomposition, using the Message Passing Interface (MPI) to distribute the system across multiple nodes, with the cell-task method for task-based parallelism on each node. This provides the improved performance of task-based parallelism on single compute nodes in addition to the benefit of distributed computing afforded by MPI. Results from benchmark simulations on Intel Xeon multi-core processors and Intel Xeon Phi coprocessors are presented. They show that the hybrid method performs better than either spatial decomposition or the cell-task method alone on single nodes, and that it outperforms the spatial decomposition method on multiple nodes across a variety of system configurations.
    Master of Science (MSc) in Computational Science
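    A minimal sketch of the hybrid pattern described above, assuming a generic cell-based force kernel (Cell and compute_cell_forces are hypothetical placeholders, not the thesis's actual code): MPI distributes the spatial subdomains across processes, and OpenMP tasks cover the cells within each node.

        #include <mpi.h>
        #include <omp.h>
        #include <vector>

        struct Cell { /* particles owned by this cell */ };

        // Placeholder force kernel acting on one cell (and, in a real MD
        // code, its neighbour cells).
        void compute_cell_forces(Cell&) {}

        int main(int argc, char** argv) {
            int provided = 0;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

            int nranks = 1;
            MPI_Comm_size(MPI_COMM_WORLD, &nranks);

            // Spatial decomposition: each MPI rank owns one subdomain's cells.
            std::vector<Cell> my_cells(1024 / nranks);

            // Cell-task parallelism within the node: one OpenMP task per cell.
            #pragma omp parallel
            #pragma omp single
            for (Cell& c : my_cells) {
                Cell* cp = &c;
                #pragma omp task firstprivate(cp)
                compute_cell_forces(*cp);
            }

            // A real code would exchange halo particles with neighbouring
            // ranks here (e.g. MPI_Sendrecv) before the next step.
            MPI_Finalize();
            return 0;
        }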

    Algorithmic and Code Optimizations of Molecular Dynamics Simulations for Process Engineering

    The focus of this work lies on implementation improvements and, in particular, node-level performance optimization of the simulation software ls1-mardyn. Through data-structure improvements, SIMD vectorization and, especially, OpenMP parallelization, the world's first simulation of 2×10^13 molecules at over 1 PFLOP/s was enabled. To allow for long-range interactions, the Fast Multipole Method was introduced into ls1-mardyn. The algorithm was optimized for sequential, shared-memory, and distributed-memory execution on up to 32,768 MPI processes.
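    The flavour of node-level optimization described above can be illustrated with a generic Lennard-Jones force kernel using a structure-of-arrays layout, OpenMP threading and SIMD hints. This is an illustrative sketch of the general pattern, not ls1-mardyn's actual kernel.

        #include <cstddef>
        #include <vector>

        // Structure-of-arrays layout: contiguous coordinate arrays vectorize
        // better than an array of particle structs.
        struct SoA {
            std::vector<double> x, y, z;    // positions
            std::vector<double> fx, fy, fz; // forces
        };

        void lj_forces(SoA& p, double eps, double sigma) {
            const std::size_t n = p.x.size();
            const double s2 = sigma * sigma;
            const double s6 = s2 * s2 * s2;
            #pragma omp parallel for schedule(dynamic)
            for (std::size_t i = 0; i < n; ++i) {
                double fx = 0.0, fy = 0.0, fz = 0.0;
                #pragma omp simd reduction(+ : fx, fy, fz)
                for (std::size_t j = 0; j < n; ++j) {
                    if (j == i) continue;
                    const double dx = p.x[i] - p.x[j];
                    const double dy = p.y[i] - p.y[j];
                    const double dz = p.z[i] - p.z[j];
                    const double inv2 = 1.0 / (dx * dx + dy * dy + dz * dz);
                    const double inv6 = inv2 * inv2 * inv2;
                    // 12-6 Lennard-Jones force magnitude divided by r
                    const double f =
                        24.0 * eps * s6 * inv6 * (2.0 * s6 * inv6 - 1.0) * inv2;
                    fx += f * dx; fy += f * dy; fz += f * dz;
                }
                p.fx[i] = fx; p.fy[i] = fy; p.fz[i] = fz;
            }
        }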

    Predictive Modelling of Tribological Systems using Movable Cellular Automata

    In the science of tribology, where there is an enormous degree of uncertainty, mathematical models that convey state-of-the-art scientific knowledge are invaluable tools for unveiling the underlying phenomena. A well-structured modelling framework that guarantees a connection between mathematical representations and experimental observations can help in the systematic identification of the most realistic hypotheses among a pool of possibilities. This thesis is concerned with identifying the most appropriate computational model for the prediction of friction and wear in tribological applications, and with developing a predictive model and simulation tool based on the identified method.

    Accordingly, a thorough review of the literature was conducted to find the most appropriate approach for predicting friction and wear using computer simulations, with the multi-scale approach in mind. It concluded that the Movable Cellular Automata (MCA) method is the most suitable method for multi-scale modelling of tribological systems. The state-of-the-art review in Chapter 2 of this thesis established that simulating the first bodies and the third body simultaneously (also known as a multi-body) in a tribological system requires modelling both continuous and discontinuous material behaviour on a range of scales, from atomistic to micro. This can only be done using a multi-scale particle-based method, because continuum methods such as FEM are non-predictive and cannot describe the discontinuous nature of materials on the micro scale.

    The most important and well-known particle-based methods are molecular dynamics (MD) and the discrete element method (DEM). Although MD has been widely used to simulate elastic and plastic deformation of materials, it is limited to the atomistic and nano scales and cannot be used to simulate materials on the macro scale. DEM, on the other hand, is capable of simulating materials on the meso and micro scales; it has been extended since the algorithm was first proposed by Cundall and Strack in 1979 and has been adopted by a number of scientific and engineering disciplines. However, its contact configurations and laws limit it to granular materials and elastic brittle solids. Even with bond models for cohesive and plastic materials, it shows major limitations in parametric estimation and in validation against experiments, because its contact laws use parameters that cannot be obtained directly from material properties or from experiments.

    The MCA method solves these problems with a hybrid technique that combines advantages of the classical cellular automata method and molecular dynamics, forming a model for simulating elasticity, plasticity and fracture in ductile consolidated materials. It covers both the meso and micro scales, and could in principle be used on the nano scale given sufficient computational power. A distinguishing feature of the MCA method is that the interaction forces between automata are described in terms of stress tensor components. This establishes a direct relationship between the MCA model parameters of particle interactions and the tensor parameters of the material constitutive law.
    This makes it possible to simulate materials directly and to implement different models and criteria of elasticity, plasticity and fracture, describing elastic-plastic deformation using the theory of plastic flow. Hence, in MCA there is no need for parametric fitting, because all model parameters can be obtained directly from the material's mechanical properties.

    To model surfaces in contact and friction behaviour with MCA, the particle size can be chosen large enough to treat the contacting surface as a rough plane; this is the approach used in all MCA studies of contacting surfaces so far. The alternative is to specify a very small particle size so that a real surface is simulated directly, which allows material behaviour and processes to be investigated explicitly on all three scale levels (atomic, meso and macro). This has proven difficult in practice because it is too computationally expensive: only a small area of the contact can be simulated, given the high number of particles required to represent a real solid.

    Furthermore, until now no commercial software has been available for MCA simulations, only a 2D MCA demo version developed in 2005 by the Laboratory of CAD of Materials at the Institute of Strength Physics and Materials Science in Tomsk, Russia; the developers of the MCA method use their own in-house codes. This thesis presents the development of 3D MCA open-source software for the scientific and tribology communities, implemented within the framework of the open-source code LIGGGHTS. It follows the formulations of the 3D elastic-plastic model developed by Sergey G. Psakhie, Valentin L. Popov, Evgeny V. Shilko, and the external supervisor of this thesis, Alexey Yu. Smolin. Details of the mathematical formulations can be found in [1]-[3] and in Section 3.5 of this thesis.

    The MCA model has been successfully implemented to simulate ductile consolidated materials. Specifically, new interaction laws were implemented, along with features related to particle packing, particle interaction forces, bonding of particles, and others. The model was verified, validated, and used to simulate indentation. Validation against experimental results showed that the developed model reproduces the correct material mechanical response directly from macroscopic mechanical material properties. The implemented code is still limited in computational capacity because its parallelization is not yet complete. Nevertheless, this thesis extends the capabilities of the LIGGGHTS software, providing an open-source tool for using the MCA method to simulate solid material deformation, and it significantly increases the potential of using MCA in an HPC environment, producing results otherwise difficult to obtain.
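    As a much-simplified illustration of the parameterization idea, the fragment below caps an elastic trial response with a yield stress, so the pair law is driven directly by macroscopic properties (Young's modulus E and yield stress sigma_y) rather than fitted contact constants. Real MCA formulations work with full stress tensor components; see [1]-[3]. All names here are hypothetical.

        #include <algorithm>
        #include <cmath>

        struct AutomatonPair {
            double overlap; // normal strain-like measure between two automata
            double area;    // contact area shared by the pair
        };

        // Normal response: linear elasticity capped by a plastic yield
        // criterion, parameterized directly by material properties.
        double normal_force(const AutomatonPair& p, double E, double sigma_y) {
            const double trial = E * p.overlap;                        // elastic trial stress
            const double capped = std::min(std::fabs(trial), sigma_y); // plastic cap
            return std::copysign(capped, trial) * p.area;
        }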

    Improving performance and maintainability through refactoring in C++11

    Abstraction-based programming has traditionally been seen as an approach that improves software quality at the cost of performance. In this paper, we explore the cost of abstraction by transforming the PARSEC fluidanimate benchmark application from low-level, hand-optimized C to a higher-level and more general C++ version that is a more direct representation of the algorithms. We eliminate global variables and constants, use vectors of a user-defined particle type rather than vectors of built-in types, and separate the concurrency model from the application model. The result is a C++ program that is smaller, less complex, and measurably faster than the original. The benchmark was chosen to be representative of many applications, and our transformations are systematic and based on principles. Consequently, our techniques can be used to improve the performance, flexibility, and maintainability of a large class of programs. The handling of concurrency issues has been collected into a small new library, YAPL.
    J. Daniel Garcia's work was partially supported by Fundación CajaMadrid through their grant programme for Madrid University Professors. Bjarne Stroustrup's work was partially supported by NSF grant #083319.
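    The kind of refactoring the paper describes can be sketched as follows: parallel vectors of built-in types are replaced by a single vector of a user-defined particle type, so the code states its design directly. The names are illustrative, not the paper's actual code.

        #include <vector>

        // Before: positions and velocities kept in separate float arrays,
        // e.g. std::vector<float> px, py, pz, vx, vy, vz;

        // After: one value type that expresses the design directly.
        struct Vec3 { float x, y, z; };

        struct Particle {
            Vec3 position;
            Vec3 velocity;
        };

        void advance(std::vector<Particle>& particles, float dt) {
            for (Particle& p : particles) {
                p.position.x += dt * p.velocity.x;
                p.position.y += dt * p.velocity.y;
                p.position.z += dt * p.velocity.z;
            }
        }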

    SKIRT: hybrid parallelization of radiative transfer simulations

    We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modeling the continuum radiation of dusty astrophysical systems, including late-type galaxies and dusty tori. The hybrid scheme combines distributed-memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, with shared-memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. Synchronization between threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behavior of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
    Comment: 21 pages, 20 figures
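    A minimal sketch of the lock-free accumulation pattern described above, assuming a shared tally of type double (this is the generic C++11 compare-and-swap idiom, not SKIRT's actual code):

        #include <atomic>

        // Add a Monte Carlo contribution to a shared tally without a mutex:
        // retry the compare-and-swap until no other thread has modified the
        // value between our load and our store.
        void atomic_add(std::atomic<double>& target, double value) {
            double expected = target.load(std::memory_order_relaxed);
            while (!target.compare_exchange_weak(expected, expected + value,
                                                 std::memory_order_relaxed)) {
                // on failure, expected is refreshed with the current value
            }
        }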

    Evaluating the performance of legacy applications on emerging parallel architectures

    The gap between a supercomputer's theoretical maximum ("peak") floating-point performance and that actually achieved by applications has grown wider over time. Today, a typical scientific application achieves only 5-20% of any given machine's peak processing capability, and this gap leaves room for significant improvements in execution times. The problem is most pronounced for modern "accelerator" architectures: collections of hundreds of simple, low-clocked cores capable of executing the same instruction on dozens of pieces of data simultaneously. This is a significant change from the low number of high-clocked cores found in traditional CPUs, and effective utilisation of accelerators typically requires extensive code and algorithmic changes. In many cases, the best way to map a parallel workload to these new architectures is unclear. The principal focus of the work presented in this thesis is the evaluation of emerging parallel architectures (specifically, modern CPUs, GPUs and Intel MIC) for two benchmark codes, the LU benchmark from the NAS Parallel Benchmark Suite and Sandia's miniMD benchmark, which exhibit complex parallel behaviours representative of many scientific applications. Using combinations of low-level intrinsic functions, OpenMP, CUDA and MPI, we demonstrate performance improvements of up to 7x for these workloads. We also detail a code development methodology that permits application developers to target multiple architecture types without maintaining completely separate implementations for each platform. Using OpenCL, we develop performance-portable implementations of the LU and miniMD benchmarks that are faster than the original codes and at most 2x slower than versions highly tuned for particular hardware. Finally, we demonstrate the importance of evaluating architectures at scale (as opposed to on single nodes) through performance modelling techniques, highlighting the problems associated with strong scaling on emerging accelerator architectures.
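    The strong-scaling point can be made with a toy performance model: at fixed problem size, the perfectly parallel compute term shrinks as 1/P, but halo-exchange volume shrinks only with the subdomain surface area, and message latency does not shrink at all, so speedups erode at scale. The model and its constants below are illustrative assumptions, not the thesis's actual model.

        #include <cmath>
        #include <cstdio>

        // Toy strong-scaling model: compute ~ N/P, halo exchange ~ (N/P)^(2/3),
        // latency flat. All coefficients are illustrative.
        double step_time(double t_compute, double t_surface, double t_latency,
                         double n, int p) {
            return t_compute * n / p
                 + t_surface * std::pow(n / p, 2.0 / 3.0)
                 + t_latency;
        }

        int main() {
            const double n = 1e6; // fixed (strong-scaling) problem size
            for (int p = 1; p <= 1024; p *= 4)
                std::printf("P=%4d  t=%.4f s\n", p,
                            step_time(1e-6, 1e-6, 1e-3, n, p));
            return 0;
        }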