427 research outputs found

    Tracing Communications and Computational Workload in LJS (Lennard-Jones with Spatial Decomposition)

    LJS (Lennard-Jones with Spatial decomposition) is a molecular dynamics application developed by Steve Plimpton at Sandia National Laboratories [1]. It performs thermodynamic simulations of a system containing a fixed, large number (millions) of atoms or molecules confined within a regular, three-dimensional domain. Since the simulations model interactions on the atomic scale, the computations carried out in a single timestep (iteration) correspond to femtoseconds of real time. Hence, a meaningful simulation of the evolution of the system's state typically requires a large number (thousands or more) of timesteps. The particles in LJS are represented as material points subjected to forces resulting from interactions with other particles. While the general case involves N-body solvers, LJS implements only pair-wise material point interactions, using the derivative of the Lennard-Jones potential energy for each particle pair to evaluate the acting forces. The velocities and positions of particles are updated by integrating Newton's equations (classical molecular dynamics). The interaction range depends on the modeled problem type; LJS focuses on short-range forces, implementing a cutoff distance rc beyond which the interactions are ignored. The computational complexity of O(N^2), characteristic of systems with long-range interactions, is therefore substantially reduced. LJS uses spatial decomposition of the domain volume to distribute the computations across the available processors of a parallel computer. The decomposition process uniformly divides the parallelepiped containing all particles into volumes equal in size and as close in shape to a cube as possible, assigning each of the cells so formed to a CPU. The correctness of the computations requires the positions of some particles (depending on the value of rc) residing in the neighboring cells to be known to the local process. This information is exchanged in every timestep via explicit communication with the neighbor nodes in all three dimensions (for details see [2]). LJS also takes advantage of Newton's third law to calculate the force only once per particle pair; if the involved particles belong to cells located on different processors, the results are forwarded to the other node in a "reverse communication" phase. Besides the communications occurring in every iteration, additional messages are sent once every preset number of timesteps. Their purpose is to adjust the cell assignments of particles to account for their movement. To minimize the overhead of constructing particle neighbor lists, LJS replaces rc with an extended cutoff radius rs (rs > rc), which accounts for possible particle movement before any list updates need to be carried out. Due to the relatively small impact of that phase on the overall behavior of the application, we ignored it in our analysis.
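The pair-wise force evaluation described in this abstract can be illustrated with a minimal serial sketch. It is not LJS code: the function name `lj_forces`, the parameters `epsilon`, `sigma` and `r_cut`, and the plain all-pairs loop are assumptions made for illustration. It uses the standard Lennard-Jones form U(r) = 4·epsilon·[(sigma/r)^12 − (sigma/r)^6], skips pairs beyond the cutoff, and applies Newton's third law so that each pair is evaluated once. Neighbor lists and spatial decomposition are deliberately left out.

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """Return per-particle Lennard-Jones forces for a list of 3D positions.

    Illustrative sketch only: all names and default values are assumptions,
    and every pair is still visited (no neighbor lists, no decomposition).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    r_cut2 = r_cut * r_cut
    for i in range(n - 1):
        for j in range(i + 1, n):  # each pair is visited exactly once
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            if r2 >= r_cut2 or r2 == 0.0:
                continue  # outside the cutoff (or coincident points): ignored
            s6 = (sigma * sigma / r2) ** 3  # (sigma / r)^6
            # -dU/dr projected on d: 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2
            coeff = 24.0 * epsilon * s6 * (2.0 * s6 - 1.0) / r2
            for k in range(3):
                forces[i][k] += coeff * d[k]  # force on particle i
                forces[j][k] -= coeff * d[k]  # equal and opposite force on j
    return forces
```

In a decomposed run as described above, the `forces[j]` update for a particle owned by another process would correspond to the value forwarded in the "reverse communication" phase.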

    SFC-based Communication Metadata Encoding for Adaptive Mesh

    This volume of the series “Advances in Parallel Computing” contains the proceedings of the International Conference on Parallel Programming – ParCo 2013 – held from 10 to 13 September 2013 in Garching, Germany. The conference was hosted by the Technische Universität München (Department of Informatics) and the Leibniz Supercomputing Centre. The present paper studies two adaptive mesh refinement (AMR) codes whose grids rely on recursive subdivision in combination with space-filling curves (SFCs). A non-overlapping domain decomposition based upon these SFCs yields several well-known advantageous properties with respect to communication demands, balancing, and partition connectivity. However, the administration of the meta data, i.e. tracking which partitions exchange data in which cardinality, is nontrivial due to the SFC’s fractal meandering and the dynamic adaptivity. We introduce an analysed tree grammar for the meta data that restricts it, without loss of information, hierarchically along the subdivision tree and applies run-length encoding. Hence, its meta data memory footprint is very small, and it can be computed and maintained on-the-fly even for permanently changing grids. It facilitates a fork-join pattern for shared data parallelism. It also facilitates replicated data parallelism, tackling latency constraints through communication in the background and bandwidth constraints through reduced memory requirements, as it avoids adjacency information stored per element. We demonstrate this by means of shared and distributed parallelized domain decompositions. This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). It is partially based on work supported by Award No. UK-c0020, made by the King Abdullah University of Science and Technology (KAUST).
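The run-length encoding step mentioned in this abstract can be illustrated with a generic sketch. It is not the paper's tree grammar: the functions `rle_encode` and `rle_decode` and the input layout (a list of neighboring partition ids, one per boundary entry, in traversal order) are hypothetical. The sketch only shows why storing (partition, count) runs is cheaper than storing adjacency per element when consecutive boundary entries share a neighbor.

```python
from itertools import groupby


def rle_encode(neighbor_ids):
    """Collapse consecutive equal partition ids into (id, count) runs.

    Hypothetical input: one neighboring partition id per boundary entry,
    listed in the order in which the boundary is traversed.
    """
    return [(pid, sum(1 for _ in run)) for pid, run in groupby(neighbor_ids)]


def rle_decode(runs):
    """Expand (id, count) runs back into the per-entry list (lossless)."""
    return [pid for pid, count in runs for _ in range(count)]


# Six boundary entries shrink to three runs; decoding restores the original.
boundary = [2, 2, 2, 5, 5, 2]
runs = rle_encode(boundary)
restored = rle_decode(runs)
```

Each run's count plays the role of the "cardinality" of the exchange with that neighbor, so a sender can size its message from the run alone.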

    Bootstrapping Real-world Deployment of Future Internet Architectures

    The past decade has seen many proposals for future Internet architectures. Most of these proposals require substantial changes to the current networking infrastructure and end-user devices, and have therefore failed to move from theory to real-world deployment. This paper describes one possible strategy for bootstrapping the initial deployment of future Internet architectures by focusing on providing high availability as an incentive for early adopters. Through large-scale simulation and real-world implementation, we show that with only a small number of adopting ISPs, customers can obtain high availability guarantees. We discuss the design, implementation, and evaluation of an availability device that allows customers to bridge into the future Internet architecture without modifications to their existing infrastructure.

    Lock-free Concurrent Data Structures

    Concurrent data structures are the data sharing side of parallel programming. Data structures give the program the means to store data, but also provide operations to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes even greater because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide sufficient background and intuition to help the interested reader navigate the complex research area of lock-free data structures. The second goal is to give the programmer enough familiarity with the subject to allow her to use truly concurrent methods. Comment: To appear in "Programming Multi-core and Many-core Computing Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computing