128,030 research outputs found

    Adapting the Phylogenetic Program FITCH for Distributed Processing

    Get PDF
    The ability to reconstruct optimal phylogenies (evolutionary trees) based on objective criteria impacts directly on our understanding the relationships among organisms, including human evolution, as well as the spread of infectious disease. Numerous tree construction methods have been implemented for execution on single processors, however inferring large phylogenies using computationally intense algorithms can be beyond the practical capacity of a single processor. Distributed and parallel processing provides a means for overcoming this hurdle. FITCH is a freely available, single-processor implementation of a distance-based, tree-building algorithm commonly used by the biological community. Through an alternating least squares approach to branch length optimization and tree comparison, FITCH iteratively builds up evolutionary trees through species addition and branch rearrangement. To extend the utility of this program, I describe the design, implementation, and performance of mpiFITCH, a parallel processing version of FITCH developed using the Message Passing Interface for message exchange. Balanced load distribution required the conversion of tree generation from recursive linked list traversal to iterative, array-based traversal. Execution of mpiFITCH on a Beowulf cluster running 64 processors revealed maximum performance enhancement of up to ~28 fold with an efficiency of ~ 40%

    Improved neighbor list algorithm in molecular simulations using cell decomposition and data sorting method

    Full text link
    An improved neighbor list algorithm is proposed to reduce unnecessary interatomic distance calculations in molecular simulations. It combines the advantages of Verlet table and cell linked list algorithms by using cell decomposition approach to accelerate the neighbor list construction speed, and data sorting method to lower the CPU data cache miss rate, as well as partial updating method to minimize the unnecessary reconstruction of the neighbor list. Both serial and parallel performance of molecular dynamics simulation are evaluated using the proposed algorithm and compared with those using conventional Verlet table and cell linked list algorithms. Results show that the new algorithm outperforms the conventional algorithms by a factor of 2~3 in cases of both small and large number of atoms.Comment: 14 pages, 7 figures. Submitted to Computer Physics Communication

    Efficiency of linked cell algorithms

    Get PDF
    The linked cell list algorithm is an essential part of molecular simulation software, both molecular dynamics and Monte Carlo. Though it scales linearly with the number of particles, there has been a constant interest in increasing its efficiency, because a large part of CPU time is spent to identify the interacting particles. Several recent publications proposed improvements to the algorithm and investigated their efficiency by applying them to particular setups. In this publication we develop a general method to evaluate the efficiency of these algorithms, which is mostly independent of the parameters of the simulation, and test it for a number of linked cell list algorithms. We also propose a combination of linked cell reordering and interaction sorting that shows a good efficiency for a broad range of simulation setups.Comment: Submitted to Computer Physics Communications on 22 December 2009, still awaiting a referee repor

    An Efficient Cell List Implementation for Monte Carlo Simulation on GPUs

    Full text link
    Maximizing the performance potential of the modern day GPU architecture requires judicious utilization of available parallel resources. Although dramatic reductions can often be obtained through straightforward mappings, further performance improvements often require algorithmic redesigns to more closely exploit the target architecture. In this paper, we focus on efficient molecular simulations for the GPU and propose a novel cell list algorithm that better utilizes its parallel resources. Our goal is an efficient GPU implementation of large-scale Monte Carlo simulations for the grand canonical ensemble. This is a particularly challenging application because there is inherently less computation and parallelism than in similar applications with molecular dynamics. Consistent with the results of prior researchers, our simulation results show traditional cell list implementations for Monte Carlo simulations of molecular systems offer effectively no performance improvement for small systems [5, 14], even when porting to the GPU. However for larger systems, the cell list implementation offers significant gains in performance. Furthermore, our novel cell list approach results in better performance for all problem sizes when compared with other GPU implementations with or without cell lists.Comment: 30 page

    A Parallel Adaptive P3M code with Hierarchical Particle Reordering

    Full text link
    We discuss the design and implementation of HYDRA_OMP a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important optimization being hierarchical reordering of particles within chaining cells, which greatly improves data locality thereby removing the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future.Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communication

    Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems

    Get PDF
    This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists is divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time.Benchmark simulations on a typical 12 core machine show that the described algorithm achieves excellent parallel efficiencies above 80 % for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These simulations demonstrate that the algorithm scales well to large numbers of cores.Comment: 12 pages, 8 figure
    corecore