Adapting the Phylogenetic Program FITCH for Distributed Processing
The ability to reconstruct optimal phylogenies (evolutionary trees) based on objective criteria bears directly on our understanding of the relationships among organisms, including human evolution, as well as the spread of infectious disease. Numerous tree construction methods have been implemented for execution on single processors; however, inferring large phylogenies using computationally intense algorithms can be beyond the practical capacity of a single processor. Distributed and parallel processing provides a means of overcoming this hurdle. FITCH is a freely available, single-processor implementation of a distance-based tree-building algorithm commonly used by the biological community. Through an alternating least squares approach to branch length optimization and tree comparison, FITCH iteratively builds up evolutionary trees through species addition and branch rearrangement. To extend the utility of this program, I describe the design, implementation, and performance of mpiFITCH, a parallel-processing version of FITCH developed using the Message Passing Interface for message exchange. Balanced load distribution required the conversion of tree generation from recursive linked-list traversal to iterative, array-based traversal. Execution of mpiFITCH on a Beowulf cluster running 64 processors revealed a maximum performance enhancement of up to ~28-fold with an efficiency of ~40%.
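The conversion the abstract mentions, from recursive traversal to an iterative, array-based one, can be sketched as follows. This is a minimal illustration of the general technique, not mpiFITCH's actual code; the function name and array layout are hypothetical.

```python
# Sketch (illustrative, not from mpiFITCH): replacing recursive tree
# traversal with an iterative walk over a tree stored in parallel arrays.
# An explicit work list replaces the call stack, so the enumerated nodes
# can be sliced into equal chunks and handed out to worker processes.

def preorder_iterative(left, right, root=0):
    """Preorder traversal of a binary tree stored in arrays.

    left[i] / right[i] hold the child indices of node i, or -1 at a tip.
    """
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        if right[node] != -1:
            stack.append(right[node])
        if left[node] != -1:
            stack.append(left[node])  # pushed last, so visited first
    return order

# Tiny example: root 0 with children 1 and 2; node 1 has tips 3 and 4.
left = [1, 3, -1, -1, -1]
right = [2, 4, -1, -1, -1]
print(preorder_iterative(left, right))  # -> [0, 1, 3, 4, 2]
```

Because the traversal produces a flat list of work items rather than a chain of recursive calls, distributing it evenly across MPI ranks becomes a simple matter of partitioning an array.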
Improved neighbor list algorithm in molecular simulations using cell decomposition and data sorting method
An improved neighbor list algorithm is proposed to reduce unnecessary
interatomic distance calculations in molecular simulations. It combines the
advantages of Verlet table and cell linked list algorithms by using cell
decomposition approach to accelerate the neighbor list construction speed, and
data sorting method to lower the CPU data cache miss rate, as well as partial
updating method to minimize the unnecessary reconstruction of the neighbor
list. Both serial and parallel performance of molecular dynamics simulation are
evaluated using the proposed algorithm and compared with those using
conventional Verlet table and cell linked list algorithms. Results show that
the new algorithm outperforms the conventional algorithms by a factor of 2-3 for
both small and large numbers of atoms.

Comment: 14 pages, 7 figures. Submitted to Computer Physics Communications
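The cell-decomposition idea at the core of this approach can be sketched in a few lines. This is a minimal serial illustration under assumed conventions (cubic periodic box, half list); the paper's data-sorting and partial-update refinements are omitted, and all names are illustrative.

```python
import math

def build_neighbor_list(positions, box, cutoff):
    """Half neighbor list via cell decomposition in a cubic periodic box.

    Particles are binned into cells with edge >= cutoff, so only the 27
    surrounding cells need to be searched instead of all O(N^2) pairs.
    """
    ncell = max(1, int(box // cutoff))   # cells per dimension
    size = box / ncell
    cells = {}
    for i, p in enumerate(positions):
        key = tuple(int(c / size) % ncell for c in p)
        cells.setdefault(key, []).append(i)

    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nbr = ((cx + dx) % ncell, (cy + dy) % ncell,
                           (cz + dz) % ncell)
                    for i in members:
                        for j in cells.get(nbr, ()):
                            if j <= i:
                                continue  # half list: count each pair once
                            # minimum-image separation per coordinate
                            d = [min(abs(a - b), box - abs(a - b))
                                 for a, b in zip(positions[i], positions[j])]
                            if math.dist((0.0, 0.0, 0.0), d) <= cutoff:
                                pairs.add((i, j))
    return sorted(pairs)
```

In the combined scheme the abstract describes, a list built this way would additionally be stored in sorted order (to improve cache behavior) and only partially rebuilt between steps.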
Efficiency of linked cell algorithms
The linked cell list algorithm is an essential part of molecular simulation
software, both molecular dynamics and Monte Carlo. Though it scales linearly
with the number of particles, there has been a constant interest in increasing
its efficiency, because a large part of CPU time is spent to identify the
interacting particles. Several recent publications proposed improvements to the
algorithm and investigated their efficiency by applying them to particular
setups. In this publication we develop a general method to evaluate the
efficiency of these algorithms, which is mostly independent of the parameters
of the simulation, and test it for a number of linked cell list algorithms. We
also propose a combination of linked cell reordering and interaction sorting
that shows good efficiency for a broad range of simulation setups.

Comment: Submitted to Computer Physics Communications on 22 December 2009, still awaiting a referee report
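For reference, the classic linked cell list the abstract builds on stores, per cell, only the index of one "head" particle, with the remaining members chained through a second array. A minimal sketch (function names are illustrative):

```python
def linked_cell_lists(cell_index, ncells):
    """Classic head/next arrays of the linked cell method.

    head[c] is the first particle in cell c (-1 if the cell is empty);
    nxt[i] chains to the next particle in the same cell. Construction is
    O(N) and needs only two integer arrays.
    """
    head = [-1] * ncells
    nxt = [-1] * len(cell_index)
    for i, c in enumerate(cell_index):
        nxt[i] = head[c]   # prepend particle i to cell c's chain
        head[c] = i
    return head, nxt

def particles_in_cell(head, nxt, c):
    """Walk cell c's chain and return its particle indices."""
    i, out = head[c], []
    while i != -1:
        out.append(i)
        i = nxt[i]
    return out
```

The pointer-chasing in `particles_in_cell` is exactly what makes the traversal cache-unfriendly, which is why reordering and interaction-sorting variants, such as those evaluated here, can pay off even though the basic algorithm is already linear in the number of particles.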
An Efficient Cell List Implementation for Monte Carlo Simulation on GPUs
Maximizing the performance potential of the modern day GPU architecture
requires judicious utilization of available parallel resources. Although
dramatic reductions can often be obtained through straightforward mappings,
further performance improvements often require algorithmic redesigns to more
closely exploit the target architecture. In this paper, we focus on efficient
molecular simulations for the GPU and propose a novel cell list algorithm that
better utilizes its parallel resources. Our goal is an efficient GPU
implementation of large-scale Monte Carlo simulations for the grand canonical
ensemble. This is a particularly challenging application because there is
inherently less computation and parallelism than in similar applications with
molecular dynamics. Consistent with the results of prior researchers, our
simulation results show traditional cell list implementations for Monte Carlo
simulations of molecular systems offer effectively no performance improvement
for small systems [5, 14], even when porting to the GPU. However for larger
systems, the cell list implementation offers significant gains in performance.
Furthermore, our novel cell list approach results in better performance for all
problem sizes when compared with other GPU implementations with or without cell
lists.

Comment: 30 pages
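One reason linked lists are a poor fit for GPUs is that pointer chains serialize memory access. A common GPU-friendly alternative, shown here as a plain-Python sketch rather than the paper's specific novel scheme, stores each cell's members in a fixed-width row of a dense table so that threads can read them in a coalesced fashion. Names and the capacity-overflow policy are assumptions.

```python
def padded_cell_list(cell_index, ncells, capacity):
    """GPU-style cell list: a dense (ncells x capacity) table plus counts.

    table[c][k] holds the k-th particle of cell c, padded with -1; on a
    GPU, one thread per slot can then scan a cell's row with regular,
    coalesced memory accesses instead of chasing list pointers.
    """
    table = [[-1] * capacity for _ in range(ncells)]
    count = [0] * ncells
    for i, c in enumerate(cell_index):
        if count[c] == capacity:
            raise OverflowError("cell full; rebuild with larger capacity")
        table[c][count[c]] = i
        count[c] += 1
    return table, count
```

The trade-off is wasted storage in sparsely occupied cells, which matters for grand canonical Monte Carlo, where particle numbers fluctuate and the capacity must accommodate the densest cell.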
A Parallel Adaptive P3M code with Hierarchical Particle Reordering
We discuss the design and implementation of HYDRA_OMP, a parallel
implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M)
code HYDRA. The code is designed primarily for conducting cosmological
hydrodynamic simulations and is written in Fortran77+OpenMP. A number of
optimizations for RISC processors and SMP-NUMA architectures have been
implemented, the most important optimization being hierarchical reordering of
particles within chaining cells, which greatly improves data locality thereby
removing the cache misses typically associated with linked lists. Parallel
scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes
for a variety of modern SMP architectures. We give performance data in terms of
the number of particle updates per second, which is a more useful performance
metric than raw MFlops. A basic version of the code will be made available to
the community in the near future.

Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communications
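The locality optimization described above, reordering particles so that members of the same chaining cell sit contiguously in memory, can be sketched as a stable sort by cell index. This is an illustrative reduction of the idea, not HYDRA_OMP's Fortran implementation; names are hypothetical.

```python
def reorder_by_cell(positions, cell_index, ncells):
    """Reorder particle data so each cell's members are contiguous.

    Traversing a linked list visits particles in essentially random
    memory order; after a stable sort by cell index, a cell's particles
    occupy one contiguous block, so walking a cell streams through
    memory and avoids the cache misses of pointer chasing.
    """
    order = sorted(range(len(positions)), key=lambda i: cell_index[i])
    new_positions = [positions[i] for i in order]
    new_cells = [cell_index[i] for i in order]
    return new_positions, new_cells, order  # order maps new index -> old
```

In a production code the reordering would be applied periodically to all per-particle arrays (positions, velocities, masses, ...) so that every force loop benefits from the restored locality.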
Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems
This article introduces a highly parallel algorithm for molecular dynamics
simulations with short-range forces on single node multi- and many-core
systems. The algorithm is designed to achieve high parallel speedups for
strongly inhomogeneous systems like nanodevices or nanostructured materials. In
the proposed scheme the calculation of the forces and the generation of
neighbor lists is divided into small tasks. The tasks are then executed by a
thread pool according to a dependent task schedule. This schedule is
constructed in such a way that a particle is never accessed by two threads at
the same time.

Benchmark simulations on a typical 12-core machine show that the
described algorithm achieves excellent parallel efficiencies above 80% for
different kinds of systems and all numbers of cores. For inhomogeneous systems
the speedups are strongly superior to those obtained with spatial
decomposition. Further benchmarks were performed on an Intel Xeon Phi
coprocessor. These simulations demonstrate that the algorithm scales well to
large numbers of cores.

Comment: 12 pages, 8 figures
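The constraint that no particle is ever accessed by two threads at once can be illustrated with a simple cell-coloring schedule. This is a basic stand-in for the paper's dependent task schedule, shown in 2D for brevity; all names are hypothetical, and a 3-colors-per-axis stride is used because each task may write to its own cell and its immediate neighbors.

```python
from concurrent.futures import ThreadPoolExecutor

def color_cells(nx, ny):
    """Assign the cells of an nx-by-ny grid to 9 colors (3 per axis).

    Same-color cells are at least three cells apart, so tasks that
    update a cell and its adjacent cells never touch the same particle
    when cells of one color are processed concurrently.
    """
    groups = {c: [] for c in range(9)}
    for ix in range(nx):
        for iy in range(ny):
            groups[3 * (ix % 3) + (iy % 3)].append((ix, iy))
    return groups

def run_schedule(groups, task):
    """Run one color group at a time; tasks within a group in parallel."""
    with ThreadPoolExecutor() as pool:
        for color in sorted(groups):
            list(pool.map(task, groups[color]))
```

A dependent task schedule generalizes this: instead of hard barriers between colors, each task becomes runnable as soon as the specific neighboring tasks it conflicts with have finished, which keeps threads busier on inhomogeneous systems.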