2 research outputs found

    Parallel Performance of Molecular Dynamics Trajectory Analysis

    The performance of biomolecular molecular dynamics simulations has steadily increased on modern high-performance computing resources, but acceleration of the analysis of the output trajectories has lagged behind, so that analyzing simulations is becoming a bottleneck. To close this gap, we studied the performance of parallel trajectory analysis with MPI and the Python MDAnalysis library on three different XSEDE supercomputers, where trajectories were read from a Lustre parallel file system. Strong scaling performance was impeded by stragglers, MPI processes that were slower than the typical process. Stragglers were less prevalent for compute-bound workloads, thus pointing to file reading as a bottleneck for scaling. However, a more complicated picture emerged in which both the computation and the data ingestion exhibited close to ideal strong scaling behavior, whereas stragglers were primarily caused by either large MPI communication costs or long times to open the single shared trajectory file. We improved overall strong scaling performance either by subfiling (splitting the trajectory into separate files) or by MPI-IO with parallel HDF5 trajectory files. The parallel HDF5 approach resulted in near-ideal strong scaling on up to 384 cores (16 nodes), thus reducing trajectory analysis times by two orders of magnitude compared to the serial approach.
    Comment: accepted manuscript, to appear in 'Concurrency and Computation: Practice and Experience'.
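    As a rough illustration of the per-frame parallelization pattern summarized above, the sketch below splits trajectory frames across MPI ranks with mpi4py and MDAnalysis and gathers per-rank results on rank 0. The file names and the radius-of-gyration computation are placeholder assumptions for illustration, not the paper's actual workload or benchmark code.

```python
# Minimal sketch (not the paper's benchmark code): per-frame MPI parallelism
# with MDAnalysis. File names and the radius-of-gyration analysis are
# placeholders chosen for illustration.
from mpi4py import MPI
import numpy as np
import MDAnalysis as mda

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Every rank opens the shared trajectory; the abstract identifies opening this
# single shared file as one source of stragglers at high core counts.
u = mda.Universe("topol.tpr", "traj.xtc")
protein = u.select_atoms("protein")

# Assign a contiguous block of frames to each MPI rank.
blocks = np.array_split(np.arange(len(u.trajectory)), size)
my_frames = blocks[rank]

# Each rank reads and analyzes its own frames independently.
local = []
if my_frames.size:
    start, stop = int(my_frames[0]), int(my_frames[-1]) + 1
    for i, ts in zip(my_frames, u.trajectory[start:stop]):
        local.append((int(i), protein.radius_of_gyration()))

# The gather step is where the MPI communication cost discussed above appears.
gathered = comm.gather(local, root=0)
if rank == 0:
    results = dict(pair for part in gathered for pair in part)
    print(f"analyzed {len(results)} frames on {size} ranks")
```

    A script like this would be launched with, for example, `mpiexec -n 4 python analyze.py`; subfiling or parallel HDF5, as studied in the paper, would change how the trajectory is stored and opened, not this overall decomposition.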

    Accelerating Parallel Analysis of Scientific Simulation Data via Zazen

    As a new generation of parallel supercomputers enables researchers to conduct scientific simulations of unprecedented scale and resolution, terabyte-scale simulation output has become increasingly commonplace. Analysis of such massive data sets is typically I/O-bound: many parallel analysis programs spend most of their execution time reading data from disk rather than performing useful computation. To overcome this I/O bottleneck, we have developed a new data access method. Our main idea is to cache a copy of simulation output files on the local disks of an analysis cluster’s compute nodes, and to use a novel task-assignment protocol to co-locate data access with computation. We have implemented our methodology in a parallel disk cache system called Zazen. By avoiding the overhead associated with querying metadata servers and by reading data in parallel from local disks, Zazen is able to deliver a sustained read bandwidth of over 20 gigabytes per second on a commodity Linux cluster with 100 nodes, approaching the optimal aggregated I/O bandwidth attainable on these nodes. Compared with conventional NFS, PVFS2, and Hadoop/HDFS, respectively, Zazen is 75, 18, and 6 times faster for accessing large (1-GB) files, and 25, 13, and 85 times faster for accessing small (2-MB) files. We have deployed Zazen in conjunction with Anton—a special-purpose supercomputer that dramatically accelerates molecular dynamics (MD) simulations—and have been able to accelerate the parallel analysis of terabyte-scale MD trajectories by about an order of magnitude.
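    Zazen's actual task-assignment protocol is not spelled out in the abstract; the toy scheduler below only illustrates the general idea of co-locating analysis tasks with locally cached data. The node names, cache contents, and round-robin fallback are invented for the example.

```python
# Toy sketch of the co-location idea behind Zazen: prefer to assign each
# analysis task to a node whose local disk cache already holds the file,
# and fall back to a (slower) remote read otherwise. This is an illustrative
# assumption, not Zazen's actual protocol.
from collections import defaultdict
from itertools import cycle

def assign_tasks(files, cache_map):
    """Map each file to a node, preferring nodes that cache it locally.

    files     -- simulation-output file names to analyze
    cache_map -- dict mapping node name -> set of files cached on its local disk
    """
    # Invert the cache map: file -> nodes holding a local copy.
    holders = defaultdict(list)
    for node, cached in cache_map.items():
        for f in cached:
            holders[f].append(node)

    fallback = cycle(sorted(cache_map))  # uncached files go round-robin
    assignment = {}
    for f in files:
        if holders[f]:
            # Local read: avoids the metadata-server queries and network
            # transfers that the abstract attributes to NFS/PVFS2/HDFS access.
            assignment[f] = min(holders[f],
                                key=lambda n: list(assignment.values()).count(n))
        else:
            assignment[f] = next(fallback)  # remote read as a last resort
    return assignment

if __name__ == "__main__":
    cache = {"node01": {"frame_000.dat", "frame_001.dat"},
             "node02": {"frame_002.dat"}}
    tasks = ["frame_000.dat", "frame_001.dat", "frame_002.dat", "frame_003.dat"]
    print(assign_tasks(tasks, cache))
```

    The point of the sketch is only the placement decision: by sending work to the data, most reads hit the node-local disks, which is where the reported bandwidth advantage over network file systems comes from.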