5 research outputs found

    Performance evaluation of grid-enabled code: a case study

    No full text
    Abstract—This paper presents the performance analysis of a grid-enabled MIP solver in a grid environment consisting of three clusters on a campus LAN. In particular, the paper focuses on the analysis of the behavior of the application using two networks connecting the clusters with different latency and bandwidth. I

    Enhancing Checkpoint Performance with Staging IO and SSD

    No full text
    With the ever-growing size of computer clusters and applications, system failures are becoming inevitable. Checkpointing, a strategy to ensure fault tolerance, has become imperative in such an environment. However, the existing mechanism of writing checkpoints to parallel file systems does not perform well as job size increases. Solid State Disks (SSDs) are attracting more and more attention due to their technical merits, such as good random access performance, low power consumption, and shock resistance. However, how to apply SSDs in a parallel storage system to improve checkpoint writing remains an open question. In this paper we propose a new strategy to enhance checkpoint writing performance by aggregating checkpoint writes at the client side and utilizing staging IO on data servers. We also explore the potential of substituting SSDs for traditional hard disks on data servers to achieve better write bandwidth. Our strategy achieves up to 6.3 times higher write bandwidth than a popular parallel file system, PVFS2 [6], with 8 client nodes and 4 data servers. In experiments with real applications using 64 application processes and 4 data servers, our strategy can accelerate checkpoint writing by up to 9.9 times compared to PVFS2.