
    Auto-Tuning MPI Collective Operations on Large-Scale Parallel Systems

    MPI libraries are widely used in high-performance computing applications. Yet effective tuning of MPI collectives on large parallel systems remains an outstanding challenge. The process often follows a trial-and-error approach and requires expert insight into the subtle interactions between software and the underlying hardware. This paper presents an empirical approach that chooses and switches MPI communication algorithms at runtime to optimize application performance. We achieve this by first building an offline model, through microbenchmarks, of how runtime parameters and message sizes affect the choice of MPI communication algorithm. We then apply this knowledge to automatically optimize new, unseen MPI programs. We evaluate our approach by applying it to the NPB and HPCC benchmarks on a 384-node cluster of the Tianhe-2 supercomputer. Experimental results show that our approach achieves, on average, a 22.7% (up to 40.7%) improvement over the default setting.
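
The offline-then-runtime idea in this abstract can be illustrated with a very rough sketch. It swaps the paper's model for a plain lookup table, and everything in it is an assumption made for illustration: the algorithm names, the probed message sizes, the latency numbers, and the helper names `build_table` and `choose_algorithm` are hypothetical and do not come from the paper or from any MPI library.

```python
from bisect import bisect_left

# Hypothetical offline microbenchmark results: latency in microseconds for
# each candidate algorithm at each probed message size (bytes).
# The names and numbers are illustrative only, not measured data.
BENCH = {
    1024: {"binomial_tree": 12.0, "ring": 30.0, "recursive_doubling": 10.0},
    65536: {"binomial_tree": 95.0, "ring": 80.0, "recursive_doubling": 110.0},
    1048576: {"binomial_tree": 900.0, "ring": 610.0, "recursive_doubling": 1200.0},
}


def build_table(bench):
    """Offline step: keep the fastest algorithm for each probed size."""
    return [(size, min(times, key=times.get)) for size, times in sorted(bench.items())]


def choose_algorithm(table, msg_size):
    """Runtime step: use the entry for the smallest probed size >= msg_size,
    falling back to the largest probed size for bigger messages."""
    sizes = [size for size, _ in table]
    idx = min(bisect_left(sizes, msg_size), len(table) - 1)
    return table[idx][1]


TABLE = build_table(BENCH)
```

A caller would consult `choose_algorithm(TABLE, n_bytes)` before each collective call. A real tuner would likely also key the table on process count and other runtime parameters, which this sketch omits.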

    Extensions of an Empirical Automated Tuning Framework

    Empirical auto-tuning has been successfully applied to scientific computing applications and web-based cluster servers over the last few years. However, few studies have focused on applying this method to optimize the performance of database systems. In this thesis, we present a strategy that uses Active Harmony, an empirical automated tuning framework, to optimize the throughput of a PostgreSQL server by tuning settings such as memory and buffer sizes. We used the Nelder-Mead simplex method as the search engine, and we show how our strategy performs compared to the hand-tuned and default results. Another part of this thesis focuses on using data from prior auto-tuning runs. Prior data has proven useful in many cases, such as modeling the search space or finding a good starting point for hill-climbing. We present several methods that were developed to manage prior data in Active Harmony. Our intention was to provide tuners with a complete set of information for their tuning tasks.
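
To make the search step concrete, here is a minimal, self-contained sketch of the Nelder-Mead simplex method applied to a two-parameter tuning problem. It is not Active Harmony's API and not the thesis's code. The objective `neg_throughput` is a synthetic stand-in for a benchmark run, and its parameter names and optimum (512 MB of buffers, 64 MB of working memory) are assumptions chosen only so the example has a known answer.

```python
def nelder_mead(f, start, step=32.0, iters=400,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimize f with a basic Nelder-Mead simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflect = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            # Reflection is the new best: try going further the same way.
            expand = [c + gamma * (r - c) for c, r in zip(centroid, reflect)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            # Contract the worst vertex toward the centroid.
            contract = [c + rho * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:
                # Shrink every vertex toward the best one.
                simplex = [best] + [
                    [b + sigma * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


def neg_throughput(settings):
    """Synthetic objective (assumption): lower is better, minimum at (512, 64)."""
    buffers_mb, work_mem_mb = settings
    return ((buffers_mb - 512.0) / 64.0) ** 2 + ((work_mem_mb - 64.0) / 16.0) ** 2


best_settings = nelder_mead(neg_throughput, [128.0, 4.0])
```

In a real tuning loop each call to the objective would be an expensive benchmark run, so results would normally be cached rather than re-evaluated as this sketch does.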