3 research outputs found

    Fast and generic concurrent message-passing

    Get PDF
    Communication hardware and software have a significant impact on the performance of clusters and supercomputers. Message passing model and the Message-Passing Interface (MPI) is a widely used model of communications in the High-Performance Computing (HPC) community with great success. However, it has recently faced new challenges due to the emergence of many-core architecture and of programming models with dynamic task parallelism, assuming a large number of concurrent, light-weight threads. These applications come from important classes of applications such as graph and data analytics. Using MPI with these languages/runtimes is inefficient because MPI implementation is not able to perform well with threads. Using MPI as a communication middleware is also not efficient since MPI has to provide many abstractions that are not needed for many of the frameworks, thus having extra overheads. In this thesis, we studied MPI performance under the new assumptions. We identified several factors in the message-passing model which were inherently problematic for scalability and performance. Next, we analyzed the communication of a number of graph, threading and data-flow frameworks to identify generic patterns. We then proposed a low-level communication interface (LCI) to bridge the gap between communication architecture and runtime. The core of our idea is to attach to each message a few simple operations which fit better with the current hardware and can be implemented efficiently. We show that with only a few carefully chosen primitives and appropriate design, message-passing under this interface can easily outperform production MPI when running atop of multi-threaded environment. Further, using LCI is simple for various types of usage
    corecore