316 research outputs found

    Improving MPI Threading Support for Current Hardware Architectures

    Get PDF
    Threading support for Message Passing Interface (MPI) has been defined in the MPI standard for more than twenty years. While many standard-compliance MPI implementations fully support multithreading, the threading support in MPI still cannot provide the optimal performance on the same level as the non-threading environment. The performance disparity leads to low adoption rate from applications, and eventually, lesser interest in optimizing MPI threading support. However, with the current advancement in computation hardware, the number of CPU core per packet is growing drastically. Using shared-memory MPI communication has become more costly. MPI threading without local communication is one of the alternatives and the some interests are shifting back toward threading to MPI.In this work, we investigate different approaches to leverage the power of thread parallelism and tools to help us to raise the multi-threaded MPI performance to reasonable level. We propose a novel multi-threaded MPI benchmark with multiple communication patterns to stress multiple points of the MPI implementation, with the ability to switch between using MPI process and threads for quick comparison between two modes. Enabling the us, and the others MPI developers to stress test their implementation design.We address the interoperability between MPI implementation and threading frameworks by introducing the thread-synchronization object, an object that gives the MPI implementation more control over user-level thread, allowing for more thread utilization in MPI. In our implementation, the synchronization object relieves the lock contention on the internal progress engine and able to achieve up to 7x the performance of the original implementation. Moving forward, we explore the possibility of harnessing the true thread concurrency. We proposed several strategies to address the bottlenecks in MPI implementation. From our evaluation, with our novel threading optimization, we can achieve up to 22x the performance comparing to the legacy MPI designs

    Evaluating worksharing tasks on distributed environments

    Get PDF
    ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Hybrid programming is a promising approach to exploit clusters of multicore systems. Our focus is on the combination of MPI and tasking. This hybrid approach combines the low-latency and high throughput of MPI with the flexibility of tasking models and their inherent ability to handle load imbalance. However, combining tasking with standard MPI implementations can be a challenge. The Task-Aware MPI library (TAMPI) eases the development of applications combining tasking with MPI. TAMPI enables developers to overlap computation and communication phases by relying on the tasking data-flow execution model. Using this approach, the original computation that was distributed in many different MPI ranks is grouped together in fewer MPI ranks, and split into several tasks per rank. Nevertheless, programmers must be careful with task granularity. Too fine-grained tasks introduce too much overhead, while too coarse-grained tasks lead to lack of parallelism. An adequate granularity may not always exist, especially in distributed environments where the same amount of work is distributed among many more cores. Worksharing tasks are a special kind of tasks, recently proposed, that internally leverage worksharing techniques. By doing so, a single worksharing task may run in several cores concurrently. Nonetheless, the task management costs remain the same than a regular task. In this work, we study the combination of worksharing tasks and TAMPI on distributed environments using two well known mini-apps: HPCCG and LULESH. Our results show significant improvements using worksharing tasks compared to regular tasks, and to other state-of-the-art alternatives such as OpenMP worksharing.This project is supported by the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No.s 754304 (DEEP-EST) and 823767 (PRACE), the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB) and by the Generalitat de Catalunya (2017-SGR1481). The work has been performed under the Project HPCEUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme; in particular, the author gratefully acknowledges the support of Dr Mark Bull (EPCC) and the computer resources and technical support provided by EPCC.Peer ReviewedPostprint (author's final draft

    A Review on Modern Distributed Computing Paradigms: Cloud Computing, Jungle Computing and Fog Computing

    Get PDF
    The distributed computing attempts to improve performance in large-scale computing problems by resource sharing. Moreover, rising low-cost computing power coupled with advances in communications/networking and the advent of big data, now enables new distributed computing paradigms such as Cloud, Jungle and Fog computing.Cloud computing brings a number of advantages to consumers in terms of accessibility and elasticity. It is based on centralization of resources that possess huge processing power and storage capacities. Fog computing, in contrast, is pushing the frontier of computing away from centralized nodes to the edge of a network, to enable computing at the source of the data. On the other hand, Jungle computing includes a simultaneous combination of clusters, grids, clouds, and so on, in order to gain maximum potential computing power.To understand these new buzzwords, reviewing these paradigms together can be useful. Therefore, this paper describes the advent of new forms of distributed computing. It provides a definition for Cloud, Jungle and Fog computing, and the key characteristics of them are determined. In addition, their architectures are illustrated and, finally, several main use cases are introduced

    Measuring Thread Timing to Assess the Feasibility of Early-bird Message Delivery

    Full text link
    Early-bird communication is a communication/computation overlap technique that combines fine-grained communication with partitioned communication to improve application run-time. Communication is divided among the compute threads such that each individual thread can initiate transmission of its portion of the data as soon as it is complete rather than waiting for all of the threads. However, the benefit of early-bird communication depends on the completion timing of the individual threads. In this paper, we measure and evaluate the potential overlap, the idle time each thread experiences between finishing their computation and the final thread finishing. These measurements help us understand whether a given application could benefit from early-bird communication. We present our technique for gathering this data and evaluate data collected from three proxy applications: MiniFE, MiniMD, and MiniQMC. To characterize the behavior of these workloads, we study the thread timings at both a macro level, i.e., across all threads across all runs of an application, and a micro level, i.e., within a single process of a single run. We observe that these applications exhibit significantly different behavior. While MiniFE and MiniQMC appear to be well-suited for early-bird communication because of their wider thread distribution and more frequent laggard threads, the behavior of MiniMD may limit its ability to leverage early-bird communication

    Distributed Computing in a Pandemic: A Review of Technologies Available for Tackling COVID-19

    Full text link
    The current COVID-19 global pandemic caused by the SARS-CoV-2 betacoronavirus has resulted in over a million deaths and is having a grave socio-economic impact, hence there is an urgency to find solutions to key research challenges. Much of this COVID-19 research depends on distributed computing. In this article, I review distributed architectures -- various types of clusters, grids and clouds -- that can be leveraged to perform these tasks at scale, at high-throughput, with a high degree of parallelism, and which can also be used to work collaboratively. High-performance computing (HPC) clusters will be used to carry out much of this work. Several bigdata processing tasks used in reducing the spread of SARS-CoV-2 require high-throughput approaches, and a variety of tools, which Hadoop and Spark offer, even using commodity hardware. Extremely large-scale COVID-19 research has also utilised some of the world's fastest supercomputers, such as IBM's SUMMIT -- for ensemble docking high-throughput screening against SARS-CoV-2 targets for drug-repurposing, and high-throughput gene analysis -- and Sentinel, an XPE-Cray based system used to explore natural products. Grid computing has facilitated the formation of the world's first Exascale grid computer. This has accelerated COVID-19 research in molecular dynamics simulations of SARS-CoV-2 spike protein interactions through massively-parallel computation and was performed with over 1 million volunteer computing devices using the Folding@home platform. Grids and clouds both can also be used for international collaboration by enabling access to important datasets and providing services that allow researchers to focus on research rather than on time-consuming data-management tasks.Comment: 21 pages (15 excl. refs), 2 figures, 3 table

    3rd EGEE User Forum

    Get PDF
    We have organized this book in a sequence of chapters, each chapter associated with an application or technical theme introduced by an overview of the contents, and a summary of the main conclusions coming from the Forum for the chapter topic. The first chapter gathers all the plenary session keynote addresses, and following this there is a sequence of chapters covering the application flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the important number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum

    Survey and Analysis of Production Distributed Computing Infrastructures

    Full text link
    This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available and how it has succeeded and failed. The set is not complete, but we believe it is representative. Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the ap- plications by representing the infrastructures
    • …
    corecore