516 research outputs found

    A Review of Lightweight Thread Approaches for High Performance Computing

    Get PDF
    High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly-found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns andthat those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DEAC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.Peer ReviewedPostprint (author's final draft

    A Generic Checkpoint-Restart Mechanism for Virtual Machines

    Full text link
    It is common today to deploy complex software inside a virtual machine (VM). Snapshots provide rapid deployment, migration between hosts, dependability (fault tolerance), and security (insulating a guest VM from the host). Yet, for each virtual machine, the code for snapshots is laboriously developed on a per-VM basis. This work demonstrates a generic checkpoint-restart mechanism for virtual machines. The mechanism is based on a plugin on top of an unmodified user-space checkpoint-restart package, DMTCP. Checkpoint-restart is demonstrated for three virtual machines: Lguest, user-space QEMU, and KVM/QEMU. The plugins for Lguest and KVM/QEMU require just 200 lines of code. The Lguest kernel driver API is augmented by 40 lines of code. DMTCP checkpoints user-space QEMU without any new code. KVM/QEMU, user-space QEMU, and DMTCP need no modification. The design benefits from other DMTCP features and plugins. Experiments demonstrate checkpoint and restart in 0.2 seconds using forked checkpointing, mmap-based fast-restart, and incremental Btrfs-based snapshots

    Programming with process groups: Group and multicast semantics

    Get PDF
    Process groups are a natural tool for distributed programming and are increasingly important in distributed computing environments. Discussed here is a new architecture that arose from an effort to simplify Isis process group semantics. The findings include a refined notion of how the clients of a group should be treated, what the properties of a multicast primitive should be when systems contain large numbers of overlapping groups, and a new construct called the causality domain. A system based on this architecture is now being implemented in collaboration with the Chorus and Mach projects

    Argobots: A Lightweight Low-Level Threading and Tasking Framework

    Get PDF
    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach

    Analysis of threading libraries for high performance computing

    Get PDF
    © 2020 IEEE. Personal use of this material is permitted. Permissíon from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertisíng or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.[EN] With the appearance of multi-/many core machines, applications and runtime systems have evolved in order to exploit the new on-node concurrency brought by new software paradigms. POSIX threads (Pthreads) was widely-adopted for that purpose and it remains as the most used threading solution in current hardware. Lightweight thread (LWT) libraries emerged as an alternative offering lighter mechanisms to tackle the massive concurrency of current hardware. In this article, we analyze in detail the most representative threading libraries including Pthread- and LWT-based solutions. In addition, to examine the suitability of LWTs for different use cases, we develop a set of microbenchmarks consisting of OpenMP patterns commonly found in current parallel codes, and we compare the results using threading libraries and OpenMP implementations. Moreover, we study the semantics offered by threading libraries in order to expose the similarities among different LWT application programming interfaces and their advantages over Pthreads. This article exposes that LWT libraries outperform solutions based on operating system threads when tasks and nested parallelism are required.The researchers from the Universitat Jaume I and Universitat Politecnica de Valencia were supported by project TIN2014-53495-R of the MINECO and FEDER, and the Generalitat Valenciana fellowship programme Vali+d 2015. Antonio J. Pena is financed by the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Grant No. 749516. This work was partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357.Castelló, A.; Mayo Gual, R.; Seo, S.; Balaji, P.; Quintana Ortí, ES.; Peña, AJ. (2020). Analysis of threading libraries for high performance computing. IEEE Transactions on Computers. 69(9):1279-1292. https://doi.org/10.1109/TC.2020.2970706S1279129269

    Holistic debugging - enabling instruction set simulation for software quality assurance

    Get PDF
    We present holistic debugging, a novel method for observing execution of complex and distributed software. It builds on an instruction set simulator, which provides reproducible experiments and non-intrusive probing of state in a distributed system. Instruction set simulators, however, only provide low-level information, so a holistic debugger contains a translation framework that maps this information to higher abstraction level observation tools, such as source code debuggers. We have created Nornir, a proof-of-concept holistic debugger, built on the simulator Simics. For each observed process in the simulated system, Nornir creates an abstraction translation stack, with virtual machine translators that map machine-level storage contents (e.g. physical memory, registers) provided by Simics, to application-level data (e.g. virtual memory contents) by parsing the data structures of operating systems and virtual machines. Nornir includes a modified version of the GNU debugger (GDB), which supports non-intrusive symbolic debugging of distributed applications. Nornir's main interface is a debugger shepherd, a programmable interface that controls multiple debuggers, and allows users to coherently inspect the entire state of heterogeneous, distributed applications. It provides a robust observation platform for construction of new observation tools
    corecore