A Review of Lightweight Thread Approaches for High Performance Computing
High-level, directive-based solutions are becoming the programming models (PMs) of choice for multi/many-core architectures. Several solutions relying on operating system (OS) threads work perfectly well with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massively parallel architectures, and conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared, offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of patterns commonly found in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS thread-based solutions in the task and nested parallelism cases that are becoming more popular with new architectures.
The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory. Peer reviewed. Postprint (author's final draft).
A Generic Checkpoint-Restart Mechanism for Virtual Machines
It is common today to deploy complex software inside a virtual machine (VM).
Snapshots provide rapid deployment, migration between hosts, dependability
(fault tolerance), and security (insulating a guest VM from the host). Yet, for
each virtual machine, the code for snapshots is laboriously developed on a
per-VM basis. This work demonstrates a generic checkpoint-restart mechanism for
virtual machines. The mechanism is based on a plugin on top of an unmodified
user-space checkpoint-restart package, DMTCP. Checkpoint-restart is
demonstrated for three virtual machines: Lguest, user-space QEMU, and KVM/QEMU.
The plugins for Lguest and KVM/QEMU require just 200 lines of code. The Lguest
kernel driver API is augmented by 40 lines of code. DMTCP checkpoints
user-space QEMU without any new code. KVM/QEMU, user-space QEMU, and DMTCP need
no modification. The design benefits from other DMTCP features and plugins.
Experiments demonstrate checkpoint and restart in 0.2 seconds using forked
checkpointing, mmap-based fast-restart, and incremental Btrfs-based snapshots
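As a rough illustration of the forked-checkpointing idea mentioned in the abstract above, the following minimal sketch forks a child process that writes a snapshot while the parent keeps running. It is not DMTCP code: the function name `forked_checkpoint`, the use of `pickle`, and the file name are assumptions made for the example, and it requires a POSIX system where `os.fork` is available.

```python
import os
import pickle
import tempfile


def forked_checkpoint(state, path):
    """Fork a child that writes `state` to `path`; return the child's pid.

    Hypothetical helper for illustration only (not part of DMTCP).
    """
    pid = os.fork()
    if pid == 0:
        # Child: it holds a copy-on-write snapshot of the parent's memory
        # as of the fork, so later changes in the parent are not visible here.
        try:
            with open(path, "wb") as f:
                pickle.dump(state, f)
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup handlers
    return pid  # Parent: returns at once and can continue computing


state = {"step": 1, "data": [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")

pid = forked_checkpoint(state, path)
state["step"] = 2          # the parent keeps mutating its own copy
os.waitpid(pid, 0)         # wait until the child has finished writing

with open(path, "rb") as f:
    restored = pickle.load(f)
```

The parent is blocked only for the duration of the fork itself; the cost of writing the image is paid by the child, which is presumably why the technique helps keep checkpoint times short.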
Programming with process groups: Group and multicast semantics
Process groups are a natural tool for distributed programming and are increasingly important in distributed computing environments. Discussed here is a new architecture that arose from an effort to simplify Isis process group semantics. The findings include a refined notion of how the clients of a group should be treated, what the properties of a multicast primitive should be when systems contain large numbers of overlapping groups, and a new construct called the causality domain. A system based on this architecture is now being implemented in collaboration with the Chorus and Mach projects
Argobots: A Lightweight Low-Level Threading and Tasking Framework
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach
Secure Isolation and Migration of Untrusted Legacy Applications
Existing applications often contain security holes that are not patched until after the system has already been compromised. Even when software updates are applied to address security issues, they often result in system services being unavailable for some time. To address these system security and availability issues, we have developed peas and pods. A pea provides a least privilege environment that can restrict processes to the minimal subset of system resources needed to run. This mechanism enables the creation of environments for privileged program execution that can help with intrusion prevention and containment. A pod provides a group of processes and associated users with a consistent, machine-independent virtualized environment. Pods are coupled with a novel checkpoint-restart mechanism which allows processes to be migrated across minor operating system kernel versions with different security patches. This mechanism allows system administrators the flexibility to patch their operating systems immediately without worrying over potential loss of data or needing to schedule system downtime. We have implemented peas and pods in Linux without requiring any application or operating system kernel changes. Our measurements on real world desktop and server applications demonstrate that peas and pods impose little overhead and enable secure isolation and migration of untrusted applications
Performance of Size-Changing Algorithms in Stackable File Systems
Stackable file systems can provide extensible file system functionality with minimal performance overhead and development cost. However, previous approaches are limited in the functionality they provide. In particular, they do not support size-changing algorithms, which are important and useful for many applications, such as compression and security. We propose fast index files, a technique for efficient support of size-changing algorithms in stackable file systems. Fast index files provide a page mapping between file system layers in a way that can be used with any size-changing algorithm. Index files are designed to be recoverable if lost and add less than 0.1% disk space overhead. We have implemented fast indexing using portable stackable templates, and we have used this system to build several example file systems with size-changing algorithms. We demonstrate that fast index files have very low overhead for typical workloads, only 2.3% over other stacked file systems. Our system can deliver much better performance on size-changing algorithms than user-level applications, as much as five times faster
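To make the page-mapping idea in the abstract above more concrete, here is a minimal sketch of an index that maps fixed-size pages of the original data to byte ranges in a compressed copy. It is an illustration under assumptions, not the authors' on-disk format: the 4096-byte page size, the use of `zlib`, and the names `encode` and `read_page` are all chosen for the example.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for the example


def encode(data):
    """Compress `data` page by page.

    Returns the encoded bytes and an index whose entry i is the offset
    at which encoded page i ends.
    """
    encoded = bytearray()
    index = []
    for start in range(0, len(data), PAGE_SIZE):
        encoded += zlib.compress(data[start:start + PAGE_SIZE])
        index.append(len(encoded))
    return bytes(encoded), index


def read_page(encoded, index, page_no):
    """Decode a single page using the index, without touching other pages."""
    begin = index[page_no - 1] if page_no > 0 else 0
    end = index[page_no]
    return zlib.decompress(encoded[begin:end])


data = bytes(range(256)) * 40          # 10240 bytes, i.e. three pages
encoded, index = encode(data)
second_page = read_page(encoded, index, 1)
```

Because the index stores only one offset per page, it stays small relative to the data, and a random read needs to decode only the requested page rather than the whole file.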
Analysis of threading libraries for high performance computing
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
With the appearance of multi-/many-core machines, applications and runtime systems have evolved in order to exploit the new on-node concurrency brought by new software paradigms. POSIX threads (Pthreads) was widely adopted for that purpose and remains the most used threading solution in current hardware. Lightweight thread (LWT) libraries emerged as an alternative offering lighter mechanisms to tackle the massive concurrency of current hardware. In this article, we analyze in detail the most representative threading libraries, including Pthread- and LWT-based solutions. In addition, to examine the suitability of LWTs for different use cases, we develop a set of microbenchmarks consisting of OpenMP patterns commonly found in current parallel codes, and we compare the results using threading libraries and OpenMP implementations. Moreover, we study the semantics offered by threading libraries in order to expose the similarities among different LWT application programming interfaces and their advantages over Pthreads. This article shows that LWT libraries outperform solutions based on operating system threads when tasks and nested parallelism are required.
The researchers from the Universitat Jaume I and Universitat Politecnica de Valencia were supported by project TIN2014-53495-R of the MINECO and FEDER, and the Generalitat Valenciana fellowship programme Vali+d 2015. Antonio J. Peña is financed by the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Grant No. 749516. This work was partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357.
Castelló, A.; Mayo Gual, R.; Seo, S.; Balaji, P.; Quintana Ortí, ES.; Peña, AJ. (2020). Analysis of threading libraries for high performance computing. IEEE Transactions on Computers. 69(9):1279-1292. https://doi.org/10.1109/TC.2020.2970706
Holistic debugging - enabling instruction set simulation for software quality assurance
We present holistic debugging, a novel method for observing execution of complex and distributed software. It builds on an instruction set simulator, which provides reproducible experiments and non-intrusive probing of state in a distributed system. Instruction set simulators, however, only provide low-level information, so a holistic debugger contains a translation framework that maps this information to higher abstraction level observation tools, such as source code debuggers. We have created Nornir, a proof-of-concept holistic debugger, built on the simulator Simics. For each observed process in the simulated system, Nornir creates an abstraction translation stack, with virtual machine translators that map machine-level storage contents (e.g. physical memory, registers) provided by Simics, to application-level data (e.g. virtual memory contents) by parsing the data structures of operating systems and virtual machines. Nornir includes a modified version of the GNU debugger (GDB), which supports non-intrusive symbolic debugging of distributed applications. Nornir's main interface is a debugger shepherd, a programmable interface that controls multiple debuggers, and allows users to coherently inspect the entire state of heterogeneous, distributed applications. It provides a robust observation platform for construction of new observation tools