Search CORE

25,488 research outputs found

MP-LOCKs: Replacing hardware synchronization primitives with message passing

Author: Carter John B.
Kuo Chen-Chi
Publication venue: University of Utah
Publication date: 01/05/2011
Field of study

Journal ArticleShared memory programs guarantee the correctness of concurrent accesses to shared data using interprocessor synchronization operations. The most common synchronization operators are locks, which are traditionally implemented in user-level libraries via a mix of shared memory accesses and hardware synchronization primitives like test-and-set. In this paper, we argue that synchronization operations implemented using fast message passing and kernel-embedded lock managers are an attractive alternative to dedicated synchronization hardware. We propose three message passing lock (MP-LOCK) algorithms (centralized, distributed, and reactive) and provide guidelines for implementing them efficiently. MP-LOCKs redice tje design complexity and runtime occupancy of DSM controllers and can exploit software's inherent flexibility to adapt to differing applications lock access patterns. We compared the performance of MP-LOCKs with two common shared memory lock algorithms: test-and-set and MCS locks and found that MP-LOCKs scale better. For machines with 16 to 32 nides, applications using MP-LOCKs ran up to 186% faster than the same applications with shared memory locks. For small systems (up to 8 nodes), MP-LOCK performance lags shared memory lock performance due to the higher software overhead. However, three of the MP-LOCK applications slow down by no more than 18%, while the other two slowed by no more than 180%. Given these results, we conclude that locks based on message passing should be considered as a replacement for hardware locks in future scalable multiprocessors that supports efficient message passing mechanisms. In addition, it is possible to implement efficient software synchronization primitives in clusters of workstations by using the guidelines we proposed

The University of Utah: J. Willard Marriott Digital Library

Parallel Implementation of AES using XTS Mode of Operation

Author: Shrestha Muna
Publication venue: The Repository at St. Cloud State
Publication date: 01/05/2018
Field of study

Data encryption is essential for protecting data from unauthorized access. The Advanced Encryption Standard (AES), among many encryption algorithms, is the most popular algorithm currently employed to secure static and dynamic data. There are several modes of AES operation. Each of these modes defines a unique way to perform data encryption. XTS mode is the latest mode developed to protect data stored in hard-disk-like sector-based storage devices. A recent increase in the rate of data breaches has triggered the necessity to encrypt stored data as well. AES encryption, however, is a complex process. As it involves a lot of computations, encrypting huge amount of data would undoubtedly be computationally intensive. Parallel computers have been used mostly in high-performance computation research to solve computationally intensive problems. Parallel systems are currently gaining popularity configured as general purpose multi-core system, even at a desktop level. Several programming models have been developed to assist the writing of parallel programs, and some have already been used to parallelize AES. As a result, AES data encryption has become more efficient and applicable. The message passing model is a popular parallel communication/synchronization model with an early origin. Message Passing Interface (MPI) is the first standardized, vendor-independent, message passing library interface that establishes a portable, efficient, and flexible standard for message passing during computation. Therefore, this paper describes an implementation of AES using XTS mode in parallel via MPI

St. Cloud State University

A comprehensive approach in performance evaluation for modernreal-time operating systems

Author: Conde Jesús F.
García-Martínez Alberto
Viña Ángel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996
Field of study

In real-time computing the accurate characterization of the performance and determinism that a particular real-time operating system/hardware combination can provide for real-time applications is essential. This issue is not properly addressed by existing performance metrics mainly due to the lack of completeness and generalization. In this paper we present a set of comprehensive, easy-to-implement and useful metrics covering three basic real-time operating system features: response to external events, intertask synchronization and resource sharing, and intertask data transferring. The evaluation of real-time operating systems using a set of fine-grained metrics is fundamental to guarantee that we can reach the required determinism in real-world applications.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Exploring Fully Offloaded GPU Stream-Aware Message Passing

Author: Kandalla Krishna
Kaplan Larry
Namashivayam Naveen
Pagel Mark
White III James B
Publication venue
Publication date: 27/06/2023
Field of study

Modern heterogeneous supercomputing systems are comprised of CPUs, GPUs, and high-speed network interconnects. Communication libraries supporting efficient data transfers involving memory buffers from the GPU memory typically require the CPU to orchestrate the data transfer operations. A new offload-friendly communication strategy, stream-triggered (ST) communication, was explored to allow offloading the synchronization and data movement operations from the CPU to the GPU. A Message Passing Interface (MPI) one-sided active target synchronization based implementation was used as an exemplar to illustrate the proposed strategy. A latency-sensitive nearest neighbor microbenchmark was used to explore the various performance aspects of the implementation. The offloaded implementation shows significant on-node performance advantages over standard MPI active RMA (36%) and point-to-point (61%) communication. The current multi-node improvement is less (23% faster than standard active RMA but 11% slower than point-to-point), but plans are in progress to purse further improvements.Comment: 12 pages, 17 figure

arXiv.org e-Print Archive