
68,091 research outputs found

Synchronization of processes

Author: Berztiss Alfs T.
Publication venue: 'Sociological Research Online'
Publication date: 01/01/1982

The study of the synchronization of processes is a very interesting field. It brings together concepts that have originated in the design of operating systems and of high-level programming languages. Also, it is becoming clear that the design of algorithms for parallel execution is intimately connected with synchronization problems. Some specialized synchronization problems have arisen in the design of data base systems. Indeed, distributed data bases provide an example of distributed processing that has immense practical significance. To summarize, synchronization of processes is a universal activity whose importance is being felt throughout computer science. The time has therefore come for the synchronization of processes to be studied as a topic in its own right. In this course I am taking such a broad viewpoint, and am trying to integrate some aspects of operating systems, languages, and parallel algorithms. However, this being a first attempt, the integration is not as thorough as I would have wished. Also, in the short time at my disposal, I am not able to discuss several very important topics, such as reliability

Research Online


A model of time dependent behavior in concurrent software systems

Author: Lane Debra S.
Publication venue: eScholarship, University of California
Publication date: 02/11/1987

A great difficulty in building distributed systems lies in being able to predict what the system's behavior will be. A distributed or communicating system is defined here to be one in which the hardware consists of a set of processors, each with its own memory, connected by some communication medium (there is no shared memory), and the software is assumed to be of the CSP (Hoare's Communicating Sequential Processes) type. In the past few years some theories have been proposed to model features of communicating systems. Milner's Calculus of Communicating Systems (CCS), Winskel's Synchronization Trees (ST), Hennessy's Acceptance Trees (AT), and Hoare and Brookes's theory of communicating processes are examples of formal models of such systems. All of these models concentrate on modelling observable properties of a system. Event Dependency Trees (EDT) is a new representation of communicating systems that models the time-dependent nature of such systems. None of the representations mentioned above explicitly represents time, but time is precisely the factor that introduces so much variability and complexity into such software and systems. EDT provides a representation based on trees and a set of operations over the EDT trees that can be used to produce deadlock-free software. The model supplies potentially important information for the design and construction of distributed, parallel software systems.

eScholarship - University of California

Workstation Clusters for Parallel Computing

Author: Erçal Fikret
Stone J.
Publication venue: Scholars' Mine
Publication date: 01/01/2001

Workstation clusters have become an increasingly popular alternative to traditional parallel supercomputers for many workloads requiring high-performance computing. The use of parallel computing for scientific simulations has increased tremendously in the last ten years, and parallel implementations of scientific simulation codes are now in widespread use. There are two dominant parallel hardware/software architectures in use today: distributed memory and shared memory. Systems implementing shared memory provide cooperating processes with a shared memory address space that can be accessed by all processors. In shared memory systems, parallel processing occurs through the use of shared data structures, or through emulation of message passing semantics in software. Distributed memory systems are composed of a number of interconnected computational nodes, which do not share memory, but can communicate with each other through a high-performance network of some kind. Parallelism is achieved on distributed memory systems with multiple copies of the parallel program running on different nodes, sending messages to each other to coordinate computations. The messages used in a distributed memory parallel program typically contain application data, synchronization information, and other data that controls the execution of the parallel program.

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Task-aware LPF: integrating a model-compliant communication layer with task-based programming models

Author: Cinca Roca Arnau
Publication venue: Universitat Politècnica de Catalunya
Publication date: 26/06/2023

The rapid advancement of high-performance computing (HPC) systems has led to the emergence of exascale computing, characterized by distributed memory nodes and high parallel computing capabilities. To effectively utilize these systems, the HPC community has embraced programming models that harness both inter-node and intra-node parallelism. Inter-node parallelism is typically addressed using distributed-memory programming models like MPI and GASPI, while intra-node parallelism is exploited through shared-memory programming models such as OpenMP and OmpSs-2. However, the two-sided communication model used in MPI, which requires both the sender and receiver processes to post an operation, can impose performance limitations due to the inherent synchronization. In contrast, one-sided communication models like GASPI and Lightweight Parallel Foundations (LPF) leverage modern network fabric features and remote direct memory access (RDMA) to efficiently exchange data in distributed memory systems without the need for explicit receive operations. In this project, we combine the Bulk Synchronous Parallel (BSP) model of LPF with the data-flow model of OmpSs-2 to exploit parallelism at both intra-node and inter-node levels. This approach maintains the simplicity of the BSP model and the performance of the data-flow model. By enabling optimal overlap between computation, communication, and synchronization phases, we effectively utilize available resources. The flexibility of the data-flow model allows for adjusting computation tasks that are not tightly bound to BSP model phases, facilitating early or delayed execution based on resource availability. To optimize the BSP model, new zero-cost synchronization methods are designed, improving performance and flexibility. These methods offer localized synchronization but require a fixed communication pattern or user-defined criteria, limiting programmability. Additionally, bi-directional communication is often required, necessitating the inclusion of empty messages in applications without bi-directional communication. Our implementation is evaluated against Task-Aware MPI (TAMPI), demonstrating that with a single coarse-grained synchronization primitive, we can still hide synchronization overheads and reach competitive performance. The results show that the zero-cost synchronization methods perform similarly to TAMPI, indicating that coarse synchronization is sufficient for iterative applications. The evaluation highlights the effectiveness of the proposed approach in improving performance and programmability in HPC applications.

UPCommons. Portal del coneixement obert de la UPC

A shared memory algorithm and proof for the alternative construct in CSP

Author: Feng Hwa-chung
Fujimoto Richard M.
Publication venue: University of Utah
Publication date: 01/01/1987

Technical report. Communicating Sequential Processes (CSP) is a paradigm for communication and synchronization among distributed processes. The alternative construct is a key feature of CSP that allows nondeterministic selection of one among several possible communicants. Previous algorithms for this construct assume a message passing architecture and are not appropriate for multiprocessor systems that feature shared memory. This paper describes a distributed algorithm for the alternative construct that exploits the capabilities of a parallel computer with shared memory. The algorithm assumes a generalized version of Hoare's original alternative construct that allows output commands to be included in guards. A correctness proof of the proposed algorithm is presented to show that the algorithm conforms to some safety and liveness criteria. Extensions to allow termination of processes and to ensure fairness in guard selection are also given. Keywords: communicating sequential processes; alternative operation; shared memory multiprocessor; parallel processing.

The University of Utah: J. Willard Marriott Digital Library

Garbage Collection for General Graphs

Author: Krishnan Hari
Publication venue: LSU Digital Commons
Publication date: 01/01/2016

Garbage collection is moving from being a utility to a requirement of every modern programming language. With multi-core and distributed systems, most programs written recently are heavily multi-threaded and distributed. Distributed and multi-threaded programs are called concurrent programs. Manual memory management is cumbersome and difficult in concurrent programs. Concurrent programming is characterized by multiple independent processes/threads, communication between processes/threads, and uncertainty in the order of concurrent operations. The uncertainty in the order of operations makes manual memory management of concurrent programs difficult. A popular alternative to garbage collection in concurrent programs is to use smart pointers. Smart pointers can collect all garbage only if the developer identifies cycles being created in the reference graph. Smart pointer usage does not guarantee protection from memory leaks unless cycles can be detected as processes/threads create them. General garbage collectors, on the other hand, can avoid memory leaks, dangling pointers, and double deletion problems in any programming environment without help from the programmer. Concurrent programming is used in shared memory and distributed memory systems. State-of-the-art shared memory systems use a single concurrent garbage collector thread that processes the reference graph. Distributed memory systems have very few complete garbage collection algorithms, and those that exist use global barriers, are centralized, and do not scale well. This thesis focuses on designing garbage collection algorithms for shared memory and distributed memory systems that satisfy the following properties: concurrent, parallel, scalable, localized (decentralized), low pause time, high promptness, no global synchronization, safe, complete, and linear-time operation.

Louisiana State University

Fault-Tolerant Adaptive Parallel and Distributed Simulation

Author: Armaroli Lorenzo
D'Angelo Gabriele
Ferretti Stefano
Marzolla Moreno
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Discrete Event Simulation is a widely used technique for modeling and analyzing complex systems in many fields of science and engineering. The increasingly large size of simulation models poses a serious computational challenge, since the time needed to run a simulation can be prohibitively large. For this reason, Parallel and Distributed Simulation techniques have been proposed to take advantage of multiple execution units, which are found in multicore processors, clusters of workstations, or HPC systems. The current generation of HPC systems includes hundreds of thousands of computing nodes and a vast amount of ancillary components. Despite improvements in manufacturing processes, failures of some components are frequent, and the situation will get worse as larger systems are built. In this paper we describe FT-GAIA, a software-based fault-tolerant extension of the GAIA/ARTÌS parallel simulation middleware. FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some protection against Byzantine failures since synchronization messages are replicated as well, so that the receiving entity can identify and discard corrupted messages. We provide an experimental evaluation of FT-GAIA on a running prototype. Results show that a high degree of fault tolerance can be achieved, at the cost of a moderate increase in the computational load of the execution units. Comment: Proceedings of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2016

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

Author: Alistarh Dan
Chilimbi Trishul
Devlin Jacob
Ho Qirong
Hoefler T.
Hsieh Kevin
Message Passing Interface Forum
Jayarajan Anand
Lian Xiangru
Recht B.
Seide Frank
Strom Nikko
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Load imbalance is pervasive in distributed deep learning training systems, caused either by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at every training step. In this paper, we propose eager-SGD, which relaxes the global synchronization for decentralized accumulation. To implement eager-SGD, we propose to use two partial collectives: solo and majority. With solo allreduce, the faster processes contribute their gradients eagerly without waiting for the slower processes, whereas with majority allreduce, at least half of the participants must contribute gradients before continuing, all without using a central parameter server. We theoretically prove the convergence of the algorithms and describe the partial collectives in detail. Experimental results on load-imbalanced environments (CIFAR-10, ImageNet, and UCF101 datasets) show that eager-SGD achieves 1.27x speedup over the state-of-the-art synchronous SGD, without losing accuracy. Comment: Published in Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'20), pp. 45-61. 202

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)

Lock-free Concurrent Data Structures

Author: Cederman Daniel
Gidenstam Anders
Ha Phuong
Papatriantafilou Marina
Sundell Håkan
Tsigas Philippas
Publication venue
Publication date: 01/01/2013

Concurrent data structures are the data sharing side of parallel programming. Data structures give the program the means to store data, but also provide operations for the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes even more crucial because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide sufficient background and intuition to help the interested reader navigate the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity with the subject that will allow her to use truly concurrent methods. Comment: To appear in "Programming Multi-core and Many-core Computing Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computin

arXiv.org e-Print Archive

Chalmers Research