23 research outputs found

    Turning Futexes Inside-Out: Efficient and Deterministic User Space Synchronization Primitives for Real-Time Systems with IPCP

    Get PDF
    In Linux and other operating systems, futexes (fast user space mutexes) are the underlying synchronization primitives to implement POSIX synchronization mechanisms, such as blocking mutexes, condition variables, and semaphores. Futexes allow one to implement mutexes with excellent performance by avoiding system calls in the fast path. However, futexes are fundamentally limited to synchronization mechanisms that are expressible as atomic operations on 32-bit variables. At operating system kernel level, futex implementations require complex mechanisms to look up internal wait queues making them susceptible to determinism issues. In this paper, we present an alternative design for futexes by completely moving the complexity of wait queue management from the operating system kernel into user space, i. e. we turn futexes "inside out". The enabling mechanisms for "inside-out futexes" are an efficient implementation of the immediate priority ceiling protocol (IPCP) to achieve non-preemptive critical sections in user space, spinlocks for mutual exclusion, and interwoven services to suspend or wake up threads. The design allows us to implement common thread synchronization mechanisms in user space and to move determinism concerns out of the kernel while keeping the performance properties of futexes. The presented approach is suitable for multi-processor real-time systems with partitioned fixed-priority (P-FP) scheduling on each processor. We evaluate the approach with an implementation for mutexes and condition variables in a real-time operating system (RTOS). Experimental results on 32-bit ARM platforms show that the approach is feasible, and overheads are driven by low-level synchronization primitives

    Distributed Programming with Shared Data

    Get PDF
    Until recently, at least one thing was clear about parallel programming: tightly coupled (shared memory) machines were programmed in a language based on shared variables and loosely coupled (distributed) systems were programmed using message passing. The explosive growth of research on distributed systems and their languages, however, has led to several new methodologies that blur this simple distinction. Operating system primitives (e.g., problem-oriented shared memory, Shared Virtual Memory, the Agora shared memory) and languages (e.g., Concurrent Prolog, Linda, Emerald) for programming distributed systems have been proposed that support the shared variable paradigm without the presence of physical shared memory. In this paper we will look at the reasons for this evolution, the resemblances and differences among these new proposals, and the key issues in their design and implementation. It turns out that many implementations are based on replication of data. We take this idea one step further, and discuss how automatic replication (initiated by the run time system) can be used as a basis for a new model, called the shared data-object model, whose semantics are similar to the shared variable model. Finally, we discuss the design of a new language for distributed programming, Orca, based on the shared data-object model. 1

    Interprocess communication in highly distributed systems

    Get PDF
    Issued as Final technical report, Project no. G-36-632Final technical report has title: Interprocess communication in highly distributed system

    Reliable file transfer across a 10 megabit ethernet

    Get PDF
    The Ethernet communications network is a broadcast, multi-access system for local computing networks. Such a network was used to connect six 68000 based Charles River Data Systems for the purpose of file transfer. Each system required hardware installation and connection to the Ethernet cable. The software is an implementation which conforms to Xerox PUP File Transfer Protocol Specifications . This required the writing of two programs, the FTP user and the FTP server. Each program was built upon common communication packages which also had to be written. These communication routines transferred data over the Ethernet using the PARC Universal Packets (PUP) format

    The Problem of Mutual Exclusion: A New Distributed Solution

    Get PDF
    In both centralized and distributed systems, processes cooperate and compete with each other to access the system resources. Some of these resources must be used exclusively. It is then required that only one process access the shared resource at a given time. This is referred to as the problem of mutual exclusion. Several synchronization mechanisms have been proposed to solve this problem. In this thesis, an effort has been made to compile most of the existing mutual exclusion solutions for both shared memory and message-passing based systems. A new distributed algorithm, which uses a dynamic information structure, is presented to solve the problem of mutual exclusion. It is proved to be free from both deadlock and starvation. This solution is shown to be economical in terms of the number of message exchanges required per critical section execution. Procedures for recovery from both site and link failures are also given

    A comparative study of concurrency control algorithms for distributed databases

    Get PDF
    The declining cost of computer hardware and the increasing data processing needs of geographically dispersed organizations have led to substantial interest in distributed data management. These characteristics have led to reconsider the design of centralized databases. Distributed databases have appeared as a result of those considerations. A number of advantages result from having duplicate copies of data in a distributed databases. Some of these advantages are: increased data accesibility, more responsive data access, higher reliability, and load sharing. These and other benefits must be balanced against the additional cost and complexity introduced in doing so. This thesis considers the problem of concurrency control of multiple copy databases. Several synchronization techniques are mentioned and a few algorithms for concurrency control are evaluated and compared

    A retrospective on the VAX VMM security kernel

    Full text link

    Open predicate path expressions for distributed environments: notation, implementation, and extensions

    Get PDF
    This dissertation introduces open predicate path expressions --a non-procedural, very-high-level language notation for the synchronization of concurrent accesses to shared data in distributed computer systems. The target environment is one in which resource modules (totally encapsulated instances of abstract data types) are the basic building blocks in a network of conventional, von Neumann computers or of functional, highly parallel machines. Each resource module will contain two independent submodules: a synchronization submodule which coordinates requests for access to the resource\u27s data and an access-mechanism submodule which localizes the code for operations on that data;Open predicate path expressions are proposed as a specification language for the synchronization submodule and represent a blend of two existing path notations: open path expressions and predicate path expressions. Motivations for the adoption of this new notation are presented, and an implementation semantics for the notation is presented in the form of dataflow graphs;An algorithm is presented which will automatically synthesize an open predicate path expression into a dataflow graph, which is then implemented by a network of communicating submodules written in either a sequential or an applicative language. Finally, an extended notation for the synchronization submodule is proposed, the purpose of which is to provide greater expressive power for certain synchronization problems which are difficult to specify using path expressions alone

    Efficient hardware and software assist for many-core performance

    Get PDF
    In recent years, the number of available cores in a processor are increasing rapidly while the pace of performance improvement of an individual core has been lagged. It led application developers to extract more parallelism from a number of cores to make their applications run faster. However, writing a parallel program that scales well with the increasing core counts is challenging. Consequently, many parallel applications suffer from performance bugs caused by scalability limiters. We expect core counts to continue to increase for the foreseeable future and hence, addressing scalability limiters is important for better performance on future hardware. With this thesis, I propose both software frameworks and hardware improvements that I developed to address three important scalability limiters: load imbalance, barrier latency and increasing on-chip packet latency. First, I introduce a debugging framework for load imbalance called LIME. The LIME framework uses profiling, statistical analysis and control flow graph analysis to automatically determine the nature of load imbalance problems and pinpoint the code where the problems are introduced. Second, I address scalability problem of the barrier, which has become costly and difficult to achieve scalable performance. To address this problem, I propose a transmission line (TL) based hardware barrier support, called TLSync, that is orders of magnitude faster than software barrier implementation while supports many (tens) of barriers simultaneously using a single chip-spanning network. Third and lastly, I focus on the increasing packet latency in on-chip network, and propose a hybrid interconnection where a low-latency TL based interconnect is synergistically used with a high-throughput switched interconnect. Also, a new adaptive packet steering policy is created to judiciously use the limited throughput available on the low-latency TL interconnect.Ph.D
    corecore