2,245 research outputs found

    Managing Communication Latency-Hiding at Runtime for Parallel Programming Languages and Libraries

    Full text link
    This work introduces a runtime model for managing communication with support for latency-hiding. The model enables non-computer science researchers to exploit communication latency-hiding techniques seamlessly. For compiled languages, it is often possible to create efficient schedules for communication, but this is not the case for interpreted languages. By maintaining data dependencies between scheduled operations, it is possible to aggressively initiate communication and lazily evaluate tasks to allow maximal time for the communication to finish before entering a wait state. We implement a heuristic of this model in DistNumPy, an auto-parallelizing version of numerical Python that allows sequential NumPy programs to run on distributed memory architectures. Furthermore, we present performance comparisons for eight benchmarks with and without automatic latency-hiding. The results shows that our model reduces the time spent on waiting for communication as much as 27 times, from a maximum of 54% to only 2% of the total execution time, in a stencil application.Comment: PREPRIN

    Flush communication channels: Effective implementation and verification

    Get PDF
    Flush communication channels, or F-channels, generalize more conventional asynchronous communication paradigms. A distributed system which uses an F-channel allows a programmer to define the delivery order of each message in relation to other messages transmitted on the channel. Unreliable datagrams and FIFO (first-in-first-out) communication channels have strictly defined delivery semantics. No restrictions are allowed on message delivery order with unreliable datagrams--message delivery is completely unordered. FIFO channels, on the other hand, insist messages are delivered in the order of their transmission. Flush channels can provide either of these delivery order semantics; in addition, F-channels allow the user to define the delivery of a message to be after the delivery of all messages previously transmitted or before the delivery of all messages subsequently transmitted or both. A system which communicates with a flush channel has a message delivery order that is a partial order.;Dynamically specifying a partial message delivery order complicates many aspects of how we implement and reason about the communication channel. From the system\u27s perspective, we develop a feasible implementation protocol and prove its correctness. The protocol effectively handles the partially ordered message delivery. From the user\u27s perspective, we derive an axiomatic verification methodology for flush applications. The added flexibility of defining the delivery order dynamically slightly increases the complexity for the application programmer. Our verification work helps the user effectively deal with the partially ordered message delivery in flush communication

    Flush communication channels: Effective implementation and verification

    Get PDF
    Flush communication channels, or F-channels, generalize more conventional asynchronous communication paradigms. A distributed system which uses an F-channel allows a programmer to define the delivery order of each message in relation to other messages transmitted on the channel. Unreliable datagrams and FIFO (first-in-first-out) communication channels have strictly defined delivery semantics. No restrictions are allowed on message delivery order with unreliable datagrams--message delivery is completely unordered. FIFO channels, on the other hand, insist messages are delivered in the order of their transmission. Flush channels can provide either of these delivery order semantics; in addition, F-channels allow the user to define the delivery of a message to be after the delivery of all messages previously transmitted or before the delivery of all messages subsequently transmitted or both. A system which communicates with a flush channel has a message delivery order that is a partial order.;Dynamically specifying a partial message delivery order complicates many aspects of how we implement and reason about the communication channel. From the system\u27s perspective, we develop a feasible implementation protocol and prove its correctness. The protocol effectively handles the partially ordered message delivery. From the user\u27s perspective, we derive an axiomatic verification methodology for flush applications. The added flexibility of defining the delivery order dynamically slightly increases the complexity for the application programmer. Our verification work helps the user effectively deal with the partially ordered message delivery in flush communication

    Doctor of Philosophy

    Get PDF
    dissertationPlaces and distributed places bring new support for message-passing parallelism to Racket. This dissertation describes the programming model and how Racket's sequential runtime-system was modified to support places and distributed places. The freedom to design the places programming model helped make the implementation tractable; specifically, the conventional pain of adding just the right amount of locking to a big, legacy runtime system was avoided. The dissertation presents an evaluation of the places design that includes both real-world applications and standard parallel benchmarks. Distributed places are introduced as a language extension of the places design and architecture. The distributed places extension augments places with the features of remote process launch, remote place invocation, and distributed message passing. Distributed places provide a foundation for constructing higher-level distributed frameworks. Example implementations of RPC, MPI, map reduce, and nested data parallelism demonstrate the extensibility of the distributed places API

    A Message-Passing, Thread-Migrating Operating System for a Non-Cache-Coherent Many-Core Architecture

    Get PDF
    The difference between emerging many-core architectures and their multi-core predecessors goes beyond just the number of cores incorporated on a chip. Current technologies for maintaining cache coherency are not scalable beyond a few dozen cores, and a lack of coherency presents a new paradigm for software developers to work with. While shared memory multithreading has been a viable and popular programming technique for multi-cores, the distributed nature of many-cores is more amenable to a model of share-nothing, message-passing threads. This model places different demands on a many-core operating system, and this thesis aims to understand and accommodate those demands. We introduce Xipx, a port of the lightweight Embedded Xinu operating system to the many-core Intel Single-chip Cloud Computer (SCC). The SCC is a 48-core x86 architecture that lacks cache coherency. It features a fast mesh network-on-chip (NoC) and on-die message passing buffers to facilitate message-passing communications between cores. Running as a separate instance per core, Xipx takes advantage of this hardware in its implementation of a message-passing device. The device multiplexes the message passing hardware, thereby allowing multiple concurrent threads to share the hardware without interfering with each other. Xipx also features a limited framework for transparent thread migration. This achievement required fundamental modifications to the kernel, including incorporation of a new type of thread. Additionally, a minimalistic framework for bare-metal development on the SCC has been produced as a pragmatic offshoot of the work on Xipx. This thesis discusses the design and implementation of the many-core extensions described above. While Xipx serves as a foundation for continued research on many-core operating systems, test results show good performance from both message passing and thread migration suggesting that, as it stands, Xipx is an effective platform for exploration of many-core development at the application level as well

    Distributed dispatchers for partially clairvoyant schedulers

    Get PDF
    This work focuses on the empirical evaluation of distributed dispatching strategies on shared and distributed memory architectures for hard real-time systems. The dispatching model accommodates process parameter variability and analyzes the effect of variable execution times.;Hard real-time systems are modeled in the E-T-C scheduling framework and dispatched if a valid schedule exists. We examine the dispatchability of Partially Clairvoyant schedules of different sizes and varying deadlines under reasonable assumptions. The effect of scaling up the number of processors used by the dispatcher is also studied. The results validate the superiority of the distributed strategies over sequential dispatching and scalability of the distributed strategies. Certain system limitations which lead to Loss of Dispatchability in the experiments were pointed out.;The model finds applications in diverse areas like safety critical systems, robotics and machine control, real-time data management, and this approach is targeted at powering up the controllers
    corecore