
    An Introduction to Message Passing Paradigms


    Remote Store Programming: Mechanisms and Performance

    This paper presents remote store programming (RSP). This paradigm combines usability and efficiency through the exploitation of a simple hardware mechanism, the remote store, which can easily be added to existing multicores. Remote store programs are marked by fine-grained, one-sided communication that results in a stream of data flowing from the registers of a sending process to the cache of a destination process. The RSP model and its hardware implementation trade a relatively high store latency for a low load latency, because loads are more common than stores and it is easier to tolerate store latency than load latency. This paper demonstrates the performance advantages of remote store programming by comparing it to both cache-coherent shared memory and direct memory access (DMA) based approaches using the TILEPro64 processor. The paper studies two applications: a two-dimensional Fast Fourier Transform (2D FFT) and an H.264 encoder for high-definition video. For a 2D FFT using 56 cores, RSP is 1.64x faster than DMA and 4.4x faster than shared memory. For an H.264 encoder using 40 cores, RSP achieves the same performance as DMA and 4.8x the performance of shared memory. Along with these performance advantages, RSP requires the least hardware support of the three approaches. RSP's features, performance, and hardware simplicity make it well suited to the embedded processing domain.
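
    A minimal sketch of the communication pattern described above, assuming two pthreads stand in for two cores; on a real remote-store machine the buffer would be homed in the consumer's cache, so the producer's ordinary stores stream remotely while the consumer's loads stay local. The channel layout and names are illustrative and are not the TILEPro64 API.

        /* Producer stores fine-grained data into consumer-owned memory, then
         * raises a flag; the consumer spins on the flag and reads the data
         * with cheap local loads. Compile with -pthread. */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>

        #define N 8

        struct channel {
            int data[N];        /* buffer logically owned by the consumer */
            atomic_int ready;   /* set by the producer after its last store */
        };

        static struct channel ch;

        static void *producer(void *arg)
        {
            (void)arg;
            for (int i = 0; i < N; i++)
                ch.data[i] = i * i;                 /* fine-grained one-sided stores */
            atomic_store_explicit(&ch.ready, 1, memory_order_release);
            return NULL;
        }

        static void *consumer(void *arg)
        {
            (void)arg;
            while (!atomic_load_explicit(&ch.ready, memory_order_acquire))
                ;                                   /* wait for the flag */
            for (int i = 0; i < N; i++)
                printf("%d ", ch.data[i]);          /* low-latency local loads */
            printf("\n");
            return NULL;
        }

        int main(void)
        {
            pthread_t p, c;
            pthread_create(&c, NULL, consumer, NULL);
            pthread_create(&p, NULL, producer, NULL);
            pthread_join(p, NULL);
            pthread_join(c, NULL);
            return 0;
        }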

    A layered operational model for describing inter-tool communication in tool integration frameworks

    Integration frameworks for building software engineering environments provide at least data, control and presentation integration facilities, together with integration devices that give the tools which populate the framework access to these facilities. Typically, an integration device is a specially developed language, or an extension to an existing language, in which the integration programmer specifies the desired interactions between the tools comprising the software engineering environment. Surprisingly little effort has been applied to assessing the expressiveness of integration languages, even though the power of such a language limits the level of integration a tool can achieve within the environment. Our work seeks to provide an approach to both assessing and comparing the expressiveness of the integration devices of a range of commercial and research products. The paper presents a layered operational model, based on information structures; this model has been developed for describing the semantics of the inter-tool communication features of integration devices in a precise manner, and in a way that facilitates such assessment and comparison.

    Out-Of-Place debugging: a debugging architecture to reduce debugging interference

    Context. Recent studies show that developers spend most of their programming time testing, verifying and debugging software. As applications become more and more complex, developers demand more advanced debugging support to ease the software development process. Inquiry. Since the 1970s many debugging solutions have been introduced. Amongst them, online debuggers provide good insight into the conditions that led to a bug, allowing inspection of and interaction with the variables of the program. However, most online debugging solutions introduce debugging interference into the execution of the program, i.e. pauses, latency, and evaluation of code containing side effects. Approach. This paper investigates a novel debugging technique called out-of-place debugging. The goal is to minimize the debugging interference characteristic of online debugging while retaining online remote capabilities. An out-of-place debugger transfers the program execution and application state from the debugged application to the debugger application, which run in different processes. Knowledge. On the one hand, out-of-place debugging allows developers to debug applications remotely, overcoming the need for physical access to the machine where the debugged application is running. On the other hand, debugging happens locally in the debugger process, avoiding latency. That makes it suitable for deployment on a distributed system and for handling the debugging of several processes running in parallel. Grounding. We implemented a concrete out-of-place debugger for the Pharo Smalltalk programming language. We show that our approach is practical by performing several benchmarks, comparing our approach with a classic remote online debugger. We show that our prototype debugger outperforms a traditional remote debugger by a factor of 1000 in several scenarios. Moreover, we show that the presence of our debugger does not impact the overall performance of an application. Importance. This work combines remote debugging with the debugging experience of a local online debugger. Out-of-place debugging is the first online debugging technique that can minimize debugging interference while debugging a remote application, yet it still keeps the benefits of online debugging (e.g. step-by-step execution). This makes the technique suitable for modern applications, which are increasingly parallel, distributed and reactive to streams of data from various sources such as sensors, UI and the network.
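
    A toy illustration of the out-of-place architecture, assuming a POSIX fork/pipe transport: the debugged application ships a small snapshot of the failing state to a separate "debugger" process and keeps running, and inspection happens in that other process. The snapshot struct and its fields are invented for the sketch; the actual system is implemented in Pharo Smalltalk and transfers full execution state.

        /* Application process detects a problem, serializes a snapshot and
         * writes it to a pipe; the forked "debugger" process reads and
         * reports snapshots while the application continues executing. */
        #include <stdio.h>
        #include <string.h>
        #include <sys/wait.h>
        #include <unistd.h>

        struct snapshot {
            char where[64];   /* where the fault was detected */
            int  bad_value;   /* a variable of interest       */
        };

        int main(void)
        {
            int fd[2];
            pipe(fd);

            if (fork() == 0) {                      /* "debugger" process */
                close(fd[1]);
                struct snapshot s;
                while (read(fd[0], &s, sizeof s) == (ssize_t)sizeof s)
                    fprintf(stderr, "debugger: fault at %s, value=%d\n",
                            s.where, s.bad_value);
                return 0;
            }

            close(fd[0]);                           /* debugged application */
            for (int x = -2; x <= 2; x++) {
                if (x == 0) {                       /* "bug" detected */
                    struct snapshot s = { .bad_value = x };
                    strncpy(s.where, "divide-by-zero guard in main loop",
                            sizeof s.where - 1);
                    write(fd[1], &s, sizeof s);     /* ship state, keep running */
                    continue;
                }
                printf("100/%d = %d\n", x, 100 / x);
            }
            close(fd[1]);
            wait(NULL);
            return 0;
        }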

    Parameterized Concurrent Multi-Party Session Types

    Session types have been proposed as a means of statically verifying implementations of communication protocols. Although prior work has been successful in verifying some classes of protocols, it does not cope well with parameterized, multi-actor scenarios with inherent asynchrony. For example, the sliding window protocol is inexpressible in previously proposed session type systems. This paper describes System-A, a new typing language which overcomes many of the expressiveness limitations of prior work. System-A explicitly supports asynchrony and parallelism, as well as multiple forms of parameterization. We define System-A and show how it can be used for the static verification of a large class of asynchronous communication protocols. (Comment: In Proceedings FOCLASA 2012, arXiv:1208.432)

    Parallel Programming Using Shared Objects and Broadcasting

    The two major design approaches taken to build distributed and parallel computer systems, multiprocessing and multicomputing, are discussed. A model that combines the best properties of both multiprocessor and multicomputer systems, easy-to-build hardware, and a conceptually simple programming model is presented. Using this model, a programmer defines and invokes operations on shared objects, the runtime system handles reads and writes on these objects, and the reliable broadcast layer implements indivisible updates to objects using the sequencing protocol. The resulting system is easy to program, easy to build, and has acceptable performance on problems with a moderate grain size in which reads are much more common than writes. Orca, a procedural language whose sequential constructs are roughly similar to languages such as C or Modula-2 but which also supports parallel processes and shared objects, and which has been used to develop applications for the prototype system, is described.
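
    A single-process sketch of that update scheme, assuming a trivial in-memory sequencer: every write to a shared object is stamped with a global sequence number and applied to all replicas in that order, while reads are served from the local replica. The replica array, sequencer variable and counter object are illustrative; Orca's runtime does this across machines through a reliable broadcast layer.

        /* Writes are ordered by a sequencer and applied at every replica;
         * reads never leave the local replica. */
        #include <stdio.h>

        #define REPLICAS 3

        struct counter { int seq; int value; };   /* one replica of a shared object */

        static struct counter replica[REPLICAS];
        static int next_seq = 1;                  /* the sequencer's counter */

        /* Broadcast an "add" operation: stamp it, then apply it everywhere. */
        static void shared_add(int amount)
        {
            int seq = next_seq++;                 /* sequencer assigns global order */
            for (int r = 0; r < REPLICAS; r++) {  /* reliable broadcast (simulated) */
                replica[r].seq = seq;
                replica[r].value += amount;       /* indivisible update */
            }
        }

        /* Reads are purely local. */
        static int shared_read(int r) { return replica[r].value; }

        int main(void)
        {
            shared_add(5);
            shared_add(7);
            for (int r = 0; r < REPLICAS; r++)
                printf("replica %d: seq=%d value=%d\n",
                       r, replica[r].seq, shared_read(r));
            return 0;
        }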

    Towards an MPI-like Framework for Azure Cloud Platform

    Message passing interface (MPI) has been widely used for implementing parallel and distributed applications. The emergence of cloud computing offers a scalable, fault-tolerant, on-demand alternative to traditional on-premises clusters. In this thesis, we investigate the possibility of adopting the cloud platform as an alternative to conventional MPI-based solutions. We show that the cloud platform can exhibit competitive performance and can benefit its users with a fault-tolerant architecture and on-demand access, yielding a robust solution. Extensive research was done to identify the difficulties of designing and implementing an MPI-like framework for the Azure cloud platform. We present the details of the key components required for implementing such a framework, along with our experimental results for benchmarking several basic operations of the MPI standard implemented in the cloud and its practical application in solving well-known large-scale algorithmic problems.
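
    The abstract does not list which basic MPI operations were benchmarked, but an MPI-like framework for Azure would at minimum have to mirror standard point-to-point and collective calls such as the ones below (plain MPI in C; the specific values and tags are illustrative).

        /* Build with mpicc, run with: mpirun -np 2 ./a.out */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, size, value = 0;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* Point-to-point: rank 0 sends a value to rank 1. */
            if (rank == 0) {
                value = 42;
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 1 received %d\n", value);
            }

            /* Collective: every rank learns rank 0's value. */
            MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

            MPI_Finalize();
            return 0;
        }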

    Implementing generalized deep-copy in MPI

    In this paper we introduce a framework for implementing deep copy on top of MPI. The process is initiated by passing just the root object of the dynamic data structure. Our framework takes care of all pointer traversal, communication, copying and reconstruction on the receiving nodes. The benefit of our approach is that MPI users can deep copy complex dynamic data structures without the need to write bespoke communication or serialize/deserialize methods for each object. Such methods can present a challenging implementation problem that quickly becomes unwieldy to maintain when working with complex structured data. This paper demonstrates our generic implementation, which encapsulates both approaches. We analyze the approach with a variety of structures (trees, graphs (including complete graphs) and rings) and demonstrate that it performs comparably to hand-written implementations while using a vastly simplified programming interface. We make the complete source code available as a convenient header file.
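
    For contrast, a hand-written deep copy of a singly linked list over plain MPI, i.e. the kind of bespoke serialize/deserialize code that such a framework is meant to make unnecessary. The node layout and flattening scheme are illustrative and are not taken from the paper.

        /* Rank 0 flattens a pointer-based list into a count plus an array of
         * values and sends it; rank 1 rebuilds the pointer structure.
         * Build with mpicc, run with: mpirun -np 2 ./a.out */
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        struct node { int value; struct node *next; };

        int main(int argc, char **argv)
        {
            int rank;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {
                /* Build 0 -> 1 -> 2. */
                struct node *head = NULL;
                for (int i = 2; i >= 0; i--) {
                    struct node *n = malloc(sizeof *n);
                    n->value = i; n->next = head; head = n;
                }
                /* Flatten: count followed by the values in order. */
                int count = 0;
                for (struct node *p = head; p; p = p->next) count++;
                int *buf = malloc(count * sizeof *buf);
                int i = 0;
                for (struct node *p = head; p; p = p->next) buf[i++] = p->value;

                MPI_Send(&count, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
                MPI_Send(buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
            } else if (rank == 1) {
                /* Receive the flat form and rebuild the pointers. */
                int count;
                MPI_Recv(&count, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                int *buf = malloc(count * sizeof *buf);
                MPI_Recv(buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);

                struct node *head = NULL, **tail = &head;
                for (int i = 0; i < count; i++) {
                    struct node *n = malloc(sizeof *n);
                    n->value = buf[i]; n->next = NULL;
                    *tail = n; tail = &n->next;
                }
                for (struct node *p = head; p; p = p->next)
                    printf("rank 1 rebuilt node %d\n", p->value);
            }

            MPI_Finalize();
            return 0;
        }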