
    Replication Techniques for Speeding up Parallel Applications on Distributed Systems

    This paper discusses the design choices involved in replicating objects and their effect on performance. Important issues are: how to maintain consistency among different copies of an object; how to implement changes to objects; and which strategy for object replication to use. We have implemented several options to determine which ones are most efficient.
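
    To make the strategy trade-off concrete, here is a minimal, self-contained Python sketch (all names invented, not from the paper) contrasting two classic replication strategies in this design space: write-update, which pushes each new value to every copy, and write-invalidate, which marks the other copies stale so they refetch on the next read.

        class Replica:
            def __init__(self):
                self.value = None
                self.valid = False

        class ReplicatedObject:
            def __init__(self, n_replicas, strategy="update"):
                self.replicas = [Replica() for _ in range(n_replicas)]
                self.strategy = strategy

            def write(self, origin, value):
                # the writing site always gets the new value
                self.replicas[origin].value = value
                self.replicas[origin].valid = True
                for i, r in enumerate(self.replicas):
                    if i == origin:
                        continue
                    if self.strategy == "update":
                        r.value, r.valid = value, True   # push the value to every copy
                    else:
                        r.valid = False                  # invalidate; refetched on next read

            def read(self, site):
                r = self.replicas[site]
                if not r.valid:
                    # stale copy: fetch the current value from any valid replica
                    src = next(x for x in self.replicas if x.valid)
                    r.value, r.valid = src.value, True
                return r.value

        obj = ReplicatedObject(3, strategy="invalidate")
        obj.write(0, "x")
        print(obj.read(2))                               # refetches, prints "x"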

    Distributed Programming with Shared Data

    Until recently, at least one thing was clear about parallel programming: tightly coupled (shared memory) machines were programmed in a language based on shared variables, and loosely coupled (distributed) systems were programmed using message passing. The explosive growth of research on distributed systems and their languages, however, has led to several new methodologies that blur this simple distinction. Operating system primitives (e.g., problem-oriented shared memory, Shared Virtual Memory, the Agora shared memory) and languages (e.g., Concurrent Prolog, Linda, Emerald) for programming distributed systems have been proposed that support the shared variable paradigm without the presence of physical shared memory. In this paper we will look at the reasons for this evolution, the resemblances and differences among these new proposals, and the key issues in their design and implementation. It turns out that many implementations are based on replication of data. We take this idea one step further, and discuss how automatic replication (initiated by the run-time system) can be used as a basis for a new model, called the shared data-object model, whose semantics are similar to the shared variable model. Finally, we discuss the design of a new language for distributed programming, Orca, based on the shared data-object model.
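
    As a rough illustration of the shared data-object idea (a toy in plain Python, not Orca's actual runtime), the key property is that shared state is touched only through the object's operations, each applied indivisibly; a runtime can then keep replicas consistent by applying the same operations everywhere.

        import threading

        class SharedIntObject:
            def __init__(self):
                self._value = 0
                self._lock = threading.Lock()   # each operation is indivisible

            def inc(self):                      # a write operation on the object
                with self._lock:
                    self._value += 1

            def read(self):                     # a read operation on the object
                with self._lock:
                    return self._value

        counter = SharedIntObject()
        threads = [threading.Thread(target=counter.inc) for _ in range(8)]
        for t in threads: t.start()
        for t in threads: t.join()
        assert counter.read() == 8              # operations never interleave mid-update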

    Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

    The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism.
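
    A condensed sketch of the macroserver flavour described above, written as ordinary Python (the names Macroserver and spawn are illustrative, not from the report): method activations spawn asynchronous threads, methods are atomic, futures synchronise completion, and a crude work distribution binds threads to disjoint regions of the data.

        from concurrent.futures import ThreadPoolExecutor
        import threading

        class Macroserver:
            def __init__(self, data):
                self.data = data                          # encapsulated state space
                self._pool = ThreadPoolExecutor()
                self._lock = threading.Lock()             # makes methods atomic

            def spawn(self, method, *args):
                # activating a method spawns a thread and yields a future
                return self._pool.submit(method, *args)

            def scale(self, lo, hi, factor):              # an atomic method on a slice
                with self._lock:
                    for i in range(lo, hi):
                        self.data[i] *= factor
                return sum(self.data[lo:hi])

        ms = Macroserver(list(range(100)))
        # a crude "work distribution": bind two threads to two halves of the data
        futures = [ms.spawn(ms.scale, 0, 50, 2), ms.spawn(ms.scale, 50, 100, 2)]
        print([f.result() for f in futures])              # futures synchronise completion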

    A Comparison of Two Paradigms for Distributed Shared Memory

    This paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms two systems are described, one using only point-to-point messages, the other using broadcasting as well. The two paradigms and their implementations are described briefly. Their performances on four applications are compared: the travelling-salesman problem, alpha-beta search, matrix multiplication and the all-pairs shortest paths problem. The relevant measurements were obtained on a system consisting of 10 MC68020 processors connected by an Ethernet. For comparison purposes, the applications have also been run on a system with physical shared memory. In addition, the paper gives measurements for the first two applications above when Remote Procedure Call is used as the communication mechanism. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications, with significant speed-ups. The structured shared data-object model achieves the highest speed-ups and is easiest to program and to debug. KEYWORDS: Amoeba; distributed shared memory; distributed programming; Orca.
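
    The performance gap the measurements expose largely comes down to the unit of sharing, which the following schematic Python fragment (invented names, stubbed network calls) tries to make visible: Shared Virtual Memory faults in and ships a whole page per miss, while the shared data-object model broadcasts only the operation applied to an object.

        PAGE_SIZE = 4096

        def fetch_page(page_no):
            # stub standing in for a network transfer of one whole 4 KB page
            return {}

        def svm_write(pages, addr, value):
            """Page-based SVM: a write fault fetches the enclosing page first."""
            page = addr // PAGE_SIZE
            if page not in pages:
                pages[page] = fetch_page(page)            # whole-page transfer
            pages[page][addr % PAGE_SIZE] = value         # then an ordinary local write

        op_log = []                                       # stands in for the network

        def object_write(obj_id, op, *args):
            """Object-based DSM: only the operation itself is broadcast."""
            op_log.append((obj_id, op, args))             # a few bytes, not a page

        pages = {}
        svm_write(pages, 5000, 7)        # faults in page 1, then writes locally
        object_write("counter", "inc")   # replicas would replay this operation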

    Distributed shared memory for virtual environments

    This work investigated making virtual environments easier to program by designing a suitable distributed shared memory system. To be usable, the system must keep latency to a minimum, as virtual environments are very sensitive to it. The resulting design is push-based and non-consistent. Another requirement is that the system should be scalable, over large distances and over large numbers of participants. The latter is hard to achieve with current network protocols, and a proposal was made for a more scalable multicast addressing system than is used in the Internet protocol. Two sample virtual environments were developed to test the ease of use of the system. This showed that the basic concept is sound, but that more support is needed. The next step should be to extend the language and add compiler support, which will enhance ease of use and allow numerous optimisations. This can be improved further by providing system-supported containers.
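
    A minimal sketch of the push-based, consistency-relaxed style the thesis describes (names invented for illustration): a writer pushes versioned entity updates to all replicas immediately and never waits for acknowledgements, so readers may briefly observe stale state but latency stays low.

        class EntityReplica:
            def __init__(self):
                self.state = {}

            def apply(self, key, value, version):
                # accept only newer versions; late or duplicate pushes are dropped
                _, current_version = self.state.get(key, (None, -1))
                if version > current_version:
                    self.state[key] = (value, version)

        class Pusher:
            def __init__(self, replicas):
                self.replicas = replicas
                self.version = 0

            def write(self, key, value):
                self.version += 1
                for r in self.replicas:          # fire-and-forget multicast
                    r.apply(key, value, self.version)

        replicas = [EntityReplica() for _ in range(3)]
        Pusher(replicas).write("avatar_pos", (1.0, 2.0, 0.5))
        print(replicas[1].state["avatar_pos"])   # pushed state, no round trip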

    Software Packaging

    Many computer programs cannot be easily integrated because their components are distributed and heterogeneous, i.e., they are implemented in diverse programming languages, use different data representation formats, or their runtime environments are incompatible. In many cases, programs are integrated by modifying their components or interposing mechanisms that handle communication and conversion tasks. For example, remote procedure call (RPC) helps integrate heterogeneous, distributed programs. When configuring such programs, however, mechanisms like RPC must be used explicitly by software developers in order to integrate collections of diverse components. Each collection may require a unique integration solution. This thesis describes a process called software packaging that automatically determines how to integrate a diverse collection of computer programs, based on the types of components involved and the capabilities of available translators and adapters in an environment. Whereas previous efforts focused solely on integration mechanisms, software packaging provides a context that relates such mechanisms to software integration processes. We demonstrate the value of this approach by reducing the cost of configuring applications whose components are distributed and implemented in different programming languages. Our software packaging tool subsumes traditional integration tools like UNIX MAKE by providing a rule-based approach to software integration that is independent of execution environments. (Also cross-referenced as UMIACS-TR-93-56.)
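
    The rule-based flavour of software packaging can be suggested with a small toy planner (Python; the adapter table and component attributes are invented, not the thesis's actual rules): given the types of the components and the translators and adapters available in the environment, the packager derives which mechanisms integrate a heterogeneous collection.

        ADAPTERS = {                              # (attribute, from, to) -> mechanism
            ("lang", "c", "java"): "JNI stub generator",
            ("site", "local", "remote"): "RPC stub compiler",
        }

        def plan(components):
            """Pick an adapter for every mismatched attribute of every pair."""
            steps = []
            for i, a in enumerate(components):
                for b in components[i + 1:]:
                    for attr in ("lang", "site"):
                        for x, y in ((a, b), (b, a)):     # try both directions
                            rule = (attr, x[attr], y[attr])
                            if x[attr] != y[attr] and rule in ADAPTERS:
                                steps.append(f"{a['name']} <-> {b['name']}: {ADAPTERS[rule]}")
            return steps

        print(plan([
            {"name": "solver", "lang": "c", "site": "remote"},
            {"name": "gui", "lang": "java", "site": "local"},
        ]))   # -> JNI stub generator and RPC stub compiler steps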

    The exploitation of parallelism on shared memory multiprocessors

    PhD thesis. With the arrival of many general purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980s, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever-widening hardware-to-software gap. The novel programming constructs described in this thesis are intended for use in imperative languages, even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms. (Funded by the Science and Engineering Research Council.)
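
    A small sketch of a shared value under the stated semantics (plain Python, illustrative names): single assignment plus blocking reads give the dataflow-style synchronisation the thesis builds on, since a consumer simply waits until the one permitted write has occurred.

        import threading

        class SharedValue:
            def __init__(self):
                self._set = threading.Event()
                self._value = None

            def write(self, value):
                if self._set.is_set():
                    raise RuntimeError("shared value may only be assigned once")
                self._value = value
                self._set.set()                 # wakes every blocked reader

            def read(self):
                self._set.wait()                # dataflow: block until the value exists
                return self._value

        sv = SharedValue()
        threading.Timer(0.1, sv.write, args=[42]).start()   # producer fires later
        print(sv.read())                                    # consumer blocks, prints 42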

    Data consistency models and protocols, decision and compile-time optimisation for massively parallel architectures.

    Manycore architectures consist of hundreds to thousands of embedded cores, distributed memories and a dedicated network on a single chip. In this context, and because of the scale of the processor, providing a shared memory system has to rely on efficient hardware and software mechanisms and data consistency protocols. Numerous works have explored consistency mechanisms designed for highly parallel architectures. They lead to the conclusion that no single protocol fits all applications and hardware contexts. In order to deal with consistency issues for this kind of architecture, we propose in this work a multi-protocol compilation toolchain, in which the shared data of the application can be managed by different protocols. Protocols are chosen and configured at compile time, following the application behaviour and the targeted architecture specifications. The application behaviour is characterised with a static analysis process that helps to guide the assignment of protocols to each data access. The platform offers a protocol library where each protocol is characterised by one or more parameters. The range of possible values of each parameter depends on constraints mainly related to the targeted platform. The protocol configuration relies on a genetic-based engine that instantiates each protocol with appropriate parameter values according to multiple performance objectives. In order to evaluate the quality of each proposed solution, we use different evaluation models. We first use an analytical traffic model which gives some NoC communication statistics but no timing information. We therefore propose two cycle-based evaluation models that provide more accurate performance metrics while taking into account contention effects due to the consistency protocols' communications. We also propose a cooperative cache consistency protocol that improves the cache miss rate by sliding data to less stressed neighbours. An extension of this protocol is proposed in order to dynamically define the sliding radius assigned to each data migration; this extension is based on the mass-spring physical model. Experimental validation of the different contributions compares the sliding-based protocols against a four-state directory-based protocol.

    The development of massively parallel manycore systems delivers very high computing power at a low energy cost. Exploiting the performance of these architectures, however, depends on how efficiently applications are programmed. Among the existing programming paradigms, shared memory is characterised by an intuitive approach in which all actors have access to a global address space. This model relies on the system's efficiency in managing accesses to shared data. The system defines the rules for synchronisation and data storage, which are handled by the consistency protocols. In this thesis we have shown that there is no single protocol suited to all application and execution contexts. We consider that choosing an appropriate protocol must take into account the characteristics of the application as well as the objectives set for a given execution. This thesis therefore focuses on the choice of consistency protocols with a view to improving system performance. We propose a compilation platform for choosing and parameterising a combination of consistency protocols for a single application. This platform consists of several building blocks. The main block developed in this thesis provides an optimisation engine for configuring the consistency protocols. The optimisation engine, inspired by a multi-objective evolutionary approach (i.e. the Fast Pareto Genetic Algorithm), instantiates the consistency protocols assigned to an application. The advantage of this technique is a low configuration cost, which allows a very fine granularity of consistency management, down to associating one protocol with each access. The decision about which protocols suit an application is guided by the performance mode chosen by the user (for example, energy saving). The proposed decision model is based on characterising the accesses to shared data according to various metrics (for example, access frequency, memory access patterns, etc.). The thesis also addresses techniques for managing data in on-chip memory. We propose two protocols based on the principle of cooperation between the system's distributed caches: a data-sliding protocol and a protocol inspired by the mass-spring physical model.
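
    A compressed sketch of a genetic configuration engine in this spirit (Python; the protocol names, cost numbers and scalarised fitness are invented stand-ins for the thesis's multi-objective evaluation models): one gene assigns a protocol to each shared-data access, and the search minimises a cost combining, say, traffic and energy.

        import random

        PROTOCOLS = ["write-update", "write-invalidate", "sliding"]
        N_ACCESSES = 8                       # one gene per shared-data access

        def fitness(genome):
            # stand-in for the analytical / cycle-based evaluation models
            traffic = sum(1.0 if g == "write-update" else 0.4 for g in genome)
            energy = sum(0.6 if g == "sliding" else 0.8 for g in genome)
            return traffic + energy          # scalarised here; the real engine is multi-objective

        def evolve(generations=50, pop_size=20):
            pop = [[random.choice(PROTOCOLS) for _ in range(N_ACCESSES)]
                   for _ in range(pop_size)]
            for _ in range(generations):
                pop.sort(key=fitness)                        # lower cost is better
                parents = pop[: pop_size // 2]
                children = []
                for _ in range(pop_size - len(parents)):
                    a, b = random.sample(parents, 2)
                    cut = random.randrange(N_ACCESSES)       # one-point crossover
                    child = a[:cut] + b[cut:]
                    if random.random() < 0.1:                # occasional mutation
                        child[random.randrange(N_ACCESSES)] = random.choice(PROTOCOLS)
                    children.append(child)
                pop = parents + children
            return min(pop, key=fitness)

        print(evolve())                      # per-access protocol assignment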

    Interaction and interest management in a scripting language.

    Interaction management is concerned with the protocols that govern interactive activities among multiple users or agents in networked collaborative environments. Interest management is concerned with relevance-based data filtering in networked collaborative environments. The main objective of the former is to structure interactive activities according to the requirements of the application concerned, while the main objective of the latter is to provide secured transmission of the subset of information relevant to each recipient. The research in these two important aspects of networked software has largely been carried out in specific application domains such as online meetings, online groupware and online games. This thesis is concerned with the design and implementation of high-level language constructs for interaction and interest management. The work that has been undertaken includes: an abstract study of interactive activities and data transmission in networked collaborative environments, through a large number of variations of the noughts and crosses game; the design of a set of language constructs for specifying a variety of interaction protocols; the design of a set of language constructs for specifying secured data sharing with relevance-based filtering; the implementation of these language constructs in the form of a major extension of the scripting language JACIE (Java-based Authoring Language for Collaborative Interactive Environments); and the development of two demonstration applications, namely e-learning on Simulation of Network Trouble Shooting and online Bridge, using the extended JACIE to demonstrate the technical feasibility and usefulness of the design. These high-level language constructs support a class of complicated software features in networked collaborative applications, such as turn management, interaction timing, group formation, dynamic protocol changes, distributed data sharing, access control, authentication and information filtering. They enable programmers to implement such features in an intuitive manner without involving low-level system programming directly, which would otherwise require the knowledge and skills of experienced network programmers.
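
    As an illustration of one listed construct, turn management, here is a small Python reconstruction (not JACIE's actual syntax or API): the manager serialises interaction by granting the floor to one participant at a time and rotating it when a turn times out.

        import time

        class TurnManager:
            def __init__(self, participants, turn_seconds=30):
                self.participants = list(participants)
                self.turn_seconds = turn_seconds
                self._i = 0
                self._granted_at = time.monotonic()

            def holder(self):
                # rotate automatically when the current turn times out
                if time.monotonic() - self._granted_at > self.turn_seconds:
                    self.next_turn()
                return self.participants[self._i]

            def may_act(self, user):
                return user == self.holder()     # the interaction-protocol check

            def next_turn(self):
                self._i = (self._i + 1) % len(self.participants)
                self._granted_at = time.monotonic()

        tm = TurnManager(["alice", "bob"], turn_seconds=5)
        assert tm.may_act("alice") and not tm.may_act("bob")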

    Multilanguage parallel programming of heterogeneous machines

    No full text available.