16 research outputs found

    Enhancing Productivity and Performance Portability of General-Purpose Parallel Programming

    Get PDF
    This work focuses on compiler and run-time techniques for improving the productivity and the performance portability of general-purpose parallel programming. More specifically, we focus on shared-memory task-parallel languages, where the programmer explicitly exposes parallelism in the form of short tasks that may outnumber the cores by orders of magnitude. The compiler, the run-time, and the platform (henceforth the system) are responsible for harnessing this unpredictable amount of parallelism, which can vary from none to excessive, towards efficient execution. The challenge arises from the aspiration to support fine-grained irregular computations and nested parallelism. This work is even more ambitious by also aspiring to lay the foundations to efficiently support declarative code, where the programmer exposes all available parallelism, using high-level language constructs such as parallel loops, reducers or futures. The appeal of declarative code is twofold for general-purpose programming: it is often easier for the programmer who does not have to worry about the granularity of the exposed parallelism, and it achieves better performance portability by avoiding overfitting to a small range of platforms and inputs for which the programmer is coarsening. Furthermore, PRAM algorithms, an important class of parallel algorithms, naturally lend themselves to declarative programming, so supporting it is a necessary condition for capitalizing on the wealth of the PRAM theory. Unfortunately, declarative codes often expose such an overwhelming number of fine-grained tasks that existing systems fail to deliver performance. Our contributions can be partitioned into three components. First, we tackle the issue of coarsening, which declarative code leaves to the system. We identify two goals of coarsening and advocate tackling them separately, using static compiler transformations for one and dynamic run-time approaches for the other. Additionally, we present evidence that the current practice of burdening the programmer with coarsening either leads to codes with poor performance-portability, or to a significantly increased programming effort. This is a ``show-stopper'' for general-purpose programming. To compare the performance portability among approaches, we define an experimental framework and two metrics, and we demonstrate that our approaches are preferable. We close the chapter on coarsening by presenting compiler transformations that automatically coarsen some types of very fine-grained codes. Second, we propose Lazy Scheduling, an innovative run-time scheduling technique that infers the platform load at run-time, using information already maintained. Based on the inferred load, Lazy Scheduling adapts the amount of available parallelism it exposes for parallel execution and, thus, saves parallelism overheads that existing approaches pay. We implement Lazy Scheduling and present experimental results on four different platforms. The results show that Lazy Scheduling is vastly superior for declarative codes and competitive, if not better, for coarsened codes. Moreover, Lazy Scheduling is also superior in terms of performance-portability, supporting our thesis that it is possible to achieve reasonable efficiency and performance portability with declarative codes. Finally, we also implement Lazy Scheduling on XMT, an experimental manycore platform developed at the University of Maryland, which was designed to support codes derived from PRAM algorithms. On XMT, we manage to harness the existing hardware support for scheduling flat parallelism to compose it with Lazy Scheduling, which supports nested parallelism. In the resulting hybrid scheduler, the hardware and software work in synergy to overcome each other's weaknesses. We show the performance composability of the hardware and software schedulers, both in an abstract cost model and experimentally, as the hybrid always performs better than the software scheduler alone. Furthermore, the cost model is validated by using it to predict if it is preferable to execute a code sequentially, with outer parallelism, or with nested parallelism, depending on the input, the available hardware parallelism and the calling context of the parallel code

    The shared data-object model as a paradigm for programming distributed systems

    Get PDF

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    Get PDF
    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    SoS: self-organizing substrates

    Get PDF
    Large-scale networked systems often, both by design or chance exhibit self-organizing properties. Understanding self-organization using tools from cybernetics, particularly modeling them as Markov processes is a first step towards a formal framework which can be used in (decentralized) systems research and design.Interesting aspects to look for include the time evolution of a system and to investigate if and when a system converges to some absorbing states or stabilizes into a dynamic (and stable) equilibrium and how it performs under such an equilibrium state. Such a formal framework brings in objectivity in systems research, helping discern facts from artefacts as well as providing tools for quantitative evaluation of such systems. This thesis introduces such formalism in analyzing and evaluating peer-to-peer (P2P) systems in order to better understand the dynamics of such systems which in turn helps in better designs. In particular this thesis develops and studies the fundamental building blocks for a P2P storage system. In the process the design and evaluation methodology we pursue illustrate the typical methodological approaches in studying and designing self-organizing systems, and how the analysis methodology influences the design of the algorithms themselves to meet system design goals (preferably with quantifiable guarantees). These goals include efficiency, availability and durability, load-balance, high fault-tolerance and self-maintenance even in adversarial conditions like arbitrarily skewed and dynamic load and high membership dynamics (churn), apart of-course the specific functionalities that the system is supposed to provide. The functionalities we study here are some of the fundamental building blocks for various P2P applications and systems including P2P storage systems, and hence we call them substrates or base infrastructure. These elemental functionalities include: (i) Reliable and efficient discovery of resources distributed over the network in a decentralized manner; (ii) Communication among participants in an address independent manner, i.e., even when peers change their physical addresses; (iii) Availability and persistence of stored objects in the network, irrespective of availability or departure of individual participants from the system at any time; and (iv) Freshness of the objects/resources' (up-to-date replicas). Internet-scale distributed index structures (often termed as structured overlays) are used for discovery and access of resources in a decentralized setting. We propose a rapid construction from scratch and maintenance of the P-Grid overlay network in a self-organized manner so as to provide efficient search of both individual keys as well as a whole range of keys, doing so providing good load-balancing characteristics for diverse kind of arbitrarily skewed loads - storage and replication, query forwarding and query answering loads. For fast overlay construction we employ recursive partitioning of the key-space so that the resulting partitions are balanced with respect to storage load and replication. The proper algorithmic parameters for such partitioning is derived from a transient analysis of the partitioning process which has Markov property. Preservation of ordering information in P-Grid such that queries other than exact queries, like range queries can be efficiently and rather trivially handled makes P-Grid suitable for data-oriented applications. Fast overlay construction is analogous to building an index on a new set of keys making P-Grid suitable as the underlying indexing mechanism for peer-to-peer information retrieval applications among other potential applications which may require frequent indexing of new attributes apart regular updates to an existing index. In order to deal with membership dynamics, in particular changing physical address of peers across sessions, the overlay itself is used as a (self-referential) directory service for maintaining the participating peers' physical addresses across sessions. Exploiting this self-referential directory, a family of overlay maintenance scheme has been designed with lower communication overhead than other overlay maintenance strategies. The notion of dynamic equilibrium study for overlays under continuous churn and repairs, modeled as a Markov process, was introduced in order to evaluate and compare the overlay maintenance schemes. While the self-referential directory was originally invented to realize overlay maintenance schemes with lower overheads than existing overlay maintenance schemes, the self-referential directory is generic in nature and can be used for various other purposes, e.g., as a decentralized public key infrastructure. Persistence of peer identity across sessions, in spite of changes in physical address, provides a logical independence of the overlay network from the underlying physical network. This has many other potential usages, for example, efficient maintenance mechanisms for P2P storage systems and P2P trust and reputation management. We specifically look into the dynamics of maintaining redundancy for storage systems and design a novel lazy maintenance strategy. This strategy is algorithmically a simple variant of existing maintenance strategies which adapts to the system dynamics. This randomized lazy maintenance strategy thus explores the cost-performance trade-offs of the storage maintenance operations in a self-organizing manner. We model the storage system (redundancy), under churn and maintenance, as a Markov process. We perform an equilibrium study to show that the system operates in a more stable dynamic equilibrium with our strategy than for the existing maintenance scheme for comparable overheads. Particularly, we show that our maintenance scheme provides substantial performance gains in terms of maintenance overhead and system's resilience in presence of churn and correlated failures. Finally, we propose a gossip mechanism which works with lower communication overhead than existing approaches for communication among a relatively large set of unreliable peers without assuming any specific structure for their mutual connectivity. We use such a communication primitive for propagating replica updates in P2P systems, facilitating management of mutable content in P2P systems. The peer population affected by a gossip can be modeled as a Markov process. Studying the transient spread of gossips help in choosing proper algorithm parameters to reduce communication overhead while guaranteeing coverage of online peers. Each of these substrates in themselves were developed to find practical solutions for real problems. Put together, these can be used in other applications, including a P2P storage system with support for efficient lookup and inserts, membership dynamics, content mutation and updates, persistence and availability. Many of the ideas have already been implemented in real systems and several others are in the way to be integrated into the implementations. There are two principal contributions of this dissertation. It provides design of the P2P systems which are useful for end-users as well as other application developers who can build upon these existing systems. Secondly, it adapts and introduces the methodology of analysis of a system's time-evolution (tools typically used in diverse domains including physics and cybernetics) to study the long run behavior of P2P systems, and uses this methodology to (re-)design appropriate algorithms and evaluate them. We observed that studying P2P systems from the perspective of complex systems reveals their inner dynamics and hence ways to exploit such dynamics for suitable or better algorithms. In other words, the analysis methodology in itself strongly influences and inspires the way we design such systems. We believe that such an approach of orchestrating self-organization in internet-scale systems, where the algorithms and the analysis methodology have strong mutual influence will significantly change the way future such systems are developed and evaluated. We envision that such an approach will particularly serve as an important tool for the nascent but fast moving P2P systems research and development community

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 258, SoCG 2023, Complete Volum

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    Summary of Research 1994

    Get PDF
    The views expressed in this report are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.This report contains 359 summaries of research projects which were carried out under funding of the Naval Postgraduate School Research Program. A list of recent publications is also included which consists of conference presentations and publications, books, contributions to books, published journal papers, and technical reports. The research was conducted in the areas of Aeronautics and Astronautics, Computer Science, Electrical and Computer Engineering, Mathematics, Mechanical Engineering, Meteorology, National Security Affairs, Oceanography, Operations Research, Physics, and Systems Management. This also includes research by the Command, Control and Communications (C3) Academic Group, Electronic Warfare Academic Group, Space Systems Academic Group, and the Undersea Warfare Academic Group
    corecore