Towards compliant distributed shared memory
Copyright © 2002 IEEE. There exists a wide spectrum of coherency models for use in distributed shared memory (DSM) systems. The choice of model for an application should ideally be based on the application's data access patterns and phase changes. However, in current systems, most, if not all, of the parameters of the coherency model are fixed in the underlying DSM system. This forces the application either to structure its computations to suit the underlying model or to endure an inefficient coherency model. This paper introduces a unique approach to the provision of DSM based on the idea of compliance. Compliance allows an application to specify how the system should most effectively operate through a separation between mechanism, provided by the underlying system, and policy, provided by the application. This is in direct contrast with the traditional view that an application must mold itself to the hard-wired choices that its operating platform has made. The contribution of this work is the definition and implementation of an architecture for compliant distributed coherency management. The efficacy of this architecture is illustrated through a worked example. Falkner, K. E.; Detmold, H.; Munro, D. S.; Olds, T
Distributed shared memory for virtual environments
Bibliography: leaves 71-77. This work investigated making virtual environments easier to program, by designing a suitable distributed shared memory system. To be usable, the system must keep latency to a minimum, as virtual environments are very sensitive to it. The resulting design is push-based and non-consistent. Another requirement is that the system should be scalable, over large distances and over large numbers of participants. The latter is hard to achieve with current network protocols, and a proposal was made for a more scalable multicast addressing system than is used in the Internet protocol. Two sample virtual environments were developed to test the ease-of-use of the system. This showed that the basic concept is sound, but that more support is needed. The next step should be to extend the language and add compiler support, which will enhance ease-of-use and allow numerous optimisations. This can be improved further by providing system-supported containers.
Translation techniques for distributed-shared memory programming models
This thesis argues that a modular, source-to-source translation system for distributed-shared memory programming models would be beneficial to the high-performance computing community. It goes on to present a proof-of-concept example in detail, translating between Global Arrays (GA) and Unified Parallel C (UPC). Some useful extensions to UPC are discussed, along with how they are implemented in the proof-of-concept translator.
Impact of the Consistency Model on Checkpointing of Distributed Shared Memory
In this report, we consider the impact of the consistency model on checkpointing and rollback algorithms for distributed shared memory. In particular, we consider specific implementations of four consistency models for distributed shared memory, namely, linearizability, sequential consistency, causal consistency and eventual consistency, and develop checkpointing and rollback algorithms that can be integrated into the implementations of the consistency models. Our results empirically demonstrate that the mechanisms used to implement stronger consistency models lead to simpler or more efficient checkpointing algorithms.
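The report's central claim can be made concrete with a toy sketch (illustrative only, not the paper's actual algorithms): when a DSM implementation funnels all writes through one totally ordered log, which is a common way to realize sequential consistency, a checkpoint reduces to a log position and rollback to replaying the log prefix. Under a weaker model, each replica would instead need machinery such as vector clocks to find a consistent cut. The class and method names below are hypothetical.

```python
class SequentialDSM:
    """Toy sequentially consistent shared memory: one global write log."""

    def __init__(self):
        self.log = []      # totally ordered history of (addr, value) writes
        self.memory = {}   # current state, derived from the log

    def write(self, addr, value):
        self.log.append((addr, value))
        self.memory[addr] = value

    def read(self, addr):
        return self.memory.get(addr)

    def checkpoint(self):
        # Because writes are totally ordered, a consistent cut is just
        # "everything up to log position n" -- a single integer.
        return len(self.log)

    def rollback(self, ckpt):
        # Rebuild state by replaying the log prefix up to the checkpoint.
        self.log = self.log[:ckpt]
        self.memory = {}
        for addr, value in self.log:
            self.memory[addr] = value


dsm = SequentialDSM()
dsm.write("x", 1)
dsm.write("y", 2)
ckpt = dsm.checkpoint()
dsm.write("x", 99)        # a write we will undo
dsm.rollback(ckpt)
print(dsm.read("x"), dsm.read("y"))   # -> 1 2
```

The point of the sketch is only that the total order, which is the expensive part of the stronger model, is exactly what makes the checkpoint representation trivial.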
A Comparison of Two Paradigms for Distributed Shared Memory
This paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms two systems are described, one using only point-to-point messages, the other using broadcasting as well. The two paradigms and their implementations are described briefly. Their performances on four applications are compared: the travelling-salesman problem, alpha-beta search, matrix multiplication and the all-pairs shortest paths problem. The relevant measurements were obtained on a system consisting of 10 MC68020 processors connected by an Ethernet. For comparison purposes, the applications have also been run on a system with physical shared memory. In addition, the paper gives measurements for the first two applications above when Remote Procedure Call is used as the communication mechanism. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications, with significant speed-ups. The structured shared data-object model achieves the highest speed-ups and is easiest to program and to debug. KEYWORDS: Amoeba; distributed shared memory; distributed programming; Orca
Brief announcement: Distributed shared memory based on computation migration
Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor designers have resorted to increasing the number of cores on a single chip, and pundits expect 1000-core designs to materialize in the next few years [1]. But how will memory architectures scale and how will these next-generation multicores be programmed?

One barrier to scaling current memory architectures is the off-chip memory bandwidth wall [1,2]: off-chip bandwidth grows with package pin density, which scales much more slowly than on-die transistor density [3]. To reduce reliance on external memories and keep data on-chip, today's multicores integrate very large shared last-level caches on chip [4]; interconnects used with such shared caches, however, do not scale beyond relatively few cores, and the power requirements and access latencies of large caches exclude their use in chips on a 1000-core scale. For massive-scale multicores, then, we are left with relatively small per-core caches.

Per-core caches on a 1000-core scale, in turn, raise the question of memory coherence. On the one hand, a shared memory abstraction is a practical necessity for general-purpose programming, and most programmers prefer a shared memory model [5]. On the other hand, ensuring coherence among private caches is an expensive proposition: bus-based and snoopy protocols don't scale beyond relatively few cores, and directory sizes needed in cache-coherence protocols must equal a significant portion of the combined size of the per-core caches, as otherwise directory evictions will limit performance [6]. Moreover, directory-based coherence protocols are notoriously difficult to implement and verify [7].
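The directory-sizing argument above can be checked with back-of-the-envelope arithmetic. The numbers below (64 KiB private caches, 64-byte lines, a full-map directory with one presence bit per core per tracked line) are illustrative assumptions, not figures from the announcement, but they show how at 1000 cores the directory's bit vectors alone rival the data they track:

```python
CORES = 1000
LINE_BYTES = 64
PER_CORE_CACHE_BYTES = 64 * 1024          # assume 64 KiB private cache per core

lines_per_cache = PER_CORE_CACHE_BYTES // LINE_BYTES   # 1024 lines per core
total_lines = CORES * lines_per_cache                  # lines the directory must cover

# Full-map directory entry: one presence bit per core (tag/state bits ignored).
directory_bits = total_lines * CORES
directory_bytes = directory_bits // 8

total_cache_bytes = CORES * PER_CORE_CACHE_BYTES
overhead = directory_bytes / total_cache_bytes

print(f"directory storage: {directory_bytes / 2**20:.1f} MiB")
print(f"combined caches:   {total_cache_bytes / 2**20:.1f} MiB")
print(f"overhead ratio:    {overhead:.2f}x")
```

Under these assumptions the full-map directory costs roughly twice the combined cache capacity, since each 512-bit line carries a 1000-bit presence vector. Real designs use compressed or limited-pointer directories, but the quadratic-in-cores trend is what motivates looking past directory coherence.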