12 research outputs found
The distributed ASCI supercomputer project
The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project
Source-Level Global Optimizations for Fine-Grain Distributed Shared Memory Systems
¡ rveldema,rutger,ceriel,bal
Runtime Optimizations for a Java DSM Implementation
Jackal is a fine-grained distributed shared memory implementation of the Java programming language. Jackal implements Java’s memory model and allows multithreaded Java programs to run unmodified on distributed-memory systems. This paper focuses on Jackal’s runtime system, which implements a multiple-writer, home-based consistency protocol. Protocol actions are triggered by software access checks that Jackal’s compiler inserts before object and array references. We describe optimizations for Jackal’s runtime system, which mainly consist of discovering opportunities to dispense with flushing of cached data. We give performance results for different runtime optimizations, and compare their impact with the impact of one compiler optimization. We find that our runtime optimizations are necessary for good Jackal performance, but only in conjunction with the Jackal compiler optimizations described in [24]. As a yardstick, we compare the performance of Java applications run on Jackal with the performance of equivalent applications that use a fast implementation of Java’s Remote Method Invocation (RMI) instead of shared memory. 1
Source-Level Global Optimizations for Fine-Grain Distributed Shared Memory Systems
This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, source-level compiler rather than the binary rewriting techniques employed by most other fine-grain DSM systems. Source-level analysis makes existing accesscheck optimizations (e.g., access-check batching) more effective and enables two novel fine-grain DSM optimizations: object-graph aggregation and automatic computation migration. The compiler detects situations where an access to a root object is followed by accesses to subobjects. Jackal attempts to aggregate all access checks on objects in such object graphs into a single check on the graph's root object. If this check fails, the entire graph is fetched. Object-graph aggregation can reduce the number of network roundtrips and, since it is an advanced form of access-check batching, improves sequential performance. Computation migration (or function shipping) is used to optimize critical sections in which a single processor owns both the shared data that is accessed and the lock that protects the data. It is usually more efficient to execute such critical sections on the processor that holds the lock and the data than to incur multiple roundtrips for acquiring the lock, fetching the data, writing the data back, and releasing the lock. Jackal's compiler detects such critical sections and optimizes them by generating single-roundtrip computation-migration code rather than standard data-shipping code. Jackal's optimizations improve both sequential and parallel application performance. On average, sequential execution times of instrumented, optimized programs are within 10% of those of uninstrumented programs. Application speedups usually improve ..