Abstract. We investigate all possible combinations of re-ordering of read and write instructions and their effects on the correctness of programs that are designed for sequential consistency. With certain combinations of re-orderings, any program that accesses shared memory through only reads and writes, and that is correct assuming sequential consistency, can be transformed into a new program that uses no explicit synchronization and remains correct in spite of the instruction re-ordering. With other combinations of re-ordering, no such transformation exists unless explicit synchronization is used.
Introduction
Designers of concurrent algorithms typically assume sequential consistency, a consistency model formalized by Lamport [11]. Sequential consistency requires that memory operations of all processors appear to be "executed in some sequential order, and the operations of each processor appear in this sequence in the order specified by its program" (program order). Sequential consistency is intuitive, but disallows many possible hardware and software optimizations.
Adve and Gharachorloo [1] identify several optimization techniques that cause instructions to be re-ordered so that they appear to execute out of program order. This is called instruction re-ordering. Write buffers with read bypasses, overlapping writes, non-blocking reads, and optimizing compilers can lead to all forms of instruction re-ordering. They also cite many commercial multiprocessors that utilize instruction re-ordering, such as the AlphaServer 8200/8400, Cray T3D/T3E, and SparcCenter 1000/2000 (see Figure 1). Other examples include the Java Virtual Machine (JVM), IBM PowerPC, Intel Itanium, and .Net. Instruction re-ordering aims at improving the system's performance, but it relaxes sequential consistency, making the job of programming multiprocessors even harder.
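As a small illustration of why re-ordering matters, the following Python sketch enumerates every sequentially consistent execution of a two-processor program and compares the outcomes with those that become possible once a read may overtake an earlier write to a different location. The program itself, the tuple encoding, and the helper names (interleavings, run, outcomes) are assumptions made for this example; they are not taken from the paper. Write-read re-ordering is modelled crudely by swapping the two instructions of a processor.

```python
# Each instruction is (kind, location, argument):
#   ("w", loc, value)    writes value to loc
#   ("r", loc, register) reads loc into the named register
P0 = [("w", "x", 1), ("r", "y", "r0")]
P1 = [("w", "y", 1), ("r", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a:
        yield list(b)
    elif not b:
        yield list(a)
    else:
        for rest in interleavings(a[1:], b):
            yield [a[0]] + rest
        for rest in interleavings(a, b[1:]):
            yield [b[0]] + rest


def run(schedule):
    """Execute one total order of instructions; return (r0, r1)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, loc, arg in schedule:
        if kind == "w":
            memory[loc] = arg
        else:
            registers[arg] = memory[loc]
    return registers["r0"], registers["r1"]


def outcomes(p0, p1):
    """All results reachable by interleaving p0 and p1."""
    return {run(s) for s in interleavings(p0, p1)}


# Sequential consistency: program order is kept on both processors.
sc = outcomes(P0, P1)

# Write-read re-ordering: either processor may perform its read first.
relaxed = set()
for p0 in (P0, P0[::-1]):
    for p1 in (P1, P1[::-1]):
        relaxed |= outcomes(p0, p1)

print(sorted(sc))       # (0, 0) is absent
print(sorted(relaxed))  # (0, 0) is present
```

Under sequential consistency at least one processor must observe the other's write, so a program that relies on the outcome (0, 0) being impossible can fail once write-read re-ordering is allowed.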
Multiprocessor machines that incorporate instruction re-ordering are also equipped with more powerful instructions than reads and writes, such as read-modify-write and memory barrier instructions. These synchronization primitives can be used to enforce orderings on instructions that otherwise might be re-ordered, causing incorrect computation. Using these powerful instructions, however, is expensive; excessive use can result in inefficient implementations, possibly defeating the purpose of instruction re-ordering altogether.

Figure 1
Architecture          write-read    write-write   read-write    read-read
                      re-ordering   re-ordering   re-ordering   re-ordering
IBM 370 [1]           ✓
SPARC TSO [14, 7]     ✓                                         ✓ [10]
SPARC PSO [14, 7]     ✓             ✓             ✓ [10]        ✓ [10]
SPARC RMO [14, 5]     ✓             ✓             ✓             ✓
IBM PowerPC [2]       ✓             ✓             ✓             ✓
DEC Alpha [3, 5]      ✓             ✓             ✓             ✓
JVM [12, 6]           ✓             ✓             ✓             ✓
Intel Itanium [9]     ✓             ✓             ✓             ✓
.Net [13]             ✓             ✓             ✓             ✓
Other related studies (see the full version of the paper [8] for a bibliography) provide programming strategies for high-performance multiprocessors, most of which rely on the judicious use of synchronization.
Summary of Results
We assume that multiprocessors are coherent [4], requiring execution order to maintain program order of instructions applied to the same memory location. If a read of one memory location precedes in program order a write to a different memory location and this read appears after this write in execution order, this is called read-write re-ordering. Re-ordering types write-read, write-write, and read-read are defined similarly. Call a shared memory multiprocessor program whose shared memory consists of only atomic locations (that is, variables that support only read and write instructions) a (read/write) multi-program. The fundamental question guiding this work is:
Under what conditions is there a general transformation that transforms any read/write multi-program that is correct under sequential consistency to another read/write multi-program that is still correct in spite of possible instruction re-ordering?
Such a transformation is called a read/write transformation. It consists of inserting only additional read and write operations into a given read/write multi-program, which solves some problem under sequential consistency, without altering its original semantics. Hence, the transformed multi-program is also a read/write multi-program. The purpose of these additions is to restore program order and maintain sequential consistency in spite of instruction re-ordering. More precisely, let e be an arbitrary read/write multi-program that solves some problem under sequential consistency. The results of our research are:
1. For any combination of re-ordering types that excludes read-read re-ordering, there exists a read/write transformation that transforms e to a read/write multi-program e′ that solves the same problem in spite of the re-ordering. The transformation is general; it is correct for any read/write multi-program under any combination of read-write, write-read, and write-write re-orderings.
2. The exclusion of read-read re-ordering is sufficient but not necessary. For any combination of read-read and write-write re-ordering only, such a read/write transformation still exists.
3. If both read-read and read-write (or both read-read and write-read) re-orderings are possible, there is no general read/write transformation. Any correct general transformation must use stronger operations than reads and writes, such as read-modify-write and memory barrier instructions, for at least some programs.
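The definitions and the three results above can be sketched in Python as follows. This is a minimal illustration, not the paper's formalism: the function names, the (kind, location) encoding of instructions, and the representation of an execution as a list of indices are all assumptions made here. The first function names the re-ordering types that one processor's execution exhibits and rejects executions that break coherence; the second encodes results 1 to 3 as a decision over the set of re-ordering types a machine allows.

```python
from itertools import combinations

RR, RW, WR, WW = "read-read", "read-write", "write-read", "write-write"


def observed_reorderings(program, execution):
    """Return the re-ordering types exhibited by one processor.

    program:   list of (kind, location) pairs in program order,
               where kind is "read" or "write".
    execution: the indices of `program`, listed in execution order.
    """
    position = {instr: slot for slot, instr in enumerate(execution)}
    found = set()
    # combinations yields pairs with first < second, i.e. program order.
    for first, second in combinations(range(len(program)), 2):
        if position[first] < position[second]:
            continue  # program order is preserved for this pair
        kind1, loc1 = program[first]
        kind2, loc2 = program[second]
        if loc1 == loc2:
            # Coherence forbids re-ordering same-location instructions.
            raise ValueError("coherence violated")
        found.add(f"{kind1}-{kind2}")
    return found


def general_transformation_exists(allowed):
    """True if results 1-3 say a general read/write transformation exists."""
    allowed = set(allowed)
    if RR not in allowed:
        return True   # result 1: read-read re-ordering is excluded
    if allowed <= {RR, WW}:
        return True   # result 2: only read-read and write-write
    return False      # result 3: read-read with read-write or write-read


# A write to x followed in program order by a read of y, executed read first.
print(observed_reorderings([("write", "x"), ("read", "y")], [1, 0]))
print(general_transformation_exists({RW, WR, WW}))  # True
print(general_transformation_exists({RR, WR}))      # False
```

Taken together, the three results cover every subset of the four re-ordering types, which is why the decision function needs no further cases.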
Conclusion
The transformations we used are simple and general; they can be applied to any read/write multi-program that is correct for sequential consistency. They are also optimal for general transformations, that is, those that apply to any multi-program that is correct for sequential consistency. However, optimality for general transformations does not necessarily imply optimality for individual multi-program instances. When given a fixed instance, it may be possible to apply further optimizations that exploit information from the given multi-program and the problem it solves. Such information (from both programs and problems) is unavailable to general transformers. Our results imply that the IBM PowerPC, DEC Alpha, JVM, and SPARC TSO, PSO, and RMO (Figure 1) require the use of explicit synchronization in order to solve certain problems. Hence, one of our future research directions is to augment the target program with memory barrier instructions and to minimize the number of such instructions.
