Formalizing the Java Memory Model for multithreaded program correctness and optimization by Yang, Yue
Formalizing the Java Memory Model
for Multithreaded Program
Correctness and Optimization
Yue Yang, Ganesh Gopalakrishnan, and 
Gary Lindstrom
UUCS-02-011
School of Computing 
University of Utah 
Salt Lake City, UT 84112, USA
April 2, 2002
Abstract
Standardized language level support for threads is one of the most important features of Java. However, defining and understanding the Java Memory Model (JMM) has turned out to be a big challenge. Several models produced to date are not as easily comparable as first thought. Given the growing interest in multithreaded Java programming, it is essential to have a sound framework that would allow formal specification and reasoning about the JMM.
This paper presents the Uniform Memory Model (UMM), a formal memory model specification framework. With a flexible architecture, it can be easily configured to capture different shared memory semantics, including both architectural and language level memory models. Based on guarded commands, UMM is integrated with a model checking utility, providing strong built-in support for formal verification and program analysis. A formal specification of the JMM following the semantics proposed by Manson and Pugh is presented in UMM. Systematic analysis has revealed interesting properties of the proposed semantics. In addition, several mistakes in the original specification have been uncovered.
{yyang | ganesh | gary}@cs.utah.edu
1 Introduction
Java programmers routinely rely on threads for structuring their programming activities, sometimes even without explicit awareness. As future hardware architectures become more aggressively parallel, multithreaded Java also provides an appealing platform for high-performance application development, especially for server applications. The Java Memory Model (JMM), which specifies how threads interact with each other in a concurrent system, is a critical component of the Java threading system. It has significant implications for a broad range of engineering activities such as the development of programming patterns, compiler optimizations, Java virtual machine (JVM) implementations, and architectural designs.
Unfortunately, developing a rigorous and intuitive JMM has turned out to be a big challenge. The existing JMM is given in Chapter 17 of the Java Language Specification [1]. As summarized by Pugh [2], it is flawed and very hard to understand. On the one hand, it is too strong and prohibits many common optimization techniques. On the other hand, it is too weak and compromises safety guarantees.
The need for improvements in the JMM has stimulated broad research interest. Two new semantics have been proposed for Java threads, one by Manson and Pugh [3], the other by Maessen, Arvind, and Shen [4]. We refer to these two proposals as JMM_MP and JMM_CRF, respectively, in this paper. The JMM is currently under an official revision process [5] and will be replaced in the future. There is also an ongoing discussion on the JMM mailing list [6].
Although [3] and [4] have initiated promising improvements to Java thread semantics, the specification framework can be enhanced in several ways. One area of improvement is support for formal verification. Being able to provide a concise semantics is only part of the goal; people also need to reason about whether their programs comply with the JMM. Multithreaded programming is notoriously difficult, and developing efficient and reliable compilation techniques for multithreading is also hard. The difficulty of understanding and reasoning about the JMM has become a major obstacle to Java threading reaching its full potential. Although finding an ultimate solution is not an easy task, integrating formal verification techniques provides an encouraging first step towards this goal.
Another problem is that both proposals are closely tied to the data structures chosen for their specific semantics. Since they use totally different notations, it is hard to compare the two models formally. In addition, neither proposal can be easily reconfigured to support different memory model requirements. JMM_CRF inherits its architecture from its predecessor hardware model [7]. Java memory operations have to be divided into fine-grained Commit/Reconcile/Fence (CRF) instructions to capture the precise thread semantics. This translation process adds unnecessary complexity to the description of memory properties. Moreover, its dependency on a cache-based architecture prevents it from describing more relaxed models. JMM_MP uses multiset structures to record the history of memory operations. Instead of explicitly specifying the intrinsic memory model properties, e.g., the ordering rules, it resorts to nonintuitive mechanisms such as splitting a write instruction and using assertions to enforce certain conditions. While this is sufficient to express the proposed synchronization mechanism, adjusting it to specify different properties is not trivial.
As with any software engineering activity, designing a memory model involves a repeated process of fine-tuning and testing. Therefore, a generic specification framework is needed to provide such flexibility. In addition, a uniform notation is desired to help people understand the differences among models.
In this paper, we present the Uniform Memory Model (UMM), a formal framework for memory model specification. It explicitly specifies the intrinsic memory model properties and allows one to configure them with ease. It is integrated with a model checking tool based on Murφ, facilitating formal analysis of corner cases. To aid program analysis, it extends the scope of traditional memory models by including the state information of thread local variables. This enables source-level reasoning about program behaviors. The JMM based on the semantics from JMM_MP is formally specified and studied using UMM. Subtle design flaws in the proposed semantics are revealed by our systematic analysis using idiom-driven test programs.
We review related work in the next section. Then we discuss the problems of the current JMM specification in Section 3, followed by an introduction to JMM_MP in Section 4. Our formal specification of the JMM in UMM, primarily based on the semantics proposed in JMM_MP, is described in Section 5. In Section 6, we discuss interesting results and compare JMM_MP with JMM_CRF. We conclude and explore future research opportunities in Section 7. An equivalence proof between our model and JMM_MP is outlined in the Appendix.
2 Related Work
A memory model describes how a memory system behaves on memory operations such as reads and writes. Much previous research has concentrated on processor-level memory models. One of the strongest memory models for multiprocessor memory systems is Sequential Consistency [8]. Many weaker memory models [9] have been proposed to enable optimizations. One of them is Lazy Release Consistency [10], where synchronization is performed by releasing and acquiring a lock. When a lock is released, all previous operations need to be made visible to other processors. When the lock is subsequently acquired by another processor, that processor needs to reconcile with the shared memory to get the updated data. Lazy Release Consistency requires an ordering property called Coherence. Using the definition given in [11], Coherence requires a total order among all write instructions to each individual address. Furthermore, this total order respects the program order of each processor. This requirement is further relaxed by Location Consistency [12], which orders writes only "partially": two write operations are ordered only if they are issued by the same processor or are synchronized through locks. With the verification capability of UMM, we can formally compare the JMM with some of these conventional models.
To categorize different memory models, Collier [13] specified them based on a formal theory of memory ordering rules. Architectural testing programs can be executed on a target system to test these orderings. Using methods similar to Collier's, Gharachorloo et al. [11] [14] developed a generic framework for specifying the implementation conditions for different memory consistency models. The shortcoming of their approach is that it is nontrivial for people to infer program behaviors from a compound of ordering constraints.
Park and Dill [15] developed an executable specification framework with formal verification capabilities for the Relaxed Memory Order (RMO [16]) [17]. We extended this method to the domain of the JMM in our previous work on the analysis of JMM_CRF [18]. After adapting JMM_CRF to an executable specification, we exercised the model with a suite of test programs to reveal pivotal properties and verify common programming idioms. Roychoudhury and Mitra [19] also applied the same technique to verify the existing JMM, achieving similar success. However, these previous executable specifications are all restricted to the specific architectures of their target memory models. UMM provides a generic abstraction mechanism for capturing different memory consistency requirements in a formal executable specification.

Figure 1: Architecture of the existing Java Memory Model
3 Problems of the Existing JMM
The existing JMM uses the memory hierarchy illustrated in Figure 1. In this framework, every variable has a working copy stored in the working memory. Eight actions are defined. As a thread executes a program, it operates on the working copies of variables via use, assign, lock, and unlock actions as dictated by the semantics of the program it is executing. Data transfers performed by the JVM between the main memory and the working memory are not atomic. A read action initiates the activity of fetching a variable from main memory and is completed by a corresponding load action. Similarly, a store action initiates the activity of writing a variable to main memory and is committed by a corresponding write action. The lock and unlock actions enforce a synchronization mechanism similar to Lazy Release Consistency. The current JMM informally describes sets of rules that impose constraints on these actions. Many non-obvious implications can be deduced by combining different rules. As a result, this framework is hard to understand, and the lack of rigor in the specification has led to the flaws listed below.
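To make the split transfers concrete, the following Java sketch models a single variable shared by two threads under the existing JMM. It is a hypothetical illustration rather than part of the specification: the classes MainMemory and WorkingMemory are invented here, each working copy is simply seeded from main memory (the actual rules require a load first), and none of the rules that constrain when an action may occur are enforced. It shows that an assign remains invisible to other threads until the matching store and write have both happened, and that another thread's use keeps returning its stale working copy until a read/load pair refreshes it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of one variable under the existing JMM: a copy in main
// memory plus one working copy per thread. Ordering rules between actions are
// NOT enforced; only the movement of data is modeled.
class MainMemory {
    int value; // copy in main memory, default value 0
}

class WorkingMemory {
    private final MainMemory main;
    private int copy;                                            // working copy
    private final Deque<Integer> fetching = new ArrayDeque<>();  // read issued, load pending
    private final Deque<Integer> flushing = new ArrayDeque<>();  // store issued, write pending

    // Simplification: the working copy starts equal to main memory.
    WorkingMemory(MainMemory main) {
        this.main = main;
        this.copy = main.value;
    }

    // Actions the thread performs on its working copy.
    int use() { return copy; }
    void assign(int v) { copy = v; }

    // Main memory -> working memory: read starts the transfer, load completes it.
    void read() { fetching.addLast(main.value); }
    void load() { copy = fetching.removeFirst(); }

    // Working memory -> main memory: store starts the transfer, write commits it.
    void store() { flushing.addLast(copy); }
    void write() { main.value = flushing.removeFirst(); }

    public static void main(String[] args) {
        MainMemory m = new MainMemory();
        WorkingMemory t1 = new WorkingMemory(m);
        WorkingMemory t2 = new WorkingMemory(m);
        t1.assign(1);
        t1.store();
        System.out.println(m.value);   // 0: the store has not been committed yet
        t1.write();
        System.out.println(t2.use());  // 0: t2 still uses its stale working copy
        t2.read();
        t2.load();
        System.out.println(t2.use());  // 1
    }
}
```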
• Strong ordering restrictions prohibit standard compiler optimizations.
The existing JMM requires a total order for operations on each individual variable [20]. Because of this requirement, important compiler optimization techniques such as fetch elimination are prohibited. Consider Figure 2, where p.x and q.x may become the same variable due to aliasing during execution. The compiler cannot replace the statement k = p.x with k = i because a total order among operations on the same variable is required: if i = p.x reads the old value and j = q.x then reads a newer value written by another thread, k = p.x must also return the newer value, whereas k = i would return the old one. As a result, adding a seemingly innocuous read instruction j = q.x introduces additional constraints. This is an annoying side effect in a threading system because people need to be able to add debugging read instructions without changing program behaviors. This ordering restriction is actually ignored by some commercial JVM implementations.
Initially, p.x == 0

p.x = 1;
q = p;

Problem: k = p.x cannot be replaced by k = i
Figure 2: Current JMM prohibits fetch elimination
Initially, p == null

Thread 1:
synchronized(this) {
    p = new Point(1,2);
}

Thread 2:
if (p != null) {
    r = p.x;
}

Finally, can result in r == 0
Figure 3: Current JMM allows premature release of object reference
• The existing JMM prohibits the removal of “redundant” synchronizations.
The present JMM requires a thread to flush all variables to main memory before releasing a lock. Because of this strong visibility requirement, a synchronized block cannot be optimized away even if the lock it acquires is thread-local.
• Java safety might be compromised.
If there is a race condition, the existing JMM does not guarantee that an object is fully initialized by its constructor before the returned object reference becomes visible to other threads; this might only occur under some weak memory architectures such as Alpha. Take Figure 3 as an example: when thread 2 fetches the object field without locking, it might obtain uninitialized data in the statement r = p.x even if p is not null. Although this loophole is an extremely rare corner case, it has serious consequences. Java safety is compromised since it opens a security hole to malicious attacks via race conditions. In particular, many Java objects, such as String objects, are designed to be immutable. If default values before initialization can be observed, the object becomes mutable. Furthermore, popular programming patterns, such as the double-checked locking [21] algorithm, are broken under the existing JMM due to the same problem.
• Semantics for final variables is omitted.
Being able to declare a variable as a constant is a useful feature in multithreaded systems because it offers more compilation flexibility. Unfortunately, the existing JMM does not mention final variables. As a result, final variables have to be reloaded at every synchronization point.
• Volatile variables are not useful enough.
The existing JMM requires operations on volatile variables to be sequentially consistent. But volatile variable operations do not affect the visibility of normal variable operations. Therefore, volatile and non-volatile operations can be reordered. In traditional languages such as C, volatiles are used in device drivers for accessing memory-mapped device registers. A volatile modifier tells the compiler that the variable should be reloaded on each access. In Java, low-level device access is no longer a priority. Volatile variables are mostly used as synchronization flags. Because the existing volatile semantics does not impose sufficient synchronization constraints on normal variables, volatiles are not intuitive to use in practice. Consequently, many JVM implementations do not comply with the present specification.
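The fetch elimination problem in Figure 2 can be reproduced by brute-force enumeration. The Java sketch below is hypothetical: the class FetchElimination is invented here, and it assumes, following the statements named in the text, that the reading thread executes i = p.x; j = q.x; k = p.x; with p and q aliased, while a single other thread overwrites the field once, from 0 to 1. With one writer, a larger value is also a later write, so the per-variable total order reduces to one rule: a thread's successive reads of the field never go back from 1 to 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of Figure 2. Assumptions: p and q alias one field,
// a single other thread overwrites it once (0 -> 1), and the reading thread runs
// i = p.x; j = q.x; k = p.x;. With one writer, the per-variable total order
// means that a thread's successive reads never go back from 1 to 0.
class FetchElimination {

    static boolean neverGoesBack(int... reads) {
        for (int n = 1; n < reads.length; n++) {
            if (reads[n] < reads[n - 1]) return false;
        }
        return true;
    }

    // Original code: three real reads of the aliased field.
    static List<String> originalOutcomes() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i <= 1; i++)
            for (int j = 0; j <= 1; j++)
                for (int k = 0; k <= 1; k++)
                    if (neverGoesBack(i, j, k)) out.add(i + "," + j + "," + k);
        return out;
    }

    // After fetch elimination: k = i, so only i and j are real reads.
    static List<String> optimizedOutcomes() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i <= 1; i++)
            for (int j = 0; j <= 1; j++)
                if (neverGoesBack(i, j)) out.add(i + "," + j + "," + i);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(originalOutcomes());  // [0,0,0, 0,0,1, 0,1,1, 1,1,1]
        System.out.println(optimizedOutcomes()); // [0,0,0, 0,1,0, 1,1,1]
    }
}
```

The outcome 0,1,0 appears only for the optimized code, which is why a per-variable total order forbids replacing k = p.x with k = i.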
4 Semantics Proposed by Manson and Pugh
To fix the problems listed in Section 3, JMM_MP has been proposed as a replacement semantics for Java threads. After extensive discussions and debates on the JMM mailing list, some of the thread properties have emerged as leading candidates to appear in the new JMM.
4.1 Desired Properties
• It should enable the removal of “redundant” synchronizations.
Similar to the existing JMM, JMM_MP uses a release/acquire process for synchronization. However, the visibility restrictions are much relaxed. Instead of permanently flushing all variables when a lock is released, visibility states are synchronized only through the same lock. Consequently, if the lock is not used by other threads, the synchronization can be removed, since it would never cause any visibility effects.
• It should relax the total order requirement for operations on the same variable.
JMM_MP essentially follows Location Consistency, which requires only a “partial” order among write instructions on the same variable, established through the same thread or through synchronization. Most standard compiler optimizations, such as fetch elimination, are enabled.
• It should maintain safety guarantees even under race conditions.
JMM_MP guarantees that all final fields are properly initialized. To design an immutable object, it is sufficient to declare all of its fields as final. Variables other than final fields may be observed prematurely.
• It should specify reasonable semantics for final variables.
A final field v is initialized only once, in the constructor of its containing object. At the end of the constructor, v is frozen before the reference to the object is returned. If the final variable is improperly exposed to other threads before it is frozen, v is said to be a pseudo-final field. Another thread always observes the initialized value of v unless v is pseudo-final, in which case it may also obtain the default value.
• It should make volatile variables more useful.
JMM_MP proposes two changes to the volatile variable semantics: one is weaker and the other stronger than in the original JMM. First, the ordering requirement for volatile operations is relaxed to allow non-atomic volatile writes. Second, release/acquire semantics is added to volatile variable operations to achieve synchronization effects for normal variables. A write to a volatile field acts as a release, and a read of a volatile field acts as an acquire.
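The volatile flag idiom that the new release/acquire semantics is meant to support can be written in Java as follows. The class VolatileFlag and its methods are invented for illustration. Under the proposed semantics, the volatile write to ready acts as a release and the volatile read of ready acts as an acquire, so a reader that observes ready == true must also observe the earlier normal write to data. Under the existing JMM, that normal write could be reordered with the volatile write. A test run can only show the idiom working on one JVM; it cannot prove the guarantee.

```java
// Hypothetical sketch of the volatile-flag idiom.
class VolatileFlag {
    private int data;               // normal (non-volatile) field
    private volatile boolean ready; // volatile synchronization flag

    void publish(int value) {
        data = value;  // normal write ...
        ready = true;  // ... then a volatile write, which acts as a release
    }

    // Returns the published value, or null if nothing has been published yet.
    Integer tryConsume() {
        if (ready) {       // volatile read, which acts as an acquire
            return data;   // must see the value written before the release
        }
        return null;
    }

    // Runs one writer (the calling thread) against one spinning reader.
    static int runOnce(int value) {
        VolatileFlag flag = new VolatileFlag();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            Integer v;
            while ((v = flag.tryConsume()) == null) {
                Thread.onSpinWait();
            }
            seen[0] = v;
        });
        reader.start();
        flag.publish(value);
        try {
            reader.join(); // join also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce(42)); // 42
    }
}
```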
4.2 JMM_MP Notations
JMM_MP is based on an abstract global system that executes one operation from one thread in each step. An operation corresponds to a JVM opcode. Actions occur in a total order that respects the program order of each thread. The only ordering relaxation explicitly allowed is for prescient writes under certain conditions.
4.2.1 Data Structures
A write is defined as a unique tuple (variable, value, GUID). JMM_MP uses the multiset data structure to store history information about memory activities. In particular, allWrites is a global set that records every write event that has occurred. Every thread, monitor, or volatile variable k also maintains two local sets, overwritten_k and previous_k. The former stores the obsolete writes that are known to k. The latter keeps all previous writes that are known to k. When a variable v is created, a write w with the default value of v is added to the allWrites set and to the previous set of each thread. Every time a new write is issued, the writes in the thread-local previous set become obsolete to that thread, and the new write is added to the previous set and to the allWrites set. When a read action occurs, the return value is chosen from the allWrites set, but the writes stored in the overwritten set of the reading thread are not eligible results.
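The bookkeeping above can be sketched in Java for normal variables. This is a simplified, hypothetical model, not JMM_MP itself: the class WriteHistory and its methods are invented here; prescient writes, volatiles, and final fields are ignored; and a plain set stands in for the multiset because every write carries a unique id. It also assumes that a new write by a thread makes obsolete only the writes on the same variable in that thread's previous set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of JMM_MP-style write history for normal
// variables only. Assumption: a new write by thread t on v makes obsolete
// only the writes on the same variable v that are in previous_t.
class WriteHistory {
    record Write(String var, int value, int guid) {}

    private int nextGuid = 0;
    private final Set<Write> allWrites = new HashSet<>();
    private final Map<String, Set<Write>> previous = new HashMap<>();
    private final Map<String, Set<Write>> overwritten = new HashMap<>();

    WriteHistory(String... threads) {
        for (String t : threads) {
            previous.put(t, new HashSet<>());
            overwritten.put(t, new HashSet<>());
        }
    }

    // Creating v: its default write goes into allWrites and every previous set.
    void create(String v, int defaultValue) {
        Write w = new Write(v, defaultValue, nextGuid++);
        allWrites.add(w);
        for (Set<Write> prev : previous.values()) prev.add(w);
    }

    // Thread t writes value to v.
    void write(String t, String v, int value) {
        Write w = new Write(v, value, nextGuid++);
        for (Write old : previous.get(t)) {
            if (old.var().equals(v)) overwritten.get(t).add(old); // now obsolete to t
        }
        previous.get(t).add(w);
        allWrites.add(w);
    }

    // Values a read of v by t may return: writes on v not overwritten for t.
    Set<Integer> readable(String t, String v) {
        Set<Integer> values = new HashSet<>();
        for (Write w : allWrites) {
            if (w.var().equals(v) && !overwritten.get(t).contains(w)) values.add(w.value());
        }
        return values;
    }

    public static void main(String[] args) {
        WriteHistory h = new WriteHistory("t1", "t2");
        h.create("a", 0);
        h.write("t1", "a", 1);
        System.out.println(h.readable("t1", "a")); // [1]
        System.out.println(h.readable("t2", "a")); // [0, 1]
    }
}
```

The writing thread can no longer see the default value, while the other thread, which has not synchronized with it, may still read either write.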
4.2.2 Prescient Write
A write w may be performed presciently, i.e., executed early, if (a) w is guaranteed to happen, (b) w cannot be read by the same thread before the point where w would normally occur, and (c) any premature reads of w by other threads cannot become observable, via synchronization, to the thread that issues w before the point where w would normally occur. To capture the prescient write semantics, a write action is split into initWrite and performWrite. A special assertion is used in performWrite to ensure that the prescient write conditions are met.
Prescient reads do not need to be explicitly specified. Eligible reordering of read instructions can be 
deduced as long as it does not result in an illegal execution.
4.2.3 Synchronization Mechanism
The thread-local overwritten and previous sets are synchronized between threads through the release/acquire process. A release operation passes the local sets from a thread to a monitor. An acquire operation passes the local sets from a monitor to a thread. Any non-synchronized write instruction on the same variable from another thread is an eligible write for a read request.
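The release/acquire transfer of the local sets can be sketched in Java as below. The class KnowledgeSync is hypothetical, writes are represented only by integer ids, and "passing" the sets is assumed to mean taking their union. The usage in main illustrates the property from Section 4.1 that visibility is synchronized only through the same lock: t2 learns what t1 knew only after acquiring the monitor that t1 released.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of release/acquire on bookkeeping sets. Threads and
// monitors each own an (overwritten, previous) pair of sets of write ids.
// Assumption: "passing" the sets is modeled as set union.
class KnowledgeSync {
    static final class Knowledge {
        final Set<Integer> overwritten = new HashSet<>();
        final Set<Integer> previous = new HashSet<>();

        void absorb(Knowledge other) {
            overwritten.addAll(other.overwritten);
            previous.addAll(other.previous);
        }
    }

    private final Map<String, Knowledge> threads = new HashMap<>();
    private final Map<String, Knowledge> monitors = new HashMap<>();

    Knowledge thread(String t) { return threads.computeIfAbsent(t, k -> new Knowledge()); }
    Knowledge monitor(String m) { return monitors.computeIfAbsent(m, k -> new Knowledge()); }

    // Release: the thread's local sets flow into the monitor.
    void release(String t, String m) { monitor(m).absorb(thread(t)); }

    // Acquire: the monitor's sets flow into the thread.
    void acquire(String t, String m) { thread(t).absorb(monitor(m)); }

    public static void main(String[] args) {
        KnowledgeSync s = new KnowledgeSync();
        s.thread("t1").overwritten.add(0); // t1 knows write #0 is obsolete
        s.release("t1", "m");
        s.acquire("t2", "other");
        System.out.println(s.thread("t2").overwritten.contains(0)); // false: different lock
        s.acquire("t2", "m");
        System.out.println(s.thread("t2").overwritten.contains(0)); // true
    }
}
```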
4.2.4 Non-atomic Volatile Writes
Non-atomic volatile writes enable writes on different variables to arrive at different threads in different orders. To capture these semantics, a volatile write is split into two consecutive instructions, initVolatileWrite and performVolatileWrite. If thread t1 has issued initVolatileWrite but has not completed performVolatileWrite, no other thread can issue initVolatileWrite on the same variable. During this interval, another thread t2 can observe either the new value or the previous value of the volatile variable. As soon as t2 sees the new value, however, t2 can no longer observe the previous value. When performVolatileWrite is completed, no thread can see the previous value.
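The rules for a split volatile write can be captured by a small state machine. The Java sketch below is hypothetical: the class NonAtomicVolatile is invented here, it models a single volatile variable, and it treats reads by the writing thread like any other thread's reads, a case the description above does not settle. The readable method returns the set of values a thread may observe at the current point.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of one volatile variable whose write is split into
// initVolatileWrite and performVolatileWrite. Reads by the writing thread
// are treated like any other thread's reads.
class NonAtomicVolatile {
    private int current;                                // value before any pending write
    private Integer pending = null;                     // new value while a write is in flight
    private String writer = null;                       // thread that issued initVolatileWrite
    private final Set<String> sawNew = new HashSet<>(); // threads that already saw the new value

    NonAtomicVolatile(int initial) { current = initial; }

    // Fails if another write to this variable is still in flight.
    boolean initVolatileWrite(String t, int value) {
        if (pending != null) return false;
        pending = value;
        writer = t;
        sawNew.clear();
        return true;
    }

    // Values a read by thread t may return right now.
    Set<Integer> readable(String t) {
        Set<Integer> values = new HashSet<>();
        if (pending == null) {
            values.add(current);
        } else {
            values.add(pending);
            if (!sawNew.contains(t)) values.add(current);
        }
        return values;
    }

    // Performs a read; takeNew picks the new value when both are allowed.
    int read(String t, boolean takeNew) {
        if (pending != null && (takeNew || sawNew.contains(t))) {
            sawNew.add(t); // from now on t can no longer see the old value
            return pending;
        }
        return current;
    }

    void performVolatileWrite(String t) {
        if (pending == null || !t.equals(writer)) {
            throw new IllegalStateException("no write in flight for " + t);
        }
        current = pending;
        pending = null;
        writer = null;
        sawNew.clear();
    }

    public static void main(String[] args) {
        NonAtomicVolatile v = new NonAtomicVolatile(0);
        v.initVolatileWrite("t1", 1);
        System.out.println(v.readable("t2")); // [0, 1]
        v.read("t2", true);
        System.out.println(v.readable("t2")); // [1]
        System.out.println(v.readable("t3")); // [0, 1]
        v.performVolatileWrite("t1");
        System.out.println(v.readable("t3")); // [1]
    }
}
```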
4.2.5 Final Field Semantics
A very tricky issue in final field semantics arises from the fact that Java does not allow array elements to be declared final. For example, the implementation of a String class may use a final field r to point to its internal character array. Because the elements pointed to by r cannot be declared final, another thread might observe their default values even if they were initialized before r was frozen.
JMM_MP proposes to add a special guarantee for the elements that are referenced by a final field. The visible state of such an element must be captured when the final field is frozen and later synchronized to another thread when these elements are accessed through the final field. Therefore, every final variable v is treated as a special lock. A special release is performed when v is frozen. Subsequently, an acquire is performed when v is read to access its sub-fields. With this mechanism, an immutable object can be implemented by declaring all its fields as final. If any field is a reference to an array or object, it is sufficient to declare just this reference as final.
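A String-like immutable class following this recipe might look as follows in Java. The class ImmutableChars is hypothetical. Its array elements cannot be final, so the example relies on the proposed guarantee that elements reached through a final field are seen as they were when that field was frozen. The constructor copies its input so that no other code holds a reference to the array, and it does not leak this, which would make the fields pseudo-final.

```java
// Hypothetical immutable class in the style described above: every field is
// final, including the reference to the internal array, whose elements
// cannot themselves be declared final.
final class ImmutableChars {
    private final char[] chars;  // final reference; the elements are ordinary array slots
    private final int length;

    ImmutableChars(String s) {
        char[] copy = s.toCharArray(); // private copy, filled before the freeze
        this.chars = copy;
        this.length = copy.length;
        // Do not publish `this` here: a field exposed before the freeze would
        // be pseudo-final, and other threads could see default values.
    }   // final fields are frozen when the constructor finishes

    int length() { return length; }

    char charAt(int index) { return chars[index]; }

    @Override
    public String toString() { return new String(chars); }

    public static void main(String[] args) {
        ImmutableChars s = new ImmutableChars("abc");
        System.out.println(s.charAt(1) + " " + s.length()); // b 3
    }
}
```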
Adding this special final field requirement substantially complicates JMM_MP because synchronization information needs to be passed between the constructing thread and every object pointed to by a final field. The variable structure is extended to a local, which is a tuple (a, oF, kF), where a is the reference to an object or a primitive value, oF is the overwritten set caused by freezing the final fields, and kF is a set recording which variables are known to have been frozen. Whenever a final object is read, the knownFrozen set associated with its initializing thread is synchronized to the reference of the final object. This allows any subsequent access to its sub-fields to determine whether the sub-field has been initialized.
5 Specifying JMM_MP Using UMM
In this section we present a formal specification of the Java Memory Model using the UMM framework. The JMM semantics, except for the rules for final fields and control dependency, is based on JMM_MP. JMM_MP has two versions: [3] is an evolving specification that describes the full semantics of Java threads, and [22] is a core subset of it. The one we use in UMM is based on the latest revision of [3], dated January 11, 2002. Although we follow the specific rules outlined by JMM_MP, the exact semantics can be easily adjusted to meet different memory model requirements. The equivalence proof of the semantics, except for volatile and final fields, is given in the Appendix.
5.1 Overview
The UMM uses an abstract machine to define thread interactions in a shared memory environment. Memory [...] specific conditions are satisfied. A transition table defines all possible events along with their corresponding conditions and actions for the abstract machine.

Figure 4: The UMM architecture (components: LIB, LV, GIB, LK)
At any given step, any legal event may be nondeterministically chosen and atomically completed by the abstract machine. The sequence of permissible actions from various threads constitutes an execution. A memory model M is defined by all possible executions allowed by the abstract machine. An actual implementation, I_M, may choose different architectures and optimization techniques as long as the executions allowed by I_M are also permitted by M.
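The execution model described above, in which any enabled event may fire and the memory model is the set of all resulting executions, can be illustrated with a tiny guarded-command machine in Java. This is a hypothetical sketch, not the UMM transition table: the toy table below is sequentially consistent, has one rule per program instruction, and is explored exhaustively, in the spirit of (but much simpler than) the search a model checker such as Murφ performs. The program is the classic store-buffering pair: thread 1 runs x = 1; r1 = y; and thread 2 runs y = 1; r2 = x;.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical guarded-command machine: a transition table of (guard, action)
// rules plus an exhaustive search that fires every enabled rule in every
// reachable state. The toy table is sequentially consistent, NOT the UMM table.
class GuardedCommands {
    // Program counters of two threads, shared variables x, y, locals r1, r2.
    record State(int pc1, int pc2, int x, int y, int r1, int r2) {}

    record Rule(String name, Predicate<State> guard, UnaryOperator<State> action) {}

    // Thread 1: x = 1; r1 = y;    Thread 2: y = 1; r2 = x;
    static final List<Rule> TABLE = List.of(
        new Rule("t1: x = 1", s -> s.pc1() == 0,
                 s -> new State(1, s.pc2(), 1, s.y(), s.r1(), s.r2())),
        new Rule("t1: r1 = y", s -> s.pc1() == 1,
                 s -> new State(2, s.pc2(), s.x(), s.y(), s.y(), s.r2())),
        new Rule("t2: y = 1", s -> s.pc2() == 0,
                 s -> new State(s.pc1(), 1, s.x(), 1, s.r1(), s.r2())),
        new Rule("t2: r2 = x", s -> s.pc2() == 1,
                 s -> new State(s.pc1(), 2, s.x(), s.y(), s.r1(), s.x())));

    // Final (r1, r2) of every complete execution; a state with no enabled rule is final.
    static Set<String> outcomes() {
        Set<String> results = new TreeSet<>();
        Set<State> visited = new HashSet<>();
        Deque<State> work = new ArrayDeque<>();
        work.push(new State(0, 0, 0, 0, 0, 0));
        while (!work.isEmpty()) {
            State s = work.pop();
            if (!visited.add(s)) continue;
            boolean anyEnabled = false;
            for (Rule rule : TABLE) {
                if (rule.guard().test(s)) {
                    anyEnabled = true;
                    work.push(rule.action().apply(s));
                }
            }
            if (!anyEnabled) results.add("r1=" + s.r1() + ",r2=" + s.r2());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(outcomes()); // [r1=0,r2=1, r1=1,r2=0, r1=1,r2=1]
    }
}
```

Under this table the outcome r1 = 0, r2 = 0 never appears; a more relaxed table would admit it, which is the kind of difference such exhaustive exploration exposes.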
5.2 The Architecture
As shown in Figure 4, each thread k has a local instruction buffer LIB_k that stores its pending instructions in program order. It also maintains a set of local variables in a local variable array LV_k. Each element LV_k[v] contains the data value of the local variable v. LIB_k and LV_k are not directly exposed to other threads. Thread interactions are communicated through a global instruction buffer GIB, which is visible to all threads. GIB stores all previously completed write and synchronization instructions. In general, a read instruction completes when the return value is bound to its target local variable. A write or synchronization instruction completes when it is added to the global instruction buffer. A multithreaded program terminates when all instructions from all threads complete.
The usage of LIB and GIB is motivated by the observation that local ordering rules and global observability rules are two pivotal properties for understanding thread behaviors. The former dictates when an instruction can be issued by a thread, and the latter determines what value can be read back. In UMM, these properties are explicitly specified as conditions in the transition table.
The local instruction buffers can be used to represent effects caused by both instruction scheduling and data replication. Therefore, there is no need for intermediate layers such as caches.
Although we could store all the necessary bookkeeping information in LV, LIB, and GIB to describe any important thread property, a dedicated global lock array LK is also used for clarity. Each element LK[l] is a tuple (count, owner), where count is the number of recursive lock acquisitions and owner records the thread that owns the lock l.
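The lock array, together with the guards and updates that the lock and unlock events of the transition table apply to it, can be sketched in Java as follows. The class LockArray and its method names are hypothetical, and appending the instruction to GIB and removing it from LIB are left out. Note that unlock only decrements count and does not clear owner: once count reaches 0, the lock is free again regardless of the stale owner value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of UMM's global lock array LK, where LK[l] = (count, owner),
// with the guards and updates used by the lock and unlock events.
// Appending the instruction to GIB and deleting it from LIB are left out.
class LockArray {
    static final class Entry {
        int count;     // number of recursive acquisitions
        String owner;  // thread that owns the lock (meaningful when count > 0)
    }

    private final Map<String, Entry> lk = new HashMap<>();

    private Entry entry(String l) { return lk.computeIfAbsent(l, k -> new Entry()); }

    // Lock guard: the lock is free, or t already owns it (recursive locking).
    boolean canLock(String t, String l) {
        Entry e = entry(l);
        return e.count == 0 || t.equals(e.owner);
    }

    void lock(String t, String l) {
        if (!canLock(t, l)) throw new IllegalStateException(t + " cannot lock " + l);
        Entry e = entry(l);
        e.count++;
        e.owner = t;
    }

    // Unlock guard: the lock is held, and held by t.
    boolean canUnlock(String t, String l) {
        Entry e = entry(l);
        return e.count > 0 && t.equals(e.owner);
    }

    void unlock(String t, String l) {
        if (!canUnlock(t, l)) throw new IllegalStateException(t + " cannot unlock " + l);
        entry(l).count--;
    }

    public static void main(String[] args) {
        LockArray locks = new LockArray();
        locks.lock("t1", "m");
        locks.lock("t1", "m");                          // recursive acquisition
        System.out.println(locks.canLock("t2", "m"));   // false
        locks.unlock("t1", "m");
        locks.unlock("t1", "m");
        System.out.println(locks.canLock("t2", "m"));   // true
    }
}
```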
5.3 Definitions 
Definition 1 Variable
A global variable in UMM refers to a static field of a loaded class, an instance field of an allocated object, 
or an element of an allocated array in Java. It can be further categorized as a normal, volatile, or final 
variable. A local variable in UMM corresponds to a Java local variable or an operand stack location.
Definition 2 Instruction
An instruction i is represented by a tuple (t, pc, op, var, data, local, useLocal, useNew, lock, time), where
t(i) = t: thread that issues the instruction
pc(i) = pc: program counter of the instruction
op(i) = op: instruction operation type
var(i) = var: variable operated on by the instruction
data(i) = data: data value in a write instruction
local(i) = local: local variable used to store the return value in a read instruction, or local variable that provides the value in a write instruction
useLocal(i) = useLocal: tag in a write instruction i indicating whether the write value data(i) needs to be obtained from the local variable local(i)
useNew(i) = useNew: tag in a read volatile instruction to support non-atomic volatile writes
lock(i) = lock: lock in a lock or an unlock instruction
time(i) = time: global counter incremented each time a local instruction is added to GIB
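One way to encode this tuple is a Java record, sketched below. The names UmmInstruction, Op, Instruction, and resolveData are hypothetical, the var component is renamed variable to avoid confusion with Java's var, and LV is modeled as a map from local variable names to values. The resolveData method mirrors the first step of a write event: when useLocal(i) holds, data(i) is taken from LV of the issuing thread, and an unassigned local is reported as an error, since Java requires locals to be assigned before use.

```java
import java.util.Map;

// Hypothetical encoding of the Definition 2 tuple as a Java record.
// resolveData mirrors the first step of a write event: when useLocal holds,
// the written data is taken from the issuing thread's local variable array LV.
class UmmInstruction {
    enum Op { READ_NORMAL, WRITE_NORMAL, READ_VOLATILE, WRITE_VOLATILE,
              READ_FINAL, WRITE_FINAL, LOCK, UNLOCK }

    record Instruction(int t, int pc, Op op, String variable, int data, String local,
                       boolean useLocal, boolean useNew, String lock, long time) {

        // Returns a copy whose data field is resolved from LV when useLocal holds.
        Instruction resolveData(Map<String, Integer> lv) {
            if (!useLocal) return this;
            Integer value = lv.get(local);
            if (value == null) {
                // Java requires locals to be assigned before they are used.
                throw new IllegalStateException("local " + local + " is unassigned");
            }
            return new Instruction(t, pc, op, variable, value, local,
                                   useLocal, useNew, lock, time);
        }
    }

    public static void main(String[] args) {
        // b = r1, issued by thread 1 at pc 0, after r1 received the value 7
        Instruction w = new Instruction(1, 0, Op.WRITE_NORMAL, "b", 0, "r1",
                                        true, false, null, 0L);
        System.out.println(w.resolveData(Map.of("r1", 7)).data()); // 7
    }
}
```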
5.4 Need for Local Variable Information
Because traditional memory models are designed for processor-level architectures, aiding software program analysis is not a common priority in those specifications. They only need to describe how data can be shared between different processors through the main memory. Consequently, a read instruction is usually retired immediately when the return value is obtained. Following the same style, neither JMM_MP nor JMM_CRF keeps the returned values of read operations. However, Java poses a new challenge to memory model specification with an integrated threading system as part of the programming language. In Java, most programming activities, such as computation, flow control, and method invocation, are carried out using local variables. Programmers have a clear need to understand the memory model implications caused by the nondeterministically returned values in local variables. Therefore, it is desirable to extend the scope of the memory model by recording the values obtained from read instructions as part of the global state of the transition system.
Based on this observation, we use local variable arrays to keep track of thread-local variable information. Not only does this reduce the gap between program semantics and memory model semantics, it also provides a clear delimitation between them. This allows us to define the JMM at the Java bytecode level as well as at the source program level, giving Java programmers an end-to-end view of the memory consistency requirements.
5.5 Memory Operations
A global variable in Java is represented by object.field, where object is the object reference and field is the field name. In this paper, the object.field entity is abstracted to a single variable v. We also follow a convention that uses a, b, c to represent global variables, r1, r2, r3 to represent local variables, and 1, 2, 3 to represent primitive values.
A read operation on a global variable corresponds to a Java program instruction r1 = a. It always has a target local variable to store the returned data. A write operation on a global variable can have two formats, a = r1 or a = 1, depending on whether the useLocal tag is set. The data value of the write instruction is obtained from a local variable in the former case and is provided directly by the instruction in the latter case. The format a = r1 allows one to examine the data flow implications caused by the nondeterminism of memory behaviors. If all write instructions have useLocal = false and all read instructions use different local variables, the UMM degenerates to the traditional models that do not keep local variable information.
Local variables are not initialized; Java requires them to be assigned before they are used. UMM enforces this implicitly through data dependency on local variables.
Since we are defining the memory model, only memory operations are represented in our transition system. Instructions such as r1 = 1 and r1 = r2 + r3 are not included. However, the UMM framework can easily be extended into a full-blown program analysis system by adding semantics for computational instructions.
Lock and unlock instructions are inserted as dictated by the Java synchronized keyword. They model both the mutual exclusion effect and the visibility effect of synchronization.
Event | Condition | Action
readNormal | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = ReadNormal \wedge (\exists w \in GIB : legalNormalWrite(i, w))$ | $LV_{t(i)}[local(i)] := data(w)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
writeNormal | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = WriteNormal$ | if $useLocal(i)$ then $i.data := LV_{t(i)}[local(i)]$ end; $GIB := append(GIB, i)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
lock | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = Lock \wedge (LK[lock(i)].count = 0 \vee LK[lock(i)].owner = t(i))$ | $LK[lock(i)].count := LK[lock(i)].count + 1$; $LK[lock(i)].owner := t(i)$; $GIB := append(GIB, i)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
unlock | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = Unlock \wedge (LK[lock(i)].count > 0 \wedge LK[lock(i)].owner = t(i))$ | $LK[lock(i)].count := LK[lock(i)].count - 1$; $GIB := append(GIB, i)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
readVolatile | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = ReadVolatile \wedge (\exists w \in GIB : legalOldWrite(i, w) \vee legalNewWrite(i, w))$ | $LV_{t(i)}[local(i)] := data(w)$; if $legalNewWrite(i, w)$ then $i.useNew := true$ end; $GIB := append(GIB, i)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
writeVolatile | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = WriteVolatile$ | if $useLocal(i)$ then $i.data := LV_{t(i)}[local(i)]$ end; $GIB := append(GIB, i)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
readFinal | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = ReadFinal \wedge (\exists w \in GIB : legalFinalWrite(i, w))$ | $LV_{t(i)}[local(i)] := data(w)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
writeFinal | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = WriteFinal$ | if $useLocal(i)$ then $i.data := LV_{t(i)}[local(i)]$ end; $GIB := append(GIB, i)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
freeze | $\exists i \in LIB_{t(j)} : ready(i) \wedge op(i) = Freeze$ | $GIB := append(GIB, i)$; $LIB_{t(i)} := delete(LIB_{t(i)}, i)$
Table 1: Transition Table
1st \ 2nd | ReadNormal | WriteNormal | Lock | Unlock | ReadVolatile | WriteVolatile | ReadFinal | WriteFinal | Freeze
ReadNormal | no | diffVar | no | no | no | no | no | no | no
WriteNormal | no | yes | no | no | no | no | no | no | no
Lock | no | no | no | no | no | no | no | no | no
Unlock | no | yes | no | no | no | no | no | no | no
ReadVolatile | no | no | no | no | no | no | no | no | no
WriteVolatile | no | yes | no | no | no | no | no | no | no
ReadFinal | no | yes | no | no | no | no | no | no | no
WriteFinal | no | yes | no | no | no | no | no | no | no
Freeze | no | no | no | no | no | no | no | no | no
Table 2: The Bypassing Table (Table BYPASS)
Finally, a special Freeze instruction for each final field v is added at the end of the constructor that initializes v, to indicate that v has been frozen.
5.6 Initial Conditions
Initially, instructions from each thread are added to the local instruction buffers in their original program order. The useNew fields are set to false, and GIB is cleared. Then, for every variable v, a write instruction w_init carrying the default value of v is added to GIB; w_init is assigned a special thread ID t_init. Finally, the count fields in LK are set to 0.
After the abstract machine is set up, it operates according to the transition table specified in Table 1. The conditions and actions corresponding to memory instructions are defined as events in the transition table. Our notation, based on guarded commands, has been widely used in architectural models [23], making it familiar to many hardware designers.
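To illustrate how the guarded commands of Table 1 operate, the lock and unlock events can be sketched in Java. This is a minimal, hypothetical encoding (LockRules, Instr and step are our names, not UMM's): each LK entry holds a count and an owner, a rule fires only when its guard holds, and firing appends the instruction to GIB and deletes it from the thread's local buffer. For brevity the sketch assumes that ready(i) already holds for the head of each local buffer, so no bypassing is modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of the lock/unlock events of Table 1 as guarded commands. */
final class LockRules {
    enum Kind { LOCK, UNLOCK }

    /** A pending instruction: issuing thread, operation, and lock index. */
    record Instr(int thread, Kind op, int lock) {}

    /** One LK entry; the initial conditions set count to 0. */
    static final class LockState {
        int count = 0;
        int owner = -1;   // only meaningful while count > 0
    }

    final LockState[] lk;
    final List<Instr> gib = new ArrayList<>();   // global instruction buffer GIB

    LockRules(int numLocks) {
        lk = new LockState[numLocks];
        for (int k = 0; k < numLocks; k++) lk[k] = new LockState();
    }

    /** Guard of the lock event (ready(i) is assumed to hold). */
    boolean lockEnabled(Instr i) {
        LockState s = lk[i.lock()];
        return s.count == 0 || s.owner == i.thread();
    }

    /** Guard of the unlock event (ready(i) is assumed to hold). */
    boolean unlockEnabled(Instr i) {
        LockState s = lk[i.lock()];
        return s.count > 0 && s.owner == i.thread();
    }

    /** Fires the event for the head of a thread's local buffer if its guard holds. */
    boolean step(Deque<Instr> lib) {
        Instr i = lib.peekFirst();
        if (i == null) return false;
        LockState s = lk[i.lock()];
        if (i.op() == Kind.LOCK && lockEnabled(i)) {
            s.count++;                 // LK[lock(i)].count := count + 1
            s.owner = i.thread();      // LK[lock(i)].owner := t(i)
        } else if (i.op() == Kind.UNLOCK && unlockEnabled(i)) {
            s.count--;                 // LK[lock(i)].count := count - 1
        } else {
            return false;              // guard is false: the event is not enabled
        }
        gib.add(i);                    // GIB := append(GIB, i)
        lib.removeFirst();             // LIB := delete(LIB, i)
        return true;
    }
}

class LockRulesDemo {
    public static void main(String[] args) {
        LockRules m = new LockRules(1);
        Deque<LockRules.Instr> t0 = new ArrayDeque<>(List.of(
            new LockRules.Instr(0, LockRules.Kind.LOCK, 0),
            new LockRules.Instr(0, LockRules.Kind.LOCK, 0),     // reentrant acquire
            new LockRules.Instr(0, LockRules.Kind.UNLOCK, 0),
            new LockRules.Instr(0, LockRules.Kind.UNLOCK, 0)));
        Deque<LockRules.Instr> t1 = new ArrayDeque<>(List.of(
            new LockRules.Instr(1, LockRules.Kind.LOCK, 0)));
        m.step(t0);
        m.step(t0);
        System.out.println(m.step(t1));   // prints "false": thread 0 still holds lock 0
    }
}
```

Because the lock guard also accepts a nonzero count when the requesting thread is the owner, the sketch models reentrant Java monitors: thread 1's Lock stays disabled until both of thread 0's Unlock events have fired.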
5.7 Ordering Rules
The execution of an instruction i is allowed only when either all previous instructions in the same thread have been completed or i is permitted to bypass the pending previous instructions according to the memory model and local data dependency. This is enforced by condition ready, which is required by every event in the transition table.
Condition ready consults the bypassing table BYPASS and guarantees that executing an instruction does not violate the ordering requirements of the memory model. The BYPASS table, shown in Table 2, specifies the ordering policy between every pair of instructions. An entry BYPASS[op1][op2] indicates whether an instruction of type op2 can bypass a previous instruction of type op1: the value yes permits the bypassing, the value no prohibits it, and the value diffVar allows it only if the variables accessed by the two instructions are different and not aliased. JMM_MP specifies that within each thread, operations are usually performed in their original order; the exception is that writes may be performed presciently. The straightforward implementation of UMM follows the same guideline by allowing only normal write instructions to bypass certain previous instructions, as shown in Table 2. The equivalence proof in the Appendix is based on Table 2. A more relaxed bypassing policy can also be derived, as discussed in Section 5.12.
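Table 2 maps naturally onto a two-dimensional lookup indexed by operation type. The Java sketch below is one possible encoding; the Op and Entry enums and the mayBypass method are hypothetical names. The sameVar argument is assumed to be computed by the caller and to account for aliasing, which the diffVar entry requires.

```java
/** Operation types, in the column (and row) order of Table 2. */
enum Op {
    READ_NORMAL, WRITE_NORMAL, LOCK, UNLOCK,
    READ_VOLATILE, WRITE_VOLATILE, READ_FINAL, WRITE_FINAL, FREEZE
}

/** Possible BYPASS entries. */
enum Entry { NO, YES, DIFF_VAR }

final class Bypass {
    private static final Entry N = Entry.NO, Y = Entry.YES, D = Entry.DIFF_VAR;

    /** TABLE[op1][op2]: may a later op2 bypass an earlier, still pending op1? (Table 2) */
    static final Entry[][] TABLE = {
        //             RN WN LK UL RV WV RF WF FZ
        /* RN     */ { N, D, N, N, N, N, N, N, N },
        /* WN     */ { N, Y, N, N, N, N, N, N, N },
        /* Lock   */ { N, N, N, N, N, N, N, N, N },
        /* Unlock */ { N, Y, N, N, N, N, N, N, N },
        /* RV     */ { N, N, N, N, N, N, N, N, N },
        /* WV     */ { N, Y, N, N, N, N, N, N, N },
        /* RF     */ { N, Y, N, N, N, N, N, N, N },
        /* WF     */ { N, Y, N, N, N, N, N, N, N },
        /* Freeze */ { N, N, N, N, N, N, N, N, N },
    };

    /**
     * True if an instruction of type second may execute ahead of an earlier pending
     * instruction of type first. sameVar says whether the two access the same (or an
     * aliased) variable; it only matters for DIFF_VAR entries.
     */
    static boolean mayBypass(Op first, Op second, boolean sameVar) {
        Entry e = TABLE[first.ordinal()][second.ordinal()];
        return e == Entry.YES || (e == Entry.DIFF_VAR && !sameVar);
    }
}

class BypassDemo {
    public static void main(String[] args) {
        // "r1 = a; b = 1;": the write may go first.  "r1 = a; a = 1;": it may not.
        System.out.println(Bypass.mayBypass(Op.READ_NORMAL, Op.WRITE_NORMAL, false)
            + " " + Bypass.mayBypass(Op.READ_NORMAL, Op.WRITE_NORMAL, true));   // "true false"
    }
}
```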
In addition to the ordering properties set by the memory model, the data dependency imposed by the use of local variables also needs to be obeyed. This is expressed in condition localDependent. The helper function isWrite(i) returns true if the operation of i is WriteNormal, WriteVolatile, or WriteFinal. Similarly, isRead(i) returns true if the operation of i is ReadNormal, ReadVolatile, or ReadFinal. These operation types are defined with respect to the global variables in the instructions; a read operation on a global variable actually corresponds to a write operation on a local variable.
Condition 1 ready(i)
$$\neg \exists j \in LIB_{t(i)} : pc(j) < pc(i) \wedge \big(localDependent(i, j) \vee BYPASS[op(j)][op(i)] = no \vee (BYPASS[op(j)][op(i)] = diffVar \wedge var(j) = var(i))\big)$$
Condition 2 localDependent(i, j)
$$t(j) = t(i) \wedge local(j) = local(i) \wedge \big((isWrite(i) \wedge useLocal(i) \wedge isRead(j)) \vee (isWrite(j) \wedge useLocal(j) \wedge isRead(i)) \vee (isRead(i) \wedge isRead(j))\big)$$
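Conditions 1 and 2 can be combined into a single check. The Java sketch below is restricted, as a simplifying assumption, to the core subset ReadNormal, WriteNormal, Lock and Unlock (the subset used in the Appendix), with the corresponding entries of Table 2 hard-coded. The names CoreOp, CoreInstr and ReadyCheck are hypothetical, and a local index of -1 stands for an instruction that uses no local variable.

```java
import java.util.List;
import java.util.Objects;

/** Core subset of operation types, in the row/column order used below. */
enum CoreOp {
    READ_NORMAL, WRITE_NORMAL, LOCK, UNLOCK;

    boolean isRead()  { return this == READ_NORMAL; }
    boolean isWrite() { return this == WRITE_NORMAL; }
}

/** Hypothetical instruction: thread, program counter, op, global variable, local index. */
record CoreInstr(int thread, int pc, CoreOp op, String var, int local, boolean useLocal) {}

final class ReadyCheck {
    // Table 2 restricted to ReadNormal, WriteNormal, Lock, Unlock (rows = 1st, columns = 2nd).
    private static final String[][] BYPASS = {
        { "no", "diffVar", "no", "no" },   // 1st = ReadNormal
        { "no", "yes",     "no", "no" },   // 1st = WriteNormal
        { "no", "no",      "no", "no" },   // 1st = Lock
        { "no", "yes",     "no", "no" },   // 1st = Unlock
    };

    /** Condition 2: i and j of the same thread are linked through the same local variable. */
    static boolean localDependent(CoreInstr i, CoreInstr j) {
        if (i.thread() != j.thread() || i.local() < 0 || i.local() != j.local()) return false;
        return (i.op().isWrite() && i.useLocal() && j.op().isRead())
            || (j.op().isWrite() && j.useLocal() && i.op().isRead())
            || (i.op().isRead() && j.op().isRead());
    }

    /** Condition 1: no earlier pending instruction j in LIB_t(i) prevents i from executing. */
    static boolean ready(CoreInstr i, List<CoreInstr> lib) {
        for (CoreInstr j : lib) {
            if (j.pc() >= i.pc()) continue;   // only instructions earlier in program order
            String e = BYPASS[j.op().ordinal()][i.op().ordinal()];
            boolean blocked = localDependent(i, j)
                || e.equals("no")
                || (e.equals("diffVar") && Objects.equals(j.var(), i.var()));
            if (blocked) return false;
        }
        return true;
    }
}

class ReadyCheckDemo {
    public static void main(String[] args) {
        CoreInstr rd = new CoreInstr(0, 0, CoreOp.READ_NORMAL, "a", 1, false);    // r1 = a
        CoreInstr wb = new CoreInstr(0, 1, CoreOp.WRITE_NORMAL, "b", -1, false);  // b = 1
        System.out.println(ReadyCheck.ready(wb, List.of(rd, wb)));   // prints "true"
    }
}
```

Replacing b = 1 with b = r1 makes the write wait for the read through localDependent, even though Table 2 alone would let it bypass.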
5.8 Observability Rules
A write or synchronization instruction carries out actions that update the global state of the abstract machine. The state is observed by a read instruction, which returns the value previously set by an eligible write instruction. Besides the ordering rules, the criteria for choosing legal return values are another critical aspect of a memory model.
The synchronization mechanism used by JMM_MP plays an important role in selecting legal return values. This is formally captured in condition synchronized. Instruction i1 can be synchronized with a previous instruction i2 via a release/acquire process, in which a lock is first released by t(i2) after i2 is issued and later acquired by t(i1) before i1 is issued. A release can be triggered by an Unlock or a WriteVolatile instruction; an acquire can be triggered by a Lock or a ReadVolatile instruction.
Condition 3 synchronized(i1, i2)
$$\exists l, u \in GIB : \big((op(l) = Lock \wedge op(u) = Unlock \wedge lock(l) = lock(u)) \vee (op(l) = ReadVolatile \wedge op(u) = WriteVolatile \wedge var(l) = var(u))\big) \wedge t(l) = t(i_1) \wedge (t(u) = t(i_2) \vee t(i_2) = t_{init}) \wedge time(i_2) < time(u) \wedge time(u) < time(l) \wedge time(l) < time(i_1)$$
The synchronization mechanism follows Location Consistency. It requires an ordering relationship, captured in condition LCOrder, which can be established if two instructions are from the same thread or if they are synchronized. This ordering relationship is transitive, i.e., i1 and i2 can be synchronized through a sequence of release/acquire operations across different threads. Therefore, LCOrder is defined recursively.
Condition 4 LCOrder(i1, i2)
$$\big((t(i_1) = t(i_2) \vee t(i_2) = t_{init}) \wedge pc(i_1) > pc(i_2) \wedge var(i_1) = var(i_2)\big) \vee synchronized(i_1, i_2) \vee \big(\exists i' \in GIB : time(i') > time(i_2) \wedge time(i') < time(i_1) \wedge LCOrder(i_1, i') \wedge LCOrder(i', i_2)\big)$$
Condition legalNormalWrite(r, w) defines whether an instruction w is an eligible write for the read request r. w provides a legal return value only if there is no other write w' on the same variable lying between r and w, that is, no w' such that both LCOrder(r, w') and LCOrder(w', w) hold.
Condition 5 legalNormalWrite(r, w)
$$op(w) = WriteNormal \wedge var(w) = var(r) \wedge \neg \exists w' \in GIB : \big(op(w') = WriteNormal \wedge var(w') = var(r) \wedge LCOrder(r, w') \wedge LCOrder(w', w)\big)$$
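Conditions 3 to 5 can be prototyped directly over a list of GIB entries. The Java sketch below is a hypothetical, unoptimized transcription; it assumes that each entry carries its thread, program counter, operation, a target (the variable, or the lock for Lock/Unlock) and a time stamp, that the default writes use thread id -1 (standing for t_init) with pc -1 so that they precede every program instruction, and that a pending read is passed in with a time later than every GIB entry. The method for Condition 3 is named synced because synchronized is a Java keyword.

```java
import java.util.List;
import java.util.Objects;

/** Operation kinds used by Conditions 3 to 5 (hypothetical names). */
enum LcOp { READ_NORMAL, WRITE_NORMAL, LOCK, UNLOCK, READ_VOLATILE, WRITE_VOLATILE }

/** GIB entry or pending read; target is the variable, or the lock for Lock/Unlock. */
record LcInstr(int thread, int pc, LcOp op, String target, int time) {}

final class LocationConsistency {
    static final int T_INIT = -1;   // assumed thread id of the default writes (t_init)
    final List<LcInstr> gib;

    LocationConsistency(List<LcInstr> gib) { this.gib = gib; }

    /** Condition 3: a release u by t(i2) after i2, then a matching acquire l by t(i1) before i1. */
    boolean synced(LcInstr i1, LcInstr i2) {
        for (LcInstr l : gib) {
            for (LcInstr u : gib) {
                boolean pair = (l.op() == LcOp.LOCK && u.op() == LcOp.UNLOCK
                                || l.op() == LcOp.READ_VOLATILE && u.op() == LcOp.WRITE_VOLATILE)
                               && l.target().equals(u.target());
                if (pair && l.thread() == i1.thread()
                        && (u.thread() == i2.thread() || i2.thread() == T_INIT)
                        && i2.time() < u.time() && u.time() < l.time() && l.time() < i1.time()) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Condition 4; each recursive call narrows the time window, so the recursion ends. */
    boolean lcOrder(LcInstr i1, LcInstr i2) {
        boolean programOrder = (i1.thread() == i2.thread() || i2.thread() == T_INIT)
                && i1.pc() > i2.pc() && Objects.equals(i1.target(), i2.target());
        if (programOrder || synced(i1, i2)) return true;
        for (LcInstr mid : gib) {
            if (mid.time() > i2.time() && mid.time() < i1.time()
                    && lcOrder(i1, mid) && lcOrder(mid, i2)) {
                return true;
            }
        }
        return false;
    }

    /** Condition 5: no normal write w2 to the same variable sits between r and w in LCOrder. */
    boolean legalNormalWrite(LcInstr r, LcInstr w) {
        if (w.op() != LcOp.WRITE_NORMAL || !w.target().equals(r.target())) return false;
        for (LcInstr w2 : gib) {
            if (w2.op() == LcOp.WRITE_NORMAL && w2.target().equals(r.target())
                    && lcOrder(r, w2) && lcOrder(w2, w)) {
                return false;
            }
        }
        return true;
    }
}

class LocationConsistencyDemo {
    public static void main(String[] args) {
        LcInstr w0 = new LcInstr(LocationConsistency.T_INIT, -1, LcOp.WRITE_NORMAL, "a", 0);
        LcInstr w1 = new LcInstr(1, 0, LcOp.WRITE_NORMAL, "a", 1);       // Thread 1: a = 1
        LocationConsistency lc = new LocationConsistency(List.of(w0, w1));
        LcInstr rOwn = new LcInstr(1, 1, LcOp.READ_NORMAL, "a", 2);      // Thread 1 reads a
        LcInstr rOther = new LcInstr(2, 0, LcOp.READ_NORMAL, "a", 2);    // Thread 2 reads a
        System.out.println(lc.legalNormalWrite(rOwn, w0) + " "
            + lc.legalNormalWrite(rOther, w0));                           // "false true"
    }
}
```

In the demo, Thread 1 can no longer see the default value after its own write, while the unsynchronized Thread 2 still can. Adding an Unlock by Thread 1 and a later Lock by Thread 2 on the same lock makes the default write illegal for Thread 2's subsequent read as well.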
5.9 Non-Atomic Volatiles
Conditions legalOldWrite(r, w) and legalNewWrite(r, w) specify the semantics of non-atomic volatile write operations. Suppose the value last written to a volatile variable was set by a WriteVolatile instruction w. After another write instruction w' is performed on the same variable, a ReadVolatile instruction from thread t(w') must always observe the new value set by w', but other threads can get the value from either w' or w. However, once a thread sees the new value set by w', that thread can no longer see the previous value set by w. A special tag useNew in the ReadVolatile instruction indicates whether the new value has been observed by the reading thread. Furthermore, the new value set by w' is “committed” once the writing thread t(w') has completed any other instruction that follows w' in program order.
According to condition legalNewWrite(r, w), any WriteVolatile instruction w can be a legal write if it is the most recent write to that variable. Condition legalOldWrite(r, w) specifies that w can also be a legal result if (a) w is the second most recent WriteVolatile instruction on the same variable, (b) the most recent write has not been “committed” by its writing thread, and (c) the new value has not been observed by the reading thread.
Condition 6 legalNewWrite(r, w)
$$op(w) = WriteVolatile \wedge var(w) = var(r) \wedge \neg \exists w' \in GIB : \big(op(w') = WriteVolatile \wedge var(w') = var(r) \wedge time(w') > time(w)\big)$$
Condition 7 legalOldWrite(r, w)
$$op(w) = WriteVolatile \wedge var(w) = var(r) \wedge t(w) = t(r) \wedge \Big(\exists i_1 \in GIB : op(i_1) = WriteVolatile \wedge var(i_1) = var(r) \wedge time(i_1) > time(w) \wedge \neg \exists i_2 \in GIB : \big(op(i_2) = WriteVolatile \wedge var(i_2) = var(r) \wedge time(i_2) > time(w) \wedge time(i_2) \neq time(i_1)\big) \wedge \neg \exists i_3 \in GIB : \big(t(i_3) = t(i_1) \wedge pc(i_3) > pc(i_1)\big)\Big) \wedge \neg \exists i_4 \in GIB : \big(op(i_4) = ReadVolatile \wedge t(i_4) = t(r) \wedge var(i_4) = var(r) \wedge time(i_4) > time(w) \wedge useNew(i_4)\big)$$
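The bookkeeping behind non-atomic volatiles can be sketched for a single volatile variable. This hypothetical Java class follows the prose rules of this section (the writer always sees its own newest write; another thread may see the previous value only while the newest write is uncommitted and that thread has not yet observed the new value) rather than transcribing Condition 7 term by term. The sawNew set plays the role of the useNew tags, and commit() stands for the writing thread completing a later instruction; all names are our own.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of one non-atomic volatile variable (prose rules of Section 5.9). */
final class NonAtomicVolatile {
    private int oldValue;                 // value of the second most recent WriteVolatile
    private int newValue;                 // value of the most recent WriteVolatile
    private boolean hasOld = false;       // false until a second write exists
    private int newWriter = -1;           // t(w') of the most recent write
    private boolean committed = true;     // has t(w') completed an instruction after w'?
    private final Set<Integer> sawNew = new HashSet<>();  // threads that observed newValue

    NonAtomicVolatile(int defaultValue) { newValue = defaultValue; }

    /** A WriteVolatile by thread: the previous newest value becomes the old one. */
    void write(int thread, int value) {
        oldValue = newValue;
        newValue = value;
        hasOld = true;
        newWriter = thread;
        committed = false;
        sawNew.clear();                   // the useNew information refers to the newest write
    }

    /** Called when the writing thread completes any instruction that follows the write. */
    void commit() { committed = true; }

    /** Is the previous value still a legal result for this reader? (rules (a) to (c)) */
    boolean oldIsLegal(int reader) {
        return hasOld && !committed && reader != newWriter && !sawNew.contains(reader);
    }

    /** A ReadVolatile; preferOld asks for the previous value whenever it is still legal. */
    int read(int reader, boolean preferOld) {
        if (preferOld && oldIsLegal(reader)) {
            return oldValue;
        }
        sawNew.add(reader);               // the reader has now observed the new value
        return newValue;
    }
}

class NonAtomicVolatileDemo {
    public static void main(String[] args) {
        NonAtomicVolatile v = new NonAtomicVolatile(0);
        v.write(1, 5);                    // thread 1 writes 5 over the default 0
        System.out.println(v.read(2, true) + " " + v.read(2, false) + " " + v.read(2, true));
        // prints "0 5 5": once thread 2 has seen 5, it can no longer see 0
    }
}
```

Recording the observation in sawNew is exactly the step whose absence in the original proposal is discussed in Section 6.4.1.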
5.10 Final Variable Semantics
In Java, a final field can be either a primitive value or a reference to another object or array. When it is a reference, the Java language only requires that the reference itself cannot be modified in Java code after its initialization; the elements it points to have no such guarantee. Also, Java provides no mechanism to declare array elements as final.
As mentioned in Section 4.2.5, JMM_MP proposes a special requirement for the elements pointed to by a final field, to support immutable objects that use an array as a field. The requirement is that if an element pointed to by a final field is initialized before the final field is initialized, the default value of this element must not be observable after normal object construction. JMM_MP uses a special mechanism to “synchronize” initialization information from the constructing thread to the final reference and eventually to the elements reachable from the final reference. However, without explicit language support for immutability, this mechanism makes the memory semantics substantially more difficult to understand, because synchronization information must be carried by every variable. It is also not clear how the exact semantics can be efficiently implemented by a Java compiler or a JVM, since it involves runtime reachability analysis.
While we continue to investigate this issue and look for the most reasonable solution, the current UMM implements a straightforward definition of final fields. It differs slightly from JMM_MP in that it only requires the final field itself to be constant after being frozen. The observability criteria for final fields are given in condition legalFinalWrite. The default value of the final field (written by w with t(w) = t_init) can be observed only if the final field is not frozen. In addition, the constructing thread cannot observe the default value after the final field is initialized.
Condition 8 legalFinalWrite(r, w)
$$op(w) = WriteFinal \wedge var(w) = var(r) \wedge \Big(t(w) \neq t_{init} \vee \big(t(w) = t_{init} \wedge \neg \exists i_1 \in GIB : (op(i_1) = Freeze \wedge var(i_1) = var(r)) \wedge \neg \exists i_2 \in GIB : (op(i_2) = WriteFinal \wedge var(i_2) = var(r) \wedge t(i_2) = t(r))\big)\Big)$$
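Condition 8 reduces to a small predicate over the GIB. In this hypothetical Java sketch, the default write is identified by thread id -1 (standing for t_init), and the two negated existentials become stream searches; the names FOp, FInstr and FinalCheck are our own.

```java
import java.util.List;

/** Operation kinds relevant to final fields (hypothetical names). */
enum FOp { WRITE_FINAL, FREEZE, READ_FINAL }

/** A GIB entry or pending read: issuing thread, operation, variable. */
record FInstr(int thread, FOp op, String var) {}

final class FinalCheck {
    static final int T_INIT = -1;  // assumed id of the thread that issues default writes

    /** Condition 8: may the final read r return the value written by w, given the GIB? */
    static boolean legalFinalWrite(FInstr r, FInstr w, List<FInstr> gib) {
        if (w.op() != FOp.WRITE_FINAL || !w.var().equals(r.var())) return false;
        if (w.thread() != T_INIT) return true;   // an initializing write is always legal
        boolean frozen = gib.stream().anyMatch(
            i -> i.op() == FOp.FREEZE && i.var().equals(r.var()));
        boolean ownWrite = gib.stream().anyMatch(
            i -> i.op() == FOp.WRITE_FINAL && i.var().equals(r.var()) && i.thread() == r.thread());
        // The default value is visible only before the freeze, and never to a thread
        // that has itself already initialized the field.
        return !frozen && !ownWrite;
    }
}

class FinalCheckDemo {
    public static void main(String[] args) {
        FInstr dflt = new FInstr(FinalCheck.T_INIT, FOp.WRITE_FINAL, "a");
        FInstr init = new FInstr(1, FOp.WRITE_FINAL, "a");     // constructor: a = 1
        FInstr rCtor = new FInstr(1, FOp.READ_FINAL, "a");     // constructor: r = a
        System.out.println(FinalCheck.legalFinalWrite(rCtor, dflt, List.of(dflt, init)));
        // prints "false": the constructing thread cannot see the default after a = 1
    }
}
```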
5.11 Control Dependency Issues
The bypassing policy specified in the BYPASS table dictates the ordering of memory operations on global variables, and thread-local data dependency is formally defined in localDependent. In addition, thread-local control dependency on local variables should be respected to preserve the meaning of the Java program. However, handling control dependency is tricky. A compiler might be able to remove a branch statement if it can determine the control path through program analysis, so a policy is needed that states the criteria for making such a decision.
JMM_MP identifies some special cases and adds two more read actions, guaranteedRedundantRead and guaranteedReadOfWrite, which can suppress prescient writes to enable redundant load elimination and forward substitution in specific situations. For example, the need for guaranteedRedundantRead is motivated by the program shown in Figure 5. To allow r2 = a to be replaced by r2 = r1 in Thread 1, which would in turn allow the if statement to be removed, r2 = a must be guaranteed to return the previously read value.
Initially, a == b == 0
Thread 1            Thread 2
r1 = a;             r3 = b;
r2 = a;             a = r3;
if (r1 == r2)
    b = 2;
Finally, can r1 == r2 == r3 == 2?
Figure 5: Motivation for guaranteedRedundantRead
Although we could follow the same style by adding similar events to UMM, we do not believe that enumerating special cases for every optimization need is a good way to specify a memory model.
1st \ 2nd | ReadNormal | WriteNormal | Lock | Unlock | ReadVolatile | WriteVolatile
ReadNormal | yes | diffVar | yes | no | yes | no
WriteNormal | diffVar | yes | yes | no | yes | no
Lock | no | no | no | no | no | no
Unlock | yes | yes | no | no | no | no
ReadVolatile | no | no | no | no | no | no
WriteVolatile | yes | yes | no | no | no | no
Table 3: The Relaxed Bypassing Table
Therefore, we propose a clear and uniform policy regarding control dependency: the compiler may remove a control statement only if the control condition can be guaranteed in every possible execution, including all interleavings caused by thread interactions. This approach should still leave ample flexibility for compiler optimizations. If desired, global data flow analysis may be performed, and UMM offers a convenient platform for such analysis: one can simply replace a branch instruction with an assertion and then run the model checker to verify whether the assertion might be violated due to thread interactions.
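The branch-to-assertion idea can be illustrated on the Figure 5 program with a brute-force interleaving search. This Java sketch is only a stand-in for the Murφ check and rests on a strong simplifying assumption: it explores sequentially consistent interleavings and nothing weaker, so it cannot reproduce prescient writes or other JMM_MP behaviors that UMM itself would explore. The if statement in Thread 1 is replaced by an assertion on r1 == r2, and every final state is collected, much as a “thread completion” rule would do; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Figure 5 with "if (r1 == r2)" turned into an assertion; SC interleavings only. */
final class BranchAsAssertion {
    // Shared a, b; locals r1..r3 (0 is a placeholder, they are always written before use);
    // pc1/pc2 are the next step of Thread 1 / Thread 2.
    record State(int a, int b, int r1, int r2, int r3, int pc1, int pc2) {}

    static boolean violated = false;
    static final List<State> finals = new ArrayList<>();

    /** Depth-first search over every interleaving of the two threads. */
    static void explore(State s) {
        boolean t1Done = s.pc1() == 4, t2Done = s.pc2() == 2;
        if (t1Done && t2Done) { finals.add(s); return; }   // all threads completed
        if (!t1Done) explore(stepThread1(s));
        if (!t2Done) explore(stepThread2(s));
    }

    // Thread 1: r1 = a; r2 = a; assert r1 == r2; b = 2;
    static State stepThread1(State s) {
        return switch (s.pc1()) {
            case 0 -> new State(s.a(), s.b(), s.a(), s.r2(), s.r3(), 1, s.pc2());
            case 1 -> new State(s.a(), s.b(), s.r1(), s.a(), s.r3(), 2, s.pc2());
            case 2 -> {
                if (s.r1() != s.r2()) violated = true;     // the former branch condition
                yield new State(s.a(), s.b(), s.r1(), s.r2(), s.r3(), 3, s.pc2());
            }
            default -> new State(s.a(), 2, s.r1(), s.r2(), s.r3(), 4, s.pc2());
        };
    }

    // Thread 2: r3 = b; a = r3;
    static State stepThread2(State s) {
        return s.pc2() == 0
            ? new State(s.a(), s.b(), s.r1(), s.r2(), s.b(), s.pc1(), 1)
            : new State(s.r3(), s.b(), s.r1(), s.r2(), s.r3(), s.pc1(), 2);
    }

    public static void main(String[] args) {
        explore(new State(0, 0, 0, 0, 0, 0, 0));
        System.out.println("assertion violated: " + violated + ", final states: " + finals.size());
        // prints "assertion violated: false, final states: 15"
    }
}
```

Under sequential consistency the search finds no violation, and all 15 interleavings end with r1 == r2 == 0. Whether the branch may be removed under JMM_MP is exactly what has to be checked against the full model, where prescient writes enlarge the set of executions.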
5.12 Relaxing Ordering Constraints
Although JMM_MP does not explicitly relax ordering rules except for prescient writes, possible reorderings can be inferred: as long as a reordering does not result in any illegal execution, an implementation is free to perform it. In UMM, these effects can be described directly in the BYPASS table to give a more intuitive view of what the memory model allows. A high-performance threading environment requires efficient support from many components, such as compilation techniques, cache protocol designs, memory architectures, and processor pipelining. Because more liberal ordering rules provide more optimization opportunities at each intermediate layer, it is desirable to have a clear view of the allowed reorderings.
Table 3 outlines the relaxed bypassing policy for memory instructions other than final variable operations. It does not cover all possible relaxations, but it illustrates some of the obvious ones. A ReadNormal instruction is allowed to bypass a previous ReadNormal instruction or a previous WriteNormal instruction on a different variable. Because a presciently performed read instruction obtains a value from a subset of the legal results, its return value is still valid. A Lock instruction can bypass previous normal read/write instructions, and normal read/write instructions can bypass a previous Unlock instruction. This is motivated by the fact that it is safe to move normal instructions into a synchronization block, since doing so still generates legal results. This relaxation also applies to volatile variable operations, which have similar synchronization effects on normal variables.
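One sanity property of Table 3 is that it never forbids a reordering that Table 2 allows. The Java sketch below encodes both tables for the six non-final operation types and checks this property; it assumes that entries can be compared as no < diffVar < yes, and the names Perm, RelaxedCheck, atLeastAsPermissive and newlyRelaxed are hypothetical.

```java
/** Permissiveness of a BYPASS entry, from most to least restrictive (declaration order). */
enum Perm { NO, DIFF_VAR, YES }

final class RelaxedCheck {
    private static final Perm N = Perm.NO, D = Perm.DIFF_VAR, Y = Perm.YES;

    // Rows = 1st instruction, columns = 2nd, both in the order
    // ReadNormal, WriteNormal, Lock, Unlock, ReadVolatile, WriteVolatile.
    static final Perm[][] STRICT = {    // Table 2 without the final-field rows and columns
        { N, D, N, N, N, N },
        { N, Y, N, N, N, N },
        { N, N, N, N, N, N },
        { N, Y, N, N, N, N },
        { N, N, N, N, N, N },
        { N, Y, N, N, N, N },
    };
    static final Perm[][] RELAXED = {   // Table 3
        { Y, D, Y, N, Y, N },
        { D, Y, Y, N, Y, N },
        { N, N, N, N, N, N },
        { Y, Y, N, N, N, N },
        { N, N, N, N, N, N },
        { Y, Y, N, N, N, N },
    };

    /** True if every reordering allowed by table a is also allowed by table b. */
    static boolean atLeastAsPermissive(Perm[][] b, Perm[][] a) {
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                if (b[i][j].compareTo(a[i][j]) < 0) return false;
        return true;
    }

    /** Number of (1st, 2nd) pairs whose entry became strictly more permissive. */
    static int newlyRelaxed(Perm[][] strict, Perm[][] relaxed) {
        int n = 0;
        for (int i = 0; i < strict.length; i++)
            for (int j = 0; j < strict[i].length; j++)
                if (relaxed[i][j].compareTo(strict[i][j]) > 0) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(atLeastAsPermissive(RELAXED, STRICT));   // prints "true"
        System.out.println(newlyRelaxed(STRICT, RELAXED));          // prints "8"
    }
}
```

Only the eight relaxed pairs need new justification, such as the argument above that a presciently performed read still returns one of the legal values.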
5.13 Murφ Implementation
UMM is implemented in Murφ [24], a description language with a syntax similar to C that enables one to specify a transition system based on guarded commands. Murφ is also a model checking system that supports exhaustive state space enumeration, which makes it an ideal tool for verifying our shared memory system.
Our Murφ program consists of two parts. The first part implements the formal specification of JMM_MP, which provides a “black box” that defines Java thread semantics. The transition table in Table 1 is specified as Murφ rules, and the ordering and observability rules are implemented as Murφ procedures. The second part comprises a suite of test cases. Each test program is defined by a specific Murφ initial state and invariants, and it is executed under the guidance of the transition system to reveal pivotal properties of the underlying model. Our system can detect deadlocks and invariant violations. Two techniques can be applied to examine test results. The first uses Murφ invariants to specify that a particular scenario can never occur; if it does occur, a violation trace is generated to help understand the cause. The second uses a special “thread completion” rule, which is triggered only when all threads have completed, to output all possible final results. Our executable specification is a configurable system that enables one to easily set up different test programs, abstract machine parameters, and memory model properties. Running on a PC with a 900 MHz Pentium III processor and 256 MB of RAM, most of our test programs terminate in less than 1 second.
6 Analysis of JMM_MP
By systematically exercising JMM_MP with idiom-driven test programs, we have gained considerable insight into the model. Since we have developed formal executable models for both JMM_CRF [18] and JMM_MP, we can perform a comparative analysis by running the same test programs on both models, which helps us understand subtle differences between them. As part of an ongoing evaluation of the Java memory models, we continue to develop more comprehensive test programs that cover more interesting properties. In this section we highlight some interesting findings from our preliminary results.
6.1 Ordering Properties
6.1.1 Coherence
JMM_MP does not require Coherence. This can be detected by the program shown in Figure 6. If r1 = 2 and r2 = 1 is allowed, the two threads observe different orders of writes to the same variable a, which violates Coherence. For a normal variable a, this result is allowed by JMM_MP but prohibited by JMM_CRF.
Initially, a == 0
Thread 1            Thread 2
a = 1;              a = 2;
r1 = a;             r2 = a;
Finally, can it result in r1 == 2 and r2 == 1?
Figure 6: Coherence Test
6.1.2 Write Atomicity for Normal Variables
JMM_MP does not require Write Atomicity, as revealed by the test in Figure 7. For a normal variable a, the result in Figure 7 is allowed by JMM_MP but forbidden by JMM_CRF. Because the CRF model uses the shared memory as the rendezvous point between threads and caches, it has to enforce Write Atomicity.
Initially, a == 0
Thread 1            Thread 2
a = 1;              a = 2;
r1 = a;             r3 = a;
r2 = a;             r4 = a;
Finally, can it result in r1 == 1, r2 == 2, r3 == 2, and r4 == 1?
Figure 7: Write Atomicity Test
6.2 Synchronization Mechanism
JMM_MP follows Location Consistency, which does not require Coherence. When thread t issues a read instruction, any previous unsynchronized writes to the same variable issued by other threads can be observed, in any order. Therefore, JMM_MP is strictly weaker than Lazy Release Consistency. Without synchronization, thread interleaving may produce very surprising results. An example is shown in Figure 8.
Initially, a == 0
Thread 1            Thread 2
a = 1;
can it result in r1 == r3 == 1 and r2 == 2?
Figure 8: Legal Result under Location Consistency
6.3 Constructor Property
The constructor property is studied with the program in Figure 9. Thread 1 simulates the constructing thread: it initializes the field before releasing the object reference. Thread 2 simulates another thread that tries to access the object field without synchronization. Membar1 and Membar2 are hypothetical memory barriers that prevent instructions from crossing them; they are easily implemented in our program by setting test-specific bypassing rules. This program essentially simulates the construction mechanism used by JMM_CRF, where Membar1 is a special EndCon instruction indicating the completion of the constructor and Membar2 is the data dependency enforced by program semantics when field is accessed through reference. If field is a normal variable, this mechanism works under JMM_CRF but fails under JMM_MP: in JMM_MP the default write to field is still a valid write, since there is no ordering requirement on non-synchronized writes. However, if field is declared final and the Freeze instruction is used as Membar1, Thread 2 never observes the default value of field once reference is initialized.
This illustrates the different strategies the two models use to prevent premature releases during object construction: JMM_CRF treats all fields uniformly, whereas JMM_MP only guarantees fully initialized fields if they are final or pointed to by final fields.
Initially, reference == field == 0
Thread 1            Thread 2
field = 1;          r1 = reference;
Membar1;            Membar2;
reference = 1;      r2 = field;
Finally, can it result in r1 == 1 and r2 == 0?
Figure 9: Constructor Test
6.4 Subtle Mistakes in JMM_MP
Using our verification approach, we have uncovered several subtle yet critical specification mistakes in JMM_MP.
6.4.1 Non-Atomic Volatile Writes
One of the proposed requirements for non-atomic volatile write semantics is that once a thread t has observed the new value of a volatile write, it can no longer observe the previous value. To implement this requirement, a special flag $readThisVolatile_{t,\langle w, info_t \rangle}$ is initialized to false in initVolatileWrite [3, Figure 14]. When the new volatile value is observed in readVolatile, this flag should be set to true to prevent the previous value from being observed again by the same thread. However, this critical step is missing: the flag is never set to true in the original proposal. This mistake causes an inconsistency between the formal specification and the intended goal.
6.4.2 Final Semantics
A design flaw in the final variable semantics has also been discovered. It concerns a corner case in the constructor that initializes a final variable, illustrated in Figure 10. After the final field a is initialized, it is read into a local variable in the same constructor. The readFinal definition [3, Figure 15] would allow r to read back the default value of a. This is because, at that point, the fact that a has been frozen has not yet been “synchronized” to the object, and the readFinal action checks this information only in the kF set associated with the object reference. This scenario compromises program correctness because data dependency is violated.
class foo {
    final int a;
    public foo() {
        int r;
        a = 1;
        r = a;
        // can r == 0?
    }
}
Figure 10: Flaw in final variable semantics
7 Conclusions
As discussed in earlier sections, the importance of a clear and formal JMM specification is increasingly being recognized. In this paper we have presented a uniform specification framework for language-level memory models. It permits us to conduct formal analysis and paves the way for future studies of compiler optimization techniques in a multithreaded environment. Compared to traditional specification frameworks, UMM has several notable advantages.
1. It provides strong support for formal verification. This is accomplished by using an operational approach to describe memory activities, which enables the transition system to be easily integrated with a model checking tool. Formal methods can help one better understand the subtleties of the model by detecting corner cases that would be very hard to find through traditional simulation techniques. Because the specification is executable, the memory model can be provided to users as a “black box”, and users are not required to understand all of its details. In addition, the mathematical rules in the transition table make the specification more rigorous and eliminate ambiguities.
2. UMM addresses the special needs of a language-level memory model by reducing the gap between memory semantics and program semantics. This enables one to study memory model implications in the context of data flow analysis. It offers programmers, compiler writers, and hardware designers an end-to-end view of the memory consistency requirement.
3. The model is flexible enough to enforce most desired memory properties. Many existing memory models are specified in different notations and styles, because a specification is often influenced by the actual architecture of its implementation and there is no uniform system flexible enough to describe all the properties of a shared memory system. In UMM, any completed instruction that may have future visibility effects is stored in the global instruction buffer along with the time stamp of its occurrence. This allows one to plug in different selection algorithms to observe the state. In contrast to most processor-level memory models, which use a fixed-size main memory, UMM applies a global instruction buffer whose size may be increased as necessary; this is needed to specify relaxed memory models that must keep track of multiple writes to a variable.
The abstraction mechanism in UMM provides a feasible common design interface for any executable memory model, with all internal data structures and implementation details encapsulated from the user. Different ordering rules and observability rules can be carefully developed so that a user can select from a “menu” of memory properties to assemble a desired formal memory model specification.
4. The architecture of UMM is simple and intuitive. The devices used in UMM, such as instruction buffers and arrays, are standard data structures that are easy to understand. Similar notations have been used in processor memory model descriptions [16] [23], making this model intuitive to hardware designers. Some traditional frameworks use multiple copies of the shared memory modules to represent non-atomic operations [11]. In UMM, these multiple modules are combined into a single global buffer, which substantially simplifies state transitions.
Our approach also has some limitations. Because it is based on model checking, it is exposed to the state explosion problem; effective abstraction and slicing techniques need to be applied in order to use UMM to verify commercial multithreaded Java programs. Also, our UMM prototype is still under development, and the optimal definition of final variables still needs to be identified and specified.
A reliable specification framework may lead to many interesting directions for future work. Currently, test programs must be developed by hand to conduct verification. To automate this process, programming pattern annotation and recognition techniques can play an important role.
Traditional compilation techniques can be systematically analyzed for JMM compliance. In addition, the UMM framework enables one to explore new optimization opportunities allowed by the relaxed memory consistency requirement.
Architectural memory models can also be specified in UMM. Under the same framework, memory model refinement analysis can be performed to aid the development of efficient JVM implementations.
Finally, we plan to apply UMM to study the various proposals to be put forth by the Java working group in their currently active discussions on the standardization of Java shared memory semantics. The availability of a formal analysis tool during language standardization will provide the ability to evaluate various proposals and foresee pitfalls.
Acknowledgments
We sincerely thank all contributors to the JMM mailing list for their insightful and inspiring discussions on improving the Java Memory Model.
References
[1] James Gosling, Bill Joy, and G u y  Steele. The Java Language Specification, chapter 17. Addison-Wesley, 
1996.
[2] William Pugh. Fixing the Java M e m o r y  Model. In Java Grande, pages 89-98, 1999.
[3] Jeremy M a n s o n  and William Pugh. Semantics of multithreaded Java. Technical report, U M I A C S - T R -  
2001-09.
[4] Jan-Willem Maessen, Arvind, and Xiaowei Shen. Improving the Java M e m o r y  Model using C R F .  In 
OOPSLA, pages 1-12, October 2000.
[5] Java Specification Request (JSR) 133: Java M e m o r y  Model and Thread Specification Revision. 
http://jcp.org/jsr/detail/133.jsp.
[6] T he Java M e m o r y  Model mailing list. 
http://www.cs.umd. edu /'pugh/j ava/memoryModel /archive.
[7] X. Shen, Arvind, and L. Rudolph. Commit-Reconcile &  Fences (CRF): A  N e w  M e m o r y  Model for 
Architects and Compiler Writers. In the 26th International Symposium On Computer Architecture, 
Atlanta, Georgia, M a y  1999.
18
[8] Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690-691, 1979.
[9] S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12):66-76, 1996.
[10] Pete Keleher, Alan L. Cox, and Willy Zwaenepoel. Lazy release consistency for software distributed shared memory. In the 19th International Symposium on Computer Architecture, pages 13-21, May 1992.
[11] Kourosh Gharachorloo. Memory consistency models for shared-memory multiprocessors. Technical report, CSL-TR-95-685.
[12] Guang Gao and Vivek Sarkar. Location consistency - a new memory model and cache consistency protocol. Technical report 16, CAPSL, University of Delaware, February 1998.
[13] William W. Collier. Reasoning about Parallel Architectures. Prentice-Hall, 1992.
[14] Kourosh Gharachorloo, Sarita V. Adve, Anoop Gupta, John L. Hennessy, and Mark D. Hill. Specifying system requirements for memory consistency models. Technical report, CSL-TR-93-594.
[15] D. Dill, S. Park, and A. Nowatzyk. Formal specification of abstract memory models. In the 1993 Symposium for Research on Integrated Systems, pages 38-52, March 1993.
[16] D. Weaver and T. Germond. The SPARC Architecture Manual Version 9. Prentice Hall, 1994.
[17] Seungjoon Park and David L. Dill. An executable specification and verifier for Relaxed Memory Order. IEEE Transactions on Computers, 48(2):227-235, 1999.
[18] Yue Yang, Ganesh Gopalakrishnan, and Gary Lindstrom. Analyzing the CRF Java Memory Model. In the 8th Asia-Pacific Software Engineering Conference, pages 21-28, 2001.
[19] Abhik Roychoudhury and Tulika Mitra. Specifying multithreaded Java semantics for program verification. In International Conference on Software Engineering, 2002.
[20] A. Gontmakher and A. Schuster. Java consistency: Non-operational characterizations for Java memory behavior. In the Workshop on Java for High-Performance Computing, Rhodes, June 1999.
[21] Philip Bishop and Nigel Warren. Java in Practice: Design Styles and Idioms for Effective Java, chapter 9. Addison-Wesley, 1999.
[22] Jeremy Manson and William Pugh. Core semantics of multithreaded Java. In ACM Java Grande Conference, June 2001.
[23] Rob Gerth. Introduction to sequential consistency and the lazy caching algorithm. Distributed Computing, 1995.
[24] David Dill. The Murφ verification system. In 8th International Conference on Computer Aided Verification, pages 390-393, 1996.
Appendix: Equivalence Proof
Let the multithreaded Java semantics specified in Section 5 be referred to as JMM_UMM. We present the equivalence proof between JMM_UMM and JMM_MP based on our straightforward implementation using the bypassing table shown in Table 2. For the sake of brevity, we only outline the proof for the core subset of the memory model, consisting of the instructions ReadNormal, WriteNormal, Lock, and Unlock.
JMM_UMM and JMM_MP are equivalent if and only if they allow the same set of execution traces. This is proven with two lemmas. We first demonstrate that both models impose the same ordering restrictions on issuing instructions within each thread. We then prove that the legal values returned by the ReadNormal instruction in JMM_UMM are both sound and complete with respect to JMM_MP.
Lemma 1. Instructions in each thread are issued under the same ordering rules by JMM_UMM and JMM_MP.
Since prescient writes are the only ordering relaxation explicitly allowed by both models, it suffices to prove that the ordering requirements on prescient writes are the same.
1. Soundness of JMM_UMM: let w be any WriteNormal instruction allowed by JMM_UMM; we show that it must satisfy the conditions in JMM_MP, which are enforced by the assertion w ∉ previousReads_t in performWrite. There are only two ways to add w to previousReads_t.
(a) w is read by the same thread before the point where w would normally occur.
This cannot happen in JMM_UMM. Because a write instruction w cannot bypass a previous read instruction r issued by the same thread when they operate on the same variable, r can never observe a later write instruction from the same thread.
(b) w is added to previousReads_t' by another thread t' and then synchronized to thread t(r) before r is issued.
To make this happen, there must exist an acquire operation in t(r) that occurs between the point where w is issued and the point where w would normally occur. This is not allowed in JMM_UMM, since w is not allowed to bypass a previous acquire operation.
2. Completeness of JMM_UMM: let w be any normal write instruction allowed by JMM_MP; we prove that it is also permitted by JMM_UMM.
Assume w is prohibited by JMM_UMM. According to the conditions of the WriteNormal instruction in the transition table (Table 1), w can only be prohibited when ready(w) = false. Therefore, w would have to bypass a previous instruction in a way that the BYPASS table prohibits. The only reorderings forbidden for a normal write instruction are bypassing a previous lock instruction and bypassing a previous read instruction on the same variable. The former case is prohibited in JMM_MP by the mutual exclusion requirement of a lock instruction. The latter case is also forbidden by JMM_MP, because the assertion in performWrite would fail if a readNormal instruction were allowed to obtain a value from a later write instruction in the same thread.
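The completeness argument in Lemma 1 depends only on the WriteNormal row of the bypassing table: a normal write may not bypass an earlier Lock, nor an earlier read of the same variable. The Java sketch below (Java 16 or later, for records) encodes just that row, plus ready(w) under our reading of Table 1, namely that a write is ready when it may bypass every earlier pending instruction of its thread. The Op enum, the Instr record, and the methods writeMayBypass and ready are hypothetical names; Table 2 also constrains the other instruction kinds, which are not modelled here.

```java
import java.util.List;

public class WriteBypass {
    // Core instruction subset covered by the appendix proof.
    enum Op { READ_NORMAL, WRITE_NORMAL, LOCK, UNLOCK }

    // Hypothetical instruction record: the operation and the variable or lock it names.
    record Instr(Op op, String target) {}

    // WriteNormal row of the bypass rule as stated in Lemma 1 (assumption: nothing else
    // is forbidden for a normal write, so earlier Unlocks and writes may be overtaken).
    static boolean writeMayBypass(Instr write, Instr earlier) {
        if (write.op() != Op.WRITE_NORMAL) {
            throw new IllegalArgumentException("only WriteNormal is modelled here");
        }
        return switch (earlier.op()) {
            case LOCK -> false;                                           // never overtake an earlier Lock
            case READ_NORMAL -> !earlier.target().equals(write.target()); // same-variable read blocks
            default -> true;                                              // Unlock or WriteNormal
        };
    }

    // ready(w): w may issue now if it can bypass every instruction that precedes it
    // in program order and is still pending in the same thread.
    static boolean ready(Instr write, List<Instr> pendingBefore) {
        return pendingBefore.stream().allMatch(e -> writeMayBypass(write, e));
    }

    public static void main(String[] args) {
        Instr w = new Instr(Op.WRITE_NORMAL, "x");
        System.out.println(ready(w, List.of(new Instr(Op.READ_NORMAL, "y")))); // true
        System.out.println(ready(w, List.of(new Instr(Op.READ_NORMAL, "x")))); // false
        System.out.println(ready(w, List.of(new Instr(Op.LOCK, "m"))));        // false
        System.out.println(ready(w, List.of(new Instr(Op.UNLOCK, "m"))));      // true
    }
}
```

Under this rule a write to x can move ahead of an earlier Unlock or an earlier read of y, but never ahead of an earlier Lock or an earlier read of x, which are exactly the two cases examined in the completeness argument.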
Lemma 2. The normal read instructions from both models generate the same legal results.
1. Soundness of JMM_UMM: if w is a legal result for a read instruction r under JMM_UMM, then w is also legal in JMM_MP.
We prove that w satisfies all the requirements of the readNormal operation as defined in JMM_MP:
    readNormal(Variable v)
        Choose (v, w, g) from allWrites(v) - uncommitted_t - overwritten_t
        previousReads_t += (v, w, g)
        return w
(a) The result is from allWrites(v).
This requirement is guaranteed by var(w) = var(r) and op(w) = WriteNormal in condition legalNormalWrite(r, w).
(b) w ∉ uncommitted_t.
Assume w ∈ uncommitted_t. For this to happen, w must be a write instruction that follows r in program order but is observed by r. This is prohibited by JMM_UMM because w is not allowed to bypass r in this situation.
(c) w ∉ overwritten_t.
Assume w ∈ overwritten_t. There are only two ways in which w can be added to overwritten_t.
i. A write w' on the same variable and from the same thread is performed by its corresponding performWrite operation while w ∈ previous_t.
For w to be in previous_t when w' is performed, w' must be performed after w. Therefore, w is not the most recent write, which is illegal according to condition legalNormalWrite(r, w).
ii. w is added to overwritten_t' by another thread t' and later acquired by t(r) via synchronization.
w can only be added to overwritten_t' by thread t' when t' performs another write w' on the same variable while w is already in previous_t'. Furthermore, this must occur before the release operation issued in t' that is eventually acquired by t. Therefore, LCOrder(w', w) = true and LCOrder(r, w') = true, a combination that function legalNormalWrite(r, w) prohibits in JMM_UMM.
2. Completeness of JMM_UMM: if w is a legal result for a read instruction r in JMM_MP, then w is also legal in JMM_UMM.
Assume w is prohibited by JMM_UMM. According to the conditions for the readNormal event in Table 1, one of the following must hold.
(a) ready(r) = false
This indicates that there exists at least one pending instruction i in the same thread that precedes r. Because JMM_MP does not issue a read instruction out of program order, r cannot be issued ahead of i in JMM_MP either.
(b) legalNormalWrite(r, w) = false
Condition legalNormalWrite(r, w) fails only when w is not the most recent previous write on the same variable along some path of partially ordered writes according to LCOrder. This is also forbidden by JMM_MP.
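The candidate-set reasoning in Lemma 2 can be made concrete by executing the readNormal definition of JMM_MP quoted in the soundness part for a single thread. The Java sketch below is our own simplified reading, not the JMM_MP specification: the class and method names, the initWrite step, and the body of performWrite are assumptions reconstructed from this appendix (the assertion on previousReads_t, and the rule that earlier same-variable writes in previous_t become overwritten_t). Synchronization between threads is omitted. The demo replays cases (b) and (c)i: a write that has been initiated but not yet performed is excluded as uncommitted, and a write superseded by a later performed write of the same thread is excluded as overwritten.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class MpThreadState {
    // A write tuple (v, w, g): variable, value, and globally unique id.
    record Write(String variable, int value, int guid) {}

    // allWrites is global in JMM_MP, so it is shared by all instances here.
    static final Set<Write> allWrites = new HashSet<>();
    private static int nextGuid = 0;

    // Per-thread sets named after the JMM_MP definitions.
    final Set<Write> uncommitted = new HashSet<>();
    final Set<Write> overwritten = new HashSet<>();
    final Set<Write> previous = new HashSet<>();
    final Set<Write> previousReads = new HashSet<>();

    // Assumed initiation step: the write enters allWrites but stays
    // uncommitted for this thread until it is performed.
    Write initWrite(String v, int value) {
        Write w = new Write(v, value, nextGuid++);
        allWrites.add(w);
        uncommitted.add(w);
        return w;
    }

    // Assumed performWrite: the Lemma 1 assertion, then earlier writes to the
    // same variable in previous_t are moved into overwritten_t.
    void performWrite(Write w) {
        if (previousReads.contains(w)) {
            throw new IllegalStateException("prescient write already read by this thread");
        }
        for (Write p : previous) {
            if (p.variable().equals(w.variable())) overwritten.add(p);
        }
        previous.add(w);
        uncommitted.remove(w);
    }

    // Candidate set for readNormal(v): allWrites(v) - uncommitted_t - overwritten_t.
    Set<Write> readCandidates(String v) {
        return allWrites.stream()
                .filter(w -> w.variable().equals(v))
                .filter(w -> !uncommitted.contains(w) && !overwritten.contains(w))
                .collect(Collectors.toSet());
    }

    // readNormal: the model chooses nondeterministically; here the caller supplies
    // the choice, which is rejected unless it is in the candidate set.
    int readNormal(String v, Write chosen) {
        if (!readCandidates(v).contains(chosen)) {
            throw new IllegalArgumentException("not a legal value for readNormal");
        }
        previousReads.add(chosen);
        return chosen.value();
    }

    public static void main(String[] args) {
        MpThreadState t = new MpThreadState();
        Write w1 = t.initWrite("x", 1);
        t.performWrite(w1);
        Write w2 = t.initWrite("x", 2);
        t.performWrite(w2);   // w1 is now in overwritten_t, as in case (c)i
        t.initWrite("x", 3);  // initiated but not performed, as in case (b)
        System.out.println(t.readCandidates("x").equals(Set.of(w2))); // true
        System.out.println(t.readNormal("x", w2));                     // 2
    }
}
```

A fuller model would also carry previous_t, overwritten_t, and previousReads_t across Unlock/Lock pairs, which is the route used in case (c)ii; that propagation is deliberately left out of this sketch.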
