Global Clock, Physical Time Order and Pending Period Analysis in
  Multiprocessor Systems by Chen, Yunji et al.
arXiv:0903.4961v2 [cs.DC] 12 Jul 2009
Global Clock, Physical Time Order and Pending
Period Analysis in Multiprocessor Systems
Yunji CHEN
Institute of Computing Technology, Chinese Academy of Sciences
Tianshi CHEN
School of Computer Science and Technology, University of Science and Technology of China
Weiwu HU
Institute of Computing Technology, Chinese Academy of Sciences
In multiprocessor systems, various problems have been treated with Lamport’s logical clock and the
resulting logical time orders between operations. In practice, however, one often faces high
complexity caused by the lack of logical time order information.
In this paper, the so-called physical time order is proposed based on the global clock in
multiprocessor systems. Concretely, we first use the global clock to attach a pending period to each
operation in a multiprocessor system, where the pending period is a time interval in which the
operation starts and ends. Afterwards, we define the physical time order for any pair of operations
with disjoint pending periods. The physical time order is an underlying characteristic of any real
execution in multiprocessor systems, because it is part of the orders that actually happened in
real physical time. Formally, the physical time order is proven to be independent of, and consistent
with, traditional logical time orders.
The above novel yet fundamental concepts enable new effective approaches for analyzing
multiprocessor systems, which are collectively named pending period analysis. As a consequence of
pending period analysis, many important problems of multiprocessor systems can be tackled
effectively. As a significant application example, complete memory consistency verification, which was
known as an NP-hard problem, can now be solved with the complexity of $O(n^2 C^p p)$ by utilizing
physical time order information (where n and p are the number of operations and processors
respectively, and C is some constant). Moreover, two event ordering problems, which were proven to be
co-NP-hard and NP-hard respectively, can both be solved with the time complexity of $O(n C^p p)$
if restricted by pending period information.
Categories and Subject Descriptors: C.1.4 [Parallel Architectures]: General; F.1.2 [Modes of Computation]:
Parallelism and concurrency
General Terms: Parallel, order, clock, multiprocessor system, memory consistency
Additional Key Words and Phrases: Physical time order, verification, pending period
1. INTRODUCTION
1.1 Motivation and Related Work
Many theoretical investigations of multiprocessor systems treat processors as distributed
spatially. In these investigations, Lamport’s logical clock [27], which is known as a cor-
nerstone in the parallel and distributed computing areas, is often utilized to partially order
Corresponding Address: Institute of Computing Technology, Chinese Academy of Sciences, P.O. Box 2704-25,
Beijing 100190, P. R. China.
Email: cyj@ict.ac.cn (Yunji Chen)
This paper is partially supported by the National High Technology Development 863 Program of China under
Grants No.2007AA01Z112 and No.2008AA110901, the National Grand Fundamental Research 973 Program of
China under Grant No.2005CB321600, and the National Natural Science Foundation of China under grant No.
60533020.
PREPRINT. July 12, 2009.
operations in multiprocessor systems. Given a pair of operations obeying some logical
time order obtained by a logical clock (such as processor order, execution order and so on),
the logically later operation should observe the effect of the logically earlier operation.
Once the logical time orders between all pairs of conflicting operations in a system are given,
the final execution result of the system is determined. Thus the logical order information,
if perfect, can reveal some intrinsic features of parallel and distributed computing
without a global clock, which was considered hard to achieve. However, in real multiprocessor
systems, the logical order information is often far from perfect, because observing the
logical order information of all operations, especially low-level operations such as load,
store, and synchronization instructions, is often impractical. If one relies purely on
fragmentary logical time order information, one has to infer or conjecture the orders
between a large number of conflicting operations [35; 13]. As a consequence, many application
problems in multiprocessor systems (e.g., memory consistency verification [12; 13; 14], event
ordering [35], and so on) suffer from the resultant high computational costs, and thus are
hard to solve.
Over the past decade, driven by the development of integrated circuit process technology and of
SMP (Symmetric Multi Processor) and CMP (Chip Multi Processor) techniques, the
density of computing capacity of multiprocessor systems has been increasing fast. The resultant
scaling down of multiprocessor systems revives the intuitive idea of utilizing a global clock,
and a number of investigations that consider a global clock have been proposed.
Herlihy and Wing [19] proposed the concept of linearizability as a correctness condition for
memory systems, which concerns accesses to the same memory location happening in disjoint time
intervals with respect to a global clock. In [42], Singla et al. proposed a
temporal memory model, “delta consistency”, which offers a time window for coalescing write
operations to the same memory location. In [38; 43], global counters, which implicitly
represent the global time, were employed to reason about the ordering of transactions in
transactional memory [20]. The common idea behind the above investigations is to obtain
logical order information (especially execution order on the same memory location) by
explicitly or implicitly employing a global clock.
Nevertheless, a global clock implies far more than some complementary logical order
information. A notable fact is that, in a multiprocessor system,
the global clock actually provides a physical-time-based partial ordering containing all
operations that happened in the system, and the ordering is “nearly” a total order: two
operations are not ordered by it if and only if their precise performed times (the performed
time of an operation is the time when it is observed by all processors) on the global clock
are exactly the same.
The complementary logical order information obtained through the global clock is only
a part of the above partial ordering, since the logical orders concern merely operations
on the same processor or accessing the same location. In fact, the extra order information
obtained through the global clock, together with other logical order information, can
produce a transitive closure that further extends the acquirable order information. To the
best of our knowledge, few investigations have considered this extra order information. Such
neglect is probably due to the traditional view that “logical time orders are enough to
determine the result of an execution, thus considering extra order information, which does
not change the result of execution, is not necessary”. However, this paper shows
for the first time that the extra order information is powerful for simplifying many
problems in multiprocessor systems.
Although the motivation is clear, making better use of the global clock in multiprocessor
systems still encounters three concrete difficulties
in practice. First, it is hard to obtain the precise performed time of an operation, even
if there is a precise physical global clock. The reason is that the exact performed time of
an operation is correlated with the hard-to-observe inner states of all processors and the
network of the multiprocessor system (e.g., whether an invalidation message has arrived at
some processor). Second, even if one compromises and utilizes a time interval
that includes the precise performed time, instead of the precise performed time itself,
one must face new problems, e.g., how to deal with the overlapping time intervals of
logically ordered operations (note that most modern processors can execute multiple operations
in an overlapped fashion). Third, it may be difficult to observe and record the global time
information for all operations, since a modern multiprocessor system performs a very large
number of operations every second. In this paper, these difficulties are tackled effectively
and efficiently by our approaches.
1.2 Our Contributions
Given a global clock of a multiprocessor system in which the effect of an operation can
always be globally observed in bounded time¹, let us consider two bounding time points
for the performed time of each operation, named the start time and the end time. The
resultant time interval from the start time to the end time, which includes the performed time,
is called the pending period. As a relaxation of the performed time, the pending period
is easier to obtain than the precise performed time, which will be shown in
Section 3.1 in detail. It is worth noting that the concept of pending period is widely
applicable: the pending periods of two operations in the same processor can overlap, which
accommodates the instruction-level parallelism (ILP) [25] adopted by most modern processors;
the pending periods of two operations accessing the same memory location can also overlap,
which leaves room for overlapping memory accesses as an efficiency optimization.
Thus the concept of pending period can be adopted in most memory models, from strong
memory consistency models (such as linearizability [19] and sequential consistency [28; 40])
to weak memory consistency models (such as weak consistency [10] and release consistency
[26]).
On the other hand, once any two operations have disjoint pending periods (even if they
relate to neither the same processor nor the same memory location), there is a physical
time order between the two operations. In Section 2.3, it is proven that physical time
order is independent of, and consistent with, existing logical time orders. This is not surprising
since the physical time order is part of a physical-time-based ordering that actually happened
with respect to the global clock. Thus the physical time order is a natural order which must be
obeyed by any real execution in multiprocessor systems.
On the basis of global clock, pending period and physical time order, we introduce some
effective approaches for tackling problems in the context of pending period, which are
collectively named pending period analysis. These approaches attempt to utilize the
information brought by the global clock to the full extent, including but not limited to the
traditional logical time order information. The first approach, assignment analysis, aims at
assigning values to the pending periods of all operations when the pending periods of only part
of the operations can be observed directly. This approach can effectively handle the difficulty of
observing and recording an excessive number of operations by inferring pending periods
for part of the operations. The second approach, called frontier analysis, distinguishes the
operations with overlapped pending periods from those with disjoint pending periods, and
manages to prune the frontier graph [13]. As a consequence, the maximal number of nodes
in a frontier graph is significantly reduced from $O((n/p)^p)$ to $O(n C^p)$, while the maximal
number of edges is reduced from $O((n/p)^{p+1})$ to $O(n C^p p)$. Noting that many problems
in multiprocessor systems can be reduced to graph problems on the frontier graph, frontier
analysis may be applicable to reduce the time complexities of these problems. The
third approach, order analysis, further explores orders beyond the physical time order. The
order analysis aims at characterizing the so-called time global order, which is the transitive
closure of the physical and logical time orders. A result of order analysis is that any
cycle in the TGO execution graph, which represents the time global orders in a system, can be
localized to involve merely O(p) operations. This conclusion is important to guarantee the
correctness of a multiprocessor system.
¹ For the sake of brevity, when we talk about a multiprocessor system, we imply that this
precondition holds.
One established example for validating the effectiveness of physical time order and
pending period analysis is the well-known memory consistency verification problem, which
is known to be NP-hard [12; 13]. With the concepts and approaches developed
in this paper, the problem can be solved with the time complexity of $O(n^2 C^p p)$ in the
context of pending period. This method has been employed in the validation of an industrial CMP
[6; 23]. Additional examples are the event ordering problems, which investigate the
possible orders between pairs of operations. The investigated problems have been proven to
be co-NP-hard and NP-hard respectively [35]. However, if these problems are restricted by physical
time order, they can be solved with the time complexity of $O(n C^p p)$. The successful
applications of our approach demonstrate that the global clock, physical time order and
pending period analysis are effective and efficient in tackling various problems in
multiprocessor systems, especially those problems relating to the frontier graph.
The contributions of this paper can be summarized as follows. First, we provide a novel
view of the global clock in multiprocessor systems, showing that the physical-time-based
partial ordering information derived from the global clock, though it has been neglected in
some sense, is very powerful in tackling many problems in multiprocessor systems.
Second, the proposed physical time order establishes a novel, natural and fundamental
concept for multiprocessor systems, which is independent of, but consistent with, the
traditional logical time orders. Third, a set of approaches, collectively called pending period
analysis, are developed and have been successfully used in two types of well-known
application problems in multiprocessor systems, one of which has been employed in industry. The
resultant new solutions for the application problems make significant improvements
over the previous results.
The terminology used in the rest of this paper is introduced in Table I. The rest of the
paper is organized as follows: Section 2 provides the definition of physical time order in
multiprocessor systems. Section 3 presents the approaches belonging to pending period
analysis. Section 4 introduces two related applications. Section 5 concludes the whole
paper.
Notation                   Meaning
S                          Multiprocessor system
u, v (with a subscript)    Operation
w (with a subscript)       Write operation
r (with a subscript)       Read operation
O (with a subscript)       Set of operations
P (with a subscript)       Process / processor
P (with a subscript)       Path on a graph
P                          Set of paths on a graph
C                          Set of cycles on a graph
C                          Cycle on a graph
f                          Frontier
n                          Number of operations
p                          Number of processes (processors)
C                          Constant
E                          Execution order
P                          Program order
PO                         Processor order
GO                         Global order
T                          Time order
TGO                        Time global order
Table I. Notations in this paper.
2. PHYSICAL TIME ORDER IN MULTIPROCESSOR SYSTEMS
Lamport’s logical clock [27] enables partially ordering the events (operations) occurring
at different processes (processors) of a distributed system. During the past thirty years,
his logical “happened-before” order has given rise to different types of logical time orders
for multiprocessor systems, such as processor order, execution order and so on. Briefly, if
we know all the “happened-before” orders (i.e., logical time orders) between
operations, the execution of these operations, which can be represented by a DAG (Directed
Acyclic Graph) named the execution graph, is determined. However, in practice it is
often the case that we can only acquire part of the logical time orders between operations.
Hence, if relying purely on the logical clock, one may need to infer and conjecture the
orders between pairs of operations, which may entail enormous search spaces.
As a result, there might be an intractably large number of candidate executions that do not
violate the obtainable information, many of which may be incorrect yet difficult to
identify as such.
An intuitive way of tackling the above situation is to revive the idea of a global clock.
Probably the simplest way of utilizing a global clock is to obtain the precise performed times
of all operations. However, because observing the precise performed time of one operation
may involve many components in the system, this ideal approach is rather impractical.
An alternative is to relax the precise performed time to a physical time interval
(i.e., the pending period) that includes the performed time. In this way, although the
compromise may lose some order information implied by the global clock, it becomes possible to
partially recover and utilize the natural physical “happened-before” order implied by
physical time. Such an order is called physical time order.
In this section, we will first provide a brief introduction to traditional logical time orders.
After that, we will provide more detailed explanations for pending period and physical
time order, including the corresponding definitions. Finally, the relationship between the
physical and logical time orders is studied theoretically.
2.1 Logical Time Orders in Multiprocessor Systems
In this subsection, we briefly introduce some traditional logical time orders in multiproces-
sor systems, which are based on logical clocks. Concretely, let us review the definitions of
program order, processor order and execution order, which are three well-known types of
logical time orders in multiprocessor systems.
Definition 2.1 (Program Order). Given two different operations u1 and u2 in the same
processor, we say that u1 is before u2 in program order iff u1 is before u2 in the program.
We denote this as $u_1 \xrightarrow{P} u_2$.
Definition 2.2 (Processor Order). Given two different operations u1 and u2 in the same
processor, we say that u1 is before u2 in processor order iff there is global agreement that
u1 is before u2 for all processors. We denote this as $u_1 \xrightarrow{PO} u_2$.
In multiprocessor systems, two operations are conflicting if they access the same memory
location and at least one of them is a store operation [41]. The execution order specifies
the order between two consecutive conflicting operations [21].
Definition 2.3 (Execution Order). We say that a write operation w is before operation
u in execution order iff w is the latest write operation before u that accesses the same
memory location as u. We denote this as $w \xrightarrow{E} u$. We say that a write operation w is after
operation u in execution order iff w is the first write operation after u that accesses the
same memory location as u. We denote this as $u \xrightarrow{E} w$.
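As an illustration of Definition 2.3, the following minimal sketch derives execution-order edges from a list of operations. It assumes, purely for illustration, that the order in which the operations were performed is fully known, which is exactly the information that is hard to observe in practice. The tuple layout `(name, kind, location)` and the function name `execution_order_edges` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (name, kind, location), where kind is "R" or "W".
Op = Tuple[str, str, str]


def execution_order_edges(performed: List[Op]) -> List[Tuple[str, str]]:
    """Derive execution-order edges from operations listed in performed order.

    Assumption (not from the paper): the performed order is fully known.
    """
    edges: List[Tuple[str, str]] = []
    last_write: Dict[str, str] = {}               # location -> latest write so far
    reads_since_write: Dict[str, List[str]] = {}  # location -> reads after that write
    for name, kind, loc in performed:
        # w -E-> u: w is the latest write before u on the same location.
        if loc in last_write:
            edges.append((last_write[loc], name))
        if kind == "W":
            # u -E-> w: w is the first write after u on the same location.
            for read in reads_since_write.get(loc, []):
                edges.append((read, name))
            last_write[loc] = name
            reads_since_write[loc] = []
        else:
            reads_since_write.setdefault(loc, []).append(name)
    return edges


trace = [("r0", "R", "x"), ("w1", "W", "x"), ("r1", "R", "x"),
         ("r2", "R", "y"), ("w2", "W", "x")]
print(execution_order_edges(trace))
```

Note that the read `r2` on location `y` produces no edge, since no write to `y` appears in the trace.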
In addition, the transitive closure of processor and execution orders is known as the
global order:
Definition 2.4 (Global Order). We say that operation u1 is before operation u2 in global
order iff u1 is before u2 in processor order, or u1 is before u2 in execution order, or u1 is
before some operation u in global order and u is before u2 in global order. Formally,
\[
(u_1 \xrightarrow{GO} u_2) \rightarrow \bigl( (u_1 \xrightarrow{PO} u_2) \lor (u_1 \xrightarrow{E} u_2) \lor (\exists u \in O : u_1 \xrightarrow{GO} u \xrightarrow{GO} u_2) \bigr).
\]
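Since the global order is the transitive closure of the processor and execution orders, it can be computed with any standard closure algorithm. The sketch below uses a Warshall-style closure over edge sets; the function name `global_order` and the representation of orders as sets of `(before, after)` pairs are assumptions made for this example only.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def global_order(ops: List[str],
                 processor_order: Iterable[Edge],
                 execution_order: Iterable[Edge]) -> Set[Edge]:
    """Transitive closure of the union of processor-order and execution-order edges."""
    go: Set[Edge] = set(processor_order) | set(execution_order)
    # Warshall-style closure: the intermediate operation k is the outermost loop.
    for k in ops:
        for i in ops:
            if (i, k) in go:
                for j in ops:
                    if (k, j) in go:
                        go.add((i, j))
    return go


go = global_order(["a", "b", "c", "d"],
                  processor_order={("a", "b")},
                  execution_order={("b", "c"), ("c", "d")})
print(sorted(go))
```

The cubic cost in the number of operations is acceptable only for small illustrative traces.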
So far we have already introduced a number of traditional logical time orders in multi-
processor systems. All the above orders are based on Lamport’s “happened-before” logical
relation [27]: the logically former operation is before the logically latter operation in log-
ical time order. In practice, it is often the case that only parts of those relations can be
observed directly, especially the execution orders [17]. For example, to observe the write-
write execution orders, one may add specific hardware to cache coherence maintainer [34].
Even if we can infer some hard-to-observe orders based on known orders, the number of
candidate executions for parallel programs may still be intractably large, since the case in
which we can infer all logical time orders is rare. To cope with this problem by exploit-
ing more information about the relations between operations, we propose the so-called
physical time order in the next subsection.
2.2 Physical Time Order
In a von Neumann architecture, an operation must be fetched into the processor before it is
executed, hence there is a start time for any operation. The above fact implies that, before
an operation starts, it cannot affect other operations. On the other hand, an operation can
be globally observed before it ends at some bounded time point, given some appropriate
hardware (e.g., store atomicity [1]) or supplementary software (e.g., broadcasting). There-
fore, given the global clock, we can assign a start time and an end time to each operation
in a multiprocessor system. Compared with the performed time which involves the states
of all processors, all caches and the network in the whole multiprocessor system, the start
time and the end time are easier to obtain because of their locality and flexibility.
Based on the start time and the end time of an operation, we provide the following
definition of pending period.
Definition 2.5 (Pending Period). The pending period of an operation u is the period from its
start time ts(u) to its end time te(u). We say that an operation v is in the pending period of
operation u iff the pending periods of the two operations overlap.
The concrete methods of observing start and end times are architecture-dependent. Hence,
there are different observations and implementations for pending period and the related
supports. However, the following essential idea of pending period is invariant: the
performed time of an operation (i.e., the time when the operation is performed globally),
denoted by tp(u) for an operation u, must be in its pending period. Therefore, regardless of
the concrete definitions of start time and end time, a partial order exists between two
operations executing in disjoint pending periods. We call this partial order the physical
time order.
Definition 2.6 (Physical Time Order). Given two operations u and v, if the following
(1) the performed times of u and v are in their pending periods respectively;
(2) the end time of operation u is before the start time of operation v
both hold, then we say that u is before v in physical time order. Formally,
\[
\bigl(t_s(u) \le t_p(u) \le t_e(u)\bigr) \land \bigl(t_s(v) \le t_p(v) \le t_e(v)\bigr) \land \bigl(t_e(u) < t_s(v)\bigr) \leftrightarrow (u \xrightarrow{T} v). \tag{1}
\]
As illustrated in Figure 1(a), the pending periods of u and v are disjoint, thus there is
a physical time order between u and v, i.e., $u \xrightarrow{T} v$. In Figure 1(b), the pending
periods of u and v overlap. Hence, it is possible either that the performed time of v is before
the performed time of u, or that the performed time of u is before the performed time of v.
Therefore, in such a case there is no physical time order between u and v. That is, if
operation v is in the pending period of operation u, then
$\neg(u \xrightarrow{T} v \lor v \xrightarrow{T} u)$ holds according to the above
definition.
Notably, the physical time order between u and v does not require that the two operations
execute in the same processor or access the same memory location. Instead, it
simply depends on the pending periods of u and v, which result from the physical time given by
the global clock.
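The definitions of pending period and physical time order can be sketched in a few lines. The class `PendingPeriod` and the helper names below are hypothetical, and the time values are arbitrary numbers on an assumed global clock; the sketch only encodes the comparisons stated in Definitions 2.5 and 2.6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingPeriod:
    """Pending period [ts, te] of one operation on the global clock."""
    start: float  # ts(u)
    end: float    # te(u)


def physically_before(u: PendingPeriod, v: PendingPeriod) -> bool:
    """u -T-> v holds iff the end time of u is strictly before the start time of v."""
    return u.end < v.start


def overlapped(u: PendingPeriod, v: PendingPeriod) -> bool:
    """Two pending periods overlap iff neither operation is physically before the other."""
    return not physically_before(u, v) and not physically_before(v, u)


u = PendingPeriod(0, 20)
v = PendingPeriod(30, 50)   # disjoint from u
w = PendingPeriod(10, 40)   # overlaps both u and v
print(physically_before(u, v), overlapped(u, w), overlapped(v, w))
```

Because the comparison is strict, two periods that merely touch (te(u) = ts(v)) are treated as overlapped, and no physical time order is derived for them.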
Fig. 1. Time points of operations u and v on the time axis (left side is earlier), where ts(·) and te(·) represent start
time and end time respectively: (a) disjoint pending periods of u and v; (b) overlapped pending periods of u and v.
2.3 Relationship between Physical and Logical Time Orders
In this subsection, we discuss the relationship between physical time order and traditional
logical time orders. Fundamentally, a remarkable difference between the former and the
latter is that physical time order is based on the physical global clock, while logical time
orders are based on logical clocks. Due to the above difference, it is intuitive that physical
time order is independent of logical time order:
THEOREM 2.7 (TIME ORDER INDEPENDENCY THEOREM). The physical time order and
logical time order are independent of each other.
Proof. First, let us consider a multiprocessor system in which the start times and the end
times of all operations are 0 and ∞ respectively. In such a system, there is no physical
time order between any pair of operations. However, there can be some logical time order
between operations. Therefore, the physical time order does not contain any logical time
order.
Second, let us consider a multiprocessor system in which all operations have nontrivial
assignments of pending periods, and access distinct memory locations. In such a system,
there is no logical time order between any two operations in different processors, while
there can be some physical time order between operations in different processors.
Therefore, no logical time order contains the physical time order. □
Another issue concerns the following question: do the physical time order and
logical time order contradict each other? As we know, the physical time order between
two operations implies that one operation physically happens before the other operation,
while the logical time order between two operations implies that one operation logically
happens before the other operation. Intuitively, if there is no bug in the multiprocessor
system, the logical clock should comply with the physical global clock.
To carry out the investigation, we need a common and natural criterion for comparing
physical time order with logical time orders. A solution is to use the physical time points
given by the global clock to characterize all the above orders, since physical time order is
based on the start and end time points, and logical time order can be represented as a relation
between the performed time points. Following this idea, we list all formulae linking the
orders to physical time points as follows:
(1) The performed time of operation u is between the start time and the end time of u:
$t_s(u) \le t_p(u) \le t_e(u)$;
(2) If u is before v in physical time order, then the end time of u is before the start time of
v:
$(u \xrightarrow{T} v) \rightarrow \bigl(t_e(u) < t_s(v)\bigr)$;
(3) If u is before v in processor order, then the performed time of u is before the performed
time of v:
$(u \xrightarrow{PO} v) \rightarrow \bigl(t_p(u) < t_p(v)\bigr)$;
(4) If u is before v in execution order, then the performed time of u is before the performed
time of v:
$(u \xrightarrow{E} v) \rightarrow \bigl(t_p(u) < t_p(v)\bigr)$;
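The four formulae above can be read as constraints on a timestamped execution. The following sketch checks them on hypothetical data; it assumes that the performed times tp are available, which in a real system they usually are not, so it serves only to make the constraints concrete. The function name `violations` and the `(ts, tp, te)` tuple layout are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Times = Dict[str, Tuple[float, float, float]]  # name -> (ts, tp, te)
Edge = Tuple[str, str]


def violations(times: Times,
               physical_edges: Iterable[Edge],
               logical_edges: Iterable[Edge]) -> List[tuple]:
    """Return every instance of formulae (1)-(4) that the given data violates."""
    found: List[tuple] = []
    for name, (ts, tp, te) in times.items():
        if not (ts <= tp <= te):                      # formula (1)
            found.append(("pending period", name))
    for u, v in physical_edges:
        if not (times[u][2] < times[v][0]):           # formula (2): te(u) < ts(v)
            found.append(("physical time order", u, v))
    for u, v in logical_edges:
        if not (times[u][1] < times[v][1]):           # formulae (3), (4): tp(u) < tp(v)
            found.append(("logical time order", u, v))
    return found


times = {"u": (0, 5, 10), "v": (12, 15, 20)}
print(violations(times, physical_edges=[("u", "v")], logical_edges=[("u", "v")]))
print(violations(times, physical_edges=[("u", "v")], logical_edges=[("v", "u")]))
```

The second call reports a logical edge from v to u that contradicts the performed times, which is the kind of inconsistency the next theorem rules out for correct executions.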
Based on the above formulae, we can prove the following theorem.
THEOREM 2.8 (TIME ORDER CONSISTENCY THEOREM). The physical time order is
consistent with the global order, i.e., the transitive closure of the processor and execution
orders. Formally,
\[
(v \xrightarrow{T} u) \rightarrow \neg(u \xrightarrow{GO} v). \tag{2}
\]
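Proof. According to Definition 2.6, $v \xrightarrow{T} u$ requires $t_e(v) < t_s(u)$, so it suffices to
show that $(u \xrightarrow{GO} v) \rightarrow \bigl(t_s(u) < t_e(v)\bigr)$. Since
$(u \xrightarrow{PO} v) \rightarrow \bigl(t_p(u) < t_p(v)\bigr)$ and
$(u \xrightarrow{E} v) \rightarrow \bigl(t_p(u) < t_p(v)\bigr)$
both hold, by transitivity we obtain that
$(u \xrightarrow{GO} v) \rightarrow \bigl(t_p(u) < t_p(v)\bigr)$.
Since $t_s(u) \le t_p(u)$ and $t_p(v) \le t_e(v)$,
$(u \xrightarrow{GO} v) \rightarrow \bigl(t_s(u) < t_e(v)\bigr)$ holds. □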
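For readers who prefer to see the argument as a single chain, the proof can be written out as follows. This is only a restatement under the assumptions already listed as formulae (1) to (4), namely that both operations have legal pending periods and that logical orders respect performed times.

```latex
\begin{align*}
u \xrightarrow{GO} v
  &\;\Rightarrow\; t_p(u) < t_p(v)
     && \text{(formulae (3), (4) and transitivity of $<$)} \\
  &\;\Rightarrow\; t_s(u) \le t_p(u) < t_p(v) \le t_e(v)
     && \text{(formula (1) applied to $u$ and to $v$)} \\
  &\;\Rightarrow\; t_s(u) < t_e(v) \\
  &\;\Rightarrow\; \neg\bigl(t_e(v) < t_s(u)\bigr)
   \;\Rightarrow\; \neg\bigl(v \xrightarrow{T} u\bigr)
     && \text{(formula (2), contrapositive)}
\end{align*}
```

Taking the contrapositive of the whole chain gives statement (2) of the theorem.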
The independence and consistency between the physical time order and traditional
logical time orders demonstrate that the former is novel yet natural. In the rest of the
paper, we will show that physical time order, together with our forthcoming approaches
named pending period analysis, is powerful for tackling many problems in
multiprocessor systems.
3. PENDING PERIOD ANALYSIS
So far we have introduced the concepts of pending period and physical time order in the
context of global clock. In this section, three concrete approaches based on the
aforementioned concepts are proposed to analyze the problems in multiprocessor systems. These
approaches are collectively named pending period analysis, since they concentrate on
different aspects of pending periods.
The first approach, assignment analysis, aims at obtaining the pending periods of
operations. As we have mentioned, a physical time order exists between two operations
only if their pending periods are disjoint, thus knowing the pending
periods of operations is essential to obtaining the potential physical time order between
operations. Moreover, as a new restriction on multiprocessor systems, pending periods may
forbid many candidate executions of parallel programs which violate the ordering imposed
by a global clock. Accordingly, the second approach, named frontier analysis, aims at
pruning the search space of candidate executions in the context of physical time orders
and pending periods. Finally, suppose we already know the pending periods of all
operations by some approach; then there are probably some operations between which no
physical time order holds because their pending periods overlap. The third
approach, order analysis, aims at tackling the undiscovered orders for the operations with
overlapped pending periods. In this section, the above approaches will be introduced in
detail.
3.1 Assignment Analysis: Inferring Pending Periods
To obtain all the physical time orders between operations, one must know the pending
periods (determined by the start and end times) of all operations in a multiprocessor system.
As we have mentioned in Section 2.2, there are different ways to observe pending periods
in different systems. For example, some dedicated registers are added to each
processor in Godson-3 [6] to observe the bounding time points of instructions (low-level
operations), including program counter/start time pairs of the last started operation and
program counter/end time pairs of the last ended operation [6]. A purely software method is
also possible for observing time points, especially for high-level operations, e.g., employing
some global memory address as a time counter.
On the other hand, in many cases, it is hard to directly observe the pending periods of all
operations if there are too many operations (especially low-level operations such as load,
store, and synchronization instructions). Furthermore, even if one can observe all the pending
period information, it may be difficult to record it all in many systems if pending period
information is generated faster than it can be recorded.
To cope with the difficulty of obtaining and recording pending period information,
assignment analysis is proposed to make a compromise by observing the pending periods for
only part of the operations, and inferring the pending periods for the remaining operations
according to the following rule:
Inferred Pending Period Rule²: In a multiprocessor system, given an operation set
$O = \{u_1, u_2, \ldots, u_n\}$ and an observed operation subset
$O_{obs} = \{u_{obs_1}, u_{obs_2}, \ldots, u_{obs_m}\}$
($1 \le obs_1 < obs_2 < \cdots < obs_m \le n$), if the pending period
$[t_s(u_{obs_j}), t_e(u_{obs_j})]$ has
been observed for each operation $u_{obs_j} \in O_{obs}$, then the pending period for any operation
$u \in O \setminus O_{obs}$, denoted by $[t_s(u), t_e(u)]$, can be inferred as follows:
\[
t_s(u) = t_s(\mathrm{prec}_{obs}(u)), \qquad t_e(u) = t_e(\mathrm{succ}_{obs}(u)),
\]
where $\mathrm{prec}_{obs}(u)$ and $\mathrm{succ}_{obs}(u)$ are the last observed operation before u in processor
order and the first observed operation after u in processor order respectively³. Formally,
\[
(\mathrm{prec}_{obs}(u) \in O_{obs}) \land (\mathrm{prec}_{obs}(u) \xrightarrow{PO} u) \land \nexists j\,(\mathrm{prec}_{obs}(u) \xrightarrow{PO} u_{obs_j} \xrightarrow{PO} u),
\]
\[
(\mathrm{succ}_{obs}(u) \in O_{obs}) \land (u \xrightarrow{PO} \mathrm{succ}_{obs}(u)) \land \nexists j\,(u \xrightarrow{PO} u_{obs_j} \xrightarrow{PO} \mathrm{succ}_{obs}(u)).
\]
We call the pending period of an operation legal if it contains the performed time of that operation. It is not difficult to prove that the inferred pending period rule preserves legality.
$^2$In some multiprocessor systems that support a consistency model weaker than sequential consistency, there may be no total ordering of the operations on a single processor. Therefore, the $prec_{obs}$ or $succ_{obs}$ of $u$ may not be unique. However, any $prec_{obs}$ and $succ_{obs}$ of $u$ yield a legal assignment of pending period to $u$.
$^3$To ensure that every unobserved operation has a $prec_{obs}$ and a $succ_{obs}$, we require that each operation without predecessor or successor operations in processor order be observed.
THEOREM 3.1 (INFERRED PENDING PERIOD THEOREM). If the observed pending period of each observed operation is legal, then every pending period inferred by the inferred pending period rule is legal.
Proof. Consider an operation $u \in O \setminus O_{obs}$. If the inferred start time of $u$ is 0, then $t_s(u) = 0 \le t_p(u)$; otherwise $t_s(u) = t_s(prec_{obs}(u)) \le t_p(prec_{obs}(u)) < t_p(u)$. Hence, $t_s(u) \le t_p(u)$ holds. If the inferred end time of $u$ is $\infty$, then $t_p(u) < \infty = t_e(u)$; otherwise $t_p(u) < t_p(succ_{obs}(u)) \le t_e(succ_{obs}(u)) = t_e(u)$. Hence, $t_p(u) \le t_e(u)$ holds. Since the performed time of $u$ is in the inferred pending period of $u$, the inferred pending period is legal.
[Figure 2: operations u1 to u8. Observed: ts(u1) = 0, te(u1) = 20, ts(u8) = 100, te(u8) = 200. Inferred for each of u2 to u7: ts = 0, te = 200.]
Fig. 2. Time points of operations u2 to u7 are inferred with u1 and u8.
As shown in Figure 2, with only the observed pending periods of $u_1$ and $u_8$, the pending periods of $u_2$ to $u_7$ can be inferred: their start times are the same as the start time of $u_1$, and their end times are the same as the end time of $u_8$.
Considering $n$ operations, with assignment analysis one only needs to observe directly the pending period of one operation out of every $m$ operations (for example, one out of every 100 operations, i.e., $m = 100$, is observed directly in [6]); the pending periods of the remaining $n(m-1)/m$ operations can then be inferred. As a result, the inferred pending periods are looser than the observed ones. However, this approach reduces the difficulty of observing and recording pending period information.
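The inferred pending period rule can be illustrated with a minimal Python sketch. It assumes a single processor whose operations are totally ordered in processor order; the function name `infer_pending_periods` and the dictionary representation of pending periods are illustrative choices, not part of the method in [6].

```python
from typing import Dict, List, Tuple

Period = Tuple[float, float]  # (start time t_s, end time t_e)


def infer_pending_periods(order: List[str],
                          observed: Dict[str, Period]) -> Dict[str, Period]:
    """Apply the inferred pending period rule to one processor.

    order    -- operations in processor order (assumed total in this sketch)
    observed -- directly observed pending periods; per footnote 3, the first
                and last operations in `order` must be among them
    """
    if order and (order[0] not in observed or order[-1] not in observed):
        raise ValueError("first and last operations must be observed")

    starts, ends = {}, {}
    prec_start = None
    for u in order:                 # forward pass: t_s of the last observed op
        if u in observed:
            prec_start = observed[u][0]
        starts[u] = prec_start
    succ_end = None
    for u in reversed(order):       # backward pass: t_e of the next observed op
        if u in observed:
            succ_end = observed[u][1]
        ends[u] = succ_end
    return {u: (starts[u], ends[u]) for u in order}


# The situation of Fig. 2: only u1 and u8 are observed.
order = [f"u{i}" for i in range(1, 9)]
periods = infer_pending_periods(order, {"u1": (0, 20), "u8": (100, 200)})
```

With the values of Fig. 2, each of u2 to u7 receives the pending period (0, 200), while u1 and u8 keep their observed periods.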
3.2 Frontier Analysis: Pruning Frontier Graph
For multiprocessor systems, it is natural to analyze the candidate executions of a parallel program. When the logical time order information is “perfect”, i.e., the order between every pair of operations (if there is one) is known, this problem is trivial. However, since we can often observe and infer only limited logical time order information (especially execution order), we may need to consider a number of candidate executions of the same program that do not conflict with the available order information. If the available order information is not sufficient, some execution may
be actually illegal (violating memory consistency or cache coherence) and yet be considered a candidate execution. Unfortunately, distinguishing such an illegal execution from the legal ones might be very time-consuming [12; 17]. In this section, we propose an effective and efficient approach, named frontier analysis, to prune the space of candidate executions of a parallel program. Frontier analysis is based on the aforementioned concepts of physical time order and pending period. To present our approach, we begin with Gibbons and Korach’s notion of the frontier graph [13]. Following this brief introduction, we show how the frontier graph is pruned by frontier analysis. A corresponding complexity analysis is also provided.
Let us briefly introduce Gibbons and Korach’s concept of the frontier graph. We are given $p$ processes $P_1, \ldots, P_p$ and $p$ sets $O_1, \ldots, O_p$ containing all the operations executing at these $p$ processes respectively. A frontier $f(u_1, \ldots, u_p)$ is a tuple of $p$ operations, where $\forall i \in \{1, \ldots, p\}, u_i \in O_i$. Gibbons and Korach [13] further proposed the frontier graph, which consists of all possible frontiers in the system and the directed edges connecting them. It starts with a starting frontier consisting of $p$ NULL operations, which represents the situation at the very beginning, when no operation has executed. It ends with a terminating frontier consisting of the terminating operations at the $p$ processes, which represents the situation at the very end, when all operations have begun. If a new operation $u'_i \in O_i$ happens, the frontier $f(u_1, \ldots, u_i, \ldots, u_p)$ is updated to another frontier $f'(u_1, \ldots, u'_i, \ldots, u_p)$, and there is a directed edge from $f$ to $f'$ in the frontier graph. Therefore, each candidate execution can be mapped to a path from the starting frontier to the terminating frontier on the frontier graph.
In multiprocessor systems, a frontier can be viewed as a snapshot of the operations executing in the $p$ processors: the operations executing in processors $P_1, \ldots, P_p$ are $u_1, u_2, \ldots, u_p$ respectively. Intuitively, many important problems of multiprocessor systems that inherently involve search spaces of candidate executions, such as memory consistency verification and event ordering problems, can be transformed into graph problems on the frontier graph. The complexity of solving these problems relates directly to the size of the frontier graph. Unfortunately, according to Gibbons and Korach [13], there are $O((n/p)^p)$ possible frontiers in total, which makes many multiprocessor system problems intractable. In [13], Gibbons and Korach proposed to use additional information such as read mapping (the mapping from every read to the write sourcing its value) and total write order (a total order of the writes to each memory location) to simplify the traversal of the frontier graph. Intuitively speaking, read mapping and total write order reduce the number of possible edges connected to each frontier by specifying the relations between write and read operations, and they also reduce the number of reachable frontiers. Consequently, the time for finding a path from the starting frontier to the terminating frontier on the frontier graph is also reduced. However, since read mapping and total write order may involve all processors in a system, observing and restricting read mapping and total write order (especially the latter) might be difficult in practice [17].
Instead of relying on logical information such as read mapping and total write order, we infuse natural information, namely physical time orders and pending periods, into the frontier graph, and use it to prune the frontier graph in frontier analysis. The idea of frontier analysis is quite straightforward: since the operations in the same frontier execute concurrently, each operation in a frontier lies in the pending periods of the other operations belonging to the same frontier. Meanwhile, the physical time orders eliminate the frontiers which contain two or more operations with disjoint pending periods. In the example shown in Figure 3, the cloud represents the pending period of operation $u_{23}$. Only the shadowed operations in the cloud can appear in the frontiers involving $u_{23}$, since the operations outside the pending period of $u_{23}$ cannot overlap with $u_{23}$ in execution.
[Figure 3: operations u11 to u16, u21 to u26 and u31 to u37 on three processors; a cloud marks the pending period of u23.]
Fig. 3. The operations in the frontier involving operation u23 should be in the pending period of u23.
With the above natural property, we can again count the feasible frontiers (in the context of physical time order) that contain a specific operation $u_i$ in a $p$-processor system. A feasible frontier containing $u_i$ is obtained by picking $p-1$ operations, one for each of the other processors, from the operations whose pending periods overlap with that of $u_i$, and filling them into the corresponding seats of the frontier. Moreover, since each operation in the systems discussed in this paper can be globally observed in bounded time, we let $B$ be the upper bound of the lengths of all pending periods. Hence, in one processor, the maximal number of operations within any pending period is no larger than $cB$ (where $c$ is a constant, and we let $C = cB$), which does not depend on the number of operations $n$. Therefore, there can be at most $C^{p-1}$ feasible frontiers involving $u_i$. Meanwhile, since there are at most $n$ different choices for the specific operation $u_i$, the total number of feasible frontiers in the frontier graph is significantly reduced from $O((n/p)^p)$ to $O(nC^{p-1})$, and hence to $O(nC^p)$. Noting that the number of processors, $p$, is a constant for a given multiprocessor system, the number of feasible frontiers is actually $O(n)$ for a given system.
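The feasibility condition used in this counting argument can be made concrete with a small Python sketch. It is a brute-force illustration under assumed inputs (a list of pending periods per processor, NULL operations omitted); it enumerates all tuples and then filters them, so it shows which frontiers survive pruning rather than achieving the bound above. The names `overlaps` and `feasible_frontiers` are hypothetical.

```python
from itertools import product


def overlaps(a, b):
    """Pending periods (ts, te) overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def feasible_frontiers(processors):
    """Yield index tuples (one operation per processor) whose pending periods
    pairwise overlap, i.e. frontiers not eliminated by physical time order.

    processors -- one list of pending periods (ts, te) per processor,
                  listed in processor order
    """
    for frontier in product(*(range(len(ops)) for ops in processors)):
        periods = [processors[i][k] for i, k in enumerate(frontier)]
        if all(overlaps(periods[i], periods[j])
               for i in range(len(periods))
               for j in range(i + 1, len(periods))):
            yield frontier


# Two processors with three operations each (made-up pending periods).
p1 = [(0, 10), (11, 20), (21, 30)]
p2 = [(5, 15), (16, 25), (26, 35)]
kept = list(feasible_frontiers([p1, p2]))
```

In this toy input, 5 of the 9 candidate frontiers remain, because each operation overlaps with at most two operations of the other processor.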
Further, the number of edges in the pruned frontier graph can also be calculated. There are $p$ operations in a frontier, and within the pending period of a specific operation there are at most $Cp$ operations across all $p$ processors. Hence, for any frontier, at most $Cp$ operations can extend the frontier, so the number of outgoing edges of each frontier is bounded from above by $Cp$. Recall that the number of feasible frontiers is at most $O(nC^{p-1})$; the total number of edges in the frontier graph
is $O(nC^{p}p)$ in the context of pending periods and physical time orders. As discussed in the last paragraph, this asymptotic bound is in fact $O(n)$ for a given system.
To sum up, with pending period information, frontier analysis lets one work with the pruned frontier graph, whose numbers of nodes and edges are linear in the number of operations. Many problems related to the frontier graph can thus be simplified. A crucial characteristic of frontier analysis is that it successfully utilizes the physical time orders, which may include some orders other than logical time orders, to localize the computation of many undiscovered orders.
3.3 Order Analysis: A Technique Beyond Physical Time Order
As a new dimension of ordering relations in multiprocessor systems, the physical time
order can order some operations that are concurrent from traditional points of view. How-
ever, it is still problematic to say that two operations with neither physical nor logical time
orders are concurrent. In this section, we present a new order that is beyond physical and
logical time orders, and then study the related technique, order analysis, on the basis of the
new order.
As we have mentioned in Section 2.2, the physical time order is independent of and consistent with traditional logical time orders. Hence, it is natural to consider the combination of physical and logical time orders. Formally, the transitive closure of the physical and logical time orders, which we name time global order, is defined as follows:
Definition 3.2 (Time Global Order). We say that operation $u_1$ is before operation $u_2$ in time global order iff $u_1$ is before $u_2$ in processor order, or $u_1$ is before $u_2$ in execution order, or $u_1$ is before $u_2$ in physical time order, or $u_1$ is before some operation $u$ in time global order and $u$ is before $u_2$ in time global order. Formally,
$$(u_1 \xrightarrow{TGO} u_2) \leftrightarrow \Big( (u_1 \xrightarrow{PO} u_2) \vee (u_1 \xrightarrow{E} u_2) \vee (u_1 \xrightarrow{T} u_2) \vee (\exists u \in O : u_1 \xrightarrow{TGO} u \xrightarrow{TGO} u_2) \Big).$$
Time global order is not the simple addition of physical and logical time orders. Due to
the transitivity of partial order, two operations with neither physical time order nor logical
time order may have certain time global order.
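As a small illustration of Definition 3.2, the following Python sketch computes the time global order as the transitive closure of the union of the three base orders. The set-of-pairs representation and the function name `time_global_order` are assumptions made for this example only.

```python
def time_global_order(ops, po, eo, to):
    """Transitive closure of processor order (po), execution order (eo) and
    physical time order (to), each given as a set of (before, after) pairs."""
    succ = {u: set() for u in ops}
    for a, b in po | eo | to:
        succ[a].add(b)
    # Warshall-style closure: allow paths through each intermediate node k.
    for k in ops:
        for a in ops:
            if k in succ[a]:
                succ[a] |= succ[k]
    return {(a, b) for a in ops for b in succ[a]}


# a is before b in execution order, b is before c in physical time order.
tgo = time_global_order(["a", "b", "c"],
                        po=set(), eo={("a", "b")}, to={("b", "c")})
```

Here the pair ("a", "c") belongs to the time global order although a and c are related by neither a logical nor a physical time order directly.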
Given the definition of time global order, let us now turn to a novel technique named order analysis. Order analysis exploits time global orders between operations and uses them to check the correctness of an execution. The correctness of an execution in a multiprocessor system is equivalent to the absence of cycles in the corresponding execution graph, a directed graph whose nodes represent operations and whose directed edges represent the orders between operations (traditionally, processor order and execution order). Given the pending period information, a correct execution must further comply with the time global order. Hence, checking the correctness of an execution is equivalent to checking for a cycle in the corresponding execution graph, which contains edges corresponding to processor order, execution order, and physical time order. We call this type of graph the TGO execution graph [6].
Let $\mathcal{C}$ be the set of all cycles including operation $u$ in the TGO execution graph. Furthermore, let $C$ be a cycle belonging to $\mathcal{C}$ ($C \in \mathcal{C}$), so that $u$ is an operation in $C$ ($u \in C$).
Intuitively, for any operation u, there are three kinds of cycles containing u: 1) all opera-
tions of the cycle except u are not in the pending period of u; 2) some operations of the
cycle are not in the pending period of u, while other operations are in the pending period
of u; and 3) all operations of the cycle are in the pending period of u.
Concerning the first kind of cycles which mainly involves operations outside the pending
period of u, the following lemma proves that they can be reduced to specific local cycles
involving two operations.
LEMMA 3.3 [6]. Given a time global order cycle $C$ containing operation $u$, if all operations in $C$ except $u$ are before $u$ in physical time order, there must be a write operation $w$ in cycle $C$ which is after $u$ in execution order. Formally,
$$\big(\forall v \in C : (v \ne u) \rightarrow (v \xrightarrow{T} u)\big) \rightarrow (\exists w \in C : u \xrightarrow{E} w).$$
Proof. Let operation $v'$ be the successor of $u$ in cycle $C$. There are three situations to consider: $u \xrightarrow{T} v'$, $u \xrightarrow{PO} v'$, and $u \xrightarrow{E} v'$. Since all operations in $C$ except $u$ are before $u$ in physical time order, $u \xrightarrow{T} v'$ does not hold. Since $t_p(v')$ is before $t_p(u)$, $u \xrightarrow{PO} v'$ does not hold. Hence, $u \xrightarrow{E} v'$ holds. Furthermore, if $v'$ were a read operation, $v'$ certainly could not get the value of $u$ from the future, and $u$ could not be before $v'$ in execution order. Therefore $v'$ is a write operation. Thus the lemma is proved.
As shown in Figure 4, the operations $u$, $u_1$, $u_2$ and $u_3$ form a cycle in the TGO graph, where $u_1$, $u_2$ and $u_3$ are before the pending period of $u$. Based on the previous lemma, the cycle can be reduced to the small cycle $u \xrightarrow{E} u_3 \xrightarrow{T} u$.
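[Figure 4: a cycle over four operations (labelled u0, u1, u2 and u3 in the figure) with edges labelled T, PO, E, E and T.]
Fig. 4. The operations u, u1, u2 and u3 form a cycle in the TGO graph, where u1, u2 and u3 are before the pending period of u (the clouds in the figure).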
Similarly, the second kind of cycle can be reduced to cycles involving only one operation outside the pending period of $u$. Moreover, cycles of the third kind are obviously local cycles inside the pending period of $u$. Based on this locality of cycles in the TGO execution graph, we propose three correctness rules in Theorem 3.4 to check the acyclicity of a TGO execution graph locally. Each correctness rule is related to one kind of cycle mentioned above.
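THEOREM 3.4 (CHECKING RULES THEOREM) [6]. There is no cycle in the TGO execution graph of the execution iff for any operation $u$ of the execution, the following three correctness rules hold:
Rule 1: $\forall w \in O : (w \xrightarrow{T} u) \rightarrow \neg(u \xrightarrow{E} w)$;
Rule 2: $\forall v, v' \in O : (v \xrightarrow{T} u) \wedge (v' \xrightarrow{GO} v) \rightarrow \neg(u \xrightarrow{GO} v')$;
Rule 3: $\neg\big(\exists C \in \mathcal{C} : (\forall v \in C : \neg(u \xrightarrow{T} v \vee v \xrightarrow{T} u))\big)$.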
Proof. “→”. Assume that there is no cycle in the TGO execution graph of the execution. For Rule 1, given a write operation $w$ satisfying $w \xrightarrow{T} u$, $u \xrightarrow{E} w$ does not hold; otherwise there would be a cycle $w \xrightarrow{T} u \xrightarrow{E} w$. For Rule 2, if operations $u$, $v$ and $v'$ satisfy $(v \xrightarrow{T} u) \wedge (v' \xrightarrow{GO} v)$, then $u \xrightarrow{GO} v'$ does not hold; otherwise there would be a cycle $v' \xrightarrow{GO} v \xrightarrow{T} u \xrightarrow{GO} v'$. Rule 3 is trivial: since there is no cycle in the whole graph, there is certainly no cycle in the pending period of $u$. Hence “→” is proved.
“←”. We prove it by contradiction. Assume that Rules 1, 2 and 3 all hold, but there is a cycle $C$. Let operation $u$ be the last performed operation in the cycle. According to Rule 3, there must be some operation of $C$ outside the pending period of $u$. We traverse $C$ starting from $u$. Let $v$ be the first operation met in this traversal that is before $u$ in physical time order. Since $u$ is the last committed operation in the cycle, $u \xrightarrow{T} v$ cannot hold; instead, we have $v \xrightarrow{T} u$. If all operations except $u$ in cycle $C$ are before $u$ in physical time order, then according to Lemma 3.3 there must be some operation $w$ such that $u \xrightarrow{E} w$, which contradicts Rule 1. Hence, there must be some operation in the pending period of $u$. Let $v'$ be the predecessor of $v$ in $C$. As shown in Figure 2, $u \xrightarrow{TGO} v'$ and $v' \xrightarrow{TGO} v$. Suppose there is a physical time order edge on the path from $u$ to $v$ in $C$, and let $a \xrightarrow{T} b$ be the first such edge. According to the definition of physical time order, we obtain $t_p(u) < t_p(a) < t_p(b)$. However, since $u$ is the operation committed last in the cycle $C$, $t_p(u)$ cannot be before $t_p(b)$, and we reach a contradiction; thus there is no physical time order edge from $u$ to $v$ in $C$. As a result, $u \xrightarrow{GO} v'$ and $v' \xrightarrow{GO} v$ hold. But $u \xrightarrow{GO} v' \xrightarrow{GO} v \xrightarrow{T} u$ contradicts Rule 2. Thus “←” is proved.
In a real system, Rule 1 checks for incorrect performed times: if the performed time of an operation lies outside its pending period, its effect cannot be observed by some operations with later and disjoint pending periods. To check Rule 1, one needs to check whether the latest write before $u$ in physical time order has propagated to $u$. Rule 2 focuses on ordering bugs between operations inside and outside the pending period. To check Rule 2, one needs to check all operations before $u$ in global order to find cycles as shown in Figure 2. Rule 3 focuses on cycles inside the pending period.
Theorem 3.4 not only shows how to check whether a TGO execution graph is acyclic, but also bounds the complexity of checking. This is because checking one operation involves at most $Cp$ operations: Rule 1 involves only a constant number of operations, while checking Rules 2 and 3 requires traversing all operations in the pending period of $u$ (at most $Cp$ of them, see also Section 3.2). As a consequence, checking the correctness of an execution has complexity $O(nCp)$.
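For intuition, the sketch below shows a direct check of Rule 1 and, as a baseline, a global acyclicity test of a TGO execution graph using Kahn's topological sort. This is not the local checking procedure of [6]: the global test visits the whole graph, which is precisely what the local rules are meant to avoid. All function names and the dictionary of pending periods are assumptions for this example.

```python
from collections import deque


def physical_time_edges(periods):
    """u T-> v whenever the pending period of u ends before that of v starts."""
    return {(u, v) for u in periods for v in periods
            if u != v and periods[u][1] < periods[v][0]}


def rule1_violations(periods, exec_order):
    """Execution order edges u E-> w for which w T-> u holds (Rule 1 broken)."""
    return [(u, w) for (u, w) in exec_order
            if periods[w][1] < periods[u][0]]


def is_acyclic(nodes, edges):
    """Kahn's algorithm: acyclic iff every node can be removed in turn."""
    succ = {u: [] for u in nodes}
    indeg = {u: 0 for u in nodes}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(u for u in nodes if indeg[u] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return removed == len(nodes)


# w finishes before u starts, yet the execution order claims u before w.
periods = {"w": (0, 10), "u": (20, 30)}
bad_exec = {("u", "w")}
violations = rule1_violations(periods, bad_exec)
acyclic = is_acyclic(periods, bad_exec | physical_time_edges(periods))
```

In this toy execution the Rule 1 check and the global test agree: the edge from u to w is reported, and the graph contains the cycle formed with the physical time order edge from w to u.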
Order analysis is beyond both physical and logical time orders, since it has employed
the time global order. This approach is effective for tackling not only correct behaviors but
also erroneous behaviors of multiprocessor systems (e.g., cycles in the execution graph).
4. APPLICATIONS
In this section, we present two examples, memory consistency verification and event or-
dering problems, to illustrate the significance of our proposed concepts and approaches.
4.1 Memory Consistency Verification Problem
In most modern multiprocessor systems, the memory subsystem employs complex hierarchies with many hardware resources to support shared memory, and may therefore contain memory consistency and cache coherence bugs. Furthermore, parallel programs and compilers may also introduce memory consistency violations. Even a subtle memory consistency bug may lead to erratic behavior that is difficult to find and debug. The memory consistency verification problem aims at checking the execution of a parallel program against a given memory consistency model, and has received wide attention in both academia and industry. Gibbons and Korach proved that the VSC problem (verifying sequential consistency) is NP-hard with respect to the number of memory operations [12]. Further, they also studied the complexity of the VSC problem with additional information and constraints [13; 14], and their results include: 1) With read mapping, which maps every read to the write sourcing its value, the resulting VSC-read problem is still NP-complete. 2) With total write order, which totally orders the write operations to each memory location, the resulting VSC-write problem is also NP-complete. 3) With both read mapping and total write order, the resulting VSC-conflict problem belongs to P. However, read mapping imposes restrictions on the executing parallel program, while obtaining the total write order even requires adding specialized hardware [34; 9]. Some investigations obtain a relatively low complexity at the cost of losing completeness (these methods still require read mapping information) [17; 32; 33; 39]. Nevertheless, without total write order information, even the fastest of these incomplete methods still requires $O(n^3)$ time [33]. Henzinger et al. [18] sought a natural restriction on multiprocessor systems, which bounds the number of pending operations in the system, to enable formal verification of a high-level system description against memory consistency. However, that method targets small, manually constructed system models and cannot handle real designs.
In our previous work [6], we proposed a fast and complete memory consistency verification method. Our method requires neither read mapping nor total write order, which makes it easy to generalize. It only needs to observe the pending periods of part of the operations periodically, and assigns a pending period to each operation using the assignment analysis approach presented in Section 3.1. As shown in Algorithm 1, the framework of the algorithm in our method is inherited from [13]: it traverses the frontier graph to find a path from the starting frontier to the terminating frontier. At each frontier (node of the frontier graph), the CHECKING VIOLENCE function checks for cycles in the current execution graph. If any cycle is found in the current execution graph, the algorithm backtracks to the previous frontier and selects another branch to explore; if the current execution graph is proven to be acyclic, the algorithm moves forward to a next frontier. Each time the algorithm moves forward in the frontier graph, some execution order edges are added to the current execution graph; each time it backtracks, some execution order edges are removed from the current execution graph. Once the current frontier f reaches the terminating frontier and no cycle is found, the execution is proven to comply with the memory consistency model. If the current frontier f has backtracked to the starting frontier and no other branch can be selected, the execution is proven to violate the memory consistency model.

Algorithm 1: Algorithm Framework of Memory Consistency Verification
INT MEMORY CONSISTENCY VERIFICATION()
  f = starting frontier;
  while 1 do
    if CHECKING VIOLENCE(current execution graph) then
      if f == starting frontier then
        return 0;
      else
        REMOVE EDGE(current execution graph);
        f = BACKTRACK(f);
      end if
    else
      if f == terminating frontier then
        return 1;
      else
        f = SELECT BRANCH(f);
        ADD EDGE(current execution graph);
      end if
    end if
  end while
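The backtracking framework described above can be sketched in Python as a depth-first search with an explicit stack. This is only an illustration of the control flow, not the LCHECK implementation: the callbacks `successors`, `is_terminating` and `has_violation` are hypothetical stand-ins for the frontier graph and for the cycle check on the current execution graph, and the path of frontiers stands in for the edges added and removed along the way.

```python
def verify(start, is_terminating, successors, has_violation):
    """Search for a violation-free path of frontiers from `start` to a
    terminating frontier. Returns True if one exists, False otherwise."""
    if has_violation([start]):
        return False
    path = [start]                       # frontiers on the current path
    pending = [iter(successors(start))]  # unexplored branches per frontier
    while path:
        if is_terminating(path[-1]):
            return True
        nxt = next(pending[-1], None)
        if nxt is None:                  # no branch left here: backtrack
            path.pop()
            pending.pop()
            continue
        path.append(nxt)                 # move forward (edges are added)
        if has_violation(path):          # cycle found: undo the move
            path.pop()
        else:
            pending.append(iter(successors(nxt)))
    return False


# Toy frontier graph: two processors with one operation each.
def succ(f):
    return [f[:i] + (1,) + f[i + 1:] for i in range(2) if f[i] == 0]


def at_end(f):
    return f == (1, 1)


ok = verify((0, 0), at_end, succ, lambda path: (1, 0) in path)
bad = verify((0, 0), at_end, succ,
             lambda path: (1, 0) in path or (0, 1) in path)
```

In the first call the search rejects the branch through frontier (1, 0) and succeeds through (0, 1); in the second call both branches are rejected, so the search backtracks to the starting frontier and reports a violation.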
The complexity of memory consistency verification is the product of two factors: the complexity of traversing the frontier graph, and the complexity of checking for violations in the current execution graph at each frontier. Recall that, with pending period information, the frontier analysis presented in Section 3.2 bounds the number of frontiers in the frontier graph by $O(nC^{p-1})$, and the order analysis presented in Section 3.3 bounds the complexity of checking for cycles in the execution graph by $O(nCp)$. Therefore the overall time complexity of complete memory consistency verification is only $O(n^2C^pp)$.
It is worth noting that in erroneous multiprocessor systems, there may be bugs of improper performed times: an operation may not have been globally observed before its obtained end time. For example, in a directory based cache coherent system, a store may not be observed by other processors because of some error in the directory, so its actual performed time is later than the obtained end time. Although our memory consistency verification method expects the obtained pending period of each operation to contain its actual performed time, it does not require this as a certified precondition (though we have defined ), since it can detect both violations of the memory consistency model and improper performed times.
On the basis of our theoretical investigations, we have implemented a memory consistency verification tool for CMP, named LCHECK [6]. LCHECK can verify a number of memory consistency models [44], including sequential consistency, processor consistency, weak consistency and release consistency. It has become an important verification method for the functional validation of an industrial CMP called Godson-3 [22; 23], and has found many bugs in the memory subsystem of Godson-3.
4.2 Event Ordering Problems
Event ordering is another interesting topic related to multiprocessor systems. In different
candidate executions of one parallel program, two operations in the program may occur
in different orders. The event ordering problems investigate the inevitability or possibility
of the order between two operations. They are the theoretical foundations of many other important problems in multiprocessor systems, such as execution replay [24], software debugging [37; 31], intrusion detection [30], and so on.
In [35], Netzer and Miller gave a formal analysis of the event ordering problems. They defined the happened-before, concurrent-with, and ordered-with relations for operations. Each of the three ordering relations was further defined in two senses: must-have and could-have (similar to the universal and existential quantifiers in symbolic logic). The must-have sense requires that the ordering is guaranteed in all legal candidate executions of the program (with respect to the given memory consistency model), and the could-have sense requires that the ordering occurs in at least one legal candidate execution of the program. Netzer and Miller found that it is co-NP-hard to prove any of the must-have ordering relations and NP-hard to prove any of the could-have ordering relations. For the sake of brevity, here we only discuss the must-have happened-before (MHB) and could-have happened-before (CHB) relations in detail; the discussions of the other relations are similar.
According to Gibbons and Korach [13], each path from the starting frontier to the terminating frontier on the frontier graph represents a candidate execution of the program. Hence the event ordering problems, which depend on the candidate executions, can be investigated on the frontier graph of the parallel program. Assume that operation $u$ is an operation of processor $P_i$, operation $v$ is an operation of processor $P_j$, and $\mathcal{P}$ is the set of all paths from the starting frontier to the terminating frontier on the frontier graph. Then the must-have happened-before (MHB, denoted by “$\xrightarrow{MHB}$”) and could-have happened-before (CHB, denoted by “$\xrightarrow{CHB}$”) relations can be formalized as follows:
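$$u \xrightarrow{MHB} v \leftrightarrow \big(\forall P \in \mathcal{P} : (\exists f \in P : (u \xrightarrow{PO} P_i(f)) \wedge (P_j(f) \xrightarrow{PO} v))\big); \qquad (3)$$
$$u \xrightarrow{CHB} v \leftrightarrow \big(\exists P \in \mathcal{P} : (\exists f \in P : (u \xrightarrow{PO} P_i(f)) \wedge (P_j(f) \xrightarrow{PO} v))\big), \qquad (4)$$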
where $P_i(f)$ is the operation of frontier $f$ on processor $P_i$. In terms of the frontier graph, the must-have happened-before relation between $u$ and $v$ holds if, on each path from the starting frontier to the terminating frontier, there is a frontier $f$ whose operation on processor $P_i$ is after $u$ in processor order and whose operation on processor $P_j$ is before $v$ in processor order. Thus deciding the must-have happened-before relation between $u$ and $v$ is equivalent to a basic problem of graph theory: deciding whether all paths (paths on the frontier graph) from one node (the starting frontier) to another node (the terminating frontier) in a DAG (the frontier graph) must pass through at least one node of a given set (the set of frontiers which imply $u$ happens before $v$), which has time complexity $O(n_f + e_f)$ [4] (where $n_f$ is the number of nodes in the frontier graph and $e_f$ is the number of edges in the frontier graph). Therefore, the time complexity of the must-have happened-before problem is bounded from above by a linear function of the numbers of nodes and edges in the frontier graph.
Similarly, the could-have happened-before relation between $u$ and $v$, $u \xrightarrow{CHB} v$, holds if there exists a path from the starting frontier to the terminating frontier on the frontier graph that contains a frontier $f$ whose operation on processor $P_i$ is after $u$ in processor order and whose operation on processor $P_j$ is before $v$ in processor order. This is equivalent to another basic problem of graph theory: deciding whether there is a path (a path in the frontier graph) from one node (the starting frontier) to another node (the terminating frontier) in a DAG (the frontier graph) that passes through at least one node of a given set (the set of frontiers which imply $u$ happens before $v$), which has time complexity $O(n_f + e_f)$ [4]. Hence, the complexity of the could-have happened-before problem is bounded from above by a linear function of the numbers of nodes and edges in the frontier graph.
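Both graph problems can be solved with a constant number of breadth-first searches, which is one way to see the linear bound. The sketch below assumes the frontier graph is given as an adjacency dictionary and that the set of frontiers implying that $u$ happens before $v$ has already been computed; the names `must_pass` and `could_pass` are illustrative.

```python
from collections import deque


def reachable(start, succ, blocked=frozenset()):
    """Nodes reachable from `start` without entering any node in `blocked`."""
    if start in blocked:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        f = queue.popleft()
        for g in succ.get(f, ()):
            if g not in seen and g not in blocked:
                seen.add(g)
                queue.append(g)
    return seen


def reverse(succ):
    """Adjacency dictionary with every edge reversed."""
    pred = {}
    for f, gs in succ.items():
        for g in gs:
            pred.setdefault(g, []).append(f)
    return pred


def must_pass(start, end, succ, marked):
    """MHB-style query: every start-to-end path contains a marked node,
    i.e. `end` is unreachable once the marked nodes are removed."""
    return end not in reachable(start, succ, blocked=frozenset(marked))


def could_pass(start, end, succ, marked):
    """CHB-style query: some start-to-end path contains a marked node."""
    forward = reachable(start, succ)
    backward = reachable(end, reverse(succ))
    return any(f in forward and f in backward for f in marked)


# Diamond-shaped toy graph: s -> a -> t and s -> b -> t.
graph = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
```

On the toy graph, marking only node a makes the could-have query true and the must-have query false, while marking both a and b makes both true.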
From the above discussions, we find that the complexities of the two event ordering problems relate directly to the numbers of nodes and edges in the frontier graph. Similar discussions can be generalized to other event ordering problems. Given the restrictions of pending periods and physical time orders, the complexities of both event ordering problems are reduced to $O(nC^pp)$, since neither the number of nodes nor the number of edges of the frontier graph exceeds $O(nC^pp)$ with the additional information.
5. CONCLUSION
In this paper, we have presented a novel perspective on utilizing the global clock in
multiprocessor systems, demonstrating that the implications of the global clock, if sufficiently
exploited, can significantly influence the design and analysis of multiprocessor systems. As
pointed out in Section 1, a global clock in a multiprocessor system implies a
physical-time-based partial ordering of all operations in the system. Our investigations reveal
that this partial ordering not only provides useful information that supplements logical time
order information, but also provides natural constraints for localizing the inference between
operations. These natural constraints are defined explicitly as a partial order named physical
time order, which has been proven to be independent of and consistent with traditional logical
time orders.
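As a small illustration of the physical time order summarized above, the following Python sketch compares two operations by their pending periods. It rests on assumptions that this section does not fix: a pending period is modeled as a closed interval of integer global-clock cycles, and two periods are treated as disjoint when one ends strictly before the other starts. The names `PendingPeriod` and `physical_time_order` are hypothetical.

```python
from typing import NamedTuple, Optional


class PendingPeriod(NamedTuple):
    """Interval of global-clock cycles in which an operation starts and ends."""
    start: int
    end: int


def physical_time_order(u: PendingPeriod, v: PendingPeriod) -> Optional[int]:
    """Return -1 if u precedes v, 1 if v precedes u, None if the periods overlap.

    Assumption: periods are disjoint when one ends strictly before the other starts.
    """
    if u.end < v.start:
        return -1
    if v.end < u.start:
        return 1
    return None  # overlapping periods: physical time alone gives no order


load = PendingPeriod(start=0, end=3)
store = PendingPeriod(start=5, end=8)
fence = PendingPeriod(start=2, end=6)
print(physical_time_order(load, store))   # disjoint: load before store
print(physical_time_order(load, fence))   # overlapping: unordered
```

Operations whose periods overlap remain unordered by physical time, which is why the order is only partial.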
On the basis of the above views and concepts, we have proposed a number of approaches,
collectively named pending period analysis, each focusing on a different aspect of making the
use of the global clock practical. These approaches, together with the definitions of pending
period and physical time order, provide solutions to the difficulties mentioned at the end of
Section 1.1. Concretely, the concept of pending period, which is a flexible relaxation of the
precise performed time, gives a feasible way of handling the hard-to-obtain precise performed
time. Moreover, the frontier analysis presented in Section 3.2 limits the complexity of
conjecturing or inferring the ordering relations by pruning the space of candidate executions,
and the order analysis presented in Section 3.3 further combines the physical and logical time
orders to infer the ordering relations inside overlapped pending periods. Finally, the
assignment analysis carried out in Section 3.1 demonstrates that observing the pending periods
of only part of the operations is enough to obtain the pending periods of all operations.
Pending period analysis has been applied to two problems in multiprocessor systems. One of
these problems, complete memory consistency verification, is reduced from NP-hard to a time
complexity of O(n2Cpp) with pending period analysis. This fast and complete memory consistency
verification method has been employed in industry. Moreover, the two event ordering problems,
which were proven to be Co-NP-hard and NP-hard respectively, can now be solved with a time
complexity of O(nCpp) if restricted by pending period information. We hope that more problems
in multiprocessor systems can be facilitated by the view, concepts and approaches proposed in
this paper.
REFERENCES
[1] Arvind and J. Maessen. “Memory Model = Instruction Reordering + Store Atomicity”. In Proceedings of
the 33rd International Symposium on Computer Architecture (ISCA’06), 2006.
[2] H. Cain and M. Lipasti. “Verifying Sequential Consistency Using Vector Clocks”. In Proceedings of the
14th ACM Symposium on Parallel Algorithms and Architectures (SPAA’02), 2002.
[3] J. Cantin, M. Lipasti, and J. Smith. “The Complexity of Verifying Memory Coherence”. In Proceedings
of the 15th ACM Symposium on Parallel Algorithms and Architectures (SPAA’03), 2003.
[4] G. Chartrand and P. Zhang. Introduction to Graph Theory. McGraw-Hill Press, 2004.
[5] P. Chatterjee, H. Sivaraj, and G. Gopalakrishnan. “Shared Memory Consistency Protocol Verification
Against Weak Memory Models: Refinement via Model-Checking”. In Proceedings of the 14th International
Conference on Computer Aided Verification (CAV’02), 2002.
[6] Y. Chen, Y. Lv, W. Hu, T. Chen, H. Shen, P. Wang, and H. Pan. “Fast Complete Memory Consistency Ver-
ification”. In Proceedings of the 15th International Symposium on High-Performance Computer Architecture
(HPCA’09), 2009.
[7] W. Collier. Reasoning About Parallel Architectures. Prentice-Hall Press, 1992.
[8] A. Condon, M. Hill, M. Plakal, and D. Sorin. “Using Lamport Clocks to Reason About Relaxed Memory
Models”. In Proceedings of the 5th International Symposium on High-Performance Computer Architecture
(HPCA’99), 1999.
[9] A. DeOrio, I. Wagner, and V. Bertacco. “Dacota: Post-silicon Validation of the Memory Subsystem in
Multi-core Designs”. In Proceedings of the 15th International Symposium on High-Performance Computer
Architecture (HPCA’09), 2009.
[10] M. Dubois, C. Scheurich, and F. Briggs. “Memory Access Buffering in Multiprocessors”. In Proceedings
of the 13th International Symposium on Computer Architecture (ISCA’86), 1986.
[11] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. “Memory Consistency
and Event Ordering in Scalable Shared-Memory Multiprocessors”. In Proceedings of the 17th International
Symposium on Computer Architecture (ISCA’90), 1990.
[12] P. Gibbons and E. Korach. “The Complexity of Sequential Consistency”. In Proceedings of the 4th IEEE
Symposium on Parallel and Distributed Processing (SPDP’92), 1992.
[13] P. Gibbons and E. Korach. “On Testing Cache-Coherent Shared Memories”. In Proceedings of the 6th
ACM Symposium on Parallel Algorithms and Architectures (SPAA’94), 1994.
[14] P. Gibbons and E. Korach. “Testing Shared Memories”. SIAM Journal on Computing, Vol. 26, No. 4,
pp. 1208-1244, 1997.
[15] J. Goodman. “Cache Consistency and Sequential Consistency”. Technical Report No. 61, SCI
committee, 1989.
[16] G. Gopalakrishnan, Y. Yang, and H. Sivaraj. “QB or not QB: An Efficient Execution Verification Tool
for Memory Orderings”. In Proceedings of the 16th International Conference on Computer Aided Verification
(CAV’04), 2004.
[17] S. Hangal, D. Vahia, C. Manovit, J. Lu, and S. Narayanan. “Tsotool: A Program for Verifying Memory
Systems Using the Memory Consistency Model”. In Proceedings of the 31st International Symposium on
Computer Architecture (ISCA’04), 2004.
[18] T. Henzinger, S. Qadeer, and S. Rajamani. “Verifying Sequential Consistency on Shared-memory
Multiprocessor Systems”. In Proceedings of the 11th International Conference on Computer Aided Verification
(CAV’99), 1999.
[19] M. Herlihy and J. Wing. “Linearizability: a Correctness Condition for Concurrent Objects”. ACM
Transactions on Programming Languages and Systems, Vol. 12, No. 3, 1990.
[20] M. Herlihy and J. Moss. “Transactional Memory: Architectural Support for Lock-Free Data Structures”.
In Proceedings of the 20th International Symposium on Computer Architecture (ISCA’93), 1993.
[21] W. Hu. Shared Memory Architecture. Doctoral dissertation, Institute of Computing Technology, Chinese
Academy of Sciences, Beijing, 1996.
[22] W. Hu, J. Wang, X. Gao, and Y. Chen. “Micro-architecture of Godson-3 Multi-Core Processor”. In
Proceedings of the 20th Hot Chips, 2008.
[23] W. Hu, J. Wang, X. Gao, Y. Chen, Q. Liu, and G. Li. “Godson-3: A Scalable Multicore RISC Processor
with x86 Emulation”. IEEE Micro, Vol. 29, No. 2, 2009.
[24] D. Hower and M. Hill. “Rerun: Exploiting Episodes for Lightweight Memory Race Recording”. In
Proceedings of the 35th International Symposium on Computer Architecture (ISCA’08), 2008.
[25] N. Jouppi and D. Wall. “Available Instruction-Level Parallelism for Superscalar and Superpipelined
Machines”. In Proceedings of the 3rd International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS’89), 1989.
[26] P. Keleher, A. Cox, and W. Zwaenepoel. “Lazy Release Consistency for Software Distributed Shared
Memory”. In Proceedings of the 19th International Symposium on Computer Architecture (ISCA’92), 1992.
[27] L. Lamport. “Time, Clocks, and the Ordering of Events in a Distributed System”. Communications of
the ACM, Vol. 21, No. 7, pp. 558-565, 1978.
[28] L. Lamport. “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Pro-
grams”. IEEE Transactions on Computers, Vol. 28, No. 9, pp. 690-691, 1979.
[29] A. Landin, E. Hagersten, and S. Haridi. “Race-free Interconnection Networks and Multiprocessor Con-
sistency”. In Proceedings of the 18th International Symposium on Computer Architecture (ISCA’91), 1991.
[30] T. Leblanc and J. Mellor-Crummey. “Debugging Parallel Programs with Instant Replay”. IEEE Trans-
actions on Computers, Vol. 36, No. 4, pp. 471-482, 1987.
[31] S. Lu, J. Tucek, F. Qin, and Y. Zhou. “AVIO: Detecting Atomicity Violations via Access Interleaving
Invariants”. In Proceedings of the 12th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS’06), 2006.
[32] C. Manovit and S. Hangal. “Efficient Algorithms for Verifying Memory Consistency”. In Proceedings
of the 17th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’05), 2005.
[33] C. Manovit and S. Hangal. “Completely Verifying Memory Consistency of Test Program Executions”. In
Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA’06),
2006.
[34] A. Meixner and D. Sorin. “Dynamic Verification of Sequential Consistency”. In Proceedings of the 32nd
International Symposium on Computer Architecture (ISCA’05), 2005.
[35] R. Netzer and B. Miller. “On the Complexity of Event Ordering for Shared-Memory Parallel Program
Executions”. In Proceedings of the International Conference on Parallel Processing (ICPP’90), 1990.
[36] M. Plakal, D. Sorin, A. Condon, and M. Hill. “Lamport Clocks: Verifying a Directory Cache-Coherence
Protocol”. In Proceedings of the 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA’98),
1998.
[37] E. Pozniansky and A. Schuster. “Efficient On-the-fly Data Race Detection in Multithreaded C++
Programs”. In Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming (PPoPP’03), 2003.
[38] T. Riegel, P. Felber, and C. Fetzer. “A Lazy Snapshot Algorithm with Eager Validation”. In Proceedings
of the 20th International Symposium on Distributed Computing (DISC’06), 2006.
[39] A. Roy, S. Zeisset, C. Fleckenstein, and J. Huang. “Fast and Generalized Polynomial Time Memory
Consistency Verification”. In Proceedings of the 18th International Conference on Computer Aided Verifica-
tion (CAV’06), 2006.
[40] C. Scheurich and M. Dubois. “Correct Memory Operation of Cache-Based Multiprocessors”. In
Proceedings of the 14th International Symposium on Computer Architecture (ISCA’87), 1987.
[41] D. Shasha and M. Snir. “Efficient and Correct Execution of Parallel Programs that Share Memory”. ACM
Transactions on Programming Languages and Systems, Vol. 10, No. 2, pp. 282-312, 1988.
[42] A. Singla, U. Ramachandran, and J. Hodgins. “Temporal Notions of Synchronization and Consistency
in Beehive”. In Proceedings of the 9th ACM Symposium on Parallel Algorithms and Architectures (SPAA’97),
1997.
[43] M. Spear, V. Marathe, W. Scherer III, and M. Scott. “Conflict Detection and Validation Strategies for
Software Transactional Memory”. In Proceedings of the 20th International Symposium on Distributed Computing
(DISC’06), 2006.
[44] R. Steinke and G. Nutt. “A Unified Theory of Shared Memory Consistency”. Journal of the ACM, Vol.
51, No. 5, 2004.
