This paper establishes the necessary and sufficient condition for correct clock resetting such that the functionality of vector clocks is preserved. A clock reset protocol is presented, and its applicability and limitations are discussed. Our result indicates that for some applications, the potential for clock overflow can be completely prevented by carefully choosing the condition for initiating the clock reset protocol.
No common system-wide clock exists in distributed systems. Its absence makes it difficult to reason about the temporal ordering of events occurring in these systems. Lamport [15] defined the happened-before relation, which is a partial ordering on the set of system events. To realize this relation, Fidge [8] and Mattern [18] introduced the vector clock scheme, in which each event is timestamped with the value of the vector clock maintained by the process where the event occurs. By comparing the timestamps of events, we can determine the causal relation between events in the system. It has been proven that, given n processes, the vector clock size must be at least n in order to characterize the causality of events [4]. Singhal and Kshemkalyani also presented an efficient implementation of vector clocks [24]. Applications of vector clocks have been found in the field of distributed debugging [18, 12], especially in detecting global predicates [7, 11, 5, 10, 6]. Vector clocks have also been used in developing protocols that guarantee causal ordering among messages [2, 23], in rollback recovery [19], and in many other distributed applications [20].
When using vector clocks, it is difficult to determine the optimal number of bits used to implement the integer variables for clock values. If the number is too small, overflow is imminent; if it is too large, the extra cost of storing and maintaining clocks becomes intolerable. One straightforward strategy to escape this dilemma is to roughly estimate the necessary number of bits, and to reset the vector clocks at all processes when any of them is about to overflow. The purpose of this paper is to establish the necessary and sufficient condition for correct clock resetting such that the functionality of vector clocks is preserved. Our result indicates that for some applications, potential clock overflow can be completely prevented by carefully choosing the condition for initiating the clock reset protocol.
Section 3 presents and proves a necessary and sufficient rule for correct clock resetting. The implementation details of this rule are given in Section 4, and its applicability and limitations are discussed in Section 5. Section 6 concludes this paper.
Let n be the number of processes in a distributed system. Each process $P_i$ has its own vector clock $V_i$, which is defined as an n-element integer vector. Every event e occurring in $P_i$ is assigned a timestamp $T(e)$, which is the content of $V_i$ at the instant that e occurs.
The rule by which each process evolves its clock and timestamps its events is as follows.
(1) When an internal or sending event e occurs at process $P_i$, $V_i[i]$ is increased by one, and the result is then assigned to $T(e)$.
(2) Every message m carries a timestamp $T(m)$ equal to the timestamp of the event that sends m. When message m is received by process $P_i$, $V_i[i]$ is incremented by one, and for all $k \neq i$, $V_i[k] := \max(V_i[k], T(m)[k])$. The timestamp of this receipt event is then set to the resulting $V_i$.
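A minimal Python sketch of the two update rules may help. It assumes 0-based process indices and plain lists as clock vectors; the class and method names (`VectorClock`, `local_or_send`, `receive`) are illustrative, not taken from the paper.

```python
from typing import List


class VectorClock:
    """Vector clock V_i of process P_i in a system of n processes (0-based ids)."""

    def __init__(self, i: int, n: int) -> None:
        self.i = i
        self.v: List[int] = [0] * n

    def local_or_send(self) -> List[int]:
        # Rule (1): increment V_i[i]; the event's timestamp is the updated clock.
        self.v[self.i] += 1
        return list(self.v)

    def receive(self, t_m: List[int]) -> List[int]:
        # Rule (2): increment V_i[i], then take the component-wise maximum
        # with the message timestamp T(m) for every k != i.
        self.v[self.i] += 1
        for k, tk in enumerate(t_m):
            if k != self.i:
                self.v[k] = max(self.v[k], tk)
        return list(self.v)


# Usage: P_0 sends a message to P_1.
p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
t_send = p0.local_or_send()   # [1, 0]
t_recv = p1.receive(t_send)   # [1, 1]
```

Returning a copy of the clock keeps each event's timestamp frozen even though the clock itself keeps advancing.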
Let $T_i$ and $T_j$ represent two distinct timestamps, between which a less-than ordering relation ($<$) is defined [24] as $T_i < T_j$ if and only if $T_i \neq T_j \wedge \forall k: T_i[k] \le T_j[k]$. The set of all events possesses the vector clock property [18]; that is, for any two events a and b, $a \rightarrow b \iff T(a) < T(b)$.
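The less-than relation, and the concurrency test it induces, can be written directly from the definition. This is an illustrative sketch that assumes both timestamps have the same length n; the function names are ours.

```python
from typing import Sequence


def less_than(a: Sequence[int], b: Sequence[int]) -> bool:
    """T_a < T_b iff T_a != T_b and T_a[k] <= T_b[k] for every k."""
    return list(a) != list(b) and all(x <= y for x, y in zip(a, b))


def concurrent(a: Sequence[int], b: Sequence[int]) -> bool:
    """Distinct timestamps ordered in neither direction belong to concurrent events."""
    return list(a) != list(b) and not less_than(a, b) and not less_than(b, a)
```

By the vector clock property, `less_than(T(a), T(b))` answers whether a happened before b, and `concurrent` identifies causally unrelated events.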
Note that both the happened-before and the less-than relations are precedence relations [17], which means that they are antisymmetric and transitive. Many researchers have extended these relations to include the reflexivity property, so that the terminology and results pertaining to partial ordering relations can be applied.
We call the action that a process takes to reset its vector clock a reset event. A collection of reset events, one from each process, constitutes a reset cut. The setup of a new reset cut concludes the current timestamping phase and starts a new one. Let us begin with Phase 1 and let $E_i$ be the set of all events occurring in Phase i. It … Figure 1).
A reset cut partitions the progression of clocks into two separate parts. However, a forward or a backward message that crosses some reset line can break the vector clock property. The delivery of a forward message propagates obsolete information about the sender's clock to the receiver, causing the receiver to update its clock incorrectly. So it is possible that there exist two events $e_i$ and $e_j$ in the same timestamping phase such that $T(e_i) < T(e_j)$ but $e_i \not\rightarrow e_j$. As an example, consider the time-space diagram shown in Figure 1. Suppose that each process $P_i$ resets its respective vector clock at time $t_i$. The delivery of $m_1$ causes the timestamp of event b, which is the receipt of $m_1$, to leap to [7, 2, 7]. We can see that $T(a) < T(b)$ but $a \not\rightarrow b$.
The delivery of a backward message can also break RVCP. Message $m_2$ in Figure 1 is such an example: we can see that $c \rightarrow d$ but $T(c) \not< T(d)$. This problem arises because the clock advance contributed by the reception of a backward message vanishes when this clock is later reset. Evidently, it is necessary to preclude both forward and backward messages to guarantee RVCP. This is explicitly stated as follows.
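The forward-message anomaly can be reproduced numerically. The sketch below uses hypothetical timestamp values (not those of Figure 1) for three processes: a message sent in Phase 1 is received after all clocks have been reset, and the receipt event then appears to follow an unrelated Phase-2 event.

```python
from typing import List


def receive(v: List[int], i: int, t_m: List[int]) -> List[int]:
    """Receipt rule at P_i: bump V_i[i], then take max with T(m) for k != i."""
    v = list(v)
    v[i] += 1
    for k, tk in enumerate(t_m):
        if k != i:
            v[k] = max(v[k], tk)
    return v


def less_than(a: List[int], b: List[int]) -> bool:
    return a != b and all(x <= y for x, y in zip(a, b))


# Phase 1 (hypothetical values): P_0 sends m carrying timestamp T(m) = [5, 3, 4].
t_m = [5, 3, 4]

# All processes now reset their clocks while m is still in transit,
# so m crosses the reset line as a forward message.
v1 = [0, 0, 0]  # P_1's clock after reset
v2 = [0, 0, 0]  # P_2's clock after reset

# Phase 2: P_2 performs an internal event a.
v2[2] += 1
t_a = list(v2)                 # [0, 0, 1]

# Phase 2: P_1 receives the obsolete m; this receipt is event b.
t_b = receive(v1, 1, t_m)      # [5, 1, 4]

# No phase-2 message connects a to b, yet their timestamps say a precedes b.
print(less_than(t_a, t_b))     # True
```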
Reset Rule: Messages must not cross any reset line.
All messages sent from process $P_i$ to process $P_j$ before $P_i$ resets its clock must be received before $P_j$ has reset its clock, thus precluding forward messages.
All messages sent from process $P_i$ to process $P_j$ after $P_i$ resets its clock must be received after $P_j$ has reset its clock, thus precluding backward messages.
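Whether a message obeys the Reset Rule can be checked offline against a time-space diagram. In this sketch we assume, purely for analysis, that send, receive, and reset instants are known on a common global time axis with no ties; the `Message` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Message:
    send_time: float       # global instant at which the sender sends m
    recv_time: float       # global instant at which the receiver receives m
    sender_reset: float    # global instant of the sender's reset event
    receiver_reset: float  # global instant of the receiver's reset event


def classify(m: Message) -> str:
    """Return 'forward', 'backward', or 'ok' relative to the reset line."""
    sent_before = m.send_time < m.sender_reset
    received_before = m.recv_time < m.receiver_reset
    if sent_before and not received_before:
        return "forward"   # sent in the old phase, received in the new one
    if not sent_before and received_before:
        return "backward"  # sent in the new phase, received in the old one
    return "ok"


def obeys_reset_rule(messages: Iterable[Message]) -> bool:
    """True iff no message crosses the reset line in either direction."""
    return all(classify(m) == "ok" for m in messages)
```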
This rule demands synchronization between clock resetting and message receptions. In the next subsection we show that the reset rule is not only necessary but also sufficient to ensure RVCP.
The Reset Rule is Sufficient for RVCP
A set of events E, together with the happened-before relation $\rightarrow$ on E, constitutes an event structure $(E, \rightarrow)$ that represents a distributed computation. A causal chain defined on $(E, \rightarrow)$ is a sequence of events $e_1, e_2, \ldots, e_r$, where $r \ge 2$, such that $e_i \in E$ for $1 \le i \le r$, and $e_1 \rightarrow e_2, e_2 \rightarrow e_3, \ldots, e_{r-1} \rightarrow e_r$. The set of all possible causal chains starting at event a and ending with event b is denoted by $\Gamma(a, b)$. Let $\Sigma(\gamma)$ denote the set of all elements contained in a causal chain $\gamma$. A causal chain $\gamma: e_1, e_2, \ldots, e_r$ is said to be a closure of $\Gamma(e_1, e_r)$ if and only if $\forall i \in \{1, \ldots, r-1\}: \forall e' \in E - \Sigma(\gamma): e_i \not\rightarrow e' \vee e' \not\rightarrow e_{i+1}$.
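The closure definition, and the extension argument of Lemma 1 below, can be illustrated with a small constructive sketch. It assumes the happened-before relation is supplied as a predicate `hb` that is already transitively closed and that E is a finite list; `find_gap`, `is_closure`, and `extend_to_closure` are our own names, and this illustrates the idea rather than reproducing the paper's proof.

```python
from typing import Callable, List, Optional, Sequence, Tuple

HappenedBefore = Callable[[str, str], bool]


def find_gap(chain: Sequence[str], events: Sequence[str],
             hb: HappenedBefore) -> Optional[Tuple[int, str]]:
    """Return (i, e') with e' outside the chain and e_i -> e' -> e_{i+1}, if any."""
    members = set(chain)
    for i in range(len(chain) - 1):
        for e in events:
            if e not in members and hb(chain[i], e) and hb(e, chain[i + 1]):
                return i, e
    return None


def is_closure(chain: Sequence[str], events: Sequence[str], hb: HappenedBefore) -> bool:
    return find_gap(chain, events, hb) is None


def extend_to_closure(chain: Sequence[str], events: Sequence[str],
                      hb: HappenedBefore) -> List[str]:
    """Insert intermediate events until the chain is a closure.

    Terminates for finite E: every insertion adds a new, distinct event.
    """
    chain = list(chain)
    gap = find_gap(chain, events, hb)
    while gap is not None:
        i, e = gap
        chain.insert(i + 1, e)  # e_i -> e -> e_{i+1}, so it is still a causal chain
        gap = find_gap(chain, events, hb)
    return chain


# Usage: a -> b -> c -> d (transitively closed), starting from the chain a, d.
order = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")}


def hb(x: str, y: str) -> bool:
    return (x, y) in order


events = ["a", "b", "c", "d"]
print(extend_to_closure(["a", "d"], events, hb))  # ['a', 'b', 'c', 'd']
```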
Lemma 1. Given a finite event set E and two events $a, b \in E$, each non-closure causal chain $\gamma \in \Gamma(a, b)$ can be extended to a closure of $\Gamma(a, b)$.
Proof: Let $\gamma: e_1, e_2, \ldots, e_r$ be a non-closure causal chain, where $a = e_1$ and $b = e_r$. It follows that there exists $i \in \{1, \ldots, r-1\}$ …
Case 2: $\exists a, b \in E_i: T(a) < T(b) \wedge a \not\rightarrow b$.
Since timestamps always take monotonically increasing values within the same timestamping phase, this case arises only if the timestamp of event b is illegitimately enlarged. The only way for a process to illegitimately enlarge its vector contents is to receive a forward message whose timestamp is obsolete but has entries larger than those of the receiver's clock. Message $m_1$ in Figure 1 is such an example. □
The Implementation of the Reset Rule
Theorem 1 implies that a correct reset cut must be a strongly consistent cut [9, 13], i.e., a consistent cut without in-transit messages (forward messages in terms of our definition). In the following we present a coordinating protocol that yields reset cuts on the fly.
The Algorithm
Our approach is inspired by Chandy and Lamport's distributed snapshot algorithm [3]. We assume that between any two processes there is at most one communication channel, which provides bidirectional, reliable, FIFO-ordered delivery. Message transmission delays are assumed to be arbitrary but finite.
Two kinds of control messages are used by our protocol: reset_req and reset_done. Each process $P_i$, which can operate in one of three modes (normal, mute, and stand-by), maintains a variable $S_i$ that records $P_i$'s current mode. Let $N(P_i)$ denote the set of processes having a communication channel connected to $P_i$. For each neighbor $P_j \in N(P_i)$, $P_i$ maintains a variable $S_{i,j}$ that records $P_j$'s mode as currently known by $P_i$. $S_i$ and $S_{i,j}$ are initialized to normal for all i and j.
The execution of our algorithm is triggered by some condition local to a process (discussed later). A process that starts the execution is called an initiator; it first sends the control message reset_req to each of its neighbors and then enters mute mode. A process operating in mute mode is not allowed to send application messages. Any non-initiator process $P_i$ operating in normal mode, on receiving reset_req for the first time, behaves like an initiator; that is, it sends a reset_req message to each of its neighbors and then enters mute mode. Meanwhile, it sets $S_{i,k}$ to mute, where $P_k$ is the process that sent reset_req to $P_i$. When $P_i$ in mute mode receives a control message from one of its neighbors, say $P_j$, it sets $S_{i,j}$ according to the message received: to mute on receiving reset_req, and to stand-by on receiving reset_done.
At $P_i$, when the values of all $S_{i,j}$ have changed from normal to mute or stand-by, $P_i$ resets its clock, sends the control message reset_done to all its neighbors, and then enters stand-by mode. A process $P_i$ operating in stand-by mode can send application messages to a neighbor $P_j$ only if it has recorded $P_j$'s mode as stand-by, i.e., $S_{i,j}$ = stand-by. Finally, after the values of all $S_{i,j}$ have changed to stand-by, $P_i$ sets all $S_{i,j}$ to normal and returns to normal mode, which indicates the completion of the current run of the protocol at this process. The detailed algorithm of the clock reset protocol executing in $P_i$ is shown in Figure 2. Note that both message-driven routines are atomic, i.e., not interruptible during their execution.
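The following Python sketch simulates the protocol as described in the text above; it is not a transcription of Figure 2. It is a hypothetical, single-threaded rendering: each handler runs to completion (which trivially gives the required atomicity), process ids are assumed to equal list indices, FIFO channels are modelled with `collections.deque`, and application traffic is represented only by the `can_send_app` check.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

NORMAL, MUTE, STANDBY = "normal", "mute", "stand-by"


class ResetProcess:
    """State of P_i in the reset_req / reset_done protocol (illustrative sketch)."""

    def __init__(self, pid: int, neighbors: List[int], n: int) -> None:
        self.pid = pid
        self.neighbors = list(neighbors)
        self.clock = [0] * n                              # V_i
        self.mode = NORMAL                                # S_i
        self.known = {j: NORMAL for j in self.neighbors}  # S_{i,j}
        self.outbox: List[Tuple[int, str]] = []           # (destination, control msg)
        self.resets = 0

    def _broadcast(self, kind: str) -> None:
        for j in self.neighbors:
            self.outbox.append((j, kind))

    def initiate(self) -> None:
        """Local trigger: send reset_req to every neighbor, then enter mute mode."""
        if self.mode == NORMAL:
            self._broadcast("reset_req")
            self.mode = MUTE

    def on_control(self, j: int, kind: str) -> None:
        """Handle a control message from neighbor P_j; runs to completion."""
        if kind == "reset_req" and self.mode == NORMAL:
            self.initiate()  # first reset_req: behave like an initiator
        self.known[j] = MUTE if kind == "reset_req" else STANDBY
        if self.mode == MUTE and all(s != NORMAL for s in self.known.values()):
            self.clock = [0] * len(self.clock)            # the reset event
            self.resets += 1
            self._broadcast("reset_done")
            self.mode = STANDBY
        if self.mode == STANDBY and all(s == STANDBY for s in self.known.values()):
            self.known = {k: NORMAL for k in self.neighbors}
            self.mode = NORMAL                            # this run is complete here

    def can_send_app(self, j: int) -> bool:
        """Whether P_i may currently send an application message to P_j."""
        if self.mode == NORMAL:
            return True
        if self.mode == STANDBY:
            return self.known[j] == STANDBY
        return False                                      # mute


def run(procs: List[ResetProcess]) -> None:
    """Deliver control messages over FIFO channels until none is in transit."""
    chans: Dict[Tuple[int, int], Deque[str]] = {}

    def flush(p: ResetProcess) -> None:
        for dst, kind in p.outbox:
            chans.setdefault((p.pid, dst), deque()).append(kind)
        p.outbox.clear()

    for p in procs:
        flush(p)
    while any(chans.values()):
        for (src, dst), q in list(chans.items()):
            if q:
                procs[dst].on_control(src, q.popleft())
                flush(procs[dst])


# Usage: a line topology P_0 - P_1 - P_2, with P_0 as the initiator.
procs = [ResetProcess(0, [1], 3), ResetProcess(1, [0, 2], 3), ResetProcess(2, [1], 3)]
for p in procs:
    p.clock = [4, 4, 4]
procs[0].initiate()
run(procs)
```

With concurrent initiators (for example, calling `initiate()` on both P_0 and P_2 before `run`), each process still resets exactly once in this simulation, matching the argument about concurrent initiations in the correctness discussion.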
Correctness Justification
We now justify that the presented protocol correctly implements the clock reset rule. For any two adjacent processes $P_i$ and $P_j$, $P_j$ does not send any application message to $P_i$ after it sends reset_req to $P_i$ and before it resets $V_j$. Because message delivery is FIFO, the delivery of $P_j$'s reset_req message at $P_i$ indicates that all application messages sent by $P_j$ before $V_j$ is reset have been received. Since $P_i$ resets its clock only after it has received reset_req along each incoming channel, our protocol precludes forward messages.
After resetting $V_i$, $P_i$ sends application messages to $P_j$ only after it has received reset_done from $P_j$, which indicates that $P_j$ has already reset its own clock. Therefore our protocol also precludes backward messages.
The process that initiates the reset protocol effectively plays the role of a diffusing source of reset_req messages. If two or more processes initiate the protocol simultaneously, there will be multiple sources spreading reset_req messages. Concurrent initiations of the protocol do not cause any correctness problem, since the correctness of the protocol relies only on the fact that reset_req messages are eventually spread over every communication channel, no matter how many diffusing sources there are.
Eliminating reset_done Messages
The control message reset_done is used to prevent the occurrence of backward messages. It can be eliminated if we modify the original protocol as follows.
(1) Each process now operates in only two modes: normal or mute.
(2) As soon as $P_i$ resets its clock, it is allowed to send application messages to any adjacent process.
(3) When $P_i$ operating in mute mode receives an application message m sent from $P_j$, it examines the value of $S_{i,j}$ to decide what action should be taken. If $S_{i,j}$ = normal, m must have been sent before $P_j$ reset $V_j$ and can therefore be accepted; if $S_{i,j}$ = mute, m must have been sent after $P_j$ reset $V_j$ and thus needs to be buffered. The buffered messages are not accepted or processed until $P_i$ resets $V_i$.
With this modification, potential backward messages are not inhibited by their senders. Instead, they are buffered at the receiver and are not processed until the receiver has reset its clock. The correctness of the original protocol is therefore preserved.
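The receiver-side rule of this variant can be sketched as follows. This is a simplified, hypothetical illustration: forwarding reset_req to neighbors is omitted, message payloads are plain strings, and "accepting" a message just means appending it to `delivered`.

```python
from typing import Dict, Iterable, List, Tuple

NORMAL, MUTE = "normal", "mute"


class BufferingReceiver:
    """Receiver side of the reset_done-free variant at P_i (sketch)."""

    def __init__(self, neighbors: Iterable[int], n: int) -> None:
        self.mode = NORMAL
        self.known: Dict[int, str] = {j: NORMAL for j in neighbors}  # S_{i,j}
        self.clock = [0] * n
        self.buffer: List[Tuple[int, str]] = []
        self.delivered: List[Tuple[int, str]] = []

    def on_reset_req(self, j: int) -> None:
        # Forwarding reset_req to neighbors is omitted in this sketch.
        self.mode = MUTE
        self.known[j] = MUTE
        if all(s == MUTE for s in self.known.values()):
            self._reset()

    def on_app_message(self, j: int, m: str) -> None:
        if self.mode == MUTE and self.known[j] == MUTE:
            # Sent after P_j's reset while P_i has not reset yet: buffer it.
            self.buffer.append((j, m))
        else:
            # Sent before P_j's reset: accept it immediately.
            self.delivered.append((j, m))

    def _reset(self) -> None:
        self.clock = [0] * len(self.clock)
        self.mode = NORMAL
        self.known = {j: NORMAL for j in self.known}
        # Buffered messages belong to the new phase; process them only now.
        self.delivered.extend(self.buffer)
        self.buffer.clear()
```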
This approach reduces message cost, but at the price of storage cost. If the storage available for buffering backward messages at some process is limited, all other processes sending messages to this process must be careful not to overrun its buffer. Suppose that $P_j$ has a buffer capable of storing r messages from $P_i$. After $P_i$ resets $V_i$ but before it receives any application message from $P_j$, $P_i$ has no way to tell whether $P_j$ has reset $V_j$, so the number of messages that $P_i$ may send to $P_j$ is limited to r. $P_i$ has to suspend sending to $P_j$ after it has sent r application messages to $P_j$, until it receives an application message from $P_j$, which indicates the completion of $V_j$'s reset.
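The sender-side limit can be expressed as a per-neighbor credit counter. In this hypothetical sketch, one `BoundedSender` instance is created when $P_i$ resets its clock, and `r` is the assumed buffer capacity that each neighbor reserves for $P_i$.

```python
from typing import Dict, Iterable


class BoundedSender:
    """Flow control at P_i after its reset, in the reset_done-free variant (sketch)."""

    def __init__(self, neighbors: Iterable[int], r: int) -> None:
        self.r = r                                            # assumed buffer size at P_j
        self.sent: Dict[int, int] = {j: 0 for j in neighbors}
        self.peer_reset: Dict[int, bool] = {j: False for j in self.sent}

    def can_send(self, j: int) -> bool:
        # Unlimited once P_j is known to have reset; otherwise at most r messages.
        return self.peer_reset[j] or self.sent[j] < self.r

    def send(self, j: int) -> None:
        if not self.can_send(j):
            raise RuntimeError(f"sending to P_{j} must wait: its buffer may be full")
        self.sent[j] += 1

    def on_app_message(self, j: int) -> None:
        # After P_i's reset, an application message from P_j implies V_j was reset.
        self.peer_reset[j] = True
```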
Discussion

The Triggering Condition of the Protocol
The triggering condition of the reset protocol must be set up so that vector clocks do not overflow before being reset. Mattern [18] showed that at any instant of time, $\forall i, j: V_i[i] \ge V_j[i]$. Therefore, if each process $P_i$ can ensure that its own clock entry $V_i[i]$ does not overflow, no overflow is possible. Generally, $V_i[i]$ is incremented by one every time a message is sent, a message is received, or an internal event occurs. In some applications, however, we are not concerned with the causality of internal events, and vector clocks do not advance on internal events. We can prevent clock overflow in this kind of application by constraining the number of messages allowed to be sent within each timestamping phase. Specifically, each process counts the messages it sends on a per-channel basis, and initiates the reset protocol when the count for some channel reaches a predefined limit for that channel. Since bounding the number of messages every process sends also bounds the number of messages each process can receive, there is no need to constrain receptions separately. Let $L_{i,j}$ denote the maximal number of messages allowed to be sent from $P_i$ to $P_j$. With our reset protocol, an overflow-free setting of $L_{i,j}$ must satisfy the following inequality:
$$\sum_{P_j \in N(P_i)} \left( L_{i,j} + L_{j,i} \right) \le t_i \quad \text{for every process } P_i \tag{2}$$
where $t_i$ denotes the maximal representable value of $V_i[i]$. Finding a triggering condition subject to (2) is not difficult. As an example, $L_{i,j}$ can be set to $\frac{1}{2}\min\left(\lfloor t_i / |N(P_i)| \rfloor, \lfloor t_j / |N(P_j)| \rfloor\right)$ for each $P_i$ and $P_j \in N(P_i)$. However, finding a triggering condition that both satisfies (2) and constrains message sending only when necessary is impossible without prior knowledge of the run-time behavior of the processes in the system.
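The example setting of $L_{i,j}$ and the resulting bound can be checked mechanically. The sketch below assumes symmetric neighbor lists and takes the floor of the halved minimum so that limits are integers. For every $P_i$, it checks that the messages $P_i$ may send and receive in one phase, $\sum_{P_j} (L_{i,j} + L_{j,i})$, do not exceed $t_i$. `SendTrigger` is a hypothetical per-channel counter implementing the triggering condition.

```python
from typing import Dict, List, Tuple

Limits = Dict[Tuple[int, int], int]


def channel_limits(t: Dict[int, int], nbrs: Dict[int, List[int]]) -> Limits:
    """L_{i,j} = floor(min(floor(t_i/|N(P_i)|), floor(t_j/|N(P_j)|)) / 2)."""
    return {
        (i, j): min(t[i] // len(nbrs[i]), t[j] // len(nbrs[j])) // 2
        for i in nbrs
        for j in nbrs[i]
    }


def overflow_free(t: Dict[int, int], nbrs: Dict[int, List[int]], L: Limits) -> bool:
    """Check that sends plus receives of each P_i per phase stay within t_i."""
    return all(sum(L[(i, j)] + L[(j, i)] for j in nbrs[i]) <= t[i] for i in nbrs)


class SendTrigger:
    """Per-channel send counter at P_i that signals when to initiate a reset."""

    def __init__(self, i: int, L: Limits) -> None:
        self.limit = {j: lim for (a, j), lim in L.items() if a == i}
        self.count = {j: 0 for j in self.limit}

    def record_send(self, j: int) -> bool:
        """Count one message sent to P_j; True means the limit has been reached."""
        self.count[j] += 1
        return self.count[j] >= self.limit[j]

    def on_reset(self) -> None:
        self.count = {j: 0 for j in self.count}


# Usage: line topology P_0 - P_1 - P_2 with 8-bit own-entry counters (t_i = 255).
t = {0: 255, 1: 255, 2: 255}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
L = channel_limits(t, nbrs)  # every L_{i,j} is 63 here
```

Here $P_1$ may send and receive at most $2 \cdot (63 + 63) = 252 \le 255$ messages per phase, so its own entry cannot overflow.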
The Insufficiency of RVCP
For some applications, we may have to compare timestamps of events occurring in different timestamping phases. RVCP does not help in this case, so we need an auxiliary function for such comparisons. Let a and b be two events occurring in timestamping phases i and j, respectively (i.e., $a \in E_i$ and $b \in E_j$), with $i < j$. It is impossible that b happens before a, since the reset protocol precludes backward messages. The auxiliary function therefore only needs to decide whether a happens before b or a and b are concurrent. Unfortunately, implementing such a function inevitably involves attaching to each event an additional variable indicating the current timestamping phase number, which has essentially the same unfavorable effect as adding an extra entry to the clock vector. Moreover, the variable storing the phase number may also overflow. How this problem can be dealt with depends on the available domain knowledge about the application. For example, if a vector-clock application never needs to examine the causal relationship between events that are two or more timestamping phases apart, a three-valued counter is sufficient to represent the current timestamping phase number. Let $\{1, 2, 3\}$ be the set of possible values. We can … These three values can thus be used cyclically without worrying about overflow.
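The following sketch uses one possible cyclic ordering of the phase numbers, which is our own assumption rather than the paper's stated rule: phase p is immediately followed by phase p mod 3 + 1. Under the stated restriction that compared events are never two or more phases apart, this suffices to order any two phase numbers. The function names are hypothetical.

```python
PHASES = (1, 2, 3)


def next_phase(p: int) -> int:
    """Cyclic successor in {1, 2, 3}: 1 -> 2 -> 3 -> 1."""
    return p % 3 + 1


def compare_phases(p: int, q: int) -> int:
    """Return 0 if p == q, -1 if p is the phase just before q, 1 if just after.

    Only meaningful if compared events are never two or more phases apart.
    """
    if p not in PHASES or q not in PHASES:
        raise ValueError("phase numbers must be 1, 2 or 3")
    if p == q:
        return 0
    return -1 if next_phase(p) == q else 1


def may_happen_before(phase_a: int, phase_b: int) -> bool:
    """An event in phase_a can precede one in phase_b only if phase_a is not
    later, because the reset protocol rules out backward messages."""
    return compare_phases(phase_a, phase_b) <= 0
```

Within a single phase, the ordinary vector comparison still decides causality; across adjacent phases, this check only rules out "b happened before a", leaving the remaining decision to the auxiliary function discussed above.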
Applicability of the Protocol
In the application of preserving causal message ordering [2, 21, 23], vector clocks advance only when a message is sent or received, so clock overflow can be completely prevented by setting up a triggering condition that satisfies (2). Vector clocks in this kind of application are mainly used to timestamp messages. Timestamps are examined by receiver processes to determine whether received messages can be delivered. Once a message has been delivered, its timestamp becomes useless and can be safely discarded. Since our protocol flushes all messages between timestamping phases, no two timestamps or vector clock contents from different timestamping phases are ever compared. Therefore, the insufficiency of RVCP does not cause a problem.
In some applications, vector clocks advance only when a message is sent or received, and are used to timestamp events or states as well as messages. However, only certain types of events or states, and hence their timestamps, are of interest. For example, in applications that exploit vector clocks to detect global predicates [11, 5, 10, 6], only local states that satisfy some particular local predicate are of interest. Their timestamps are collected locally, and from them either a centralized checker process or a set of cooperating processes identifies a consistent global state satisfying the global predicate. Since such clocks do not advance on internal events, clock overflow will not happen if we set up triggering conditions satisfying (2). However, as noted in Section 5.2, the insufficiency of RVCP does pose a problem in identifying consistent global states. We believe that this problem has no satisfactory solution other than using extra counter bits to represent phase numbers.
In other applications, clocks advance even on internal events, and timestamps from different timestamping phases must be examined. Here the insufficiency of RVCP remains an inherent problem. Since internal events may occur arbitrarily, it is impossible to prevent clock overflow unless a process's computation can be suspended during the execution of the reset protocol. Suspending a process's computation is usually considered unacceptable. Therefore, clock resetting is not appropriate for this kind of application.
Related Work
Our necessary and sufficient condition for clock reset can be related to the consistent snapshot recording problem [3], in the sense that a correct reset cut forms a consistent snapshot with no in-transit messages if each reset event is viewed as an event that takes a local snapshot. However, existing solutions to this problem [3, 14, 16, 13, 1] do not preclude in-transit messages, and thus cannot be adopted as a reset protocol. Fischer et al. [9] proposed a method of taking strongly consistent (i.e., with no in-transit messages) global checkpoints for distributed transaction systems. Their method is suitable only for off-line analysis of the entire system, and thus cannot be used to produce reset cuts on the fly.
Birman et al. proposed a flushing protocol that is used to cope with changes of group membership in a process group [2]. After executing the flush protocol, all processes can reset their clocks, so clock overflow can be prevented in this way. Our method is similar to their timestamp reinitializing technique. Both approaches use a two-phase flushing protocol to conclude a timestamping phase and start the next one. Additionally, both require that message sending be inhibited during the execution of the flushing protocol.
However, Birman's method resets vector clocks after the flush protocol is completed, whereas ours does so as soon as the first phase of the flushing protocol has been completed. Moreover, in our protocol, after resetting its clock a process can start communicating with another process provided that the former has been informed of the latter's reset (as explained in Section 4.1). As a consequence, the period during which message sending is inhibited is much shorter in our protocol.
Concluding Remarks
For many vector-clock applications, our scheme relieves us of the difficult task of determining the optimal number of bits to implement vector clocks. One only has to determine a triggering condition for the reset protocol such that the overhead of enforcing strong consistency between phases can be tolerated while clock overflow is prevented.
Although we are primarily concerned with vector clock reset, the established result can also be applied to matrix clocks [20, 22].
Author Biographies
Li-Hsing Yen received the B.S. and M.S. degrees in computer science and information engineering, both from National Chiao Tung University, Hsinchu, Taiwan, in 1989 and 1991, respectively. Since September 1993, he has been a Ph.D. student in the Department of Computer Science and Information Engineering at National Chiao Tung University, Hsinchu, Taiwan. His current research interests include distributed algorithms, program testing and verification, and mobile computing. E-mail: lsyen@csie.nctu.edu.tw.