The precise temporal attribution of environmental events and measurements, as well as the precise scheduling and execution of corresponding reactions, is of utmost importance for networked sensor/actuator systems. Beyond that, achieving a well-synchronized cooperation and interaction of these wirelessly communicating distributed systems is yet another challenge. This paper summarizes various related problems which mainly result from the discretization of time in digital systems. As an improvement, we'll present a novel technique for the automatic creation of highly precise event timestamps, as well as for the scheduling of related (re-)actions and processes. By integrating it into an operating system kernel at the lowest possible software level, we achieve a symmetric error interval around an average temporal error close to 0 for both the timestamps and the scheduled reaction times. Based on this symmetry, we'll also introduce a dynamic self-calibration technique to achieve the temporally exact execution of the corresponding actions. An application example will show that our approach allows the clock drift between two (or more) independently running embedded systems to be determined without exchanging any explicit information, except for the mutual triggering of periodic interrupts.
INTRODUCTION
Wireless sensor/actuator networks (WSAN) are commonly deployed to observe and interact with their environment. In this respect, temporal and spatial information are the two most fundamental measures for the "attribution" or "tagging" of states and events (i.e. state transitions) within any observed environment. In this context, the states describe a set of physical and logical conditions at a given position and at a certain time, and they are specified by one or more continuous or discrete state values.
Recorded over a certain period of time, variations in the state values allow the detection and analysis of events and event patterns within the environment (Wittenburg et al., 2010; Römer, 2008). These variations not only indicate the events' spatial extent, propagation speed, and influence on the environment, but most commonly also allow the prediction of future states for both the observing system and its surroundings. In this regard, the interaction with the environment, which we already identified as the most central objective in sensor/actuator systems, typically requires precise knowledge of time and space to be associated with a node's self-captured and externally obtained values in order to properly correlate the contained information, and to trigger adequate reactions. In fact, measured or otherwise obtained environmental information is often useless unless it is associated with temporal and spatial information.
This paper starts with a discussion of various problems regarding time in digital systems. Next, we present a novel approach for taking timestamps, for measuring and specifying temporal delays, and for scheduling and ensuring reaction times with a symmetric temporal error around 0. Finally, a real-world test bed shows how periodically communicating embedded systems can determine their relative clock drifts without any additional information exchange.
TIME IN DIGITAL SYSTEMS
In contrast to specific position vectors and state values of e.g. sensor nodes (which can change sporadically, arbitrarily and independently from each other), time is a common property. It is system independent, and advances continuously with a globally constant rate of change. If the sensor nodes manage to establish a network-wide and consistent notion of time, this information provides a natural base for their joint interaction with each other and with the environment. Since processors in synchronous digital systems like sensor nodes are always driven by a clock generator C with frequency $f_C$ and period $\lambda_C = \frac{1}{f_C}$, time and time intervals can easily and individually be measured, at least in theory: If it is possible to count the number of elapsed clock periods since a well-defined point in time, e.g. the system start, each captured event $e$, e.g. indicated by an interrupt, can be attributed with its current counter value $c_e$. Consequently, the event's absolute local system time $\tilde{t}_e$ can easily be recovered by
$$\tilde{t}_e := c_e \cdot \lambda_C \qquad (1)$$
and the time difference (i.e. the delay) $\tilde{\Delta}_{e_1,e_2}$ between two events $e_1, e_2$ computes as
$$\tilde{\Delta}_{e_1,e_2} := \tilde{t}_{e_2} - \tilde{t}_{e_1} = (c_{e_2} - c_{e_1}) \cdot \lambda_C. \qquad (2)$$
Obviously, both the time $\tilde{t}_e$ and the delay $\tilde{\Delta}_{e_1,e_2}$ already involve a concept-inherent imprecision caused by the discretized counter values $c_e \in \mathbb{N}$. In addition, we silently assumed for Eq. (1) and Eq. (2) that $\lambda_C$ is perfectly known and constant. Neither assumption holds under real-world conditions, which leads to well-known problems that we'll address and counteract in this paper.
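The counter-based time recovery and delay computation referred to as Eq. (1) and Eq. (2) can be illustrated with a minimal sketch. The function names and the 1 MHz example frequency are assumptions chosen for illustration; exact rational arithmetic is used so that the sketch itself adds no rounding on top of the discretization.

```python
from fractions import Fraction


def tick_period(f_c_hz: int) -> Fraction:
    """Period lambda_C = 1 / f_C of the timer clock, in seconds."""
    return Fraction(1, f_c_hz)


def system_time(counter: int, f_c_hz: int) -> Fraction:
    """Local system time recovered from a counter value (counter * lambda_C)."""
    return counter * tick_period(f_c_hz)


def delay(c_e1: int, c_e2: int, f_c_hz: int) -> Fraction:
    """Delay between two events, computed from their counter values."""
    return (c_e2 - c_e1) * tick_period(f_c_hz)


# Hypothetical example: a 1 MHz timer, i.e. a resolution of 1 microsecond.
t_e = system_time(1_500_000, 1_000_000)   # 3/2 s after the counter start
d = delay(100, 350, 1_000_000)            # 250 ticks = 250 microseconds
```

Any event that occurs between two counter increments is mapped to the same value, which is exactly the discretization imprecision discussed in the text.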
Furthermore, as is often required for interactive systems, a reaction $r$ can be scheduled for a captured event $e$. Its intended execution time $t_r \in \mathbb{R}$ is commonly related to an event timestamp $t_e \in \mathbb{R}$ by the specification of a corresponding delay $\Delta_{e,r} \in \mathbb{R}$:
$$t_r := t_e + \Delta_{e,r} \qquad (3)$$
However, in the best case, i.e. if the (operating system) scheduler permits the timely switch to the responding task's context, the reaction will be triggered upon reaching the corresponding counter value $c_r \in \mathbb{N}$ and the corresponding system time $\tilde{t}_r$. In any case, the finally observable reaction delay depends on the resolution $\lambda_C$ of the system timer:
Although the inherent rounding imprecision is quite intuitive and introduces various hidden implementation problems in real systems, it is commonly ignored. Moreover, for concurrent task systems with dynamic execution flows, there is an additional error in $\tilde{t}_r$ which is neither constant nor predictable. Since most embedded operating systems silently accept even this problem, application developers are forced to compensate for the imprecision at the task level, where they have little control. SensorOS (Kuorilehto et al., 2007) at least tries to execute reactions in time by scheduling the responsible task earlier. Yet, the applied "delta time" is constant and does not adapt to changing system loads.
In the following we'll indicate and discuss the causes and effects of the mentioned problems in detail, and present an approach at the kernel level to reliably compensate the related imprecision in the average case.
Problem P1: Discretization of time. The difference between the true global time $t$ and the individual system time $\tilde{t}$ has already become visible in Eq. (1) and Eq. (4). While the first advances continuously, the use of a digital counter leads to a discretization of the latter, and imposes a resolution which directly depends on the counter's clock frequency $f_C$. This may lead to serious systematic errors for the time measurement and the subsequent scheduling of reactions:
The simple capturing of a timestamp $t$ for an event, the so-called timestamping, is immediately affected by some inevitable rounding, and suffers from a measurement error $E_t \in I_1$ with $|I_1| = \lambda_C$. For the naïve (and adverse) reading of the timer counter, rounding down results in $I_1 := [0, \lambda_C)$, and induces a symmetry around the average measurement error $E_{t,av} = \frac{1}{2}\lambda_C$. Depending on the use of such timestamps, the emerging errors might accumulate during the system runtime. Similarly, the explicit specification of delays $\Delta_t$ in software is also subject to rounding errors $E_\Delta$. However, we can round half up (e.g. according to DIN 1333) manually when selecting a delay, and thus the corresponding error is $E_\Delta \in I_3$:
$$I_3 := [-\tfrac{1}{2}\lambda_C, +\tfrac{1}{2}\lambda_C)$$
Though not avoidable entirely, I 3 is at least symmetric around 0, and the average error is 0.
Based on these two fundamental error intervals $I_1$ and $I_3$, other intervals can be derived, and consequently exhibit an imprecision, too: For the measurement of delays $\Delta$, we see the implicit compensation of the asymmetry in $I_1$: $E_\Delta \in I_2 := I_1 - I_1 = (-\lambda_C, +\lambda_C)$. In contrast, the scheduling of reaction times $t$ on external events inherits the asymmetry in $I_1$:
$$I_4 := I_1 + I_3 = [-\tfrac{1}{2}\lambda_C, +\tfrac{3}{2}\lambda_C)$$
System reactions will consequently suffer from an average systematic lateness of $\frac{1}{2}\lambda_C$. Table 1 summarizes the error types and their corresponding intervals which must be expected for the naïve capturing of timestamps by simply reading the timer register after the corresponding event occurrence (e.g. within an IRQ handler). The resulting effects, and our proposed solution to compensate this asymmetry, will be discussed later.
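The naïve error intervals can be checked numerically with a small sketch. It assumes that an error is measured as the true value minus the discretized value, that one tick equals one time unit, and that true times are sampled at evenly spaced sub-tick midpoints; all names and the sampling density are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

LAMBDA_C = Fraction(1)   # one timer tick; all errors are expressed in ticks
STEPS = 1000             # assumed sub-tick sampling density


def floor_to_tick(t):
    """Naive timestamp: round the true time down to the tick grid."""
    return (t // LAMBDA_C) * LAMBDA_C


def round_half_up(t):
    """Manually rounded delay specification (round half up)."""
    return ((t + LAMBDA_C / 2) // LAMBDA_C) * LAMBDA_C


# Sample true times at sub-tick midpoints so that the means are exact.
samples = [Fraction(2 * k + 1, 2 * STEPS) for k in range(STEPS)]

# Assumed error convention: true value minus discretized value.
e_timestamp = [t - floor_to_tick(t) for t in samples]   # lies in [0, 1)
e_delay = [t - round_half_up(t) for t in samples]       # lies in [-1/2, 1/2)

mean_timestamp_error = sum(e_timestamp) / STEPS   # half a tick
mean_delay_error = sum(e_delay) / STEPS           # zero
mean_reaction_error = mean_timestamp_error + mean_delay_error
```

Under these assumptions the timestamp error averages to half a tick, the rounded delay error averages to zero, and a reaction time built from both inherits the half-tick bias.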
Problem P2: Capturing of timestamps. The creation of reactive systems demands the precise assignment of timestamps for internal and external events. Reaching a voltage threshold at an analog-digital converter (ADC) or detecting a signal edge at an I/O pin are just two simple examples. However, most observable changes within the environment have one thing in common: They are indicated to the CPU at runtime by so-called interrupt requests (IRQs), and should be handled as soon and as fast as possible by the corresponding interrupt service routines (ISRs). Since ISRs are commonly higher privileged than regular application code, they will preempt the latter for their own execution. Thus, they seem to be perfectly suitable for capturing the timestamp for any emerging event. However, even the first instruction within each ISR is not executed before some additional delay, which is also known as interrupt latency $\Delta_{IRQ}$: If the timer value $c_{TS}$ for the timestamp itself is copied after another implementation-specific delay $\Delta_{ISR}$ within the ISR, then we can compute the discrete timestamp $\tilde{t}_e$ for the captured event $e$ as follows:
Hence, a prerequisite for reliable time tracking via Eq. (5) is that the correction value $\Delta_{TS}$ is constant and free from rounding errors with respect to the discrete system time period. As we will see, both can be achieved through careful code preparation.
Problem P3: Simultaneity and scheduling reliability. Although the perfectly simultaneous transition of two states can never occur in real systems (footnote 1), the surjective discretization of time can easily lead to the assignment of exactly the same system time for multiple events or scheduled actions. Since resource conflicts often prevent the truly parallel processing of events as well as the simultaneous execution of (re)actions, they usually lead to an implicit serialization. The order depends on the task scheduler and the internal task priorities. Since there is most commonly just a single IRQ controller, this is already true for the generation of timestamps. In fact, the maximum degree of parallelism is always limited by the number of available functional units. A reliable scheduling (e.g. for complying with hard real-time demands) must be achieved through either static techniques at development time or dynamic methods at runtime. A corresponding technique for dynamic resource management under real-time conditions is presented in (Baunach, 2012).
Footnote 1: The resolution of the time measurement must simply be chosen fine enough!
Problem P4: Imprecision in the timer frequency. Time measurement in digital systems is usually accomplished by using a pulse generator with a specified frequency $f_0$. Internally, this component uses an oscillator (most commonly a quartz crystal) to generate a periodic clock signal. The characteristics and stability of such oscillators depend significantly on their manufacturing parameters, age, and various environmental conditions like e.g. voltage variations (Hewlett Packard, 1997): A varying frequency drift $\Delta f$ must always be expected. Its relative error
$$\frac{\Delta f}{f_0}$$
is commonly expressed in units of parts per million (ppm). For simple low-cost quartzes, and within the typical temperature ranges of WSAN applications, the temperature sensitivity of a typical HC49 quartz can already result in deviations of ±20 ppm. Variations in the clock precision are especially critical in distributed applications. Since time measurement is initially individual for each involved system, the local clocks can drift apart. This may quickly generate inconsistent data, and must be compensated by adequate and repeated synchronization measures.
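To give a feeling for the magnitude of such deviations, the following minimal sketch converts between frequencies, ppm, and accumulated clock offset. The function names and the example frequencies are hypothetical; only the ±20 ppm figure is taken from the text.

```python
def relative_error_ppm(f_nominal_hz: float, f_actual_hz: float) -> float:
    """Relative frequency error (f_actual - f_nominal) / f_nominal in ppm."""
    return (f_actual_hz - f_nominal_hz) * 1e6 / f_nominal_hz


def accumulated_offset_us(drift_ppm: float, elapsed_s: float) -> float:
    """Clock offset in microseconds accumulated after elapsed_s seconds.

    1 ppm corresponds to 1 microsecond of deviation per elapsed second.
    """
    return drift_ppm * elapsed_s


# Hypothetical 8 MHz oscillator that actually runs 160 Hz too fast.
ppm = relative_error_ppm(8_000_000, 8_000_160)   # 20 ppm
offset = accumulated_offset_us(20, 100)          # 2000 us after 100 s
```

At 20 ppm, two otherwise identical nodes can thus disagree by 2 ms after only 100 s, which is three orders of magnitude above a 1 µs timeline resolution.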
Problem P5: Global time base and synchronization with other systems. When does time measurement actually start, i.e. when is or was time $t_0 = 0$? If we consider a completely independent system which uses the notion of time only for its internal operation, e.g. to capture events and to schedule actions by a partial order, the use of a purely local time with an arbitrary origin is absolutely sufficient, e.g. time $t_0 = 0$ may simply indicate the system start. However, as soon as time is of global relevance, e.g. if actions have to take place synchronized on different systems, a common time base is often indispensable. This immediately raises the question of which time or which system is used as reference. In any case, its provider should be highly available and exhibit a high clock stability and precision. Several methods exist for the actual synchronization: Some are based on (regular) time checks or on the measurement of the pairwise drift between the involved systems. Others rely on dedicated reference systems and allow the synchronization based on centrally triggered events like e.g. radio broadcasts (cf. GPS and the DCF77 protocol). Finally, distributed methods are available for multihop systems to successively achieve a common time base, e.g. via Desynchronization (Mühlberger, 2013).
AN ADVANCED TIME DISCRETIZATION APPROACH
Considering the aforementioned problems, which originate from the integration of time-awareness into digital systems, P1-P3 directly affect the environmental interaction and can be addressed by each system individually. In contrast, P4 and P5 require some information exchange with other systems. These "peers", however, are not necessarily available during the entire system runtime. For this reason, P1-P3 are treated locally at the embedded systems level (e.g. in the operating system kernel), while P4 and P5 must be addressed more globally (e.g. in the network layer or at application level).
At the embedded systems level, our approach relies on a hardware timer component to provide a local system timeline with a fixed temporal resolution. The timeline management is integrated directly into the OS kernel and accessible for all software layers through the OS API. This unifies the usage by application tasks and avoids execution time uncertainties caused by unpredictable code interleaving at runtime. Based on our approach, the kernel automatically captures a timestamp $\tilde{t}_e$ for each interrupt $e$, and compensates the error's asymmetry about $\frac{1}{2}\lambda_C$ which would result from using the naïve approach with $I_1 = [0, \lambda_C)$ as explained in Section 2. Therefore, the kernel as a hardware abstraction layer provides $I_1 = [-\frac{1}{2}\lambda_C, +\frac{1}{2}\lambda_C)$. While the timestamp measurement error $E_{\tilde{t}_e}$ will still be equally distributed over $I_1$, this interval is shifted, and the average timestamp error is reduced from initially $\frac{1}{2}\lambda_C$ down to 0. At the same time, the propagation and amplification of systematic errors for time-dependent reactions will also be kept low and symmetric about 0, i.e. $I_4 = I_1 + I_3 = [-\lambda_C, +\lambda_C)$. Table 1 compares the error intervals of our compensation approach with the naïve technique.
How can this symmetry be guaranteed? In order to deal with the related problems P1 and P2, we propose a concept based on two synchronized clocks with interdependent frequencies. We assume the CPU frequency to be higher than the timer frequency, and the system time to be derived from the quartz-stabilized CPU clock by an even integer divider. Commonly, these two requirements do not impose an unreasonable restriction on the hardware/software design: In fact, they are already satisfied in many systems, since usually only a single central oscillator is used as base for all other system clocks. While the CPU is commonly directly driven by this main clock, other components apply power-of-two dividers to derive their individual frequencies. Finally, and for computationally constrained embedded systems in particular, driving a local time with the maximum resolution would cause unnecessary CPU load (see footnote 4).
Besides the following formal description of our approach, we also refer to the example in Figure 1 for a comprehensive understanding. Initially, we denote the CPU clock frequency as $f$ and its period as $\lambda$. The system time frequency is denoted as $f_C$ and its period as $\lambda_C$. In addition, we demand
$$f = \alpha \cdot f_C, \quad \text{i.e.} \quad \lambda_C = \alpha \cdot \lambda, \quad \text{with an even divider } \alpha \in 2\mathbb{N}. \qquad (6)$$
If an interrupt $e$ occurs at time $t_e$, the corresponding timer counter $c_e$ will not be copied before some system inherent delay $\Delta_{TS}$ has passed. For our approach, we request this delay to take exactly $\Delta_c$ CPU cycles as follows:
$$\Delta_{TS} := \Delta_c \cdot \lambda \quad \text{with} \quad \Delta_c := n \cdot \alpha + \tfrac{1}{2}\alpha, \; n \in \mathbb{N}_0 \qquad (7)$$
Thus, the delayed acquisition of the timestamp takes place at time
$$t_e + \Delta_{TS} = t_e + (n + \tfrac{1}{2}) \cdot \lambda_C. \qquad (8)$$
To compensate for this delay, and to force the timestamp error interval $I_1$ to become symmetric around the true event occurrence time while also exhibiting an average error close to 0, we can select the correction value as an integer multiple of $\lambda_C$:
$$n \cdot \lambda_C = \Delta_{TS} - \tfrac{1}{2}\lambda_C \qquad (9)$$
Thus, we simply have to subtract $n$ from the copied timer value $c_e$ to compute the timestamp $\tilde{t}_e$ for the interrupt $e$:
$$\tilde{t}_e := (c_e - n) \cdot \lambda_C \qquad (10)$$
Footnote 4: The system time must be accumulated in software at every timer overflow. Especially for timers with small word widths, this can quickly lead to a huge performance penalty. For instance, a 16-bit counter counting at $f_C$ = 1 MHz will already overflow every 65.536 ms.
Since $c_e$ (timer driven) and $n$ (constant) are integers of architecture word width, their subtraction is easily accomplished. Besides, the result's resolution implicitly equals the resolution of the system time. However, we still have to prove the symmetry about 0 for the error intervals in Table 1:
Lemma 1. For our discretization approach the error intervals $I_1$, $I_2$, $I_3$, and $I_4$ for taking timestamps, for measuring and specifying delays, as well as for computing reaction times are symmetric about 0.
Proof. While $I_3$ is not affected by our novel approach, the demanded symmetry of $I_1$, $I_2$, and $I_4$ can easily be proven by some interval arithmetic. With Eq. (8) and Eq. (10), the expected error $E_{\tilde{t}_e}$ of the timestamp $\tilde{t}_e$ computes as
An Implementation Example
As an example, we'll take a look at the integration of our novel approach into the reference implementation of SmartOS (Baunach et al., 2007) for the MSP430F1611 (Texas Instruments Inc., 2006) MCU and the SNoW 5 sensor nodes (see Figure 1). While the main clock drives the CPU at $f$ = 8 MHz, the divider $\alpha = 8$ derives the frequency $f_C$ = 1 MHz for the system time with a resolution of 1 µs. According to Eq. (7), an adequate delay $\Delta_c$ between each interrupt occurrence and the acquisition of its timestamp can be adjusted through $n$:
$$\Delta_c = n \cdot 8 + 4 \quad \text{with } n \in \mathbb{N}_0 \qquad (11)$$
Listing 1 shows the standardized kernel ISR for any interrupt with number $e$: Since the CPU inherently delays the acceptance of an interrupt by $\Delta_{IRQ}$ = 6 CPU cycles, we have to select $n \geq 1$. In fact, we selected $n = 1$ and thus have to wait for an additional $\Delta_{ISR} := \Delta_c - \Delta_{IRQ}$ = 6 CPU cycles within the ISR ($1 \cdot 8 + 4 = 6 + 6$). According to the specification of the mov instruction, which is used for saving the timer value TS in Line 5, it takes 4 CPU cycles until the value is read from the special function register TIMER COUNTER. The remaining two cycles are filled up by nop instructions. After the acquisition of the counter value, the specific IRQ number is saved and the kernel mode is entered for the actual event handling.
To save CPU time, the ISR will only save the current 16-bit timer value, which indicates the delay since the last timeline update. The computation of the final absolute timestamp $\tilde{t}_e$ is initially avoided, and delayed until the application's event handler requests this information from the OS. Then, according to Eq. (9), $n$ is simply subtracted from the event's absolute counter value $c_e$, which in turn is the sum of the timeline and the just captured timer value TS. With Eq. (10) the result can directly be interpreted as absolute system time given in the timeline resolution of 1 µs:
$$\tilde{t}_e := (\text{timeline} + \text{TS} - n) \cdot 1\,\mu s \qquad (12)$$
Note that the applied computation is correct, as long as the IRQ handlers are always executed in kernel mode where further interrupts are disabled and neither the timeline nor TS will change concurrently.
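The effect of the half-tick acquisition delay can be reproduced with a small cycle-level model. This is a sketch under stated assumptions, not the kernel ISR: events are assumed to occur on CPU cycle boundaries, timer ticks are assumed to fall on multiples of α CPU cycles, and the error is taken as true event time minus captured timestamp. The function names are illustrative; α = 8 and n = 1 are the values from the text.

```python
ALPHA = 8   # CPU cycles per timer tick (f = 8 MHz, f_C = 1 MHz)
N = 1       # correction value in timer ticks


def timestamp_error_cycles(event_cycle: int, delta_c: int, n: int = N) -> int:
    """Error (true minus captured time) in CPU cycles for one event."""
    # The timer register is copied delta_c CPU cycles after the event.
    counter = (event_cycle + delta_c) // ALPHA
    # Subtract the correction value n and convert ticks back to CPU cycles.
    captured = (counter - n) * ALPHA
    return event_cycle - captured


def error_range(delta_c: int):
    """Minimum, maximum and mean error over all event phases within a tick."""
    errors = [timestamp_error_cycles(k, delta_c) for k in range(ALPHA)]
    return min(errors), max(errors), sum(errors) / ALPHA


compensated = error_range(N * ALPHA + ALPHA // 2)   # delta_c = 12
uncompensated = error_range(N * ALPHA)              # delta_c = 8
print(compensated)     # (-4, 3, -0.5)
print(uncompensated)   # (0, 7, 3.5)
```

In this model the delay of 12 cycles yields errors between -4 and +3 CPU cycles, i.e. within half a tick on either side of the event and with a mean of only -0.5 cycles, whereas a delay of 8 cycles yields errors between 0 and 7 cycles with a mean of 3.5 cycles, close to half a tick.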
EVALUATION AND APPLICATION EXAMPLE
The test bed for demonstrating the benefit of our timestamping approach consists of pairs of nodes A, B playing a kind of Ping Pong game as depicted in Figure 2: By a wired or wireless remote connection, one node, without loss of generality B, triggers an IRQ signal $e_0$ which is received and timestamped ($\tilde{t}_0$) by the other node A through the timestamping approach just presented. After some fixed delay $\Delta_{delay}$ the signal will be returned by node A, and in turn the other node B catches, timestamps, and returns the signal after the same delay $\Delta_{delay}$. Having received the last trigger $e_n$ with local timestamp $\tilde{t}_n$ in a perfect system, the observed delay $\tilde{\Delta}_{total,n}$ between each node's captured first and last signal timestamp should obviously equal the mathematically expected delay $\Delta_{total,n}$:
$$\tilde{\Delta}_{total,n} := \tilde{t}_n - \tilde{t}_0 = 2n \cdot \Delta_{delay} =: \Delta_{total,n} \qquad (13)$$
However, this equality will commonly not be observable in real systems. In fact each involved device will suffer from its own and the other device's imprecision:
First, the nodes use independent clocks which drift apart. Although $\Delta_{delay}$ is globally fixed, the nodes will therefore not defer their responses by exactly the same delay. Though our nodes' CPUs are driven by quartzes from the same lot, the clock drifts vary depending on the selected node pair and the environmental influences described before. In fact, Figure 3 shows significantly different drifts $d_{A,B}(t)$ for three node pairs measured over some time $t$ (see footnote 5). Second, the responses must be scheduled and initiated by the responsible task on each node. Therefore, these tasks compute their next intended local response time $t_r^*$ from each previously captured signal timestamp, and then sleep to release the CPU for other tasks. However, waking up sufficiently early to emit the signal in time is not that easy since some load-dependent and variable system overhead must always be taken into account.
Third, the base for each delay computation is never perfect since each captured timestamp $\tilde{t}_c$ exhibits an inherent error $E_{\tilde{t}_c} \in I_1$. While this cannot be avoided entirely as discussed before, the error should at least be about 0 µs in the average case according to Lemma 1.
Footnote 5: The nodes with IDs 10, 11, and 72 were arbitrarily selected from our pool. The drift was measured via an oscilloscope tracking the delay between two periodically triggered I/O pins at both nodes of each pair. As a plausibility test, the measured drift of each pair corresponds perfectly to the other pairs' drifts: 918
Signal TX and Self-Calibration
For the precisely timed signal emission, we propose a dynamic self-calibration scheme based on self-observation. To this end, the trigger signal will not only be captured by the other node, where it is tagged with the timestamp $\tilde{t}_c$, but also by the emitting node itself. We denote the corresponding local timestamp as $\tilde{t}_r$. If the intended local response time for the current iteration has been computed as $t_r^*$, the lateness can be computed afterwards and used as compensation value $\Delta_{comp}$ to adjust the delay for the next iteration at its emission time $t_r^*$:
In fact, the response time precision error ($E_{t_r^*} \in I_4$) depends not only on the two timestamps and their particular precision errors ($E_{\tilde{t}_r}, E_{\tilde{t}_c} \in I_1$), but also on the error in the measured delay ($E_{\Delta_{comp}} \in I_2$) and the hard-coded delay for the reply ($E_{\Delta_{delay}} \in I_3$) itself. Since we intentionally selected $\Delta_{delay} := m \cdot \lambda_C$ with $m \in \mathbb{N}$, at least this value is free from rounding errors, and $I_3$ collapses to $[0, 0]$ for this special application.
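The self-calibration idea can be sketched as a simple feedback loop. The update rule used here (add the observed lateness to the compensation value) and the idealized peer that replies after exactly the fixed delay are assumptions made for illustration; the function name and the example overheads are hypothetical as well.

```python
def run_self_calibration(overheads_us, delay_us=1_000_000):
    """Return the observed lateness (in microseconds) of each response.

    overheads_us: per-iteration system overhead between task wake-up and
    the actual signal emission (hypothetical values).
    """
    comp = 0          # compensation value, learned from self-observation
    t_c = 0           # local timestamp of the received trigger
    lateness_log = []
    for overhead in overheads_us:
        t_intended = t_c + delay_us      # intended response time t*_r
        wakeup = t_intended - comp       # wake up early by the compensation
        t_r = wakeup + overhead          # self-captured emission timestamp
        lateness = t_r - t_intended      # observed lateness of this response
        lateness_log.append(lateness)
        comp += lateness                 # adjust the delay for the next iteration
        t_c = t_r + delay_us             # idealized peer replies after delay_us
    return lateness_log


print(run_self_calibration([37, 37, 37, 40, 40]))   # [37, 0, 0, 3, 0]
```

In this model the first response is late by the full overhead, after which the lateness drops to 0 and reappears only briefly when the system load, and thus the overhead, changes.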
Pairwise Drift Calculation
For our tests we set up various node pairs A and B as depicted in Figure 2, and we were interested in each node's ($x \in \{A, B\}$) local timing error $e_x$, which was autonomously calculated by each node after $n$ iterations:
$$e_x := \tilde{\Delta}_{total,n} - \Delta_{total,n} \overset{\text{Eq. (13)}}{=} (\tilde{t}_n - \tilde{t}_0) - 2n \cdot \Delta_{delay} \qquad (16)$$
Obviously, the timing errors $e_A$ and $e_B$ have opposite signs unless the clocks are perfectly synchronous (then $e_A = e_B$ = 0 µs). Additionally, we define the symmetry error $e_{symm}$ as seen by an external observer as the average value over $e_A$, $e_B$. Since the average timestamp error $E_{t,av} \in I_1$ will accumulate over the two acquired trigger timestamps within each iteration, we expect
$$e_{symm} := \frac{e_A + e_B}{2} = 2n \cdot E_{t,av}. \qquad (17)$$
If we indeed achieved the timestamp error interval $I_1$ to be symmetric about 0, i.e. by selecting $\Delta_c = n \cdot \alpha + \frac{1}{2} \cdot \alpha$ properly according to Eq. (7), we can expect two observations for any pair of nodes A, B:
2. According to Eq. (17), $e_{symm} \overset{!}{=} 2n \cdot 0\,\mu s = 0\,\mu s$, and thus both values $e_A$ and $e_B$ will show the same absolute values. In direct consequence, each node can autonomously estimate its own drift towards the other node: In particular, the exchange of any additional data, such as timestamps, between the nodes is not necessary to obtain this information (since $\Delta_{delay}$ is constant).
The reason becomes clear when considering the involved error intervals over n iterations in Eq. (24): Obviously, all error intervals remain symmetric about 0 µs throughout the entire test. In particular, the average error for each variable is 0 µs, and consequently e symm = 0 µs, too.
In contrast, if we intentionally violate Eq. (7) by using e.g. $\Delta_c := n \cdot \alpha$ instead, the timestamp error interval would be symmetric around $E_t = \frac{1}{2}\lambda_C$. Consequently, $e_{symm} = 2n \cdot \frac{1}{2}\lambda_C$, and neither the autonomous drift computation through Eq. (19) nor the external drift verification through Eq. (18) would work any more. Figure 4 shows the test bed results for the three already mentioned node pairs from Figure 3, and for various values of $\Delta_c$ after $n$ = 50 iterations with $\Delta_{delay}$ = 1 s ($\Delta_{total,n}$ = 100 s). Note that the results repeat in a cyclic manner with period $\alpha = 8$, and thus the values for $\Delta_c = 10$ are similar to those for $\Delta_c = 18$.
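The timing and symmetry errors discussed above reduce to a few lines of arithmetic. The following sketch uses hypothetical timestamps in microseconds and illustrative function names; only n = 50, the delay of 1 s and the resolution of 1 µs are taken from the test bed description.

```python
def timing_error_us(t_first_us, t_last_us, n, delay_us):
    """Local timing error e_x of one node after n iterations (cf. Eq. (16))."""
    return (t_last_us - t_first_us) - 2 * n * delay_us


def symmetry_error_us(e_a_us, e_b_us):
    """Symmetry error e_symm as the mean of both nodes' timing errors."""
    return (e_a_us + e_b_us) / 2


def expected_symmetry_error_us(n, avg_timestamp_error_us):
    """Expected e_symm = 2n * E_t,av (cf. Eq. (17))."""
    return 2 * n * avg_timestamp_error_us


# Hypothetical node pair whose clocks drift apart by 250 us within 100 s.
e_a = timing_error_us(0, 100_000_250, n=50, delay_us=1_000_000)    # +250
e_b = timing_error_us(0, 99_999_750, n=50, delay_us=1_000_000)     # -250
e_symm = symmetry_error_us(e_a, e_b)                               # 0.0

# Expected symmetry error for a symmetric and for a naive error interval.
symmetric_case = expected_symmetry_error_us(50, 0.0)   # 0 us
naive_case = expected_symmetry_error_us(50, 0.5)       # 50 us
```

With an average timestamp error of half a tick, the symmetry error after 50 iterations already amounts to 50 µs, which is why each node can read its drift directly from its own timing error only when the timestamp error interval is symmetric about 0.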
Real-World Test Bed Analysis
When using $\Delta_c = 1 \cdot 8 + \frac{8}{2} = 12$, we did indeed achieve the expected symmetry error $e_{symm} \approx$ 0 µs for all pairs. More precisely, we obtained $|e_{symm}| < \lambda_C$ = 1 µs, which is the timeline resolution and thus the best timestamp precision a node can reach. Furthermore, for $\Delta_c = 12$, $d_{A,B} \approx \tilde{d}_{A,B}$ verifies the measured values from Figure 3. Most importantly, as shown in Table 2, the autonomously measured drifts between two nodes are almost perfect: the maximum visible deviation is within ±2 µs. Another fact which we can verify from this table is that, since node A (without loss of generality) knows its drifts $\tilde{d}_{A,B}$ and $\tilde{d}_{A,C}$ towards the two other nodes B and C respectively, it can also reliably derive the drift $\tilde{d}_{B,C}$ via $\tilde{d}_{B,C} := \tilde{d}_{A,C} - \tilde{d}_{A,B}$.
$\tilde{t}_e$ and $\tilde{t}_r$ are symmetric about 0. While the first is achieved through the unified and carefully prepared preprocessing of interrupts by the kernel, the latter becomes possible through a simple self-calibration scheme at the application layer. Both techniques proved to be a great benefit for an inherent problem within distributed but interacting (embedded) systems: As long as time is not properly manageable locally by the individual nodes, network-wide synchronization and event or state tagging will hardly achieve the potentially feasible precision. Using our approach, a corresponding test bed verified that it is possible to determine the drift between two nodes without the explicit exchange of any quantitative information (like e.g. timestamps or previously measured delays). Instead, it is sufficient to periodically pass events (i.e. interrupts) between the nodes. Since suitable periodic behavior can also be found in several (wireless) communication protocols like (Støa and Balasingham, 2011), (Ito et al., 2009), (Mutazono et al., 2009), the proposed techniques can also be applied to support time synchronization and self-organization among the involved systems.
In fact, we already observed good time synchronization results when integrating our approach into the Desync protocol from (Mühlberger, 2013). Another objective for us is to support our timestamping concept in hardware: Within a hardware/software co-design project, a specifically prepared interrupt controller of an experimental CPU architecture is already able to pre-process and store timer values even for simultaneously occurring events.
