I. INTRODUCTION
Precise timing is essential for embedded systems, and in general for computing systems that interact with the physical world. Timing requirements arise out of the need to read sensors and perform actuation at a rate prescribed by the applications, as well as to schedule computations. The design of such systems requires specialized knowledge, and their importance is testified by both a huge research literature and a number of industrial applications.
A computing system's notion of time ultimately comes from hardware counters and event timers, which are used to query the time, and schedule events. As counters are fed by a clock signal, their accuracy depends on that of the clock source.
Inexpensive quartz crystal oscillators have made it possible to provide even cost-sensitive embedded systems with clocks having an accuracy of a few tens of parts per million. This is enough for the requirements of all but the most demanding isolated systems, but for distributed ones, the scenario is completely different. When multiple devices need to cooperate, clocks differing by even a few parts per million cause their notion of time to progressively diverge, which very easily turns into an unacceptable situation.
Since distributing a single clock to all devices in a network is resource-consuming, thus hardly feasible especially with wireless connections, it is not surprising that clock synchronization is a very relevant and well studied topic, that dates back to the dawn of computing systems [19] , [11] , but where innovations are still being introduced nowadays [21] , [1] .
Clock synchronization is a fundamental topic for distributed computing systems at large, but in the Wireless Sensor Network (WSN) domain it takes on a particular flavor. WSNs exhibit at the same time extreme energy constraints and potentially very strict application timing requirements; think for example of sound localization [5], [16]. Such needs, coupled with the opportunity to tailor the entire communication protocol stack, gave rise to unique solutions which are still extending the state of the art in clock synchronization.
One of these solutions is VHT [25], or Virtual High-resolution Time, which enabled low-energy, high-precision timekeeping. VHT is a powerful technique, where two oscillators are used cooperatively: a low-power, low-frequency one is always active to keep the time, and a high-frequency but higher-power one is turned on only when required, giving the impression of an always running high-resolution timebase.
In principle, the VHT technique does not set an upper bound on the frequency ratio of the two oscillators; therefore, again in principle, it can be used to increase the resolution of a low-power timekeeping device indefinitely. However, when implementing VHT, we discovered that as the ratio between the clocks increases, the jitter of the low-frequency oscillator limits the accuracy of the overall time representation. This led us to develop an improved VHT that, instead of combining the two clocks like the original technique, synchronizes the high-frequency clock to the slow one.
The contributions of this paper can thus be summarized as follows. First, we provide a model for the VHT technique that allows studying the effect of the nonidealities of the two clocks on the time representation. Then, we propose an improved VHT solution that mitigates the effect of the low-frequency clock jitter, and at the same time is less demanding than the original one in terms of hardware resources. The proposed solution has been implemented on the WandStem WSN node [6], and experimental results are reported to show the achieved accuracy improvement.
II. RELATED WORK
The literature on WSN clock synchronization was historically divided into two broad categories. The first one comprises works that address clock synchronization with a specific emphasis on low-energy nodes, that have to operate for up to years without replacing batteries. The second one is made of works that aim at pushing the boundaries of WSN synchronization, without considering power consumption.
Typical of the first category is the use of a low-frequency timebase, usually with an inexpensive 32 kHz crystal and a consumption in the order of µA, depending on the microcontroller aboard the node. Works in this category include for example DMTS [15] , a simple synchronization scheme that periodically overwrites the local clock of each node with a received timestamp, Tiny-Sync [2] , that performs skew compensation by trying to constrain the possible skew values using a set of inequalities, FBS [18] , that compensates skew using PI control, UTSR [27] , that uses broadcast time stations, and hardware assisted clock synchronization [28] , that instead relies on powerline frequency.
Although in this category solutions have been proposed to synchronize within one clock tick, like FLOPSYNC-QACS [7] , the low resolution of the timebase inevitably results in a synchronization lower bound, that with the typically used 32 kHz crystals is of approximately 30.5µs.
Works in the second category, that conversely address high-resolution clock synchronization, correspondingly rely on hardware timers clocked by a high-frequency crystal oscillator, usually at a few MHz and above. This category includes RBS [13], that synchronizes nodes by exchanging local timestamps of a reference packet and compensates skew with linear regression, TPSN [20], that creates a spanning tree of the WSN and then performs pairwise synchronization, FTSP [5], that introduces flooding as a way to disseminate time, and PulseSync [23], that improves upon FTSP as for the packet dissemination. The most recent proposals in this category, such as TATS [10], also compensate for propagation delays to further improve synchronization.
The use of a high-frequency timebase allows these works to achieve microsecond-level synchronization, and in some cases to reach the sub-µs range. However, such a timebase severely affects the power consumption of the entire node. Also, in most microcontroller architectures, leaving the high-frequency oscillator active prevents entering the deep sleep state, which further increases power consumption.
The introduction of VHT [25] changed the entire scenario, allowing for the first time high-resolution and low-power synchronization. Works that adopted a VHT-like timebase, like FLOPSYNC-2 [14] , achieved both sub-µs clock synchronization and sub-µA consumption overhead, and extensions such as Reverse Flooding [12] also added propagation delay compensation. The VHT technique has also been an enabler for protocols requiring tight time synchronization to achieve constructive interference [17] , and for applications such as high-performance sound-based localization [16] .
For completeness, the power consumption of clocks has been addressed by other works, such as the Tunable Tick Resolution approach [3] , that aims at reducing the interrupt rate for timekeeping in operating systems (OSes) needing a periodic tick also in deep sleep. Fully hardware synchronization solutions have been proposed as well, see e.g. [24] , although the quoted work focuses more on synchronizing the radio wakeup for efficient protocols than at exposing high resolution time to the OS and the applications.
Summarizing after this brief review, VHT is definitely a step forward in low-power timekeeping and a promising technique for a number of applications. In its original form, however, it has a few weaknesses in terms of accuracy and resource demand, which in the following we highlight, characterize, and address.
III. PROBLEM STATEMENT AND MODEL
In this section we describe the abstract interface of a generic timekeeping device, show in detail the principle of operation of the original VHT algorithm, and point out the effect of the nonidealities of its timebases.
A. The abstract timekeeping device
A timekeeping device is a hardware component, coupled with some support software, that is used by an OS to expose timing information to the applications, and to schedule computations. This component should expose to the OS at least two fundamental operations, which we name here get time and set event, plus optionally two additional ones (hereinafter get hw event timestamp and set hw event) to facilitate clock synchronization.
• get time is an operation that returns the current time. It is the fundamental primitive that makes the OS and the applications aware of the time, whatever the purpose of this awareness is.
• set event is an operation that causes an event, usually an interrupt service routine, to be called at a given time in the future. The time when the event should occur can be either specified as an absolute time point in the timeline of get time, or as a relative time interval starting from the call to set event itself. Its use is primarily to allow the OS scheduler to schedule tasks.
• get hw event timestamp is an optional operation that returns the exact time when some hardware-generated event external to the CPU has occurred, for example when a packet has been received through a communication interface. Some hardware implementations also support timestamping application-related events, such as sensor readings, when high timing accuracy is required by applications [16], [6].
• set hw event is an optional operation that generates an event external to the CPU, usually a rising edge on some output port, at a given absolute time point in the future. When connected to a communication transceiver, it makes it possible to precisely control the time when a packet is sent, without software-induced latencies. Also in this case, additional output channels can be dedicated to applications [6].
Although the last two primitives are optional, they are becoming common in WSN nodes, as their availability increases the accuracy of clock synchronization [14], [12], [10], [4], especially in multi-hop networks: they eliminate the latencies caused by software, which accumulate as the number of hops increases. Additionally, modern clock synchronization protocols over wired networks, such as PTP [21], also require hardware packet timestamping.
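As an illustration, the four operations above could be rendered as a C++ interface along the following lines. This is only a sketch: the names `Timekeeper`, `SimulatedTimekeeper`, `Ticks` and `advanceTo` are hypothetical, the tick type is assumed to be a signed 64-bit integer, and set event is assumed to take an absolute time. The simulated implementation exists only to exercise the interface without hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Time in ticks of the timekeeper timeline (64-bit is an assumption).
using Ticks = std::int64_t;

// Hypothetical rendering of the abstract timekeeping device.
class Timekeeper {
public:
    virtual ~Timekeeper() = default;
    // get time: current time.
    virtual Ticks getTime() const = 0;
    // set event: call handler at absolute time `when`.
    virtual void setEvent(Ticks when, std::function<void()> handler) = 0;
    // get hw event timestamp (optional): false means "not supported".
    virtual bool getHwEventTimestamp(Ticks& /*out*/) { return false; }
    // set hw event (optional): false means "not supported".
    virtual bool setHwEvent(Ticks /*when*/) { return false; }
};

// Software-only stand-in, useful to test code written against the interface.
class SimulatedTimekeeper : public Timekeeper {
public:
    Ticks getTime() const override { return now_; }
    void setEvent(Ticks when, std::function<void()> handler) override {
        assert(when >= now_);  // events must lie in the future
        eventTime_ = when;
        handler_ = std::move(handler);
    }
    // Moves simulated time forward and fires a due event exactly once.
    void advanceTo(Ticks t) {
        assert(t >= now_);
        now_ = t;
        if (handler_ && eventTime_ <= now_) {
            std::function<void()> h = std::move(handler_);
            handler_ = nullptr;
            h();
        }
    }
private:
    Ticks now_ = 0;
    Ticks eventTime_ = 0;
    std::function<void()> handler_;
};
```

The two optional operations default to "not supported", which mirrors the fact that a timekeeper without hardware capture/compare support can still serve the OS scheduler.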
B. A brief review of VHT
Energy optimization in WSNs is performed by keeping nodes in a low power sleep state as long as possible, with values of 99% and above being practically achievable [26] . In this state, the CPU and the radio transceiver are not operational. Only the timekeeper is active, as this is needed to maintain the time, and schedule wakeups. The timekeeper resolution poses a lower bound to the notion of time a node can have, as clock synchronization cannot exceed the hardware tick resolution. However, the timekeeper consumption is a function of its clock frequency, which results in a tradeoff between timing resolution and energy efficiency.
VHT solves this by relying on two hardware clocks: a high-frequency one, with frequency $f_h$, that can be turned off during deep sleep periods, and a low-frequency one, with frequency $f_l$. We define the ratio between the two frequencies as
$$\phi_0 = \frac{f_h}{f_l} \qquad (1)$$
Most microcontrollers already support such a setting naturally, as the fast clock is the one for the CPU, and the slow one is a 32768 Hz clock for the RTC.
To fully describe the VHT, it is necessary to briefly review how the typical hardware timer in a microcontroller works. Such a device has a hardware counter, that is incremented at the rate of the timer's clock frequency, and one or more hardware units that can be software configured either in capture mode, where they store in a register a snapshot of the counter at the occurrence of an external event, or in compare mode, where the content of a register is compared against the timer counter, and an event is generated when they are equal. Both in capture and compare mode, it is possible to configure an interrupt to be raised when the event occurs, to allow the software to handle that event.
The VHT paper shows how to implement in software the get hw event timestamp operation in Section 4.1 of [25] , and this implementation is what we call "original VHT" in this paper. This implementation uses two hardware timers, clocked respectively by the fast and slow clock. The event source is connected to a capture channel of both timers, to take a timestamp h 1 of the high-frequency clock and a timestamp l 0 of the low-frequency one every time an event occurs. Either of the two is configured to generate an interrupt, in order to inform the software about the event. Additionally, a second capture channel of the high-frequency timer is connected to the low-frequency clock signal, taking a timestamp h 0 of every rising edge of the low-frequency clock. This channel is not configured to generate interrupts.
When an event occurs, as shown in Figure 1, the interrupt routine reads $h_0$, $h_1$ and $l_0$, and computes the event timestamp as
$$t = l_0 \phi_0 + \big((h_1 - h_0) \bmod \phi_0\big) \qquad (2)$$
The modulo operation is necessary because if a rising edge of the slow clock occurs after an event, but before the software routine reads h 0 , then h 0 and h 1 will be reversed in the timeline. As can be noticed, the high-frequency clock is only used to measure the time interval between the event and the last edge of the slow clock, identified by φ in the figure. This means that it can be turned off when not needed, without losing the notion of time.
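The timestamp computation can be sketched in C++ as follows, assuming it has the form $l_0 \phi_0 + ((h_1 - h_0) \bmod \phi_0)$ and, as a simplification of this sketch only, that $\phi_0$ is an integer. The function name `vhtTimestamp` and the use of signed 64-bit counters are illustrative assumptions, not part of the original VHT implementation.

```cpp
#include <cassert>
#include <cstdint>

// Event timestamp in fast-clock ticks:
//   l0 * phi0 + ((h1 - h0) mod phi0)
// l0: slow-clock capture at the event
// h1: fast-clock capture at the event
// h0: fast-clock capture of the latest slow-clock rising edge
// phi0: fast/slow frequency ratio (assumed integer in this sketch)
std::int64_t vhtTimestamp(std::int64_t l0, std::int64_t h0,
                          std::int64_t h1, std::int64_t phi0) {
    assert(phi0 > 0);
    // C++ '%' keeps the sign of the dividend, so the result is
    // normalised into [0, phi0) to obtain a mathematical modulo.
    std::int64_t phase = ((h1 - h0) % phi0 + phi0) % phi0;
    return l0 * phi0 + phase;
}
```

With $\phi_0 = 10$, `vhtTimestamp(1, 1, 5, 10)` and `vhtTimestamp(1, 11, 5, 10)` give the same result: in the second call $h_0$ has already been overwritten by the next slow-clock edge, and the modulo recovers the phase.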
C. Issues with the VHT algorithm
The VHT technique allows to achieve the resolution of the high-frequency timer without the cost of having it always clocked, but does it allow to achieve also its accuracy? To answer this question, it is necessary to introduce clock jitter [8] , a nonideality of crystal oscillators that causes their period to change by a small and random amount at each clock cycle. The point here is that the jitter of the low-frequency clock is small compared to its period, but there is no guarantee that it will be small also with respect to the period of the fast clock. This introduces a limit to the ratio of the two clocks φ 0 , beyond which the VHT accuracy will not increase. In detail, jitter causes an additive disturbance to the edges of the slow clock, that affects the timestamp h 0 . Given the nature of (2), an additive disturbance in h 0 will propagate through the difference and the modulo, and will finally introduce uncertainty in the event timestamp, degrading accuracy.
To quantify this effect, we performed three experiments. First, to see the jitter propagation in an ideal setting, we performed a Monte Carlo simulation where 100000 uniformly distributed events over a 100s horizon are timestamped using the original VHT with a 60ns low-frequency clock jitter. The simulation considered a 48MHz high-frequency clock, and a 32768Hz low-frequency one. Then, to show that the issue exists in real-world hardware, and is not specific to a single particular platform, we implemented the VHT algorithm on two quite different microcontroller boards. The first one is an off-the-shelf stm32f4discovery board. This has a high-frequency clock at 84MHz but lacks a 32kHz crystal, hence we soldered to it an Ecliptek EC26 32768Hz one. The second board is a WandStem WSN node, that has a 48MHz high-frequency clock and uses a model CPFB 32kHz crystal by Cardinal Components.
It should be noted that the hardware RTC of both boards lacks an input capture feature, and thus to implement VHT we had to route the output of the 32kHz oscillator to a different timer than the RTC. As only the RTC can function when the microcontroller is in deep sleep, this setup prevents the microcontroller from entering deep sleep, making VHT practically useless but serving the experiment's purpose of seeing how the jitter propagates through the algorithm.
Figure 2 shows the result of the experiments. The left column shows the jitter of the low-frequency oscillator as timestamped by the high-frequency one. The right column shows the jitter of timestamps taken using the original VHT algorithm. In all three implementations we observed some timestamps with an error of 30.5µs, but these were caused by a race condition that will be described later on in this section, so we removed them to focus only on jitter.
The top row shows the results for the Monte Carlo simulation. In this case the jitter of the VHT timestamping is 60.4ns, a value remarkably close to the 60ns jitter of the low-frequency clock. Thus, in an ideal setting where the only disturbances are jitter and quantization noise, the standard deviation of VHT equals the jitter of the slow clock. The middle row shows the results for the stm32f4discovery. In this case the jitter of the low-frequency clock had a standard deviation of 19ns, and the maximum error was ±7.5 ticks of the fast clock. In this board the fast clock is obtained through a PLL, thus the high-frequency clock is not fully stable, and exhibits a nonideality called wander [22] (a slow random drift in the timescale of seconds). After removing the wander by means of a high-pass filter, the jitter of the VHT algorithm is as shown in the middle right plot of Figure 2, and the standard deviation is 27ns. With the WandStem board, the jitter of the low-frequency clock had a standard deviation of 63ns, and the maximum error was ±17.5 ticks of the fast clock. The VHT jitter was instead 116ns; the probability distribution spreads across more values due to temperature changes during the experiment, which introduce time-varying skew.
As can be seen, in all three experiments the jitter of VHT is never less than the jitter of the underlying slow clock, showing how the VHT timestamping algorithm does not attenuate the low-frequency clock jitter. Moreover, in both hardware implementations, the jitter of the slow clock is several ticks of the fast one, supporting the statement that slow clock jitter is a limiting factor to accuracy. Figure 3 shows instead the raw timestamping error using the original VHT algorithm, in the three experiments above. Note the peaks of 30.5µs, which correspond to one tick of the low-frequency clock. These errors are caused by a race condition in the original VHT algorithm.
To explain this condition, first consider the top timing diagram of Figure 4 . In this example the ratio φ 0 between the fast and slow clock is 10. Now consider that the phase between the two clocks is such that there is a small time where the fast clock has incremented φ 0 times since the last rising edge of the slow clock. If an event occurs in that time instant, the input capture units will log the following timestamps: h 1 =11, l 0 =1. However, since the software interrupt has a latency, the h 0 timestamp will be overwritten as soon as the second edge of the slow clock occurs, so instead of h 0 =1 we will have h 0 =11. Applying Equation (2) the wrong timestamp 10 is computed instead of the correct result which is 20. The bottom part of the figure shows that the race condition occurs even when considering zero interrupt latency, as in this case we will have h 0 =1 and h 1 =11, but 10 mod 10 is 0.
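For reference, the arithmetic of the two cases can be written out explicitly. The substitution below assumes that the timestamp is computed as $t = l_0 \phi_0 + ((h_1 - h_0) \bmod \phi_0)$, with $\phi_0 = 10$, $l_0 = 1$ and $h_1 = 11$ as in the example.

```latex
\begin{align*}
\text{with interrupt latency } (h_0 = 11):\quad
  & t = 1 \cdot 10 + \big((11 - 11) \bmod 10\big) = 10 \\
\text{with zero latency } (h_0 = 1):\quad
  & t = 1 \cdot 10 + \big((11 - 1) \bmod 10\big) = 10 \\
\text{correct value}:\quad
  & t = 1 \cdot 10 + 10 = 20
\end{align*}
```

In both cases the phase term collapses to zero, so the result is short by exactly $\phi_0$ fast ticks, that is, by one tick of the slow clock.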
IV. JITTER-COMPENSATED VHT
As shown in the previous section, the VHT technique exhibits three issues. The first one is the degradation of the time accuracy as the ratio between the clocks increases, due to the jitter of the low-frequency clock. The second one is the race condition just shown. The third one, perhaps less obvious and to which we shall return later on, is the need for two input capture units for each event to be timestamped, which makes its implementation heavy on hardware resources. The issues above are exacerbated if one wants to use the VHT as an OS timer, because doing so requires implementing the entire timekeeper interface and not just the get hw event timestamp operation, a matter that was however outside the scope of the original VHT paper.
In this section, we overcome these issues by introducing an improved VHT implementation, that we call jitter-compensated VHT. This solution casts the VHT problem in the clock synchronization framework, and works by synchronizing the fast clock to the slow one, by means of the FLOPSYNC-2 technique [14], every time the node gets out of deep sleep. To explain the operation of our jitter-compensated VHT, we proceed in two steps. First we show a naïve approach, that only corrects for offset, as a preliminary step to introducing the complete scheme, that also performs skew correction.
A. The naïve jitter-compensated VHT
The simplest solution to the scalability issues mentioned above consists in using the low-frequency clock to resynchronize the high-frequency one only upon exiting deep sleep, and then performing time-related operations using only the fast clock. Since all operations are performed using only one timer, this solution is also inherently immune to the race condition described in Section III-C. To do so, one can measure the offset every time the node exits deep sleep, by taking a timestamp $h_0$ of the fast clock corresponding to an edge $l_0$ of the slow clock, as
$$\text{offset} = l_0 \phi_0 - h_0 \qquad (3)$$
This offset computation can be repeated multiple times, for subsequent edges of the slow clock, averaging the results to reduce the effect of jitter that affects h 0 . Such an operation can be considered acceptable because the wakeup is an infrequent event, thus the expense in terms of timestamp measurements and average computation, which on a typical implementation can be assumed to be less than 500µs, does not impact the node performance to a relevant extent.
From then on, timestamps can be taken using only a capture channel of the high-speed timer, and converted to the VHT timeline by simply adding the offset, thus implementing the get hw event timestamp operation. The get time operation can likewise be implemented by reading the high-frequency timer counter and adding the offset, while the set event and set hw event can be implemented by using a single compare channel of the fast clock timer, which is set to the desired time minus the offset.
This solution clearly solves the hardware resource issue, as all timekeeping operations are implementable with at most one capture/compare channel of the fast clock per operation, plus an additional one to compute the offset at every wakeup from deep sleep.
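The offset-based operations described above can be sketched as follows. The sketch assumes that the offset of one edge is $l_0 \phi_0 - h_0$ and that $\phi_0$ is an integer; the names `EdgeSample`, `averageOffset`, `toVhtTime` and `toCompareValue` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One slow-clock edge as seen by both timers.
struct EdgeSample {
    std::int64_t l0;  // slow-clock counter value at the edge
    std::int64_t h0;  // fast-clock capture of the same edge
};

// Offset averaged over several consecutive slow-clock edges, to reduce
// the effect of the slow-clock jitter on h0.
std::int64_t averageOffset(const std::vector<EdgeSample>& edges,
                           std::int64_t phi0) {
    assert(!edges.empty() && phi0 > 0);
    // Average the deviations from the first sample to keep sums small.
    const std::int64_t base = edges.front().l0 * phi0 - edges.front().h0;
    std::int64_t sum = 0;
    for (const EdgeSample& e : edges) sum += (e.l0 * phi0 - e.h0) - base;
    return base + sum / static_cast<std::int64_t>(edges.size());
}

// get time / get hw event timestamp: fast-timer value plus offset.
std::int64_t toVhtTime(std::int64_t fastTicks, std::int64_t offset) {
    return fastTicks + offset;
}

// set event / set hw event: compare-register value for a VHT time.
std::int64_t toCompareValue(std::int64_t vhtTime, std::int64_t offset) {
    return vhtTime - offset;
}
```

The two conversion helpers are inverses of each other, which is what lets a single capture/compare channel of the fast timer serve each operation.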
This solution also solves the clock jitter issue. As timestamps are taken using only the high-frequency clock, the jitter of the low-frequency clock no longer affects each time measurement. For what concerns interval measurements, those obtained by subtracting two timestamps taken while the node is active are entirely taken by the high-frequency timer, and are thus insensitive to the low-frequency clock jitter. If on the contrary the node went to deep sleep in between two timestamps, the jitter effect is significantly mitigated by the averaging in the offset computation.
However, the naïve jitter-compensated VHT described so far would not work in practice, due to another nonideality of crystal oscillators, that is, clock skew. In fact, both the high-frequency and the low-frequency crystal oscillators are affected by skew, and the skews of the two can differ significantly from one another, both in magnitude and as for their dependence on temperature.
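Setting without loss of generality the time origin of the low-frequency clock to when the node was first turned on, and denoting by $t_w$ the generic wakeup time, the time reported by the naïve jitter-compensated VHT will be
$$C_{naive}(t) = \phi_0 \left\lfloor \int_0^{t_w} \big(f_l + \delta f_l(\tau)\big)\, d\tau \right\rfloor + \left\lfloor \int_{t_w}^{t} \big(f_h + \delta f_h(\tau)\big)\, d\tau \right\rfloor$$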
where δ f l (t) and δ f h (t) are terms representing the frequency error of the two clocks due to their nonidealities. The first integral is the skew accumulation of the slow clock till the node wakeup, which coincides with the offset computation, while the second integral is the skew accumulation of the fast clock from the wakeup instant onwards. The floor operators account for the tick quantization. Figure 5 shows a graphical representation of the issue.
Synchronization schemes that can correct for skew using the time information received via the radio are well known, but these schemes are meant to compensate a single clock skew, not for a blend of two skews that depends on how long the node has been out of deep sleep.
To overcome this issue, we propose a hierarchical solution that introduces a local clock skew correction to make the skew of the fast clock equal to that of the slow clock, thereby allowing for the abstraction of an always running, jitter-free, low-power and high-resolution clock with a single skew value. Once this abstraction is built, an ordinary clock synchronization scheme can compensate the skew of the jitter-compensated VHT, needing no awareness that the clock to synchronize is built with two different crystals.
Fig. 5. Clock skew issue in the naïve jitter-compensated VHT. During deep sleep, up to the wakeup time $t_w$, we need to improve the resolution of the slow clock, but once the fast clock is on, its skew is what we get.
B. Skew correction in the jitter-compensated VHT
Performing skew correction means synchronizing the fast clock to the slow one. There are various phenomena that make the two clocks diverge, like crystal imperfections, aging, jitter, and temperature variations. These phenomena collectively manifest themselves in the synchronization error, but cannot be measured individually. In problems like this, feedback control is a natural solution, as it makes it possible to counteract error-generating phenomena (in control terms, disturbances) based only on the measurement of the error.
We thus address the problem with the FLOPSYNC-2 synchronization framework [14], which is based on a feedback controller. However, the frequency content of the disturbances encountered in this work is not the same as in [14], and although the feedback control structure can be reused, the computation of the control signal needs modifying. In this section we illustrate the control problem, review the FLOPSYNC-2 solution, and describe the synthesis of the new control law.
For the physical reasons sketched above and detailed in [14], the frequency $f_h$ of the fast clock varies over time with respect to its nominal value $\phi_0 f_{l0}$, where $f_{l0}$ is the nominal frequency of the slow clock. Since we want the fast clock to synchronize to the slow one (or equivalently, we want the fast clock to be the slave and the slow one the master), we take the time counted by the latter as the reference. Denoting the fast and slow clock times respectively with $t_h$ and $t_l$, and indicating with $e_{hl}$ the synchronization error, we can write
As in FLOPSYNC-2, we assume synchronizations to take place at a fixed period, named $T_{hl}$, dictated by the master clock. The error is measured (see Figure 6) as the expected minus the actual end of the synchronization period counted in the slave clock. This gives the controlled system exactly the same structure as in [14], with the inter-node frequency fluctuation replaced by the intra-node one.
The next expected end of the synchronization period is computed as the previous one plus T hl plus a correction u hl (k), that takes the role of the control signal. Denoting by e hl (k) the error at the k-th intra-node synchronization event, computations omitted for brevity lead to write
where
is the disturbance given by the fast/slow relative skew accumulated over one synchronization period. Should this disturbance, and hence the skew, be constant, synchronization schemes that are not control-centric (such as those based on regression) could fit. However, the skew is not constant at all, and depends on phenomena (aging, thermal stress, jitter) with different time scales. This motivates the choice of a feedback control law designed specifically to counteract the frequency components of that disturbance that are most relevant for the synchronization need at hand.
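The structure of the intra-node synchronization loop can be sketched as below. The sketch follows the textual description only: the error is the expected minus the actual fast-clock timestamp of the synchronization event, and the next expected timestamp is the previous one plus $T_{hl}$ plus the correction $u_{hl}(k)$. The type `IntraNodeSync` and its members are hypothetical, and the control law is deliberately left as a pluggable function, since this sketch does not reproduce the controller designed in this section.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Skeleton of the intra-node synchronization loop (hypothetical names).
struct IntraNodeSync {
    std::int64_t expected;     // expected fast-clock timestamp of next event
    std::int64_t periodTicks;  // T_hl in nominal fast-clock ticks
    // Control law mapping the error e(k) to the correction u(k).
    std::function<std::int64_t(std::int64_t)> controller;

    // Called with the fast-clock capture of the k-th synchronization
    // event generated by the slow clock; returns the error e(k).
    std::int64_t onSyncEvent(std::int64_t actual) {
        assert(periodTicks > 0 && static_cast<bool>(controller));
        const std::int64_t e = expected - actual;  // expected minus actual
        const std::int64_t u = controller(e);      // control signal u(k)
        expected += periodTicks + u;               // next expected end
        return e;
    }
};
```

With a correction that is always zero, a fast clock that counts three extra ticks per period makes the error grow by three ticks at every event; a correction equal to the per-period skew keeps the error at zero. Rejecting the jitter while tracking a time-varying skew is what the control law has to add on top of this structure.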
In [14] , FLOPSYNC-2 was used to synchronize clocks aboard different nodes, and the most relevant source of disturbance was thermal stress. In the case considered herein the clocks are on the same node, and given the much faster time scale, the main concern is high-frequency jitter. We now proceed to the design of a control law with the specific aim of rejecting high-frequency disturbance components.
We carry out the design in the continuous time domain, mainly because doing so allows to operate with frequency values directly, and independently of the adopted synchronization period. Putting model (7) in transfer function form and transforming to continuous time, we get
where $s$ is the Laplace transform complex variable. We use an integral controller augmented with a zero-pole pair; this guarantees a 40 dB/decade high-frequency disturbance-to-output roll-off, and makes it possible to prescribe the closed-loop stability degree. In detail, writing the controller as
the Bode magnitude plot of the frequency response of the loop transfer function $L(j\omega) = C(j\omega)P(j\omega)$ takes the form of Figure 7. The cutoff frequency $\omega_c$ dictates the closed-loop response speed, while parameter $\alpha$ governs the stability degree, because the phase margin is
To determine the parameters in controller (10), we set $T_{hl}$ to the recommended value (see Section IV-C) of 200 ms and carried out a design exploration in the $(\omega_c, \alpha, \beta)$ space, operating in simulation with the process model (9). The goal was to obtain a fast error convergence to zero, while maintaining a good high-frequency disturbance rejection, and an adequate stability degree. A sample of the said exploration, illustrating the error convergence speed, is shown in Figure 8. After the exploration we chose $\omega_c = 5/4$ rad/s, $\alpha = 25/4$ and $\beta = 16$, which correspond in Figure 8 to the red thick response. These values yield a phase margin of about 77°; a faster settling would result in a lower stability degree and, most importantly, in a less effective jitter rejection. Applying the backward Euler method, we finally got the discrete-time controller
This controller settles to compensate the skew within 0.1% in about 14 s, see again the red thick response in Figure 8. This is therefore the time it takes, when the node is first turned on, to correct for 99.9% of the initial (unknown) skew, which we found to be an adequate performance. Listing 1 reports the implementation of the proposed controller in C++, evidencing its simplicity.
    // State variables
    // uoo=u(k-2), uo=u(k-1), eo=e(k-1)
    static int uoo, uo, eo;
    static long long expectedTimestamp;
C. Synchronization period selection
In the case of the inter-node synchronization it is important to keep the synchronization period as long as possible, in order to minimize the radio usage. In the case of intra-node synchronization it is instead beneficial to keep the period as short as possible, because the high-frequency clock is not running while the node is in deep sleep, and thus intra-node synchronization is only possible while the node is active.
The lower limit of the synchronization period is due to the quantization induced by the fast clock resolution,
$$q_e = \frac{10^6}{f_h T_{hl}} \qquad (13)$$
where $q_e$ is the quantization error in parts per million. Inverting this formula, to synchronize a 48MHz fast clock to within 0.1 ppm, the minimum synchronization period is around 200ms, which can be reduced to 20ms if 1ppm accuracy is enough.
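The figures quoted above can be checked with a few lines of C++, assuming the quantization error is $q_e = 10^6 / (f_h T_{hl})$ with $f_h$ in Hz and $T_{hl}$ in seconds. The function names are illustrative.

```cpp
#include <cassert>

// Quantization error in ppm for a fast clock of frequency fhHz (Hz)
// and a synchronization period of thlSeconds (s).
double quantizationPpm(double fhHz, double thlSeconds) {
    assert(fhHz > 0 && thlSeconds > 0);
    return 1e6 / (fhHz * thlSeconds);
}

// Shortest synchronization period (s) whose quantization error does
// not exceed targetPpm, obtained by inverting the formula above.
double minSyncPeriod(double fhHz, double targetPpm) {
    assert(fhHz > 0 && targetPpm > 0);
    return 1e6 / (fhHz * targetPpm);
}
```

For a 48 MHz clock, `minSyncPeriod(48e6, 0.1)` is about 0.208 s and `minSyncPeriod(48e6, 1.0)` about 0.021 s, consistent with the rounded values of 200 ms and 20 ms.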
D. The complete jitter-compensated VHT
To summarize, the complete jitter-compensated VHT works as follows. When the node is powered up, both oscillators are turned on, and the offset between the two clocks is computed as in Section IV-A. After that, the skew correction starts as per Section IV-B. The node is forced to stay active long enough for the skew correction controller to settle, after which it is allowed to enter deep sleep. Note that during this time the only CPU load caused by our VHT implementation is one interrupt for each synchronization period $T_{hl}$, so the CPU is essentially idle and this time can be profitably used for other tasks, such as MAC neighbor discovery, and to wait for the first synchronization packet as required by inter-node synchronization schemes.
Every time the node gets out of deep sleep, the offset correction is performed again. Additionally, the last skew correction is preserved when entering deep sleep, so as to apply it upon wakeup. This eliminates the need to wait for the controller to settle every time a wakeup from deep sleep occurs.
As long as the node remains active, the skew correction controller is executed at period $T_{hl}$. Every time the node wakes up to receive the synchronization packet and to perform the inter-node synchronization, it is forced to stay active for at least $T_{hl}$, and thus to perform one intra-node skew correction update. This is required to ensure a minimum rate of skew correction updates even with applications that always wake up for periods shorter than $T_{hl}$. Note that in our implementation we decided to tie the minimum rate of the intra-node skew correction to the inter-node synchronization period mainly for convenience, but other options are possible.
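The life cycle described above can be sketched as a small state machine. This is only an illustrative model under stated assumptions: the class, the state names, and the `settled` flag passed in by the caller are all hypothetical and are not taken from the actual implementation; the offset and skew computations themselves are left out.

```cpp
#include <cassert>

// Illustrative model of the node life cycle; every name here is hypothetical
// and not taken from the authors' implementation.
enum class VhtState { Off, OffsetCorrection, SkewSettling, Active, DeepSleep };

class VhtLifecycle
{
public:
    VhtState state() const { return state_; }
    double skewCorrection() const { return skewCorrection_; }

    // Power-up: both oscillators start and the offset is computed first.
    void powerUp()
    {
        assert(state_ == VhtState::Off);
        state_ = VhtState::OffsetCorrection;
    }

    // Offset correction finished (at power-up or after a wakeup).
    void offsetCorrected()
    {
        assert(state_ == VhtState::OffsetCorrection);
        // After the first settling the preserved correction is reused,
        // so there is no need to wait for the controller again.
        state_ = settledOnce_ ? VhtState::Active : VhtState::SkewSettling;
    }

    // One controller step, executed once per period T_hl while active.
    void skewUpdate(double correction, bool settled)
    {
        assert(state_ == VhtState::SkewSettling || state_ == VhtState::Active);
        skewCorrection_ = correction;
        updatePending_ = false;
        if (state_ == VhtState::SkewSettling && settled) {
            settledOnce_ = true;
            state_ = VhtState::Active;
        }
    }

    // Deep sleep is allowed only once the controller has settled and, after
    // a wakeup for inter-node synchronization, only after one skew update.
    bool tryDeepSleep()
    {
        if (state_ != VhtState::Active || updatePending_) return false;
        state_ = VhtState::DeepSleep; // skewCorrection_ is preserved
        return true;
    }

    // Every wakeup repeats the offset correction.
    void wakeUp(bool forInterNodeSync)
    {
        assert(state_ == VhtState::DeepSleep);
        updatePending_ = forInterNodeSync;
        state_ = VhtState::OffsetCorrection;
    }

private:
    VhtState state_ = VhtState::Off;
    bool settledOnce_ = false;
    bool updatePending_ = false;
    double skewCorrection_ = 0.0;
};
```

In this model a node that wakes up for inter-node synchronization goes straight from offset correction to the active state, keeps the skew correction saved before sleeping, and is refused deep sleep until one further skew update has been performed.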
The solution has an overhead in terms of hardware resources of two capture/compare units: one output compare unit of the slow clock connected to an input capture unit of the fast clock. These two hardware resources are set up to generate an event at every rising edge of the slow clock during the averaging of the offset when waking up from deep sleep, and then reconfigured to generate one event every intra-node synchronization period $T_{hl}$, as long as the node is active, to perform skew correction. All the operations of the timekeeper interface can then be implemented using just one capture/compare unit of the fast clock, except for get time which requires none.
V. EVALUATION
The proposed solution was evaluated in terms of hardware resource usage, jitter attenuation capability, and power consumption.
A. Hardware requirements
The jitter-compensated VHT was implemented on the WandStem WSN node [6] using the Miosix operating system [9]. The entire timekeeper interface has been implemented, with the OS relying on the VHT get time and set event operations for task scheduling, and three hardware event sources: 1) the radio transceiver packet reception event, linked to a get hw event timestamp operation; 2) the transceiver "send packet" line, controlled by a set hw event operation; 3) an event line dedicated to applications, which can be configured as either a timestamp input through a get hw event timestamp, or an event output via set hw event. Table I shows the number of capture/compare channels needed to implement the timekeeper using the jitter-compensated VHT, compared to the number that the original VHT algorithm would have needed. The first line refers to the requirements for the VHT to function that are not tied to specific operations, i.e., to the resources needed just to perform offset and skew correction in the jitter-compensated VHT, and to timestamp each rising edge of the slow clock in the original VHT. As can be seen, the jitter-compensated VHT requires 6 capture/compare channels, while the original VHT would have required 9 in the WandStem use case.
B. Jitter attenuation
Using the WandStem implementation, we measured the jitter of the original and jitter-compensated VHT, and compared it to measurements of the jitter of the fast and slow clock. Jitter measurements were performed by repeatedly measuring the length of an interval timed by the clock under test, and computing the standard deviation, except for the original VHT case in which the more precise clock is used to generate events timestamped by the VHT algorithm. The experiment was repeated for different time intervals to appreciate jitter integration over different time spans.
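The jitter metric used here, the standard deviation of repeated measurements of the same nominal interval, can be sketched as follows. The function name and the choice of the sample (n-1) estimator are assumptions made for illustration; the text does not state which estimator was used.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Standard deviation of repeated measurements of the same nominal interval.
// Assumption: the sample (n-1) estimator is used; the text does not say which.
double jitterStdDev(const std::vector<double>& intervalsNs)
{
    assert(intervalsNs.size() >= 2);
    double mean = 0.0;
    for (double x : intervalsNs) mean += x;
    mean /= intervalsNs.size();
    double sumSq = 0.0;
    for (double x : intervalsNs) sumSq += (x - mean) * (x - mean);
    return std::sqrt(sumSq / (intervalsNs.size() - 1));
}
```

Repeating the computation on sets of measurements taken with different nominal interval lengths yields one jitter value per time span.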
As measuring intervals of a clock requires another clock of greater precision, an FE5680 rubidium oscillator was used. The measurement resolution of our setup is 11.9 ns. Figure 9 shows the results. As can be seen, the 32768 Hz oscillator has a high jitter standard deviation, between 60 and 80 ns. Measuring the jitter of the 48 MHz oscillator turned out to be difficult, as it is smaller than the measurement resolution. Under these conditions, quantization prevents any conclusion on the exact value, other than that it is less than 11.9 ns.
The jitter of the jitter-compensated VHT is comparable to that of the 48 MHz oscillator for intervals up to 100 ms, after which it starts increasing. This is due to the skew compensation controller, whose filtering action on jitter gets progressively less effective for longer periods. Although this is unavoidable, and the performance of the jitter-compensated VHT will asymptotically (and obviously) reach that of the slow clock for longer periods, it can be argued that the effect of jitter is most relevant for small time interval measurements, where it can induce a high relative error. For larger intervals, jitter is less of a concern compared to other error sources, such as thermally induced skew variations over time.
The jitter of the original VHT is always equal to or higher than that of the 32768 Hz oscillator. Note that to achieve these results the timestamps affected by the original VHT race condition have been removed; otherwise its standard deviation would have been even higher.
C. Power consumption
To assess the current consumption of the jitter-compensated VHT we performed a test where a WandStem node performs inter-node clock synchronization using FLOPSYNC-2 and propagation delay compensation, using the jitter-compensated VHT as the underlying timebase. Figure 10 shows a trace of the node current consumption. Upon wakeup from deep sleep, the node first resynchronizes the fast clock to the slow one as shown in Section IV-A. Then it waits for the (inter-node) synchronization packet. The two peaks above 40 mA are the Glossy [4] rebroadcast of the synchronization packet, and the packet sent to estimate propagation delays. The node then waits for the propagation delay reply and enters the sleep state with the high-frequency clock active until 200 ms have passed, to estimate the skew as explained in Section IV-B. Finally, the node can go to deep sleep until the next synchronization period.
Notice that despite the requirement for the jitter-compensated VHT to spend 200 ms measuring the relative skew, the node can still spend 98.0% (10 s sync period) and 99.7% (60 s sync period) of its time in the deep sleep state.
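These percentages can be reproduced with a simple duty-cycle calculation, under the assumption (made here for illustration) that the 200 ms window is the only time per synchronization period that the node spends out of deep sleep:

```latex
\[
1 - \frac{0.2\,\mathrm{s}}{10\,\mathrm{s}} = 0.980,
\qquad
1 - \frac{0.2\,\mathrm{s}}{60\,\mathrm{s}} \approx 0.997 .
\]
```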
To estimate the power saving of the jitter-compensated VHT, the experiment of Figure 10 has been repeated with no VHT, thus with only the high-frequency clock and no deep sleep. To compare the jitter-compensated VHT with the original one, we estimated the consumption of the latter by running the jitter-compensated VHT with intra-node skew compensation disabled, since skew compensation is what sets the two apart in terms of power consumption. We could not use the original VHT implementation of Section III-C because it is incapable of entering deep sleep, but this does not affect the purpose of the test. Results are summarized in Table II.
VI. CONCLUSION
This paper focused on one of the key methodologies that allowed joint low power and high resolution timekeeping in WSNs, namely Virtual High-resolution Time. Its operation was studied, identifying three issues: jitter susceptibility, high hardware requirements, and a race condition.
A new solution was proposed, which fully casts the VHT in the framework of clock synchronization. The proposed solution was implemented on a real WSN node, assessing the expected performance improvement. At present the only relevant drawback of the proposed solution is an increased computation overhead, still compatible with the applications, but worth further research effort.
It is expected that the research here presented will foster a more widespread adoption of VHT timebases, by lowering the barrier required for their implementation, and making them more palatable thanks to the improved accuracy.
Further work will also address the use of the proposed technique to tackle "blended skews" in the context of the inter-node clock synchronization problem.
