An Accurate Time-Management Unit for Real-Time Processors by Kailas, Krishnan K. & Agrawala, Ashok K.
An Accurate Time-management Unit for Real-time Processors

Krishnan K. Kailas
Department of Electrical Engineering

Ashok K. Agrawala
Institute for Advanced Computer Studies
Department of Computer Science

University of Maryland
College Park, MD 20742, USA
{krish, agrawala}@cs.umd.edu

Technical Report

Abstract

Time management is an important aspect of real-time computation. Traditional high performance processors provide little or no support for management of time. In this report, we propose a time-management unit which can greatly help improve the performance of a real-time system. The proposed unit can be added to any processor architecture without affecting its performance. We also explain how the unit helps to solve the clock synchronization problems in a distributed real-time network.
This work is supported in part by ONR and ARPA under contract N66001-95-C-8619 to the Computer Science Department at the University of Maryland. The views, opinions, and/or findings contained in this report are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Advanced Research Projects Agency, ONR or the U.S. Government.
1 Introduction

Accurate time management functions are required for scheduling real-time tasks to meet their deadlines. Time-based scheduling techniques [15] make use of the worst-case execution time estimates of the tasks to generate deterministic schedules for hard real-time systems. With the advent of fast processors that can execute millions of instructions per second, a considerable amount of computation can be done in a very short period of time. This, in turn, demands accurate timing mechanisms for scheduling real-time tasks to achieve better processor utilization. A fast and accurate time keeping mechanism, such as a system clock with fine granularity, is essential to implement such precise time-based scheduling algorithms. The fast internal clock of modern processors can be used to implement a system clock with fine granularity. However, the support provided by traditional high performance processor architectures is often not useful for implementing these ideas, even though the hardware may support a fast system clock. Some embedded microprocessors, such as the Intel 80x196 and Intel386 EX, and commercial high performance processors, such as the Pentium and Pentium Pro [7], provide on-chip timers driven by the processor clock to implement system clocks with better time granularity. Computers that do not use such processors usually use an external hardware timer [6], [3] on the processor bus. But these timers have been known to "lose time" during operation [19]. For example, in order to set a new value for the timer, usually a register has to be updated. But certain operations such as DMA and high priority interrupts can preempt time management functions.
This can cause a delay in updating the timer register and make the system clock lose time.

By moving all the time management functions into the processor, one can alleviate the above-mentioned problems and provide a more flexible solution for the management of time. We believe that time management functions should be made an essential feature of the processors used in real-time systems. In this report, we propose a hardware architecture of an accurate time management unit for such processors. The following is the basic functionality required of such an accurate time management unit:

- A mechanism to implement a monotonic clock. The monotonicity of the clock is very important because distributed applications assume that time-stamps produced by a clock always increase monotonically [11].
- An automatic mechanism to register the time of occurrence of user-defined events. This property is essential to accurately time-stamp events in a real-time system.
- A deterministic and atomic mechanism to read and update the system time.
- A hardware register to hold system time to the required resolution and a mechanism to increment the system time without any software intervention.
- A mechanism to compensate for the drift (internal and external) to maintain a consistent global time.

A fast and accurate time keeping mechanism can also help in implementing robust real-time distributed computing systems. Solutions to several design problems in distributed computing can be simplified if a global time base is available in the system [12]. Maintaining a consistent global notion of time in a distributed computing environment involves synchronizing all the local clocks. The clock synchronization problem has been studied extensively in the past and several solutions have been proposed [9], [16], [5], [13], [14], [1]. But most of these solutions depend either on special purpose hardware or on complicated protocols.
This will add extra processing overhead and complexity to the system, thereby increasing the clock skews in a









Figure 1: A node of a typical distributed real-time system

A typical node of a distributed real-time system makes use of an external timer chip, as shown in Figure 1, for time management. The most common timing technique is based on a system clock (software counter) in memory that is updated by the timer-tick interrupts generated by a fixed interval timer [17, 2]. The main disadvantage of this method is the coarse granularity of the system clock: the software timing mechanism is limited to the order of milliseconds. This limitation arises because of the overheads associated with the timer-tick interrupt processing, which involves saving and restoring the processor registers, updating the
system clock and checking the ready task list. Another disadvantage of this approach is that timer interrupts can be missed if interrupts are disabled or if interrupt processing is delayed by operations such as DMA. The main factors that constitute the delay in updating the system clock are the following:

1. jitter in the timer interrupt signal due to frequency drifts,
2. the interrupt latency of the processor, and
3. the execution time of the interrupt service routine.

Figure 2 shows the timer interrupt signal and these delays in a typical external timer-based system. Moreover, on modern high performance processor architectures, the interrupt latency and the execution time of the interrupt service routine are often hard to predict [8].

Figure 2: Timing diagram of external timer-based time management (labels: interrupt, start of interrupt processing, delay in updating system clock)

An alternative to this approach is to use a programmable interval timer, which is commonly used for scheduling tasks in time-based real-time operating systems [15]. In this approach, an interval timer is loaded at each task scheduling instance with a new interval equal to the task's execution time-slot duration. The timer generates an interrupt at the end of the interval to invoke the scheduler again to schedule the next task from the ready list. The main advantage of this method over the timer-tick approach is the better time granularity of the scheduling clock, because the granularity depends only upon the frequency of the clock signal used to drive the timer and the width of the timer register. However, this scheme has a similar disadvantage: time is lost between the initiation of the timer interrupt service routine and the reloading of the timer register with the new interval, due to DMA operations or higher priority interrupt processing. A possible solution to this problem is to modify the timer to allow the new interval value to be added to the current count-down register contents [19].
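The time loss described above, and the effect of adding the new interval to the running count, can be illustrated with a small simulation. This is a sketch rather than anything taken from the report: the function names, the tick units, the sample latencies, and the assumption that the count-down register keeps running below zero while the interrupt service routine is delayed are all illustrative.

```python
def expiries_with_overwrite(interval, latencies):
    """Expiry instants when the ISR overwrites the timer with `interval`.

    Each entry of `latencies` is the (assumed) delay in ticks between a
    timer expiry and the moment the ISR reloads the timer register.
    """
    now = interval              # first expiry; timer was loaded at time 0
    expiries = [now]
    for latency in latencies:
        now += latency          # the ISR finally gets to run
        now += interval         # a fresh interval starts only now
        expiries.append(now)
    return expiries


def expiries_with_add(interval, latencies):
    """Expiry instants when the ISR adds `interval` to the running count.

    Assumption: the count-down register keeps decrementing past zero, so it
    holds -latency when the ISR runs. Requires latency < interval.
    """
    now = interval
    expiries = [now]
    for latency in latencies:
        count = -latency + interval   # new interval added to current count
        now += latency + count        # reload instant plus remaining count
        expiries.append(now)
    return expiries


interval = 1000              # ticks per scheduling slot (hypothetical)
latencies = [30, 5, 120]     # hypothetical reload delays (DMA, interrupts)

overwrite = expiries_with_overwrite(interval, latencies)
added = expiries_with_add(interval, latencies)
lost = overwrite[-1] - added[-1]
print(overwrite, added, lost)
```

With these hypothetical latencies the overwrite scheme finishes 155 ticks late after three reloads, while the add scheme stays on the intended 1000-tick grid.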
Another solution is to use a second register to automatically load the next interval value into the timer register, as in the VAX-11 computer systems [18]. In addition to the errors that can occur in the time keeping mechanisms mentioned above, the basic source used to drive the timer can itself generate errors due to drift. Even though the add-timer and second-register methods apparently solve the problem of updating the system clock for scheduling purposes, they do not address problems such as providing the current time to applications at any instant for time-stamping purposes and compensating for drift.

It is clear from the above discussion that the existing time management approaches do not provide the fine time granularity, accuracy and flexibility demanded by the real-time computing systems of today. The problem with the existing solutions is that they address the problems separately, resulting in solutions that are not comprehensive. Hence these solutions, as discussed above, often fail to guarantee error-free performance. We strongly believe that, in order to provide a comprehensive solution that meets all the above requirements, the solution should be based on hardware and must be implemented within the processor. Moving the time management functions to the
processor level has several advantages which cannot be achieved otherwise. For example, with the traditional timer-tick approach, the resolution of time measurement is limited to the coarse granularity of the timer-ticks. The granularity of the proposed hardware time management unit, as explained later, is much finer than can be achieved with the timer-tick approach. The time management unit also frees the processor from interacting with external hardware to implement time management functions, thereby providing more bus bandwidth and computing resources to other tasks. Most of the time management functions can be made independent of other CPU activities, and thus they can be carried out in parallel. This can provide substantial improvements in the performance of time management functions and can be used to implement accurate clock synchronization algorithms and other distributed applications.

3 Time Management Unit Architecture

The proposed time-management unit works in parallel with the processor, without using any of the computational resources of the CPU. This allows the time-management unit to provide very accurate time management functionality without affecting the performance of the processor. The architecture of the time management unit is shown in Figure 3. It consists of a set of registers accessible to the CPU, a Limit register and associated logic. A drift-free clock signal is derived from the clock source using a Rate Adjustment Unit. The clock source can be the internal clock used in the processor itself or a stable external clock source such as a crystal oscillator. The Rate Adjustment Unit consists of a frequency divider (counter) for scaling down the input clock source frequency and a phase adjustment counter to apply corrections for small changes in frequency.
The Rate Adjustment Unit makes the necessary corrections to nullify any changes in the frequency of the clock source due to drift (see the next section for a detailed description of the functioning of this unit).

The system time is maintained in the Physical Time register, which is a 64-bit counter incremented at the rate specified by the output clock frequency of the Rate Adjustment Unit. The granularity of time maintained by the system is therefore defined by the output clock frequency of the Rate Adjustment Unit. For example, with a 10 MHz clock derived using the Rate Adjustment Unit, a 64-bit register provides a time granularity of 0.1 microsecond and can represent a time span of tens of thousands of years ($2^{64}$ ticks of 100 ns each is roughly 58,000 years). The Physical Time register can be accessed by system software as a CPU register to read or modify its contents. The system time $T_S$ is compared with a Limit register $T_L$ at each processor clock cycle and an interrupt is generated when the condition $T_S \geq T_L$ is satisfied. This interrupt signal can be used for real-time task scheduling and for precisely initiating time-based events. The interrupt signal will be reset only when the Limit register is modified.

The user accessible registers of the proposed time management unit are shown in Figure 4. All the registers except the Rate Divisor register and Mode Selector register are 64-bit registers. There are three modes of operation for the proposed time management unit, based on the way in which the Limit register content is updated. The modes can be selected by writing appropriate control words to the Mode Selector register. The three modes of operation are described below:

Absolute time mode: This mode may be used for scheduling tasks precisely at a given absolute time. The time at which the interrupt is to be generated is specified in the Absolute time register, which is accessible to the CPU.
In this mode, the Limit register is loaded with the contents of the Absolute time register, and at each processor clock cycle it is compared with the current physical time.

ΔT mode: This mode is intended for generating precise delays by emulating a one-shot timer. The desired delay time is specified in the ΔT register. In this mode, the Limit register




















will be loaded with the sum of the current contents of the Physical Time register and the ΔT register, in the next clock cycle after the ΔT register is updated. The idea is to prevent any loss of time as in the timer tick-based approach.

Auto-reload mode: In this mode, after generating the interrupt, the Limit register is automatically loaded with a new value, similar to the ΔT mode. The new value of the Limit register is generated by adding the current contents of the Limit register and the ΔT register. For example, if $T_S$ is the physical time (the contents of the Limit register) at the instant when the interrupt is generated, and ΔT is the delay time, then the comparator will generate the next interrupt at time $T_S + \Delta T$. The main difference between this mode and the ΔT mode is that in this mode changes to the ΔT register take effect in the cycle after the interrupt. The idea is to eliminate the delays (refer to Figure 2) that would occur if the same functionality were implemented in the ΔT mode by having the timer interrupt service routine load a new time interval after each interrupt. This mode can be used to implement very accurate time-based schedulers such as the one used in the Maruti hard real-time operating system [15].

Figure 3: Time Management Unit Architecture

The proposed time-management unit also provides an accurate mechanism to time-stamp events that occur in the system. This is feasible because, at any processor cycle, the current physical time can be read into one of the Event Time registers using a single register transfer instruction. The same operation can also be initiated by an external interrupt signal. This
facilitates accurate time-stamping of external events without interfering with the CPU computations. In the next section, we explain how to make use of this feature to implement accurate clock synchronization algorithms.

Figure 4: Time Management Unit Registers (labels recoverable from the figure: Absolute Time; Physical Time; compare current time with: Absolute Time register, Physical time + ΔT)

4 Support for Clock Synchronization

The essential requirements of a clock synchronization algorithm for distributed real-time systems can be found in [9, 19]. A common characteristic of all clock synchronization algorithms is that each node periodically computes the deviation of its local clock from a global time base [16]. These algorithms make use of knowledge about the local clocks of other nodes, or of a master node in the system, to compute the corrections to the local clock. Time-stamped packets are used for exchanging the current time of the local clocks in the system. But in a distributed system there is a large variability in the time taken by a packet from the instant it is submitted for transmission at the sender node to the time it is processed at the receiver node. This jitter associated with message passing may be attributed to the dynamics of the network and the processing delays at the sender and destination nodes. For example, the Ethernet protocol can introduce a certain amount of uncertainty, which increases with the network traffic [10]. By choosing protocols such as TDMA to pre-allocate slots for time message packets, the variability due to the network dynamics can be bounded [4, 15]. However, the jitter persists due to the unpredictable processing delays at the nodes resulting from operations such as non-preemptible interrupt processing and DMA. It is clear from the above discussion that, regardless of the algorithm used for clock synchronization, the accuracy of the technique is affected by the time-stamping operation at the sender and receiver nodes.
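To make the effect of time-stamping delays concrete, the following sketch models a single one-way time message. It is an illustration only and does not reproduce any particular synchronization algorithm from the cited work: the function names, the tick values, and the simplifying assumptions (the sender clock is taken as the reference, and the network transit time is known exactly) are hypothetical.

```python
def estimated_offset(send_stamp, recv_stamp, transit_ticks):
    """Receiver's estimate of (receiver clock - sender clock), in ticks."""
    return recv_stamp - send_stamp - transit_ticks


def exchange(true_offset, transit_ticks, send_lag, recv_lag, depart=1000):
    """Simulate one time message and return the offset the receiver infers.

    send_lag: ticks between reading the sender clock and the packet
              actually leaving the node (e.g. queued behind DMA).
    recv_lag: ticks between packet arrival and the receiver reading its
              clock (e.g. interrupt latency).
    The sender clock is taken as the reference (assumption).
    """
    send_stamp = depart - send_lag                 # stamped before departure
    arrive = depart + transit_ticks                # arrival in reference time
    recv_stamp = arrive + recv_lag + true_offset   # receiver clock reading
    return estimated_offset(send_stamp, recv_stamp, transit_ticks)


true_offset = 250                                   # hypothetical clock offset
ideal = exchange(true_offset, 40, send_lag=0, recv_lag=0)
delayed = exchange(true_offset, 40, send_lag=15, recv_lag=60)
print(ideal, delayed, delayed - true_offset)
```

In this model every tick of delay between the true departure or arrival instant and the moment the clock is read appears directly as an error in the estimated offset (75 ticks in the delayed case), which is why stamping as close as possible to the actual send and receive events matters.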
Clearly, a mechanism is needed to accurately time-stamp the packets just before they are transmitted at the sender and as soon as they arrive at the receiver. The proposed time management unit provides such a mechanism by automatically time-stamping the packets. The packets are time-stamped on arrival using the interrupt signal from the network interface card, without any processor intervention. The current physical time is latched in the Event Time register by the interrupt signal, which can be used to accurately time-stamp the arrival time of the packets. The Physical Time register can be read to time-stamp packets just before they are sent. Thus, with the help of the proposed time management unit, and without using any external hardware, packets can be accurately time-stamped to implement clock synchronization algorithms. Moreover, the Absolute Time mode of the proposed time management unit can be used for precisely scheduling the send time of time messages.

In addition to the errors that occur due to the jitter in message passing, the basic clock source itself can generate errors due to drift. This drift in frequency of the clock source is due to













temperature variations and aging, and must be corrected. A discrete correction applied to the Physical Time register can cause the local clock to instantaneously leap forward or be set back and then run at the previous rate, thereby violating the monotonicity property. This problem can be avoided by spreading the correction continuously over a time interval; this clock adjustment technique is called amortization [16]. The Rate Adjustment Unit implements this technique in hardware to avoid any abrupt jumps in the local clock.

Figure 5: Rate Adjustment Unit

A frequency divider making use of a simple binary counter may not be sufficient to derive the desired output frequency accurately from the clock source. This is because of the truncation errors in the approximation of the scaling factor to an integer value. However, it is possible to derive a fairly accurate output clock signal by periodically changing the width of a few clock pulses out of a fixed number of clock pulses, so that the average frequency of the output signal is very close to the required value. The Rate Adjustment Unit makes use of this technique to derive the desired clock frequency. The unit consists of a frequency divider (counter) for scaling down the input clock source frequency and a phase adjustment counter (see footnote 1) to apply phase corrections, as shown in Figure 5. Very small changes in output frequency are taken care of by re-loading the frequency divider with a slightly higher or lower count than the normal count, at a rate specified by the phase adjustment counter. If $F$ is the frequency of the clock source and $f$ is the desired output clock frequency, then the normal scaling factor of the frequency divider is given by $n = \lceil F/f \rceil$. If $F/f$ is not an integer, then in one out of every $1/m$ output clock cycles the frequency divider is loaded with a modified scaling factor $n' = n - k$, where $k$ is an integer (note that $k = 0$ if $F/f$ is an integer).
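The divider scheme just described can be sketched numerically. The code below is an illustration under stated assumptions, not the report's hardware design: it takes the modified scaling factor to be n - k (consistent with n being the ceiling of F/f), requires the average divisor over many output cycles to equal F/f, which gives m = (n - F/f)/k, and uses hypothetical function names and example frequencies.

```python
from fractions import Fraction


def divider_parameters(F, f, k=1):
    """Return (n, n_mod, m) for source frequency F and target frequency f.

    n     : normal scaling factor, the ceiling of F / f
    n_mod : modified scaling factor n - k (assumed sign convention)
    m     : fraction of output cycles that use n_mod
    """
    ratio = Fraction(F, f)
    n = -(-F // f)                    # integer ceiling of F / f
    if ratio == n:
        return n, n, Fraction(0)      # F / f is an integer: no adjustment
    n_mod = n - k
    m = (n - ratio) / k               # from (1 - m) * n + m * n_mod = F / f
    return n, n_mod, m


def average_divisor(n, n_mod, m, cycles):
    """Average divisor when n_mod is used in p of every q output cycles."""
    p, q = m.numerator, m.denominator
    total = sum(n_mod if (i % q) < p else n for i in range(cycles))
    return Fraction(total, cycles)


# Hypothetical example: 66 MHz source, 10 MHz target, k = 2.
n, n_mod, m = divider_parameters(66_000_000, 10_000_000, k=2)
avg = average_divisor(n, n_mod, m, cycles=5)
print(n, n_mod, m, avg)
```

For this example the divider counts 7 source cycles in four out of every five output cycles and 5 in the remaining one, so the average divisor is 33/5 = 6.6 and the average output frequency is 10 MHz.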
The phase adjustment rate $m$ may be computed using the following relationship:

$$(1 - m)\,n + m\,n' = \frac{F}{f}$$

Most of the time, at the time of re-synchronization, only the phase adjustment count $n'$ and the phase adjustment rate $m$ need to be updated. The parameters $n$, $n'$ and $m$ are made accessible to the software through the Rate Divisor register.

Footnote 1: The counter is called the phase adjustment counter because it changes the width, or the phase, of the output signal by a small amount.

5 Related Work

The importance of time management in real-time systems has been identified by researchers for quite some time. The problems with interval-based timing mechanisms and the lack of coordination between hardware timers and software were discussed by Volz and Mudge
in [19]. They suggested the use of absolute time as a solution and proposed an instruction-level timing mechanism to accomplish this. They also mentioned the idea of placing the timing functions on the CPU chip for scheduling applications. In contrast, the architecture we have proposed here is more versatile and generic in nature: support for instruction-level scheduling is one of the features of the proposed architecture. The architecture we have proposed does not explicitly specify the format in which the time is represented, though it supports the absolute time representation mentioned in [19, 20]. The reason for making it a generic architecture is to make it easy to adapt existing applications with minimal modifications and to accommodate new time representation formats that may come up in the future.

The Mars project [4, 9] makes use of proprietary network interface logic, based on a clock synchronization unit chip, to automatically generate time-stamps. The scheme they have proposed provides most of the functionality required by a time management unit. However, the portability of their solution to other platforms is highly restricted because it is based on external hardware. Our solution, in contrast, is transparent to most of the hardware and the network interface logic used in the system, thereby making it easily portable to other platforms. The basic idea of the rate adjustment scheme for deriving the clock signal proposed in this report is similar to the adjustable rate clock proposed by Volz et al. for clock synchronization in IEEE 896 Futurebus+ systems [20]. The differences are mainly in hardware implementation details.

Another hardware-based clock synchronization technique for synchronizing the clock signals can be found in [1].
This technique implements a modified version of the interactive convergence algorithm CNV [13] and assumes that the actual clock signals are available for skew measurements. However, this scheme is not suitable for large distributed systems because of the problems associated with distributing the clock signal over large distances.

The hardware-assisted software clock synchronization scheme proposed by Ramanathan et al. [14] makes use of an algorithm similar to CNV for a distributed system with point-to-point interconnection topology. They emphasize the algorithmic aspect rather than the implementation of the hardware support required. The resolution of their technique for applying corrections to the logical clock at nodes is limited by the frequency of the clock source, and the scaling factor of their scheme can only assume one fixed integer value. As a result, their scheme cannot compensate for very small variations in frequency. Moreover, the architecture proposed here is aimed at providing mechanisms for efficient implementation of distributed clock synchronization algorithms with minimum software overheads and better accuracy, rather than implementing a specific algorithm in hardware.

6 Conclusion

In this report we have proposed an accurate time-management unit architecture for real-time processors. Our design is motivated by the lack of support provided for time management in modern processors. The proposed time management unit can be incorporated into any processor architecture with little extra logic. With the recent developments in VLSI technology, such a time management unit with nanosecond resolution can be easily implemented. The basic idea behind the proposed time management unit is to exploit the parallelism between the time management functions and normal computing operations of the processor, while at the same time providing an instruction-level mechanism to access the system time.
We believe that moving the time management functionality into the processor will greatly help to generate better solutions in terms of performance, simplicity and maintainability.
7 Future Work

In order to support the proposed on-chip time-management unit, the processor architecture must provide a deterministic instruction-level mechanism to interact with the time-management unit hardware. In modern multiple-issue processors supporting out-of-order execution of instructions, it is hard to predict the delay between instruction issue and retirement. Therefore, on such processors a special instruction for reading the current physical time into the Event Time register is not sufficient, unless there is a mechanism to ensure that the instruction can be executed within a deterministic time interval. We would like to address this problem in our future research. At present, as a first step towards understanding the problem, we are looking at the timing issues and temporal accuracy of one of the commercial off-the-shelf modern processors.

Acknowledgment

Intel386 EX, Pentium and Pentium Pro are registered trademarks of Intel Corporation.

References

[1] Y. Baek, H-K. Lee, and H. Yoon. New hardware-based clock synchronization for the Byzantine fault. Electronics Letters, 28(21):2018-2019, October 1992.
[2] Dipto Chakravarty. POWER RISC System/6000: Concepts, facilities, and architecture, chapter 14. McGraw-Hill, Inc., New York, 1994.
[3] Product Data book, chapter 6. Dallas Semiconductor Corporation, Dallas, TX, 1992-93.
[4] Hermann Kopetz et al. Distributed Fault-tolerant Real-Time Systems: The Mars Approach. IEEE Micro, pages 25-40, February 1989.
[5] Flaviu Cristian. Probabilistic Approach to Distributed Clock Synchronization. In 9th International Conference on Distributed Computing Systems, pages 288-296. IEEE, 1989.
[6] Microprocessor and Peripheral Handbook, volume II, chapter 6. Intel Corporation, Santa Clara, CA, 1989.
[7] Pentium Pro Family Developer's Manual, volume 1-3. Intel Corporation, Mt. Prospect, IL, 1996.
[8] P. Koopman. Perils of the PC Cache.
Embedded Systems Programming, 6(5):26-34, May 1993.
[9] Hermann Kopetz and Wilhelm Ochsenreiter. Clock Synchronization in Distributed Real-Time Systems. IEEE Transactions on Computers, C-36(8):933-940, August 1987.
[10] James F. Kurose, Mischa Schwartz, and Yechiam Yemini. Multiple-Access Protocols and Time-Constrained Communication. ACM Computing Surveys, 16(1):43-70, March 1984.
[11] L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7):558-565, July 1978.
[12] L. Lamport. Using time instead of timeout for fault-tolerant distributed systems. ACM Transactions on Programming Languages and Systems, 6(2):254-280, April 1984.
[13] L. Lamport and P.M. Melliar-Smith. Synchronizing Clocks in the Presence of Faults. Journal of the ACM, 32(1):52-78, January 1985.
[14] P. Ramanathan, Dilip D. Kandlur, and Kang G. Shin. Hardware-Assisted Software Clock Synchronization for Homogeneous Distributed Systems. IEEE Transactions on Computers, 39(4):514-524, April 1990.
[15] M. Saksena, J. da Silva, and Ashok K. Agrawala. Design and Implementation of Maruti-II. In Sang H. Son, editor, Principles of Real-Time Systems. Prentice Hall, Englewood Cliffs, N.J., 1995. Also available as University of Maryland CS Tech Report CS-TR-3181.
[16] Frank Schmuck and Flaviu Cristian. Continuous clock amortization need not affect the precision of a clock synchronization algorithm. Technical Report RJ 7290 (68547), IBM Almaden Research Center, San Jose, CA, 1990.
[17] Andrew S. Tanenbaum. Modern Operating Systems, chapter 5. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1992.
[18] VAX hardware handbook, chapter 20. Digital Equipment Corporation, Maynard, Mass., 1982.
[19] Richard A. Volz and Trevor N. Mudge. Instruction Level Timing Mechanism for Accurate Real-Time Task Scheduling. IEEE Transactions on Computers, C-36(8):988-993, August 1987.
[20] Richard A. Volz, Lui Sha, and Dwight Wilcox. Maintaining Global Time in Futurebus+. The Journal of Real-Time Systems, 3:5-17, 1991.
