Timing is an important concern when designing an embedded system. While much research on hard real-time systems focuses on design-time analysis, monitoring the corresponding runtime behavior is seldom investigated. In this paper, we investigate the conformity problem for runtime inputs of a hard real-time system. We adopt the widely used arrival curve model, which captures the worst-case and best-case event arrivals in the time-interval domain, and propose an algorithm that evaluates on the fly the conformity of the system input w.r.t. given arrival curves. The developed algorithm is lightweight in terms of both computation and memory overhead, which makes it particularly suitable for resource-constrained embedded systems. We also provide proofs and an FPGA implementation to demonstrate the effectiveness of our approach.
INTRODUCTION
Guaranteeing timing properties is an important aspect of building embedded systems. In particular, for the class of real-time embedded systems, meeting timing constraints, e.g., worst-case response times and end-to-end latencies, is a major design concern. Research on hard real-time timing analysis in the literature [1, 5, 6, 12] often focuses on design-time analysis, trying to compute worst-case bounds on timing properties at an early phase of the system design. The validity of the design-time analysis and the safety of the derived bounds rely on the assumption that the system input follows certain specifications. In order not to invalidate the analysis results, the runtime inputs of the system (or its components) need to conform to the specifications used by the design-time analysis.
The conformity verification, however, is non-trivial. On the one hand, the verification has to cover the worst cases in order to be consistent with the offline analysis. On the other hand, the verification and a possibly preceding regulation mechanism have to be lightweight due to the stringent timing and resource budgets of the system. Therefore, directly applying the commonly used techniques, which usually rely on expensive numerical computation, may not be suitable for runtime monitoring.
In this paper, we investigate the runtime conformity problem. Targeting hard real-time embedded systems, we aim to provide on-the-fly verification of the worst-case conformity of system inputs. We adopt the widely used arrival-curve model, which captures the worst-case and best-case system inputs in the time-interval domain, and propose an algorithm to evaluate the conformity of input traffic with respect to given arrival curves. If too many events are detected, our algorithm regulates the traffic such that it complies again with the curve specifications assumed at design time. If too few events are detected, no generic solution can be offered, but the application can be notified.
Based on the result in [9] that an arrival curve can be conservatively approximated by a set of staircase functions, each of which can be modeled by a leaky bucket, we use a dual-bucket mechanism to monitor each staircase function at runtime: one bucket for conformity verification and one for traffic regulation. By tracking the fill levels of the buckets, the computationally expensive (de-)convolutions used by the design-time analysis are avoided. Our approach is thus lightweight in terms of both computational overhead and memory footprint, and particularly suitable for embedded systems with limited resources. We also provide formal proofs and an FPGA prototype of our algorithm to demonstrate the effectiveness of our approach.
The rest of this paper is organized as follows: Section 2 reviews related work in the literature. Section 3 presents the system models and analyzes the problem. Section 4 presents our algorithm and the proofs. Experimental results are presented in Section 5 and Section 6 concludes the paper.
RELATED WORK
The analysis of traffic regulators is not new. In the domain of classical network flow control, studies of lossless greedy regulators by means of network calculus can be found in [2, 11]. Such traffic regulators are usually modeled as leaky-bucket shapers. To model lossy systems, traffic clippers [4, 3] were introduced to regulate network packets. Unlike shapers, which delay network packets, a traffic clipper actively discards non-conforming packets. The modeling of leaky-bucket greedy shapers in the context of real-time calculus (RTC) [12] is presented in [13]. The latest work in this direction [7] uses a greedy shaper to optimally reduce the peak temperature of a real-time system. All the aforementioned work concerns offline analysis. Whether such analysis can be applied to online monitoring is not clear.
In [9], a methodology for coupling timed automata-based [1] and RTC-based models is proposed, which enables hybrid analysis of a system containing both state-based (timed automata) and stateless (RTC) components. The basic idea underlying the proposed methodology is to use a set of leaky-bucket event generators as an interface between the state-based and stateless abstractions, such that models can be interchanged between these two abstractions. This technique is applied in [8] to predict tighter worst-case bounds for the arrivals of future workload. Taking the idea in [9, 8] to the level of runtime, this paper investigates traffic conformance and develops lightweight routines for on-the-fly traffic verification and regulation. We also prototyped an FPGA implementation of our approach on an ALTERA Cyclone III development board.
MODELS AND PROBLEM

Event Arrival Curves.
Event streams in a system can be described using a cumulative function $R(s, t)$, defined as the number of events seen in the time interval $[s, t)$. While any $R$ always describes one concrete trace, a 2-tuple $\alpha(\Delta) = [\alpha^u(\Delta), \alpha^l(\Delta)]$ of upper and lower arrival curves [10] provides an abstract event stream model that characterizes a whole class of (nondeterministic) event streams. $\alpha^u(\Delta)$ and $\alpha^l(\Delta)$ provide an upper and a lower bound on the number of events seen on an event stream in any time interval of length $\Delta$:
$$\alpha^l(t - s) \le R(s, t) \le \alpha^u(t - s), \quad \forall s < t, \tag{1}$$
with $\alpha^l(\Delta) = \alpha^u(\Delta) = 0$ for $\Delta \le 0$. The concept of arrival curves unifies many other common timing models of event streams. For example, a periodic event stream with period $p$ can be modeled by a set of step functions where $\alpha^u(\Delta) = \lfloor \Delta / p \rfloor + 1$ and $\alpha^l(\Delta) = \lfloor \Delta / p \rfloor$. For a sporadic event stream with minimal inter-arrival distance $p$ and maximal inter-arrival distance $p'$, the upper and lower arrival curves are $\alpha^u(\Delta) = \lfloor \Delta / p \rfloor + 1$ and $\alpha^l(\Delta) = \lfloor \Delta / p' \rfloor$, respectively. A widely used model to specify an arrival curve is the PJD model, by which an arrival curve is characterized by a period $p$, jitter $j$, and minimal inter-arrival distance $d$. The upper arrival curve is thus $\alpha^u(\Delta) = \min\{\lceil (\Delta + j)/p \rceil, \lceil \Delta / d \rceil\}$. For details, please refer to [12].
Complex Arrival Patterns.
In this work, we deal with discrete numbers of event arrivals and their arrival patterns. In principle, any (discrete) complex arrival pattern can be bounded by a set of upper and lower staircase functions, as long as the system under consideration is monotone and time-invariant [9]. The monotone property means that a higher number of input events seen in an interval yields a higher number of output events in intervals of equal or larger size. The time-invariant property means that the system behavior depends on the system state only: no matter when a state is reached, the set of possible reactions of the system is always the same, independent of the concrete time at which the state is reached.
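An upper arrival curve can thereby be conservatively approximated as the minimum over a set of staircase functions of the form
$$\alpha^u_i(\Delta) = N^u_i + \left\lfloor \frac{\Delta}{\delta^u_i} \right\rfloor .$$
An example of such an approximation is depicted in Fig. 1, where $\alpha^u$ is given as the minimum of three staircase functions $\alpha^u_1$, $\alpha^u_2$, and $\alpha^u_3$.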
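As a minimal illustration of the staircase approximation, the following Python sketch evaluates an upper curve as the pointwise minimum of staircase functions of the form N + ⌊∆/δ⌋. The function names and the two parameter pairs are hypothetical and chosen only for illustration; they are not taken from Fig. 1.

```python
from math import floor


def staircase(n_burst, delta):
    """Return the function d -> N + floor(d / delta) for d > 0, and 0 otherwise."""
    def curve(d):
        return n_burst + floor(d / delta) if d > 0 else 0
    return curve


def upper_curve(stairs):
    """Pointwise minimum over a set of staircase functions given as (N, delta) pairs."""
    curves = [staircase(n, d) for n, d in stairs]
    return lambda d: min(c(d) for c in curves)


# Hypothetical parameters (N_i, delta_i), for illustration only.
alpha_u = upper_curve([(2, 10), (5, 50)])
print([alpha_u(d) for d in (0, 1, 10, 30, 50, 200)])  # prints [0, 2, 3, 5, 6, 9]
```

For short intervals the first staircase (small burst, short period) is the tighter one; for long intervals the second staircase (larger burst, longer period) takes over.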
Problem Statement.
Given a trace $R$, checking its conformity w.r.t. an arrival curve is theoretically not a problem. One can simply inspect the trace by the definition in Eqn. (1). In the case that $R(s, t) > \alpha^u(t - s)$ for some $s < t$, a violation occurs. Alternatively, with $R(t) = R(0, t)$, one can apply the min-plus de-convolution
$$\alpha^u_R(\Delta) = (R \oslash R)(\Delta) = \sup_{\lambda \ge 0} \{ R(\Delta + \lambda) - R(\lambda) \} \tag{3}$$
according to [10] and compare the result with $\alpha^u$. Once a violation is detected, the traffic can be regulated to conform again to the specified arrival curves, e.g., by delaying the over-bursty input events. A usual way is to use a greedy shaper $\sigma$ such that
$$R'(t) = (R \otimes \sigma)(t) = \inf_{0 \le \lambda \le t} \{ R(t - \lambda) + \sigma(\lambda) \},$$
where $R'$ denotes the regulated trace. The shaper $\sigma$ can simply be the convex hull of $\alpha^u$. One might notice that the above approaches require numerical computation of the min-plus (de-)convolution, which demands intensive computing power as well as a large memory footprint. Directly applying these approaches for online monitoring is thus prohibitive, in particular for the class of embedded systems with stringent resource constraints. Therefore, lightweight alternatives are needed to conduct efficient conformance verification as well as violation recovery. In the next section, we present an approach that addresses this problem.
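To make the cost of the definition-based check concrete, the following Python sketch tests a finite trace against a set of staircase functions by enumerating all pairs of events. It assumes that the trace is given as a sorted list of integer timestamps, and that the events between two timestamps fit into any window slightly longer than their span, so that the bound is taken as N + ⌊span/δ⌋. All names and the example trace are hypothetical.

```python
def bound(stairs, span):
    """Bound on the event count for windows barely longer than span (span >= 0)."""
    return min(n + span // d for n, d in stairs)


def violating_events(trace, stairs):
    """Indices j for which some window ending just after trace[j] holds too many events."""
    bad = []
    for j, t_j in enumerate(trace):
        for i in range(j + 1):
            # Events i..j all lie in a window barely longer than t_j - trace[i].
            if j - i + 1 > bound(stairs, t_j - trace[i]):
                bad.append(j)
                break
    return bad


def empirical_upper(trace, span):
    """Largest number of events found in any window barely longer than span."""
    best = 0
    for i, t_i in enumerate(trace):
        best = max(best, sum(1 for t in trace[i:] if t - t_i <= span))
    return best


stairs = [(2, 10)]                # hypothetical: N = 2, delta = 10 time units
trace = [0, 3, 5, 30, 31, 45]     # hypothetical sorted integer timestamps
print(violating_events(trace, stairs))                     # prints [2]
print(empirical_upper(trace, 5), bound(stairs, 5))         # prints 3 2
```

Both functions visit a number of event pairs that grows quadratically with the trace length and need the whole trace in memory, which is what makes this style of check unattractive at runtime.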
OUR APPROACH
Our approach builds on the fact that an arrival curve can be conservatively approximated by a set of staircase functions [9], each of which can be modeled by a leaky-bucket-like event generator. Rather than generating events, we use the leaky-bucket mechanism for online monitoring. In this context, the bucket capacity corresponds to the maximum tolerable number of bursty events, and the leak rate models the period of the staircase function. The fill level of the bucket indicates the remaining capacity for tolerable bursts.
For each staircase function, we employ two leaky buckets, namely a V-bucket for input conformity verification and an R-bucket for input regulation. The fill level of the V-bucket indicates how many bursty incoming events can still be tolerated, while the fill level of the R-bucket shows how many bursty events may still be released. By simply tracking the fill levels of the buckets, the computationally expensive min-plus (de-)convolution normally used by the offline analysis is avoided in the online monitoring, which results in a lightweight software or hardware implementation.
The flow of our approach is shown in Fig. 2. Upon each event arrival, the conformity of the event is tested. If the arrival of this event conforms to the specification, the event is released immediately. If not, regulation is applied to enforce conformity. In the current version of our algorithm, we delay the release of non-conforming events. Discarding events due to deadline violation or buffer overflow can easily be added to the proposed scheme; we discuss this in Section 4.3. Note that we only present the algorithm and proofs for the upper bound $\alpha^u$. The conformity verification of the lower bound works in a similar manner.
Algorithm
Assume an arrival curve is approximated by $n$ staircase functions $S_i$, $i \in \{1, \dots, n\}$. Each $S_i$ is defined by a leaky bucket with two parameters, namely the bucket capacity $N^u_i$ and the period $\delta^u_i$.
if BFL 
For any time interval (s, t], BFL v i can be computed as:
Since BFL This case can also be considered as a renew point of the bucket. We will use this property in the later-on proofs.
The R-bucket works similarly. $\mathit{BFL}^r_i$ controls when and how many events can be released (Lines 23-31). An event can be released only when $\mathit{BFL}^r_i > 0$ for all $i$, and each $\mathit{BFL}^r_i$ is decreased by 1 when an event is released (Line 29). Otherwise, events are postponed until every $\mathit{BFL}^r_i$ becomes nonzero. Buffered events are released in first-in, first-out order.
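The following Python sketch illustrates the dual-bucket idea in software. It is not a transcription of Algo. 1: the refill timer handling, the choice to let the V-bucket level drop below zero on a violation, and all identifiers are assumptions made for this illustration. Timestamps are assumed to be sorted integers and every bucket capacity is assumed to be at least 1.

```python
from collections import deque


class Bucket:
    """One leaky bucket for a staircase with capacity N and period delta."""

    def __init__(self, capacity, period):
        self.capacity, self.period = capacity, period
        self.level = capacity   # a full bucket corresponds to a renew point
        self.start = None       # refill timer; runs only while the bucket is not full

    def advance(self, now):
        """Add one unit per elapsed period, saturating at the capacity."""
        if self.start is None:
            return
        gained = (now - self.start) // self.period
        self.level += gained
        self.start += gained * self.period
        if self.level >= self.capacity:
            self.level, self.start = self.capacity, None

    def take(self, now):
        """Charge one event; the timer starts when a full bucket is charged."""
        if self.start is None:
            self.start = now
        self.level -= 1


def monitor(trace, stairs):
    """Return (indices of violating arrivals, release times after regulation)."""
    v = [Bucket(n, d) for n, d in stairs]   # verification buckets
    r = [Bucket(n, d) for n, d in stairs]   # regulation buckets
    violations, released, backlog = [], [], deque()
    clock = 0
    # A sentinel arrival at infinity flushes the backlog at the end.
    for idx, t in enumerate(list(trace) + [float("inf")]):
        # 1) Release buffered events in FIFO order, no later than time t.
        while backlog:
            due = max([clock] + [b.start + b.period for b in r if b.level <= 0])
            if due > t:
                break
            backlog.popleft()
            for b in r:
                b.advance(due)
                b.take(due)
            released.append(due)
            clock = due
        if t == float("inf"):
            break
        clock = t
        # 2) V-buckets: a negative level flags a violating arrival.
        for b in v:
            b.advance(t)
            b.take(t)
        if any(b.level < 0 for b in v):
            violations.append(idx)
        # 3) R-buckets: release immediately if possible, otherwise buffer.
        for b in r:
            b.advance(t)
        if not backlog and all(b.level > 0 for b in r):
            for b in r:
                b.take(t)
            released.append(t)
        else:
            backlog.append(t)
    return violations, released


# Hypothetical trace and staircase (N = 2, delta = 10).
print(monitor([0, 3, 5, 30, 31, 45], [(2, 10)]))
# prints ([2], [0, 3, 10, 30, 31, 45])
```

Each arrival costs a constant number of integer operations per bucket, and the only state kept is one level and one timer per bucket plus the backlog, in contrast to checks that revisit the whole trace.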
Correctness
This section proves the correctness of our algorithm. For simplicity, we provide the formal proof for the case of $n = 1$, i.e., $\alpha^u(\Delta) = N^u + \lfloor \Delta / \delta^u \rfloor$. Proofs for the cases with $n > 1$ follow a similar scheme.
To prove the correctness of the algorithm, we divide the time axis of the system execution into a set of consecutive time segments. A time segment is defined as follows.
Def. 1. A time segment $F$ is the time interval in which the value of $\mathit{BFL}^v$ changes from $N^u$ back to $N^u$, i.e., the interval between two renew points in the trace (Lines 2-5, Algo. 1).
The starting and ending time instants of $F_i$ are denoted by $t^S_i$ and $t^E_i$, respectively. For the arrival of the $n$-th event $e_n$ in the trace, we denote by $t_n$ and $x_n$ the time instant and the value of $\mathit{BFL}^v$, respectively. Based on the above definitions, we have the following lemmas.
Proof. Consider an arbitrary interval $[s, t]$. There are $m + 1$ events arriving within this interval, numbered $n, n+1, \dots, m+n$, so that $t_n$
(1), the lemma holds.
Proof. Fig. 3(a) illustrates such an example. According to the algorithm, the timer $\mathit{CLK}^v$ is cleared at time instants $t_n$, $t^S_i$, and $t^E_i$. With Eqn. (6), we have
Theorem 1. Given a trace $R$ and a staircase $\alpha^u$, as long as $\mathit{BFL}^v \ge 0$, Algo. 1 guarantees that $R$ conforms to $\alpha^u$.
Proof. What we need to prove is that $R(t) - R(s) \le \alpha^u(t - s)$ for all $0 \le s \le t$. We consider two cases, i.e., $s$ and $t$ are located within one segment or in two different segments. First we consider the special case of $N^u = 1$. In this case, each segment allows only one event in order to guarantee $\mathit{BFL}^v \ge 0$, i.e., there is only one event in each segment. Without loss of generality, let $s$ and $t$ be the arrival times of events $e_n$ and $e_m$, which arrive in segments $F_n$ and $F_m$, respectively. From Lem. 3, we have $t^E_n - t_n = \delta^u$ and $t^S_m - t^E_n = (m - 1 - n)\delta^u$.
Based on Lem. 2, the theorem holds for $N^u = 1$. In the following, we provide the proof for $N^u \ge 2$.
• Single-segment case: For any given segment $F_i$, assume $e_n$ is the first event arriving in $F_i$, as shown in Fig. 3(a). Obviously, $\mathit{BFL}^v$ is $N^u - 1$ at time instant $t_n$. We further assume that events $e_{n+k}$ and $e_{n+m}$ ($m > k \ge 0$) arrive within $F_i$ at time instants $t_{n+k}$ and $t_{n+m}$, the corresponding values of $\mathit{BFL}^v$ being $x_{n+k}$ and $x_{n+m}$, respectively. According to Eqn. (6), we have
For $k > 0$, we know $0 \le x_{n+m}, x_{n+k} \le N^u - 2$ (Lem. 1). Therefore, we have $t_{m+n} - t_{n+k} = (t_{m+n} - t_n) - (t_{n+k} - t_n)$
Based on Lem. 2, the theorem holds for this case.
• Multiple-segment case: We consider segments $F_i$ and $F_j$ with $j > i$. As shown in Fig. 3(b), event $e_n$ is the first event in $F_i$ with $\mathit{BFL}^v = N^u - 1$ and event $e_{n+m}$ is the last event in $F_j$ with
. In $F_j$, event $e_{n+r}$ is the first event with
Considering events $e_{n+s}$ of $F_j$ and $e_{n+k}$ of $F_i$, we have the following equations based on Eqn. (6):
For $k \ne 0$, $s \ne r$, we have $0 \le x_{n+s}, x_{n+k} \le N^u - 2$ (Lem. 1). Together with Eqns. (9)-(12), we get $t_{n+s} - t_{n+k} = (t_{n+s} - t_{n+r})$
For $k = 0$, $s = r$, from Lem. 3, Eqns. (10), (9), and (12) as well as $N^u \ge 2$, we have
u from single-segment case and Eqn. (10). Then we have
From the above cases, the theorem holds.
Corollary 1. At the time instant when $\mathit{BFL}^v$ drops below 0, a violation of $\alpha^u$ occurs.
Proof. As shown in Fig. 3(a), assume $\mathit{BFL}^v < 0$ occurs when event $e_{m+n}$ arrives in $F_i$, i.e., $x_{n+m} \le -1$. According to Eqn. (7), $t_{m+n} - t_n = (m + x_{n+m} - (N^u - 1))\delta^u + \sigma_{m+n}\delta^u$. Consider the interval $[s, t]$ with $s = t_n - \lambda$ and $t = t_{m+n}$, where $0 < \lambda$
Theorem 2. The trace regulated by Algo. 1 conforms to $\alpha^u$.
Proof. The output traffic is regulated by the R-bucket, which works by the same mechanism as the V-bucket. In addition, the variable backlog (Line 23) guarantees that $\mathit{BFL}^r$ never goes below zero. Therefore, the theorem holds according to Thm. 1.
Discussion
The algorithm and proofs in the previous section address conformance verification and regulation for the upper bound of an arrival curve. A similar technique can be used for violation detection on the lower bound. Regulating the input traffic so that it conforms again to a lower curve is, however, not possible in this context. As a violation of the lower bound basically means that too few events occur in a certain time interval, regulation by injecting artificial events into the system would violate our basic assumption that we only consider time-invariant systems. Nevertheless, a warning can be issued to the application, so that the application itself can react to the violation.
Note that another assumption of our approach is that violating events are stored in a buffer and released at a later point in time. Too many buffered events may lead to a buffer overflow in our algorithm. Although such an overflow is unavoidable, we can nevertheless detect its occurrence by monitoring the event queue q in the algorithm. Furthermore, delaying input events may result in deadline violations of these events. Detection of deadline violations can also be added to the current scheme.
EXPERIMENTS
This section demonstrates the effectiveness of our approach by empirical case studies.
We implement our algorithm in both Matlab and Verilog HDL. The Verilog HDL code is synthesized for an ALTERA Cyclone III FPGA.
We adopt the PJD model (Section 3) for the specification of event streams. The upper bound $\alpha^u$ of such a model can be represented as the minimum of two staircase functions. The parameters of the two staircase functions can be computed as follows [9]:
To generate traces with different patterns, the RTC/RTS toolbox [14] is used. We first generate a worst-case trace that conforms to the upper bound. Then we inject random events to artificially create violations. In our experiment, we employ an arrival curve with a period of 100 µs, a jitter of 300 µs, and a minimal inter-arrival distance of 20 µs.
To validate our algorithm, we implement a discrete-time simulation in Matlab and an FPGA testbed. The Matlab simulation is implemented using the RTC/RTS toolbox. The testbed consists of an event generator IP and the algorithm IP, as shown in the block diagram in Fig. 4. The event generator IP generates the same events as those used in the Matlab simulation. The algorithm IP itself consists of four modules. As shown in the figure, the MultiBuckets module contains a reconfigurable number of bucket pairs, each of which contains a V-bucket and an R-bucket. The EventSyn module synchronizes the FIFO and MultiBuckets modules when events arrive. The output module controls the release of events. The FIFO module buffers regulated events. The testbed is simulated using ModelSim. Details of the implementation are given in Fig. 6 in the appendix.
We compare the theoretical and experimental results in Fig. 5. The solid line $R_{org}$ and dashed line $R_{reg}$ represent the original and regulated traces, respectively. The star dots $P_{dec}$ and round dots $P_{buk}$ indicate the violation events computed by Eqn. (3) and detected by our algorithm, respectively. As expected, the two sets of dots match (Fig. 5(a)). We also compute the upper bound $\alpha_{R_{reg}}$ of the regulated trace $R_{reg}$ (by Eqn. (3)) and compare it with the input specification $\alpha^u$. As shown in Fig. 5(b), $\alpha_{R_{reg}}$ is bounded by $\alpha^u$. The results in Fig. 5 experimentally confirm that our algorithm performs correctly.
We also report the resource consumption and latency of the Verilog HDL implementation. We synthesize the implementation for 2, 4, and 6 pairs of buckets using an ALTERA Cyclone III EP3C120F780 development kit. The resource consumption is shown in Tab. 5. As shown in the table, the logic elements used, even for the case of 6 bucket pairs, remain under 0.3% of the total resources (119,088 logic elements in total on the FPGA). Furthermore, the resource usage is linear w.r.t. the number of bucket pairs, which indicates that our algorithm can be used to regulate runtime traces for complex arrival curves. Note that, in this implementation, the same FIFO buffer is used for all buckets, as the events of the trace belong to the same arrival curve. Therefore, the size of the FIFO module is independent of the number of buckets and is determined by the number of over-burst events to be tolerated.
Regarding timing overhead, 6 cycles are needed to transfer an event from the input to the output of our IP when no regulation is applied. As the working frequency of the FPGA is set to 50 MHz, this latency corresponds to 120 ns. This result indicates that the timing overhead of our algorithm is small and can be integrated into the WCET of the events without significant side effects on timing.
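The reported latency follows directly from the clock frequency, as the short calculation below shows. The comparison with the 20 µs PJD parameter d of the experiment is added here for illustration only and is not a result stated in the measurements.

```latex
t_{\mathrm{lat}} = \frac{6\ \text{cycles}}{50\ \text{MHz}} = 6 \times 20\,\text{ns} = 120\,\text{ns},
\qquad
\frac{t_{\mathrm{lat}}}{d} = \frac{120\,\text{ns}}{20\,\mu\text{s}} = 0.6\,\%.
```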
CONCLUSION
This paper presents an online algorithm for traffic conformity verification and regulation in hard real-time systems. Our algorithm can detect input violations and regulate the violating traffic so that it complies again with the specifications. We also present formal proofs, simulation results, and an FPGA implementation to demonstrate the effectiveness of our algorithm. The experimental results show that the resource and timing overheads of our algorithm are low, which makes it particularly suitable for embedded systems with stringent resource constraints.
APPENDIX
Figure 6: The detailed block diagram of the algorithm IP (Fig. 4), implementing two bucket pairs. Pins pin_eventin and pin_event_data are the event synchronization signal and the event data bus, respectively. Pin pin_violation carries a pulse when a violation occurs. Regulation signals are generated at pins pin_regulation and pin_event_out_data. When the BFLr value is larger than zero, pin_regulation is asserted high and the event data is put on pin_event_out_data at the same time.
