Abstract - As a network QoS technique, the two rate three color marker (trTCM) has been widely used to implement traffic policing, traffic shaping and port rate limiting. The trTCM is usually implemented in software, which is simple and easy to configure. However, a software trTCM has low marking precision and cannot easily be scaled to support high-speed network traffic. We propose a novel method to design and implement the trTCM on an FPGA. The key configuration parameters of the trTCM are optimized on the basis of a thorough theoretical analysis. The experimental results show that the average marking accuracy of the designed trTCM is as high as 99.9734%, while it occupies few hardware resources.
Introduction
With the rapid development of the Internet and computer technology, and the growing diversity of multimedia services on the network, best-effort service has become increasingly unable to meet the quality of service (QoS) requirements of real-time applications. Providing QoS control according to user needs has become a new challenge for network services [1]. To address this, the IETF standardized two token bucket algorithms in its RFCs: the single rate three color marker (srTCM) and the two rate three color marker (trTCM). These two algorithms are widely applied to access rate limiting, general traffic shaping and rate limiting on physical interfaces [2]. The srTCM has a single-bucket or double-bucket structure [3] and is relatively simple in how it adds tokens and processes packets. The trTCM has a double-bucket structure [4] and is relatively complex in how it adds tokens and processes packets. The srTCM focuses on the size of packet bursts, while the trTCM focuses on the burst rate; each has its advantages [5]. In practical applications, the method appropriate to each service should be chosen.
The trTCM is usually implemented in software, which is deficient in precision and performance. The main contribution of this paper is a trTCM implemented on an FPGA. On the basis of theoretical analysis and by optimizing the configuration parameters, we have designed and implemented a high-precision trTCM that occupies few hardware resources.
Two Rate Three Color Marker Design and Implementation

A. Parameter Description
According to the IETF RFC that specifies the trTCM, four parameters need to be set: CIR, CBS, PIR and PBS. To make the marker easier to control in hardware, we define the following parameters:
1) Time granularity (Time_GRA): the interval at which tokens are added.
2) Token granularity (Token_GRA): the number of bytes that a single token allows to be sent.
3) P bucket token increase granularity (P_Bucket_Add_GRA): the number of tokens added to the P bucket in a single time granularity cycle.
4) C bucket token increase granularity (C_Bucket_Add_GRA): the number of tokens added to the C bucket in a single time granularity cycle.
5) P bucket depth (P_Bucket_Size): the number of tokens the P bucket can hold.
6) C bucket depth (C_Bucket_Size): the number of tokens the C bucket can hold.
According to the parameters defined above, we have the following relations:

$$\mathrm{PIR} = \frac{\mathrm{P\_Bucket\_Add\_GRA} \times \mathrm{Token\_GRA}}{\mathrm{Time\_GRA}}; \qquad \mathrm{CIR} = \frac{\mathrm{C\_Bucket\_Add\_GRA} \times \mathrm{Token\_GRA}}{\mathrm{Time\_GRA}};$$

$$\mathrm{PBS} = \mathrm{P\_Bucket\_Size} \times \mathrm{Token\_GRA}; \qquad \mathrm{CBS} = \mathrm{C\_Bucket\_Size} \times \mathrm{Token\_GRA}.$$

In this paper, the time granularity, P bucket token increase granularity, C bucket token increase granularity, P bucket depth and C bucket depth can be configured externally via software for flexible control.
For simplicity, we set the token granularity to 1, which means that a single token allows one byte of packet data to be sent.
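The relations between the RFC parameters and the hardware parameters can be sketched in Python as follows. This is a minimal sketch, assuming rates in bytes per second, Time_GRA in seconds, and that each rate equals the tokens added per period times the bytes per token, divided by the period. The names `TrTcmConfig` and `rfc_parameters` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class TrTcmConfig:
    """Hardware-side parameters (hypothetical container, not from the paper)."""
    time_gra: float          # seconds between two token additions
    p_bucket_add_gra: int    # tokens added to the P bucket per period
    c_bucket_add_gra: int    # tokens added to the C bucket per period
    p_bucket_size: int       # P bucket depth in tokens
    c_bucket_size: int       # C bucket depth in tokens
    token_gra: int = 1       # bytes per token (set to 1 in the text)


def rfc_parameters(cfg):
    """Return (CIR, CBS, PIR, PBS): rates in bytes/s, burst sizes in bytes."""
    cir = cfg.c_bucket_add_gra * cfg.token_gra / cfg.time_gra
    pir = cfg.p_bucket_add_gra * cfg.token_gra / cfg.time_gra
    cbs = cfg.c_bucket_size * cfg.token_gra
    pbs = cfg.p_bucket_size * cfg.token_gra
    return cir, cbs, pir, pbs


# Example values chosen only for illustration.
example = TrTcmConfig(time_gra=2**-10, p_bucket_add_gra=2000,
                      c_bucket_add_gra=1000, p_bucket_size=40000,
                      c_bucket_size=10000)
cir, cbs, pir, pbs = rfc_parameters(example)
```

With Token_GRA fixed at 1, the burst sizes equal the bucket depths and each rate depends only on the ratio of the add granularity to the time granularity.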
B. Theoretical Analysis
To verify the function and performance of the proposed trTCM, it has been implemented in a self-developed network processing engine. The operating frequency of the network processing engine is 250 MHz (clock cycle of 4 ns). The design supports token bucket rate limits ranging from 1 Mbps to 40 Gbps.
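International Conference on Computer, Networks and Communication Engineering (ICCNCE 2013)

(1) Time granularity values. According to the formulas

$$\mathrm{CIR} = \frac{\mathrm{C\_Bucket\_Add\_GRA} \times \mathrm{Token\_GRA}}{\mathrm{Time\_GRA}}, \qquad \mathrm{PIR} = \frac{\mathrm{P\_Bucket\_Add\_GRA} \times \mathrm{Token\_GRA}}{\mathrm{Time\_GRA}},$$

we have

$$\mathrm{Time\_GRA} = \frac{\mathrm{C\_Bucket\_Add\_GRA} \times \mathrm{Token\_GRA}}{\mathrm{CIR}}$$

or

$$\mathrm{Time\_GRA} = \frac{\mathrm{P\_Bucket\_Add\_GRA} \times \mathrm{Token\_GRA}}{\mathrm{PIR}}.$$

To ensure that the Time_GRA register reserved in hardware covers the largest value required, Time_GRA should be sized for the maximum of P_Bucket_Add_GRA and C_Bucket_Add_GRA. Since these maxima are not known in advance, we define a reference time granularity, which is the value of Time_GRA when both P_Bucket_Add_GRA and C_Bucket_Add_GRA equal 1. With Token_GRA = 1, the maximum reference time granularity should satisfy

$$\mathrm{Time\_GRA}_{\mathrm{ref}} \le \frac{1}{\mathrm{CIR}_{\min}} \quad \text{or} \quad \mathrm{Time\_GRA}_{\mathrm{ref}} \le \frac{1}{\mathrm{PIR}_{\min}}.$$

Then, taking the minimum rate of 1 Mbps as $2^{20}$ bit/s, that is $2^{17}$ bytes/s, we have $\mathrm{Time\_GRA}_{\mathrm{ref}} \le 2^{-17}$ s.

Converted to hardware clock cycles, this is $2^{-17} / (4 \times 10^{-9}) \approx 1907.3$, so the maximum reference time granularity fits in an 11-bit wide register.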
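The conversion to clock cycles can be checked with a few lines of Python. This is a sketch under two assumptions that are not stated explicitly in the text: the minimum rate of 1 Mbps is taken as 2^20 bit/s (2^17 bytes/s), which is what the 2^-17 figure implies, and the cycle count is rounded up, which reproduces the value 1908 used later in the analysis. The function name is hypothetical.

```python
import math

CLOCK_PERIOD_S = 4e-9            # 250 MHz engine clock, from the text
MIN_RATE_BYTES_PER_S = 2**17     # assumption: 1 Mbps taken as 2**20 bit/s


def reference_time_gra_cycles(min_rate, clock_period):
    """Clock cycles between additions when one 1-byte token is added per period."""
    seconds = 1.0 / min_rate               # 2**-17 s at the minimum rate
    return math.ceil(seconds / clock_period)


cycles = reference_time_gra_cycles(MIN_RATE_BYTES_PER_S, CLOCK_PERIOD_S)
width = cycles.bit_length()                # register width needed for this value
print(cycles, width)                       # prints "1908 11"
```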
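(2) Token increase granularity values. According to the formulas

$$\mathrm{CIR} = \frac{\mathrm{C\_Bucket\_Add\_GRA} \times \mathrm{Token\_GRA}}{\mathrm{Time\_GRA}} \quad \text{and} \quad \mathrm{PIR} = \frac{\mathrm{P\_Bucket\_Add\_GRA} \times \mathrm{Token\_GRA}}{\mathrm{Time\_GRA}},$$

the token increase granularity is the product of the rate and the time granularity, divided by the token granularity. To ensure that the C_Bucket_Add_GRA and P_Bucket_Add_GRA registers reserved in hardware meet the parameter requirements, we set the maximum value of PIR and CIR to 40 Gbps and the maximum value of Time_GRA to 1908 clock cycles, which is the maximum reference time granularity. With Token_GRA = 1, this gives:

$$\mathrm{C\_Bucket\_Add\_GRA}_{\max} = \frac{40\ \mathrm{Gbps}}{8} \times 1908 \times 4\ \mathrm{ns} < 2^{16};$$

$$\mathrm{P\_Bucket\_Add\_GRA}_{\max} = \frac{40\ \mathrm{Gbps}}{8} \times 1908 \times 4\ \mathrm{ns} < 2^{16}.$$

Therefore, the C bucket token increase granularity and the P bucket token increase granularity each fit in a 16-bit wide register.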
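The 16-bit bound can be checked numerically. The sketch below assumes Token_GRA = 1 and rounds the token count up; because the text does not say whether 40 Gbps is meant as 40 x 10^9 or 40 x 2^30 bit/s, both readings are evaluated. The function name `max_add_gra` is hypothetical.

```python
CLOCK_PERIOD_NS = 4       # 250 MHz engine clock, from the text
TIME_GRA_CYCLES = 1908    # maximum reference time granularity, from the text


def max_add_gra(max_rate_bytes_per_s, time_gra_cycles=TIME_GRA_CYCLES,
                clock_period_ns=CLOCK_PERIOD_NS, token_gra=1):
    """Tokens to add per period at the maximum rate, rounded up."""
    numerator = max_rate_bytes_per_s * time_gra_cycles * clock_period_ns
    denominator = 10**9 * token_gra        # nanoseconds per second
    return -(-numerator // denominator)    # integer ceiling division


decimal_gra = max_add_gra(40 * 10**9 // 8)   # 40 Gbps read as 40e9 bit/s
binary_gra = max_add_gra(40 * 2**30 // 8)    # 40 Gbps read as 40 * 2**30 bit/s
print(decimal_gra, decimal_gra.bit_length())  # prints "38160 16"
print(binary_gra, binary_gra.bit_length())    # prints "40974 16"
```

Under either reading the value lies between 2^15 and 2^16, so a 16-bit register is sufficient and a 15-bit register would not be.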
(3) Bucket depth values. According to the RFC specification, the bucket depth should be 1) greater than the maximum packet length, 2) greater than the token increase granularity, and 3) less than or equal to the maximum rate limit. The maximum bucket depth (Bucket_Size) should therefore satisfy Bucket_Size ≤ 40 Gbps / 8 = 5 G bytes, that is, one second of traffic at the maximum rate. Therefore, the bucket depth fits in a 33-bit wide register.

C. Design and Implementation

The trTCM consists of the clock module, the token management module and the metering module. The modular structure of the implementation is shown in Figure 1. The signal definitions are listed in Table 1. The metering module colors packets based on the size of the packet (Packet_Size) and the number of tokens remaining in the P bucket and C bucket (P_Bucket_Sum, C_Bucket_Sum). The token management module updates the number of tokens in each bucket according to the bucket depth, the P bucket token increase granularity, the C bucket token increase granularity and the metering result.

The clock module is responsible for timing. After external logic has initialized the token parameters, such as the time granularity, token increase granularity and bucket depth, the start timing signal (Start_T) is asserted and the clock counter starts from 0. When the counter reaches one time granularity period (Time_GRA), the token-add signal (TokenAdd) is output to notify the token management module to add tokens to the buckets, and the counter is reset to 0.

The token management module is responsible for controlling and managing the tokens. On the TokenAdd signal sent by the clock module, it adds tokens to the P bucket and C bucket; the number of tokens added is the token increase granularity. On the C_BucketSub or P_BucketSub signal issued by the metering module, it removes tokens from the C bucket or P bucket; the number of tokens removed equals the value of the Sub_Size signal. Its state machine is shown in Figure 2, which describes the management of the C bucket. Since the state machine of the P bucket is the same as that of the C bucket, it is not described separately. As shown in Figure 2, the state machine consists of the initialization (C_initial), idle (C_idle), token-add (C_add) and token-subtract (C_sub) states. To avoid a conflict between the token-add and token-subtract signals, the token-subtract operation takes precedence and the token-add operation is postponed.

Figure 2. C bucket management state machine (transition labels recovered from the figure: C_bucketsub==1, P_bucketsub==1)

The state operations are specified as follows:
C_initial: The token increase granularity and bucket depth registers are initialized according to the external logic configuration. When the Start_T signal is asserted, the state jumps to C_idle; otherwise, it remains in C_initial.
C_idle: Monitor the token-add and token-subtract signals. If only the token-add signal is valid, the state jumps to C_add. If only the token-subtract signal is valid, the state jumps to C_sub. If both signals are valid at the same time, the token-add signal is held at its current value and the state jumps to C_sub.
C_add: Check whether the sum of the current number of tokens in the C bucket and the token increase granularity is greater than the bucket depth. If so, the current number of tokens is set to the bucket depth; otherwise, it is set to that sum. Then check whether the token-subtract signal is valid. If so, the state jumps to C_sub; if not, the state jumps to C_idle.
C_sub: The packet length is subtracted from the current number of tokens, and the state jumps back to C_idle.
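The four states above can be modelled in software to check the priority rule. The following Python sketch is a behavioural model only, not the authors' Verilog: it assumes one call to `step` per clock cycle, that a request arriving while the machine is busy is latched until served, and that the bucket starts full unless told otherwise. The class and attribute names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    IDLE = auto()
    ADD = auto()
    SUB = auto()


class CBucket:
    """Behavioural model of the C bucket management state machine."""

    def __init__(self, add_gra, bucket_size, tokens=None):
        self.add_gra = add_gra
        self.bucket_size = bucket_size
        # Assumption: the bucket starts full unless a value is given.
        self.tokens = bucket_size if tokens is None else tokens
        self.state = State.INITIAL
        self.pending_add = False   # held token-add request
        self.pending_sub = False   # held token-subtract request
        self.sub_size = 0

    def step(self, start_t=False, token_add=False, bucket_sub=False, sub_size=0):
        """Advance the model by one clock cycle."""
        if self.state is State.INITIAL:
            if start_t:
                self.state = State.IDLE
            return
        # Latch requests so that a postponed token-add is not lost.
        if token_add:
            self.pending_add = True
        if bucket_sub:
            self.pending_sub = True
            self.sub_size = sub_size
        if self.state is State.IDLE:
            if self.pending_sub:           # subtraction takes precedence
                self.state = State.SUB
            elif self.pending_add:
                self.state = State.ADD
        elif self.state is State.ADD:
            # Saturate at the bucket depth.
            self.tokens = min(self.tokens + self.add_gra, self.bucket_size)
            self.pending_add = False
            self.state = State.SUB if self.pending_sub else State.IDLE
        elif self.state is State.SUB:
            self.tokens -= self.sub_size
            self.pending_sub = False
            self.state = State.IDLE


# Simultaneous add and subtract: the subtraction is served first.
bucket = CBucket(add_gra=100, bucket_size=1000, tokens=500)
bucket.step(start_t=True)
bucket.step(token_add=True, bucket_sub=True, sub_size=200)
bucket.step()                      # C_sub: 500 - 200
tokens_after_sub = bucket.tokens
bucket.step()                      # C_idle sees the held token-add
bucket.step()                      # C_add: 300 + 100
```

The model relies on the metering module asserting a subtract request only when the bucket holds enough tokens, so it does not guard against a negative count.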
The metering module marks packets with different colors based on the current number of tokens in the P bucket, the current number of tokens in the C bucket and the packet length. It notifies the token management module through the C_BucketSub or P_BucketSub signal, and the number of tokens to remove from the P bucket or C bucket is given by the Sub_Size signal (Sub_Size is equal to the packet length). The two rate three color marker algorithm is given in Algorithm 1. When the length of the arriving packet is greater than the current number of tokens in the P bucket (P_Bucket_Sum < Packet_Size), the packet is marked red. When the packet size is less than or equal to the current number of tokens in the P bucket but greater than that in the C bucket, the packet is marked yellow, the P bucket-subtract signal is asserted, and the number of tokens removed equals the packet size. When the packet size is less than or equal to the current number of tokens in both the P bucket and the C bucket, the packet is marked green, both the P bucket-subtract and C bucket-subtract signals are asserted, and the number of tokens removed from each bucket equals the packet size.
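The marking rule described above can be written as a short function. This Python sketch follows the color-blind decision order of the trTCM (red, then yellow, then green); it is an illustration of the rule, not a transcription of Algorithm 1, and the function name and return layout are hypothetical.

```python
def meter(packet_size, p_bucket_sum, c_bucket_sum):
    """Return (color, p_bucket_sub, c_bucket_sub, sub_size) for one packet."""
    if p_bucket_sum < packet_size:
        # Not enough tokens in the P bucket: red, no tokens removed.
        return "red", False, False, 0
    if c_bucket_sum < packet_size:
        # Enough in P but not in C: yellow, remove tokens from P only.
        return "yellow", True, False, packet_size
    # Enough in both buckets: green, remove tokens from both.
    return "green", True, True, packet_size


red = meter(packet_size=1500, p_bucket_sum=1000, c_bucket_sum=1000)
yellow = meter(packet_size=1500, p_bucket_sum=2000, c_bucket_sum=1000)
green = meter(packet_size=1500, p_bucket_sum=2000, c_bucket_sum=1500)
```

Testing the P bucket first means a packet that exceeds the peak rate is never charged against the C bucket.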
Experimental Analysis
To verify the functional integrity and coloring behavior of the trTCM on real traffic, we wrote the trTCM algorithm in Verilog HDL. The trTCM is implemented on an Altera FPGA [6] (Stratix IV EP4SGX180KF40C2). We analyze the implementation code and the logic resources it occupies using the Quartus II design tools, and we use an IXIA XM2 tester to analyze the marking error of the trTCM in a real environment. Table 2 lists the resource consumption of the token management module and metering module. As shown in Table 2, the token management module and metering module occupy less than 0.07% of the total storage and logic resources, so the implementation uses few resources. Figure 3 shows the marking error when the P bucket and C bucket depths are set to 40 k and 10 k respectively. As depicted in Figure 3, the average committed burst rate error is only 0.0322% and the average peak rate error is only 0.021%. The mean of the committed and peak rate errors is only 0.0266%, so the average marking accuracy is as high as 99.9734%. The marker is therefore highly accurate.
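The accuracy figure follows from the two measured errors by simple arithmetic, written out here as a check. It assumes, as the text states, that the overall error is the unweighted mean of the committed and peak rate errors; the symbol $\bar{e}$ is introduced only for this check.

```latex
\[
\bar{e} = \frac{0.0322\% + 0.021\%}{2} = 0.0266\%, \qquad
\text{accuracy} = 100\% - \bar{e} = 99.9734\%.
\]
```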
Summary of the experimental results: the experiments verify that the trTCM achieves effective color marking and requires few resources when implemented on an FPGA. The performance test measured the marking error of the trTCM. The trTCM marks colors well: the average marking accuracy is as high as 99.9734%, and it occupies less than 0.07% of the storage and logic resources.
Conclusion
This paper proposed an FPGA implementation of the trTCM. Based on the theoretical analysis, the optimized configuration parameters and value ranges of the trTCM are determined. The logic structure and state machine of the trTCM are designed based on these parameters and implemented on an FPGA. The experimental results show that the implementation is feasible and highly accurate. The technique can be applied to network processing equipment such as switches and routers.
