We propose a rate controller for ATM switches. The rate controller supports multiple priorities and dual leaky bucket (GCRA) traffic descriptors (such as VBR). While regulating each stream independently, our rate controller requires relatively modest computation bandwidth, so that it can be implemented without any additional special-purpose hardware. It also enjoys the important advantage of being decoupled from the link schedulers. We study how best to allocate resources to rate controllers along the path of a connection, and demonstrate the effectiveness of aggressive shaping at the network ingress.
Rate (VBR); (2) Support for multiple priorities; (3) The rate regulation module is decoupled from the link scheduling module. This allows one, for example, to replace the priority policy or mechanism without changing the regulator. It also allows for a simple analysis of the maximum cell delay and the buffer space requirement; (4) The processing overhead of the regulator is low enough for it to be implemented in software. In particular, the computational complexity depends only on the actual number of cells transmitted, regardless of the number of connections. This property makes the algorithm highly scalable.
0-7803-5842-2/00/$10.00 ©2000 IEEE.
The Rate Controller Architecture. The rate controller consists of two modules: a traffic regulator and a priority scheduler. Each connection has a dedicated buffer at the regulator, in which incoming cells are stored. The regulator holds each cell until it is eligible for transmission, where a cell is said to be eligible if it is conforming according to its GCRA-based traffic model. When a cell becomes eligible, it is put in a scheduler buffer: the scheduler maintains a buffer for each priority class. The scheduler's task is to select the next cell to transmit on the outgoing link according to the priority policy. Usually, the scheduler enforces the strict priority policy, wherein at any given time, the cells of the highest non-empty priority class buffer are served. Within each priority class, the scheduler may use another policy such as Weighted Fair Queuing (WFQ) [4], which ensures that each connection gets its allocated bandwidth share.
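To make the eligibility rule concrete, the following is a minimal Python sketch of how a dual leaky bucket regulator could compute the slot at which a buffered cell becomes eligible. It uses the standard virtual-scheduling form of GCRA, with an increment and a limit both measured in cell slots. The names `Gcra`, `increment`, `limit`, `tat` and `eligible_time` are illustrative assumptions, and the sketch does not fix how these quantities map to the (I, L) parameters used in the evaluation.

```python
from dataclasses import dataclass


@dataclass
class Gcra:
    """One leaky bucket in virtual-scheduling form (assumed parameterization)."""
    increment: float  # nominal spacing between cells, in cell slots
    limit: float      # tolerance, in cell slots
    tat: float = 0.0  # theoretical arrival time of the next cell

    def earliest_conforming(self, t):
        # A cell sent at time s conforms if s >= tat - limit.
        return max(t, self.tat - self.limit)

    def commit(self, t):
        # Record that a conforming cell was sent at time t.
        self.tat = max(t, self.tat) + self.increment


def eligible_time(buckets, ready):
    """Earliest time >= ready at which a cell conforms to every bucket."""
    t = max(b.earliest_conforming(ready) for b in buckets)
    for b in buckets:
        b.commit(t)
    return t


# Hypothetical dual leaky bucket: peak spacing 2, sustained spacing 10, tolerance 18.
peak = Gcra(increment=2, limit=0)
sustained = Gcra(increment=10, limit=18)
times = [eligible_time([peak, sustained], ready=0) for _ in range(5)]
print(times)  # [0, 2, 4, 12, 22]
```

With these assumed values, five cells that are all buffered at time 0 leave as a burst of three cells at the peak spacing, followed by cells at the sustained spacing.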
GCRA Regulator Algorithm. The algorithm we propose is based on calendar queues [1]. The basic idea is as follows. We maintain a calendar data structure, which is a cyclic array of linked lists of active connections. Each entry in the array corresponds to a cell-slot time, and the calendar, at any time, corresponds to a sliding time interval which starts at the current time and extends to the future.
A pointer called now advances cyclically over the array, always pointing to the current entry. The list hanging from the current entry contains structures representing all connections which are eligible for transmission now. The action taken by the algorithm in a time step is as follows. After advancing the now pointer, the list of connections pointed to by the current calendar entry is scanned in order.
For each connection on the list, a cell is submitted to the scheduler (if available; see below), the next conformance time is computed, and the connection is concatenated to the list hanging from the entry corresponding to that time.
Some special care is needed for connections which become eligible for transmission, but do not have any cell to transmit yet. For this case, we add a special flag eligible to each connection, and an additional test is performed for each incoming cell: when a cell arrives, if the buffer of its connection is empty and the eligible flag is set, then the flag is cleared and the connection joins the tail of the list of the next entry to be scheduled. Our algorithm can be implemented in 23 cycles on standard RISC processors.
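The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not our actual implementation: the class and method names are hypothetical, the next conformance time is replaced by a fixed per-connection `spacing` in slots (a stand-in for the GCRA computation), and a connection found with an empty buffer is assumed to leave the calendar with its eligible flag set.

```python
from collections import deque


class Connection:
    def __init__(self, priority, spacing):
        self.priority = priority  # index of the scheduler buffer (0 = highest)
        self.spacing = spacing    # slots until the next conforming cell (stand-in for GCRA)
        self.buffer = deque()     # per-connection regulator buffer
        self.eligible = True      # may send as soon as a cell arrives


class CalendarRegulator:
    def __init__(self, size, num_priorities):
        self.calendar = [deque() for _ in range(size)]  # cyclic array of lists
        self.now = 0
        self.scheduler = [deque() for _ in range(num_priorities)]

    def cell_arrival(self, conn, cell):
        # Extra test for each incoming cell: wake up an idle, eligible connection.
        if not conn.buffer and conn.eligible:
            conn.eligible = False
            self.calendar[(self.now + 1) % len(self.calendar)].append(conn)
        conn.buffer.append(cell)

    def tick(self):
        # Advance the now pointer, then scan the current entry in order.
        size = len(self.calendar)
        self.now = (self.now + 1) % size
        entry = self.calendar[self.now]
        while entry:
            conn = entry.popleft()
            if conn.buffer:
                self.scheduler[conn.priority].append(conn.buffer.popleft())
                assert 1 <= conn.spacing < size, "spacing must fit in the calendar"
                self.calendar[(self.now + conn.spacing) % size].append(conn)
            else:
                conn.eligible = True  # eligible, but nothing to send yet


def strict_priority_pick(queues):
    """Decoupled link scheduler: serve the highest non-empty priority class."""
    for q in queues:
        if q:
            return q.popleft()
    return None
```

In this sketch, the work done in a time step is proportional to the length of the current calendar entry, and every connection visited either submits a cell or goes idle, so the cost does not grow with the number of connections that are merely open. The scheduler is a separate function that only reads the per-priority buffers, which reflects the decoupling of regulation from link scheduling.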
Performance Evaluation. We evaluated the performance of the GCRA shaping in a simple network. The network model we use consists of a line of m switches (see Figure 1 for a schematic representation of a network with m = 4). This line is the path of a single connection we examine, called the focus connection henceforth. In each switch, other connections are multiplexed into the output link with various input and shaping parameters. Specifically, we have three types of connections denoted by A, B and C (see Table 1). Each traffic source conforms to GCRA(I_in, L_in) with the given parameters, and the regulator shapes them to conform to GCRA(I, L) as given. The focus connection is of type A (setting the source to be of another type does not affect the results in a significant way). In each switch, we inject fresh traffic streams: 9 of type A, 10 of type B, and 10 of type C. All streams except the focus connection leave the system after one hop.
All traffic sources are generated by a stochastic on/off process with GCRA(I_in, L_in) parameters. The idea is that the process emulates a source which generates packets: the L_in parameter corresponds to the packet size, and the I_in parameter corresponds to the average rate. To control the interaction between connections, the arrival time of each packet is uniformly distributed between 0 and the connection period L_in/I_in.
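One possible reading of this source model is sketched below in Python: each period of length L_in/I_in carries one packet of L_in cells, whose arrival offset within the period is drawn uniformly at random. The function name `packet_arrivals`, its parameters, the units (rate in cells per slot, times in slots) and the seeded generator are assumptions made for illustration; the sketch is not the simulator used for the results.

```python
import random


def packet_arrivals(rate, packet_cells, num_periods, seed=0):
    """Return (arrival_time, size_in_cells) for one packet per connection period.

    rate         -- average rate in cells per slot (stand-in for I_in)
    packet_cells -- packet size in cells (stand-in for L_in)
    """
    rng = random.Random(seed)        # seeded so that runs are repeatable
    period = packet_cells / rate     # connection period L_in / I_in
    arrivals = []
    for k in range(num_periods):
        offset = rng.uniform(0, period)  # uniform within the period
        arrivals.append((k * period + offset, packet_cells))
    return arrivals


# Hypothetical source: 20-cell packets at an average of 0.25 cells per slot.
for t, size in packet_arrivals(rate=0.25, packet_cells=20, num_periods=3):
    print(f"packet of {size} cells arrives at slot {t:.1f}")
```

Because each connection draws its own offsets, packets from different streams coincide only occasionally, which is what produces the temporary congestion studied here.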
In this setting, a backlog of non-conforming cells may be generated in the input buffers since L_in > L, i.e., the regulator shapes the outgoing traffic to be smoother than the incoming traffic.
Temporary congestion in the output buffers may be created when packets from several streams arrive at the same time. We measured the performance of the GCRA shaper in terms of the following parameters: buffer occupancy, end-to-end delay and inter-cell arrival time at the destination. The delays and the inter-cell arrival times were measured only for the focus connection. We present results for various values of the shaper's L value, modeling the burstiness of the shaped traffic. We compared L values ranging from L = L_in/2 (corresponding to rough shaping) to L = L_in/20 (corresponding to smooth shaping). Figure 2 shows the buffer requirement inside the network for the focus connection. Observe that smaller L values at the shaper reduce the buffer space requirement dramatically at all switches except the first. An important consequence of the space reduction for smoother traffic is that the end-to-end delay is reduced as well when more aggressive shaping is enforced by the switch regulators, as can be seen in Figure 3. The effect is consistent: the more switches on the path, the larger the reduction in end-to-end delay (see Figure 4). Our last set of results concerns the inter-cell spacing values, i.e., jitter (Figure 5). Consistent with our previous observations, we see that the cell spacing in a typical time interval at the 4th link is much more regular when the streams are highly regulated.
Related Work. Much research effort was devoted to rate control over high-speed networks. Zhang IS] provides an excellent survey. We briefly discuss later papers which are most relevant to our work.
In [3], a comparison is made between two non-work-conserving traffic shaping algorithms, Stop-and-Go leaky bucket as a traffic shaper, but they do not discuss the implementation details.
In [6], Zhang proposes an implementation of a regulator based on a modified version of calendar queue [1]. In Zhang's algorithm, when a long burst of cells for a specific connection arrives, the cells belonging to that burst may wrap around the calendar, thus breaking the FIFO order. The simple solution is to have a sufficiently large calendar.
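The wrap-around effect can be illustrated with a toy per-cell calendar in Python. This is not the algorithm of [6]: it is a deliberately simplified model in which every cell of a burst is placed directly in the slot given by its eligibility time modulo the calendar size, and the calendar is then scanned once from slot 0. The function name `scan_order` and the numeric values are hypothetical.

```python
def scan_order(eligible_times, size):
    """Order in which a per-cell calendar with `size` slots releases the cells.

    eligible_times -- eligibility time of each cell, in FIFO (arrival) order
    Returns the list of cell sequence numbers in release order.
    """
    slots = [[] for _ in range(size)]
    for seq, t in enumerate(eligible_times):
        slots[t % size].append(seq)  # times beyond the horizon wrap around
    return [seq for slot in slots for seq in slot]


burst = [i * 3 for i in range(5)]  # five cells spaced 3 slots apart: 0, 3, 6, 9, 12

print(scan_order(burst, size=16))  # [0, 1, 2, 3, 4]  FIFO order preserved
print(scan_order(burst, size=8))   # [0, 3, 1, 4, 2]  FIFO order broken
```

In this toy model the burst spans 12 slots, so a calendar of 16 slots keeps the cells in order, whereas a calendar of 8 slots maps the last two cells into slots that are scanned before earlier cells. Keeping one calendar entry per connection rather than per cell, as in our regulator, avoids placing a whole burst in the calendar at once.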
The work most closely related to ours is [5], where a fair leaky-bucket rate controller is proposed. The basic idea in that rate controller is very similar to ours (which was developed independently), namely using a per-connection calendar queue; however, we believe that our algorithm is superior in some respects. In particular, we decouple traffic shaping from link scheduling, which enables support for arbitrary priority policies and allows simple analyses of the required buffer space and queuing delays using standard techniques. In addition, our implementation is simpler.
