Abstract-Meeting quality of service (QoS) requirements for various services in ATM networks has been very challenging to network designers. Various control techniques at either the call or cell level have been proposed. In this paper, we deal with cell transmission scheduling and discarding at the output buffers of an ATM switch. We propose a generalized priority queue manager (GPQM) that uses per-virtual-connection queueing to support multiple QoS requirements and achieve fairness in both cell transmission and discarding. It achieves the ultimate goal of guaranteeing the QoS requirement for each connection. The GPQM adopts the earliest due date (EDD) and self-clocked fair queueing (SCFQ) schemes for scheduling cell transmission and a new self-calibrating pushout (SCP) scheme for discarding cells. The GPQM's performance in cell loss rate and delay is presented. An implementation architecture for the GPQM is also proposed, which is facilitated by a new VLSI chip called the priority content-addressable memory (PCAM) chip.
I. INTRODUCTION

Due to the large bandwidth-delay products involved, the heterogeneous traffic types that need to be supported, and the various quality of service (QoS) requirements imposed on ATM networks, traffic control and management are difficult and challenging for network designers. For example, real-time traffic requires small end-to-end delay and jitter, while data transfer requires low cell loss rates to reduce the number of retransmissions so as to prevent the network from becoming more congested. Thus, it is necessary to control users' traffic so that network resources can be shared effectively and the QoS can be met for all users.
Manuscript received May 1, 1996; revised December 1, 1996. This work was supported by NSF Grant NCR-9216287 and the Center of Advanced Technology for Telecommunications in New York State. This paper was presented in part at GLOBECOM'94, San Francisco, CA, November 1994 and at the IEEE ATM'96 Workshop, San Francisco, CA, August 1996.
The authors are with the Department of Electrical Engineering, Polytechnic University, Brooklyn, NY 11201 USA.
Publisher Item Identifier S 0733-8716(97)03371-4.
0733-8716/97$10.00 © 1997 IEEE

A queue manager of an output-buffered ATM switch, as shown in Fig. 1, receives multiple cells and transmits one cell in each time slot. Cells waiting for transmission are stored in a buffer in the queue manager, which schedules the cells' departing and discarding sequences based on both their required QoS and the system status. During call setup, each connection can be assigned a service class with a delay and a loss priority, as shown in Fig. 2. Time-constrained services are usually assigned higher delay priorities in order to have a lower cell delay, and loss-sensitive services are usually assigned higher loss priorities in order to have a smaller cell loss rate. This classification enables network nodes to support various QoS requirements.

Several algorithms dealing with cell scheduling based on different priority service disciplines have been proposed [1]-[5]. For instance, the earliest due date (EDD) scheduling scheme achieves each service class's delay requirement by minimizing the tail probability, i.e., the probability that the cell's queueing delay exceeds its delay requirement [3]. Meanwhile, one of the main objectives of scheduling is to achieve fairness in the amount of service provided to competing connections, in which the share of service is proportional to some specified weighting factor. In addition, cells from different connections tend to be interleaved with each other as they are multiplexed onto a transmission link. Consequently, user traffic's burstiness is reduced, which in turn reduces the network congestion probability and increases the network throughput. Several cell scheduling algorithms that deal with fairness have been proposed [6]-[9], specifically dealing with the amount of service share provided to each connection. Processor sharing algorithms provide a lower limit of assigned bandwidth and are appropriate for traffic requiring deterministic QoS [10]. However, the assignment of service share (i.e., the weighting factor in the processor sharing scheme) for each connection or grouped connections depends on the required QoS as well as traffic characteristics.
Due to the difficulty in determining the weighting factor of a connection from its required QoS, we will adopt the EDD scheme to select the class that is to be served and the fair queueing algorithm to achieve fairness based on the allocated bandwidth, which could be determined by the call admission control procedure.
Several cell discarding policies have also been proposed. They are complete buffer sharing with pure pushout [11], partial buffer sharing with nested thresholds [12] for different loss priorities, and the so-called expelling policy [13]. In the pure pushout policy, an arriving high-priority cell can overwrite a low-priority cell if the buffer is full. In the nested threshold policy, a threshold is assigned to each loss priority. Cells of a particular class are accepted to enter the buffer only if the current queue length is smaller than the associated threshold. In the expelling policy, when the number of high-priority cells exceeds a predetermined threshold, all low-priority cells in front of the queue will be expelled until a high-priority cell is found and transmitted. This policy has been proven to outperform the other two policies in the case of two loss priority levels. The pure pushout scheme does not provide guaranteed QoS to low-priority traffic, while the other two schemes have the difficulty of determining proper thresholds for different priorities under time-variant parameters, such as traffic burstiness and offered load. Since each service class has its own required cell loss rate, a devised discarding policy must guarantee each class's cell loss rate requirement. Recently, we have proposed a new cell discarding scheme, self-calibrating pushout (SCP) [14], which maintains a constant cell loss rate ratio between two loss priority classes while the buffer is completely shared by all service classes.
In this paper, we propose a generalized priority queue manager (GPQM) that is capable of handling the delay and loss priorities jointly while satisfying each service class's QoS requirement and achieving fairness for connections in the same service class. The GPQM adopts the EDD scheme to meet various delay requirements and the self-clocked fair queueing (SCFQ) [9] algorithm to achieve fairness. In other words, it handles multiple delay priorities at the class level and fair scheduling at the connection level. The GPQM also adopts the SCP scheme for selective cell discarding. Moreover, by taking advantage of the fair queueing, the GPQM achieves almost identical cell loss rates for all connections in the same loss priority without increasing implementation complexity. It thus meets each individual connection's cell loss rate requirement.
Section II gives a detailed description of the adopted cell scheduling schemes: the EDD and the SCFQ. Section III describes the SCP scheme and the fairness in cell discarding. Section IV shows the performance of the GPQM through computer simulations. Section V presents an implementation architecture for the GPQM, which is facilitated by a new VLSI chip called a priority content-addressable memory (PCAM) chip. Section VI describes the PCAM chip. Section VII gives final conclusions.
II. CELL SERVICE SCHEDULING
Several cell scheduling algorithms have been proposed to achieve fairness, specifically dealing with the amount of service share provided to each connection. Among them, the packet-by-packet generalized processor sharing (PGPS) algorithm [8] is implemented using the concept of virtual time, which was introduced in [7]. The computational complexity associated with the evaluation of the virtual time $v(t)$, however, is not feasible in a broad-band ATM network. Golestani [9] proposed the self-clocked fair queueing (SCFQ) scheme, which is an implementable version of fair queueing. He defined the virtual time as the indication of the progress of work in a real queueing system instead of a hypothetical fluid-flow reference system model. Golestani showed that this approach approximates the optimal fairness found in fluid-flow models. The SCFQ scheme, as applied to ATM, is described below.
1) The $k$th cell of connection $i$, arriving at time $a_i^k$, is stamped with a virtual finishing time $F_i^k$, which is determined as below, and then placed in the queue

$$F_i^k = \frac{1}{\rho_i} + \max\left(F_i^{k-1},\, v(a_i^k)\right), \quad k = 1, 2, \ldots \tag{1}$$

with

$$F_i^0 = 0 \tag{2}$$

in which $\rho_i$ is the offered load of connection $i$. The cells in the queue are chosen for service in increasing order of the values of the associated virtual finishing times.

2) $v(t)$, regarded as the system's virtual time at time $t$, is equal to the virtual finishing time of the cell receiving service at time $t$.

Although we can achieve fairness among competing connections by scheduling their cells based on their allocated bandwidth using the SCFQ scheme, we cannot meet their various delay requirements. Thus, we will also adopt the EDD scheme to handle various delay requirements between classes.
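The stamping rule in steps 1) and 2) can be sketched in a few lines of Python. This is a minimal illustration rather than the GPQM implementation: it assumes the standard SCFQ update for fixed-size cells, in which a new cell's tag is the reciprocal of the connection's offered load plus the larger of the connection's previous tag and the current virtual time. The class name `SCFQ`, the method names, and the example loads are hypothetical, and the reset of the virtual time at the end of a busy period is omitted.

```python
import heapq
from fractions import Fraction


class SCFQ:
    """Illustrative self-clocked fair queueing for fixed-size cells."""

    def __init__(self, loads):
        # loads: connection id -> offered load (share of the link), assumed > 0
        self.loads = {c: Fraction(r) for c, r in loads.items()}
        self.last_tag = {c: Fraction(0) for c in loads}  # initial tag is zero
        self.vtime = Fraction(0)  # tag of the cell most recently taken into service
        self.heap = []            # entries: (tag, arrival sequence number, connection)
        self.seq = 0

    def arrive(self, conn):
        """Stamp an arriving cell of `conn` and queue it; returns its tag."""
        tag = 1 / self.loads[conn] + max(self.last_tag[conn], self.vtime)
        self.last_tag[conn] = tag
        heapq.heappush(self.heap, (tag, self.seq, conn))
        self.seq += 1
        return tag

    def serve(self):
        """Serve the cell with the smallest tag; the virtual time jumps to that tag."""
        if not self.heap:
            return None
        tag, _, conn = heapq.heappop(self.heap)
        self.vtime = tag
        return conn


def demo_order():
    q = SCFQ({"A": Fraction(1, 2), "B": Fraction(1, 4)})
    for conn in "AAAABBBB":  # two backlogged connections, all cells arrive at once
        q.arrive(conn)
    return "".join(q.serve() for _ in range(8))
```

In this example connection A has twice the offered load of B, so while both are backlogged A receives two service opportunities for each one given to B.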
The EDD scheme works as follows. A cell from connection $i$, arriving at time $a$ with local tolerable delay $d_i$, is assigned a time-stamp value $a + d_i$, called the due time $D$. When a cell is to be transmitted, the server chooses the one that has the minimum due time value. Here, we show how the SCFQ and the EDD schemes work together. Each arriving cell is time stamped with two values, the virtual finishing time $F$ and the due time $D$. Within each class, cells are sorted based on their $F$ values, and are served according to that order, thus achieving fairness among the connections in the same class. The head-of-line (HOL) cell in each service class is the one that has the smallest $F$ value in that class. When the server is to choose a cell from the HOL cells to transmit, it will choose the one that has the smallest $D$ value, thus implementing the EDD scheme.
There is one more interesting feature of sorting cells in each service class based on their virtual finishing time $F$. It can distribute the cell loss evenly among the connections in the same class, thus achieving loss fairness (described in detail in Section III-B).
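The two-level selection can be sketched as follows. This is an illustrative sketch under stated assumptions: each class keeps a heap ordered by virtual finishing time, every cell carries its own due time, and the server compares only the per-class HOL cells. The function name `pick_next`, the tuple layout, and the example values are hypothetical; ties are broken here by dictionary order rather than by a random choice.

```python
import heapq


def pick_next(class_queues):
    """Pop and return (class, cell) for the HOL cell with the smallest due time.

    class_queues maps a service class to a heap of (F, seq, D, cell) tuples,
    so heap[0] is that class's HOL cell (smallest virtual finishing time F).
    """
    best = None
    for cls, heap in class_queues.items():
        if heap and (best is None or heap[0][2] < class_queues[best][0][2]):
            best = cls
    if best is None:
        return None
    _, _, _, cell = heapq.heappop(class_queues[best])
    return best, cell


def demo():
    queues = {"I": [], "III": []}
    # (F, seq, D, cell): D = arrival slot + the class's tolerable delay in slots
    heapq.heappush(queues["I"], (4, 0, 10 + 53, "i1"))
    heapq.heappush(queues["I"], (2, 1, 12 + 53, "i2"))
    heapq.heappush(queues["III"], (1, 2, 0 + 1767, "c1"))
    return [pick_next(queues) for _ in range(4)]
```

Note that cell "i2" is served before "i1" even though its due time is later: inside a class the order is set by the virtual finishing time, and the due time only decides between classes.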
III. BUFFER MANAGEMENT
A. Self-Calibrating Pushout (SCP)
The main objective of the SCP scheme is to balance the cell loss rates by measuring, in real time, the numbers of discarded low- and high-loss priority (LP) cells such that each priority will meet its cell loss rate requirement (e.g., a high-LP target that is three orders of magnitude below the low-LP target). Since the SCP automatically balances cell loss rates among different classes, it may simplify call admission control rules when different cell loss rates are to be guaranteed.
The primary advantage of the SCP scheme over the other proposed cell discarding schemes is its capability of automatically calibrating in real time each service class's cell loss rate so as to meet each class's QoS requirement, i.e., it is "selfcalibrating." Many proposed cell discarding schemes rely on finding different discarding thresholds off line based on traffic characteristics and QoS requirements. However, when cell discarding is considered jointly with cell scheduling, finding the thresholds may become very difficult (if not impossible). On the other hand, the SCP scheme can easily be combined with any cell scheduling scheme because cell discarding is determined on line by the measured data from a system, not by statistical prediction like other schemes. We briefly describe the SCP cell discarding algorithms in the following.
Let us define the target cell loss rates of low- and high-LP classes to be $P_L$ and $P_H$, respectively, and the measured cell loss rates of low- and high-LP classes to be $\hat{P}_L$ and $\hat{P}_H$, respectively. Our goal here is to selectively discard low- or high-LP cells such that the ratio $\hat{P}_L/\hat{P}_H$ approaches $P_L/P_H$. For example, if $P_L$ is 1000 times $P_H$, we will try to keep the ratio $\hat{P}_L/\hat{P}_H$ around 1000 regardless of the traffic characteristics.
A control parameter used in this algorithm, called loss weight, is analogous to the bandwidth weight in weighted round-robin service, where the bandwidth received by each virtual connection (or number of cells transmitted) is proportional to the bandwidth weights. Similarly, in the SCP scheme, the number of discarded cells is proportional to the loss weight of different loss priorities. Now, let us consider two loss priorities, low and high. Let $\rho_L$ and $\rho_H$ be the offered load of low- and high-LP traffic, respectively. The total offered load is equal to $\rho_L + \rho_H$. The value of the loss weight $W$ is defined as the following:

$$W = \frac{P_L\,\rho_L}{P_H\,\rho_H} \tag{3}$$

For instance, if $\rho_L = \rho_H$ and $P_L/P_H = 1000$, then the loss weight is equal to 1000. In other words, when 1000 low-LP cells have been discarded, a high-LP cell will be the next candidate to be discarded as the buffer is filled. A variable called CNT in this algorithm is used to keep track of the discardings. Initially, CNT is set to zero. It is incremented by one when a low-LP cell is discarded, and decremented by the loss weight when a high-LP cell is discarded.
Another control parameter is the threshold (TH). TH is compared with the number of high-priority cells currently in the buffer. If the number of high-priority cells currently in the buffer is less than TH, cells of both classes will be admitted into the buffer. If it is larger than TH and the CNT value is less than the loss weight, we will discard low-LP cells. Our performance study shows that the measured ratio $\hat{P}_L/\hat{P}_H$ can be kept at its target for a wide range of TH values. That means that the choice of a TH value that makes the SCP scheme work properly is not critical, which eliminates the necessity of adjusting TH for different traffic characteristics. This is because the TH control parameter only provides a coarse calibrating guideline to bring $\hat{P}_L$ and $\hat{P}_H$ to the range of interest, while the CNT of discarded cells is the control parameter that fine-tunes the cell loss of low- and high-LP's to ultimately keep $\hat{P}_L/\hat{P}_H$ at the desired ratio. The algorithm is shown in Fig. 3.
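The interplay of TH, CNT, and the loss weight can be illustrated with a small count-based sketch. It is not a transcription of Fig. 3, which is not reproduced in the text. The rules for CNT and for blocking low-LP arrivals follow the description above, whereas the full-buffer rule (discard high-LP once CNT has reached the loss weight, otherwise low-LP, and fall back to dropping the arrival when no cell of the chosen priority is buffered) is an assumption. The class name `SCPBuffer` and all parameter values are hypothetical.

```python
class SCPBuffer:
    """Count-based sketch of self-calibrating pushout for two loss priorities."""

    def __init__(self, size, th, loss_weight):
        self.size, self.th, self.w = size, th, loss_weight
        self.n = {"low": 0, "high": 0}     # cells currently buffered
        self.lost = {"low": 0, "high": 0}  # cells discarded so far
        self.cnt = 0                       # CNT, initially zero

    def _discard(self, prio):
        self.lost[prio] += 1
        # CNT: +1 per low-LP discard, -loss_weight per high-LP discard
        self.cnt += 1 if prio == "low" else -self.w

    def arrive(self, prio):
        """Handle one arrival; return the priority discarded, or None."""
        if self.n["low"] + self.n["high"] < self.size:
            # Buffer not full: a low-LP arrival is blocked only when the
            # high-LP backlog exceeds TH and CNT is below the loss weight.
            if prio == "low" and self.n["high"] > self.th and self.cnt < self.w:
                self._discard("low")
                return "low"
            self.n[prio] += 1
            return None
        # Buffer full (assumed rule): discard high-LP once CNT has reached
        # the loss weight, otherwise discard low-LP.
        victim = "high" if self.cnt >= self.w else "low"
        if victim != prio and self.n[victim] == 0:
            victim = prio                  # no such cell buffered: drop the arrival
        self._discard(victim)
        if victim != prio:                 # push out a buffered cell, admit the arrival
            self.n[victim] -= 1
            self.n[prio] += 1
        return victim

    def depart(self, prio):
        self.n[prio] -= 1


def demo():
    buf = SCPBuffer(size=4, th=1, loss_weight=3)
    for _ in range(4):
        buf.arrive("high")
    outcomes = [buf.arrive("low") for _ in range(4)]
    return buf, outcomes
```

With a loss weight of 3, three low-LP discards are followed by one high-LP discard, after which CNT returns to zero and the cycle starts again.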
B. Loss Fairness
As described above, the SCP scheme balances the ratio of the cell loss rates of low- and high-loss priorities. However, our ultimate goal is to ensure that each individual connection's cell loss rate is met, and not just that of each service class. Now that the SCP scheme determines a priority level from which a cell is to be discarded, the remaining problem is to select a connection in that priority whose cell is to be discarded.
Since there may be up to a few thousand virtual connections in each transmission link, it will be very difficult to control their cell loss rates on an individual basis. Here, we propose a simple method to achieve fair cell loss rates among the connections within the same loss priority class. By taking advantage of the fair-queueing scheduling scheme, which provides fair service among connections in the same class, we can also achieve fair loss with bounded variation. As a result, we can guarantee each individual connection's cell loss rate requirement under the SCP discarding policy with a properly designed call admission control policy.
In the following, we explain how fair loss with bounded variation can be automatically achieved by adopting the fair-queueing scheduling policy. Suppose $n$ cells arrive at the buffer when the buffer is filled, resulting in $n$ cells to be discarded from the buffer. For fairness, we expect the portion of loss from connection $i$ to be

$$n\,\frac{\rho_i}{\sum_{j \in B} \rho_j} \tag{4}$$

in which $B$ is the set of connections that have at least one backlogged cell at the moment of buffer overflow, and $\rho_i$ is the average arrival rate of connection $i$. Note that (4) and (1) have the same feature in that the amount of discarded (or served) information is proportional to $\rho_i$, which suggests that we select the cell in the position of next transmission as the one to be pushed out. We can imagine the execution of discarding cells as a fictitious momentary transmission (with an infinite link rate). Then, the total served data from each connection, the sum of the actually served and the fictitiously transmitted (i.e., discarded), conforms to the proportional property. As a result, the queue manager will discard the $n$ cells closest to the front of the sorted queue when $n$ cells arrive and see the buffer filled. In the Appendix, we show that this discarding scheme can achieve fair loss among connections of the same loss priority with bounded variation. Note that discarding cells from the head of the queue also yields smaller average queueing delay [15].
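A short numerical sketch makes the argument concrete. It assumes that the fair share of a backlogged connection is the number of discarded cells scaled by its rate over the sum of the rates of all backlogged connections, and that the queue is sorted by virtual finishing times spaced by the reciprocal of each connection's rate. The function names and the example rates are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def fair_loss_share(n, rates, backlogged):
    """Expected number of discards per backlogged connection out of n."""
    total = sum(rates[j] for j in backlogged)
    return {i: n * rates[i] / total for i in backlogged}


def discard_from_head(sorted_queue, n):
    """Remove the n cells nearest the head; return discards per connection."""
    victims = sorted_queue[:n]
    del sorted_queue[:n]
    return Counter(conn for _, conn in victims)


def demo():
    rates = {"A": Fraction(1, 2), "B": Fraction(1, 4)}
    # Four backlogged cells per connection, tagged k / rate as in a busy SCFQ queue.
    queue = sorted((k / r, c) for c, r in rates.items() for k in range(1, 5))
    expected = fair_loss_share(6, rates, ["A", "B"])
    actual = dict(discard_from_head(queue, 6))
    return expected, actual
```

In this example, discarding six cells from the head of the sorted queue removes four cells of A and two of B, which coincides with the proportional shares.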
IV. PERFORMANCE OF THE GENERALIZED PRIORITY QUEUE MANAGER (GPQM)
A. Simulation Setup and Traffic Model
Four different service classes, I, II, III, and IV, are considered here and arranged in a priority matrix as shown in Fig. 4(a). Each service class is associated with a logical queue. Cells in the same logical queue are served first-come, first-served (FCFS). The SCFQ is not considered here because we just need to investigate the performance of each service class, while each virtual connection's performance will be met when the SCFQ is applied. The total occupancy of these four logical queues is always less than or equal to the buffer size. The time division multiplexer (TDM) distributes cells from an ATM switch to a particular logical queue according to their service classes. Cells are stored or discarded in the corresponding logical queue according to the rules described in the SCP algorithm. Namely, high-LP cells are always stored in the buffer when the buffer is not filled. However, low-LP cells may be pushed out from the buffer (even if the buffer is not filled) when the number of high-LP backlogged cells exceeds TH and the CNT is less than the loss weight. The server serves the HOL cell from the one of the four logical queues that has the smallest due time $D$. Ties in the due time can be broken by random choice.
In this performance study, we assume that the source traffic is bursty and alternates between active and idle periods, as shown in Fig. 4(b). More specifically, cells arrive in consecutive slots in the active period, and no cells arrive in the idle period. Traffic sources are assumed to be independent. Both the active and idle periods are assumed to be geometrically distributed with mean burst length $B$ and mean idle time $I$. Thus, the offered load $\rho$ is given by

$$\rho = \frac{B}{B + I} \tag{5}$$
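The on-off source model can be sketched as a slot-by-slot generator. This is a minimal sketch, assuming that a geometric period of a given mean is obtained by ending the current period in each slot with probability equal to the reciprocal of that mean. The function names, the seed, and the example means (10 and 30 slots) are hypothetical and are not the values used in the simulations.

```python
import random


def offered_load(mean_burst, mean_idle):
    """Long-run fraction of slots carrying a cell."""
    return mean_burst / (mean_burst + mean_idle)


def on_off_source(mean_burst, mean_idle, slots, rng):
    """Yield 1 (cell) or 0 (no cell) per slot for a geometric on-off source."""
    p_end_burst = 1.0 / mean_burst  # per-slot probability that a burst ends
    p_end_idle = 1.0 / mean_idle    # per-slot probability that an idle period ends
    active = True
    for _ in range(slots):
        yield 1 if active else 0
        if rng.random() < (p_end_burst if active else p_end_idle):
            active = not active


def measured_load(mean_burst, mean_idle, slots=200_000, seed=1):
    rng = random.Random(seed)
    return sum(on_off_source(mean_burst, mean_idle, slots, rng)) / slots
```

Over a long run the measured load of the generator should approach the value returned by `offered_load`.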
B. Cell Loss Performance
In our simulation study, we fix the buffer size, the total offered load $\rho = \rho_L + \rho_H$, and the average burst length. We also use target loss rates $P_L$ and $P_H$ that are larger than those intended in practice, because the latter would take a very long time to simulate. The traffic mix ratio $\rho_H/\rho_L$ varies from 0.1 to 10. With TH set to 64, 128, and 192, the ratio $\hat{P}_L/\hat{P}_H$ can be maintained around 1000, shown by the two curves that are equally spaced in each case. This means that when a cell of low or high LP is to be discarded, most of the time, its associated logical queue is not empty. As a result, the measured loss probability ratio can be kept to the target value, with $\hat{P}_L$ and $\hat{P}_H$ remaining about three orders of magnitude apart.

Note that in Fig. 5, both $\hat{P}_L$ and $\hat{P}_H$ increase when the traffic mix ratio is larger than three. This is because, when the traffic mix ratio becomes larger, high-LP cells will dominate. It would be quite likely that cells in the buffer are all high-LP cells when the buffer is filled. Thus, when a high-LP cell arrives and finds the buffer filled, it will be discarded because there are no low-LP cells in the buffer, resulting in $\hat{P}_H$ increasing. This will make the CNT more negative because the CNT is decremented by the loss weight when a high-LP cell is discarded. On the other hand, since most of the time the number of high-LP backlogged cells exceeds TH, low-LP cells are blocked more often. Thus, $\hat{P}_L$ is also increased. When the traffic mix ratio is larger than four, $\hat{P}_L$ and $\hat{P}_H$ become more stable. Thus, we can conclude that because of TH, the ratio of loss probabilities can be maintained constant.

Fig. 6 shows the loss rates $\hat{P}_L$ and $\hat{P}_H$ and their ratio versus the traffic mix ratio varying from 0.1 to 1 with TH set to 64 and 128. As expected, the ratio still remains around 1000. The reason that the case of TH equal to 192 is not shown in Fig. 6 is that the resulting loss rate is too small to be obtained from computer simulations in a reasonable time. Also note that the smaller the traffic mix ratio is, the less likely it is that the number of high-LP backlogged cells exceeds TH. For a traffic mix ratio from 0.1 to 0.3, $\hat{P}_L$ and $\hat{P}_H$ fluctuate. The reason is the same as before, only now the dominating cells in the buffer are of the low-LP class. Note that both $\hat{P}_L$ and $\hat{P}_H$ in this case are smaller than those in the case where the traffic mix ratio varies from 1 to 10. This is because low-LP cells prevail in the buffer and are always available when discarding is needed. Therefore, high-LP cells will not be mistakenly discarded due to the lack of low-LP cells in the buffer. This decreases the number of discarded high-LP cells and makes $\hat{P}_H$ smaller. Meanwhile, since low-LP cells dominate in the buffer, the number of high-LP backlogged cells will usually be under TH and low-LP cells will not be blocked from entering the buffer. As a result, $\hat{P}_L$ is smaller than that in the case of the traffic mix ratio ranging from 1 to 10.
From Figs. 5 and 6, we notice that, for a traffic mix ratio from 0.1 to 10, the measured ratio $\hat{P}_L/\hat{P}_H$ can be kept at its target for a large range of TH values. That means that the choice of a TH value that makes the SCP scheme work properly is not critical, eliminating the necessity of adjusting the TH for different traffic distributions. This is because the TH is the control parameter that only provides a coarse calibrating guideline to bring $\hat{P}_L$ and $\hat{P}_H$ to the range of interest, while the CNT is the control parameter that fine-tunes the cell loss of low- and high-LP's to eventually keep $\hat{P}_L/\hat{P}_H$ at the desired ratio.

Fig. 7 shows the cell loss probabilities for each service class versus the traffic mix ratio ranging from 1 to 10 and from 0.1 to 1. The TH is set to 64. In both cases, we can see that Classes II and IV (low-LP classes) have almost the same loss rates, while Classes I and III (high-LP classes) are very close to each other. The discrepancy between Classes I and III is due to the difference of their delay priorities. Since Class I cells have higher delay priority, they have a higher chance of being served before Class III cells. As a result, Class III cells will stay in the buffer longer than Class I cells, and have a higher probability of being pushed out when it is the turn of a high-LP cell to be discarded. This is why Class III has a larger cell loss rate than Class I. One way to eliminate the discrepancy between the service classes that have the same loss priority is to provide a separate counter for each class. By keeping track of each individual counter value, we will be able to maintain the same cell loss rate for the service classes with the same LP. However, this will increase implementation complexity. Since the discrepancy is small, we would rather combine the cell loss from Classes I and III, and that from Classes II and IV, to simplify the implementation.
C. Cell Delay Performance
Let us assume that the 99th-percentile delay requirement for cells of high delay priority (DP) is 150 µs and for low-DP cells it is 5 ms [16]. For an OC-3 (155.52 Mbit/s) transmission line, each cell time is about 2.83 µs. Then, the delay requirements for high and low DP are 53 and 1767 cell slots, respectively. Note that cells from the same class are served in a first-come, first-served manner because their due times are their arrival times plus the same tolerable delay.
We use the delay probability mass function (pmf) for each class to show its delay distribution and see the percentage of cells violating their delay requirements. The pmf usually gives us more useful information in practice than the average delay. Fig. 8 shows the delay distributions of Classes I, II, III, and IV, where the traffic mix ratio is equal to one. For Classes I and II with a higher DP, 2% of cells violate the delay requirement (150 µs, or 53 cell slots). For Classes III and IV with a lower DP, less than 1% of the cells violate the delay requirement (5 ms, or 1767 cell slots). Note that the delay distributions for the traffic mix ratio varying from 0.1 to 10 are similar to Fig. 8.
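The conversion from delay requirements to cell slots can be checked with a few lines of arithmetic. The sketch assumes that the 2.83 µs cell time corresponds to a 53-byte cell sent at the OC-3 payload rate of 149.76 Mbit/s (the 155.52 Mbit/s line rate less the transport overhead); this interpretation is an assumption, since the text only quotes the line rate. The constant and function names are hypothetical.

```python
CELL_BITS = 53 * 8            # one ATM cell: 53 bytes
PAYLOAD_RATE_BPS = 149.76e6   # assumed: OC-3 payload rate (155.52 Mbit/s * 260/270)

# Duration of one cell slot in microseconds, about 2.83
cell_time_us = CELL_BITS / PAYLOAD_RATE_BPS * 1e6


def delay_in_slots(delay_us, slot_us=2.83):
    """Convert a delay requirement in microseconds to the nearest cell slot."""
    return round(delay_us / slot_us)
```

Calling `delay_in_slots(150)` and `delay_in_slots(5000)` reproduces the 53 and 1767 cell slots quoted for the high and low delay priorities.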
V. GPQM IMPLEMENTATION ARCHITECTURE
One of the main functions of a queue manager is to schedule cells' transmission based on their bandwidth share or delay requirement by using various algorithms, such as the SCFQ and the EDD described in Section II. One way to schedule cells' transmission is to time stamp arriving cells, store them in the buffer, and transmit them according to their time-stamp values. This time stamp can be calculated based on any scheduling algorithm. For instance, if the SCFQ algorithm is adopted, the time stamp will be the virtual finishing time $F$. We have designed an ASIC, called the sequencer chip [17]. The main difference between the sequencer chip and the PCAM chip is that the sequencer performs the sorting function and the PCAM chip performs the searching function. The sequencer sorts the time-stamp values such that the smallest one appears at the rightmost side of the chip. Since the sorting is done globally inside the sequencer, it can be done in one clock cycle. The PCAM chip arranges the time-stamp values in fixed locations, and searches for the first one whose validity bit is set. As long as the searching speed is not a bottleneck, this approach is not limited by the VC number or by the buffer size.
A. Searching by the PCAM Chip
The PCAM chip consists of many entries that are addressed by the $F$ values (assuming the SCFQ is adopted). Each entry contains a zone bit $Z$ and a validity bit $V$, denoted as a $(Z, V)$ pair, as shown in Fig. 9. The $Z$ bit is used to resolve the overflow problem of the $F$ value (details will be explained later). The $V$ bit indicates whether there is any cell assigned to that $F$ value. The $(Z, V)$ pairs in the PCAM chip are addressed by different $F$ values, and are arranged in such a way that pairs with smaller $F$ values are on top of those with larger $F$ values. For those cells with identical $F$ values, a logical queue is formed to link all of them, and is called the timing queue. Since cells are arranged by their $F$ values, the PCAM chip facilitates the required search function by identifying the first pair that has the $V$ bit set to "1." Cells that are associated with the identified pair will be transmitted to the network. For instance, in Fig. 9, when the set $V$ bit at location 2 is found, cells "a," "b," and "c" of the timing queue at $F = 2$ will be transmitted in sequence. Once they are all transmitted, its $V$ bit is reset to zero, indicating that no more backlogged cells are assigned that $F$ value. Thus, during the next round of searching, the cell at the next location whose $V$ bit is set will be chosen and transmitted.
On some occasions, the calculated $F$ may overflow, i.e., exceed the maximum number that the hardware can handle, say $2^{14} - 1$ for 14 bits of the $F$ values. This is because, as time passes, $F$ increases monotonically and eventually exceeds its maximum number. To overcome the overflow problem, we have previously proposed using two sorting devices to store nonoverflow and overflow time stamps separately [18]. Here, we store them in the same device to save hardware by using the $Z$ bit to indicate whether the calculated time stamp has overflowed or not. The definition of overflow here is different from the traditional one. We use a CZ (current zone) bit to indicate the zone of the cells that are currently being served. Whenever the fifteenth bit of the calculated $F$ (i.e., the overflow bit) has the same value as the CZ bit, the $F$ is defined as nonoverflow. However, if its fifteenth bit has a different value from the CZ bit, the $F$ is defined as overflow. Thus, when searching the $V$ bits in the PCAM chip, the CZ bit allows the PCAM chip to choose the first set $V$ bit from the appropriate zone. When all of the $V$ bits in the current zone are zero and there is at least one nonzero $V$ bit in the other zone, the CZ bit will be toggled after sending a cell from the other zone, indicating that the service zone is flipped. For example, when all cells in zone "0" have been transmitted and a set $V$ bit is found in the other zone, the CZ bit is toggled from 0 to 1. From then on, cells in zone "1" will be scheduled before those in zone "0." In conclusion, the searching zone in the PCAM chip alternates between CZ and $\overline{\text{CZ}}$. As long as the fifteenth bit of the calculated $F$ does not change more than once when serving in the current zone, no cell out-of-sequence problem will occur.
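The zone-based search can be mimicked in software as follows. This is a behavioral sketch only, not a model of the chip's circuitry: it assumes one zone bit and one validity bit per entry, takes the zone from the bit just above the address bits of the time stamp, scans entries linearly, and toggles the current zone at the moment a cell from the other zone is selected. The class name `PCAM`, the 4-bit example width, and the error raised on a zone clash are hypothetical.

```python
class PCAM:
    """Behavioral sketch of a priority CAM with per-entry zone and validity bits."""

    def __init__(self, bits=14):
        self.bits = bits
        self.size = 1 << bits
        self.zone = [0] * self.size       # Z bit per entry
        self.valid = [False] * self.size  # V bit per entry
        self.cz = 0                       # current zone bit

    def insert(self, f):
        """Mark the entry addressed by time stamp f; the next-higher bit is its zone."""
        addr = f & (self.size - 1)
        z = (f >> self.bits) & 1
        if self.valid[addr] and self.zone[addr] != z:
            raise ValueError("entry already in use by the other zone")
        self.zone[addr], self.valid[addr] = z, True
        return addr

    def search(self):
        """Return the first valid address in the current zone, else in the other zone."""
        for z in (self.cz, 1 - self.cz):
            for addr in range(self.size):
                if self.valid[addr] and self.zone[addr] == z:
                    self.cz = z  # flips only when the other zone is selected
                    return addr
        return None

    def clear(self, addr):
        self.valid[addr] = False


def demo():
    pcam = PCAM(bits=4)
    for f in (14, 17, 15):  # 17 overflows 4 bits and lands at address 1, zone 1
        pcam.insert(f)
    order = []
    while (addr := pcam.search()) is not None:
        order.append((pcam.cz << pcam.bits) | addr)
        pcam.clear(addr)
    return order, pcam.cz
```

Although the overflowed time stamp 17 occupies a lower address than 14 and 15, the zone bit keeps it from being served first.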
B. Head-of-Line Time Stamp
In order to ultimately solve the time-stamp overflow problem, we will only assign the HOL cell from each virtual connection (VC) the virtual finishing time $F$ and due time $D$. Upon the departure of the current HOL cell, the next cell from the same VC that becomes the HOL cell will be time stamped. If cells are time stamped as they arrive at the GPQM, those cells that are from extremely low bit-rate sources and arrive in a burst can easily cause the virtual finishing time value to overflow. This is because the term $1/\rho_i$ in (1) is very large due to the low bit rate. Although the zone bit in the PCAM chip is designed to handle the time-stamp overflow problem, when there is more than one time-stamp overflow from each VC at any moment, the single zone bit fails to resolve the overflow problem. Thus, by assigning the virtual finishing time only to the HOL cell, it is guaranteed that there is only one time-stamp overflow from each VC at any given moment. As a result, (1) and (2) can be simplified as

$$F_i^k = \frac{1}{\rho_i} + v(h_i^k) \tag{6}$$

with

$$v(0) = 0 \tag{7}$$

where $h_i^k$ is the moment at which the $k$th cell of VC $i$ becomes the HOL cell. In (7), without loss of generality, we let the system busy period start at $t = 0$. If we assume that the minimum rate of any VC is not less than (line rate/$2^N$), where $N$ is the number of bits used to represent the $F$ value, there should be no more than one time-stamp overflow from each VC at any moment. This assumption ensures that we can maintain a correct order of the time-stamp values by using a single zone bit in the PCAM chip. We have proven that the system that assigns the $F$ value only to the HOL cell of each VC operates identically to the one that assigns the $F$ value to all arrived cells (i.e., the SCFQ scheme).
C. Architecture of the GPQM
The implementation architecture for the generalized priority queue manager (GPQM) shown in Fig. 10 adopts the SCFQ and EDD schemes for cell scheduling and the SCP scheme for cell discarding. It consists of a cell time-division multiplexer (CTDM), a cell memory, a queue manager controller, four PCAM chips (one for each service class), two microprocessors (i.e., one for $F$ calculation and the other for $D$ calculation), an idle-address FIFO, and a group of virtual-channel queues (VC queues) and timing queues. In addition, selectors (SEL's 0-5) are used to select the data path properly. The CTDM multiplexes multiple cells from a switch fabric and stores them in the cell memory. The queue manager controller generates the necessary signals to control all functional blocks. In this architecture, four service classes, I, II, III, and IV, are considered.
Newly arriving cells that belong to the same virtual connection (VC) are linked together in a logical queue, called the VC queue. The content of each VC queue is the cell's address in the cell memory and the cell's due time $D$. Only the HOL cell in each VC queue joins the timing queue, where HOL cells that have the same service class and virtual finishing time value $F$ are linked together. The content of the timing queue is the cell's VCI and the due time $D$. Each VC queue and timing queue is confined by two pointers, the head pointer (HP) and the tail pointer (TP). The former points to the first cell of the logical queue, while the latter points to the last cell of the logical queue. Timing queues in each service class are handled by the associated PCAM chip in such a way that timing queues with smaller $F$ values are served sooner. Detailed operations of the timing queues have been described in Section V-A.
A counter, increased by one in every cell clock tick, is used to record the cell's arrival time and to calculate the cell's due time $D$. Both the due time $D$ and virtual finishing time $F$ increase monotonically as time goes on, and are reset to zero when the buffer is empty. Here, we use a large number range (e.g., $2^{32}$) to accommodate the $D$ values. For a cell slot of 2.83 µs at the OC-3 line, a 32-bit value can sustain a busy period of the GPQM of up to $2^{32} \times 2.83$ µs, or more than three hours. A 48-bit value can sustain the busy period for more than 20 years. Since the $D$ values are stored in the memory (VC queues and timing queues), we can afford to use a large number of bits to represent them.
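The busy-period figures follow from simple arithmetic, sketched below under the assumption that the counter advances once per 2.83 µs cell slot and wraps after two to the power of its width. The names are hypothetical.

```python
CELL_SLOT_S = 2.83e-6  # one cell slot in seconds


def busy_period_seconds(bits, slot_s=CELL_SLOT_S):
    """Longest busy period a counter of the given width can cover before wrapping."""
    return (2 ** bits) * slot_s


hours_32 = busy_period_seconds(32) / 3600
years_48 = busy_period_seconds(48) / (365 * 24 * 3600)
```

Evaluating `hours_32` gives a little over three hours and `years_48` a little over 25 years, consistent with the bounds quoted above.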
In the following sections, we will explain how cells are written to, read out, or discarded from the cell memory.
1) Write-In Procedure: When a cell arrives at the GPQM, it is first stored in the cell memory at a location provided by the idle-address FIFO, which stores the addresses of the currently vacant cell locations. The VCI (virtual channel identifier) in the cell header is extracted by the microprocessor to calculate the arriving cell's due time. The cell's address and due time are then written to the corresponding VC queue. If the arriving cell is the HOL cell, its VCI is also used by the microprocessor to calculate its virtual finishing time. That value is then used as the address at which to store the two-bit pair in the PCAM chip associated with the cell's service class, and the arriving cell's VCI and due time are linked to the corresponding timing queue.
2) Read-Out Procedure: According to the scheduling policy described in Section II, when a cell is to be transmitted, the smallest virtual finishing time value in each class is identified from each PCAM chip. These four values are used to access the HOL cell's due time from the selected timing queue in each class. The due times are then compared, and the VC that has the smallest due time is chosen to transmit. Its VCI is then used to access the cell's address from the VC queue and to read out the cell from the cell memory. In the meantime, the address is returned to the idle-address FIFO for future arrivals. As soon as a cell is transmitted, its virtual finishing time is returned to the microprocessor to calculate the virtual finishing time of the next HOL cell in the same VC queue. The new value is used as the address at which to store the two-bit pair in the associated PCAM chip and to store the due time and VCI in the timing queue. The queue manager controller checks whether the timing queue from which a cell just departed is empty. If it is, its associated bit in the PCAM chip is reset to zero. The queue manager controller also keeps track of the CZ (current zone) bit. Whenever no more cells can be found in the current zone and there is at least one cell in the other zone, the queue manager controller toggles the CZ bit after sending a cell from the other zone, meaning that, from then on, the search starts from the other zone.
3) Pushout Procedure: The procedure for discarding cells is similar to that for transmitting cells. The queue manager controller uses the SCP scheme to identify the service class from which a cell is to be discarded. The HOL cell of that class, i.e., the one with the smallest virtual finishing time, is identified from the associated PCAM chip and discarded, thus achieving the loss fairness described in Section III-B. Once the cell is discarded, its address is stored in the idle-address FIFO and becomes available to subsequent arrivals.
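The three procedures above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the class and method names are hypothetical, the due time `d` and virtual finishing time `f` are supplied by the caller instead of being computed by the microprocessors, `f` is stored with each cell for simplicity, the PCAM search is replaced by a `min()` over the finishing times in use, the zone (CZ) mechanism is omitted, and the SCP class selection is left to the caller.

```python
from collections import deque

class QueueManagerSketch:
    """Toy model of the write-in, read-out, and pushout procedures."""

    def __init__(self, num_cells, num_classes=4):
        self.idle = deque(range(num_cells))   # idle-address FIFO
        self.memory = {}                      # cell memory: address -> payload
        self.vc_queues = {}                   # VCI -> deque of (address, d, f)
        self.vc_class = {}                    # VCI -> service class
        # One table per class: f -> timing queue, a deque of (VCI, d).
        # min(table) stands in for the PCAM search for the smallest f.
        self.timing = [dict() for _ in range(num_classes)]

    def _join_timing_queue(self, vci):
        # Link the HOL cell of this VC into the timing queue for its f value.
        _, d, f = self.vc_queues[vci][0]
        table = self.timing[self.vc_class[vci]]
        table.setdefault(f, deque()).append((vci, d))

    def write_in(self, vci, service_class, payload, d, f):
        if not self.idle:
            raise MemoryError("buffer full; a pushout is needed first")
        addr = self.idle.popleft()
        self.memory[addr] = payload
        self.vc_class[vci] = service_class
        queue = self.vc_queues.setdefault(vci, deque())
        queue.append((addr, d, f))
        if len(queue) == 1:                   # the arriving cell is the HOL cell
            self._join_timing_queue(vci)

    def _remove_hol(self, service_class):
        # Caller must pick a class that currently holds at least one cell.
        table = self.timing[service_class]
        f = min(table)                        # smallest finishing time in class
        vci, _ = table[f].popleft()
        if not table[f]:
            del table[f]                      # timing queue empty: clear entry
        addr, _, _ = self.vc_queues[vci].popleft()
        payload = self.memory.pop(addr)
        self.idle.append(addr)                # address is free for new arrivals
        if self.vc_queues[vci]:
            self._join_timing_queue(vci)      # next cell becomes the HOL cell
        return vci, payload

    def read_out(self):
        # Per class, take the HOL cell of the smallest-f timing queue,
        # then transmit the candidate with the smallest due time.
        best = None
        for cls, table in enumerate(self.timing):
            if table:
                d = table[min(table)][0][1]
                if best is None or d < best[0]:
                    best = (d, cls)
        return None if best is None else self._remove_hol(best[1])

    def push_out(self, service_class):
        # Discard the smallest-f HOL cell of the class chosen by SCP.
        return self._remove_hol(service_class)
```

For example, after writing two cells of VC 1 in class 0 and one cell of VC 2 in class 1 with a smaller due time, `read_out()` returns the cell of VC 2 first, and `push_out(0)` then discards the HOL cell of VC 1.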
VI. PRIORITY CONTENT-ADDRESSABLE MEMORY (PCAM) CHIP
A. Block Diagram
Fig. 11 shows the PCAM chip's block diagram. It consists of a CAM array, a few selectors, one I/O interface, two inhibit circuits, two 7-to-128 decoders, two 128-to-7 encoders, and the chip controller.
The CAM array has 32 K bits arranged into 16 K modules, each module holding two bits. The input data (two bits) are written into the array through the input bus IN[0:1] at the address provided on the input address bus X[0:13]. When a pattern in the array is to be searched, the search pattern is fed through the IN[0:1] bus (see Table I).
When the CAM array is accessed for either a write or a read operation, the first seven bits of the address are the column address, and the rest are the row address. The row decoder (7-to-128 decoder) decodes the row address to enable one of the row select lines; together with the column select line enabled by the column decoder, a module is identified. The column select line enables the bit-line pair, which routes the addressed data through the I/O data bus.
When the PCAM chip is operated in the search mode, each module in the CAM array compares its internally stored data with the pattern broadcast through the IN[0:1] bus. If they match, the associated row match line and column match line are asserted. If multiple matches occur, the row inhibit circuit selects the asserted row match line closest to the top of the list (RM0-RM127) and forwards the match enable lines (ME0-ME127), only one of which is enabled, to the CAM array so that only the modules on the enabled line can participate in the column search. All 128 column match lines (CM0-CM127) are then sent to the column inhibit circuit, which chooses the one closest to the left of the list (CM0-CM127). Finally, two 128-to-7 encoders simultaneously encode the results from the row and column inhibit circuits onto the address outputs Y[7:13] and Y[0:6]. The PCAM chip uses parallel hardware in the inhibit circuits to identify the topmost or the leftmost bit, resulting in a delay of only a few gates in each inhibit circuit.
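The two-stage search can be mimicked in software as follows. This is a sequential sketch of logic that the chip performs in parallel, and it rests on two assumptions not stated in the text: the array is modeled as a 128 × 128 list of two-bit tuples, and the output address is formed as `row * 128 + column`, i.e., the row index is taken as the more significant part. The function name is hypothetical.

```python
ROWS = COLS = 128  # 16 K modules arranged as a 128 x 128 array

def pcam_search(array, pattern):
    """Return the address of the first module matching `pattern`, or None.

    `array[r][c]` holds the two-bit content of one module as a tuple.
    """
    # Row match lines RM0..RM127: a row is asserted if any module matches.
    row_match = [any(cell == pattern for cell in row) for row in array]
    if not any(row_match):
        return None
    # Row inhibit: keep only the topmost asserted row (one ME line enabled).
    row = row_match.index(True)
    # Column match lines of the enabled row; column inhibit keeps the leftmost.
    col_match = [cell == pattern for cell in array[row]]
    col = col_match.index(True)
    # Encoders: assumed mapping with the row as the high-order part.
    return row * COLS + col
```

Under the assumed address mapping, picking the topmost row and then the leftmost column returns the smallest matching address, which is what allows the search to find the smallest time-stamp value.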
Note that the PCAM chip is different from the traditional CAM chip. First, the PCAM chip is deep and narrow (16K × 2), while the traditional CAM chip is shallow and wide (e.g., 1K × 64). Second, the PCAM chip gives out the address of the pattern that is found, while the traditional CAM gives out the data field associated with the pattern. Third, the PCAM chip can write the pattern to a specific location associated with the time stamp, while the traditional CAM chip cannot. Fourth, the PCAM reduces wiring complexity by searching and encoding the first matched pattern along the vertical and horizontal dimensions, thus processing only $2\sqrt{N}$ match lines instead of $N$ match lines ($N$ is the size of the PCAM chip).
B. Chip Operations
There are four different operations for the PCAM chip: initialization, write-in, search, and read-out. As shown in Fig. 12, initialization is started by asserting the "INIT" signal and setting the input bus. Following the initialization, three two-bit pairs are written to three locations. Some of the written entries are assumed to belong to the overflow zone, meaning that their zone value is different from the CZ value. The three write-in operations are followed by a read-out operation, in which the pair at a given location is accessed by asserting the "READ" signal and setting the input address to that location. After the read-out operation, there are two search operations. First, the search pattern (IN[0:1]) is set to (1, 1), and the chip searches for the first valid cell in zone one. Of the two matches, the one with the smaller address is chosen, and its address appears at the address output.
C. Connecting Multiple PCAM Chips
Multiple PCAM chips can be connected to accommodate a larger value. The connection is achieved by using the "CS" and "HIT" signals. Fig. 13 shows a system with four PCAM chips connected in parallel, which can accommodate a maximum value of up to 64 K ($2^{16}$, or $4 \times 2^{14}$), with each chip handling 16 K ($2^{14}$) entries. The most significant bits, X[14:15], are decoded to determine the page to which the timing queue belongs. X[0:13] is connected to all chips. During the read-out/write-in operations, only the PCAM chip whose "CS" signal is asserted by the page decoder will be accessed.
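The page decoding for the four-chip system can be sketched as below. The sketch assumes that X[0] is the least significant address bit, so that X[14:15] forms the two high-order bits; the function name is hypothetical, and the "HIT" signal used during searches is not modeled.

```python
ENTRIES_PER_CHIP = 1 << 14   # 16 K entries handled by each PCAM chip
NUM_CHIPS = 4                # four chips give 64 K entries in total

def decode_address(x):
    """Split a 16-bit address into (page, in-chip address).

    `page` selects the chip that receives "CS"; the in-chip address
    corresponds to X[0:13], which is wired to all chips.
    """
    if not 0 <= x < NUM_CHIPS * ENTRIES_PER_CHIP:
        raise ValueError("address out of range for four 16 K chips")
    page = x >> 14                         # X[14:15], assumed high-order bits
    offset = x & (ENTRIES_PER_CHIP - 1)    # X[0:13]
    return page, offset
```

With this mapping, addresses 0 to 16383 fall on the first chip, and address 65535 maps to the last entry of the fourth chip.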
VII. CONCLUSION
QoS guarantee is very important and challenging for ATM network designers. In this paper, we proposed a generalized priority queue manager (GPQM) of an output-buffered ATM switch, which can handle cell scheduling and selective discarding so as to meet each connection's QoS requirement. The GPQM adopts the earliest due date (EDD) and the self-clocked fair queueing (SCFQ) schemes for delay control and fair queueing. It also adopts the self-calibrating pushout (SCP) to determine a service class to discard cells. The SCP has the capability of maintaining the cell loss rate ratio between loss priority classes, while fully utilizing the memory and eliminating the difficulty of setting thresholds that are normally required by other discarding schemes. Both delay control (by EDD) and cell loss rate control (by SCP) are applied on a class basis. By taking advantage of the fair queueing, each connection's service amount and number of discarded cells are proportional to its allocated bandwidth. This effectively achieves the ultimate goal of meeting each individual connection's QoS requirement. The performance of the GPQM was studied through computer simulations, and was shown to be satisfactory.
We presented an implementable architecture for the GPQM, facilitated by a new VLSI chip (PCAM). The proposed architecture can implement any scheduling algorithm that schedules cell transmission based on time-stamp values. The PCAM chip has several advantages over the traditional CAM chip. 
where (9) in which represents the closest integer value not greater than . With a fixed range of values of in (9) , is the maximum possible number of cells from connection whose values fall into that range. Now, let be the actual integer number of cells from connection among cells. Expression of in terms of can be obtained by considering an example in Fig. 14 Fig. 14(a) . In addition, cells from different connections may have an identical value and result in a smaller . For example, Fig. 14(b) shows three cells with an value of 10 and two cells with an value of 20. Assume that the buffer is filled after two cells with an value of 10 have departed. Since the cell of connection with an value of 20 is outside the discarding region (the first cells), is zero. From the above observations, can be expressed by with a variation of at most two cells, i.e.,
If we define an integer value as the summation of all , then from (8) (11) Here, by the fact that and from (11), we obtain (12) Then, from (12), (8) becomes (13) By assigning the minimum value in (13) , (10) becomes (14) Equation (14) implies that, for every connection is proportional to with respect to a specified amount with variation bounded by one. Notice that this variation is independent of the value. Comparing in (14) to in (4), in (4) is replaced with in (14) due to the nature of a discrete system. 
