Abstract: In this paper, we introduce a new approach to ATM switching. We propose an ATM switch architecture that uses only a single shift-register-type buffering element to store and queue cells and that, within the same (physical) queue, switches the cells by organizing them into logical queues destined for different output lines. The buffer is also a sequencer, which allows flexible ordering of the cells in each logical queue to realize any appropriate scheduling algorithm. This switch is proposed as the building block of large-scale multistage ATM switches because of its low hardware complexity and its flexibility in providing
I. INTRODUCTION
The design of an ATM switch architecture with the minimum possible hardware complexity and the maximum possible performance in throughput, buffer usage, and quality-of-service parameters such as delay and cell loss has been a challenge for the past few years. Output queueing, complete buffer sharing, and scheduling capabilities for guaranteed quality of service are considered major characteristics of a successful ATM switch [1], [2].
Output buffering guarantees maximum throughput, and buffer sharing reduces the amount of buffering space, which means less hardware complexity and lower cost [3]. While many switches use output buffering, not all of them achieve a high degree of buffer sharing. Full buffer sharing can be achieved in RAM-based shared-buffer switches, but at the cost of a large hardware overhead to control the buffering mechanism and to keep track of the cells, the queues, and the free space in the buffer [12].
When it comes to queue scheduling, there is always some inflexibility in implementing an arbitrary algorithm. The scheduling mechanism has to fit within the constraints of the switch architecture; otherwise, it needs extra hardware or even a hardware revision, which considerably increases the complexity. Considering the RAM-based architecture again, the output queues are basically FIFOs. To provide priority-based queueing, a group of FIFOs is needed for each output line, and even then only a limited number of priority levels is possible for each output line. It is even harder to change the scheduling of a switch that was designed around a particular scheduling scheme; this is usually possible only by redesigning the hardware. Some programmable schedulers have been proposed in the literature, but with little success, because their very high hardware complexity makes them impractical [5].
Manuscript received May 1, 1996; revised December 1, 1996. This work was supported by an operating research grant from the Natural Sciences and Engineering Research Council of Canada. This paper was presented in part at INFOCOM'97, Kobe, Japan.
The authors are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ont., M5S 3G4, Canada.
Publisher Item Identifier S 0733-8716(97)03366-0.
Scalability is another major issue. It is not always possible to increase the number of input and output lines easily without reconstructing the whole switch. This is very clear in banyan-based switches. In RAM-based switches, the buffering section is independent of the number of I/O lines, but the constraint still applies because of the extra queues that must be implemented in the control section of the switch.
Increasing the buffering space is not an easy task either. In RAM-based switches, the buffering space can be increased by using larger RAMs, but most other switches require more extensive hardware changes.
In this paper, we describe a new ATM switch architecture which achieves the above goals. The switch is output buffered and offers full buffer sharing without extra overhead. The number of input and output lines can vary without affecting the switch core. The buffering space can be increased by simply cascading the buffering elements. The scheduling mechanism implemented in the switch is software programmable, and is flexible enough to realize a wide range of scheduling algorithms.
The architecture of the switch is based on the cell sequencer architecture that we introduced in [13]. In the sequencer, cells are stored and queued in a shift-register-type buffering element. The structure of the buffer allows the cells to travel in the queue while finding their appropriate places based on their priority level, age, or any other consideration. In this way, groups of cells with some common characteristics can be distinguished as logical queues within the same physical buffer. Service among these logical queues can then be scheduled by appropriately organizing the logical queues in the buffer. Both the organization of the cells into logical queues and the organization of the logical queues in the buffer are achieved by adding appropriate tags to individual cells before they enter the sequencer. As a result, a selected scheduling scheme can be implemented by translating the algorithm into an appropriate tagging method.
In this paper, we extend the same architecture to introduce a new and totally different approach to ATM switching. We propose an ATM switch architecture that uses a single sequencer-type queueing buffer for all of the output lines. Cells destined for different output lines are organized in different logical queues within the same physical buffer. Unlike in the sequencer, however, these logical queues are interleaved into the single queue. As in the sequencer, each logical queue can in turn contain other logical queues.
A major characteristic of this new switch, in addition to achieving the above-mentioned goals, is its simplicity. The core hardware of the switch is just a shift-register-type buffering element. The buffering element has a modular structure and is composed of an arbitrary number of simple one-cell buffering units, each containing a very simple logic section.
Storing the ATM cells in large shift registers could be considered a disadvantage of the single-queue switch compared with the RAM-based switch, where the cells are stored in RAM. Using RAM technology has been considered a major advantage of RAM-based switches, despite the limitations mentioned for this architecture.
We will show that the RAM-based architecture can be augmented with the single-queue switch as its controller/scheduler. Cells are stored in RAM, and minicells containing a pointer and the tag section are sent to the controller instead. The resultant switch benefits from the advantages of both the single-queue architecture and the RAM-based architecture while relaxing the queueing overhead and scheduling restrictions existing in current RAM-based architectures. The hardware of the single queue is also minimized in this way by using smaller buffering areas and fewer shift registers.
We consider the full buffer-sharing characteristic of the single-queue switch a major achievement. It remains very important when the switch is used in a RAM-based architecture, even though the buffering section (the RAM) is already shared in this architecture. The reason is that, in conventional RAM-based switches, the buffering space used to store the minicells in the output queues is very large; for a large number of input lines and priority levels, it is even comparable to the space used to store the cells themselves. This is because this space is not shared among the queues, unlike the RAM, which is shared among the output queues. If the controller buffer is shared by the output queues, its requirement can be reduced by a factor equal to the number of output lines times the number of priority levels per output line, assuming that, in the worst case, each queue must be able to accept all of the traffic [6].
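As a rough illustration of the worst-case sizing argument above, the following sketch compares the minicell storage needed when every (output, priority) queue is dimensioned for the whole traffic with that of a single shared pool. The function name and the example values (16 outputs, 4 priority levels, 1024 minicells) are hypothetical; the sizing rule is the worst-case assumption stated in the text.

```python
from typing import Tuple


def minicell_slots(n_outputs: int, priority_levels: int, worst_case: int) -> Tuple[int, int]:
    """Minicell slots for dedicated queues vs. one shared pool (worst-case sizing)."""
    # Dedicated: every (output, priority) queue must hold the whole worst-case load.
    dedicated = n_outputs * priority_levels * worst_case
    # Shared: a single pool sized once for the whole worst-case load.
    shared = worst_case
    return dedicated, shared


dedicated, shared = minicell_slots(16, 4, 1024)
print(dedicated, shared, dedicated // shared)  # 65536 1024 64
```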
To achieve this goal, Kondoh et al. [6] proposed a switch architecture in which the cells (minicells) destined for different output lines are all put in the same physical buffer. The cells are identified by a tag attached to them, showing their destination. Even though this idea sounds similar to our approach, the resultant switch architecture is quite different from the single-queue switch, and achieves none of the major goals we mentioned earlier, except for reducing the buffering space. In this architecture, cells are put in the common buffer in the order of their arrival. The buffer is a searchable queue in which a common search circuit and a common data bus are used to find the first matching cell for each output line at each cell time.
Even though the buffering space can be expanded easily in the single-queue switch, expansion would still be expensive because of the high speed of operation. Buffer requirements can be reduced by using the switch within a general model for ATM switches, in which the buffering space required in the fabric is kept low by using extra buffering space at the switch inputs to absorb large bursts. Fig. 1 shows our approach to an ATM switching system based on this model. In this model, the switch fabric is an output-buffered shared-memory switch that can schedule cells appropriately without being affected by hardware constraints (Fig. 1). An internal flow-control (backpressure) mechanism controls the flow of cells into the fabric based on feedback from the internal buffers. Also, an appropriate scheduling scheme is employed to serve the cells in the input buffers, to prevent QoS degradation due to input buffering [4].
Buffering the cells at the inputs results in a significant improvement in reducing the output buffering capacity needed to attain the required cell loss probability. Also, increasing the capacity is cheaper in input buffers because the speed in the input buffers is lower than the speed in internal (output) buffers [4] . We consider the sequencer-based architecture as the best solution for the input buffer because of its scheduling capabilities. We propose the single queue to be used as the fabric of the switching system as well, mostly because of its simplicity and efficiency in hardware, and its scheduling capabilities.
As a further step, to build very large switches with a reasonable hardware complexity, we propose the single-queue switch to be used as the building block for multistage switches. As we will explain later, the unique characteristics of the single-queue switch make it an excellent solution for multistage switches. In the following, we will first review the sequencer architecture. The single-queue switch will be introduced in detail in Section III. Both simple FIFO queueing and priority-based queueing will be presented. In Section IV, we will discuss the buffer sharing, throughput, scalability, and finally the complexity of the switch. We will emphasize the application of the switch in large-scale multistage switches, as well as RAM-based switches in Section V.
II. THE SEQUENCER ARCHITECTURE
The single-queue switch architecture is based on the sequencer architecture described in [13]. The following is a brief review of the sequencer architecture and its function. The sequencer consists of a chain of buffering units, each of which can accommodate one cell. Cells can travel from one unit to the adjacent units in the forward and backward directions (Fig. 2). The sequencer is designed to let a cell with higher priority move forward, leaving lower-priority cells behind in the queue. At each cell time, a new cell enters the queue from the head of the queue. As in the Star Burst architecture [11], the cell is compared with the cell at the head of the queue, and the one with higher priority is sent out as the winner. The other cell, the loser, is sent one step back into the queue, where it is compared with the cell in that unit. Again, the winner is sent forward to occupy the unit at the head of the queue, and the loser is sent backward to the next unit in the queue. This procedure repeats, spreading into the buffer like a wave, until it reaches the last unit. In this way, the winner at unit $i$ is always forwarded to unit $i-1$ (the forward path), and the loser to unit $i+1$ (the backward path), as shown in Fig. 2.
Regardless of where the traveling wave carrying a cell is inside the queue, the next cell enters the buffer just after the previous one, generating a new wave. The events inside the sequencer are exactly synchronized. Therefore, whenever the new wave arrives at unit $i$, the outcome of the previous wave is already in that unit, and the comparison can start immediately. Each flying cell moves backward into the queue until it finds its right place, pushing other cells back in the queue.
A blocking capability can be implemented in the sequencer, so that the cells can enter the queue even if there is no outgoing cell. To realize this capability, in case of blocking, the winner in the comparison in each unit remains at the same unit instead of being forwarded to the adjacent unit. The loser is sent backward as before.
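The wave mechanism described above can be approximated in software. The sketch below processes one wave at a time, whereas the hardware overlaps successive waves; it assumes that a smaller tag value wins (for example, an earlier service time), that ties go to the resident (older) cell, and that on overflow the cell in the last unit is lost. The names `cell_time` and `Cell` are illustrative only.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, str]  # (tag, label); in this sketch a smaller tag wins


def cell_time(units: List[Cell], new: Optional[Cell], blocked: bool,
              capacity: int) -> Tuple[List[Cell], Optional[Cell]]:
    """Run one wave through the sequencer; units[0] is the head of the queue.

    Returns the new queue contents and the departing cell (None if blocked).
    """
    winners: List[Cell] = []
    flying = new
    for resident in units:
        if flying is None or resident[0] <= flying[0]:
            winners.append(resident)  # resident wins; ties keep the older cell ahead
        else:
            winners.append(flying)    # flying cell wins and takes this position
            flying = resident         # the loser continues on the backward path
    if flying is not None:
        winners.append(flying)        # the last loser settles in the first empty unit
    departed = None
    if not blocked and winners:
        departed = winners.pop(0)     # head winner leaves; the others move forward
    if len(winners) > capacity:
        winners.pop()                 # overflow: the cell in the last unit is lost
    return winners, departed


queue: List[Cell] = []
for cell in [(5, "a"), (3, "b"), (7, "c"), (3, "d")]:
    queue, _ = cell_time(queue, cell, blocked=True, capacity=8)
print(queue)  # [(3, 'b'), (3, 'd'), (5, 'a'), (7, 'c')]
```

With the output blocked, each arrival is effectively inserted in tag order; calling `cell_time` with `blocked=False` would release the head cell and move every other cell forward by one unit.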
The basic achievement of the sequencer architecture is the ability to have several logical queues within the same physical queue. It is also possible to build sublogical queues within each logical queue. In this paper, we will explain how this capability can be extended to a new dimension, in which we not only put the logical queues destined for the same output line in the same queue, but also merge the queues of different output lines in a single physical queue.
A. Scheduling and Programmability
In comparing the cells, the priority is determined based on the tag of each cell. Cells are tagged before entering the sequencer. Tagging can be done in different ways to achieve different purposes. The tagging criterion has to be derived from the desired scheduling algorithm which could satisfy the QoS requirements of the connections.
The basic mechanism of sequencing is flexible enough to perform different scheduling schemes by only changing the tagging algorithm. As an example, fair queueing [7] , [8] and priority-based algorithms [9] can be handled by the same hardware. In the first case, the tag reflects the service time of a cell, which is determined upon its arrival and tagged on it, and the cells are served in the order of their service times. In the second case, the tag reflects the priority level of a cell, and the cells are served in the order of their priority levels.
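To make the fair-queueing case concrete, here is a sketch of a tag generator in the spirit of self-clocked fair queueing [8]: each cell receives a finish tag, and the sequencer would then serve cells in increasing tag order. Cell length is normalized to one, and the class name, the per-flow `rates` dictionary, and the `served` callback are assumptions for illustration, not the authors' design. For the priority-based case, the tag would simply be the priority level.

```python
from collections import defaultdict
from typing import Dict, Hashable


class ScfqTagger:
    """Self-clocked fair queueing finish tags for fixed-size cells (sketch)."""

    def __init__(self, rates: Dict[Hashable, float]):
        self.rates = rates                      # hypothetical per-flow rate shares
        self.last_finish = defaultdict(float)   # finish tag of each flow's previous cell
        self.virtual_time = 0.0                 # finish tag of the cell now in service

    def tag(self, flow: Hashable) -> float:
        # Cell length is normalized to 1, so the increment is 1 / rate.
        start = max(self.last_finish[flow], self.virtual_time)
        finish = start + 1.0 / self.rates[flow]
        self.last_finish[flow] = finish
        return finish

    def served(self, finish_tag: float) -> None:
        # SCFQ uses the tag of the cell in service as the system virtual time.
        self.virtual_time = finish_tag


tagger = ScfqTagger({"a": 2.0, "b": 1.0})
print([tagger.tag(f) for f in "aabb"])  # [0.5, 1.0, 1.0, 2.0]
```

With rates of 2 and 1, two back-to-back cells from each flow receive tags 0.5, 1.0, 1.0, and 2.0, so in tag order flow `a` gets its two cells through while flow `b` gets one.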
The tag can be composed of several distinct fields, each controlling a relative aspect in sequencing the cells. For example, an age field can be used, in conjunction with the priority field, to arrange the cells with the same priority in the order of their ages (Fig. 3) .
Besides this, some programmable options could be implemented in the hardware, controlled by flag bits in the tag. For example, discarding over-aged cells, aging, and joining the priority and age fields together could be enabled or disabled by flag bits in the tag field of each cell [13].
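One possible encoding of such a tag is sketched below: the priority field occupies the high-order bits and the age field the low-order bits, so a single magnitude comparator orders cells by priority first and by age second, while flag bits enable aging and the discarding of over-aged cells. The field widths, the flag positions, and the convention that a larger key wins are assumptions, not values from [13].

```python
from dataclasses import dataclass

AGE_BITS = 8                       # assumed width of the age field
AGE_MAX = (1 << AGE_BITS) - 1
FLAG_AGING = 0b01                  # assumed flag: increment the age every cell time
FLAG_DISCARD_OVERAGED = 0b10       # assumed flag: drop a cell once its age overflows


@dataclass
class Tag:
    priority: int                  # larger value = higher priority (assumption)
    age: int = 0                   # larger value = older cell
    flags: int = FLAG_AGING

    def key(self) -> int:
        """Joined priority/age key: the larger key wins the comparison."""
        return (self.priority << AGE_BITS) | min(self.age, AGE_MAX)

    def tick(self) -> bool:
        """Advance one cell time; return False if the cell should be discarded."""
        if self.flags & FLAG_AGING:
            self.age += 1
        if self.flags & FLAG_DISCARD_OVERAGED and self.age > AGE_MAX:
            return False
        return True


print(Tag(2).key() > Tag(1, age=255).key())  # True: priority dominates age
```

Joining the fields this way lets one comparator replace two; keeping them separate (flag cleared) would instead require comparing the priority fields first and the age fields only on a tie.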
As a further step, given the advances in FPGA technology, programmable logic could be used for the logic section of the units in the sequencer, allowing a wider range of functions for this section. A general chip could be manufactured and customized for specific switch designs by appropriately programming the logic section. The comparator length, the lengths of the age and priority fields, the position and function of other fields in the tag, and other features could be selected in this way.
Besides all of these options, the fundamental mechanism of sequencing could be employed to design other circuits. The switching circuit that we explain in this paper is a good example of the potential use of this architecture.
III. SINGLE-QUEUE SWITCH ARCHITECTURE
Fig. 4 shows the overall block diagram of the single-queue switch system. At this point, we consider a switch with an equal number $N$ of input and output lines, all with the same speed. As in any other ATM switch, appropriate header translation is required after the cells enter the switch system. In the next stage, all of the input lines are scanned in a round-robin fashion, and their cells, if any, are sent to the tagging unit one after the other. This is similar to the RAM-based switch, where the cells are written into the RAM one by one, and it could become a bottleneck. In the RAM-based switch, the speedup required to overcome this bottleneck is achieved by handling the cells in parallel format [10]. The same approach is used in the single-queue switch to write the cells into the tagging unit and the buffer. Therefore, the cells are first converted into parallel format before being scanned by this section.
The operations in the tagging unit and the following stages are pipelined. After the tagging unit, the cells enter the sequencer. The latency in the sequencer is very small and equals the operation delay of the first unit only: when a cell enters the sequencer, the winner of the first unit leaves the sequencer after a short delay, while the loser may continue on a long journey into the sequencer. The departing cells then enter the output stage, where operations such as multicasting can be performed. Finally, the cells are converted back to serial format and sent to the output lines. In the tagging unit, a tag is added to the header of the cell based on the selected scheduling scheme and on the information held in a lookup table for the virtual channel to which the cell belongs. The tagging is controlled by a local controller, which is programmed by a processor connected through the interface logic. In this way, software can determine the operation of the controller and the scheduling scheme it implements. The controller can also receive information from the output stage and the sequencer about the last serviced cell and the queue length, in order to implement scheduling schemes such as generalized processor sharing [7] and self-clocked fair queueing [8].
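A minimal model of the round-robin input scan and the tagging unit described above follows. It assumes a hypothetical per-VC lookup table that maps an incoming VCI to the destination output, a priority level, and the translated VCI; a real tagging unit would also apply the selected scheduling algorithm (for example, computing fair-queueing tags) instead of only copying a static priority.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TaggedCell:
    out: int        # destination output line (address field of the tag)
    priority: int   # priority field chosen by the controller for this VC
    vci: int        # translated VCI for the outgoing link
    payload: bytes


# Hypothetical per-VC lookup table: incoming VCI -> (output line, priority, new VCI).
VcTable = Dict[int, Tuple[int, int, int]]


def scan_and_tag(inputs: List[Optional[Tuple[int, bytes]]], start: int,
                 table: VcTable) -> List[TaggedCell]:
    """Scan the N inputs round-robin from `start` and tag each arriving cell."""
    n = len(inputs)
    tagged = []
    for k in range(n):
        cell = inputs[(start + k) % n]
        if cell is None:
            continue  # no arrival on this line in this cell time
        vci, payload = cell
        out, priority, new_vci = table[vci]
        tagged.append(TaggedCell(out, priority, new_vci, payload))
    return tagged


table: VcTable = {10: (2, 1, 77), 11: (0, 0, 78)}
inputs = [(10, b"x"), None, (11, b"y")]
print([(c.out, c.vci) for c in scan_and_tag(inputs, 1, table)])  # [(0, 78), (2, 77)]
```

Rotating `start` from one cell time to the next is one way to keep the scan order fair among the input lines.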
A. Switching Mechanism
As we saw before, the sequencer architecture can organize logical queues within the same physical queue. In our single-queue switch architecture, we use the same capability to interleave the queues of different output lines into the same physical queue. The interleaving mechanism is as follows. The sequence of cells is divided into groups. Within each group, the cells for outputs 1 to $N$ occupy up to the first $N$ positions, in order of their output addresses. The first group contains the first cell of each logical (output) queue, the second group contains the second cell of each logical (output) queue, and so on. The interleaved sequence of cells looks very much like a TDM signal (Fig. 5). Unlike an ordinary TDM signal, however, if the logical queue of one of the output lines has only $k$ cells, there is no reserved place for that output in groups $k+1$ and up. In this way, each group contains places only for the outputs that have a cell in that group. This does not mean that new cells cannot be inserted into the groups. For instance, in the above example, if a new cell arrives for that output, it is inserted at the appropriate place in group $k+1$, pushing the rest of the cells one step back in the queue. As a result, in this switch, all of the buffering space is shared by all of the input as well as output lines: each unoccupied place in each group is used by others. However, a newcomer cell destined for the output associated with an unoccupied place has priority to use that place. This cell pushes the rest of the queue one step back, and it pushes a cell out only if the physical queue is already full. In this case, the discarded cell is the one with the lowest priority. Fig. 6 illustrates this situation, assuming that the output is blocked for the duration of one cell time.
If the queue is in the state shown in (a) and a new cell for O3 (output number 3) arrives, the result is as shown in (b): the last cell of O4 is pushed out, because O4 already occupies all of the remaining places in the buffer. Normally, at each cell time, each output with a nonempty queue has one departure, so one group leaves the queue at each cell time. Therefore, the final state in the above example is as shown in (c).
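The group structure can be modeled at a higher level than the hardware by keeping the physical queue as a list of groups, each holding the output addresses of its cells in increasing order. In this sketch (all names hypothetical), a new cell joins the first group that lacks its output, the cell in the last unit is lost when the total exceeds the buffer size, and one group departs per cell time.

```python
from typing import List, Optional

Groups = List[List[int]]  # each inner list: output addresses of one group, in order


def insert(groups: Groups, out: int, capacity: int) -> Optional[int]:
    """Insert a cell for `out`; return the output of a pushed-out cell, if any."""
    for group in groups:
        if out not in group:        # first group without a cell for this output
            group.append(out)
            group.sort()            # keep addresses in increasing order
            break
    else:
        groups.append([out])        # every group already has one: start a new group
    if sum(len(g) for g in groups) > capacity:
        dropped = groups[-1].pop()  # the cell in the last unit falls off the queue
        if not groups[-1]:
            groups.pop()
        return dropped
    return None


def depart(groups: Groups) -> List[int]:
    """One cell time: the head-of-line group leaves the queue."""
    return groups.pop(0) if groups else []


groups: Groups = []
for out in [4, 4, 4, 1, 4]:
    insert(groups, out, capacity=5)
print(insert(groups, 3, capacity=5), groups)  # 4 [[1, 3, 4], [4], [4]]
print(depart(groups), groups)                 # [1, 3, 4] [[4], [4]]
```

In this example, output 4 has spread over the whole five-cell buffer, so the newly arriving cell for output 3 takes its place in the first group and pushes out the last cell of output 4, a situation similar to the one described above.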
A push-out occurs only if the physical queue is already full, the outgoing group is not full and contains fewer cells than the number of newcomer cells in that time slot, and the newly arriving cells do not include the missing cell of the outgoing group (Fig. 7).
In an $N \times N$ switch, up to $N$ cells enter the switch at each cell time. On the other side, one group leaves the switch at each cell time, and its cells are sent to their output lines using a synchronous round-robin scheme. The switch can therefore have up to $N$ departures at each cell time. This happens only when the head-of-line (HoL) group has one cell for each output line; otherwise, one or more outputs are not utilized during the time slot. Still, the group is sent out during the same time, and an output is idle during the turn of its missing cell. Obviously, the queue does not move forward during that turn, but this does not harm the operation of the queue because of the blocking mechanism described earlier.
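The synchronous round-robin service of the HoL group can be sketched as a mapping from the group to $N$ time slots, where slot $j$ serves output $j$ or stays idle. The function name and the 1-based output numbering are assumptions for illustration.

```python
from typing import List, Optional


def output_slots(hol_group: List[int], n_outputs: int) -> List[Optional[int]]:
    """Slot j (1-based) carries the HoL group's cell for output j, or None if idle."""
    present = set(hol_group)
    return [j if j in present else None for j in range(1, n_outputs + 1)]


slots = output_slots([1, 3, 4], 4)
departures = sum(s is not None for s in slots)
print(slots, departures)  # [1, None, 3, 4] 3
```

The idle slot for output 2 is exactly the interval during which the queue is held by the blocking mechanism.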
B. Grouping Mechanism
As was shown in the previous section, the grouping mechanism has a key role in extending the logical queueing mechanism from a single-output queue to a multiple-output switch. In fact, in the single-queue switch, switching and queueing are both achieved by the grouping mechanism. In this section, we will explain the grouping mechanism in more detail. For simplicity, at this point, we assume that the output queues are simple FIFO queues. In the next section, we will extend the mechanism for priority-based queues.
Based on the operation of the sequencer explained earlier, each cell entering the sequencer carries a field in its tag that indicates its destination output line. Each cell also carries two flags, each represented by a single bit in a separate field of the tag. Below, we refer to them by their functions: the end-of-group flag and the next-group flag.
The end-of-group flag of a cell in the queue indicates whether or not the cell is the last cell of a group. Groups are recognized and distinguished from each other by setting the end-of-group flag of the last cell of each group to "1." The end-of-group flag of a cell that occupies an empty unit is always set to "1." This, in fact, starts building the groups at the beginning, and adds new groups to the bottom of the queue later. In what follows, we distinguish between the flags of a cell already settled in a unit and those of a flying cell entering the unit on the backward path; the same distinction applies to both flags.
To show that cells can be grouped as we explained before, we start with a simpler method in which the order of the cells within the group is not fixed. We also assume that the output is blocked so the queue does not move forward, and therefore no cell leaves the queue.
In a group, a newcomer cell first looks for a cell with the same address tag. In this method, such a cell could be anywhere in the group. Therefore, the flying cell examines the cells of the group from the beginning of the group until it reaches either a cell with the same address or the end of the group, whichever comes first. To recognize the last cell of the group, the flying cell always examines the end-of-group flag of the cell in each unit.
If the cell meets a cell with the same address during its journey through the group, it must continue its journey to the next group to find the bottom of its logical queue. The cell memorizes this by setting its next-group flag to "1" (Fig. 8, branch III) and continues its flight to the end of the group, which it recognizes by examining the end-of-group flags of the remaining cells in the group (Fig. 8, branch II). When the cell meets a cell whose end-of-group flag is "1," it resets its next-group flag to "0" and goes to the next unit, which is the beginning of the next group (Fig. 8, branch I).
If the cell meets a cell with the same address that is itself the last cell of the group, it does not need to set its next-group flag to "1," because it is already at the end of the group. The cell goes to the next unit without any change in its flags (Fig. 8, branch IV).
If the cell meets a cell with a different address that is not the last cell of the group, it goes directly to the next unit to continue its search (Fig. 8, branch V). If the cell reaches the end of the group without meeting a cell with the same address, it must stay in this group as the last cell of its logical queue. To do this, the end-of-group flag of the current last cell of the group is reset to "0," and the end-of-group flag of the new cell is set to "1" instead (Fig. 8, branch VI); the cell then continues its flight to the next unit and stays there as the last cell of the group as well as the last cell of its logical queue. To guarantee this operation, a flying cell whose end-of-group flag is "1" always occupies the unit it enters (Fig. 8, branch VII).
In this method, a new cell always enters the unit at the bottom of the first group that does not have a cell for the same destination, and the cell previously in that unit, which is the first cell of the next group, is sent backward. This displaced cell, pushed out of the beginning of its group, eventually settles at the bottom of the same group in the same way. This procedure repeats as a chain reaction until the end of the physical queue.
Now we consider the method in which the cells inside each group are ordered, that is, sorted in increasing order from 1 to $N$ by their destination addresses. We still assume that the queue is blocked. In this method, a flying cell not only looks for a cell with the same destination address, but also checks whether the address field of each cell it meets is greater than its own. Since the addresses are in increasing order, if a flying cell whose next-group flag is "0" meets a cell with a larger address without having met a cell with the same address, there is no cell for its destination in this group. The flying cell must therefore stay in that unit as the last cell of its logical queue: it occupies the unit and sends the resident cell backward.
This will start a chain phenomenon again which will push all of the cells one step (one unit) back. In all other cases, the procedure is similar to the previous method. Fig. 10 shows the required modification in Fig. 8 to cover this method.
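The flag-driven search of Fig. 8, together with the ordered-group modification, can be checked with a small software model of a blocked queue. The sketch below follows the branch descriptions in the text as we read them; the flag names `end` (end of group) and `nxt` (continue to the next group) are our own labels, and the model moves one flying cell at a time instead of overlapping waves.

```python
from typing import Dict, List

Unit = Dict[str, int]  # a settled cell: {"addr": output address, "end": end-of-group flag}


def insert_cell(units: List[Unit], addr: int, ordered: bool = False) -> None:
    """Propagate one flying cell through a blocked queue (no departures), in place."""
    flying = {"addr": addr, "end": 0, "nxt": 0}  # both flags are 0 on entry here
    for i in range(len(units)):
        r = units[i]
        if flying["end"]:
            # VII: a flying cell marked as end of group takes this unit; the
            # resident (first cell of the next group) becomes the flying cell.
            units[i] = {"addr": flying["addr"], "end": 1}
            flying = {"addr": r["addr"], "end": r["end"], "nxt": 0}
        elif flying["nxt"]:
            if r["end"]:
                flying["nxt"] = 0  # I: end of this group; search the next one
            # II: otherwise keep moving toward the end of the group
        elif r["addr"] == flying["addr"]:
            if not r["end"]:
                flying["nxt"] = 1  # III: output already present; skip the rest
            # IV: present as the group's last cell; the next unit starts a group
        elif ordered and r["addr"] > flying["addr"]:
            # Ordered groups: no cell for this output here, so take the unit and
            # send the resident backward; it keeps its own end-of-group flag.
            units[i] = {"addr": flying["addr"], "end": 0}
            flying = {"addr": r["addr"], "end": r["end"], "nxt": 0}
        elif r["end"]:
            r["end"] = 0  # VI: join this group as its new last cell
            flying["end"] = 1
        # V: different output and not the end of the group: keep searching
    # The flying cell settles in the first empty unit, always as a group's last cell.
    units.append({"addr": flying["addr"], "end": 1})


def group_view(units: List[Unit]) -> List[List[int]]:
    """Split the physical queue into groups using the end-of-group flags."""
    groups, current = [], []
    for u in units:
        current.append(u["addr"])
        if u["end"]:
            groups.append(current)
            current = []
    return groups


queue: List[Unit] = []
for address in [1, 2, 1, 3, 1, 2]:
    insert_cell(queue, address)
print(group_view(queue))  # [[1, 2, 3], [1, 2], [1]]
```

Passing `ordered=True` exercises the second method: arrivals for 3, 1, 2, 2, 1 then produce the groups `[[1, 2, 3], [1, 2]]`, each sorted by address.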
Finally, we consider the case in which the queue is not blocked (the normal case). In this case, a new cell may enter the queue in the middle of a group (the HoL group). For the algorithm to remain consistent, the next-group flag of a cell entering the queue should be set to "0" if the cell enters the queue before the time slot belonging to its output line, and to "1" if it enters after that slot. This is easily done because the output stage of the switch operates synchronously and has an assigned time slot for each output line, during which the cell destined for that output line departs, or the queue remains idle (blocked) if there is no such cell in the group. The end-of-group flag must always be "0" when a cell enters the queue.
Fig. 9. Virtual structure of a logical queue.
C. Priority-Based Logical Queues
The logical queues explained so far were simple FIFO queues. A newcomer cell in those queues was always put at the bottom of the logical queue. In this section, we will show that priority-based queueing can also be realized in this architecture.
In fact, although all of the logical queues are interleaved in a single physical queue, the mechanism explained in the previous section allows each logical queue to be viewed virtually as an independent queue built on an independent sequencer. And since the basic structure of the hardware backbone is the same as the sequencer structure [13], briefly reviewed in Section II, we can easily implement the same functions as in the sequencer on top of the interleaving mechanism.
So, let us assume that we have a single logical queue with sequencing facilities (Fig. 9). All we need now is an appropriate tagging algorithm. Obviously, the sequencing mechanism should take place after a flying cell finds the unit that belongs to its logical queue. The whole procedure after a cell enters a unit can be summarized as follows.
- The address field of the flying cell is compared to the address field of the cell in the unit.
- If the two addresses are different, the procedure continues as explained before.
- If the two addresses are the same, the priority fields of the two cells are compared before any other action.
- If the flying cell is the loser, the procedure continues as before.
- If the flying cell is the winner, it enters the unit and pushes the resident cell out; at the same time, the next-group flag of the loser (now on the backward path) is set to "1." Also, if the loser was the last cell of the group (its end-of-group flag was "1"), its end-of-group flag is now reset to "0," and the end-of-group flag of the winner (now in the unit) is set to "1" instead.
- After this point, the situation is the same as in the previous section, so the same procedures apply again.
The whole procedure is shown in Fig. 10 and summarized in Table I.
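At the level of the virtual view of Fig. 9, priority queueing amounts to keeping each output's logical queue sorted by priority (first in, first out within a level) and reading group $k$ as the $k$th cell of every nonempty logical queue. The sketch below models only that view, not the flag manipulation of Table I; the class name and the convention that a larger number means a higher priority are assumptions.

```python
import bisect
from itertools import count
from typing import Dict, List, Tuple


class InterleavedPriorityQueues:
    """Virtual view: one priority-ordered logical queue per output, read out as groups."""

    def __init__(self) -> None:
        self._queues: Dict[int, List[Tuple[int, int]]] = {}
        self._arrival = count()  # arrival order keeps FIFO within a priority level

    def insert(self, out: int, priority: int) -> None:
        queue = self._queues.setdefault(out, [])
        # Sort key: higher priority first, then earlier arrival.
        bisect.insort(queue, (-priority, next(self._arrival)))

    def groups(self) -> List[List[Tuple[int, int]]]:
        """Group k holds the k-th cell (output, priority) of every logical queue."""
        depth = max((len(q) for q in self._queues.values()), default=0)
        return [
            [(out, -q[k][0]) for out, q in sorted(self._queues.items()) if len(q) > k]
            for k in range(depth)
        ]


iq = InterleavedPriorityQueues()
for out, prio in [(1, 0), (2, 0), (1, 5), (1, 0)]:
    iq.insert(out, prio)
print(iq.groups())  # [[(1, 5), (2, 0)], [(1, 0)], [(1, 0)]]
```

The priority-5 cell for output 1 moves into the first group and the older priority-0 cells of output 1 shift into later groups, which corresponds to a winning flying cell pushing the loser toward the next group.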
IV. SPECIFICATIONS

A. Buffer Sharing
The single-queue switch architecture allows full buffer sharing for the output queues. In this architecture, the buffering spaces are not dedicated or reserved for a specific queue. The buffer is filled from top to bottom, and no space is left empty. When a new cell arrives, it is inserted in its right place, and the rest of the queue is pushed one step back to use the next empty unit of the buffering space (Figs. 5 and 6).
Since the logical queues are filled independently, a shorter queue can grow independently of the situation in the other queues as long as there are empty units in the buffer. If the buffer is full, a shorter queue still grows independently, but in this case, other queues which occupy the rest of the physical queue will lose their cells in the bottom of the queue. So we can conclude that the buffer-sharing mechanism will show the push-out property if the buffer is full (Fig. 7) .
In this way, each output has a minimum guaranteed space of $B/N$ cells, where $B$ is the total buffer size and $N$ is the number of output lines, and each output queue can use free space beyond this amount as long as the other outputs do not use their own shares of the buffer.
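The minimum-share property can be exercised with a simple group model: one output first fills the whole buffer, and the other outputs then each insert $B/N$ cells. The sketch below (hypothetical names, $N = 4$, $B = 12$, FIFO groups, push-out of the cell in the last unit) shows each later output keeping its full share while the flooding output is pushed back to $B/N$ cells; it is a single example, not a proof.

```python
from collections import Counter
from typing import List


def insert(groups: List[List[int]], out: int, capacity: int) -> None:
    """FIFO group insertion with push-out of the cell in the last unit."""
    for group in groups:
        if out not in group:
            group.append(out)  # bottom of the first group lacking this output
            break
    else:
        groups.append([out])
    if sum(len(g) for g in groups) > capacity:
        groups[-1].pop()
        if not groups[-1]:
            groups.pop()


N, B = 4, 12
groups: List[List[int]] = []
for _ in range(B):             # output 1 alone fills the whole buffer
    insert(groups, 1, B)
for out in (2, 3, 4):          # the other outputs each claim their B/N share
    for _ in range(B // N):
        insert(groups, out, B)
occupancy = Counter(o for g in groups for o in g)
print(sorted(occupancy.items()))  # [(1, 3), (2, 3), (3, 3), (4, 3)]
```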
B. Scalability
The Switch Size (N): The numbers of input and output lines in this architecture can vary independently. The core switch architecture is completely independent of the number of input and output lines, provided that the internal speed is high enough to carry the additional traffic.
The grouping mechanism is totally independent of the number of input lines: the aggregate flow of incoming cells enters the queue regardless of its origin. As for the output lines, the maximum group length changes with the number of output lines, but the groups adjust to it automatically. Only the output stage needs to be modified to handle added lines.
The Buffer Size (B): The size of the buffer (maximum queue length) can easily be increased by adding buffering units to the switch, with no impact on the operation of the switch; the speed of operation is completely independent of the buffer size. The maximum number of units in a single-chip switch is limited by the technology used to manufacture the chip, but the buffer size can still be increased by cascading chips, since the structure is modular and the chips are cascadable. Cascading is more feasible when the internal data path is not too wide; otherwise, the limited number of pads becomes a problem.
C. Complexity
Fig. 4 shows the overall block diagram of the single-queue switch system. The input and output stages are required in all switching systems in various forms, with roughly the same complexity; the tagging section is the only major difference in this switch. This section can be merged with the header-translation section, which usually uses a lookup-table RAM, by using a larger memory and an extended header attachment. For scheduling algorithms that require the tags to be determined dynamically, an additional local processor would be needed in conjunction with the controlling processor. As usual, the switching system relies on an attached computer as a controller for the required communication processes. This controller, run by software, can program the local hardware-driven processor based on the selected scheduling algorithm.
The core switching section of this switch, the sequencer, is very simple in structure compared with other switches. It is modular and composed of a chain of identical small units. Each unit consists of a cell storage section and a logic circuit that controls the comparison operation. In terms of VLSI area, the logic section is not a major factor. As shown in Table I, the comparison logic could be implemented using two comparators and a combinational circuit with the outputs of the comparators and " ," " ," and " " as its inputs and the direction selector signal and new values of " ," " ," " ," and " " as its outputs.
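The chain-of-units idea can be illustrated with a small software model. This is a sketch under stated assumptions, not the circuit of Table I: each unit stores one (tag, cell) pair, a smaller tag is assumed to mean earlier service, and each unit decides whether to keep its cell or shift it down by combining its own tag comparison with the upstream unit's direction signal. The names `SequencerChain`, `insert`, and `serve` are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, str]  # (tag, cell); a smaller tag is served earlier (assumption)


class SequencerChain:
    """Hypothetical model of a chain of identical units, one cell per unit."""

    def __init__(self, num_units: int) -> None:
        self.units: List[Optional[Entry]] = [None] * num_units

    def insert(self, entry: Entry) -> Optional[Entry]:
        """Insert a cell; return the cell pushed off the tail, if any."""
        carried = entry
        shifting = False  # becomes True once an upstream unit has shifted
        for i, stored in enumerate(self.units):
            if stored is None:
                self.units[i] = carried  # the first empty unit takes the cell
                return None
            # Per-unit decision: compare the incoming tag with the stored tag,
            # OR-ed with the upstream direction signal. Ties keep the stored
            # cell ahead, preserving arrival order among equal tags.
            if shifting or entry[0] < stored[0]:
                self.units[i], carried = carried, stored
                shifting = True
        return carried  # chain full: the cell with the largest tag is lost

    def serve(self) -> Optional[Entry]:
        """Remove the head cell and shift every other cell one unit forward."""
        head = self.units[0]
        self.units = self.units[1:] + [None]
        return head


chain = SequencerChain(4)
for e in [(2, "a"), (1, "b"), (2, "c"), (0, "d")]:
    chain.insert(e)
print(chain.units)             # [(0, 'd'), (1, 'b'), (2, 'a'), (2, 'c')]
print(chain.insert((1, "e")))  # (2, 'c')
print(chain.serve())           # (0, 'd')
```

Because every unit makes only a local decision from its own comparison and its neighbor's signal, no addressing is needed, which is consistent with the text's point that the logic per unit stays small.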
The storage section is a shift-register-type memory whose width equals the internal data path width and whose length equals the total cell length (including the header) divided by that width. This section has a simple structure because the buffers require no addressing: data bits or words are simply shifted through the buffer, which could be implemented by customizing available memory technology. This section dominates the area of the switch. It could be minimized, making larger circuits feasible in VLSI, by combining the switch with a RAM that stores the main body of the cells. We elaborate on this in the next section.
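The dimensions of the storage section follow directly from the cell length and the data path width. The sketch below works through that arithmetic; the figures are hypothetical choices for illustration (a 53-byte ATM cell of 424 bits plus an assumed 16-bit tag, and an assumed 32-bit minicell), and rounding up a partially filled last word is also an assumption.

```python
def register_stages(cell_bits: int, path_width_bits: int) -> int:
    """Shift stages per unit: cell length / data path width, rounded up."""
    return (cell_bits + path_width_bits - 1) // path_width_bits


def buffer_bits(num_units: int, cell_bits: int, path_width_bits: int) -> int:
    """Total storage in a chain of units, counting padding in the last word."""
    return num_units * register_stages(cell_bits, path_width_bits) * path_width_bits


# Hypothetical figures: a 424-bit ATM cell plus a 16-bit tag, versus a
# 32-bit minicell (tag plus RAM pointer), in a 1024-unit buffer.
FULL_CELL = 424 + 16
MINICELL = 32
print(register_stages(FULL_CELL, 8))    # 55
print(register_stages(FULL_CELL, 32))   # 14
print(buffer_bits(1024, FULL_CELL, 8))  # 450560
print(buffer_bits(1024, MINICELL, 8))   # 32768
```

Under these assumed sizes, keeping only minicells in the sequencer shrinks its storage by more than an order of magnitude, which is the motivation for pairing the switch with a RAM.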
V. APPLICATIONS
A. Concentrator and Expander
Since the core switch architecture is independent of the number of input and output lines, the switch can have different numbers of inputs and outputs and can therefore be used as a concentrator or an expander. Assuming that the number of input lines is and the number of output lines is , the physical queue will have groups of cells. And if and , where and are integers, " " groups will be built every " " cell times. If and , the input traffic could exceed the output traffic when the line speeds are the same. If the concentrator is used for statistical multiplexing, the output lines could be slower, depending on the characteristics of the traffic. Otherwise, to handle the worst-case traffic, the speed of the output lines has to be . If , then and the input traffic is always less than the output traffic when the speeds are the same; the output line speed can then be as low as while still handling the worst-case traffic.
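As a rough illustration of the worst-case speed argument for concentrators and expanders, the sketch below computes the output speed, relative to the input line speed, at which a set of outputs can carry the full load of all busy inputs. The names `n_in` and `n_out` are hypothetical, and the sketch assumes that any cell may leave on any output line, so that only the aggregate rate matters.

```python
from fractions import Fraction


def min_output_speed(n_in: int, n_out: int) -> Fraction:
    """Per-output speed, relative to the input line speed, needed to carry the
    worst-case aggregate input (all n_in lines busy) over n_out outputs.

    Assumes any cell may leave on any output line (hypothetical simplification),
    so only the aggregate rate matters.
    """
    if n_in <= 0 or n_out <= 0:
        raise ValueError("line counts must be positive")
    return Fraction(n_in, n_out)


print(min_output_speed(8, 2))  # 4   (concentrator: outputs must run 4x faster)
print(min_output_speed(2, 8))  # 1/4 (expander: outputs may run at 1/4 speed)
```

Using `Fraction` keeps the ratio exact, which avoids rounding questions when the line counts do not divide evenly.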
B. Multistage Switching
Because of its simple structure and flexibility in handling different numbers of input and output lines, the single-queue switch can be used as the building block of multistage switches. A wide range of configurations is possible, and in each stage, any group of lines can be selected to be handled by a single switch.
If a priority-based scheduling scheme is used, the priority level of the cell is the same in all of the stages. Therefore, a single tagging section before the first stage of the switch is sufficient for the whole switch system. This further simplifies the structure of the multistage switch. In this case, the age field of the tag can be preset at the beginning if a global age control mechanism is preferred; otherwise, it can be set to its initial value at the entrance of each subswitch without a major impact on the hardware complexity.
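The difference between global and per-subswitch age control can be sketched with a tag that carries a priority, written once by the single tagging section, and an age field. The aging rule here (age grows by the number of cell times spent queued in a stage) and the names `Tag`, `enter_subswitch`, `traverse`, and `AGE_INIT` are all hypothetical; the sketch only shows which fields survive from stage to stage.

```python
from dataclasses import dataclass, replace
from typing import Iterable

AGE_INIT = 0  # hypothetical initial value of the age field


@dataclass(frozen=True)
class Tag:
    priority: int        # written once by the single tagging section
    age: int = AGE_INIT  # hypothetical age counter


def enter_subswitch(tag: Tag, global_age: bool) -> Tag:
    # Global age control keeps the age carried from earlier stages;
    # local age control resets it at the entrance of each subswitch.
    return tag if global_age else replace(tag, age=AGE_INIT)


def traverse(tag: Tag, waits: Iterable[int], global_age: bool) -> Tag:
    """Pass a tag through successive stages; each element of waits is the
    (hypothetical) number of cell times spent queued in that stage."""
    for w in waits:
        tag = enter_subswitch(tag, global_age)
        tag = replace(tag, age=tag.age + w)  # hypothetical aging rule
    return tag


t = Tag(priority=2)
print(traverse(t, [5, 2, 4], global_age=True))   # Tag(priority=2, age=11)
print(traverse(t, [5, 2, 4], global_age=False))  # Tag(priority=2, age=4)
```

In both modes the priority field is never rewritten, which is why one tagging section in front of the first stage suffices.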
C. RAM-Based Switches
The single-queue switch architecture can be used as the centralized controller in RAM-based switches. In this case, the original cells are saved in the RAM, and minicells are sent to the switch (controller) instead. Each minicell is composed of the tag of the cell and a pointer that addresses the original cell in the RAM (Fig. 11).
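The division of labor between the RAM and the controller can be sketched as follows. Full cells are written into RAM, and only a minicell (output, priority, pointer) is queued for scheduling. The free-address list, the priority convention (smaller is served first), and all names are assumptions for illustration; the shared Python list scanned in `depart` merely stands in for the single-queue sequencer and is not how the hardware orders minicells.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Minicell:
    output: int    # destination output line, from the cell's tag
    priority: int  # hypothetical scheduling field; smaller is served first
    pointer: int   # address of the original cell in the shared RAM


class RamBasedSwitch:
    """Sketch: full cells live in RAM; only minicells go to the controller."""

    def __init__(self, ram_slots: int, num_outputs: int) -> None:
        self.num_outputs = num_outputs
        self.ram: List[Optional[bytes]] = [None] * ram_slots
        self.free: List[int] = list(range(ram_slots - 1, -1, -1))  # free addresses
        # Stand-in for the single-queue controller: one shared list of
        # minicells in arrival order, with no per-output linked lists.
        self.controller: List[Minicell] = []

    def accept(self, cell: bytes, output: int, priority: int) -> bool:
        if not 0 <= output < self.num_outputs:
            raise ValueError("unknown output line")
        if not self.free:
            return False  # RAM full: the cell is lost
        addr = self.free.pop()
        self.ram[addr] = cell
        self.controller.append(Minicell(output, priority, addr))
        return True

    def depart(self, output: int) -> Optional[bytes]:
        # Highest priority first; among equal priorities, the earliest arrival.
        candidates = [i for i, m in enumerate(self.controller) if m.output == output]
        if not candidates:
            return None
        best = min(candidates, key=lambda i: (self.controller[i].priority, i))
        mini = self.controller.pop(best)
        cell = self.ram[mini.pointer]
        self.ram[mini.pointer] = None
        self.free.append(mini.pointer)  # the RAM slot becomes free again
        return cell


sw = RamBasedSwitch(ram_slots=4, num_outputs=2)
sw.accept(b"A", output=0, priority=1)
sw.accept(b"B", output=0, priority=0)
sw.accept(b"C", output=1, priority=0)
print(sw.depart(0), sw.depart(0), sw.depart(0), sw.depart(1))  # b'B' b'A' None b'C'
```

The cell payload is touched only twice (once on write, once on read); all queueing and scheduling decisions operate on the small minicells.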
The size of the buffer in each unit and the width of the data path are reduced to match the length of the minicells. Because the storage section occupies most of the area of each unit, smaller units make it feasible to place a very large number of them in the buffer. Using fewer data lines also reduces complexity, as well as the number of pads required to make the chips cascadable.
A major advantage of using the single-queue switch as the queue controller in a RAM-based switch is the ability to support any number of priority levels and a wide range of scheduling algorithms while still benefiting from the RAM as the main storage resource in the switch. It is worth noting that in this new architecture, illustrated in Fig. 11, there is no need for hardware that controls the virtual queues of chained cells for the output lines or their priority levels. In RAM-based switches, a set of pointers with a linking mechanism, or a set of FIFOs, is usually used for this purpose [12]. When a set of FIFOs is used for the priority queues of the output lines, these buffers are not shared, so the total buffering space the controller needs to achieve a given cell loss ratio could be comparable to the buffering space required for the cells themselves. Sharing these buffers in the single queue reduces the buffering space by a large factor, as mentioned in the Introduction.
Another advantage of the RAM-based architecture is its applicability to non-ATM broadband packet switching, where packets are longer and possibly of variable length. In this case, ATM switching can be applied by adding a higher layer to the ATM switch that segments the packets into cells and uses the ATM switching facility to switch them. This method has its own disadvantages, including the overhead of the cell headers. The RAM-based architecture reduces these disadvantages: the whole packet can be written into the RAM, and a fixed-size block of data, similar to a minicell, containing the pointer(s) to the location(s) of the packet in memory can be given to the controller, which can then schedule service among the packets to achieve the required QoS for the connections.
VI. CONCLUSION
A new ATM switch architecture was described which is quite different from the architectures addressed in the literature. The architecture is proposed to be used as the building block of large-scale multistage switches and the controller/scheduler of RAM-based switches. The core of the switch is only a buffering element whose structure is similar to a shift register, which can buffer one cell (minicell) at each unit, with a small logic circuit added to each unit to provide the necessary functions.
This element is called a sequencer because it can sequence the cells in the buffer based on a tag added to each cell before it enters the buffer. In this paper, we described a specific sequencing algorithm, called grouping, which turns the sequencer into a switch. This algorithm is controlled by two flag bits together with the destination output address bits, all of which are added to the tag of each cell. The queues of cells destined for the output lines (output queues) are interleaved with one another. Each output queue is independent of the others, and all of the capabilities provided by a sequencer, such as priority-based or other scheduling algorithms controlled by extra tag bits, still apply to them.
Besides its very simple and modular switching structure, built from a single building block consisting of a one-cell (minicell) register and a small logic circuit, the switch offers other distinctive characteristics discussed in this paper: it achieves full buffer sharing, its numbers of input and output lines can vary independently without affecting the switching section, and its buffering space can be extended simply by cascading the switching elements.
Massoud R. Hashemi 
