Survey of switching techniques in high-speed networks and their performance by Oie, Yuji et al.
UC Irvine
ICS Technical Reports
Title
Survey of switching techniques in high-speed networks and their performance
Permalink
https://escholarship.org/uc/item/6r54g1n5
Authors
Oie, Yuji
Suda, Tatsuya
Kolson, David
et al.
Publication Date
1989-10-12
 
SURVEY OF
SWITCHING TECHNIQUES IN HIGH-SPEED NETWORKS
AND THEIR PERFORMANCE
Yuji OIE 
Tatsuya SUDA 
David KOLSON 
Masayuki MURATA 
Hideo MIYAHARA 
Technical Report No. 89-37 
Department of Information and Computer Science 
University of California, Irvine 
Irvine, California 92717 
Survey of
Switching Techniques in High-Speed Networks
and Their Performance*
Yuji OIE1, Tatsuya SUDA2, David KOLSON2, Masayuki MURATA3, Hideo MIYAHARA3
October 12, 1989 
1. Department of Electrical Engineering, Sasebo College of Technology, Sasebo 857-11, 
Japan. 
2. Department of Information and Computer Science, University of Califor-
nia, Irvine, CA 92717, U.S.A. 
3. Department of Information and Computer Sciences, Faculty of Engineering Science, 
Osaka University, Toyonaka 560, Japan. 
Mailing Address: 
All the future correspondence should be addressed to Tatsuya Suda at the above 
address. 
* This material is based upon work supported by the National Science Foundation
under Grant No. DCI-8602052. This research is also in part supported by the
University of California MICRO program. 
Abstract 
One of the most promising approaches for high speed networks for integrated service 
applications is fast packet switching, or ATM (Asynchronous Transfer Mode). ATM can
be characterized by very high speed transmission links and simple, hard wired protocols 
within a network. To match the transmission speed of the network links, and to minimize 
the overhead due to the processing of network protocols, the switching of cells is done in 
hardware switching fabrics in ATM networks. 
A number of designs have been proposed for implementing ATM switches. While many
differences exist among the proposals, the vast majority of them are based on self-routing
multi-stage interconnection networks. This is because of the desirable features of
multi-stage interconnection networks, such as self-routing capability and suitability for
VLSI implementation.
Existing ATM switch architectures can be classified into two major classes: blocking
switches, where blocking of cells may occur within a switch when more than one cell
contends for the same internal link, and nonblocking switches, where no internal blocking
occurs. A large number of techniques have also been proposed to improve the performance
of blocking and nonblocking switches. In this paper, we present an extensive survey of the
existing proposals for ATM switch architectures, focusing on their performance issues.

section 3.3. Section 4 surveys nonblocking switches (subsections 4.1, 4.2 and 4.3) and
their improvement techniques (subsections 4.4 through 4.8). Other related research on
nonblocking switches is summarized in subsection 4.9. Subsection 4.10 summarizes and
compares the performance of a variety of nonblocking switches. Concluding remarks are
given in section 5.
2 Assumptions and Notations 
Before we start surveying various ATM switch architectures, we summarize assumptions
and notations used in this paper. Throughout the paper, we assume switch fabrics of size 
N x N (N input ports and N output ports). Input and output channels to a switch are 
of the same speed. Cells are assumed to be of a constant length, and the channel time 
is slotted with the slot size being equal to a cell transmission time. All the channels are 
assumed to be synchronized. Arrivals of cells at each of the N input ports follow a Bernoulli 
process. Namely, a cell arrives with probability p in a slot, and there is no arrival with 
probability 1 - p. Since we use the slot length as the unit of time, p also corresponds to 
the input traffic load to the input channel. 
Uniform traffic refers to the situation where incoming cells are destined to the N output
ports with a uniform probability of 1/N. Unless otherwise stated, uniform traffic is assumed
in this paper. When all the cells from one input port are going to a particular output port,
the traffic is referred to as a point-to-point connection. A hot spot refers to an output
port where a heavy concentration of cells is expected to happen.
In some switch architectures, buffers are provided to store cells. Because of the different 
approaches taken in the design of a switch, there are possible choices for the physical 
location of the buffers relative to the switch. Buffers may be placed on the inputs to the 
switch, or on the outputs to the switch, or possibly on both. Queueing of cells may be 
implemented in a shared buffer common to all the inputs. Binput and Boutput denote the 
size of a buffer on an input and an output, respectively. B denotes the size of a shared 
buffer. Binput, Boutput and B may be zero, finite or infinite. Unless otherwise stated, we 
follow the above assumptions and notations in this paper. 
3 Blocking ATM Switches 
3.1 Banyan Switches 
Banyan switches belong to the class of blocking switches, a major class of multistage 
interconnection networks. An N x N Banyan switch is constructed of switching elements 
which have k inputs and k outputs, i.e., k x k switching elements, arranged into log_k N
stages. Fig.1 shows an 8 x 8 Banyan switch with binary switching elements (k = 2). Cells

port pairs. In the following we discuss these switch architectures. This subsection also 
briefly refers to another switch architecture, the Batcher Banyan switch, in which the
internal blocking is completely eliminated. The Batcher Banyan switch has a sorting 
network which precedes the Banyan switch. A sorting network orders the incoming cells 
according to the destinations in their ascending order, and feeds them into the Banyan 
switch, making the switch internally nonblocking. The Batcher Banyan switch is discussed 
again in detail in section 4. 
3.2.1 Multipath Banyan Switches 
By replacing each link which connects switching elements in the Banyan switch by d (dis-
tinct) parallel links, we obtain the d-dilated Banyan switch. Another way to provide 
multiple paths is to construct a switch from a multiple, say d, of Banyan switch planes in 
parallel. This architecture is called the d-replicated switch. These two switch configura-
tions provide d multiple paths between any input and output port pairs, and thus reduce 
the probability of the internal blocking. 
These multipath Banyan switches have been analyzed in [17, 32], assuming that
switching elements do not have any buffers. For the d-dilated Banyan switch, it was shown that
the cell loss probability decreases when the dilation d increases [17, 32]. A dilation of
between 4 and 8 is shown to be sufficient to reduce the cell loss probability to a very
small value. The d-dilated Banyan switch becomes a nonblocking switch when d = k^⌊m/2⌋,
where k is the size of a switching element, and m is the number of stages in the Banyan
switch. The d-replicated switch was shown to provide performance similar to that of a
d-dilated switch [17].
Adding a supplementary switch plane to the Banyan switch can also provide multiple 
paths between input and output port pairs. Anido et al. [1] considered a multipath
switch constructed from two Banyan switches: one acts as a routing network, while the
other acts as a switching network. They discussed several path selection algorithms and 
their performance. 
3.2.2 Batcher Banyan Switches 
The Banyan switch possesses an interesting characteristic. If all of the incoming cells are 
in ascending order relative to their output addresses, the switch is guaranteed to prevent 
cells from being internally blocked. To guarantee that the cells will be in ascending order 
requires only that there be some type of sorting network preceding the Banyan switch. In 
the Starlite switch [11], a Batcher network is used as the sorting network. This switch 
configuration is called the Batcher Banyan switch. The Batcher Banyan switch belongs to 
a major class of multi-stage interconnection networks, the class of nonblocking switches.
This class of switches is discussed in detail in section 4. 
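
The nonblocking property for sorted cells can be checked on a small switch. The following is a minimal sketch, not taken from the paper. It assumes that the Banyan switch is realized as an Omega (shuffle-exchange) network of 2 x 2 elements, that the sorted cells occupy contiguous input ports starting at port 0 with no gaps between them, and that all destinations are distinct. The function names are illustrative only.

```python
from itertools import combinations


def has_internal_blocking(cells, n_bits):
    """Return True if two cells need the same internal link.

    cells is a list of (input_port, destination) pairs for a 2**n_bits port
    Omega network (an assumed realization of the Banyan switch).
    """
    size = 1 << n_bits
    current = list(cells)
    for stage in range(n_bits):
        used_links = set()
        moved = []
        for position, destination in current:
            # Self-routing: stage s examines destination bit s (MSB first).
            bit = (destination >> (n_bits - 1 - stage)) & 1
            # Perfect shuffle of the position, then exit on the routed bit.
            new_position = ((position << 1) & (size - 1)) | bit
            if new_position in used_links:
                return True
            used_links.add(new_position)
            moved.append((new_position, destination))
        current = moved
    return False


def sorted_inputs_never_block(n_bits):
    """Check every set of distinct destinations, sorted and packed from port 0."""
    size = 1 << n_bits
    for count in range(1, size + 1):
        for destinations in combinations(range(size), count):
            cells = list(enumerate(destinations))  # ports 0..count-1, ascending
            if has_internal_blocking(cells, n_bits):
                return False
    return True


# Exhaustive check for an 8 x 8 switch, then one unsorted counterexample.
print(sorted_inputs_never_block(3))
print(has_internal_blocking([(0, 0), (1, 4), (2, 1), (3, 5)], 3))
```

The exhaustive check is only practical for small switches; it is meant to illustrate why a sorting network in front of the Banyan switch removes internal blocking, not to model the Batcher network itself.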

output link of a switching element (output contention), only one cell may pass through the 
switching element; the others are blocked and stored in the input link buffers. When the
head of the line cell is blocked due to output contention, all the cells in the same input 
link buffer are blocked, if cells are served on an FIFO basis within the buffer. The HOL 
blocking will be discussed again in subsection 4.5 in more detail. 
In the two papers discussed above [14, 16], the buffered Banyan switch is constructed 
of binary switching elements. In [3, 32], it was shown that the buffered Banyan switch 
constructed of switching elements of size greater than two results in a higher throughput. 
It is interesting to note that, in case of nonblocking switches, it has been shown that 
the maximum throughput decreases, as the size of switching elements increases [15]. For 
instance, a nonblocking switch constructed of 2 x 2 switching elements achieves the
maximum throughput of 0.75, while the switch constructed of 4 x 4 switching elements results
in a smaller throughput of 0.68.
The performance of the buffered Banyan switch under non-uniform traffic has been
analyzed. Wu [35] investigated the effects of existence of a point-to-point connection
on the performance of the single-buffered Banyan switch through simulation. (Refer to 
section 2 for the definition of a point-to-point connection.) Wu showed that the maximum 
throughput of the buffered Banyan switch decreases significantly, when there exists a point-
to-point connection in addition to uniform traffic. Therefore, the buffered Banyan switch 
does not favor non-uniform traffic. 
Kim et al. [16] showed that, if all the inputs to the switch are of a point-to-point
connection type, single-buffered and multiple-buffered Banyan switches result in almost 
the same throughput. They also showed that, for the mixture of a point-to-point con-
nection and uniform traffic, the multiple-buffered Banyan switch results in throughput 
improvement of 10% to 15% over the single-buffered Banyan switch, depending on traffic 
load from a point-to-point connection. This improvement, however, becomes negligible as 
the size of the switch becomes large. 
3.3.2 Buffered Banyan Switches with Bypass Queueing 
In the above subsection, we observed that the HOL blocking can happen on the input link 
buffers of the switching element in the buffered Banyan switch. The HOL blocking occurs 
when the HOL cell is blocked due to output contention, and if cells are served on an FIFO 
basis within the input link buffer. This observation leads to the following technique called 
"bypass queueing" in order to improve the performance of the buffered Banyan switch. 
Bubenik and Turner [3] proposed the bypass queueing discipline. In bypass queueing, 
when HOL blocking occurs, cells in that particular input buffer "bypass" the HOL
cell and sequentially join competition for the available output links until some cell wins 
competition. This bypass queueing discipline is a variation of the window selection policy 
to be discussed in subsection 4.5. 

The architecture of the crossbar switch has some advantages. First, it uses a simple 
two-state cross-point switch (open and connected state) which is easy to implement. Also, 
the modularity of the switch design allows easy expansion. One can build a larger switch 
by simply adding more cross-point switches. Lastly, this switch design provides for a low 
latency as compared to the Banyan type switches, because it has the smallest number of 
connecting points between an arbitrary input and output pair. One disadvantage to this 
design, however, is the fact that it uses the maximum number of crosspoints (cross-point 
switches) needed to implement an N x N switch. 
The Knockout Switch is a nonblocking switch based on the crossbar design [8, 36]. It
has N inputs and N outputs and consists of a crossbar-type (crossbar-like) switch with a
bus interface module at each output (Fig.4). Since each input has a direct connection to 
each output, cells will not interfere with one another; internal blocking does not occur. 
However, since more than one cell may go to the same output port, output contention 
may result. The purpose of the bus interface is to resolve this conflict. The bus interface 
has three components. They are: the cell filter, the concentrator, and the shared buffer 
(Fig.5). The functions of these components are described below. 
In the Knockout switch, each bus interface sees all cells from the input ports. The 
function of the cell filter is to separate cells destined for this particular output and those 
destined for other outputs in the switch. The cell filter does this by setting an activity bit 
in the cell's header to logical one, if that cell is destined for this output. The activity bit 
is set to logical zero otherwise. 
At the beginning of each slot, all of the cell filters are initially open. That is, they will 
allow data bits from the input buses to pass into the concentrator. As the cell headers pass 
through these filters, there is a bit-by-bit comparison performed between the destination 
address (found in the cell header) and the particular output's address. When the filter 
finds a cell destined for another output, it sets the activity bit of that cell's header to 
logical zero, otherwise the activity bit is set to logical one. Thus, at the end of the slot, 
all cells destined for a particular output will be in the concentrator associated with that 
output. 
The name "concentrator" comes from the function it performs. Specifically, the con-
centrator provides for an N to L concentration, where L is the number of separate buffers 
in the shared buffer. There are L outputs from the concentrator into the shared buffer. If 
some number of cells, k (k ≤ L), arrive for a particular output, then they will appear on
the outputs 1 to k of the concentrator. When k (k > L) cells arrive, then k - L cells will be
dropped.
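
The filtering and concentration steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the Knockout hardware: a cell is modelled as a hypothetical (destination, payload) pair, and when more than L cells arrive the sketch keeps the first L in input order, whereas the actual concentrator decides the survivors by a tournament.

```python
def cell_filter(cells, output_port):
    """Tag each (destination, payload) cell with an activity bit for this output."""
    return [(1 if destination == output_port else 0, destination, payload)
            for destination, payload in cells]


def concentrate(filtered_cells, n_concentrator_outputs):
    """N-to-L concentration: keep at most L active cells, count the rest as dropped."""
    active = [cell for cell in filtered_cells if cell[0] == 1]
    kept = active[:n_concentrator_outputs]
    dropped = len(active) - len(kept)
    return kept, dropped


# Five cells arrive in one slot; three of them are destined for output 2.
arrivals = [(2, "a"), (0, "b"), (2, "c"), (1, "d"), (2, "e")]
kept, dropped = concentrate(cell_filter(arrivals, output_port=2),
                            n_concentrator_outputs=2)
print(kept, dropped)
```

With L = 2 and k = 3 cells for the same output, k - L = 1 cell is dropped, as described above.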
The work of the concentrator is essentially analogous to a tournament of N players 
competing for L prizes, where the prizes represent the output ports. Initially all of the 
cells are able to compete for the first of the L concentrator outputs, where there will be 
one winner. All of the losers (N - 1) then compete for the second output. This process, 
all of the losers of the previous stage competing for the concentrator output in this stage, 

communication. In order to yield multicasting capabilities, a copy network is placed be-
tween the inputs and the Batcher Banyan switch (Fig.7). The first stage of the copy 
network is a sort-to-copy network, which takes as input both source and copy cells. The 
output from the sort-to-copy network is then those cells ordered by their source addresses. 
Note that copy cells and source cells from the same input will be adjacent to one an-
other. The copy network then copies the information field from the source cells into its 
corresponding copy cells. These cells are then sent into the Batcher Banyan switch. 
Regardless of whether it is a crossbar based or a Batcher Banyan based switch, output 
contention still occurs in a nonblocking switch, when more than one cell is destined for the 
same output. When this happens, cells which lose contention are dropped from a switch, 
if the switch does not have any buffering discipline. In order to minimize the cell loss, 
buffering of cells is necessary. Buffers may be placed on the inputs to the switch, or on 
the outputs to the switch, or possibly on both. Queueing of cells may be implemented in 
a shared buffer common to all the input/output ports. In the following, we first study the 
performance of nonblocking switches without buffers. We then investigate various buffering 
schemes and techniques to improve the performance of nonblocking switches. 
4.2 Nonblocking Switches Without Input Buffers 
In this subsection, we study the performance of the simplest nonblocking switch, a non-
blocking switch without any buffers. In this case, when output contention happens, only 
one cell is successfully transferred to the destination output port, and the remaining cells 
are dropped from the switch. 
In [26], Patel analyzed an N x M nonblocking switch (the crossbar switch) in the
context of interconnecting multiprocessors, and obtained the cell loss probability for both
the finite N and the N = ∞ cases. It is assumed that the speed of a switch is equal to that
of the input channels; a switch transfers at most one cell per slot from each of the N input
ports. Patel showed that, for the case of N = M = ∞, the probability p(success) that a
cell wins an output contention is given by

    p(success) = \frac{1 - e^{-p}}{p},

where p is the input traffic load. (See section 2 for the definition of p.) The numerator
1 - e^{-p} is the throughput of the switch. The cell loss probability is given by
p(loss) = 1 - p(success). The throughput takes its maximum when p is 1 (and N = ∞), and
its value is 1 - e^{-1} = 0.632. It is noteworthy that the maximum throughput of 0.632 is
achieved at the large expense of the cell loss at inputs; 36.8% of incoming cells are
dropped when p is one. This level of cell loss is not acceptable for ATM networks.
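
The figures quoted above can be checked numerically. The sketch below evaluates the N = ∞ expression from the text. The finite-N expression 1 - (1 - p/N)^N is the standard result for an N x N crossbar under uniform traffic; it is included as an assumption, since this section quotes only the limiting case. Function names are illustrative.

```python
import math


def crossbar_throughput(p, n_ports=None):
    """Throughput per output of an unbuffered nonblocking switch.

    n_ports=None gives the N = infinity limit 1 - exp(-p) quoted in the text.
    A finite n_ports uses the assumed N x N expression 1 - (1 - p/N)**N.
    """
    if n_ports is None:
        return 1.0 - math.exp(-p)
    return 1.0 - (1.0 - p / n_ports) ** n_ports


def success_probability(p, n_ports=None):
    """Probability that an arriving cell wins output contention."""
    return crossbar_throughput(p, n_ports) / p


for n in (2, 8, 32, None):
    label = "inf" if n is None else str(n)
    throughput = crossbar_throughput(1.0, n)
    loss = 1.0 - success_probability(1.0, n)
    print(f"N={label:>3}  throughput={throughput:.3f}  loss={loss:.3f}")
```

At p = 1 the throughput falls from 0.75 for N = 2 toward the 0.632 limit as N grows, and the loss probability rises toward 0.368 accordingly.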
4.3 Nonblocking Switches With FIFO Input Buffers 
In nonblocking switches without buffers, when output contention happens, cells which lose 
contention are lost from the switch. This limits the throughput of the switch to 0.632 as we 

for instance, achieves a cell loss probability of less than 10^{-6} at p = 0.5 when Binput ≥ 20.
Karol et al. [15] have obtained the maximum throughput of the switch for both finite
N and infinite N, as well as the average delay time for infinite N. Tab.1 is from [15] and
shows how the maximum throughput decreases as the number of inputs N increases. It
is seen that, as N increases, the maximum throughput decreases to 0.586. Note that the
throughput 0.75 for N = 2 gives the throughput of a 2 x 2 switching element, a basic
building block of the Banyan switch.
The maximum throughput (0.586) of the nonblocking switch with FIFO input buffers
is smaller than that of the nonblocking switch without buffers (0.632). This is due to the
head of the line cell blocking (HOL blocking). When the head of the line cell (HOL cell)
is blocked due to output contention, all the cells in the same input buffer are blocked, if 
cells are served on an FIFO basis within the buffer. This HOL blocking severely limits the 
maximum throughput of the nonblocking switch with input buffers, resulting in a lower 
throughput than that of the nonblocking switch without buffers. 
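
The HOL blocking limit can be reproduced with a short saturation simulation. This is a sketch under stated assumptions rather than the analysis of [15]: every input queue is assumed to be always backlogged, destinations are uniform, and each output picks one of its contending HOL cells at random. The function name and the default parameters are illustrative.

```python
import random


def hol_saturation_throughput(n_ports, n_slots=20000, seed=1):
    """Estimate the maximum throughput of a FIFO input-buffered switch."""
    rng = random.Random(seed)
    # hol[i] is the destination of the head-of-line cell at input i.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)
            # The winner departs; the next cell in that queue becomes HOL.
            hol[winner] = rng.randrange(n_ports)
            delivered += 1
            # Losers keep their HOL cell and block the cells behind it.
    return delivered / (n_ports * n_slots)


print(hol_saturation_throughput(2))
print(hol_saturation_throughput(32, n_slots=5000))
```

For N = 2 the estimate should be close to 0.75, and for larger N it should approach the 0.586 limit, below the 0.632 of the unbuffered switch.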
In order to improve the limited throughput of the nonblocking switch discussed above, 
a number of improvement techniques have been proposed and investigated. One possible
approach is to speed up the switching fabric. The effect of speedup on the performance of a
switch is discussed in subsection 4.4.
Another possible approach to improve the performance of the nonblocking switch is 
to reduce or eliminate the HOL blocking. When the HOL cell is blocked due to output
contention, a cell behind it going to an available output port can be sent instead. This 
reduces the HOL blocking and results in a better throughput performance. Subsection 4.5 
explains the window selection discipline, where one of the first w cells in an input buffer is 
selected and sent prior to the HOL cell. Use of a shared buffer also improves the throughput 
of a switch by eliminating the HOL blocking. In the shared buffer switch there is no buffer 
on the inputs, nor on the outputs. Arriving cells are immediately injected into the switch. 
When output contention happens, a winning cell goes through a switch, and the losers are
stored in a shared buffer common to all of the input ports for later transmission. Since 
cells which lose output contention are stored in a shared buffer, the queue structure is not 
retained. Newly arriving cells immediately join in the competition for available outputs. 
In addition, cells in the shared buffer also have access to the switch. Since more cells are 
available to select from, it is possible that fewer outputs will be idle in the shared buffer
scheme. Thus, the throughput of the shared buffer switch is slightly better than that of 
the switch with the window selection discipline. In subsection 4.6, the switch with a shared 
buffer and its performance are discussed. 
The third possible approach to improve the performance of the nonblocking switch is to
optimally select one cell among the contending cells and transfer it to the output, instead
of selecting a cell randomly. There are a number of possible selection policies. For instance,
selecting a cell from the longest queue may improve the switch performance. In subsection
4.7, we discuss the longest queue selection and the priority selection schemes.

distribution on an output port in the steady state becomes 

    Q(z) = \frac{(1 - p)(1 - z)}{e^{-p(1-z)} - z}.    (4)

Interestingly, this Q(z) is the same as the z-transform for the queue length distribution in the
M/D/1 queue. It is clear that the switch attains the maximum throughput of 1.0, if the
speedup ratio of a switch is N (i.e., if the switch transfers the maximum of N cells to a 
particular output in a slot). 
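
Eq.(4) can be cross-checked numerically. The sketch below is an illustration under assumptions that are not spelled out in this excerpt: the number of cells arriving for one output in a slot is taken to be Poisson with mean p (the N = ∞ limit), one cell departs per slot, and the queue evolves as Q' = max(0, Q + A - 1). The buffer and arrival truncation limits are arbitrary choices. Setting z = 0 in eq.(4) gives Q(0) = (1 - p)e^{p}, and the mean of the M/D/1 queue length is p^2 / (2(1 - p)).

```python
import math


def output_queue_distribution(p, max_len=60, max_arrivals=20, iterations=500):
    """Iterate the slot-by-slot recursion until the distribution settles."""
    arrivals = [math.exp(-p) * p ** k / math.factorial(k)
                for k in range(max_arrivals + 1)]
    dist = [1.0] + [0.0] * max_len  # start with an empty queue
    for _ in range(iterations):
        nxt = [0.0] * (max_len + 1)
        for q, prob_q in enumerate(dist):
            if prob_q == 0.0:
                continue
            for a, prob_a in enumerate(arrivals):
                # One departure per slot; the queue never goes negative.
                new_q = min(max(q + a - 1, 0), max_len)
                nxt[new_q] += prob_q * prob_a
        dist = nxt
    return dist


load = 0.5
dist = output_queue_distribution(load)
mean_queue = sum(q * prob for q, prob in enumerate(dist))
print(dist[0], (1 - load) * math.exp(load))       # empty-queue probability
print(mean_queue, load ** 2 / (2 * (1 - load)))   # mean queue length
```

The two printed pairs should agree closely, which supports reading eq.(4) as the M/D/1 queue length transform.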
As we saw in the above, the speedup of L = N achieves the highest possible maximum 
throughput of 1.0. However, it is very difficult and costly to build a very high speed switch 
of large size due to hardware limitations. Thus, the case of L < N becomes of practical 
importance, when N is large. When the speedup ratio L is less than N, if k (k > L) cells are
destined for the same output, k - L cells are blocked at inputs. Therefore, queueing occurs
not only on the output ports, but also on the input ports. (Binput and Boutput denote the
sizes of an input buffer and an output buffer, respectively.) See Fig.10 for a switch with both
input and output buffers. The performance of switches when the speedup ratio is less than 
N has been analyzed in [36, 24, 25, 8]. 
Yeh et al. [36] and Oie et al. [24, 25] analyzed the cell loss probability on input buffers,
assuming infinite capacity buffers on the outputs (Boutput = ∞). In [36], no buffering is
assumed to be on the inputs (Binput = 0), and in [25], infinite buffer capacity is assumed
on the inputs (Binput = ∞). Yeh et al. [36] obtained a cell loss probability at an input
port as follows:

    p(loss) = \begin{cases}
        \frac{1}{p} \sum_{k=L+1}^{N} (k - L) a_k & (N < \infty) \\
        \left(1 - \frac{L}{p}\right) \left(1 - \sum_{k=0}^{L} \frac{p^k}{k!} e^{-p}\right) + \frac{p^L}{L!} e^{-p} & (N = \infty).
    \end{cases}    (5)

Their analysis showed that a small speedup ratio can achieve a cell loss probability nearly 
equal to zero; that is, a speedup ratio of L = N is not required to achieve the very small 
cell loss probability. For example, a speedup of L = 8 is sufficient to achieve a
cell loss probability of less than 10^{-6}, when the input traffic load is 0.9 (p = 0.9) and N is
infinity.
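
The L = 8 example can be verified by evaluating eq.(5) directly. The sketch below reads the N = ∞ case as (1 - L/p)(1 - sum_{k=0}^{L} p^k e^{-p}/k!) + p^L e^{-p}/L!. For the finite case it assumes that a_k is the binomial probability that k of the N inputs send a cell to the same output, each with probability p/N; that definition is not stated in this excerpt and is labeled as an assumption. Function names are illustrative.

```python
import math


def loss_infinite_n(p, speedup):
    """Second case of eq.(5): cell loss probability for N = infinity."""
    poisson_cdf = math.exp(-p) * sum(p ** k / math.factorial(k)
                                     for k in range(speedup + 1))
    last_term = p ** speedup / math.factorial(speedup) * math.exp(-p)
    return (1.0 - speedup / p) * (1.0 - poisson_cdf) + last_term


def loss_finite_n(p, speedup, n_ports):
    """First case of eq.(5), with a_k assumed binomial(N, p/N)."""
    q = p / n_ports
    total = 0.0
    for k in range(speedup + 1, n_ports + 1):
        a_k = math.comb(n_ports, k) * q ** k * (1.0 - q) ** (n_ports - k)
        total += (k - speedup) * a_k
    return total / p


for speedup in (6, 7, 8, 9):
    print(speedup, loss_infinite_n(0.9, speedup), loss_finite_n(0.9, speedup, 64))
```

At p = 0.9 the loss probability for N = ∞ crosses below 10^{-6} between L = 7 and L = 8, in line with the statement above.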
Oie et al. [24, 25] analyzed a nonblocking switch, assuming N = ∞ and Binput =
Boutput = ∞, and obtained the maximum throughput as a function of the speedup ratio
L. Tab.3 shows the values of the maximum throughput for various values of L from [24].
They also obtained an upper bound on the cell loss probability by truncating the tail of the
queue size distribution for the case of Boutput = ∞. Fig.11 shows the upper bound on the
cell loss probability at the input buffers as a function of the buffer size for various values of 
L and p (input traffic load). By comparing their results with Yeh's [36], they showed that 

[12], and "window policy" in [15]. In [3], "bypass queueing discipline" is used to describe
the window selection discipline in the context of blocking switches. In this paper, we use 
"window selection discipline". 
Hui et al. [12] proposed a priority scheme to implement the window selection discipline
on the nonblocking switch with FIFO input buffers (i.e., the Batcher Banyan switch). Their 
priority scheme allows the first w cells in each input buffer to sequentially contend for the 
idle switch outputs at the beginning of each slot until a cell wins an output contention. 
Once a cell wins this output contention, it is given priority and no other cells will be 
assigned to the same output. 
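
The sequential contention described above can be sketched as a saturation simulation. This is an illustration, not the scheme of [12] in detail: inputs are assumed to be always backlogged, destinations are uniform, ties at an output are broken at random, and a departing cell is replaced by a fresh cell at the tail of the window. All names and parameter values are hypothetical.

```python
import random


def window_saturation_throughput(n_ports, window, n_slots=5000, seed=1):
    """Estimate the maximum throughput with window size w = window."""
    rng = random.Random(seed)
    # queues[i] holds the destinations of the first `window` cells at input i.
    queues = [[rng.randrange(n_ports) for _ in range(window)]
              for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        free_outputs = set(range(n_ports))
        waiting_inputs = set(range(n_ports))
        chosen = {}  # input -> window position of the cell it sends
        for pos in range(window):
            requests = {}
            for inp in waiting_inputs:
                out = queues[inp][pos]
                if out in free_outputs:
                    requests.setdefault(out, []).append(inp)
            for out, inputs in requests.items():
                winner = rng.choice(inputs)
                chosen[winner] = pos
                # The output is now taken and the input sends only one cell.
                free_outputs.discard(out)
                waiting_inputs.discard(winner)
        for inp, pos in chosen.items():
            del queues[inp][pos]
            queues[inp].append(rng.randrange(n_ports))
            delivered += 1
    return delivered / (n_ports * n_slots)


for w in (1, 2, 4, 8):
    print(w, window_saturation_throughput(16, w, n_slots=2000))
```

With w = 1 the sketch reduces to plain FIFO service with HOL blocking; larger windows should raise the estimated throughput while keeping it below 1.0.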
Oki Electric Industry Company [20] implements the window selection discipline on the
nonblocking switch with FIFO input buffers. The switch has a "Nemawashi" ("negotiation"
in Japanese) network followed by a nonblocking switch (i.e., the Batcher Omega switch).
The Nemawashi network chooses at most one cell from each input buffer in such a way that
the selected cells do not cause any output contention in the Batcher Omega switch. This
Nemawashi network can be implemented using the priority scheme proposed by Hui et
al. [12]. Masaki et al. [20] show simulation results on the throughput-average delay
performance of the Nemawashi switch. They assumed a 32 x 32 switch with the window
size of 7 (N = 32 and w = 7) and showed that the maximum throughput increases to
approximately 0.9 from 0.586 (the maximum throughput of the nonblocking switch with
FIFO input buffers).
Performance studies of the nonblocking switch with the window selection discipline are
found in [29] and [10]. The throughput-average delay performance of a binary switch
(N = 2) with the window selection discipline is obtained through an exact analysis in
[29]. In [10], Hluchyj et al. present simulation results on the maximum throughput of a
nonblocking switch assuming the window selection discipline. Tab.4 shows the maximum
throughput values of a nonblocking switch with the window selection discipline from [10].
This table shows that the window selection discipline is most effective when N is small
and w is large. For instance, the maximum throughput is 0.96, very close to 1.0, when
N = 2 and w = 8. Even when N is large, the window selection discipline achieves high
throughput. For instance, when N = 128 and w = 8, the maximum throughput is 0.88.
This throughput value is significantly higher than 0.586, the maximum throughput of the
nonblocking switch with FIFO discipline. (See subsection 4.3.) However, it should be
noted that this window selection discipline cannot achieve the throughput of 1.0, even
when N = ∞ and w = ∞ [10]. This is because the window selection discipline limits
each input to send at most one cell into the switch fabric per slot, and as a result, prevents
the maximum throughput from reaching 1.0.
Lastly, we note that parallel buffers at the inputs can implement the window selection 
discipline at the expense of additional control hardware. Fitzpatrick et al. [9] proposed
an N x N switch where each input port has N buffers in parallel (Fig.13). Buffer i at
an input port stores cells going to the output port i (1 ≤ i ≤ N). A controller selects at

analyze the performance of the Starlite switch with trap (i.e., the nonblocking switch with 
a shared buffer) is depicted in Fig.14. 
Eckberg et al. [7] studied the Starlite switch with trap and developed an approximate
analysis to obtain the cell loss probability at a shared buffer, assuming that the queue 
length in the shared buffer follows a Gamma distribution. They obtained the capacity of 
the shared buffer required to satisfy a given cell loss requirement as a function of N (the 
number of input ports) and p (input traffic load). Furthermore, it is shown that, as N 
approaches infinity, the value of B (capacity of the shared buffer per output) required to 
satisfy a cell loss requirement of practical interest approaches its lower bound

    f(p) = \frac{p^2}{2(1 - p)}.    (6)

As pointed out in [10], this lower bound is the same as the average queue length in the M/D/1
system.
Hluchyj et al., in their study of the performance of the Starlite switch with trap
[10], used the N-fold convolution of an M/D/1 queue length to approximate the steady
state distribution for the queue length of the shared buffer. They showed that the lower
bound f(p) on the buffer size required to satisfy a given cell loss requirement is given by
the average queue length in the M/D/1 system, confirming the results obtained in [7].
It is also shown that the Starlite switch with a large shared buffer attains the maximum 
throughput of one. 
As we saw in subsection 4.4, a nonblocking switch with the speedup ratio of N (referred
to as the output buffered switch in the following) can also achieve the throughput of one.
However, the buffer space required in the shared buffer switch to attain the throughput of
one is much less than that needed in an output buffered switch. For example, as we saw in
subsection 4.4, to satisfy a cell loss probability of less than 10^{-6} at the input traffic load
of p = 0.9 in the output buffered switch, it is required to have a buffer for 55 cells at each 
output port. On the other hand, for the shared buffer switch, eq.(6) gives the lower bound 
on B (the buffer capacity required per output) to satisfy the same cell loss requirement, and 
the value of B is 5 (cells). The shared buffer switch only needs enough buffer space for 5 
cells per output, as opposed to a buffer space for 55 cells per output in the output buffered 
switch. This decrease in the required buffer capacity is at the expense of an increase in the 
number of input and output ports internal to the switch. An N x N switch with a shared 
buffer of size B = 5 internally consists of a 6N x 6N switch; thus, the number of input
ports and output ports is six times that of an output buffered switch. This
increase in the number of input and output ports is one of the drawbacks of this switch. 
Another drawback of the shared buffer switch is that cells may be delivered out of sequence 
because newer cells may win an output contention [12]. 
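
The buffer sizing arithmetic above can be reproduced in a few lines. The sketch reads eq.(6) as f(p) = p^2 / (2(1 - p)), rounds the bound up to a whole number of cells (an assumption made here for illustration), and uses the (B + 1)N internal port count implied by the 6N x 6N figure for B = 5. The switch size N = 32 is a hypothetical value.

```python
import math


def shared_buffer_lower_bound(p):
    """Eq.(6): lower bound on the shared buffer capacity per output."""
    return p ** 2 / (2.0 * (1.0 - p))


def internal_switch_ports(n_ports, buffer_per_output):
    """Ports of the internal switch, following the (B + 1)N figure in the text."""
    return (buffer_per_output + 1) * n_ports


bound = shared_buffer_lower_bound(0.9)
cells_per_output = math.ceil(bound)  # rounding up is an assumption of this sketch
print(bound, cells_per_output, internal_switch_ports(32, cells_per_output))
```

At p = 0.9 the bound is about 4.05 cells, which rounds up to the 5 cells per output quoted above, compared with 55 cells per output for the output buffered switch.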

data traffic exist in the same input buffer, cells from the real time traffic are sent first. 
Among the cells at the same priority level, FIFO is assumed at an input buffer.
Chen et al. [4] assumed a nonblocking switch with the speedup ratio of 1 (L = 1) and
analyzed the performance of the switch assuming the priority selection policy described 
above. Uniform traffic is assumed in the analysis. They obtained the maximum
throughput, the cell loss probability and the average delay time. The maximum throughput is
given by

    S = \lambda_H + \lambda_{max}(\lambda_H)    (0 \le \lambda_H \le 0.586)    (7)

where λ_max(λ_H) is the maximum allowed arrival rate of the low priority cells for a given
value of λ_H. λ_max(λ_H) is given by the following:

    \lambda_{max}(\lambda_H) = \frac{(\lambda_H^2 - 6\lambda_H + 4) - \sqrt{-3\lambda_H^4 + 12\lambda_H^3 - 16\lambda_H + 8}}{2(1 - \lambda_H)}.    (8)

From eqs.(7) and (8), the maximum value of S is obtained as 0.6063 when λ_H is 0.447.
This throughput value is larger than 0.586, the maximum throughput of the nonblocking 
switch with FIFO input buffers when priority selection is not assumed. 
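
The maximum of S can be located numerically from eqs.(7) and (8). The sketch below rests on two assumptions about how those equations read: S is taken as the sum of the high priority rate and the maximum allowed low priority rate, and the powers in eq.(8) are taken as 2, 4 and 3. This reading is chosen because it reproduces the values quoted in the text, including the 0.586 limit when there is no high priority traffic. Function names and the grid step are illustrative.

```python
import math


def lambda_max(lam_h):
    """Eq.(8): maximum allowed low priority arrival rate (assumed exponents)."""
    radicand = -3 * lam_h ** 4 + 12 * lam_h ** 3 - 16 * lam_h + 8
    numerator = (lam_h ** 2 - 6 * lam_h + 4) - math.sqrt(radicand)
    return numerator / (2 * (1 - lam_h))


def max_throughput(lam_h):
    """Eq.(7): S as the sum of the high and low priority rates (assumed form)."""
    return lam_h + lambda_max(lam_h)


# Scan the high priority rate over 0 <= lam_h <= 0.5858 in steps of 0.0001.
grid = [i / 10000 for i in range(0, 5859)]
best = max(grid, key=max_throughput)
print(best, max_throughput(best))
print(lambda_max(0.0), 2 - math.sqrt(2))
```

The scan should place the maximum of S near 0.606 for a high priority rate in the neighborhood of 0.447, and lambda_max(0) should equal 2 - sqrt(2), the throughput of the switch without priorities.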
In [15], the longest queue selection policy was introduced. Under this policy, when 
output contention happens, a cell is taken from the queue which has the longest length, 
and sent to its destination output. Simulation results show that this policy offers smaller 
delay time than the random selection policy [15].
4.8 Parallel Switches 
In the previous subsections, we first observed that the maximum throughput of a nonblocking
switch is 0.586 if the speedup ratio of the switch fabric is 1 and cells are served on
a FIFO basis within each input buffer. We then discussed three major classes of techniques
to improve the performance of the nonblocking switch: speedup of the switch fabric,
techniques to reduce or eliminate the HOL blocking, and policies to select a cell from those
contending for the same output. As the speed of the switch fabric increases, so does the
throughput. By running the switch fabric N times faster than the input links, the maximum throughput
of one can be achieved. It is, however, difficult to implement a large size switch operating
at very high speeds due to the limitations of current hardware technology. The window 
selection discipline at input buffers reduces the HOL blocking and increases the throughput 
of the nonblocking switch up to 0.88 at the expense of cell scheduling overhead (hardware) 
on input buffers. Using a shared buffer eliminates the HOL blocking and achieves the 
maximum throughput of one, when the size of a shared buffer is infinitely large. However, 
as the size of the shared buffer grows, so does the size of the switch. This is because the
shared buffer switch is internally implemented using a (B + 1)N x (B + 1)N switch, where
B is the size of a shared buffer. Again, limitations of current hardware technology put a

In the switch architecture shown in Fig.16, it is assumed that each switch plane has
dedicated input buffers, and incoming cells are randomly assigned to one of the switch
planes. One of the disadvantages of this switch architecture is that cells may be delivered
out of sequence to output ports. This is because different cells from the same input stream
may be assigned to different switch planes. One possible approach to solving the out of
sequence problem is to provide common input buffers shared by all the switch planes, and
send cells in an input buffer on a FIFO basis (Fig.17). If a number of calls are
multiplexed onto one input port, it is also possible to assign not an individual cell but a
call to a switch plane. All the cells belonging to the same call then take the same switch plane,
and therefore, cells will be delivered in sequence to their destinations.
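The call-level assignment can be sketched as below. The text does not say how a call is mapped to a plane, so the modulo mapping, the integer call identifiers, and the function names are assumptions made only for illustration.

```python
def plane_for_call(call_id: int, num_planes: int) -> int:
    """Map a call to one switch plane; every cell of the call uses that plane.

    The modulo mapping is an assumption; any fixed per-call choice would do.
    """
    return call_id % num_planes

def dispatch(cells, num_planes):
    """Split a stream of (call_id, seq_no) cells into per-plane FIFO lists."""
    planes = [[] for _ in range(num_planes)]
    for call_id, seq_no in cells:
        planes[plane_for_call(call_id, num_planes)].append((call_id, seq_no))
    return planes

if __name__ == "__main__":
    # Two calls multiplexed onto one input port, two switch planes.
    stream = [(7, 0), (4, 0), (7, 1), (4, 1), (7, 2)]
    for idx, queue in enumerate(dispatch(stream, 2)):
        print(idx, queue)
```

Because the plane depends only on the call, the cells of one call never overtake each other, at the cost of a less even load across planes than per-cell random assignment would give.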
4.9 Related Research on Nonblocking Switches 
In this section, we consider three research topics not addressed so far in this paper: performance
analysis assuming non-uniform input traffic, nonblocking switches with parallel
input buffers and parallel service, and multicast switches.
4.9.1 Performance Analysis of Nonblocking Switches Under Non-Uniform 
Traffic 
Yoon et al. [37] analyzed the performance of the Knockout switch assuming the existence
of a hot spot. (A hot spot refers to an output port where a heavy concentration of cells is
expected to happen.) In their model, a fraction h of the incoming cells go to a hot spot,
and the rest of the cells are uniformly destined to the N output ports. With this hot spot
traffic model, the arrival rate of cells going to a hot spot becomes $hp + \frac{(1-h)p}{N}$ (cells per
slot, per input port). Thus, the probability that k cells arrive at the hot spot (from all
the input ports) is given by
$$P_k = \binom{N}{k}\left(hp + \frac{(1-h)p}{N}\right)^{k}\left(1 - hp - \frac{(1-h)p}{N}\right)^{N-k}. \qquad (9)$$
Using the above $P_k$, the cell loss probability is given by
$$p(\mathrm{loss}) = \frac{1}{p}\sum_{k=L+1}^{N}(k - L)P_k. \qquad (10)$$
From this equation, it can be shown that, in the limiting case of N = ∞, the speedup
ratio L has to be at least 20 to achieve a cell loss probability of less than 10^-6 when the
input traffic load p is 0.9 and h is 0.005, a fairly small value of h. On the other hand, as
we saw in subsection 4.4, if the traffic is uniform, L = 8 is sufficient to achieve
the same loss probability. We can see that the existence of a hot spot significantly reduces the
performance of a switch.
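Eqs.(9) and (10) can be evaluated directly for a finite switch size. The sketch below assumes that the per-input arrival rate to the hot spot is hp + (1 - h)p/N and that the loss is normalized by p as in eq.(10); the function and parameter names are illustrative. It does not attempt the N = ∞ limit quoted above.

```python
from math import comb

def hot_spot_arrival_pmf(n: int, p: float, h: float):
    """P_k of eq.(9): probability that k of the n inputs send a cell to the hot spot."""
    q = h * p + (1 - h) * p / n   # per-input probability of a cell for the hot spot
    return [comb(n, k) * q**k * (1 - q)**(n - k) for k in range(n + 1)]

def loss_probability(n: int, p: float, h: float, speedup: int) -> float:
    """p(loss) of eq.(10): mean number of cells beyond `speedup` per slot, divided by p."""
    pmf = hot_spot_arrival_pmf(n, p, h)
    excess = sum((k - speedup) * pmf[k] for k in range(speedup + 1, n + 1))
    return excess / p

if __name__ == "__main__":
    for h in (0.0, 0.005):
        for speedup in (8, 12):
            loss = loss_probability(64, 0.9, h, speedup)
            print(f"h = {h}, L = {speedup}, p(loss) = {loss:.3e}")
```

Setting h = 0 recovers the uniform traffic case, for which the sketch gives a loss probability below 10^-6 at L = 8 with N = 64 and p = 0.9, in line with the figure quoted from subsection 4.4.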

when more than B cells are destined for an output, there is no guarantee which cells are 
transmitted. This leads to an out of sequence problem. 
This switch architecture, used in conjunction with some of the following improvement
techniques, may result in a switch of more practical value. One may increase the switching
speed of the fabric. One may also increase the capacity of a parallel buffer so that the cells
which lose an output contention are stored and retransmitted later.
4.9.3 Multicast Switches 
A generally agreed upon feature of future high-performance networks is the ability to set 
up one-to-many or many-to-many connections for such applications as teleconferencing, 
commercial television, and multi-way telephone conversations. A key element of the de-
sign of such a system is the multicast switch module, which is responsible for duplicating 
incoming cells and forwarding them to every output port which belongs to the multipoint 
connection. 
As we saw in subsection 4.1.2, the Starlite switch has multicast capability. Cells are 
duplicated by a copy network placed between inputs and the Batcher Banyan switch and 
multicast to the destination output ports (Fig. 7). 
The Broadcast Packet Switch proposed by Turner [33, 3] is another example of a multicast
switch. Fig.19 shows the design of a Broadcast Packet Switch. This switch fabric
is composed of a series of major components: a Copy Network, Broadcast and Group Translators
(BGT), a Distribution Network, and a Routing Network. The Routing Network is
a self-routing, binary switching network (Banyan network) with buffers at each input port 
capable of holding two complete cells. Blocking on the Routing Network is reduced by 
the Distribution Network. The Distribution Network evenly distributes all cells it receives 
across all its outputs breaking up any "communities of interest" that may exist. The Copy 
Network and Broadcast and Group Translators are included to accommodate multi-point 
connections throughout the network. 
An alternative to Turner's copy network has been proposed by Lee [18]. Lee proposes a
non-blocking copy network consisting of a running adder network, a set of dummy address 
encoders, a concentrator network, and a broadcast Banyan network. 
4.10 Summary of the Performance of Nonblocking Switches 
Tab.6 summarizes the past research on the performance of the nonblocking switches. In
this table, switches are classified according to the following characteristics:
• the selection policy used in case of output contention to choose a cell from those
contending for the same output port (random, priority, or longest queue selection
policies),

5 Concluding Remarks 
In this paper, we have surveyed various switch architectures for ATM networks. The surveyed
switch architectures include both blocking and nonblocking switches. Techniques for improving
these switch architectures have also been discussed.
One of the areas that needs more research attention is the performance evaluation and 
comparison of switch architectures under integrated service environments. In such environ-
ments, different types of network traffic may co-exist in a switch, heavy concentration of 
traffic may occur, and the uniform traffic assumption may not hold any longer. This area 
of research is key to the successful application of ATM networks for integrated services. 

[14] Y-C. Jenq, "Performance Analysis of a Packet Switch Based on a Single-Buffered 
Banyan Network," IEEE J. Select. Areas Commun., vol.SAC-1, pp.1014-1021, Dec. 
1983. 
[15] M. J. Karol, M. G. Hluchyj, and S. P. Morgan, "Input versus Output Queueing on a
Space-Division Packet Switch," IEEE Trans. Commun., vol.COM-35, pp.1347-1356,
Dec. 1987.
[16] H. S. Kim and A. Leon-Garcia, "Performance of Buffered Banyan Networks under
Nonuniform Traffic Patterns," Proc. INFOCOM'88, pp.4A.4.1-4A.4.10, New Orleans,
March 1988.
[17] C. P. Kruskal and M. Snir, "The Performance of Multistage Interconnection Networks
for Multiprocessors," IEEE Trans. Comput., vol.32, pp.1091-1098, Dec. 1983.
[18] T. T. Lee, "Nonblocking Copy Network for Multicast Packet Switching," IEEE J. 
Select. Areas Commun., vol.6, pp.1455-1467, Dec. 1988. 
[19] S.-Q. Li and M. J. Lee, "A Study of Traffic Imbalances in a Fast Packet Switch," 
Proc. INFOCOM'89, pp.538-547, Ottawa, Apr. 1989. 
[20] T. Masaki, et al., "A Study on a Switch for High Speed Packet Switching," (in
Japanese) IECEJ, SE87-132, 1987.
[21] R. Melen and J. S. Turner, "Nonblocking Networks for Fast Packet Switching," Proc.
INFOCOM'89, pp.548-557, Ottawa, Apr. 1989.
[22] P. Newman, "A Fast Packet Switch for the Integrated Services Backbone Network,"
IEEE J. Select. Areas Commun., vol.6, pp.1468-1479, Dec. 1988.
[23] H. Ohara and T. Yasushi, "High Speed Transport Processor for Broad-Band Burst
Transport System," Proc. ICC'88, pp.29.5.1-29.5.6, Philadelphia, June 1988.
[24] Y. Oie, M. Murata, K. Kubota and H. Miyahara, "Effect of Speedup in Nonblocking
Packet Switch," Proc. ICC'89, Boston, June 1989.
[25] Y. Oie, M. Murata, K. Kubota and H. Miyahara, "Effect of Speedup in Nonblocking
Packet Switch," under preparation, 1989.
[26] J. H. Patel, "Performance of Processor-Memory Interconnections for Multiprocessors,"
IEEE Trans. Comput., vol.30, pp.771-780, Oct. 1981.
[27] G. M. Parulkar and J. S. Turner, "Towards a Framework for High Speed Communication
in a Heterogeneous Networking Environment," Proc. INFOCOM'89, pp.655-667,
Ottawa, Apr. 1989.

List of Figures and Tables 
Fig.1 8 x 8 Banyan switch with binary switching elements
Fig.2 Example of internal blocking in the Banyan switch 
Fig.3 8 x 8 buffered Banyan switch 
Fig.4 Knockout switch 
Fig.5 Bus Interface of the Knockout switch 
Fig.6 Batcher Banyan switch 
Fig.7 Starlite switch
Fig.8 Nonblocking switch with input buffers 
Tab.1 Maximum throughput of a nonblocking switch with FIFO input buffers
Tab.2 Past research on the effects of speedup on the performance of a nonblocking switch 
Fig.9 Nonblocking switch with output buffers 
Fig.10 Nonblocking switch with input and output buffers 
Tab.3 Maximum throughput of a nonblocking switch with input and output buffers
(N = ∞)
Fig.11 Upper bound on the cell loss probability at input buffers (N = ∞)
Fig.12 Cell loss probability at output buffers (N = ∞, p = 0.9)
Tab.4 Simulation results on the maximum throughput of a nonblocking switch with 
window selection discipline 
Fig.13 Nonblocking switch with N parallel buffers 
Fig.14 Nonblocking switch with a shared buffer 
Fig.15 4 x 4 Starlite switch with trap 
Fig.16 Parallel switch with two switch planes (dedicated input buffers) 
Tab.5 Input rate p* and the number of switch planes K* 

[Figure: input ports 0 to 7 on the left, output ports 0 to 7 on the right]
Fig.1 8 x 8 Banyan switch with binary switching elements
[Figure: input ports 0 to 7, output ports 0 to 7, with a conflict marked inside the switch]
Fig.2 Example of internal blocking in the Banyan switch

[Figure: inputs 1, 2, ..., N; cell filters; concentrator with outputs 1, 2, ..., L; shared buffer; output]
Fig.5 Bus Interface of the Knockout switch
[Figure: input ports; Batcher network; Banyan network; output ports]
Fig.6 Batcher Banyan switch

Analytic model for switch fabric           Performance measures obtained
L            N      Binput   Boutput
1            ∞      ∞        NQ            upper bound on p(loss) on inputs,
                                           average delay
             ≤ ∞    ∞        NQ            maximum throughput (see Tab.1)
N            ≤ ∞    NQ       ∞             average delay
             ≤ ∞    NQ       < ∞           p(loss) on outputs
1 < L < N    ≤ ∞    0        ∞             p(loss) on inputs
             ≤ ∞    0        < ∞           p(loss) on outputs
             ∞      ∞        ∞             maximum throughput (see Tab.3),
                                           upper bound on p(loss) on inputs
             ∞      ∞        < ∞           p(loss) on outputs
             ≤ ∞    1        ∞             p(loss) on inputs
References: Hui [12], Karol [15], Hluchyj [10], Yeh [36], Oie [24, 25], Eng [8]
NQ : Queueing does not occur.
< ∞ : finite value
≤ ∞ : both finite and infinite values
Tab.2 Past research on the effects of speedup
on the performance of a nonblocking switch
[Figure: input ports; N x N switch; output buffers; output ports]
Fig.9 Nonblocking switch with output buffers

[Figure: cell loss probability (log scale, 10^0 down to 10^-10) versus buffer size (0 to 18)]
Fig.11 Upper bound on the cell loss probability at input buffers (N = ∞)

        Window size w
N       1      2      3      4      5      6      7      8
2       0.75   0.84   0.89   0.92   0.93   0.94   0.95   0.96
4       0.66   0.76   0.81   0.85   0.87   0.89   0.91   0.92
32      0.59   0.70   0.76   0.80   0.83   0.85   0.87   0.88
128     0.59   0.70   0.76   0.80   0.83   0.85   0.86   0.88
Tab.4 Simulation results on the maximum throughput of
a nonblocking switch with window selection discipline
[Figure: input ports; N x N switch; parallel buffers]
Fig.13 Nonblocking switch with N parallel buffers
[Figure: N(B + 1) x N(B + 1) switch; shared buffer (NB); output ports]
Fig.14 Nonblocking switch with a shared buffer

        p*          K*
1       0.000002    450000
2       0.002451    369
3       0.029013    32
4       0.106534    9
5       0.243505    4
6       0.437178    3
7       0.681421    2
8       0.969856    1
Tab.5 Input rate p* and the number of switch planes K*
[Figure: input ports; input buffers; two switch planes; output buffers; output ports]
Fig.17 Parallel switch with two switch planes (common input buffers)

Contention Resolution    Buffering                      Size    Performance   Ref.
selection       speed    losers             winners
random          1        input (FIFO)       no queue    N       0.586         [15]
longest queue   1        input (FIFO)       no queue    N       > 0.586 *     [10]
priority        1        input (priority)   no queue    N       0.606         [4]
random          1        dropped            no queue    N       0.632         [26]
random          1        dropped            output      100N    0.85 **       [10]
random          1        input (window)     no queue    N       0.88          [12, 20, 10]
random          1        shared             no queue    6N      0             [7, 10]
random          3        input (FIFO)       output      N       0             [24]
priority        4        shared             output      2N      0             [8]
random          8        dropped            output      N       0             [36]
unnecessary     N        no losers          output      N       0             [15]
* : lower bound on the maximum throughput
** : This switch achieves a cell loss probability of less than 10^-3 at p = 0.85.
Tab.6 Performance of a nonblocking (single plane) switch
(large N and uniform traffic)
