High-Speed Switching by Engbersen, T
High-Speed Switching
Ton Engbersen
IBM Research Division, Zurich Research Laboratory, Rüschlikon, Switzerland.
Abstract
This lecture introduces switching by viewing the development of switching technology
from a historical perspective. Circuit switching, packet switching, and modern combinations
of these two basic technologies are presented. Extensive discussion is devoted to the
performance aspects of switches, and how to deploy switches in a network. The lecture also
includes an in-depth handling of ATM switch technology in part 2.
1 A historical perspective.
The first applications of switching were in the field of telephony. The increasing use of
telephones prompted the use of ‘exchange centers’, where connections with other telephones
were made by human operators. Actually, the word Operator was first used in this context and,
at least in the US, is still used today. The function of these manually operated switches is shown
logically in Figure 1, ‘Basic Circuit Switch Principle’. This figure also shows one of the main
functions of these early switches: to make connections between two subscriber lines, or, from
another perspective, to establish a copper circuit between two subscribers. In fact, for the
duration of their conversation, the two partners in a phone connection ‘own’ the piece of copper
connecting them. No one else can make use of that piece of copper. This method of switching
later became known as Circuit Switching. We call the center of these switches the ‘crossbar’.
This type of switching became established early in the history of the phone network and, for a
long time, it remained virtually unchanged: the Operator was a well-known part of the phone
network who made the connections between subscribers. Increasing traffic prompted the
development of more automated equipment, and this resulted in the circuit switch. As switching
needs grew, the demand for a formal background became evident and, in a landmark paper
published in 1953, Clos [1] described ways to extend switches to larger entities using identical
switch-building blocks.
Figure 1: Basic Circuit Switch Principle (block labels: CONTROL, SWITCH FABRIC)
2 Introduction of data communications.
During the 1960s, computer use was often organized in the form of large computer centers, and
access to these computers via remote terminals, using a new method called time-sharing,
became fashionable. Via telephone connections, remote terminals were connected to these
computing centers, and modems translated the digital information into analog signals for
transmission over the phone system. Although this system appeared to function very well, a
more detailed investigation of the actual mechanism reveals serious inefficiencies. Figure 2,
‘Analysis of Packets over a Circuit Switched Network’, shows what actually flows over each
connection in time. The low interaction rate of the terminal user(s) results in poor utilization
of the composite of connections, and the circuit characteristic of the connections does not allow
users to be multiplexed effectively over one line or circuit. A new technology was then
invented: Packet Switching. In packet switching, each piece of information is prefixed by a
header, which supplies information to the network concerning how to handle (i.e. route) the
subsequent data. This mode allows the efficient handling of typical bursty computer traffic over
one physical connection. However, it also creates new problems: can the use of the physical
medium be optimized by statistical multiplexing? If so, what happens when the traffic
temporarily exceeds the physical capacity of the circuit? A special role in this mode is played
by the switching technology: in circuit switching, the switch had to establish a physical
connection during the call-setup time, and maintain the connection until it was no longer
needed; in packet switching, the switch has to inspect every incoming packet header, interpret
the contents, and switch the remaining data, plus a potentially changed header, to the proper
switch output, all in the presence of statistical multiplexing.
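The per-packet work described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular switch: the label-based header, the layout of `forwarding_table`, and the names `Packet` and `switch_packet` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Packet:
    label: int      # header: tells the switch how to handle the payload
    payload: bytes  # data carried unchanged through the switch


# Hypothetical forwarding table:
# (input port, incoming label) -> (output port, outgoing label)
forwarding_table: Dict[Tuple[int, int], Tuple[int, int]] = {
    (0, 17): (2, 42),
    (1, 17): (3, 5),
}


def switch_packet(in_port: int, packet: Packet) -> Tuple[int, Packet]:
    """Inspect the header, rewrite it, and select the output port."""
    out_port, out_label = forwarding_table[(in_port, packet.label)]
    return out_port, Packet(label=out_label, payload=packet.payload)


out_port, out_packet = switch_packet(0, Packet(label=17, payload=b"hello"))
print(out_port, out_packet.label)
```

A circuit switch would do the table lookup once, at call setup; the packet switch repeats it for every packet, which is why the lookup has to keep pace with the traffic.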
3 Packet switching architectures.
Two tasks must be accomplished by a packet switch: the header must be interpreted/modified
and the packets must be routed to the output(s). The statistical nature of the traffic requires that
routing and header processing must be accomplished at the aggregate rate, i.e. the sum of the
line rates of all incoming communication lines. As long as this aggregate rate is low, a simple
processor/bus/memory system is sufficient: packets move from input line-adapters into main
memory, are accessed by the processor, inspected, and dispatched to output line-adapters. Note
that the packets pass the system bus twice, which limits the packet throughput. Direct input-adapter
to output-adapter(s) communication over the system bus removes this bottleneck. The
fact that all adapters on the system bus require interfaces that operate at the aggregate
throughput (which easily reaches several Gigabit/sec) means that these systems will not migrate
easily to higher speeds in the future: any extension of the system to accommodate more adapters
requires the redesign and replacement of all existing adapters.
Figure 2: Analysis of Packets over a Circuit Switched Network (labels: CIRCUIT SWITCHED DATA NETWORK; All connections have same fixed bandwidth)
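The throughput limit of the bus-based design described above can be made concrete with a little arithmetic. The sketch below uses made-up line rates; only the relationships (the aggregate rate is the sum of the line rates, and a path through main memory crosses the bus twice) come from the text. The function names are illustrative.

```python
def aggregate_rate(line_rates_mbps):
    """Aggregate rate = sum of the line rates of all incoming lines."""
    return sum(line_rates_mbps)


def required_bus_rate(line_rates_mbps, via_main_memory=True):
    """Bus bandwidth needed to sustain the aggregate rate.

    Via main memory every packet crosses the bus twice (adapter -> memory,
    memory -> adapter); direct adapter-to-adapter transfer crosses it once.
    """
    crossings = 2 if via_main_memory else 1
    return crossings * aggregate_rate(line_rates_mbps)


# Illustrative numbers only: 16 adapters at 155 Mbit/s each.
lines = [155] * 16
print(aggregate_rate(lines))                            # 2480
print(required_bus_rate(lines, via_main_memory=True))   # 4960
print(required_bus_rate(lines, via_main_memory=False))  # 2480
```

Even in the direct case, every adapter interface on the shared bus must run at the full 2480 Mbit/s in this example, although each line delivers only 155 Mbit/s. Adding adapters raises that figure for all of them.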
By making use of space-division mechanisms, systems can be built that support a much
higher aggregate throughput. These systems can also be extended to support higher aggregate
speeds without adapter redesign [2]. With regard to throughput, this case also requires that the
aggregate speed be supported under all conditions. Note that a simple crossbar switch does not
fulfill this requirement. Only when the arriving packets do not compete for identical outputs can
a crossbar sustain the aggregate rate. By adding appropriate input queuing, it is possible to alleviate
this problem to some extent, depending on the expected traffic characteristics. The proper
solution is to sort and queue the packets at every input according to destination, and to develop
a controller that dispatches the maximum amount of traffic at all times to the crossbar. This,
however, is a design that cannot easily be extended to larger switches: the controller complexity
grows as N*M (N = number of inputs, M = number of outputs).
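Why a bare crossbar cannot always sustain the aggregate rate can be seen by counting, for one time slot, how many of the packets waiting at the inputs can actually be delivered. The sketch below assumes fixed-size packets, at most one packet offered per input per slot, and at most one packet accepted per output per slot; the function names are illustrative and not taken from the lecture.

```python
def deliverable_in_one_slot(head_destinations):
    """Number of head-of-line packets a crossbar can pass in one slot.

    head_destinations[i] is the output requested by input i (None = idle).
    Each output accepts at most one packet per slot, so the count equals
    the number of distinct outputs requested.
    """
    return len({d for d in head_destinations if d is not None})


def per_destination_queue_count(n_inputs, m_outputs):
    """Queues needed when every input sorts its packets by destination."""
    return n_inputs * m_outputs


print(deliverable_in_one_slot([0, 1, 2, 3]))  # 4: no contention
print(deliverable_in_one_slot([2, 2, 2, 3]))  # 2: three inputs compete for output 2
print(per_destination_queue_count(16, 16))    # 256
```

The last line hints at the scaling problem: with per-destination queues at every input, the controller has N*M queues to consider in each slot.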
Another method is to provide queues at the outputs that can be written to at the aggregate
input rate. Performance evaluation shows that even a small amount of memory used as a
queue per output results in a significant increase of sustainable throughput, i.e. an output queue
memory is highly effective. This creates a completely new opportunity: to integrate a small
amount of queue memory together with appropriate control on a single chip. In fact, single-chip
packet switches become feasible when based on this output-buffered architecture.
One final observation should be made, however. In the output-buffered architecture
described here, there are M queues, each supporting the aggregate input rate. Hence the M
queues would support M times the aggregate input rate. This is a factor of M too much!
Realizing that the individual queues start to queue packets when more than one input is sending
a packet to the output to which the queue belongs, we can propose a solution. By sharing all the
queue memory among all outputs, a heavily loaded output can ‘borrow’ queue space from the
necessarily less heavily loaded output(s). Besides allowing an even more efficient use of the
memory available, a shared memory needs only to support exactly the required aggregate
throughput. Important operational and performance aspects of this shared-output buffered
architecture will be discussed below.
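The benefit of letting a busy output borrow queue space can be illustrated with a toy slot-by-slot model. This is a sketch under simplifying assumptions that are not from the lecture: fixed-size packets, buffer size counted in packets, arrivals stored before each output transmits one packet at the end of the slot, and a hypothetical function name `simulate`.

```python
def simulate(arrivals_per_slot, n_outputs, total_buffer, shared):
    """Return the number of packets dropped.

    arrivals_per_slot: list of lists; each inner list holds the destination
    output of every packet arriving in that slot.
    shared=True  -> one pool of total_buffer packet slots for all outputs.
    shared=False -> each output owns total_buffer // n_outputs slots.
    """
    queues = [0] * n_outputs  # packets waiting per output
    per_output_limit = total_buffer // n_outputs
    dropped = 0
    for arrivals in arrivals_per_slot:
        for dest in arrivals:
            if shared:
                has_room = sum(queues) < total_buffer
            else:
                has_room = queues[dest] < per_output_limit
            if has_room:
                queues[dest] += 1
            else:
                dropped += 1
        # Each output transmits at most one packet per slot.
        queues = [max(q - 1, 0) for q in queues]
    return dropped


# Burst: four inputs all send to output 0 for three slots; other outputs idle.
burst = [[0, 0, 0, 0]] * 3
print(simulate(burst, n_outputs=4, total_buffer=8, shared=False))  # 8
print(simulate(burst, n_outputs=4, total_buffer=8, shared=True))   # 2
```

With the same total memory, the partitioned buffers drop 8 of the 12 packets in this burst while the shared pool drops 2, because the idle outputs' space is available to output 0.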
4 Operating a shared-output buffered switch.
Figure 3, ‘Delay vs. Load characteristic of shared-output buffered switch’, shows one of
the most important parameters of a switch: because there is competition for outputs, which is
solved by queuing, the more packets there are being queued, the longer a single packet will take
to traverse the switch. As the load increases, there will be more competition, and thus longer
delays. At some point the internal queue storage is exhausted and packets are lost, or the
delay grows without bound. Disallowing loads that cause a very steep or long delay is tantamount
to limiting the throughput of the switch. The throughput at which these switches saturate
depends on the ratio of the mean available queue storage per output to the average
packet length. It should be noted that the effect of sharing the output buffer among several
outputs becomes noticeable only when there are at least 8 outputs that share. It is necessary to
always keep the operating point of the switch below the knee-point where the normalized delay
increases quickly to infinity with only a small increase in the offered load. Note that the actual
average packet size is very difficult to estimate, and often changes quickly. This means that the
actual saturation point changes dynamically, depending on the offered traffic characteristic. It
is therefore advisable to maintain a sufficient margin between the saturation point and the
operating point. In fact, in order to achieve lossless operation, a local flow-control mechanism
is needed that activates relatively low-cost queue storage on the input adapters when the switch
momentarily reaches saturation. Note that this mechanism is not suited to operate the switch for
longer periods of time at a point closer to the saturation point. 
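The shape of the delay versus load curve, and the reason for keeping a margin below the knee, can be illustrated with a textbook queueing formula. The sketch below is not the model behind Figure 3: it assumes an idealized single output queue with unlimited storage, fixed-size packets, and Poisson arrivals (an M/D/1 queue), for which the mean waiting time at load rho, in units of one packet transmission time, is rho / (2 * (1 - rho)). The function name is illustrative.

```python
def normalized_wait(load):
    """Mean M/D/1 waiting time, in packet transmission times, at a given load."""
    if not 0 <= load < 1:
        raise ValueError("load must satisfy 0 <= load < 1")
    return load / (2 * (1 - load))


for load in (0.2, 0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"load {load:4.2f} -> normalized wait {normalized_wait(load):6.2f}")
```

In this idealized model the wait is about 2 packet times at a load of 0.8, about 4.5 at 0.9, and about 49.5 at 0.99. A small increase in offered load near the knee therefore causes a large increase in delay, which is the behavior the operating margin is meant to avoid. A real switch with finite queue storage loses packets instead of queuing them indefinitely.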
5 References
[1] A Study of Non-Blocking Switching Networks, C. Clos, Bell System Technical Journal,
March 1953, pp. 406-424
[2] A Survey of Modern High-Performance Switching Techniques, H. Ahmadi and W. Denzel,
IEEE JSAC, Vol. 7, No. 7, Sept. 1989, pp. 1091-1103















Figure 3: Delay vs. Load characteristic of shared-output buffered switch (x-axis: Load, 0 to 0.8; y-axis: Normalized Delay)
