Abstract-Performance studies linking ATM switch capabilities to physical limitations imposed by integrated circuit technology have been scarce. This paper explores trends in circuit capabilities, and makes projections toward the 0.25-μm technologies that will be available to all switch designers in the year 2000. The limits imposed by circuit technology are applied to shared buffer ATM switches. We determine requirements and physical limits for buffer capacity, buffer throughput, chip I/O throughput, and power dissipation. As a result, we are able to project chip counts, aggregate switch throughputs, and switch dimensions. In addition, performance capabilities of single-chip shared buffer switches are estimated. A single-chip shared buffer switch implemented in 0.25-μm technology will be capable of an aggregate throughput of 1.3 Tb/s, will accomplish almost arbitrarily low cell loss rates for bursty traffic, and may be integrated together with translation tables supporting hundreds of connections per port.
I. INTRODUCTION
THE industry trend toward ever higher integration levels is leading inexorably toward systems-on-a-chip [1], and it is prudent to take a proactive view of this development. How would a terabit-per-second switch, with tens or hundreds of physical ports and millions of virtual connections, be built on a single chip? Would it be safe to assume that the surest route toward such a switch implementation would use a memory-intensive architecture, since memories continue to lead all other integrated circuits in terms of density?
This paper provides a theoretical and quantitative view of the performance limits of shared buffer switches and how these limits scale. This projection provides a view of the future of today's most promising switching node architecture, and demonstrates how novel architectural concepts may be advantageously used in a leading edge switch of the year 2000.
Although scores of studies have been undertaken and published on the topic of ATM switch performance analysis, these have almost exclusively been concerned with queueing parameters such as offered load, cell loss rate, and waiting time. Analyses of physical performance, including parameters such as raw throughput, area and power, have been scarce.
Ahmadi and Denzel [2] present a survey of emerging VLSI-intensive switching circuits, but their presentation is mostly qualitative and anecdotal. A few trends relating embedded RAM performance to switch performance are given in [3]. Coppo et al. present a comprehensive cost/performance analysis in [4], but their only physical cost factor is yield, and performance is in terms of blocking probability. Zegura's quantitative study [5] covers a large range of switch architectures, but deals only with chip counts (mostly based on heuristic cut sets). In [6], Lardy provides an analysis of basic physical trends in microelectronics, but applies these to ATM only through a few examples.
In this paper, as in [7], we present a quantitative analysis of shared buffer ATM switch performance, using minimum fabrication linewidth, switch dimension, and line rate as input parameters, and throughput, area, power, and chip count as the primary output parameters. We build on [2]-[6], and model our methodology (outlined in Fig. 1) after Bakoglu's comprehensive analysis of the physical factors that determine microprocessor design constraints and performance [8]. We limit our observations to single-stage switches, because we believe that these simple implementations will be able to handle most switching requirements in the near future; the migration to multistage networks is straightforward, and the results are still applicable.

Fig. 1 essentially provides the outline of this paper. We begin, in Section II, by deriving area trends, and using them to determine chip counts. Following that, in Section III, we investigate the factors limiting buffer throughput. Effects of pin speed limitations on throughput and chip count are the subject of Section IV, while power dissipation is explored in Section V. Section VI summarizes the way the capabilities of a single-chip switch scale with technology, and Section VII discusses additional features that may be added to that single-chip switch in 0.25-μm technology. Beyond the content of Fig. 1, Section VIII outlines topics for further investigation. Conclusions are presented in Section IX.
II. BUFFER CAPACITY
The majority of chip area for shared buffer switches is occupied by the buffer itself and its peripheral circuitry. This area constraint is the first that we investigate. Rather than using cell loss rate and offered load as output parameters, we set appropriate (strict) values of both as minimum requirements, and thus determine buffer needs. 1

Fig. 1. Analysis methodology for the performance study, relating minimum feature size, offered load, cell loss rate (R), and switch dimension (N).
A. Cell Storage Requirements
We consider two traffic types. Random traffic requirements, for the chosen loss rate and offered load, are taken from [9, Figs. 9, 12] and [10, Fig. 12]. The requirements for supporting bursty traffic (with Poisson arrivals and a fixed average burst length) are taken from [11, Fig. 3] and multiplied by the reduction ratio from [10, Fig. 12]. This is the same approximation technique employed in [12]. The resulting buffer requirements are summarized in Table I.
B. Area Calculations and Examples
These buffering requirements must be converted from cells to area. To that end, we survey five published implementations of shared buffer switches [13]-[17] to derive and check some heuristic area formulas. Area is divided into five components, given by (1)-(5):

(1) The area due to RAM storage of ATM cells. It depends on a dimensionless scaling factor for each design (derived from the RAM core cell area), the technology minimum linewidth, the switch dimension, the number of cells buffered per port (as in Table I), the ATM cell width in bits, and the number of chips (or bit slices) into which storage is divided.
(2) The area due to standard RAM periphery, such as decoders, sense amplifiers, and write drivers.
(3) The area due to serial-to-parallel and parallel-to-serial conversion circuitry at the input and output of the buffer.
(4) The area due to the read, write, and decoder peripheral circuitry associated with the serial-to-parallel and parallel-to-serial converters.
(5) The area due to control circuitry.

The control area in (5) uses three feature factors defined in (6): the first is 2 if multiple latency priority levels are supported, and 1 otherwise; the second is 1.5 if multicast connections are supported, and 1 otherwise; the third is 1.5 if a throttled-buffer architecture [18] is supported, and 1 otherwise.

The scaling factors and exponents in the above formulas were derived heuristically by observing the dependence between the various configuration parameters and the actual implemented areas, for each of the five area components, as measured from die photos included in the reference publications [13]-[17]. Where a dual-port RAM cell is used for buffer storage (the last example [17]), we derate the parameter by a factor of 1.7 in the affected area formulas, to compensate for the larger area of the cell.

1 Because we have taken this approach, our conclusions are strongly influenced by our assumptions about traffic profiles, in addition to our selection of minimum criteria. For the sake of tractability, only one specific bursty traffic profile has been considered in this paper. Buffer requirements will also vary depending on the selected buffer management policy.
The actual total areas of the five example chips were compared to the areas predicted by (1)-(6), and errors varied from 0.5% to 55% (the formulas consistently underestimate the actual area required). If we omit [13], the error spread is from 0.5% to 14%, and we consider the formulas validated for the purposes of this study. For all projections, we will use a scaling factor close to an "optimum" layout density and the "smallest" possible ATM cell, with no local information overhead. Hence, our predictions will tend to be aggressive, although they do not necessarily represent a true "upper bound" on performance.
C. Area Requirements and Chip Counts
A number of physical effects simultaneously impose limits on chip area: yield, optical registration, wafer costs, and others. It is beyond the scope of this paper to investigate individual effects; rather, we simply use the trend line presented in [1] for chip area versus minimum feature size, believing that it adequately reflects the cumulative effect of the various physical phenomena. Equation (7), derived from the graph by Komiya [1], gives chip area in millimeters squared as a function of minimum feature size in micrometers. We further assume that active area is limited to 0.75 of the total chip area, and the remaining area is occupied by bonding pads, data buses, and the power distribution grid. Table II shows the resulting area limits for the linewidths of interest.

Fig. 2 shows the result of applying the buffering requirements from Table I, through the area formulas (1)-(6), to the chip area bounds given by (7). Single-chip implementations are possible for many configurations under the random traffic assumption, but for relatively few configurations under our bursty traffic model. In fact, with this traffic, a sufficiently small linewidth is required to provide sufficient buffering for even the smallest switch. The figure makes it clear why random traffic assumptions were made in dimensioning buffers for early single-chip switches. Clearly, as switches are developed to handle more demanding traffic profiles associated with a variety of traffic types, large buffer capacities and accurate cell loss models will both be required.
III. BUFFER THROUGHPUT
The next limitation we wish to investigate is that caused by shared buffer throughput. This is commonly assumed to represent the bottleneck for a shared buffer switch. This section shows that it represents a very wide bottleneck for future technologies.
The two limiting trends are embedded memory operating frequency and memory word width. The former is given by (8), taken from the trend line in [3], with linewidth in micrometers and frequency in gigahertz, under the aggressive assumption that cycle time equals access time, and assuming only a single access port to the shared buffer (the trend is shown numerically in Table II).
This analysis assumes that on-chip dedicated processing for buffer management will not constitute the bottleneck. Elsewhere [19], we demonstrate buffer management circuitry in 0.8 μm BiCMOS and compare its operating frequency to the prediction of 77.6 MHz. Aggressive pipelining, and other design techniques borrowed from leading-edge microprocessors, must be brought to bear on this aspect of the design problem.
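As a rough illustration of how the frequency trend scales, the sketch below fits a power law through the two frequency values quoted in this paper: the 77.6 MHz prediction at 0.8 μm above, and the 444 MHz internal clock quoted in Section VII for 0.25 μm. This is only a two-point fit under the assumption that the trend is a pure power law; it is not a reproduction of (8), and the function name is hypothetical.

```python
import math

# (linewidth in micrometers, frequency in GHz) pairs quoted in the text.
POINTS = [(0.8, 0.0776), (0.25, 0.444)]

def fit_power_law(p1, p2):
    """Return (a, b) such that f = a * linewidth**b passes through both points."""
    (x1, y1), (x2, y2) = p1, p2
    b = math.log(y2 / y1) / math.log(x2 / x1)  # slope on a log-log plot
    a = y1 / x1 ** b                           # coefficient from the first point
    return a, b

a, b = fit_power_law(*POINTS)
print(f"f(GHz) ~= {a:.4f} * linewidth(um)^({b:.3f})")
```

Under this assumption the fitted exponent comes out very close to -1.5, so halving the linewidth raises the predicted clock rate by a factor of roughly 2.8.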
The width trend can be derived from a comparison of core cell size and chip size from (7), with the assumption that the physical memory word may span up to 0.5 of the width of the chip. The resulting memory word widths, in ATM cells, are also shown in Table II.
Writing and reading of multiple ATM cells in parallel (for memory words wider than one cell) may be achieved by: 1) dividing the shared buffer into multiple banks, each with separate access control, perhaps using an architecture similar to a shared multibuffer [20]; 2) increasing the minimum unit of information processed by the switch, essentially switching bursts of cells, similarly to the space switch in [21]; or 3) employing some as-yet-unknown mechanism.
The resulting bandwidth capabilities are shown in Fig. 3 (labeled "memory"). A single-chip shared buffer in 0.25 μm technology is capable of an aggregate throughput of 1.32 Tb/s. These impressive capabilities are an example of a general observation: massive bandwidths are available by tapping the data that are transferred in the columns of a memory, if these data need not be transferred on and off chip. This principle has been used to advantage in page-mode DRAMs, computational memories, and other devices.
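The relationship between clock rate, word width, and throughput can be checked with a short calculation. The sketch below assumes that one memory word is transferred per cycle, so that aggregate throughput equals clock frequency times word width, and that a cell is the standard 53-byte (424-bit) ATM cell. The 444 MHz clock is the value quoted in Section VII; the word width it implies is an inference, not a figure taken from Table II.

```python
ATM_CELL_BITS = 53 * 8  # standard 53-byte ATM cell, no local overhead

def implied_word_width_bits(throughput_bps, clock_hz):
    """Word width needed if exactly one word moves per clock cycle (assumption)."""
    return throughput_bps / clock_hz

width_bits = implied_word_width_bits(1.32e12, 444e6)
width_cells = width_bits / ATM_CELL_BITS
print(f"implied word width: {width_bits:.0f} bits, about {width_cells:.2f} cells")

# Throughput recovered from a whole number of cells per word.
whole_cells = round(width_cells)
recovered_bps = whole_cells * ATM_CELL_BITS * 444e6
print(f"{whole_cells} cells/word at 444 MHz gives {recovered_bps / 1e12:.2f} Tb/s")
```

Under these assumptions the quoted figures are mutually consistent with a memory word of about seven cells.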
IV. PIN SPEED AND I/O BANDWIDTH
In this section, we investigate inherent physical limits imposed by chip boundaries, by determining how quickly bits can be moved on and off chip. We will find that this is an important limiting parameter. We first derive the physical bounds, and then determine their impact on switch performance.
A. Pin Count Limits
Chip pin counts are limited by chip area and I/O pad pitch. An area trend was presented in (7). Pad pitches are dependent on bonding and interconnection technologies, rather than on-chip constraints. According to [22], a typical pad pitch for a chip to be mounted on a printed circuit board (PCB) is 200 μm, independent of the minimum feature size on the chip itself. The authors of [22] foresee an eventual decrease in pitch to 50 μm in response to advances in flip-chip, tape-automated bonding (TAB), and multichip module (MCM) technologies. Two-dimensional pad arrays will also be possible [22], built on top of all (or a portion) of the chip, rather than merely along its periphery.
We begin our analysis by assuming one-dimensional pad arrangements with a 200-μm pitch, typical of today's interconnection technology (we will relax these assumptions later). We also assume that 1/4 of all chip pins can be devoted to ATM cell input, 1/4 to output, and 1/2 to power, clocking, setup, control, monitoring, and testing. The effective bandwidth-limiting number of pins on a chip is thus given by (9), where chip area is in millimeters squared and is related to the minimum linewidth by (7).
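The pin-count reasoning above can be sketched numerically. The function below is an illustration under added assumptions, not a reproduction of (9): it assumes a square die with a single row of pads along all four edges, and it ignores corner effects. The function and parameter names are hypothetical. The 585 mm² area is the 0.25 μm die area quoted in Section VII and is used here only as a sample input.

```python
import math

def effective_pins(chip_area_mm2, pad_pitch_um, io_fraction=0.25):
    """Peripheral pads available for one direction of cell traffic.

    Assumes a square die with one row of pads on all four sides; the
    result is rounded down to a whole number of pads.
    """
    side_mm = math.sqrt(chip_area_mm2)
    total_pads = 4 * side_mm / (pad_pitch_um / 1000.0)
    return int(total_pads * io_fraction)

print(effective_pins(585, 200))   # 200 um pitch, 1/4 of pads for cell input
print(effective_pins(585, 50))    # the 50 um pitch foreseen in [22]
```

Because the pad count grows with the die perimeter, it scales with the square root of chip area and inversely with pad pitch, which is why pitch reductions matter more than die growth.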
B. Pin Speed Limits
This development closely follows that presented by Bakoglu [8]. Here, we assume that the switch is composed of two bit-sliced shared buffer chips, a control chip, and header-processing chips. This is meant to represent an average implementation: for most switches that require a significantly larger number of bit slices, wiring effects will be dominated by the large number of header-processing chips. For the purposes of this portion of our study, the number of chips in the switching system is therefore given by (10).

The average board-level interconnection length (in units of chip pitches) is given by (11) in terms of Rent's constant, estimated for this application. With estimates of the fan-out, number of board wiring layers, wiring efficiency, and wiring pitch, we can find the average chip footprint from (12), and from it the worst case board trace length (13). The capacitance associated with a minimum-sized transistor can be estimated (in femtofarads) from (14).

Given the results of (9)-(14), together with estimates of the channel resistance of a minimum-sized PMOS transistor in saturation (roughly independent of technology), pad capacitance, package output impedance, intrinsic wiring resistance and capacitance per unit length, and transport velocity, we can determine the worst case chip-to-chip delay (15). This delay depends on the number of stages required in the output pad driver, which is given by (16).
C. Throughput Limits
The information above can be combined to determine the number of pins per port (constrained to be an integer) and their operating frequency. Switch throughput per port can then be obtained. This throughput is plotted in Fig. 3(a), superimposed on the memory-limited throughput. Over part of the linewidth range in Fig. 3(a), the switch bottleneck is chip I/O, and not the shared buffer (as commonly believed). Thus, advances in packaging and module technology are required just to keep up with embedded memory capabilities. Effort should be devoted to widening this "new" bottleneck; a scenario for developments in this area is presented below.
We can now derive chip counts for various switch configurations, based on memory capacity, memory throughput, and I/O bandwidth. These results are plotted in Fig. 4 , for four switch dimensions and four different line speeds. Where there are two lines plotted for a single line speed, different limitations are imposed by the memory (denoted by the suffix "M") and the chip I/O (denoted by the suffix "X"). In both of these cases, the shared buffer function is bit sliced across multiple chips until the aggregate bandwidth can be satisfied by the number of memories and the number of pads, respectively. The higher of the two lines indicates the limiting case. In many cases, chip counts are limited by the number of bits that can be buffered per chip to satisfy cell loss requirements; these chip count limitations are labeled "random" and "bursty" on the graphs.
One could simply plot the limiting case, reducing Fig. 4 to four simple, single-curve graphs. This hides the underlying issues, however, and also hides the options available to the switch designer. The designer may choose to implement sufficient buffer capacity to support random traffic or bursty traffic (as we have defined it). Alternatively, a continuum of capacity choices is possible, some with good fits to throughput-limited chip counts, and these will result in intermediate levels of cell loss.
D. Improvement Techniques and Results
Although we have demonstrated that standard chip I/O technology cannot keep up with embedded memory speed, this bottleneck can also be widened by applying appropriate improvement techniques. In this section, we present a realistic scenario for introducing these techniques. This scenario is not meant to represent our opinion of the only valid sequence of events, but rather to demonstrate how improvement may be achieved. The order can certainly be rearranged somewhat, and some gradual trends are represented by discontinuities (note that these assumptions are reflected in Table II ).
The steps below are introduced at successively smaller linewidths.
• Decrease the pad pitch and board wiring pitch to 150 μm.
• Introduce BiCMOS pad drivers.
• Decrease the pad pitch and board wiring pitch to 100 μm.
• Introduce MCM technology, with parameters from [8], including a significant change in capacitance; the module wiring pitch is 50 μm with three wiring layers.
• Decrease the pad pitch to 50 μm and the module wiring pitch to 20 μm with four wiring layers.
• Introduce a two-dimensional grid of pads, built on top of the chip, perhaps in a fourth or fifth metal layer; the pad pitch is 100 μm, and the module wiring pitch is 10 μm with five wiring layers.

The net effect of these improvements is reflected in Fig. 3(b), again superimposed on memory limitations. In most cases, the I/O bandwidth is no longer the bottleneck, and as a result, we may neglect the corresponding chip limits ("X") in Fig. 4. Nonetheless, it seems that memory throughput is accelerating more steadily, and it appears doubtful that economically feasible means exist of keeping I/O as fast as memory. Fortunately, optical interconnect should solve this problem decisively. Emerging SiGe technology [23] can be integrated together with advanced silicon BiCMOS processes to support optical I/O. Transistors rated at 100 GHz have already been demonstrated. These ultrafast bipolar transistors can be used to implement not only the optical receivers and transmitters, but also the serial-to-parallel shift registers required to interface between the optical I/O and the shared buffers.
Since 1 Tb/s throughputs could be supported by 32 optical ports operating at 32 GHz, a shared buffer switch employing optical chip I/O would have lower pin counts than a corresponding implementation using standard I/O. In fact, this will likely be accompanied by a general trend toward a higher reliance on optical multiplexing and demultiplexing external to the switch. The resulting heavily multiplexed lines are processed by the switch chip as single entities (as in [24] ).
V. POWER DISSIPATION
In the previous two sections, we have demonstrated the capability of shared buffer switches to support exponential performance growth. Will an exponential increase in power dissipation accompany these performance gains? Or will power dissipation remain more or less constant? This is certainly an important question to consider, as it affects the physical realizability of the impressive circuits we have described.
According to Lardy [6], if operating parameters are scaled appropriately as the minimum linewidth decreases, power dissipation is roughly constant, independent of linewidth.
On the other hand, Komiya [1] shows that power dissipation is related to performance (expressed in MIPS or megahertz) according to (17) or, by cross-multiplying, (18). We have shown that performance improvements will be superlinear as the linewidth decreases linearly (see especially Fig. 3). This can be expressed as (19), which contains a constant to be determined. Hence, by substituting (19) into (18), we could expect (20). This implies that there is a possibility that power may increase without bound as the linewidth decreases.
Analysis is required to determine which view holds in our case. If (20) is true, we should also determine whether power dissipation can be expected to exceed reasonable values at the linewidths of interest. Total power is expressed in (21), and we will consider I/O power and internal dynamic power separately.
A. I/O Power
We will neglect input power, since I/O power is dominated by dissipation in output circuits (which must drive relatively large off-chip loads). The output power is expressed in (22) in terms of the output pin count, the load capacitance (which differs between a printed circuit board and an MCM), the pin data rate (transitions will occur at half this rate in the worst case), and the signal swing. Since we assume that the shared buffer will be the chip bottleneck, (23) relates the output pin count and pin data rate to TH, the aggregate buffer throughput. We set the signal swing to its worst case value, and set the supply voltage according to (24), which corresponds to the realistic power supply voltage trend from [6], shown in Table II. The resulting power dissipation (25) is also shown in Table II.

Thus, I/O power is not increasing beyond reasonable bounds. It is likely, however, that I/O power will increase with a shift to optical technologies: dissipation of 2 W per receiver/transmitter pair is estimated for the near future [25], resulting in I/O power dissipation of 64 W (high but manageable) for the 0.25 μm switch.
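The electrical output power described in this subsection can be illustrated with the standard dynamic-power estimate for a switched capacitive load. The sketch below is consistent with the quantities the text names (pin count, load capacitance, pin data rate, signal swing), but it is not a reproduction of (22). The pin count, load capacitance, and pin rate used in the example call are hypothetical values chosen only to exercise the function; the 1.8 V supply is the value quoted in Section VII.

```python
def output_power_w(pins, load_f, swing_v, supply_v, pin_rate_bps):
    """Worst-case dynamic power of output drivers, in watts.

    With alternating data, each pin charges its load once every two bit
    periods, so charging events occur at half the pin data rate.
    """
    charge_events_per_s = pin_rate_bps / 2.0
    return pins * load_f * swing_v * supply_v * charge_events_per_s

# Hypothetical example: 100 output pins, 10 pF loads, full-rail 1.8 V swing,
# 1 Gb/s per pin.
p = output_power_w(pins=100, load_f=10e-12, swing_v=1.8, supply_v=1.8,
                   pin_rate_bps=1e9)
print(f"{p:.2f} W")
```

Since power scales with the product of swing and supply voltage, the falling supply voltage trend is what keeps output power bounded as the aggregate pin bandwidth grows.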
B. Internal Dynamic Power
Our task now is to predict how the power consumption in the on-chip memories, parallel/serial circuits and control logic will vary with throughput. This is a particularly challenging assignment since there are many building blocks involved, and each could potentially have a different power-performance relationship. As well, different design styles are possible, and these will also affect power dissipation. Thus far in this paper, we have based our derivation of physical trends on theoretical and experimental work gathered from various "reliable" sources. Unfortunately, no accurate study exists of power dissipation in memory-intensive digital circuits, and it is beyond the scope of this paper to perform such a study. Therefore, we instead attempt to derive a trend based on measured results from our set of implemented switches.
If we can characterize internal dynamic power dissipation of the switch chip according to (18), we can determine a figure of merit for each switch implementation, given by (26). This value is calculated for each of our five sample switches, and the results are shown in Table III.
The figure of merit varies by a factor of almost 20, and is greatly dependent on layout style and circuit design priorities. Clearly, we cannot make accurate predictions based on this information. We can, however, check the bounds. For the 0.25 μm single-chip switch, the worst figure of merit in Table III gives an estimated power of 1700 W. At the other extreme, the best figure of merit gives 92 W. This suggests that the terabit single-chip switches we have projected should be realizable, but they will definitely be "hot chips." Successful implementations will require careful power management to achieve a low figure of merit, and to dissipate heat efficiently. We estimate total power dissipation, including optical I/O, for the 1.3 Tb/s 0.25 μm single-chip switch to be 156 W. An example of dealing with this order of power dissipation is the "thermosiphon" attached to a 115 W microprocessor in [26]. Supply current could be routed by a gold bus bar arrangement similar to that used in [26]. 5
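The power figures quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text: 32 optical ports (Section IV-D), 2 W per receiver/transmitter pair, and the 92 W and 1700 W bounds on internal dynamic power. Treating each optical port as one receiver/transmitter pair is an assumption made here for illustration.

```python
OPTICAL_PORTS = 32            # optical ports quoted in Section IV-D
WATTS_PER_PAIR = 2.0          # per receiver/transmitter pair, estimate from [25]
INTERNAL_LOW_W = 92.0         # lower bound on internal dynamic power
INTERNAL_HIGH_W = 1700.0      # upper bound on internal dynamic power

# Assumption: one receiver/transmitter pair per optical port.
optical_io_w = OPTICAL_PORTS * WATTS_PER_PAIR
total_low_w = optical_io_w + INTERNAL_LOW_W
bound_ratio = INTERNAL_HIGH_W / INTERNAL_LOW_W

print(f"optical I/O power: {optical_io_w:.0f} W")
print(f"total with lower-bound internal power: {total_low_w:.0f} W")
print(f"ratio of internal power bounds: {bound_ratio:.1f}")
```

The total matches the 156 W estimate, and the ratio of the two bounds matches the "factor of almost 20" spread in the figure of merit.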
VI. SINGLE-CHIP CAPABILITIES
We can now explore implementation possibilities for single-chip shared buffer switches. In Fig. 5, we show the maximum switch dimension supportable for both random and bursty traffic. The dimension is limited both by the memory throughput capability (as explained in Section III), and by storage requirements (as explained in Section II). We assume that no limits are imposed by the chip I/O. For bursty traffic, the limit is set by storage requirements for all but OC-192 line rates, while in the random traffic case, storage requirements only come into play over part of the linewidth range. Note that, for the defined OC line rates, aggregate throughput will fall short of the aggregate memory capability, due to quantization; raw memory throughput capability, as plotted in Fig. 3, is not quantized to specific line rates.

5 The intent of this paper is to explore the limits of feasibility, and not necessarily the best design practice. Should the power dissipation of a single-chip switch prove to be unmanageable or otherwise unjustifiable in light of other design alternatives, lower power multichip realizations would certainly be appropriate.
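The quantization effect noted above can be sketched as follows. Given an aggregate memory throughput and a SONET OC-n line rate (n times 51.84 Mb/s), the function returns the largest number of ports whose combined rate fits, and the largest power-of-two port count that fits. Restricting switch dimensions to powers of two is an assumption made here for illustration, and the function name is hypothetical.

```python
OC1_MBPS = 51.84  # SONET OC-1 line rate in Mb/s; OC-n is n times this rate

def max_ports(aggregate_bps, oc_level):
    """Return (largest port count, largest power-of-two port count)."""
    line_bps = oc_level * OC1_MBPS * 1e6
    n = int(aggregate_bps // line_bps)
    pow2 = 1 << (n.bit_length() - 1) if n > 0 else 0
    return n, pow2

for level in (12, 48, 192):
    n, pow2 = max_ports(1.32e12, level)
    print(f"OC-{level}: up to {n} ports, {pow2} as a power of two")
```

With the 1.32 Tb/s throughput of the 0.25 μm chip, the OC-192 case yields 128 ports as a power of two, which uses somewhat less than the raw memory throughput.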
Another useful figure of merit is the cell loss rate attainable by a single-chip switch. Under the bursty traffic assumptions made in Section II, these data are plotted in Fig. 6. As expected, the loss rate is unacceptably high at the larger linewidths, while it is quite tolerable at the smaller ones. As explained in Section II, this analysis is approximate, and only one set of traffic parameters is considered; nonetheless, the graph highlights trends, and confirms that arbitrarily low loss rates will be achievable at the smallest linewidths considered. A horizontal line drawn at the required cell loss rate yields the same chip dimension versus linewidth information as a horizontal line drawn at a chip count of 1 in Fig. 2(b).
VII. ATM NETWORK-ON-A-CHIP
In the year 2000, ATM switch designers will have access to 0.25 μm technology. Tens of megabits of embedded memory will fit on a single chip. The most powerful (and likely the most commercially successful) ATM switch will be the one that best exploits the tremendous possibilities offered by this technology. While we have not compared the shared buffer architecture with other switch fabric architectures, we have demonstrated the impressive capabilities of a shared buffer switch at 0.25 μm. These capabilities include an aggregate throughput of 1.32 Tb/s, with an internal clock speed of 444 MHz. The chip has an area of 585 mm², corresponding to a maximum buffer capacity of 26 Mb (approximately 61 000 cells). We estimate power dissipation to be 156 W at a 1.8 V supply; this is the sum of the predicted optical I/O power plus the lower bound internal dynamic power, as determined in Section V.
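The correspondence between the 26 Mb buffer capacity and the cell count can be verified with a one-line conversion. The sketch assumes the standard 424-bit (53-byte) ATM cell with no local overhead, and takes 1 Mb to mean 10^6 bits; both are assumptions made here, since the text does not state them alongside this figure.

```python
ATM_CELL_BITS = 53 * 8      # 53-byte ATM cell
BUFFER_BITS = 26 * 10**6    # 26 Mb, taking 1 Mb as 10^6 bits (assumption)

cells = BUFFER_BITS // ATM_CELL_BITS
print(f"{cells} cells fit in a 26 Mb buffer")
```

The result rounds to the approximately 61 000 cells quoted above.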
A. Additional Functionality
The switch architecture employed thus far has not implemented multicasting or multiple priority levels. These are important features for future networks, but require a control circuitry area overhead of approximately 200% (see Section II). This overhead is unacceptable at coarser linewidths, but is quite easy to accommodate at finer ones. We therefore add these features to our 0.25 μm switch.

It seems to be universally true for mature technologies that a user's peak demands can never be met. By employing a throttled-buffer architecture [18], we can allow the peak bandwidth on a switch port to approach the aggregate throughput capability of the entire switch. By extension, the per-user bandwidth may approach the same limit. This is an intelligent way of dealing with insatiable user demand. In the resulting architecture, however, shared buffer access is a blockable resource; this access must be controlled efficiently and intelligently for acceptable operation. Fortunately, the asynchronous nature of the switch enables a rich variety of service algorithms.
We estimate that support for a throttled buffer architecture results in a further (compounded) increase in control circuit complexity of 50%, and we include this function in our network-on-a-chip switch as well.
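The overhead percentages in this subsection follow from the feature factors listed in Section II (2 for multiple priority levels, 1.5 for multicast, 1.5 for a throttled buffer). The sketch below assumes these factors multiply, which is consistent with the 200% and compounded 50% figures quoted here; the function name is hypothetical.

```python
PRIORITY_FACTOR = 2.0    # multiple latency priority levels
MULTICAST_FACTOR = 1.5   # multicast connections
THROTTLED_FACTOR = 1.5   # throttled-buffer architecture [18]

def control_area_multiplier(priority, multicast, throttled):
    """Multiplier on baseline control area, assuming the factors compound."""
    m = 1.0
    if priority:
        m *= PRIORITY_FACTOR
    if multicast:
        m *= MULTICAST_FACTOR
    if throttled:
        m *= THROTTLED_FACTOR
    return m

base = control_area_multiplier(True, True, False)
full = control_area_multiplier(True, True, True)
print(f"priority + multicast overhead: {(base - 1) * 100:.0f}%")
print(f"further increase from throttled buffer: {(full / base - 1) * 100:.0f}%")
```

With all three features enabled, the control circuitry is 4.5 times its baseline area under this assumption.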
B. On-Chip Header Processors
Even with the above functions added to our switch, there is still a great deal of extra area available (the shared buffer to satisfy bursty traffic requirements occupies 156 mm², and the full-featured control circuitry occupies another 19 mm², on the 585 mm² die). We propose including the header processing circuitry on our chip; this is composed of a lookup table for header translation, as well as usage parameter control (UPC) storage and processing. Although the lookup function is ideally suited to content addressable memory (CAM), standard CAM architectures will not be able to support the required capacities. An integrated "preclassified" CAM/RAM [27] is capable of supporting multimegabit capacities with fully parallel search operation. In addition, all UPC information can be stored in the RAM portion, and all necessary UPC processing can be performed using full custom circuitry integrated with the custom memory [19].
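The area left over for header processing can be estimated from the figures above. The sketch assumes that the 0.75 active-area fraction from Section II-C applies to the 585 mm² die, and that the buffer and control areas quoted here are the only other consumers of active area; both are simplifying assumptions made for illustration.

```python
DIE_AREA_MM2 = 585.0
ACTIVE_FRACTION = 0.75     # active-area limit from Section II-C
BUFFER_AREA_MM2 = 156.0    # shared buffer sized for bursty traffic
CONTROL_AREA_MM2 = 19.0    # full-featured control circuitry

active_mm2 = DIE_AREA_MM2 * ACTIVE_FRACTION
used_mm2 = BUFFER_AREA_MM2 + CONTROL_AREA_MM2
spare_mm2 = active_mm2 - used_mm2

print(f"active area: {active_mm2:.2f} mm^2")
print(f"spare area for header processors: {spare_mm2:.2f} mm^2")
```

Under these assumptions, well over half of the active area remains available for the CAM/RAM header processors.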
Using information from [19], the area required for each CAM/RAM-based header processor may be estimated, assuming 320 b of storage per connection. Combining this with the required shared buffer storage for bursty traffic (burst length 10), we can determine the number of connections that can be supported by the header processors at each input port. Results are plotted in Fig. 7. We may conclude that a complete ATM switching network of any size up to 128 × 128, with any link data rate up to OC-192, will be implementable on a single chip within three years.
VIII. DIRECTIONS FOR FURTHER INVESTIGATION
There are numerous architectural alternatives for the implementation of an ATM switch, including variations on the shared buffer (shared multibuffer [20], searchable queue [28], throttled buffer [18]), extensions of the shared buffer (growable switch architecture [29], shared buffer direct access [30]), and space switches (crossbar, Batcher-banyan, and many others). A performance study similar to the one of this paper could be conducted for each of these types, and these could be combined into a comprehensive survey. Further, the accumulated information may be combined into an "expert system," which could be queried much like a simulator, allowing switch designers to: 1) select an optimum switch architecture, configuration, and physical partitioning for the task at hand and 2) estimate the performance of the final implementation, in terms of throughput, area, and power.
A similar study, with an even broader scope, could look at an entire ATM switching system (including interface devices, adaptation protocols, and the switching network), and enumerate bandwidth (or perhaps, similarly, area, power, or cell loss) limits throughout the system, looking for bottlenecks, or trouble spots. Further, the study could determine the appropriate technologies and resources that must be applied at these trouble spots. This appears to be a large modeling problem, where a systematic mini-max approach is required, along with accurate performance models.
The inevitable advances in implementation technology will not only have (perhaps drastic) effects on the performance of known architectures, but they may also allow the introduction of previously unknown architectures. Obviously, studies such as the present one must be updated as new technologies become either available or better understood. Beyond that (and potentially most interestingly), a better understanding of physical performance limits could point toward areas ripe for fundamental technological innovation.
IX. CONCLUSIONS
An awareness of the physical limits of ATM switch performance allows a switch designer to make intelligent choices in the selection of architectures for future switching networks. The analysis in this paper, along with the first published paper on the topic [7], relates physical independent variables (implementation technology, line speed, and switch dimension) to the physical performance of ATM switches.
This study has focused on shared buffer switches, and has demonstrated that they are well-suited to advanced circuit technologies. Embedded memory throughput only limits aggregate switch throughput when advanced chip I/O technologies are applied; otherwise, the bottleneck is at the chip boundary. A single-chip shared buffer switch in 0.25 μm technology will be able to support 128 OC-192 links, aggregate throughputs of 1.3 Tb/s, cell loss rates below the required threshold for bursty traffic, as well as multicast and multiple priority capability, and on-chip header and UPC processing.
