Introduction
Asynchronous Transfer Mode (ATM) is a circuit-based switching technology that meets the ever-increasing market demands for network bandwidth. ATM networking technology provides support for Broadband Integrated Services Digital Networks (B-ISDN), the high-speed transfer of voice, video and data over a single network. In addition, ATM technology is an excellent choice for network backbones that carry sizeable amounts of traffic. In the near future, we may expect to see ATM switching technologies deployed to provide local and wide-area internetworking.
While the cost of currently-available ATM networking solutions has hindered its adoption and deployment on a large scale, we believe that a low-cost solution and a number of recent market developments promise to create a sizeable demand for ATM switching technology.
We have developed an architecture for an IRAM [...]
†Cambridge University. Supported by a Marie Curie Research Training Grant under TMR activity 3.
‡Harvard University.
ATM switches in recent years have been moving out of research testbeds and into commercial use. The recent adoption of several ATM signalling standards by the ATM Forum increases the potential for interoperability of different brands of ATM switches with different capabilities and opens up the ATM switch market to competition. We believe that the large-scale deployment of fiber-based Wide Area Networks (WANs) by telcos and cable companies will lower both the cost of fiber-interface circuitry and the cost of fiber deployment. As the costs associated with installation of fiber networks decrease, we believe that technologies such as ATM and Gigabit Ethernet will become more viable Local Area Network (LAN) technologies. Additionally, as cable companies and telcos begin to deploy advanced, wide-area, high-bandwidth digital services over these new fiber networks, they will require advanced switching technology that is both small and inexpensive; we feel that an ATM network using our switch will provide that technology.
Architectural Overview
The switch we have designed is in most respects a 32x32 port fully-interconnected output-buffered switch capable of operating at OC-3 (155 Mb/sec). The switch has 35.5 Mbits of buffer space for cells (85 KCells of space), its buffer is organized in a pipelined fashion as proposed in [1] and used in [2], and cells queued for a single OC-3 output port can occupy as much as one half of the buffer at any given time. Thus, while the buffer is not fully shared, it mimics the behavior of a shared buffer in most traffic conditions.
Modifications to our core 32x32 architecture have been made to allow ports to be "bundled" and function together as a single port servicing a higher data rate.
For instance, our switch can be configured seamlessly to function like a fully-interconnected 8x8 OC-12 (622 Mb/sec) or 4x4 OC-24 (1.2 Gb/sec) switch. Any bundled port can have access to the entire memory space, and thus for any non-OC-3 port the cell memory is effectively fully shared. A detailed description of the bundling scheme can be found in [3]. A block diagram of our switching chip can be found in Figure 1.
0-7503-5682-9/99/$10.00 © 1999 IEEE
We must briefly mention the functionality of the existing support hardware. Each bi-directional port must be supplied with two distinct pieces of support hardware. One is an optical/electrical conversion chip, which handles all the light generation and detection as well as clock recovery. This chip, which is independent of cell framing, hands off the electrical data to a SONET-based cell detection processor (such as the one made by PMC-Sierra), which processes and frames each cell and performs some error detection. Of note is the fact that these chips will pad out the header to 6 bytes, with the sixth byte containing error detection information, and thus bring the total cell length to 54 bytes. For reasons that will become clear later, this extra byte of padding makes the control of our switch far simpler. Also, these support chips are bi-directional, so data flow in the opposite direction occurs at the output ports.
Getting Cells Onto and
Our ATM switch can be set up in several different configurations, and each configuration has a different number of "virtual ports" operating at a different data rate. Thus, the pin width for each "virtual port" is naturally dependent on the type of port being serviced. For an OC-24 data port (typically used in a 4x4 fashion), the pin width is 32 pins per port (or 64 pins per bi-directional port). If our chip is to be used as an 8x8 OC-12 switch, then the pin width is halved to 16 pins per port. Likewise, an OC-3 port will require 4 pins.
In all of these configurations the pins will be clocked at an external speed of roughly 40 MHz, a speed easily realizable on a PC board.
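The pin widths above can be sanity-checked against the line rates. The following is a minimal sketch, assuming one bit per pin per clock and the standard SONET rate of 51.84 Mb/sec per OC level; the function and constant names are illustrative and not part of the design.

```python
# Sketch: check that pin width x external clock covers each port's line rate.
# The 40 MHz clock and the pin widths come from the text; the OC-1 rate is the
# standard SONET figure. All names here are illustrative assumptions.

EXTERNAL_CLOCK_MHZ = 40      # "roughly 40 MHz" in the text
OC1_RATE_MBPS = 51.84        # OC-n runs at n x 51.84 Mb/sec

# port type -> (OC level, pins per unidirectional port)
PORT_CONFIGS = {
    "OC-3": (3, 4),
    "OC-12": (12, 16),
    "OC-24": (24, 32),
}


def pin_bandwidth_mbps(pins, clock_mhz=EXTERNAL_CLOCK_MHZ):
    """Raw bandwidth of a parallel bus moving one bit per pin per clock."""
    return pins * clock_mhz


def headroom_mbps(port_type):
    """Bus bandwidth minus line rate; positive means the bus keeps up."""
    level, pins = PORT_CONFIGS[port_type]
    return pin_bandwidth_mbps(pins) - level * OC1_RATE_MBPS


if __name__ == "__main__":
    for name in PORT_CONFIGS:
        print(name, round(headroom_mbps(name), 2), "Mb/sec of headroom")
```

Under these assumptions every configuration leaves a small positive margin, which is consistent with the choice of a clock of roughly 40 MHz.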
It is important to note that there is nothing preventing our switch from being set up in some "hybrid" configuration.
[...] conversion circuitry on-chip so that it was not necessary to worry about performing the light generation and detection on-chip. Also, the serial-to-parallel framing circuitry must run with an extremely fast clock, so pushing it off-chip simplifies our design significantly, and does not significantly affect the complexity of an entire switch built with our chip at its core. Also, these chips are readily available on the market.
Cell Headers: Routing Lookup, Handling, and Storage
Let's now examine how the headers are stored and updated. An ATM cell header is 5 bytes (40 bits) long, and contains exactly one 24-bit field (bits 4-27) which must be updated by the switch (the others can be left untouched). This is the virtual circuit/path identifier (VCI/VPI). The switch must maintain state that maps an input pair (input_port, input_VCI/VPI) to an output pair (output_port, output_VCI/VPI). The input_port is known (since the switch knows on which bus the cell arrived), and the input_VCI/VPI can be determined from the cell header. The fields of the (output_port, output_VCI/VPI) pair must be computed for each arriving cell, as the output_port field determines the output port for which the cell is destined, and the output_VCI/VPI must be written into the cell's header before it is sent out. The translations between (input_port, input_VCI/VPI) and (output_port, output_VCI/VPI) are initialized externally by the routing control unit. The switch maintains 32 lookup tables, one per input port, in its internal DRAM; these tables are used to map the input_VCI/VPI to an output port and output VCI/VPI (since each port has its own table, the input port need not be specified in the mapping).
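The header update described above amounts to reading and overwriting one 24-bit field. The sketch below illustrates this in software, assuming bits are numbered from 0 at the most significant bit of the 5-byte header; the function names are hypothetical, and the switch itself performs this step in hardware.

```python
# Sketch: read and rewrite the 24-bit VCI/VPI field (bits 4-27) of a 5-byte
# ATM cell header. Assumption: bit 0 is the most significant bit of the header.

HEADER_BYTES = 5
HEADER_BITS = HEADER_BYTES * 8                          # 40
FIELD_START = 4                                         # first bit of the field
FIELD_WIDTH = 24                                        # bits 4..27 inclusive
FIELD_SHIFT = HEADER_BITS - FIELD_START - FIELD_WIDTH   # 12 bits follow the field
FIELD_MASK = ((1 << FIELD_WIDTH) - 1) << FIELD_SHIFT


def get_vci_vpi(header: bytes) -> int:
    """Return the 24-bit identifier stored in bits 4-27 of the header."""
    if len(header) != HEADER_BYTES:
        raise ValueError("expected a 5-byte header")
    return (int.from_bytes(header, "big") & FIELD_MASK) >> FIELD_SHIFT


def set_vci_vpi(header: bytes, new_id: int) -> bytes:
    """Return a copy of the header with bits 4-27 replaced by new_id."""
    if len(header) != HEADER_BYTES:
        raise ValueError("expected a 5-byte header")
    if not 0 <= new_id < (1 << FIELD_WIDTH):
        raise ValueError("identifier must fit in 24 bits")
    value = int.from_bytes(header, "big")
    value = (value & ~FIELD_MASK) | (new_id << FIELD_SHIFT)
    return value.to_bytes(HEADER_BYTES, "big")


if __name__ == "__main__":
    incoming = bytes([0xA1, 0x23, 0x45, 0x6B, 0xCD])
    print(hex(get_vci_vpi(incoming)))
    print(set_vci_vpi(incoming, 0xFFFFFF).hex())
```

All bits outside the field pass through unchanged, matching the statement that the other header fields can be left untouched.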
We would like to be able to support many virtual circuits through the switch, ideally as many as 256K (or 8K circuits per OC-3 port). However, there are 2^24, or 16M, possible VCI/VPI identifiers. Keeping a table of 2^24 entries per port is prohibitive in terms of memory used. Even using an 8K-entry open-hash table is not ideal, as the worst-case probe time for such a table is O(8K), which would severely impact the latency and synchronization of the switch. Luckily, since the problem only affects the input VCI/VPI (since we can easily store entire 24-bit output VCI/VPIs), we can merely specify a certain 13-bit subset of the VCI/VPI field which will be used by our switch (and only use those VCI/VPIs when establishing VCs or VPs). Then we need only address an 8K-entry linear table of (output_port, output_VCI/VPI) entries with these 13 bits in order to determine the destination and output header for each incoming packet. For a 32-port switch, each entry in this table takes 29 bits (24 for the output VCI/VPI and 5 bits to select from 32 output ports), so each port's table takes up 232 kbits, and the space overhead of the 32 routing tables is 7.2 Mbits.
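The sizing arithmetic and the direct-indexed lookup can be sketched as follows. An 8K-entry table needs a 13-bit index; which bits of the identifier form that index is not stated in the text, so the low-order bits are used here purely as an assumption. The class and function names are illustrative.

```python
# Sketch: per-port translation table sizing and a direct-indexed lookup.
# Assumption: the index is taken from the low-order bits of the input VCI/VPI.

PORTS = 32
ENTRIES_PER_PORT = 8 * 1024                        # 8K circuits per port
INDEX_BITS = ENTRIES_PER_PORT.bit_length() - 1     # 13 bits address 8K entries
OUTPUT_ID_BITS = 24                                # full output VCI/VPI
PORT_BITS = (PORTS - 1).bit_length()               # 5 bits select 1 of 32 ports
ENTRY_BITS = OUTPUT_ID_BITS + PORT_BITS            # 29 bits per entry


def table_kbits():
    """Size of one port's table in kbits (1 kbit = 1024 bits)."""
    return ENTRIES_PER_PORT * ENTRY_BITS // 1024


def total_mbits():
    """Size of all routing tables in Mbits (1 Mbit = 1024 * 1024 bits)."""
    return PORTS * ENTRIES_PER_PORT * ENTRY_BITS / (1024 * 1024)


class PortTable:
    """One input port's linear table: index -> (output_port, output_id)."""

    def __init__(self):
        self.entries = [None] * ENTRIES_PER_PORT

    @staticmethod
    def index_of(input_id):
        # Hypothetical choice of index bits: the low-order INDEX_BITS bits.
        return input_id & (ENTRIES_PER_PORT - 1)

    def install(self, input_id, output_port, output_id):
        self.entries[self.index_of(input_id)] = (output_port, output_id)

    def lookup(self, input_id):
        # Constant-time access: no probing, unlike a hash table.
        return self.entries[self.index_of(input_id)]


if __name__ == "__main__":
    print(ENTRY_BITS, "bits per entry")
    print(table_kbits(), "kbits per port")
    print(total_mbits(), "Mbits in total")
```

The lookup is a single indexed read, which is the property that motivates restricting the identifiers in use rather than hashing the full 24-bit field.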
Because we only store cell bodies in the main memory bank, and because the cell header arrives in the switch in its own set of eight 6-bit words, we can perform the task of writing the cell body to memory in parallel with performing the header lookup and storage. Furthermore, the cell headers will be stored in an 8-stage pipelined memory in order to maximize data throughput.
Each output port has associated with it a FIFO queue. Each entry in the queue is a 17-bit word containing the address of the cell in the main memory bank and the header in the header memory (they share the same 17-bit address). These queues are filled as follows. When a cell header arrives at an input port, the cell is assigned an address from the free address FIFO. The header is then sent to the look-up table for its input port, where it is latched into a register. The address assigned to the cell is also latched by a register associated with the look-up table, and is also sent to the pipelined memory. The table lookup and translation has been completed by the time the cell payload arrives at the switch, and thus the payload and header are written to their respective pipelined memories during the same clock cycles. Meanwhile, the address of the header and payload in memory is forwarded to the appropriate output port and stored in that port's output address queue.
Once an output port sees that its queue is nonempty, it uses the stored address to access the pipelined memory banks and to stream the cell header and body that it retrieves out to the external data bus to be sent on the fiber. It continues doing this until the queue is empty again, at which point it waits for more packets. Thus it is the output port that drives its own transmission, and which retrieves the data from memory, so there should be no unnecessary output blocking in our switch design. There is some centralized control that synchronizes and orders the ports' accesses to memory so that contention is not an issue.
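The address flow through the free address FIFO and the per-port output queues can be modelled in a few lines. This is a behavioural sketch only, not the hardware design: it assumes the table lookup has already produced the output port and translated header, it assumes an address returns to the free FIFO once its cell has been sent, and the class and method names are hypothetical.

```python
# Sketch: behavioural model of the free address FIFO and output address queues.
# Assumptions: lookup is already done; addresses are recycled after sending.
from collections import deque


class SwitchModel:
    def __init__(self, num_ports=32, num_addresses=16):
        self.free_addresses = deque(range(num_addresses))
        self.header_mem = {}     # address -> translated header
        self.payload_mem = {}    # address -> cell body (same address)
        self.output_queues = [deque() for _ in range(num_ports)]

    def accept_cell(self, output_port, new_header, payload):
        """Input side: store a cell and queue its address at the output port."""
        if not self.free_addresses:
            return None          # buffer full; overflow policy is not modelled
        address = self.free_addresses.popleft()
        self.header_mem[address] = new_header
        self.payload_mem[address] = payload
        self.output_queues[output_port].append(address)
        return address

    def drain_port(self, output_port):
        """Output side: send queued cells until the port's queue is empty."""
        sent = []
        queue = self.output_queues[output_port]
        while queue:
            address = queue.popleft()
            header = self.header_mem.pop(address)
            payload = self.payload_mem.pop(address)
            sent.append((header, payload))
            self.free_addresses.append(address)   # assumed recycling step
        return sent


if __name__ == "__main__":
    switch = SwitchModel(num_ports=4, num_addresses=2)
    switch.accept_cell(1, "header-0", "payload-0")
    switch.accept_cell(1, "header-1", "payload-1")
    print(switch.drain_port(1))
```

Because each queue holds only addresses, the header and payload memories can be shared by all ports while each output port still drains its own cells in arrival order.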
Memory System: Size and Refresh
It was decided that an ATM switch was well suited [...] from ATM vendors. In addition, the widespread deployment of high-speed WANs by telecommunications companies will increase demand for low-cost, high-bandwidth networking solutions. We believe that our switching core is ideally suited to such applications because of its large feature set, its flexibility, and its small size. Not only is our design well-suited to WAN networking solutions, it is also appropriate for high-speed LANs. While the costs associated with adoption of high-speed ATM networking have been prohibitively high, we believe that large-scale deployment of such networks will provide an economy of scale. LAN switches based upon our integrated switching fabric will be well-positioned to capture sizeable market share. Furthermore, through the use of appropriate "filter" circuitry, we believe that our switching fabric can be used to support network technologies other than ATM (Fast Ethernet or Gigabit Ethernet, for example). In conclusion, we believe that this design for an IRAM-based network switching fabric provides a high-performance, low-cost product that is well-positioned to provide significant profitability in the high-bandwidth networking market.
