Abstract
1.-Introduction to the ATM switch
The switching function is performed using self-routing spatial techniques [1, 2], taking advantage of the statistical gain effect [3]. Its maximum operating rate is 2.5 Gb/s. The switch core is implemented with a single IC, the ICM2 [4]. A switching fabric is constructed by interconnecting several ICM2 chips, so the architecture is scalable. Figure 1 shows the architecture of a 4x4 high-speed 2.5 Gb/s switch. Each CM and MC block in the figure is implemented with one CMC circuit. At the input of the switch, ATM cells are converted to an internal parallel format, microcells (CM outputs), while at the output of the ICM2 ICs the routed microcells are reconverted to ATM cells (MCs). It can easily be deduced that cell parallelism is used to increase the throughput [5]. The CCITT recommended ATM cell size is 424 bits. It was decided to handle a 512-bit cell inside the switch to increase flexibility. In fact, two different degrees of parallelism are implemented: 32 and 64 bits. Internally, the switch handles 4-bit-wide data buses. The different throughput values of the switch can be seen on the 
3.-Output processing and ATM cell reassembling: The MC function
The MC receives 8 or 16 microcell flows generated by the ICM2 switch and reassembles them into 8-bit parallel ATM cells at 311 MHz (exactly the same format as the one fed to the input of the switch in Figure 1). The microcell flow is first synchronized in the MC; the resulting parallel cell is then resized and finally serialized to 8 bits at the output. Apart from this basic behavior, functions very similar to those of the CM are performed: cells are identified (VPI/VCI), ATM headers can be replaced by ones stored in an external memory, previously reserved cells are extracted via the microprocessor interface, and statistical measurements are obtained.
4.-The CMC architecture
CMC stands for Cell to Microcell/Microcell to Cell converter. Its architecture is shown in Figure 2. The circuit can be divided into three main blocks: the CM block, the MC block, and the microprocessor interface, which is shared by the two other blocks and handles communication with an external microprocessor. 
Figure 3. ECL Cell Integrity Controller
To be able to work at 311 MHz, the maximum logic depth permitted in ECL was three gate levels. The ATM cell is parallelized and resized to a 64-bit/64-byte cell data flow in two steps (see Figure 4): first with ECL cells and then with CMOS flip-flops to reduce power dissipation.
Figure 4. Parallel data converter
The parallel data is written to a FIFO to resize the ATM cell from 53 to 64 bytes. The decimation operation above implies that a new data clock can be used, whose frequency is ckin/6 = 51.86 MHz. This frequency was chosen to avoid a FIFO overflow. Data interleaving was needed because of the operating requirements of the FIFO (44 MHz cycle time). One logical FIFO is therefore implemented with two physical ones (64 bits x 32 words each). The effective reading rate is: ERR = cksys * (cell size)/(parallelism degree).
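The need for interleaving follows from the figures above: a 51.86 MHz data clock exceeds the 44 MHz that a single FIFO can sustain, so two banks accessed alternately each run at half the logical rate. The following Python sketch is a behavioral illustration only. It assumes that the 44 MHz figure is the maximum access rate of one physical FIFO; the class and function names are hypothetical and are not taken from the CMC design.

```python
from collections import deque

BANK_WORDS = 32          # each physical FIFO holds 32 words of 64 bits (from the text)
LOGICAL_RATE_MHZ = 51.86  # data clock ckin/6 quoted in the text
BANK_LIMIT_MHZ = 44.0     # assumption: maximum access rate of one physical FIFO


class InterleavedFifo:
    """One logical FIFO built from two physical banks accessed alternately."""

    def __init__(self):
        self.banks = (deque(), deque())
        self.wr_sel = 0  # bank that receives the next write
        self.rd_sel = 0  # bank that supplies the next read

    def write(self, word):
        """Store one word; return False when the logical FIFO is full."""
        bank = self.banks[self.wr_sel]
        if len(bank) >= BANK_WORDS:
            return False
        bank.append(word)
        self.wr_sel ^= 1  # the next write goes to the other bank
        return True

    def read(self):
        """Return the oldest word, or None when the logical FIFO is empty."""
        bank = self.banks[self.rd_sel]
        if not bank:
            return None
        self.rd_sel ^= 1  # the next read comes from the other bank
        return bank.popleft()


def per_bank_rate_mhz(logical_rate_mhz, banks=2):
    """Access rate seen by each bank when accesses alternate between banks."""
    return logical_rate_mhz / banks
```

With two banks, each physical FIFO is accessed at about 25.93 MHz, which is below the assumed 44 MHz limit, while the logical FIFO still preserves word order.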
The FIFO is able to store up to 4 complete cells. To ensure data integrity, empty cells are inserted into the flow by the read-side circuit of the FIFO. 
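The empty-cell insertion can be pictured as a read side that never stalls: when no complete cell is stored, it emits an empty cell instead. The Python sketch below models this at cell granularity. The all-zero encoding of an empty cell and the names used are assumptions made for illustration; the text does not specify how empty cells are marked.

```python
from collections import deque

CELL_BYTES = 64                 # internal cell size given in the text
FIFO_CAPACITY_CELLS = 4         # the FIFO stores up to 4 complete cells
EMPTY_CELL = bytes(CELL_BYTES)  # assumption: an all-zero cell stands for "empty"


class CellFifo:
    """Cell-granular FIFO whose read side always delivers a cell."""

    def __init__(self, capacity=FIFO_CAPACITY_CELLS):
        self.capacity = capacity
        self.cells = deque()

    def write(self, cell):
        """Store one complete cell; return False if the FIFO is full."""
        if len(cell) != CELL_BYTES:
            raise ValueError("a cell must be exactly 64 bytes")
        if len(self.cells) >= self.capacity:
            return False
        self.cells.append(cell)
        return True

    def read(self):
        """Return the oldest stored cell, or an empty cell if none is stored."""
        if self.cells:
            return self.cells.popleft()
        return EMPTY_CELL
```

Because the reader always produces a cell, the downstream blocks see a continuous cell flow even when the writer is momentarily behind.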
CM/MC internal cell Format
Finally, the microcell generation block is formed by:
1. A synchronization FIFO (two interleaved FIFOs of 64 bits x 32 words), since the microcell flow at the output of the IC has a data rate of 65 MHz to achieve a 2.5 Gb/s throughput.
2. A labeling information processing block, which inserts parity (PAR), a sequence number (SN) and the value (IN) of a microprocessor interface register (see Figure 6).
3. A microcell generator, which transforms the cell [Figure 6] into 16 or 8 microcell flows depending on the microcell format (two formats are available: 4 bits wide and 12 or 20 bits long). This block also periodically inserts a synchronization microcell to build a framed structure of microcells at the outputs of the CMC operating as a CM (16 x 4-bit output data buses).
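The slicing of one internal cell into parallel 4-bit flows, and the inverse operation performed in the MC, can be sketched in Python as follows. This is an illustration under stated assumptions: the nibble order within a byte and the round-robin assignment of nibbles to flows are assumed, the function names are hypothetical, and the routing tag, labeling information and synchronization microcells are left out.

```python
CELL_BITS = 512  # internal cell size from the text (64 bytes)
BUS_WIDTH = 4    # each microcell flow is carried on a 4-bit bus


def cell_to_flows(cell, num_flows):
    """Split a 64-byte cell into num_flows lists of 4-bit values."""
    if len(cell) * 8 != CELL_BITS:
        raise ValueError("expected a 64-byte internal cell")
    if num_flows not in (8, 16):
        raise ValueError("the text describes 8 or 16 microcell flows")
    nibbles = []
    for byte in cell:
        nibbles.append(byte >> 4)   # assumption: high nibble first
        nibbles.append(byte & 0xF)
    # Assumption: nibble i is sent on flow i mod num_flows.
    return [nibbles[flow::num_flows] for flow in range(num_flows)]


def flows_to_cell(flows):
    """Inverse of cell_to_flows: rebuild the 64-byte cell from its flows."""
    num_flows = len(flows)
    per_flow = len(flows[0])
    nibbles = [flows[i % num_flows][i // num_flows]
               for i in range(num_flows * per_flow)]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))
```

Under these assumptions, a cell contributes 8 payload nibbles to each of 16 flows, or 16 payload nibbles to each of 8 flows, and `flows_to_cell(cell_to_flows(cell, n))` returns the original cell.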
4.2-The MC architecture
The MC block architecture [Figure 2] is almost a "mirrored" implementation of the CM block functionality. The MC input consists of 16 or 8 microcell flows, depending on the format it is working with [3.1]. In fact, bidirectional pads are used for the microcell buses. The MICROCELL DELINEATION & CONVERSION block has the following structure: every incoming microcell input is processed by an identical block (microcell delineation) in which the framed structure is recognized and the inputs are synchronized using an FSM with the following state diagram:
Figure 8. FSM in the microcell delineation blocks
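As a rough illustration of how such a delineation FSM can behave, the Python sketch below models a generic hunt/presync/sync scheme. Apart from the synchronized state, the state names and the confirmation and tolerance thresholds are assumptions and are not taken from Figure 8.

```python
from enum import Enum


class State(Enum):
    HUNT = 0     # searching for a synchronization microcell
    PRESYNC = 1  # one candidate found, waiting for confirmation
    SYNC = 2     # framed structure recognized


class Delineator:
    """Generic frame delineation FSM (hypothetical thresholds)."""

    def __init__(self, confirm=2, tolerate=2):
        self.state = State.HUNT
        self.confirm = confirm    # consecutive hits needed to reach SYNC
        self.tolerate = tolerate  # consecutive misses that drop SYNC
        self.count = 0

    def step(self, sync_seen):
        """Advance once per expected synchronization-microcell position."""
        if self.state is State.HUNT:
            if sync_seen:
                self.state, self.count = State.PRESYNC, 1
        elif self.state is State.PRESYNC:
            if sync_seen:
                self.count += 1
                if self.count >= self.confirm:
                    self.state, self.count = State.SYNC, 0
            else:
                self.state, self.count = State.HUNT, 0
        else:  # State.SYNC: count tracks consecutive misses
            if sync_seen:
                self.count = 0
            else:
                self.count += 1
                if self.count >= self.tolerate:
                    self.state, self.count = State.HUNT, 0
        return self.state
```

The confirmation count guards against locking onto payload data that happens to look like a synchronization microcell, and the tolerance count keeps a single corrupted synchronization microcell from dropping an otherwise healthy link.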
In the synchronized state the data is passed directly to the input FIFOs (Figure 7). The write pro. blocks generate the write control signals for the FIFOs. These are asynchronous FIFOs (4 bits x 6 words) built with flip-flops because of the 65 MHz clock frequency. They are used to compensate for slight differences between the input clocks (ckin1 ... ckin16) and to synchronize all the microcell flows, discarding the routing tag. A single clock is used to read the FIFOs, yielding a 64-bit DATA bus with cells of 64 bytes (Figure 6).
The ATM PROCESSING block checks the parity (PAR in Figure 6) of the labeling information and sets a flag in the microprocessor interface if an error is detected. ATM cell header identification and statistical measurements are performed in exactly the same way as for the CM [3.1]. The ATM cell header can be replaced by one stored in an external memory. In fact, the data bus, address bus and control signals are shared by the CM and the MC (same pads). Control is given (by multiplexing) to one of the blocks depending on the mode in which the CMC is working. Reserved cells are extracted to the microprocessor via an asynchronous FIFO (two interleaved dual-port RAMs of 64 bits x 16 words). When a complete cell is available (flag), the microprocessor can read it. The main functions of the CELL REASSEMBLER block are to convert the 64-byte cell to the standard 53-byte ATM cell and to serialize the cell data to 8 bits at 311 MHz (same format as for the CM input). The latter is implemented in an ECL block. Its structure is the following: 
4.3-Microprocessor interface
It is an asynchronous interface capable of communicating with a MOTOROLA 68000 family microprocessor, which is used to program setup information, to extract and insert ATM monitoring cells into the main data flow, and to read error flags. It is implemented with 2 basic blocks:
An asynchronous control circuit in which a data acknowledge output is generated from the asynchronous microprocessor interface signals (read/write, chip-select ...) and the data and microprocessor address buses are synchronized to eliminate metastability.
A synchronous register block where all programmable and status information is stored.
5.-Design Methodology and Tools
5.1-Top level design
Two very different design methodologies had to be applied to the design of the CMC:
1. The ECL blocks had to be designed with a cell-oriented custom design methodology (bottom-up). Nevertheless, VERILOG HDL models were produced so that a complete behavioral model of the circuit could be generated. A cell library (SGS-THOMSON BICMOS4) was available (schematics, netlists and layouts). The low-level design was done completely by hand because of the high frequency requirements (311 MHz). SPICE simulations were carried out to validate the logic behavior and the frequency performance, and to verify the communication interfaces between CMOS and ECL by including only the circuitry involved in their communication. The design capture tool was UNICAD from SGS-Thomson, based on OPUS from CADENCE.
2. The CMOS part was designed following a conventional top-down design methodology [7]. VERILOG was the HDL in which the behavior was expressed and simulated. Very complex VERILOG test benches were written and used for the simulations in every design step down to the post-layout phase (UNICAD-OPUS). After the behavior was validated, automatic synthesis (SYNOPSYS) was applied to produce a standard cell netlist (HCMOS4T from SGS-Thomson).
5.2-Place and Route
1. For the ECL blocks, all cells were placed and routed by hand. Special care had to be taken with the powering scheme: the width of the power and ground lines was adjusted according to the power dissipation. Mainly a 25 micron width was employed, which permits a 125 mW consumption on the given power line. The shape of the blocks also had to be drawn carefully, since they had to be placed at the edges of the CMC floorplan, as close as possible to their respective pads. The layout editor was VIRTUOSO from CADENCE.
2. The CMOS layout was carried out in a standard cell fashion with a semi-automatic tool environment: CELL ENSEMBLE from CADENCE. The number of FIFOs strongly constrained the floorplan. The ECL contours with their interface metal terminals (abstract views) were included in the CMOS layout at this stage to obtain a definitive floorplan of the whole IC.
3. The final BICMOS layout is obtained from the latter CMOS layout, which is post-processed by ad hoc software tools from SGS-Thomson, basically to incorporate the layers required for the BICMOS technology. After DRC checking and error correction, the abstract views of the ECL blocks were replaced by their actual layouts together with the ECL pads.
5.3-Test strategy
A hierarchical full scan methodology was applied to the CMOS part: independent scan chains were designed for the CM, the MC and the microprocessor interface, which are tested with separate vectors. The idea behind this is to be able to recover and use IC samples in which only the MC or the CM block functions. TEST COMPILER from SYNOPSYS was used to insert the scan flip-flops, interconnect the scan chains and generate the vectors. The fault coverage is around 86% on average for the whole CMC. To test the memories, a dedicated BIST strategy was conceived: a set of LFSR and MISR blocks was designed and validated. Equivalent memories are tested with a single BIST circuit, which reduces the area overhead. Input patterns are fed to the memories via the scan chains.
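The principle of an LFSR/MISR memory test can be sketched in Python as below: an LFSR produces pseudo-random patterns, the memory is written and read back, and a MISR compresses the read data into a signature that is compared with the fault-free one. This is a generic illustration, not the BIST circuitry of the CMC. The 16-bit width (the actual memories are 64 bits wide), the tap mask 0xB400, the seed and all function names are assumptions chosen for the example.

```python
def lfsr_patterns(seed, taps, width, count):
    """Yield `count` patterns from a right-shifting Galois LFSR."""
    state = seed & ((1 << width) - 1)
    if state == 0:
        raise ValueError("the seed must be non-zero")
    for _ in range(count):
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps


def misr_signature(responses, taps, width, seed=0):
    """Compress a sequence of words into a single signature."""
    mask = (1 << width) - 1
    sig = seed & mask
    for word in responses:
        lsb = sig & 1
        sig >>= 1
        if lsb:
            sig ^= taps
        sig ^= word & mask  # fold the observed word into the register
    return sig


def run_bist(mem_write, mem_read, depth, width=16, taps=0xB400, seed=0xACE1):
    """Return True when the memory's signature matches the fault-free one."""
    patterns = list(lfsr_patterns(seed, taps, width, depth))
    for addr, pattern in enumerate(patterns):
        mem_write(addr, pattern)
    observed = misr_signature((mem_read(a) for a in range(depth)), taps, width)
    expected = misr_signature(patterns, taps, width)
    return observed == expected
```

A usage example with a memory modeled as a Python list, first fault-free and then with bit 0 of word 0 stuck at zero on the read path:

```python
mem = [0] * 32
print(run_bist(mem.__setitem__, mem.__getitem__, 32))
print(run_bist(mem.__setitem__, lambda a: mem[a] & ~1 if a == 0 else mem[a], 32))
```

Because only the final signature is compared, the same pattern generator and signature register can serve every memory of identical size, which is the area saving mentioned above.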
6.-Results and Conclusions
The CMC layout and main statistics of the realized IC are reported below. The implementation of a 0.7 micron BICMOS input/output processor for an ATM switch was outlined in this paper. Through the use of ECL blocks at the high-speed I/Os, 311 MHz STM16 ATM cells could be handled. Two very different design styles were used: a standard cell approach for the CMOS part and a quasi-custom design style for the ECL blocks. Different design kits, centered on the same framework tools (UNICAD-OPUS), made up the tool set for the design of the chip. The BICMOS layout of the CMOS cells was produced by transforming the CMOS cells to the BICMOS technology with automatic software tools and merging the ECL blocks with the transformed CMOS layout. Full scan was applied to test the dies in combination with customized BIST structures, which significantly reduced the area overhead. The area of the die is almost limited by the number of pins, but great effort had to be put into the layout optimization phase.
