Special care was taken with regard to data-to-clock phase alignment in the layout design. One of the differential signals in each demultiplexed channel is output, while the other is terminated with a 50Ω resistor in the IC chip. This is because of the restricted number of pads and the lack of appropriate probes for on-wafer measurements. The chip is 2mm × 3mm in size and contains 379 active elements.
Experimental results: The DEMUX IC was fabricated using reliable, non-self-aligned InP-based HBTs with a carbon-doped base and InGaAs/InP composite collector [5]. The HBTs had a unity current gain cutoff frequency (f_T) of 115GHz and a maximum oscillation frequency (f_max) of 154GHz at a collector current of 3mA and a collector-to-emitter voltage of 1.2V. These values are almost the same as those for HBTs previously presented by this group [5], which indicates good reproducibility in non-self-aligned HBTs. High-speed measurement of the DEMUX IC performance was carried out on-wafer with RF probes. A 40Gbit/s pseudorandom bit stream (PRBS) with a length of 2^23 − 1 was generated by multiplexing 10Gbit/s, four-channel PRBSs in a 4:2 MUX and an HEMT 2:1 MUX [2], and then input to the DEMUX IC. Three out of the four channels from the DEMUX IC were monitored with a sampling oscilloscope; the other was input to an error detector. Fig. 2 shows the eye diagrams of the DEMUX IC input and output. The phase difference in the outputs is due to cable length differences. Error-free operation was confirmed with a phase margin of 100°. Power dissipation of the DEMUX IC was 2.97W at a supply voltage of 4.5V. Reducing the current in the 2:4 DEMUX part can decrease the consumption.
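The test arrangement above (a serial PRBS fed to a 1:4 demultiplexer) can be illustrated behaviourally with a minimal sketch. It assumes the common PRBS23 generator polynomial x^23 + x^18 + 1, which the text does not state, and it treats the 40Gbit/s stream as a single PRBS rather than modelling the 4:2 and 2:1 MUX stages. The function names `prbs23`, `demux_1_to_4` and `mux_4_to_1` are hypothetical.

```python
def prbs23(n_bits, seed=0x7FFFFF):
    """Return n_bits of a PRBS from a 23-bit LFSR.

    Assumption: feedback polynomial x^23 + x^18 + 1 (period 2**23 - 1);
    the polynomial used in the measurement is not given in the text.
    """
    state = seed & 0x7FFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 22) ^ (state >> 17)) & 1  # taps 23 and 18
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7FFFFF
    return bits


def demux_1_to_4(serial):
    """Behavioural 1:4 demultiplexer: bit i goes to channel i % 4."""
    return [serial[ch::4] for ch in range(4)]


def mux_4_to_1(channels):
    """Inverse operation: bit-interleave four quarter-rate channels."""
    serial = []
    for group in zip(*channels):
        serial.extend(group)
    return serial


stream = prbs23(4000)
channels = demux_1_to_4(stream)
# Re-interleaving the four outputs must reproduce the input stream.
assert mux_4_to_1(channels) == stream
```

Each entry of `channels` runs at a quarter of the input rate, which corresponds to the four 10Gbit/s outputs observed on the oscilloscope and error detector.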
Conclusions: We have fabricated a 1:4 DEMUX IC using reliable, non-self-aligned InP-based HBTs with an f_T of 115GHz and an f_max of 154GHz. 40Gbit/s error-free operation with a phase margin of 100° was obtained.
Large-scale integration (LSI) chips in the switch core are assembled into multi-chip modules (MCMs), which are interconnected. Deep sub-micrometre process technology has resulted in an increase in the switching speed of the core LSI chips, and the use of high-performance MCM technology has enabled these chips to be assembled into MCMs. The throughput of the interconnections in these MCMs must be high enough to prevent them from becoming a bottleneck. Instead of conventional electrical interconnections, parallel optical interconnections were used in a 100Gbit/s throughput ATM switch MCM [1, 2]. In the future, skew-compensation techniques such as bit-synchronisation and frame-synchronisation circuits for the received data will become important, especially in highly parallel high-speed optical interconnections. We propose a frame-synchronisation circuit which has a scalable architecture that can handle highly parallel interconnections without any reduction in transmission rate.
Frame-synchronisation circuit: A frame-synchronisation circuit is normally used to (i) detect the frame-synchronisation pattern in each channel and (ii) count the offset clock cycles for each channel [...] master channel, a large propagation delay for the master channel prevents the offset signals from being properly generated when the number of channels is large.
We propose a scalable frame-synchronisation circuit that can handle interconnections with any number of channels. Fig. 1 shows a circuit that can be used to count offset clock cycles c[i] for n-channel parallel interconnections. This circuit operates in two steps. In the first step, the relative offset in clock cycles between a pair of neighbouring channels is determined by comparing the phases of the frame-synchronisation-pattern detected signals P[i−1] and P[i]. The number of relative offset clock cycles d[i−1, i] is stored in the (k+1)-bit up/down counter in each channel. The most significant bit of the counter indicates the sign (positive or negative), and the initial values of all the counters are set to '0'. The counters are incremented or decremented by up/down enable signals synchronised to the transmission clock. The counter value varies from −2^k (111...1) to 2^k − 1 (011...1), where a negative value means that the ith channel has advanced ahead of the (i−1)th channel. In this step the circuit operates channel independently, so it can operate fast enough to synchronise at the transfer rate for any number of channels.
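The first step can be sketched behaviourally as a signed up/down counter that runs between the two detect pulses of neighbouring channels. This is a minimal model under stated assumptions, not the gate-level circuit of Fig. 1: each detect signal is assumed to contain a single pulse, the counter is assumed to flag overflow rather than wrap, and the names `pulse` and `relative_offset` are hypothetical.

```python
def pulse(t, n_cycles):
    """Detect signal with a single '1' at clock cycle t (illustrative helper)."""
    return [1 if cycle == t else 0 for cycle in range(n_cycles)]


def relative_offset(p_prev, p_cur, k):
    """Count clock cycles between the detect pulses of channels i-1 and i.

    p_prev, p_cur: one 0/1 sample per transmission clock cycle.
    Returns a signed count limited to the (k+1)-bit range -2**k .. 2**k - 1:
    positive if channel i detects the pattern later than channel i-1,
    negative if channel i is ahead.
    """
    lowest, highest = -(2 ** k), 2 ** k - 1
    count = 0  # counters start at '0'
    seen_prev = seen_cur = False
    for a, b in zip(p_prev, p_cur):
        seen_prev = seen_prev or bool(a)
        seen_cur = seen_cur or bool(b)
        if seen_prev and not seen_cur:
            count += 1  # up-enable: channel i-1 detected, channel i not yet
        elif seen_cur and not seen_prev:
            count -= 1  # down-enable: channel i detected first
        if not lowest <= count <= highest:
            raise OverflowError("skew exceeds the (k+1)-bit counter range")
    return count


# Channel i detects the pattern three cycles after channel i-1.
assert relative_offset(pulse(2, 12), pulse(5, 12), k=3) == 3
# Channel i is three cycles ahead of channel i-1.
assert relative_offset(pulse(5, 12), pulse(2, 12), k=3) == -3
```

Because each call uses only the two neighbouring detect signals, the per-channel work does not grow with the number of channels, which is the property the text relies on for scalability.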
Fig. 1 Proposed circuit for counting offset clock cycles
In the second step, the offset clock cycles d[i, 1] (2 ≤ i ≤ n) are calculated relative to the master channel (the channel number of which is assumed to be 1) by the following formula:
This formula is executed by adders, as shown in Fig. 1, where c[i] controls a selector to output the synchronised data from shift-registers containing the received serial data in each channel. The circuit operates channel independently in the first step and does not need to operate at high speed in the second step. Therefore, the maximum operating speed of the circuit is limited only by the first step, so the number of channels can be increased without decreasing the transmission rate.
Circuit simulation: To estimate the efficiency of the proposed frame-synchronisation circuit, we performed HSPICE circuit simulation using 0.25µm CMOS device parameters. The bit-width of the counters and adders was four (k = 3), which means that the maximum number of clock cycles to be compensated for is seven (2^3 − 1), and the number of channels was eight. Fig. 2 shows the 622Mbit/s input data of channels 1 (master) and 8 having a skew of three clock cycles. It was compensated for after the rising edge at 25.5ns, as shown in Fig. 3. Table 1 shows the circuit operation. We confirmed that the circuit could operate for any number of channels without decreasing the operating speed.
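
The second step, in which adders turn neighbouring offsets into offsets relative to the master channel and the result drives each channel's selector, can be sketched as follows. The sketch assumes that the second-step formula is a running sum of the neighbouring offsets and that the selector picks a shift-register tap that delays every channel to match the latest one; neither detail is confirmed by the text. The relative offsets in the example are hypothetical values chosen so that channel 8 lags the master by three clock cycles; they are not taken from Table 1.

```python
from itertools import accumulate


def offsets_to_master(rel):
    """Running sum of neighbouring offsets.

    rel[j] is the offset of channel j+2 relative to channel j+1 (1-based
    channel numbers). The master (channel 1) has offset 0 by definition.
    """
    return [0] + list(accumulate(rel))


def selector_taps(offsets):
    """Assumed selector setting: delay each channel to match the latest one."""
    latest = max(offsets)
    return [latest - c for c in offsets]


def align(channels, offsets):
    """Drop the leading skew of every channel and trim to a common length."""
    base = min(offsets)
    starts = [c - base for c in offsets]
    length = min(len(ch) - s for ch, s in zip(channels, starts))
    return [ch[s:s + length] for ch, s in zip(channels, starts)]


rel = [1, 0, 1, 0, 0, 1, 0]        # hypothetical neighbour offsets, 8 channels
offsets = offsets_to_master(rel)   # channel 8 ends up 3 cycles behind channel 1
payload = list(range(16))          # stand-in for one frame of data
channels = [[None] * c + payload for c in offsets]  # skewed received data
aligned = align(channels, offsets)
assert all(row == payload for row in aligned)
```

Only the running sum depends on the channel count, and it is evaluated once per frame rather than once per bit, which is consistent with the statement that the second step need not run at the line rate.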
