Abstract-
I. INTRODUCTION
In the past decade, the evolution of the silicon Bipolar Junction Transistor (BJT) for digital integrated circuit applications split into two parallel efforts. The first was the continued optimization of bipolar-only technologies for use in very high-speed systems. The second was the merging of bipolar and CMOS into a single BiCMOS technology.
High-performance bipolar-only technologies used sophisticated double-poly self-aligned base-emitter structures with deep trench isolation; see, e.g., [1]-[5]. Recent advances in silicon-germanium technology allowed the use of GexSi1-x strained layers in the base region of bipolar transistors, leading to impressive bipolar-only technologies [6], [7]. However, these latter GexSi1-x/Si technologies have not found wide application, because the added complexity of GexSi1-x/Si did not yield sufficient return at the digital systems level and because bipolar-only technologies were not able to compete with the continued scaling-down of CMOS device size and power.
BiCMOS technologies generated enthusiasm because they promised the density and low power of CMOS, but the speed of bipolar when applied only to critical paths. However, microprocessors continued to use CMOS because their rapid circuit evolution, complexity, and bus-width made the selective use of bipolars difficult. DRAM's and commodity SRAM's were density-driven, not speed-driven, thus the additional bipolar cost was not justified.
The only place digital BiCMOS technologies found significant application was in Fast Static RAM's (FSRAM's), where a "good enough" bipolar bought sufficient reductions in memory access time that the cost of a complex bipolar structure was not justified. Recently, however, the increased speed requirements of FSRAM's motivated us to design a high-performance bipolar device, the Selectively Compensated Collector (SCC) BJT, in a high-density 0.35 μm BiCMOS technology which now meets the performance of many complex bipolar-only technologies. The SCC BJT formation and characterization will be discussed in the next sections. Like other successful FSRAM BiCMOS technologies, we attempted to minimize the device complexity, while obtaining high performance through a nonconventional well structure and base formation. The low complexity and high performance of the SCC BJT enable the 0.35 μm BiCMOS technology to find applications far outside its initial FSRAM intent, including mixed-mode bipolar RF and high-speed bipolar ECL/CML circuits.
II. THE SCC BJT: OVERVIEW
The bipolar module added to the core CMOS process consists of only 5-6 non-critical implant masks, epi, and a ~1100°C, 20 second RTA. No deep trench, shallow trench, or recessed isolation is added. Neither are selective epi or sophisticated base contact structures introduced. Despite this, the bipolar performance is near state-of-the-art, with only one major disadvantage: the lower packing density of junction-isolated bipolar devices, resulting in a ~10 x 10 μm^2 footprint.
Described next are the SCC BJT structure and associated process issues, starting with the well and following sequentially to the base and then the emitter formation.
0018-9383/95$04.00 © 1995 IEEE
Fig. 1. Cross-sectional drawing of the double-poly SCC BJT formed in a lightly doped p-well. The polysilicon emitter (E) and base (B) electrodes are indicated, as is the collector contact (C). The p-well also provides low junction capacitance for the diffused ECL load resistor.
III. THE SCC BJT: WELL FORMATION
Fig. 1 shows a cross-sectional drawing of the double-poly SCC BJT. An n+ buried layer followed by epitaxial silicon growth is required to obtain a low collector resistance and limit the Kirk (base push-out) effect. A p buried layer is required for isolation between adjacent n+ buried layers. As shown, the n+ and p buried layers are set in a p-substrate, and are not self-aligned, so that the collector-to-substrate capacitance can be minimized with a buried layer offset.
For the 0.35 μm generation, the PMOS n well (~2 x 10^17/cm^3) and bipolar n well (~5 x 10^16/cm^3) doping requirements have diverged sufficiently that they can no longer share a common well. Rather than mask a separate bipolar well implant, the SCC structure combines a 370 keV phosphorus self-aligned collector implant with the masked active base implant, to fully set the collector doping with a single high-energy implant. This technique, when used to augment the collector doping, is often referred to as a pedestal or SIC implant [1]. In our case, the bipolar epi-region (1.6 μm thick, as grown) over the n+ buried layer is lightly p-doped (to ~1 x 10^15 cm^-3), so that the collector implant is actually compensating; see Fig. 1. Since this SCC implant uses the base implant mask, it is self-aligned to the emitter (which is outdiffused from a subsequently deposited second layer of polysilicon) by the p+ poly base electrode. The advantages of the lightly doped epi-region include not only a lower value of CBC, but also a lower value of CCS, without the need for deep trench or recessed oxide isolations. Also, as shown, the substrate capacitance of the n diffusion resistors can be kept very low (< 1 fF for a 2.4 μm x 1.6 μm resistor at 3 V bias) by using n-type diffusions in this same p-substrate.
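As a rough illustration of why the collector doping level matters for the Kirk effect, the sketch below evaluates the textbook first-order estimate J_K ≈ q · v_sat · N_C. This is not the authors' analysis: the saturation velocity of 1 x 10^7 cm/s is an assumed typical value for electrons in silicon, the bias-dependent term of the full expression is ignored, and the ~5 x 10^16/cm^3 bipolar n well requirement quoted above is used as a stand-in for the local collector doping, which this section does not state.

```python
# First-order Kirk-effect onset estimate: J_K ~ q * v_sat * N_C.
# Assumptions (not from the paper): v_sat value, neglect of the bias term.
Q_ELECTRON = 1.602e-19   # elementary charge, C
V_SAT = 1.0e7            # assumed electron saturation velocity in Si, cm/s


def kirk_current_density(n_collector_cm3, v_sat_cm_s=V_SAT):
    """Approximate Kirk onset current density in A/cm^2."""
    return Q_ELECTRON * v_sat_cm_s * n_collector_cm3


def a_per_cm2_to_ma_per_um2(j_a_per_cm2):
    """Convert A/cm^2 to mA/um^2 (1 cm^2 = 1e8 um^2, 1 A = 1e3 mA)."""
    return j_a_per_cm2 * 1e3 / 1e8


# Collector doping taken from the quoted bipolar n well requirement.
j_k = kirk_current_density(5e16)
print(f"J_K ~ {j_k:.2e} A/cm^2 = {a_per_cm2_to_ma_per_um2(j_k):.2f} mA/um^2")
```

Because the estimate scales linearly with N_C, any lateral confinement that forces the current through a narrower, fixed-doping region raises the local current density toward this limit sooner.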
Although the SCC BJT has a reduced value of CBC by localizing the active collector region directly below the emitter, there is a drawback to this structure, as seen in Fig. 2. Shown is a PISCES-II [8] simulation of electron concentration as a function of position through the npn BJT with a 0.60 μm wide selective collector implant. The confinement of electron density (and current) to the region of high collector doping inhibits the 2-D spread of electron current as it approaches the n+ buried layer. This increases the collector resistance and the susceptibility to the Kirk effect. Fig. 3 shows the effect this has on the unity current-gain frequency, fT, at high currents. Shown is the measured fT of two identically formed bipolar transistors with differing collector widths, reported in an earlier study [9]. Although the peak values of fT agree within experimental error, the high-current behavior of the SCC transistor is inferior. Further two-dimensional device simulations indicated, moreover, that for realistic loads of 0.1 pF/μm, the voltage response of the SCC BJT is only ~3% slower.
Setting the collector doping fully with the SCC implant in a nonself-aligned "CMOS twin-well" process saves one masking step; however, a localized collector does not provide isolation between the base and substrate, which would be obtained with a wider n well or a deep trench. In the SCC BJT, this isolation is obtained by fully encompassing the active area of the bipolar transistor with the n+ deep collector (or reach-through) diffusion, as in
Fig. 5. SIMS profile of arsenic concentration versus vertical depth (μm) through a substrate region to characterize arsenic autodoping. 1.5 μm of intrinsic epi was grown on patterned n+ buried layers, and subsequently annealed. The SUPREM-III simulation of this same anneal places a dopant spike at the epi-substrate interface using the measured arsenic sheet density, 2.5 x 10^11/cm^2.
ubility. As a drawback, arsenic's high vapor pressure leads to significant autodoping; see Fig. 5. The as-grown autodoping shows a fairly abrupt peak, which broadens during subsequent thermal processing. The presence of this n-type layer can increase the size of the offset n+ buried layer, significantly increasing CCS; see Fig. 1.
Complete reduction of the buried autodoping spike is not required, however. First, this dopant spike is compensated by the p dopants in the substrate and epi. Second, even if a slight net electron sheet remains at the epi-substrate interface at zero bias, it is depleted under reverse bias. See Fig. 6, which shows PISCES-II simulations of electron and hole concentrations for a 3 μm deep collector offset from a 3 μm deep p well. The p well's great depth results from its merging with the p buried layer. The n-type autodoping sheet density for these simulations is the same as measured in Fig. 5. Fig. 7 shows the simulated collector-periphery junction capacitance for the structure of Fig. 6 as a function of reverse bias. Note the rapid drop in capacitance until the electron concentration of the autodoping spike is fully depleted. Once this has occurred, the built-in field acts to extend the depletion region further than if there were no autodoping. The periphery capacitance for the case of no autodoping is plotted as a comparison, and the hole concentration is shown as an insert in Fig. 7. Note that though moderate amounts of autodoping increase the zero-bias value of CCS, the value of CCS under active bias is actually decreased.
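To give a feel for the magnitude involved, the following sketch applies Gauss's law to a fully depleted charge sheet: the electric field changes by ΔE = q · N_s / ε_Si across it. This is only an order-of-magnitude aid, not a substitute for the 2-D simulations discussed above. It uses the measured sheet density of 2.5 x 10^11/cm^2 without subtracting the compensating p dopants, so it is an upper bound, and the silicon permittivity is a standard textbook value rather than a number from the paper.

```python
# Field step across a fully depleted sheet of fixed charge (Gauss's law).
# Assumption: uncompensated sheet, so the result is an upper bound.
Q_ELECTRON = 1.602e-19   # elementary charge, C
EPS_0 = 8.854e-14        # vacuum permittivity, F/cm
EPS_R_SI = 11.7          # relative permittivity of silicon (textbook value)


def sheet_charge_c_per_cm2(sheet_density_cm2):
    """Charge per unit area of an ionized dopant sheet, in C/cm^2."""
    return Q_ELECTRON * sheet_density_cm2


def field_step_v_per_cm(sheet_density_cm2):
    """Field discontinuity (V/cm) across the depleted sheet."""
    return sheet_charge_c_per_cm2(sheet_density_cm2) / (EPS_R_SI * EPS_0)


n_sheet = 2.5e11  # measured arsenic sheet density quoted for Fig. 5, 1/cm^2
print(f"sheet charge: {sheet_charge_c_per_cm2(n_sheet):.2e} C/cm^2")
print(f"field step:   {field_step_v_per_cm(n_sheet):.2e} V/cm")
```

A field step of a few 10^4 V/cm is modest compared with fields in a reverse-biased junction, which is in line with the observation that the sheet depletes readily under bias.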
One other concern is the effect that autodoping has on the linearity of the n diffusion resistor targeted at 1600 Ω/sq, since the n-type autodoping can act as a shunt path; see Fig. 1. The measured variation in resistance is actually under 2%, and the mechanism responsible is not a parallel shunt path, but rather an increasing resistance with bias due to depletion, even by the lightly doped p-well. Fig. 8 shows that the mismatch between discrete and merged resistors is also under 2%. Finally, these diffusion resistors show excellent temperature stability, with their values shifting <0.1%/°C.
SIMS profile of boron (B) and arsenic (As) through the extrinsic
V. SCC BJT-SRAM BIT CELL SYNERGISM
The following sections focus on the formation of the self-aligned emitter-base contacts. Fig. 9 contrasts a cross-sectional SEM of the 0.35 μm BiCMOS SRAM bit cell with that of the SCC BJT. The 0.35 μm BiCMOS technology was constrained by a dense bit cell design, requiring four layers of polysilicon. The first layer of polysilicon forms the MOS transistors; the second forms a self-aligned landing pad for the bit line contact and supplies VSS to the memory array. The third and fourth layers of polysilicon form the p-channel TFT, stacked vertically above the substrate.
To prevent excessive process complexity, the addition of the NPN BJT relied on the first two existing polysilicon layers, both of which were strapped with polycide to reduce their sheet and metal-contact resistance. Poly-1 strapped with WSi, which forms the MOSFET gate, also forms the extrinsic base using the existing buried contact of the bit cell, and poly-2 strapped with TiSi forms the emitter using the self-aligned bit line poly-2 contact; see contacts (B) and (C) in Fig. 1. Further details of the 0.35 μm BiCMOS technology are given in [11].
VI. BASE FORMATION
As shown by the insert of Fig. 10, p+ polysilicon strapped with tungsten polycide and a nitride dielectric cap is used as a self-aligned connection to the active base of the bipolar transistor. During the overetch of this electrode, the exposed active silicon region of the bipolar transistor is recessed or "trenched." The link-up to the active base is obtained by the extrinsic base region, formed by outdiffusion from the polysilicon base electrode. The SIMS plot shows a vertical depth of the extrinsic base region of ~0.18 μm. As in [12], we found that WSi is not a good diffusion source for boron, and implanted the boron into the polysilicon prior to WSi polycide deposition.
It is important to place the peak of this masked implant near the poly-to-silicon interface, thus ensuring formation of the extrinsic base region despite segregation of boron from the poly into the WSi. Integration with the CMOS flow introduces two impediments which further justify this "brute-force" extrinsic base formation. First is the existence of a thin chemical oxide layer between the polysilicon and the silicon substrate, resulting from a defect-reducing clean which leaves the surfaces hydrophilic. The oxide layer increases the contact resistance and retards the boron outdiffusion required to form the extrinsic base region. Second, the primary thermal cycle used to drive out boron from the base electrode is an RTA of ~1100°C for 20 seconds used to activate the emitter. The shallow junction requirements of the 0.35 μm CMOS devices do not allow a longer or hotter emitter anneal.
VII. BASE RESISTANCE SENSITIVITY
The electrical data presented here are for a 1200 Å nitride, 1000 Å WSi, and 550 Å p+ poly base electrode stack, with a p+ poly implant dose of 5 x 10^15/cm^2. The table insert shown in Fig. 11 shows the strong dependence the extrinsic base junction depth has on the p+ poly implant range, where BF2 was used to obtain very short ranges. Reduction of the boron implant range into the polysilicon not only greatly reduces the extrinsic base formation, but also drives the poly-to-silicon contact resistance to unacceptably large values. These contact resistances were extracted by forward biasing the base-collector junction of near-identical bipolar transistors, which had varying poly-substrate overlaps. The contact resistance was extracted as the slope of resistance at 1.0 V versus inverse contact area.
Fig. 11. Table insert shows extrinsic base depth below the poly-to-silicon interface and corresponding contact resistance of this interface as a function of implant energy for a dose of 5 x 10^15/cm^2. Plot shows intrinsic base resistance versus active area silicon trenching.
Fig. 13. Characteristic Gummel plot for a BJT with insufficient dopant diffusion from the implanted poly-2 surface to the substrate/poly-2 interface. As indicated on the SEM insert, this insufficient arsenic diffusion can be detected with an HF-nitric junction stain.
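The extraction described in this section (contact resistance as the slope of measured resistance versus inverse contact area) amounts to a straight-line fit of R = R_series + ρc / A. The sketch below shows one way to do that fit with ordinary least squares. All numbers in it are synthetic and purely illustrative: `RHO_C_TRUE`, `R_SERIES_TRUE`, and the contact areas are hypothetical values chosen so the fit can be checked, not data from the paper.

```python
# Least-squares line fit of resistance versus inverse contact area.
# The slope is the specific contact resistance (ohm*um^2); the intercept is
# the area-independent series resistance. All data here are SYNTHETIC.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical test structures with varying poly-substrate overlap area.
areas_um2 = [0.5, 1.0, 2.0, 4.0]
RHO_C_TRUE = 200.0      # hypothetical specific contact resistance, ohm*um^2
R_SERIES_TRUE = 150.0   # hypothetical series resistance, ohm

# Synthetic "measurements" generated from the model itself.
resistances = [R_SERIES_TRUE + RHO_C_TRUE / a for a in areas_um2]

slope, intercept = fit_line([1.0 / a for a in areas_um2], resistances)
print(f"extracted rho_c = {slope:.1f} ohm*um^2, R_series = {intercept:.1f} ohm")
```

With real measurements the points will scatter around the line, and the quality of the fit indicates whether a single area-scaled contact term is an adequate model.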
Although 20 keV B11 results in a consistently good base contact, the large extrinsic base depth then requires significant silicon trenching during the base (gate) poly-1 overetch for two reasons. First, this trenching establishes a vertical offset between the heavily doped emitter and extrinsic base regions, increasing the base-emitter breakdown voltage BVEBO above 5.0 V, and leading to ideal base-current behavior. This is because the MOSFET spacer width scaled to ~1000 Å for this generation technology, which does not form a sufficiently large lateral offset between these heavily doped regions. Second, the 20 keV base poly implant tails down into the active region of the bipolar device, which the trenching subsequently removes. For our process, the minimum trenching required is 0.18 μm to ensure that the base poly implant does not augment the base Gummel number set by the light, intrinsic base implant. To ensure consistent bipolar device characteristics, a silicon trench depth of 0.24 μm was used, resulting in a calculated zero-bias RBX as low as 500 Ω·μm ratioed to the length of the poly cut. Thus, for a minimum-sized 0.4 μm x 1.1 μm BJT with a poly-cut of 0.7 μm x 1.4 μm, a zero-bias RBX of 500/(0.7 + 1.4) ≈ 240 Ω is obtained.
Fig. 12 shows the SIMS profile of dopant concentration as a function of depth for the intrinsic region of an SCC BJT. The arsenic n+ buried layer and the implanted phosphorus (SCC) collector were discussed previously. The active base is implanted before spacer formation, to ensure a good link-up with the extrinsic base regions outdiffused from poly-1. As a drawback to this approach, the active base is exposed to more thermal annealing, and therefore broadens. The emitter is outdiffused from poly-2, which is previously implanted with arsenic.
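The extrinsic base resistance figure quoted in this section follows from dividing the length-normalized value by the poly-cut dimensions. The short sketch below reproduces that arithmetic using the same expression as the text, 500/(0.7 + 1.4); the function name and argument layout are illustrative choices, not notation from the paper.

```python
# Zero-bias extrinsic base resistance from the length-normalized value,
# using the same expression as the text: RBX = rbx_norm / (w_cut + l_cut).

def extrinsic_base_resistance(rbx_ohm_um, poly_cut_um):
    """Return RBX in ohms for a (width, length) poly cut given in um."""
    width_um, length_um = poly_cut_um
    return rbx_ohm_um / (width_um + length_um)


rbx = extrinsic_base_resistance(500.0, (0.7, 1.4))
print(f"RBX ~ {rbx:.0f} ohm")  # close to the ~240 ohm quoted in the text
```

Because the result scales inversely with the summed cut dimensions, a longer poly cut lowers RBX at the cost of a larger base area.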
Similar to the extrinsic base formation, insufficient heat to drive the dopants to the substrate/poly-2 interface severely degrades the bipolar characteristics, leading to poor base current ideality (~2) and therefore low current gain and abnormally large and scattered values of BVEBO.
VIII. EMITTER PLUG EFFECT
Interestingly, the emitter resistance as determined by the open collector method did not show any degradation.
As in [14], we found the most reliable indication that the arsenic was not diffusing down the poly-2 "emitter plug" was an increase of β with increasing emitter width, since widening the emitter reduces the severity of the plug's aspect ratio, reducing its total depth. In extreme cases, the resulting Gummel plots show severe degradation; see Fig. 13. This device had a tall poly-1 stack height of 4950 Å, planarized emitter poly, and a ~1050°C, 20 second RTA. The SEM insert shows that in this extreme case a short HF-nitric junction stain can detect the insufficient arsenic diffusion. Even though the accuracy of junction staining severely degrades in the presence of metalized (or polycided) wafers, the absence of arsenic dopants at the bottom of the emitter plug slows the etch rate sufficiently to highlight the emitter plug effect; see illustration arrow. In comparison, the SEM in Fig. 9, for a good bipolar device (2750 Å poly-1 stack height, ~1100°C RTA, and unplanarized poly-2), shows no such effect.
IX. SCC BJT CHARACTERIZATION
The emitter plug effects were overcome despite the thermal constraints of the core CMOS process by reducing the poly-2 plug depth both by using unplanarized poly-2 and by reducing the poly-1 stack height. Fig. 14 shows the resulting Gummel plot for the minimum-sized 0.4 μm x 1.1 μm device. Both a discrete device and a 2070 parallel device defect array are shown. The discrete device shows excellent drive current at an applied bias of 1.2 V: 2.45 mA, corresponding to a current density of 5.6 mA/μm^2 with β > 33.
Junction capacitance was also characterized for the collector-substrate (CCS) and base-collector (CBC) junctions of a 0.4 μm by 1.1 μm SCC BJT. In addition, the total substrate junction capacitance, CR, of a discrete 2.4 μm by 1.6 μm ECL resistor is shown. ECL operating points are also indicated, since the zero-bias value of CCS can be misleading in comparing trench-isolated devices, where CCS does not decrease much with reverse bias, to junction-isolated devices, where CCS does decrease with reverse bias. Note that this decrease is not as dramatic as shown in Fig. 7, due to the diffusion of the collector and p well regions to form nonabrupt junctions, and the parallel capacitance contribution of the area component. A summary of key parameters is given in Table I.
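The quoted current density can be cross-checked directly from the drive current and the drawn emitter dimensions given in this section. The sketch below assumes the density is referenced to the drawn 0.4 μm x 1.1 μm emitter area, which the text implies but does not state explicitly.

```python
# Collector current density from drive current and drawn emitter size.
# Assumption: density is referenced to the drawn emitter area.

def current_density_ma_per_um2(current_ma, emitter_w_um, emitter_l_um):
    """Return current density in mA/um^2."""
    return current_ma / (emitter_w_um * emitter_l_um)


j = current_density_ma_per_um2(2.45, 0.4, 1.1)
print(f"J ~ {j:.2f} mA/um^2")  # rounds to the 5.6 mA/um^2 quoted in the text
```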
X. DEVICE SCALABILITY
The alignment tolerances of the SCC BJT structure are relatively relaxed, since this transistor is used in the periphery only. To evaluate the scalability of the transistor, the SCC BJT was shrunk to match the alignment tolerances of the bit cell.
This has the drawback of increasing a number of the parasitic series resistances, but has the advantage of reducing the parasitic junction capacitances. The table insert of Fig. 16 shows this capacitance improvement, with a ~30% reduction for the three primary components. These parasitics are comparable to those of other high-performance bipolar-only technologies.
XI. ECL CIRCUIT PERFORMANCE
Fig. 16 also shows the measured stage delay versus switching current for a conventional single-ended ECL gate with ΔV = 0.5 V. The scaled SCC BJT has a 23% lower power-delay product at low currents, owing to its lower parasitic capacitances, but this advantage diminishes at higher current levels. The standard device obtains sub-50 ps delays at 225 μA switching current. Although the standard device was also laid out with an emitter area of 0.6 μm x 1.1 μm, the ECL performance was virtually identical to the 0.4 μm x 1.1 μm device. This is because the parasitic capacitances of such small devices are perimeter dominated.
XII. MODIFIED CML LOGIC GATE
Since even aggressive scaling of the SCC BJT showed only a 23% improvement in power-delay product, substantial improvements in this figure of merit are possible only through a substantial structural change, or a modification of the logic gate. Fig. 17(a) shows one such gate, the Current-Mode Logic (CML) gate with a conventional bipolar current source. The current of this gate is lower than that of Fig. 16 because a separate emitter follower stage is not required, and the switching voltage (and therefore current) can be reliably reduced from 500 mV to 200 mV because the output is differential. Furthermore, since this gate does not stack bipolar diode drops, the minimum operating voltage is lower, allowing additional power reduction. The operating voltage can be further reduced by using a readily available n-MOS current source; see Fig. 17(b). As shown by the accompanying simulations, the n-MOS current source operates down to voltages 0.2 to 0.3 V lower than the conventional bipolar source. It is important to note that this is a nearly "fair"
XIII. MEASURED CML PERFORMANCE
The CML performance reported here is for the scaled BJT. Even for this scaled device the layout area is larger than that for a MOSFET, so that the modified CML gate with the smaller MOSFET and no resistor R3 represents a significant layout area saving. As seen in Fig. 18 , this modification extends by up to 0.3 V the minimum operating voltage of the gate, allowing operation down to 1.0 V. Because standard CML logic usually involves bipolar stacking, this additional "voltage headroom" is already of importance at a 3.3 V supply.
At a supply voltage of 1.1 V, and at 40 μA switching current, the minimum power-delay product of the CML gate is a silicon-substrate bipolar record 4.5 fJ. This extremely low value is an order of magnitude lower than the ECL value we last reported [15], and compares to a CML value of 8.6 fJ at a supply of 1.8 V for a 0.2 μm x 1.6 μm BJT in an SOI-based, deep-trench bipolar-only technology [16].
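For orientation, the power-delay product can be turned back into an implied gate delay if the gate power is taken as supply voltage times switching current. That power model is an assumption made here for illustration; the text does not spell out how the gate power was defined, so the resulting delay should be read as a rough figure only.

```python
# Implied gate delay from a power-delay product (PDP).
# Assumption: gate power = supply voltage * switching current.

def gate_power_uw(supply_v, switching_current_ua):
    """Static gate power in microwatts (V * uA = uW)."""
    return supply_v * switching_current_ua


def implied_delay_ps(pdp_fj, supply_v, switching_current_ua):
    """Delay in picoseconds: fJ / uW gives ns, so scale by 1e3."""
    return pdp_fj / gate_power_uw(supply_v, switching_current_ua) * 1e3


power = gate_power_uw(1.1, 40.0)
delay = implied_delay_ps(4.5, 1.1, 40.0)
print(f"power ~ {power:.0f} uW, implied delay ~ {delay:.0f} ps")
```

Under this assumption the 4.5 fJ figure corresponds to a gate dissipating about 44 μW with a delay on the order of 100 ps.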
XIV. RF RESULTS
A common benchmark circuit for RF applications is the dual-modulus ÷4/5 prescaler. For this purpose, the 0.4 μm x 1.1 μm SCC BJT was used, and a conventional bipolar design was implemented with a switching current of 200 μA and a differential signal swing of 600 mV. At the output of the prescaler, a chain of eight divide-by-two stages divides the output signal frequency by 256 to simplify the characterization. In addition, input and output buffers provided signal conditioning. At a supply voltage of 3.3 V and at room temperature, a maximum input frequency of 3.3 GHz was achieved. The input signal sensitivity of the prescaler is shown in Fig. 19; an input signal of -15 dBm results in a frequency operating range from 170 MHz to 3 GHz. When the supply voltage is reduced to 2.7 V, the performance of the prescaler, although acceptable, is degraded; at 2.5 GHz, the minimum input signal level increases from -33 to -25 dBm.
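The frequency bookkeeping of the test circuit described above is simple to verify: the prescaler divides by 4 or 5, and the eight divide-by-two stages contribute a further factor of 2^8 = 256. The sketch below computes the frequency seen at the measurement output and converts the quoted dBm levels to milliwatts. The function names are illustrative, and the dBm conversion is the standard definition (0 dBm = 1 mW), not something specific to the paper.

```python
# Output frequency of a /4-or-/5 prescaler followed by N divide-by-two
# stages, plus a standard dBm-to-mW conversion.

def measured_output_hz(f_in_hz, modulus, n_div2_stages=8):
    """Frequency after the prescaler (modulus 4 or 5) and the /2 chain."""
    if modulus not in (4, 5):
        raise ValueError("dual-modulus prescaler divides by 4 or 5")
    return f_in_hz / modulus / (2 ** n_div2_stages)


def dbm_to_mw(power_dbm):
    """Convert power in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (power_dbm / 10.0)


for m in (4, 5):
    f_out = measured_output_hz(3.3e9, m)
    print(f"3.3 GHz input, divide-by-{m}: {f_out / 1e6:.3f} MHz at the output")

for level in (-15, -25, -33):
    print(f"{level} dBm = {dbm_to_mw(level) * 1e3:.2f} uW")
```

Dividing down to a few megahertz is what makes the output easy to observe with ordinary test equipment, which is the stated purpose of the extra divider chain.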
The RF device of Fig. 20 is substantially larger than minimum and has nine 19.2 μm-long interdigitated emitter fingers, resulting in a calculated RB ≈ 2.8 Ω. Under the active bias of VCE = 1.0 V and 0.5 mA, the small-signal equivalent circuit extracted using s-parameter measurements resulted in RB = 3.8 Ω. The total base resistance, RB, the minimum noise figure, Fmin, and associated gain, Gass, were measured including packaging. As a comparison, data for the bipolar-only process MOSAIC 5 [3] are also presented, showing equivalent performance. All measurements presented here were taken at VCE = 1.0 V and a frequency of 900 MHz.
Fig. 21 shows the de-embedded minimum noise figure, Fmin, and associated gain, Gass, at f = 900 MHz and VCE = 1.0 V for 4 different bipolar topologies with nearly the same drawn emitter area of ~60 μm^2. The W = 0.4 μm device of the previous figure is labelled with its emitter length of L = 19.2 μm. By using more but shorter (L = 15.5 μm) emitter fingers, the noise figure can be further reduced. This results in a 0.54 dB noise figure at 900 MHz and 0.5 mA with an associated gain of 14.7 dB. Since the base poly is strapped by WSi, the parallel metal-1 strapping of the base which is usually employed to drive down RB for low-noise RF performance is not required when short (6.6 μm) emitter fingers are used. This increases the gain by ~1.5 dB due to the reduction in base area and therefore CBC.
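To put the decibel figures above on a linear scale, the sketch below converts the 0.54 dB noise figure to a noise factor and an equivalent input noise temperature, and the 14.7 dB associated gain to a linear power gain. The conversions are the standard definitions with the conventional 290 K reference temperature; they are included as a reading aid and are not part of the paper's analysis.

```python
# Standard dB conversions for noise figure and gain.
# Assumption: conventional reference temperature T0 = 290 K.
T0_KELVIN = 290.0


def db_to_linear(value_db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10 ** (value_db / 10.0)


def equivalent_noise_temperature_k(noise_figure_db):
    """Equivalent input noise temperature: Te = T0 * (F - 1)."""
    return T0_KELVIN * (db_to_linear(noise_figure_db) - 1.0)


nf_db, gain_db = 0.54, 14.7
print(f"noise factor F  ~ {db_to_linear(nf_db):.3f}")
print(f"noise temp  Te  ~ {equivalent_noise_temperature_k(nf_db):.1f} K")
print(f"linear gain     ~ {db_to_linear(gain_db):.1f}")
```

On a linear scale the device adds roughly 13% to the input-referred noise power, which is why sub-1 dB noise figures are of interest for 900 MHz front ends.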
XV. CONCLUSION
We presented the process development and device characterization of the Selectively Compensated Collector (SCC) BJT specifically designed for high-density deep-submicrometer BiCMOS SRAM technologies. This double-poly BJT takes advantage of the self-aligned polysilicon layers of the SRAM bit cell to obtain high performance without adding excessive process complexity. Furthermore, although an NPN device, the SCC BJT is formed in a lightly doped p-well in which the collector is formed with a single 370 keV phosphorus implant to minimize parasitic junction capacitances. The suitability of this bipolar structure outside of its original FSRAM intent is proven with its potential for bipolar logic and mixed-mode 
