Abstract-In this paper, the on-wafer measurement of junction depletion capacitance is examined. This work provides an in-depth discussion of possible probing configurations which can be used. It outlines a method to consistently measure the junction capacitances accurately. The results from this method compare favorably with those extracted using S-parameter measurements. Additionally, methods are formulated to reduce the number of data points required for parameter extraction while at the same time maintaining a high model accuracy.
The recent increase in communication services and higher bandwidth applications has spawned a multitude of high-speed data communication systems. Bipolar current mode logic (CML) is commonly used in these applications. It has been shown that the minimum delay of a CML gate can be approximated as [2] : (2) Also, the collector-substrate capacitance is important in the modeling of mixed-signal circuits [3]. As the operating frequency of integrated circuits (ICs) increases and higher resistivity substrates are used, more complex substrate models are required [4], [5]. Thus, an accurate value for the collector-substrate capacitance must be obtained.
Hence, it is essential that the capacitance-voltage (C-V) characteristics of bipolar devices be measured and interpreted correctly. While other methods have been proposed, small-signal measurements using an LCR meter are still the most common. Extraction from S-parameters is another proposed technique, and a dc method has also been proposed [6]. The dc method requires a double-base test structure for the measurement and an additional test structure to de-embed parasitic capacitances, and it has only been demonstrated for the measurement of the base-emitter depletion capacitance.
Although there are many excellent texts on the setup of capacitive measurement test systems [7], [8, Ch. 12], very little has been published in the area of depletion capacitance measurements for bipolar devices. Fig. 1 shows two possible probing configurations which are commonly used to measure the junction capacitance (in this case, the base-collector). For the two-probe configuration, the "Hi" and the "Lo" terminals of the C-V meter are connected to the collector and base, respectively. In the four-probe configuration, the Hi and the Lo connections are the same, but the emitter and the substrate are also connected to the ground of the C-V meter. Included in this figure is a test structure called a "dummy," which is used to remove on-wafer parasitics. A set of equivalent circuits representing each probing configuration is also shown. Fig. 2(a) shows the measured capacitance (the parasitics have been removed) for a transistor¹ using the two configurations. Fig. 2(b) shows the measured capacitance for a dummy structure using the two configurations. It is clear that there is a considerable difference between the results. The two-probe measurement is commonly used in industry.
0894-6507/03$17.00 © 2003 IEEE
This work shows that the two-probe configuration produces inaccurate results. In addition, it shows that the four-probe measurement is more accurate and does not require dummy structures for the measurement of the base-emitter and the base-collector capacitances. The scaling of capacitance with area is also shown to be excellent with this method. The results of this work are verified on devices from both BiCMOS processes and dedicated bipolar processes.
For volume data collection, it is important to minimize data set size while maintaining the accuracy of the extracted parameters. This work provides a formal procedure to achieve this goal. It presents a novel technique based on bootstrapping which determines the minimum required data set for parameter extraction.
¹ The data are from structures with arrays of 10 devices.
II. MEASUREMENT RESULTS
In this work, an Agilent LCR meter was used [9]. In order to characterize the repeatability of the measurement system, the base-emitter depletion capacitance was measured 1000 times at a single bias voltage. The standard deviation of the measured data was found to be 0.1 fF, which shows that the LCR meter produces very repeatable results.
A. Calibration and De-Embedding
Calibration is the term given to the correction of the measurement system errors, and de-embedding is the term given to the removal of on-wafer parasitics.
In order to account for errors introduced by the measurement system such as cabling, needles, etc., an open calibration is performed. Previously, this open calibration has been performed with the probe needles down on the dummy structure. It was assumed that this would allow both the calibration and the de-embedding to be performed simultaneously. This method has a number of disadvantages.
1) Due to large variations in substrate characteristics, the calibration/de-embedding must be performed at each site [10]. The calibration itself takes over 1 minute, so this method is very time consuming.
2) For some processes, there can be silicon depletion under the bondpad, which will cause a bias dependence of the bondpad capacitance, as shown in Fig. 3(a). The open calibration will not account for this and thus causes errors in the measurement of the bias dependence of the device depletion capacitance, as shown in Fig. 3(b).
Hence, it is preferable to perform a calibration with the probes in the air and then to measure the dummy structure (including its bias dependence) at each site for de-embedding purposes.
B. DUT Parasitics
In Section II-A, the separation of the calibration and de-embedding procedures was outlined. Fig. 1 showed the different probing configurations that are possible and Fig. 2 showed the different results that are obtained for the base-collector dummy measurement. The bondpad capacitance for the base-emitter dummy was measured using both configurations. Fig. 4 shows a simplified cross-sectional diagram of the dummy structures. It is clear that a path to ground has been formed through the substrate for the four-probe configuration, and so the measured capacitance is zero. When the DUT is subsequently connected, this will still be true. Thus, the four-probe configuration will still remove the parasitic elements, and so for this junction it is not necessary to measure the dummy structures to account for the DUT parasitics when using the four-probe method. In the case of the two-probe measurement, the measured capacitance is not what is expected (79 fF was measured, but the expected value was 466 fF based on bondpad area and oxide thickness). This is because the capacitance of the wafer chuck is coupled into the measurement. A model for the chuck was constructed, and this model fitted the measured data very well. It is clear from the measurement of the bondpad capacitance that the four-probe method is more robust in terms of measurement setup and also requires less de-embedding.
C. Device Measurements
Even when dummy structures are used, the two-probe method will still produce erroneous results, as was shown in Fig. 2. The reason is that the bondpad capacitance can still be coupled into the measured capacitance via another junction when two probes are used. (This has been found to be the case with the collector-substrate capacitance and, in some processes, the base-collector capacitance. In all the processes considered to date, this has not been the case for the base-emitter capacitance.) Fig. 5(a) shows the measurements that are obtained with both configurations for the collector-substrate capacitance.² Again, it is clear that there is a considerable difference between the two methods for the measurement of this capacitance. In order to explain these differences, a three-terminal device is considered for simplicity. It has a junction capacitance to be measured, which has two associated stray capacitances, as shown in Fig. 1(d)-(g) [11]. If the stray capacitances are allowed to float (two-probe), as shown in Fig. 1(f), then the C-V meter will measure the series combination of the two stray capacitances in parallel with the junction capacitance. If, however, the stray capacitances are connected to ground (four-probe), as shown in Fig. 1(d), the true capacitance will be measured. Hence, it is expected that the two-probe method will overestimate the capacitance. This is what has occurred in the measurement of the collector-substrate capacitance shown in Fig. 5(a). This result has been validated on several processes.
The model is applied to the collector-substrate measurement and is shown in Fig. 1. It has a junction capacitance to be measured, which has associated stray capacitances (one of which is the bondpad capacitance), as shown in Fig. 1(e) and (g). This equivalent circuit predicts that the collector-substrate capacitance measured with two probes will overestimate the true value, according to the following expression: (3) where one symbol denotes a series combination, another denotes a parallel combination, and the capacitances on the right-hand side are the true values. As these true values are measured independently using the four-probe technique, this relationship can be tested. This is shown in Fig. 5(b) for two devices with different areas. It is clear that an excellent fit is obtained.
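The floating-versus-grounded argument can be illustrated numerically. The following is a minimal sketch, not the authors' expression (3): the function names, the argument names, and the capacitance values are all hypothetical, chosen only to show how a series pair of floating strays adds to the junction capacitance seen by the meter.

```python
def series(c1, c2):
    """Series combination of two capacitances."""
    return c1 * c2 / (c1 + c2)


def parallel(c1, c2):
    """Parallel combination of two capacitances."""
    return c1 + c2


def two_probe_reading(c_junction, c_stray_a, c_stray_b):
    """Floating strays: their series combination appears in parallel
    with the junction capacitance being measured."""
    return parallel(c_junction, series(c_stray_a, c_stray_b))


def four_probe_reading(c_junction, c_stray_a, c_stray_b):
    """Grounded strays: both are shunted to the meter ground, so only
    the junction capacitance is measured."""
    return c_junction


# Hypothetical values in fF (assumptions, not data from the paper).
c_j, c_a, c_b = 100.0, 400.0, 50.0
two_probe = two_probe_reading(c_j, c_a, c_b)
four_probe = four_probe_reading(c_j, c_a, c_b)
print(f"two-probe: {two_probe:.2f} fF, four-probe: {four_probe:.2f} fF")
```

With these assumed values the two-probe reading exceeds the four-probe reading by the series combination of the two strays, which is the direction of error the text describes.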
To further validate the method, the junction capacitance was extracted from S-parameter measurements. The device under test used ground-signal-ground structures, and an open structure was used for de-embedding. The network analyzer was calibrated using short-open-load-thru (SOLT) calibration on an Impedance Standard Substrate [10], [12]. After the data are transformed to Y-parameters, the capacitance can be determined from them. Fig. 6 shows the measured capacitance over voltage bias. Also included in this figure are the results obtained using the two-probe and the four-probe methods. It is clear that the S-parameter method agrees well with the four-probe method, while the two-probe method overestimates the actual capacitance of the junction.
III. SCALING OF THE CAPACITANCES
Fig. 7 shows that the extracted zero-bias junction capacitances scale very well with emitter area. For greater accuracy, each of the three depletion capacitances can be divided into its area and perimeter components. These components can be extracted by using appropriately scaled test structures. It is recommended that the devices have a large difference in their area-to-perimeter ratios. Using the techniques which have already been outlined, very accurate scalable models have been obtained. In addition to these components, sidewall spacers can make a significant contribution to the junction capacitances [13]. This is modeled in some of the latest bipolar models such as HiCUM [14] and can be extracted by extending the method for the area and perimeter capacitances.

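The separation into area and perimeter components can be sketched as a small least-squares problem. This is an illustrative sketch only: it assumes the simple model C = c_area * A + c_perim * P, and the function name, device geometries, and capacitance values are hypothetical rather than taken from the paper.

```python
def extract_area_perimeter(devices):
    """Least-squares fit of C = c_area * A + c_perim * P (no intercept).

    devices: list of (area, perimeter, capacitance) tuples.
    Returns (c_area, c_perim).
    """
    saa = sum(a * a for a, p, c in devices)
    spp = sum(p * p for a, p, c in devices)
    sap = sum(a * p for a, p, c in devices)
    sac = sum(a * c for a, p, c in devices)
    spc = sum(p * c for a, p, c in devices)
    det = saa * spp - sap * sap
    # The determinant vanishes when every device has the same
    # area-to-perimeter ratio, so the two components cannot be separated.
    if abs(det) <= 1e-12 * saa * spp:
        raise ValueError("devices need different area-to-perimeter ratios")
    c_area = (sac * spp - sap * spc) / det
    c_perim = (saa * spc - sap * sac) / det
    return c_area, c_perim


# Hypothetical devices: (area in um^2, perimeter in um, capacitance in fF),
# generated from c_area = 1.0 fF/um^2 and c_perim = 0.5 fF/um.
devices = [
    (1.0, 4.0, 3.0),     # 1 um x 1 um
    (10.0, 22.0, 21.0),  # 1 um x 10 um
    (25.0, 20.0, 35.0),  # 5 um x 5 um
]
c_area, c_perim = extract_area_perimeter(devices)
print(f"area component: {c_area:.3f} fF/um^2, perimeter component: {c_perim:.3f} fF/um")
```

The determinant check reflects the recommendation in the text: devices with similar area-to-perimeter ratios make the two components nearly indistinguishable, so the fit becomes ill-conditioned.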
IV. BOOTSTRAPPING TECHNIQUE
Once the parameter extraction strategy has been formulated, the next step is to determine the minimum size data set which can be used while maintaining the accuracy of the extracted parameters. This work provides a formal procedure to achieve this goal.
The bootstrapping technique is a data-based simulation method for statistical inference [15]. The basic principle is that when a regression fit is performed, the fitted error at each point is recorded. These errors are then resampled (with replacement) to provide "new" data, and parameter values are extracted for each set of new data. In this way, a confidence interval for the extracted parameter value can be determined. This method uses iteration to replace what may otherwise be a very complex mathematical problem. For our purposes, the bootstrapping technique has to be adapted, as the number of iterations in the original technique is dependent on the data set size, e.g., for 3 points, there are only 3 combinations.
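The basic residual-resampling idea can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure used in the paper: it assumes a straight-line model, a percentile confidence interval, and synthetic data; all function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_bootstrap(xs, ys, n_iter=1000, seed=0):
    """Resample fit residuals with replacement and refit each time.

    Returns the fitted slope and a 95% percentile interval for it.
    """
    rng = random.Random(seed)
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    slopes = []
    for _ in range(n_iter):
        resampled = rng.choices(residuals, k=len(xs))  # with replacement
        new_ys = [f + r for f, r in zip(fitted, resampled)]
        slopes.append(fit_line(xs, new_ys)[1])
    slopes.sort()
    lo = slopes[int(0.025 * n_iter)]
    hi = slopes[min(n_iter - 1, int(0.975 * n_iter))]
    return slope, (lo, hi)


# Synthetic data (an assumption): y = 2 + 0.5 x plus Gaussian noise.
noise = random.Random(42)
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + noise.gauss(0.0, 0.1) for x in xs]
slope, ci = residual_bootstrap(xs, ys)
print(f"slope = {slope:.4f}, 95% interval = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The width of the returned interval is the quantity of interest here: it estimates how much the extracted parameter would vary under repeated measurement without requiring an analytical error model.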
The simplest way to explain the modified technique is by outlining how it is applied to a specific parameter, for example, the zero-bias base-collector depletion capacitance. In this example, a confidence interval for a data set size of 5 is estimated. First, a large data set (the "baseline" data set) is measured (540 points in this example). This is then split into five ranges, as shown in Fig. 8 (only a subset of the measured data points is shown for clarity). Random sampling is then employed to create many "new" data sets, each with a size of 5. A constraint is placed on the sampling to ensure that there is a reasonable spacing between sampled points (there must be a space of at least half a range between selected points). Fig. 8 highlights one data set which was selected. The fit which was obtained using the sampled data is shown in this graph.
This procedure was run 500 times, and the distribution of values extracted using five data points is shown in Fig. 9 (a normal distribution curve is superimposed). From the distribution, the standard deviation normalized to the mean was found to be 0.44%. This is quite small and indicates that five data points should be enough to provide consistent estimates of this capacitance. This procedure can easily be repeated for sampled data sets of different sizes, and the results of this are shown in the boxplot of Fig. 10.
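The sampling procedure described above can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: one point is drawn per range, the extractor is a log-linear fit with a fixed built-in potential (the names `vj`, `extract_cj0`, and the others are hypothetical), and the 540-point baseline is synthetic rather than measured.

```python
import math
import random
import statistics


def sample_one_per_range(n_points, n_ranges, rng):
    """Draw one index from each of n_ranges contiguous ranges, rejecting
    draws where neighbouring picks are closer than half a range."""
    width = n_points // n_ranges
    while True:
        picks = [rng.randrange(r * width, (r + 1) * width) for r in range(n_ranges)]
        if all(b - a >= width // 2 for a, b in zip(picks, picks[1:])):
            return picks


def extract_cj0(voltages, caps, vj=0.75):
    """Zero-bias capacitance from a straight-line fit of ln C against
    ln(1 - V/vj), with vj held at an assumed constant value."""
    xs = [math.log(1.0 - v / vj) for v in voltages]
    ys = [math.log(c) for c in caps]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx)


def subsample_spread(voltages, caps, set_size=5, n_runs=500, seed=0):
    """Repeat the constrained sampling and extraction n_runs times.
    Returns (mean estimate, standard deviation normalized to the mean)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_runs):
        picks = sample_one_per_range(len(voltages), set_size, rng)
        estimates.append(
            extract_cj0([voltages[i] for i in picks], [caps[i] for i in picks])
        )
    mean = statistics.fmean(estimates)
    return mean, statistics.stdev(estimates) / mean


# Synthetic baseline (an assumption): 540 reverse-bias points, 0.2% noise.
noise = random.Random(1)
voltages = [-3.0 + 3.0 * i / 539 for i in range(540)]
caps = [
    100.0 * (1.0 - v / 0.75) ** -0.4 * (1.0 + noise.gauss(0.0, 0.002))
    for v in voltages
]
mean_cj0, spread = subsample_spread(voltages, caps)
print(f"mean = {mean_cj0:.2f} fF, normalized std = {100 * spread:.2f}%")
```

Calling `subsample_spread` with different `set_size` values gives the kind of spread-versus-size comparison the text describes, from which a minimum data set size can be chosen.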
It is also possible to calculate the mean and standard deviations for these distributions. These are shown in Fig. 11 (the standard deviation has been normalized w.r.t. the extracted mean for each data set size used in the bootstrapping procedure). It can be seen from this figure that while the mean parameter value is constant, the standard deviation of the extracted values decreases dramatically as the data set size is increased. Hence, the bootstrapping technique can be used to determine the data set size required for parameter extraction methods. Assuming, for example, that the devices being characterized come from a process with an expected variability of 10%, then we should ensure that any variability introduced by the parameter extraction method is well below this level, e.g., less than 1%. The bootstrapping technique just outlined can be used to choose a measurement data set size to meet this requirement.
TABLE II
COMPARISON BETWEEN THE STANDARD DEVIATION "MEASURED" BY THE BOOTSTRAP TECHNIQUE AND THAT CALCULATED FROM STATISTICAL THEORY (ASSUMING THE VARIATION DECREASES WITH THE SQUARE ROOT OF THE NUMBER OF AVERAGES TAKEN)
In the SPICE Gummel-Poon model, the classical depletion capacitance equations are used. In order to obtain the three parameters of each junction (the zero-bias capacitance, the built-in potential, and the grading coefficient), an optimization must be performed. It is seen when performing this optimization that the built-in potential and the grading coefficient are strongly interdependent. Therefore, more robust results should be obtained by setting the built-in potential to an appropriate constant value and then extracting the zero-bias capacitance and the grading coefficient using linear regression. Fig. 12 shows that this is indeed the case, i.e., for four data points the variability improves by a factor of 7 compared to the case where the built-in potential is unconstrained.
A data set of three bias points was measured a thousand times, and the associated parameters were extracted each time. Table I shows the normalized standard deviation which was obtained using this measured data and compares it with the results from the bootstrap method. It is clear that the bootstrap method predicts the variation in extracted parameters very well.
In addition, a further verification method was used. The LCR meter allows internal averaging to achieve greater accuracy, i.e., if averaging is set to 3, the average capacitance value of three measurements is returned. For further verification of the bootstrap method, three baseline measurements (540 points) were taken with averaging set to 1, 2, and 3. Fig. 13 shows the normalized standard deviation obtained using the bootstrap technique for different averaging conditions on the LCR meter. From statistical theory, the standard deviation should decrease with the square root of the averaging number. For example, the standard deviation for averaging set to 3 should be √3 times smaller than that for averaging set to 1. Table II shows the results from the bootstrap and those predicted by this theory (there is no "theory" column for the "Averaging 1" case because it is the basic measurement on which the other cases are based).
It is clear that the results are very similar (only a 1.3% error between the bootstrap result and the theory for the case of four data points and averaging set to 3), further verifying the validity of this method.
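The reason a fixed built-in potential reduces the extraction to linear regression can be written out from the classical depletion capacitance expression. The symbols C_{J0}, V_J, and M_J below are assumed names for the zero-bias capacitance, built-in potential, and grading coefficient; the notation used in the original equations is not recoverable from the text.

```latex
% Classical depletion capacitance (away from strong forward bias):
C(V) = \frac{C_{J0}}{\left(1 - V/V_J\right)^{M_J}}

% Taking logarithms, with V_J held at a constant value:
\ln C(V) = \ln C_{J0} - M_J \,\ln\!\left(1 - \frac{V}{V_J}\right)
```

With V_J fixed, a straight-line fit of ln C against ln(1 - V/V_J) has slope -M_J and intercept ln C_{J0}, so both remaining parameters follow from a single linear regression rather than from a nonlinear optimization.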
V. CONCLUSION
The on-wafer measurement of junction depletion capacitance has been described in detail and the errors which can occur have been identified and explained. This has led to a greater understanding of measured results. A four-probe method has been adopted which gives consistent results for devices from different processes and shows very good agreement with results from high frequency measurements. In addition, a novel technique which determines the minimum required data set for parameter extraction has been described and verified in detail.
Dermot MacSweeney (M'02) was born in Cork, Ireland. He received the B.E. and M.Eng.Sc. degrees in electrical engineering from University College Cork (UCC), Cork, Ireland.
He is currently working in the New Product Development (NPD) Division of Cypress Semiconductor where he is a senior design engineer. His main interests are in the design of high-speed systems for application in broad-band communication, specifically for SONET and Ethernet applications. He also works on the modeling of SiGe bipolar devices at very high speeds. Previously, he worked in UCC where his main interests were in compact modeling and parameter extraction. His research involved the development of subcircuit models, the analysis of high frequency effects, efficient parameter extraction techniques, and statistical analysis of bipolar devices.
