A 12-bit high-speed column-parallel two-step single-slope (SS) analog-to-digital converter (ADC) for CMOS image sensors is proposed. The proposed ADC employs a single ramp voltage and multiple reference voltages, and the conversion is divided into a coarse phase and a fine phase to improve the conversion rate. An error calibration scheme is proposed to correct errors caused by offsets among the reference voltages. The digital-to-analog converter (DAC) used for the ramp generator is based on a split-capacitor array with an attenuation capacitor, and the DAC's linearity versus capacitor mismatch and parasitic capacitance is analyzed. A prototype 1024 × 32 Time Delay Integration (TDI) CMOS image sensor with the proposed ADC architecture has been fabricated in a standard 0.18 μm CMOS process. The proposed ADC has an average power consumption of 128 μW and a conversion rate 6 times higher than that of the conventional SS ADC. A high-quality image captured at a line rate of 15.5 k lines/s shows that the proposed ADC is suitable for high-speed CMOS image sensors.
Introduction
Owing to their low power, low cost and flexible integration with on-chip circuits, CMOS image sensors (CIS) have grown explosively in recent years and have become competitive with charge-coupled devices (CCD), particularly in high-speed videography. Three analog-to-digital converter (ADC) architectures are used in CMOS image sensors: the single-channel ADC, the column-parallel ADC and the pixel-level ADC. The column-parallel ADC is the most widely used because it provides a better tradeoff among readout speed, silicon area and power consumption. Successive approximation register (SAR) ADCs [1, 2], cyclic ADCs [3] and single-slope (SS) ADCs [4] are commonly employed as column-parallel ADCs. SAR ADCs have been used in high-speed image sensors, but they occupy a large silicon area. Cyclic ADCs occupy less area while providing high speed, but the high-speed operational amplifier (op-amp) in each column consumes more power. SS ADCs are the most widely used in CMOS image sensors because of their simplicity, low power consumption, high linearity and small area. Moreover, they ensure uniformity between columns and thus minimize column fixed-pattern noise (FPN). However, SS ADCs have the disadvantage of low conversion speed: each n-bit conversion requires 2^n clock periods. Although high-speed SS ADCs have recently been reported [5], they use a very high clock frequency, which in turn leads to high power consumption.
To solve the problem of low speed, two-step SS ADCs have recently been reported [6–10]. In these designs, the analog-to-digital (AD) conversion is carried out in two steps: a coarse conversion and a fine conversion. In [6], multiple ramp voltages are used for the fine conversion to increase the conversion rate; however, the multiple ramp generators increase area and power consumption. The designs in [7, 9] store the analog coarse voltage on a capacitor inside each column, requiring only one ramp voltage. However, with a high pixel density, the ramp generator must drive a huge total column capacitance, which substantially degrades settling accuracy and speed and consumes high power. This paper proposes a new two-step SS ADC that uses a single ramp voltage and multiple reference voltages. The proposed ADC has a significantly higher conversion rate than the conventional SS ADC and consumes less power and area than the multiple-ramp ADC. The remainder of this paper is organized as follows. Section 2 describes the operating principle of the proposed ADC and the error calibration. Section 3 discusses the implementation details of a prototype Time Delay Integration (TDI) CMOS image sensor. Section 4 presents the experimental results, and Section 5 provides the conclusions.
Proposed Two-Step SS ADC Architecture

Operation Principle
The basic concept of the proposed ADC is to divide the n-bit AD conversion into a p-bit coarse conversion and a q-bit fine conversion, where n = p + q in the ideal case. The block diagram of the proposed two-step SS ADC is shown in Figure 1. A ramp and reference generator, a counter and a control block are shared by all column circuits. The ramp and reference generator outputs one ramp voltage and K reference voltages to drive all the columns, where K = 2^p. As shown in Figure 2, these K reference voltages divide the entire input range, Vref = Vrefp − Vrefn, evenly into K sections, each with a range of ΔV_C = Vref/K. The ramp voltage spans 1/K of the input range, from Vref,K−1 to Vrefp. For simplicity, Vrefp and Vrefn are assumed to be connected to Vref and ground, respectively. Therefore, in the ideal case, Vref,k and Vramp can be expressed as:

$$V_{ref,k} = k \cdot \Delta V_C = \frac{k}{K} V_{ref}, \qquad k = 0, 1, \ldots, K-1 \tag{1}$$

$$V_{ramp} = V_{ref,K-1} + c \cdot \frac{\Delta V_C}{2^q}, \qquad c = 0, 1, \ldots, 2^q - 1 \tag{2}$$

where c is the counter value during the fine phase.
In addition, each column circuit consists of a comparator, a set of switches, logic gates and memory. The switches S0 ~ SK−1 are used to connect one of the K reference voltages to the input of the comparator.
Compared with an SS ADC, the proposed structure requires only a few additional switches and some extra digital circuitry in each column.
Figure 3 illustrates the operation of the proposed ADC with a timing diagram. The AD conversion is divided into coarse and fine phases.
Coarse Phase
In the coarse phase, the counter value decreases from K−1 to 0, and the analog switches SK−1 ~ S0 are turned on in sequence, connecting Vref,K−1 ~ Vref,0, respectively, to the IN1− terminal of the comparator. Meanwhile, the ramp voltage Vramp, which is connected to the IN1+ terminal, is held at its maximum value, Vrefp. The waveforms at the IN1+ and IN1− terminals are shown in Figure 3a. As a consequence, the differential voltage between IN1+ and IN1−, VR = VIN1+ − VIN1−, is a coarse ramp signal with a step of ΔV_C, as shown in Figure 3b. When VR exceeds the input differential signal Vsig = VIN2+ − VIN2−, the comparator output changes to high, as shown in Figure 3b,c. The counter value is then stored in the column memory as the coarse conversion result C. In the case shown in Figure 3, C equals 1.
Fine Phase
The fine conversion phase is then performed. The coarse conversion result C is fed back to the analog switches, which connect the corresponding reference voltage to the comparator: SC is closed, and the IN1− terminal is connected to Vref,C, as shown in Figure 3a. Meanwhile, Vramp outputs a ramp signal that spans ΔV_C, from Vref,K−1 to Vrefp. As a result, VR becomes a fine ramp signal that intersects Vsig. When VR exceeds Vsig, the comparator output changes from low to high, and the counter value is stored in the column memory as the fine conversion result F. Ignoring the quantization error, the comparator triggers when:

$$V_{sig} = V_{ramp} - V_{ref,C} = \left(V_{ref,K-1} - V_{ref,C}\right) + F \cdot \frac{\Delta V_C}{2^q} \tag{3}$$

where Vref,K−1 − Vref,C = (K − C − 1)·ΔV_C. Therefore, the final digital output is obtained by:

$$D_{out} = (K - C - 1) \cdot 2^q + F = \frac{2^n}{V_{ref}} V_{sig} \tag{4}$$
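The two phases can be checked with a small behavioural model. The sketch below is ours, not the authors' implementation: it assumes ideal references (Vref,k = k·ΔV_C with Vrefn at ground), no offsets or noise, and it takes the latched fine count F to be the last count before VR exceeds Vsig, which makes the output a plain floor quantizer. The function name `two_step_ss_convert` is hypothetical.

```python
def two_step_ss_convert(v_sig, v_ref=1.0, p=3, q=9):
    """Behavioural model of one ideal two-step SS conversion (no offsets, no noise).

    Returns (D_out, C, F). Vrefn is taken as ground, so Vrefp = v_ref.
    """
    if not 0.0 <= v_sig < v_ref:
        raise ValueError("v_sig must lie in [0, v_ref)")
    K = 2 ** p
    dv_c = v_ref / K                          # coarse step Delta V_C = Vref / K
    v_lsb = dv_c / 2 ** q                     # fine step = Vref / 2**n
    v_refk = [k * dv_c for k in range(K)]     # Vref,k = k * Delta V_C

    # Coarse phase: Vramp is parked at Vrefp while S_{K-1} ... S_0 close in turn,
    # so VR = Vrefp - Vref,c grows by Delta V_C per clock; latch the first c with VR > v_sig.
    C = next(c for c in range(K - 1, -1, -1) if v_ref - v_refk[c] > v_sig)

    # Fine phase: IN1- is tied to Vref,C and Vramp steps up from Vref,K-1 by one LSB per clock.
    def vr(f):
        return v_refk[K - 1] + f * v_lsb - v_refk[C]

    F = 0                                     # assumed convention: last count before VR exceeds v_sig
    while F + 1 < 2 ** q and vr(F + 1) <= v_sig:
        F += 1

    D_out = (K - C - 1) * 2 ** q + F          # combine coarse and fine results
    return D_out, C, F


D, C, F = two_step_ss_convert(0.3)            # p = 3, q = 9, Vref = 1 V
print(D, C, F)                                # 1228 5 204
```

Under these assumptions the model reproduces floor(Vsig·2^n/Vref) for every input, which is what Equation (4) states once quantization is included.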
The conversion time of the proposed ADC is therefore reduced to 2^p + 2^q clock cycles for an n-bit conversion, whereas the conventional SS ADC requires 2^n = 2^p · 2^q clock cycles. Because a sum of two powers of two is much smaller than their product once p and q exceed a few bits, the proposed structure is considerably faster than the conventional SS ADC.
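The cycle-count comparison can be tabulated directly. This is an idealized count only (2^p coarse clocks plus 2^q fine clocks); it ignores sampling, auto-zero and redundancy overhead, so the speed-ups below are upper bounds rather than figures for the prototype. The helper name is hypothetical.

```python
def two_step_cycles(p, q):
    """Clock cycles for one two-step conversion: 2**p coarse + 2**q fine (ideal, no overhead)."""
    return 2 ** p + 2 ** q


n = 12
ss_cycles = 2 ** n                                   # conventional SS ADC: 4096 cycles
table = {p: two_step_cycles(p, n - p) for p in range(1, n)}
best_p = min(table, key=table.get)                   # split with the fewest cycles

for p, cycles in table.items():
    print(f"p={p:2d} q={n - p:2d} cycles={cycles:5d} speed-up={ss_cycles / cycles:5.1f}x")
```

For 12 bits the even split p = q = 6 needs only 128 cycles, a 32-fold reduction in this idealized count; the implementation section explains why the prototype does not use that split.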
In [6], the multiple-ramp ADC employs eight ramp voltages, each of which needs a power-hungry buffer to drive all the columns. In the proposed structure, only one ramp voltage with such a buffer is used. Although each reference voltage also needs a buffer, that buffer requires only a low unity-gain bandwidth and consumes little power because it buffers a DC voltage. Therefore, the total buffer power is much lower in the proposed structure than in the multiple-ramp ADC.
Error Calibration
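In practice, offsets among the reference voltages and the ramp voltage cause serious performance degradation. As shown in Figure 4, the solid lines represent the ideal reference voltages, and the shaded regions represent the probable range of the real reference voltages with offset. Assuming that the offset of Vref,k is Voffset,k and that of the ramp voltage is Voffset,ramp, Equation (3) becomes:

$$V_{sig} = \left(V_{ramp} + V_{offset,ramp}\right) - \left(V_{ref,C} + V_{offset,C}\right) = (K - C - 1)\cdot \Delta V_C + F \cdot \frac{\Delta V_C}{2^q} + V_{offset,ramp} - V_{offset,C} \tag{5}$$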
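Then, the relation between the digital output Dout and the input voltage Vsig can be obtained as follows:

$$D_{out} = (K - C - 1) \cdot 2^q + F = \frac{2^n}{V_{ref}} \left(V_{sig} - V_{offset,ramp} + V_{offset,C}\right) \tag{6}$$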
As can be seen, the offsets introduce errors that depend on the coarse result C into the digital output and seriously degrade the linearity of the ADC. Furthermore, the offsets lead to an uncertain shift of VR during the fine phase, which can result in dead bands in the final digital output.
This problem is corrected by the following auto-calibration algorithm. An additional column circuit, called the calibration column, is added to the original structure. Its differential input is connected directly to Vramp and Vref,0, as illustrated in Figure 5. A calibration block, which runs the auto-calibration algorithm, is placed on the readout bus. In addition, to avoid dead bands, the ramp range is extended so that it spans from Vref,K−1 − ΔVex to Vrefp. This extension corresponds to introducing one bit of redundancy into the fine phase.
In the sampling phase, the ramp generator outputs a test voltage Vtest,m corresponding to the middle of each fine-conversion subsection, expressed as:

$$V_{test,m} = \left(m + \tfrac{1}{2}\right) \Delta V_C, \qquad m = 0, 1, \ldots, K-1 \tag{7}$$
The calibration column samples the difference between Vtest,m and Vref,0. Its conversion process is the same as that of the other columns described above. In the coarse phase, the coarse result C equals K − m − 1, and the calibration column selects Vref,C after the coarse phase. The subsequent fine phase therefore compares Vtest,m − Vref,0 with Vramp − Vref,C. When the comparator triggers,

$$\left(V_{ramp} + V_{offset,ramp}\right) - \left(V_{ref,C} + V_{offset,C}\right) = \left(V_{test,m} + V_{offset,ramp}\right) - \left(V_{ref,0} + V_{offset,0}\right) \tag{8}$$
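Similar to Equation (6), the digital output of the calibration column can be expressed as:

$$D_{cali} = (K - C - 1) \cdot 2^q + F_{cali} = \frac{2^n}{V_{ref}} \left(V_{test,m} - V_{offset,0} + V_{offset,C}\right) \tag{9}$$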
When a general column and the calibration column have the same coarse result C, Equations (6) and (9) give:

$$D_{out} - D_{cali} + \frac{2^n}{V_{ref}} V_{test,m} = \frac{2^n}{V_{ref}} \left(V_{sig} - V_{offset,ramp} + V_{offset,0}\right) \tag{10}$$
The left-hand side is regarded as the corrected digital result. The right-hand side shows that Voffset,C has been removed compared with Equation (6). Although the right-hand side still contains the term −Voffset,ramp + Voffset,0, the error it introduces is constant over the entire range and only shifts the overall ADC transfer curve, so it does not affect the linearity of the ADC. Moreover, since the two columns have the same C, the final result can be simplified further to:

$$D_{out} - D_{cali} + \frac{2^n}{V_{ref}} V_{test,m} = F - F_{cali} + \frac{2^n}{V_{ref}} V_{test,m}, \qquad m = K - C - 1 \tag{11}$$
where F and Fcali are the fine ADC results of a general column and the calibration column respectively. The auto-calibration algorithm is based on Equation (11). The calibration column samples a different Vtest,m per conversion, where m varies from 0 to K−1 in sequence. The results of coarse and fine conversion are transmitted to the calibration block, where the results corresponding to each Vtest,m are averaged and stored in a lookup table. Meanwhile, the results of the general columns are accessed and transmitted to the calibration block one by one. Then according to the coarse result, the calibration block will find the corresponding fine result of the calibration column from the lookup table. Finally, according to Equation (11), the auto-calibration procedure works out the final output.
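The lookup-table procedure can be illustrated with a noiseless behavioural sketch. Everything here is an assumption for illustration, not the authors' code: voltages are in LSB units (1 LSB = Vref/2^12) with Vrefn at ground, the offsets are made-up values of a few tens of LSB, the ramp extension ΔVex is arbitrarily set to a quarter of a coarse subsection, the test level Vtest,m = (m + 1/2)·ΔV_C is taken to come from the ramp generator and so carries Voffset,ramp, and the averaging step is omitted because the model has no noise. All function names are hypothetical.

```python
P, Q = 3, 9                      # coarse bits / nominal fine bits (n = 12)
K, N_SUB = 2 ** P, 2 ** Q        # K references; one coarse subsection = 2**Q LSB
EXT = N_SUB // 4                 # hypothetical ramp extension Delta V_ex, in LSB


def convert(v_in, ref_off, ramp_off):
    """One two-step conversion with offsets; voltages in LSB, Vrefn = 0.

    ref_off[k] plays the role of Voffset,k and ramp_off of Voffset,ramp.
    Returns (C, F), with F counted from the extended ramp start.
    """
    vref = [k * N_SUB + ref_off[k] for k in range(K)]
    vrefp = K * N_SUB + ramp_off                   # coarse phase: ramp parked at Vrefp
    C = next((c for c in range(K - 1, -1, -1) if vrefp - vref[c] > v_in), 0)
    start = (K - 1) * N_SUB - EXT + ramp_off       # fine ramp starts at Vref,K-1 - Delta V_ex
    F = 0
    while start + (F + 1) - vref[C] <= v_in:       # F = last count before VR exceeds v_in
        F += 1
    return C, F


def build_lut(ref_off, ramp_off):
    """Calibration column: convert Vtest,m - Vref,0 for m = 0..K-1, keyed by coarse code."""
    lut = {}
    for m in range(K):
        v_test = (m + 0.5) * N_SUB                 # ideal test level, middle of subsection m
        # Assumption: the test level comes from the ramp generator, so it carries ramp_off.
        C, F_cali = convert(v_test + ramp_off - ref_off[0], ref_off, ramp_off)
        lut[C] = (F_cali, v_test)                  # hardware would average several conversions
    return lut


def corrected_output(C, F, lut):
    F_cali, v_test = lut[C]
    return F - F_cali + v_test                     # Equation (11): Voffset,C cancels


def raw_output(C, F):
    return (K - C - 1) * N_SUB + F - EXT           # uncalibrated code, extension removed


ref_off = [4.0, 12.0, -9.0, 15.0, -14.0, 6.0, -3.0, 10.0]   # hypothetical offsets, LSB
ramp_off = 7.5
lut = build_lut(ref_off, ramp_off)
raw_err, cal_err = [], []
for i in range(40, 2 ** 12 - 40, 3):
    v = i + 0.37
    C, F = convert(v, ref_off, ramp_off)
    raw_err.append(raw_output(C, F) - v)
    cal_err.append(corrected_output(C, F, lut) - v)
print(max(raw_err) - min(raw_err), max(cal_err) - min(cal_err))
```

In this model the uncalibrated error jumps by the reference offset of each subsection, while the calibrated error collapses to the constant −Voffset,ramp + Voffset,0 plus at most about 1 LSB of quantization error, in line with Equations (10)–(11).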
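Although the error caused by offsets among the references is corrected by the proposed scheme, the quantization error of the calibration column still affects the linearity. Taking the quantization error into account, Equation (10) becomes:

$$D_{out} - D_{cali} + \frac{2^n}{V_{ref}} V_{test,m} = \frac{2^n}{V_{ref}} \left(V_{sig} - V_{offset,ramp} + V_{offset,0}\right) + e_{col} - e_{cali,k} \tag{12}$$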
where ecol and ecali,k are the quantization errors of the general column and of the calibration column (for subsection k), respectively; in the ideal case both range from −0.5 LSB to 0.5 LSB. Hence, the quantization error of the final output is eout = ecol − ecali,k. The influence of ecali,k is illustrated in Figure 6. Figure 6a shows the ideal curve of the quantization error ecol versus input voltage. Because ecali,k takes an uncertain value for each k, the portion of the curve corresponding to each subsection is shifted by a small, uncertain amount, as shown in Figure 6b. In the worst case, when |ecali,k| is 0.5 LSB, |eout| may reach 1 LSB, and the INL increases by 0.5 LSB. Furthermore, when ecali,k−1 and ecali,k are ±0.5 LSB and ∓0.5 LSB, respectively, the DNL worsens by 1 LSB.
Implementation
Proposed Image Sensor Architecture
A prototype 32-stage TDI image sensor with the proposed ADC architecture has been implemented in a standard 0.18 μm CMOS process. A TDI camera is a special type of line-scan camera that captures images through an array of pixels operating in line-scan mode [11, 12]. Owing to its multi-stage integration process, such a camera can produce high-quality, low-noise images at high scanning speed, even under low illumination. Figure 7 depicts a block diagram of the imager. The prototype consists of a 1024 × 32 pixel array, column-parallel analog accumulators, column-parallel ADCs, a ramp and reference generator, logic controllers, horizontal shift registers and calibration blocks. Each column has an analog accumulator [11] that performs the 32-stage TDI operation, and the accumulator's output is then quantized by the proposed 12-bit column-parallel ADC. Finally, all conversion results are shifted out by the horizontal shift registers and transmitted to the calibration block.
As mentioned previously, the AD conversion is divided into p-bit coarse and q-bit fine conversions, and p and q must be chosen carefully. The minimum conversion time occurs when p = q = 6 for 12-bit resolution. However, this choice requires 64 reference voltages, and several practical problems make such a large number of references difficult to implement. First, each reference voltage needs a signal line routed to all columns and a switch in each column, so too many references would occupy too much area. Second, offsets between the references may result in dead bands in the digital output. This problem can be solved by extending the range of the ramp to create some overlap between adjacent subsections. The amount of overlap is fixed by the expected magnitude of the offsets, whereas the ramp span, ΔV_C = Vref/K, shrinks as the number of references grows; therefore, more references increase the fraction of the ramp taken up by the overlap. Based on these limitations, the 12-bit conversion in the prototype is divided into a 3-bit coarse conversion and a 10-bit fine conversion: eight references are used, and one bit of redundancy is introduced into the fine phase for calibration.
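The tradeoff can be made concrete with a rough tabulation. The sketch assumes a fixed overlap (ramp extension) of 32 LSB, a purely hypothetical value standing in for "the expected magnitude of offsets", and counts one fine clock per LSB of ramp span; it uses the number of references as a crude proxy for per-column switch and routing area. The helper name is hypothetical.

```python
def split_tradeoff(n=12, overlap_lsb=32):
    """Tabulate coarse/fine splits of an n-bit two-step SS ADC with a fixed ramp overlap.

    overlap_lsb is a hypothetical ramp extension (Delta V_ex) in LSB; it is set by
    the expected reference offsets, so it does not shrink when p grows.
    """
    rows = []
    for p in range(1, n):
        q = n - p
        refs = 2 ** p                            # reference lines, and switches per column
        fine_span = 2 ** q + overlap_lsb         # fine ramp: one subsection plus the overlap
        cycles = 2 ** p + fine_span              # coarse clocks + fine clocks
        rows.append({"p": p, "q": q, "refs": refs, "cycles": cycles,
                     "overlap_fraction": overlap_lsb / fine_span})
    return rows


for r in split_tradeoff():
    print(f"p={r['p']:2d} refs={r['refs']:5d} cycles={r['cycles']:5d} "
          f"overlap={r['overlap_fraction']:.1%}")
```

With these assumptions the cycle count is still minimized at p = 6, but that point already needs 64 reference lines and spends a third of every fine ramp on overlap, which illustrates why a smaller p such as 3 can be preferred in practice.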
Ramp Generator
The performance of the ramp generator determines the accuracy and linearity of the proposed ADC. The digital-to-analog converter (DAC) used for the ramp generator is based on a binary-weighted split-capacitor array with an attenuation capacitor [13, 14], as illustrated in Figure 8. A capacitive-array DAC draws no quiescent current, and capacitors match better than resistors. The attenuation capacitor reduces the total capacitance dramatically: it splits the n-bit capacitive-array DAC into p LSBs and q = n − p MSBs, so the total capacitance is reduced to

$$C_{total} = \left(2^p + 2^q - 1\right) C_0 \tag{13}$$
where C0 is the unit capacitance. Unlike the conventional split-capacitor array in [13], the attenuation capacitor here is a unit capacitor rather than a fractional capacitance, so it matches well with the other capacitors. Figure 9 shows an alternative to the structure described above [14]. The unary-weighted split-capacitor array uses thermometer code instead of binary code to control the switches of the capacitor array. This structure offers low differential nonlinearity (DNL) and guarantees the monotonicity of the ramp. In addition, only one switch changes state per code step, except when the LSB code rolls over from 111…11 to 000…00, so the jitter of the ramp voltage is relatively small. Capacitor mismatch and parasitic capacitance, which affect linearity metrics such as integral nonlinearity (INL) and DNL, are the dominant factors for a medium-resolution DAC. Therefore, a comparative analysis of the linearity degradation due to capacitor mismatch is presented first, in which the standard deviations of INL and DNL are calculated as functions of the standard deviation of the capacitance variation. The effect of parasitic capacitance on linearity is then discussed.
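The switching-activity argument can be checked by counting how many capacitor switches change state between consecutive codes. This is only a rough proxy for ramp glitches (it ignores switch timing and capacitor size), and it assumes the 6-bit LSB / 6-bit MSB split used later in this section; the helper names are hypothetical.

```python
def thermometer(value, width):
    """Thermometer code: the first `value` of `width` switches are on."""
    return [1] * value + [0] * (width - value)


def unary_switches(code, p, q):
    dm, dl = divmod(code, 2 ** p)               # MSB and LSB parts of the DAC code
    return thermometer(dl, 2 ** p - 1) + thermometer(dm, 2 ** q - 1)


def binary_switches(code, n):
    return [(code >> i) & 1 for i in range(n)]  # one switch per binary-weighted capacitor


def toggles(a, b):
    return sum(x != y for x, y in zip(a, b))


p = q = 6
n = p + q
unary = [toggles(unary_switches(c, p, q), unary_switches(c + 1, p, q)) for c in range(2 ** n - 1)]
binary = [toggles(binary_switches(c, n), binary_switches(c + 1, n)) for c in range(2 ** n - 1)]
print(max(binary), sum(binary) / len(binary), sorted(set(unary)))
```

The unary array toggles a single switch on every step except the LSB rollover, where all 63 LSB switches reset and one MSB switch turns on; the binary array toggles several switches on many steps, up to all 12 at mid-scale.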
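For the binary-weighted structure, the output voltage of the DAC is calculated from:

$$V_{out} = \frac{C_a \sum_{i=0}^{p-1} b_{L,i} C_{L,i} + \left(C_a + C_L\right) \sum_{j=0}^{q-1} b_{M,j} C_{M,j}}{C_L C_M + C_a \left(C_L + C_M\right)} \, V_{ref} \tag{14}$$

where Ca is the attenuation capacitor, b_{L,i} and b_{M,j} are the LSB and MSB control bits, and CL and CM are the total capacitance of the LSB and MSB array, respectively:

$$C_L = \sum_{i=0}^{p-1} C_{L,i} + C_{pL}, \qquad C_M = \sum_{j=0}^{q-1} C_{M,j} + C_{pM} \tag{15}$$

where CpL and CpM are the parasitic capacitance in the LSB and MSB array, respectively.
When the capacitor mismatch and the parasitic capacitance are ignored (C_{L,i} = C_{M,i} = 2^i C0 and Ca = C0), Equation (14) becomes:

$$V_{out} = \frac{D}{2^n - 1} V_{ref}, \qquad D = 2^p \sum_{j=0}^{q-1} b_{M,j} 2^j + \sum_{i=0}^{p-1} b_{L,i} 2^i \tag{16}$$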
To analyze the effect of capacitor mismatch on the linearity of the capacitive-array DAC, the parasitic capacitance is ignored, and each capacitor is modeled as the sum of its nominal value and an error term [15, 16]. Therefore, for the binary-weighted structure, each capacitor is given by:

$$C_i = 2^i C_0 + \delta_i \tag{17}$$

where δi is a random variable with zero mean and a variance of:

$$\sigma_{\delta_i}^2 = 2^i \sigma_0^2 \tag{18}$$
where σ0 is the standard deviation of the unit capacitance.
Using the method presented in [15] , the variance of the INL and DNL can be calculated from:
Similarly, for the unary structure, the output can be also obtained from Equations (14) and (16) . The variance of the INL and DNL can be calculated from:
In summary, Equations (19)–(22) show that the variance of INL in the binary structure is the same as that in the unary structure, whereas the variance of DNL in the unary structure is smaller by a factor of 2 than in the binary one. Therefore, the unary structure has better linearity than the binary one.
In order to analyze the effect of the parasitic capacitance on the linearity characteristics, the capacitor mismatch is ignored. For the binary-weighted structure, INL can be calculated as: CpL can be given as:
where α is the ratio of the parasitic capacitance to a capacitor, which is mainly dependent on the process. Therefore, Equation (25) 
For the unary structure, the effect of the parasitic capacitance on the linearity characteristics is identical to that in the binary-weighted array, indicated by Equations (27) and (28).
According to Equations (14), (25) and (28), CpL degrades the linearity, whereas CpM only causes a gain error without affecting the linearity. Therefore, the top plate of the attenuation capacitor, which has less parasitic capacitance, is connected to the LSB array to reduce CpL. In addition, Equations (27) and (28) show that the linearity depends on p. Figure 10 shows the behavioral simulation results of the DNL and INL caused by the parasitic capacitance versus p for a 12-bit DAC with α = 1%. Reducing p alleviates the nonlinearity at the cost of a larger total capacitance; Figure 11 shows the total number of unit capacitors versus p. The distribution of bits between the MSB and LSB arrays should therefore balance linearity against total capacitance. Based on this discussion, a 12-bit unary-weighted split-capacitor array with an attenuation capacitor, split into 6-bit LSB and 6-bit MSB arrays, is employed in this implementation. Thus, only 127 unit capacitors are required (63 in each array plus the attenuation capacitor).
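The p-dependence can be reproduced qualitatively with a behavioural model of the unary split array. The sketch evaluates the charge-conservation expression V_out/V_ref = [Ca·(switched LSB capacitance) + (Ca + CL)·(switched MSB capacitance)] / [CL·CM + Ca·(CL + CM)], which is our derivation for this topology with CL and CM including their parasitics. The parasitic model CpL = α·(2^p − 1)·C0 is an assumption, not the paper's formula for CpL, so the absolute INL/DNL values should not be read as the values plotted in Figure 10; only the trends are meant to carry over. Helper names are hypothetical.

```python
def split_dac_levels(p, q, alpha=0.0, cpm=0.0):
    """Output levels (units of Vref) of a unary split-capacitor DAC with C0 = 1.

    LSB array: 2**p - 1 unit caps; MSB array: 2**q - 1 unit caps; Ca = C0.
    Assumed parasitic model: CpL = alpha * (2**p - 1) * C0; cpm is CpM in units of C0.
    """
    ca = 1.0
    cl = (2 ** p - 1) * (1.0 + alpha)          # LSB-node total, including CpL
    cm = (2 ** q - 1) + cpm                    # MSB-node total, including CpM
    den = cl * cm + ca * (cl + cm)
    # Code D = dm * 2**p + dl; thermometer coding switches dl LSB and dm MSB unit caps.
    return [(ca * dl + (ca + cl) * dm) / den
            for dm in range(2 ** q) for dl in range(2 ** p)]


def max_inl_dnl(levels):
    """Endpoint-fit max |INL| and max |DNL| in LSB."""
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    inl = [(v - levels[0]) / lsb - i for i, v in enumerate(levels)]
    dnl = [(b - a) / lsb - 1.0 for a, b in zip(levels, levels[1:])]
    return max(map(abs, inl)), max(map(abs, dnl))


def unit_caps(p, q):
    return 2 ** p + 2 ** q - 1                 # LSB array + MSB array + attenuation cap


n = 12
for p in range(1, n):
    inl, dnl = max_inl_dnl(split_dac_levels(p, n - p, alpha=0.01))
    print(f"p={p:2d} units={unit_caps(p, n - p):5d} INL={inl:6.3f} DNL={dnl:6.3f}")
```

In this model the DNL spike appears at every LSB-array rollover and grows with p, while adding CpM only rescales all levels and leaves INL and DNL unchanged, consistent with the statement that CpM causes only a gain error.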
The minimum value of the unit capacitor C0 is determined by the capacitor-matching requirement, which can be estimated through Monte Carlo simulation. Each unit capacitor is modeled as an independent, identically distributed Gaussian random variable whose relative standard deviation σ0/C0 is regarded as the capacitance mismatch. For each value of σ0/C0, 10,000 Monte Carlo simulations were performed to obtain the DNL and INL. Figure 12 shows the INL histograms of the 10,000 simulations for several values of σ0/C0, and Figure 13 shows the corresponding DNL histograms. The design yield is defined as the probability that the DAC satisfies INL < 1 LSB (or, separately, DNL < 0.5 LSB). The design yield as a function of the unit-capacitance standard deviation is shown in Figure 14. With a standard deviation of 0.1%, a yield of 99% for INL < 1 LSB and 100% for DNL < 0.5 LSB can be guaranteed. Therefore, a unit capacitor of 2 pF, corresponding to a standard deviation of 0.1%, is selected.
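A scaled-down version of this Monte Carlo flow can be sketched as follows. It assumes the 6-bit/6-bit unary split, draws every unit capacitor (including the attenuation capacitor) as C0·(1 + σ·N(0, 1)), ignores parasitics, and uses endpoint-fit INL/DNL; it runs a few hundred trials rather than 10,000, so its yield estimates are coarse and are not the data behind Figure 14. The function names are hypothetical.

```python
import random


def mc_split_dac(p=6, q=6, sigma=0.001, trials=200, seed=0):
    """Monte Carlo max |INL| / max |DNL| (LSB) of a unary split-capacitor DAC.

    Every unit capacitor, including the attenuation capacitor, is drawn as
    C0 * (1 + sigma * N(0, 1)) with C0 = 1; parasitic capacitance is ignored.
    """
    rng = random.Random(seed)

    def unit():
        return 1.0 + sigma * rng.gauss(0.0, 1.0)

    def prefix_sums(caps):
        sums = [0.0]
        for c in caps:
            sums.append(sums[-1] + c)
        return sums

    results = []
    for _ in range(trials):
        s_l = prefix_sums([unit() for _ in range(2 ** p - 1)])
        s_m = prefix_sums([unit() for _ in range(2 ** q - 1)])
        ca = unit()
        cl = s_l[-1]
        # Numerator of Equation (14) with thermometer coding; the common denominator
        # only scales the curve and cancels in endpoint-fit INL/DNL.
        levels = [ca * s_l[dl] + (ca + cl) * s_m[dm]
                  for dm in range(2 ** q) for dl in range(2 ** p)]
        lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
        inl = max(abs((v - levels[0]) / lsb - i) for i, v in enumerate(levels))
        dnl = max(abs((b - a) / lsb - 1.0) for a, b in zip(levels, levels[1:]))
        results.append((inl, dnl))
    return results


def design_yield(results, inl_limit=1.0, dnl_limit=0.5):
    """Fractions of trials meeting INL < inl_limit and DNL < dnl_limit (separately)."""
    n = len(results)
    return (sum(inl < inl_limit for inl, _ in results) / n,
            sum(dnl < dnl_limit for _, dnl in results) / n)


for s in (0.0005, 0.001, 0.002):
    y_inl, y_dnl = design_yield(mc_split_dac(sigma=s))
    print(f"sigma0/C0={s:.2%}: yield INL<1 LSB {y_inl:.1%}, DNL<0.5 LSB {y_dnl:.1%}")
```

Because the MSB unit capacitors each carry a weight of about 64 LSB, their accumulated mismatch dominates the INL, which is why the INL criterion sets the unit-capacitor size before the DNL criterion does.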
Reference Generator
The reference generator is implemented with a resistor string, as shown in Figure 15. A string of eight equal resistors connected between two reference voltages acts as a voltage divider that generates eight reference voltages. The reference voltages are then buffered by folded-cascode op-amps to drive all the column circuits. Although resistor mismatch and buffer offsets give rise to offsets between the reference voltages, these offsets are corrected by the calibration algorithm described above, so resistor matching is not critical.
Figure 15. The reference generator based on a resistor string.
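The divider relation, and how resistor mismatch turns into the reference offsets Voffset,k handled by the calibration, can be sketched as below. It assumes Vref,0 is taken at the bottom of the string (Vrefn) and that each resistor has a hypothetical 1% Gaussian mismatch; buffer offsets are not modeled, and the helper name is hypothetical.

```python
import random


def resistor_string_taps(v_refn, v_refp, resistors):
    """Tap voltages of a series resistor string, bottom tap first.

    Tap k sits above the first k resistors, so tap 0 = v_refn and, for equal
    resistors, tap k = v_refn + k / len(resistors) * (v_refp - v_refn).
    """
    total = sum(resistors)
    taps, acc = [], 0.0
    for r in resistors:
        taps.append(v_refn + (v_refp - v_refn) * acc / total)
        acc += r
    return taps


ideal = resistor_string_taps(0.0, 1.0, [1.0] * 8)        # Vref,k = k/8 * Vref
rng = random.Random(0)
mismatched = [1.0 + 0.01 * rng.gauss(0.0, 1.0) for _ in range(8)]
real = resistor_string_taps(0.0, 1.0, mismatched)
offsets = [r - i for r, i in zip(real, ideal)]            # per-tap Voffset,k, in units of Vref
print([f"{o * 1e3:+.2f} mV" for o in offsets])            # assuming Vref = 1 V
```

Because the resulting offsets are static, the calibration column can measure them once per subsection, which is why tight resistor matching is not required.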
Although the reference generator occupies more chip area and consumes more power than the ramp generator of a conventional SS ADC, it is shared by all columns, so the average area and power per column remain quite low. Moreover, as discussed in Section 2, the total buffer power of the proposed structure is much lower than that of the multiple-ramp ADC in [6], which needs eight power-hungry ramp buffers: each reference buffer here handles only a DC voltage and therefore needs only a low unity-gain bandwidth and little power.
Column Circuits
Figure 16 shows a simplified block diagram of the column-level circuitry. Each column consists of a comparator, a set of switches, logic gates and memory. The eight reference voltages Vref,0 ~ Vref,7 are connected to the comparator through analog switches controlled by a 3-to-8 decoder. The output of the comparator is connected to the memory. Compared with an SS ADC, the column circuit of the proposed ADC requires only a few additional switches and some extra digital circuitry.
In the coarse phase, the Phase Select signal is set high, and the multiplexer (MUX) connects the coarse counter to the 3-to-8 decoder. The analog switches are turned on in sequence according to the counter value, generating a decreasing coarse ramp signal at node X. The result of the coarse conversion is stored in the 3-bit column memory. The fine conversion is then performed: the Phase Select signal goes low, and the coarse result in the 3-bit memory is fed back to the analog switches through the MUX and the decoder, so that each comparator is connected to the correct reference voltage. When the comparator triggers, the result is stored in the 10-bit column memory, in which one extra bit is used for calibration.
Figure 16. Simplified block diagram of the column circuitry.
The column comparator uses three cascaded low-gain amplifiers as the preamplifier, followed by a latch at the output, and applies a dynamic offset cancellation technique [14, 17]. The timing diagram of the comparator is shown in Figure 17. During Stage 1, S1, S3 and S4 are closed: the comparator is auto-zeroed while the input voltage Vin1+ − Vin1− is sampled on C1. During Stage 2, S2 is closed to perform the comparison. Taking the offset of every stage into account, the transfer function of the three-stage preamplifier can be given as:
where A1, A2 and A3 are the gains of the three cascaded amplifiers, respectively, and VOS3 is the input-offset voltage of the third amplifier. Thus, the equivalent input-offset voltage of the preamplifier is given by:
As a result, the input offset of the comparator is quite small. Generally, the three preamplifier stages use the same schematic [14], shown in Figure 18a. Differential input transistors M2 and M3 are loaded with diode-connected transistors M8 and M9, which eliminates the need for common-mode feedback (CMFB). Two current sources, M6 and M7, are used to enhance the gain. Transistors M4 and M5 isolate the input from rapid changes at the latch output and reduce the Miller capacitance at the input. However, the common-mode input range of this circuit is not large enough for the first stage: if the input voltage exceeds that range, the gain degrades, even to less than 1. For this reason, the first stage is designed as a folded amplifier, which has a larger common-mode input range, as depicted in Figure 18b.
In simulation, the preamplifier gain is 58 dB and the input-offset voltage is only 128 μV, which is less than 0.5 LSB. The simulation results of the comparator are listed in Table 1.
Experimental Results
The prototype 32-stage TDI CMOS image sensor with the proposed two-step SS ADC architecture has been fabricated in a standard 0.18 μm one-poly four-metal 1.8 V/3.3 V CMOS process. A photograph of the image sensor and partial chip microphotographs are shown in Figure 19. The chip size is 18.5 mm × 11.9 mm. The pixel array has 1024 × 32 4T pixels with a size of 15 μm × 15 μm and a fill factor of 67%. With this large pixel size and fill factor, the 4T pixels achieve a higher signal-to-noise ratio (SNR) and sensitivity under low illumination. The 12-bit column-parallel ADC is divided into a 3-bit coarse and a 10-bit fine conversion, and each column has a layout pitch of 30 μm. In simulation, the signal-to-noise-and-distortion ratio (SNDR) and effective number of bits (ENOB) with the proposed calibration are 72.02 dB and 11.67 bits, respectively. The ADC clock frequency is 20 MHz, and the conversion time is 36 μs. The conversion rate of the proposed ADC is improved by a factor of 6 compared with the conventional SS ADC. The average power consumption of a single-column ADC is 128 μW. A ramp buffer and a reference buffer consume 9.24 mW and 1.72 mW, respectively. Therefore, the eight ramp buffers of a multiple-ramp ADC would consume 73.92 mW in total, whereas the one ramp buffer and eight reference buffers of the proposed structure consume only 23 mW.
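The reported figures can be cross-checked with simple arithmetic. The ENOB relation below is the standard (SNDR − 1.76 dB)/6.02 dB conversion; the speed-up line assumes the factor of 6 is measured against a conventional 12-bit SS ADC clocked at the same 20 MHz (4096 cycles), a baseline the text does not state explicitly. The helper names are hypothetical.

```python
def enob_from_sndr(sndr_db):
    """Standard conversion ENOB = (SNDR - 1.76 dB) / 6.02 dB."""
    return (sndr_db - 1.76) / 6.02


def buffer_power_mw(n_ramps, n_refs, p_ramp_mw=9.24, p_ref_mw=1.72):
    """Total buffer power from the per-buffer figures quoted in the text."""
    return n_ramps * p_ramp_mw + n_refs * p_ref_mw


enob = enob_from_sndr(72.02)                  # about 11.67 bits
p_multi_ramp = buffer_power_mw(8, 0)          # eight ramp buffers, as in [6]
p_proposed = buffer_power_mw(1, 8)            # one ramp buffer + eight reference buffers

# Assumed baseline: conventional 12-bit SS ADC at the same 20 MHz clock.
t_ss_us = 2 ** 12 / 20e6 * 1e6                # 204.8 us
speedup = t_ss_us / 36.0                      # about 5.7, i.e. roughly the reported 6x
print(f"ENOB={enob:.2f} bits, buffers {p_multi_ramp:.2f} mW vs {p_proposed:.2f} mW, "
      f"speed-up {speedup:.1f}x")
```

Under that assumed baseline the measured 36 μs conversion time corresponds to a speed-up of about 5.7, which rounds to the stated factor of 6.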
To verify the TDI CIS, a prototype test platform is used, as shown in Figure 20. It consists of a conveyor belt on which original photographs are placed; the belt runs at high speed while images are captured. The image data are acquired with a logic analyzer. Figure 21 shows a sample image captured at a line rate of 15.5 k lines/s. However, typical monitors display only 8-bit images, so the 12-bit resolution of the CIS is difficult to assess on a monitor. To overcome this problem, measurements were carried out under different illumination conditions.
When measuring under a relatively low illumination condition, all the digital outputs ranged from 0 to 2
The nonlinearity of the CIS is measured according to the method in [18]. Figure 24 shows the CIS output versus illumination and the measured INL for an output range between 20% and 90% of saturation. The nonlinearity comes from the pixel, the analog accumulator and the column-parallel ADC. Figure 25 shows the measured SNR of the sensor versus illumination. In addition, the measured ADC column FPN is 0.15%, which is comparable to that of a conventional SS ADC. The performance of the proposed imager is listed in Table 2, and a comparison with other types of column-parallel ADC is given in Table 3.
Conclusions
A 12-bit high-speed column-parallel two-step SS ADC for CMOS image sensors has been proposed. The proposed ADC employs a single ramp voltage and multiple reference voltages, and the conversion is divided into a coarse phase and a fine phase to improve the conversion rate. An error calibration scheme corrects the errors caused by offsets among the reference voltages. A prototype 1024 × 32 TDI CMOS image sensor with the proposed ADC architecture has been fabricated in a standard 0.18 μm CMOS process. The proposed ADC has an average power consumption of 128 μW and a conversion time of 36 μs. It is much faster than the conventional SS ADC and consumes less power and area than the multiple-ramp ADC. Simulation results indicate that the SNDR and ENOB with the proposed calibration are 72.02 dB and 11.67 bits, respectively. The measured column FPN is 0.15%. A high-quality image captured at a line rate of 15.5 k lines/s shows that the proposed ADC is suitable for high-speed CISs.
