This paper presents a 176x144 (QCIF) HDR image sensor where visual information is simultaneously captured and adaptively compressed by means of an in-pixel tone mapping scheme.
INTRODUCTION
High Dynamic Range (HDR) imaging is required in applications such as automotive, surveillance and scientific experiments. To meet this requirement, HDR image sensors usually encode the illuminations in the scene non-adaptively, using either long bit-words per pixel (e.g. mantissa-exponent 2 ) obtained from the combination of images captured at different exposures, 3 or a fixed compressive function (e.g. the logarithmic approach 4 ), among many other possibilities. 5 These non-adaptive approaches usually lead either to high computational costs for post-processing the images (in the long bit-word case) or to a loss of detail and lack of contrast (e.g. in logarithmic sensors) due to the fixed compression. To avoid these drawbacks, the proposed system performs an adaptive compression of illuminations using only 7 bits per pixel.
TONE MAPPING ALGORITHM
Tone mapping techniques are used to compress the information of HDR scenes into Low Dynamic Range (LDR) output representations while preserving the details contained in the scene. These techniques usually belong to the computer graphics field, where the input data is an image with a long bit-word per pixel, because of the high computational effort they require. Tone mapping algorithms are usually classified according to the operator applied to the input HDR representation. Tone Mapping Operators (TMOs) fall into two main classes: global and local. Global TMOs apply the same function, named the Tone Mapping Curve (TMC), to the whole array. Local TMOs apply a different function depending on the pixel.
To achieve simultaneous capture and in-pixel compression, a global tone mapping algorithm with very low computational effort has been developed, taking advantage of the focal-plane processing circuitry commonly available in integration pixels. Moreover, it is adaptive: the global TMC changes in every frame according to the probability of illuminations, in order to optimize the use of the final LDR representation.
The in-pixel operation of the algorithm is based on capturing digital data at the time when the pixel discharge signal V_ph intersects a global analog reference V_ref. This analog reference is held fixed during most of the exposure time and ramps up as fast as possible at the end, which allows poorly illuminated pixels to intersect it. At the intersection event, the current state of two global digital references is stored in-pixel: the 4-bit Time Stamp Code (TSC) and the 7-bit Tone Mapping Code (TMC). The TSC data form an auxiliary subsampled image used to calculate a histogram, which serves as an illumination probability indicator. The TMC defines the global tone mapping function, whose non-linear temporal evolution is calculated by our tone-mapping algorithm 1 , 6 using the probability extracted from the TSC data of the previous frame. The signals involved in the in-pixel operation are shown in Fig. 1, where the example has been simplified to a TSC and a TMC of only 2 bits and 3 bits, respectively. This operation performs two kinds of compression: analog and digital. The analog compression results from the intersection of the pixel discharge signal with the analog reference. Two ratios of analog compression apply, depending on whether the pixel signal crosses the reference during its fixed phase or during its ramp-up phase. The relation between the crossing time and the current (photocurrent + dark current) that discharges the pixel node V_ph in these regimes is expressed in Eq. 1 for the fixed reference and Eq. 2 for the voltage ramp-up reference:
where C_ph is the capacitance of the integration node, I_pix is the current that discharges the node, V_rst is the reset voltage from which the discharge starts, V_fixed is the value of the analog reference while it is constant, m is the slope of the ramp and T_fixed is the time at which the fixed voltage ends and the ramp starts. Note that the limit of T_cross,ramp as m → ∞ is T_fixed. Therefore, the higher the slope, the closer the analog compression of the ramp is to the linear behavior indicated in Eq. 3, which is the equation of the discharge signal of integration pixels. This behavior is typical of non-HDR image sensors, which perform a linear acquisition of the scene illumination using single-slope ADCs.
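The crossing-time relations described above can be sketched as follows. This is a reconstruction from the symbol definitions in the text, not the typeset equations themselves: it assumes a constant I_pix, an ideal linear discharge of the integration node, and a ramp of the form V_ref(t) = V_fixed + m (t − T_fixed) for t ≥ T_fixed. The subscripts "cross,fixed" and "cross,ramp" and the equation numbering in the comments are likewise assumptions.

```latex
% Linear discharge of the integration node (the behavior referred to as Eq. 3)
V_{ph}(t) = V_{rst} - \frac{I_{pix}}{C_{ph}}\, t

% Crossing while the reference is constant (the regime of Eq. 1):
% solve V_{ph}(t) = V_{fixed}
T_{cross,fixed} = \frac{C_{ph}\,\left(V_{rst} - V_{fixed}\right)}{I_{pix}}

% Crossing during the ramp (the regime of Eq. 2):
% solve V_{ph}(t) = V_{fixed} + m\,(t - T_{fixed})
T_{cross,ramp} = \frac{C_{ph}\,\left(V_{rst} - V_{fixed}\right) + m\,C_{ph}\,T_{fixed}}
                      {I_{pix} + m\,C_{ph}}

% Consistency check with the text: the slope dominates for large m
\lim_{m \to \infty} T_{cross,ramp} = T_{fixed}
```

Under these assumptions the fixed-reference crossing time scales as 1/I_pix, which is the natural compression mentioned for the temporal windows, while a very steep ramp samples every remaining pixel at essentially the same instant T_fixed, so the reference level at the crossing follows the linear discharge value V_ph(T_fixed).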
Therefore, once the analog reference is defined, the analog compression is also determined for a given exposure time. The digital compression is then applied to the crossing time by means of the TMC digital reference, which alters the analog compression to expand the digital codes of highly populated illuminations and compress those of sparsely populated ones.
To generate the TMC, the TSC data are needed first. The TSC<3:0> digital reference is generated by dividing the exposure time into 16 non-linearly distributed windows, each with a different TS value. The durations of the first 15 temporal windows have been selected so that they are compressed towards the higher illumination bands (shorter intersection times), mimicking the natural compression (1/I_pix) of the intersection time expressed in Eq. 1. The 16th and last temporal window is determined by the ramp duration, which can be as short as 153.6 μs. These temporal windows are named "bins", as the TSC data are used to build a histogram.
The temporal evolution of TMC<6:0> is created from the histogram of the TS image captured in the previous frame. For this reason, we consider the TSC information an indicator of probability rather than an exact evaluation, and so it may fail when the exposure time is too long compared to the rate of change in the image. TMC<6:0> varies linearly within each temporal window, spanning a number of LSBs proportional to the weight of that temporal window in the histogram of the TS image. The number of LSBs spanned during a temporal window is called "levels per bin". For illustration, if the histogram shows that half of the pixels crossed V_ref during temporal window TSC<3:0>=3, the TMC<6:0> curve will span 64 codes during this temporal window; that is, the levels per bin for bin 3 is 64. Finally, since the durations of the temporal windows are non-linearly distributed in time, the resulting TM curve is non-linear in time as well (or, more precisely, piece-wise linear). The TMC data start at the maximum value and decrease non-linearly during the exposure time down to the minimum value. Higher illuminations are thus encoded with high TMC codes and lower illuminations with low TMC codes, as is usual in image representation standards.
Once the levels per bin have been obtained, the TMC evolution is generated by decrementing one code each time the data retrieved from a Look-Up Table (LUT) indicates it. The decrement positions stored in the LUT are flipped left-to-right so that a code decrement always occurs in the first position for every number of levels except 0, which corresponds to the case in which no codes are assigned to a bin; this makes it possible to distinguish the codes of different bins. Moreover, the first code decrement order must be discarded, as it is not necessary.
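The LUT-driven generation of the TMC reference can be illustrated with a short Python sketch. The text does not give the rule that places the decrements inside a bin, so the even spacing used here is an assumption, as are the names `build_lut` and `tmc_curve`. The left-to-right flip, the 128 evaluations per bin and the discarded first decrement follow the description above.

```python
EVALS_PER_BIN = 128   # evaluations inside each temporal window
MAX_CODE = 127        # highest 7-bit TMC code


def build_lut():
    """lut[n][k] is 1 when evaluation k of a bin holding n levels decrements the code."""
    lut = []
    for n in range(EVALS_PER_BIN + 1):
        # Assumed rule: spread the n decrements evenly over the 128 evaluations.
        row = [
            (k + 1) * n // EVALS_PER_BIN - k * n // EVALS_PER_BIN
            for k in range(EVALS_PER_BIN)
        ]
        # Flip left-to-right so that every n >= 1 decrements at its first position.
        row.reverse()
        lut.append(row)
    return lut


def tmc_curve(levels, lut):
    """Return the TMC code seen at each evaluation for the given levels per bin."""
    code = MAX_CODE
    first_order_pending = True
    curve = []
    for n in levels:
        for k in range(EVALS_PER_BIN):
            if lut[n][k]:
                if first_order_pending:
                    # The first decrement order is discarded.
                    first_order_pending = False
                else:
                    code -= 1
            curve.append(code)
    return curve
```

With 128 levels distributed over the bins, the LUT issues 128 decrement orders; discarding the first one leaves the 127 decrements needed to go from code 127 down to code 0.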
The general algorithm steps to be executed are:
1. Define the duration of the 16 bins (temporal windows) ⇒ define TSC reference.
2. Perform a first acquisition ⇒ the TMC data (final image) can be discarded, as the role for the first frame is storing the TSC data to perform the first histogram.
3. Accumulate TSC data to obtain the histogram.
4. Calculate the levels per bin ⇒ divide the 16 values that compose the histogram by (Number of pixels with TSC data) / (Total number of levels to be assigned) = ((176·144)/4) / 128 = 49.5.
5. Round the results down (floor) to obtain an integer number of levels per bin, which avoids distributing more than the 128 available TMC codes (levels).
6. Distribute the unassigned levels among the bins with the largest remainders of the previous division by 49.5.
7. During the exposure time, in order to create a piece-wise linear distribution of TMC codes, retrieve a 1-bit word from the LUT in every evaluation (128 evaluations per bin · 16 bins = 2048 evaluations); this word indicates whether a TMC code decrement must take place. The word retrieved depends on the levels per bin and on the index of the evaluation inside the bin, 1 to 128.
8. Download TSC and TMC data ⇒ TMC data is the final image and TSC will be used in step 3 to continue the process.
More details about the algorithm are provided in the reference "High-dynamic range tone-mapping algorithm for focal plane processors".
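Steps 4 to 6 above amount to a largest-remainder allocation of the 128 codes, which can be sketched in Python as follows. This is only an illustration, not the Verilog implementation mentioned in the paper: the function name `levels_per_bin`, the tie-breaking rule (lower bin index first) and the assumption that every TS pixel contributes exactly one histogram entry are hypothetical.

```python
NUM_BINS = 16
NUM_LEVELS = 128                    # available 7-bit TMC codes
NUM_TS_PIXELS = (176 * 144) // 4    # one TS pixel per 2x2 group: 6336


def levels_per_bin(histogram):
    """Assign the 128 TMC levels to the 16 bins in proportion to the histogram."""
    if len(histogram) != NUM_BINS:
        raise ValueError("expected one count per bin")
    if sum(histogram) != NUM_TS_PIXELS:
        # Assumption: every TS pixel falls in exactly one bin.
        raise ValueError("histogram must account for every TS pixel")

    levels, remainders = [], []
    for count in histogram:
        # count / 49.5 equals count * 128 / 6336; integers avoid rounding issues.
        whole, rest = divmod(count * NUM_LEVELS, NUM_TS_PIXELS)
        levels.append(whole)        # step 5: floor
        remainders.append(rest)

    # Step 6: hand the leftover levels to the bins with the largest remainders.
    unassigned = NUM_LEVELS - sum(levels)
    by_remainder = sorted(range(NUM_BINS), key=lambda b: (-remainders[b], b))
    for b in by_remainder[:unassigned]:
        levels[b] += 1
    return levels
```

For the illustrative case in the text, a histogram in which half of the TS pixels (3168) fall in bin 3 yields 64 levels for that bin.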
PIXELS
Pixels have been arranged in two categories:
• Basic Pixels: including only TMC sampling circuitry.
• Time Stamp (TS) Pixels: including both TMC and TSC sampling circuitry.
The block-level schematics of both pixels are shown in Fig. 2. The sensor, a 3 × 3 μm² Nwell/Psubs diode *, works in photocurrent integration mode. It uses an auto-zeroing technique to establish the reset voltage through the combined action of a buffer (which in operation isolates the photodiode capacitor from the comparator's kickback noise), an analog comparator (where V_ref = V_rst during the reset phase) and a PMOS feedback switch P_1. Additionally, digital circuitry is included to control the read and write operations of the SRAM cells. Signal ROW controls the external write of data row by row (for evaluation and initialization purposes), signal EVAL activates the internal write operation and signal READ enables external readouts, which are synchronized with the ROW signal. TS pixels contain 7 (TMC) + 4 (TSC) = 11 bits of SRAM, whereas Basic Pixels (BP) include only the 7 (TMC) SRAM modules. Pixels are physically arranged as shown in Fig. 3(a). Notice that each TS pixel conceptually takes some area from its 3 BP neighbors, which is used to allocate the 4 SRAM modules for TSC storage. In effect, all pixels contain 8 (7 TMC + 1 TSC) SRAM blocks. The TSC modules are grouped in the middle of the 2×2 arrangement, as shown in Fig. 3(a), and controlled by signals produced in the TS pixel only. The layout of a group of 2×2 pixels is shown in Fig. 3(b). Observe that the SRAM modules are grouped in the central vertical region, sharing global control, digital power and ground lines. This improves the attainable pitch and reduces the noise coupled from digital switching into the analog blocks.
* The aperture in the metal structures over the diode is 9.75 × 7.30 μm². Because of this, carriers created within this area can also contribute to the photogenerated current by reaching the photodiode through diffusion, increasing the effective fill factor.
Auto-zeroing Technique
A crucial issue in the operation of the imager is the use of an auto-zeroing technique to cancel out most offset contributions from the two amplifiers in the pixel. During the reset phase, the voltage V_rst is applied to the V_ref input in Fig. 2(a,b) and transmitted to the photodiode's integrating capacitor through the negative feedback loop formed by the two amplifiers and the reset switch. If each amplifier is modeled for this purpose by its input-referred offset voltage V_ox and a finite DC gain A_x (where x = b for the buffer and x = c for the comparator), simple calculations, assuming high gains, show that the reset value is approximately:
During the integration period (exposure), the feedback switch P_1 is OFF and the photocurrent discharges the integration node from this previous reset voltage. If we consider that turning off the feedback switch introduces a feedthrough error ΔV_ft in the integration node, we can express the temporal evolution of this node as:
If the effective differential input of the comparator is examined, assuming very large gains in the buffer and the comparator and using the approximate expression in Eq. 5, it contains no reference to the input-referred offset voltage of either the comparator or the buffer. Needless to say, this ideal behavior will not occur in practice, where the output will exhibit some dependency on the offsets of these two amplifiers. The residue of the auto-zero operation at the effective differential input of the comparator is given by:
Clearly, most errors (except the feedthrough, which is ultimately the main error contribution) vanish when the amplifier gains are sufficiently high. In practice, a small residual contribution remains, because very large gain, low-power amplifiers (each amplifier consumes 50 nA) cannot be designed within such a small area. The pixel has been designed under a 3-sigma constraint for all added non-idealities.
CHIP-LEVEL ADDITIONAL FUNCTIONALITIES

Analog Reference Generation
A dynamic biasing mechanism has been developed to transmit the V_ref signal to the array. As shown in Fig. 4(b), V_ref must drop very quickly from V_rst to the value it holds during most of the exposure (V_fixed = V_bot) and, in the last window, move from V_bot to V_top in 128 steps to perform a kind of single-slope A/D conversion of the pixels that have not crossed V_ref previously. V_top need not equal V_rst, as it can be lowered in order to reduce the dark signal contribution.
Every row is provided with an analog buffer that receives V_ref from an on-chip DAC and drives all the corresponding nodes in its row. Clearly, there will be slight differences in the final voltage reached by each row due to offset and other non-idealities. The next step is to switch off the amplifiers and short-circuit all V_ref,i nodes (Fig. 4(a)) to the DAC's output. This forces all nodes to reach the same final voltage by charge redistribution, in a shorter time than would be needed with a single driver at the output of the DAC (due to RC effects in the wires that carry the signal to the different points in the array).
Dark Signal Contribution Attenuation
Dark current effects are especially noticeable in dark pixels, which may look very noisy in long-exposure shots. To attenuate the visual degradation produced by this undesired contribution, we have experimentally measured the average dark signal contribution I_DC and its standard deviation σ(I_DC) for different exposure times and operating temperatures. These measurements allow us to diminish the visual effect of dark current in pixels crossing V_ref during the last temporal window simply by lowering V_top, as shown in Fig. 5, where I_dark = I_DC + 3σ(I_DC). The optimum V_top level is generated automatically by the FPGA that controls the chip, using the exposure time, the DC measurements and the input from an on-chip PTAT sensor.
CHIP ARCHITECTURE
The architecture of the chip is shown in Fig. 6, with its core array of 148 × 180 pixels (QCIF + 2 dummy rows and columns on each side). Pixel functionality is supported by additional periphery blocks. An 8-bit DAC generates the reset voltage V_rst during reset, the fixed voltage V_bot during the exposure time and, finally, the 128-level ramp signal from V_bot to V_top during the last temporal window. 148 buffers (one per row) enhance the dynamics of distributing V_ref to the array. Digital control signals also use per-row distributed digital buffers (including clock-tree generation). TSC<3:0> and TMC<6:0> are generated by a Code Generator in Gray code format. This coding reduces switching at the pixel level to only one SRAM module at a time (instead of 7) for Basic Pixels and 2 (instead of 11) for TS Pixels. Read and write operations on the array are accomplished by a bank of sense amplifiers. The image is retrieved row by row and stored in a read buffer (1 row), which outputs images through a high-speed 36-bit bus (4 TMC codes + 2 TSC codes at a time, equivalent to 43 MBytes/s). Fig. 7 shows a micrograph of the chip. Note that, for flexibility and because this is the first prototype of the idea, the calculation of the levels per bin and the LUT have been implemented in an external FPGA. However, the code has been developed in Verilog in a way that can be easily implemented by automatic digital synthesis to form a complete System-on-Chip (SoC) in a future evolution.
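The benefit of distributing the references in Gray code can be checked with a small Python sketch. It uses the standard reflected binary Gray code; whether the on-chip Code Generator uses exactly this variant is an assumption, and the helper names `to_gray`, `from_gray` and `bits_changed` are hypothetical.

```python
def to_gray(value):
    """Convert a non-negative integer to reflected binary Gray code."""
    return value ^ (value >> 1)


def from_gray(gray):
    """Recover the plain binary value from a Gray-coded integer."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a, b):
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


# Worst-case number of SRAM bits that toggle between consecutive 7-bit TMC codes.
worst_binary = max(bits_changed(c, c - 1) for c in range(1, 128))
worst_gray = max(bits_changed(to_gray(c), to_gray(c - 1)) for c in range(1, 128))
```

In plain binary the step between codes 64 and 63 toggles all 7 bits, whereas consecutive Gray codes always differ in a single bit, which matches the one-module-at-a-time switching described for Basic Pixels.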
EXPERIMENTAL RESULTS
We present here a comparison of images captured with 3 commercial systems and with our chip (see Fig. 8). The commercial systems are the Sony Cybershot DSC-W80, 7 which includes an enhanced-sensitivity CCD sensor (Super HAD™ CCD); the iPhone 4 camera, which offers an HDR mode 8 (since iOS 4.2) based on a combination of 3 pictures; and the Photonfocus MV-D752E-40-U2-12, 9 which employs the LinLog technology. Noticeably, despite using only half of the codes (128 vs. 256) for image representation, our approach produces an image that is visually competitive with the other approaches. The LinLog sensor shows slightly more detail within the lamp areas at the expense of higher noise. The DSC-W80 produces much lower noise, but it shows both over-exposed and under-exposed areas. Finally, the HDR mode of the iPhone 4 shows similar performance in the darker areas but fails to produce details in the brighter ones. Table 1 summarizes the most important characteristics of the chip. Comparing the DR (SNR1) of a linear acquisition (obtained using only the intersection times with the ramp) with the DR (SNR1) obtained with this method shows a DR increment of about 114.2 dB.
CONCLUSIONS AND FUTURE WORK
We have presented an imager that automatically adapts to compress the HDR scene into a 7-bit format using a tone mapping algorithm with information from the previous frame. Pixels include auto-zeroing and in-pixel SRAM storage, which allows for long-exposure shots. An automatic dark signal mitigation scheme has been implemented to enhance the visual quality in dark areas. The global analog reference is dynamically distributed to the pixels to allow for low-power, fast and precise operation.
In order to increase fill factor, resolution and light-sensing performance, a possible evolution of the system is an implementation as part of a 3D integrated system. The idea is shown in Fig. 9, where the dies are connected by Through-Silicon Vias (TSVs). The die on top (Tier 0) could contain the photodiode, which would allow the use of a Back-Side Illuminated technology to improve the sensing capabilities and raise the fill factor to nearly 100%, while also permitting a smaller pixel size. The die in the middle (Tier 1) could contain the rest of the pixel circuitry, which would allow the use of a technology optimized for mixed-signal circuitry. The die at the bottom (Tier 2) could contain the circuitry that post-processes the final tone-mapped image, which would allow the use of a technology optimized for digital circuitry, which usually offers very high integration density. The levels-per-bin calculation and the LUT could be placed in the last tier, as they involve only digital processing, or they could be included laterally in Tier 1.
