I. INTRODUCTION
PHOTOACOUSTIC (PA) imaging is an emerging medical imaging modality based on optical excitation and acoustic detection. As shown in Fig. 1, PA imagers employ a short laser pulse to illuminate a tissue sample. In regions with high absorption, the incident energy is converted to heat, leading to localized thermoelastic expansions and pressure waves that can be detected by an ultrasound (US) receiver (e.g., using standard US probes [1], [2]) outside the sample. This approach combines the sharp contrast of optical imaging and the low scattering of US to reveal detailed physiological tissue properties. PA imaging is therefore widely used in a variety of clinical research applications, such as the study of cancer progression [3].
This paper focuses on the design of the US readout electronics, with a specific emphasis on the dense integration of the signal conditioning and delay-and-sum (DAS) beamformer (BF) in Fig. 1. The most significant challenge that we address lies with the DAS operation, which requires a small step size and a wide delay range. The timing resolution of the required delay lines (Δt in Fig. 1) is inversely proportional to the carrier frequency [4], amounting to ∼10 ns for the 5-MHz transducer center frequency used in this paper. On the other hand, the maximum delay is proportional to the array size. For example, a 100-element 1-D array requires a delay of 16 μs. Due to these requirements, it is most common to push the delays into the digital domain by placing an analog-to-digital converter (ADC) before the DAS block. Commercial US systems constructed this way typically employ 10-12 bit ADCs running at >65 MS/s to provide both sufficient timing and signal resolution [5], [6]. Due to the ADC area and power overhead, the backend of the readout electronics is often separated from the probe head and is connected to the transducer array using micro-coaxial cables.
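The two delay requirements above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the 20 steps per carrier period, the 250-μm element pitch for the 1-D example, and the 1540-m/s speed of sound are assumptions chosen to reproduce the quoted orders of magnitude, not values stated for that example in the text.

```python
def delay_step(center_freq_hz, steps_per_period=20):
    """Delay resolution as a fraction of the carrier period.

    steps_per_period is an assumed ratio; the text only states that the
    resolution is inversely proportional to the carrier frequency.
    """
    return 1.0 / (steps_per_period * center_freq_hz)


def max_delay(aperture_m, sound_speed_m_s=1540.0):
    """Worst-case delay: time for sound to travel across the aperture."""
    return aperture_m / sound_speed_m_s


# 5-MHz transducer center frequency (from the text)
step = delay_step(5e6)

# 100 elements at an assumed 250-um pitch, assumed soft-tissue sound speed
span = max_delay(100 * 250e-6)

# Number of delay steps a single delay line would have to cover
taps = span / step

print(f"step = {step * 1e9:.1f} ns, span = {span * 1e6:.1f} us, taps = {taps:.0f}")
```

Under these assumptions a single channel needs on the order of a thousand delay steps, which illustrates why such delay lines are usually realized digitally.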
While this solution is acceptable for current 2-D imagers with a 1-D transducer array (shown in Fig. 1), it is unsuitable for next-generation systems that support 3-D volumetric imaging using 2-D transducer arrays with thousands of elements. To address this issue, prior work has already demonstrated the close integration of the transducer array and receive (RX) electronics using flip-chip bonding [7] or direct transducer integration [8]. The key idea in the latter approach is to perform local data reduction via subarray beamforming, which applies the DAS operation to a group of pixels. The final beamforming operation is then pushed off chip using a more manageable number of leads. For example, if the subarray beamforming is applied to a group of 16 pixels, the signal lead count is also reduced by a factor of 16. With the cable issue eliminated, the burden is now pushed onto the subarray beamforming electronics, which must be designed in a pixel pitch-matched style and within a very small per-channel area (250 μm × 250 μm in this paper). To meet these constraints, prior work has implemented the delays using sample-and-hold (S/H) circuits [8] or analog filters [9]. However, such analog approaches tend to sacrifice performance and typically suffer from a combination of restricted delay range, coarse delay resolution, and/or limited SNR.
The goal of our work was to demonstrate the pixel pitch-matched integration of an ADC-based US receive chain with on-chip digital subarray beamforming, specifically leveraging the immense integration density available in modern CMOS. Our proof-of-concept system was designed in STMicroelectronics' (ST's) 28-nm Fully Depleted Silicon On Insulator (FD-SOI) technology and supports a single subarray of 4 × 4 pixels (see Fig. 2). The CMOS die is flip-chip bonded to a capacitive micromachined US transducer (CMUT) chip (similar to [7]). Each pixel contains inverter-based signal conditioning stages and an inverter-based delta-sigma modulator (ΔΣM), enabling a compact analog design with small passives (due to oversampling). Since the US path in PA imaging is RX only, a transmit interface is not integrated in this paper. However, in a large-scale array implementation, it is conceivable to add this functionality using a subset of the pixels for transmit, as done in [8].
The remainder of this paper expands the descriptions of our conference contribution [10] and is organized as follows. Section II describes the system architecture and the implementation of the digital subarray BF. Section III provides the circuit details of the pixel-size receiver, including the signal conditioning and the third-order single-bit ΔΣM. Section IV presents the experimental results, followed by conclusions in Section V.
II. SYSTEM ARCHITECTURE
Fig. 3 compares the block diagrams of prior art and our system. Fig. 3(a) represents BF approaches using analog filters [9] or S/H circuits [8], [11]. Creating a time delay with an analog filter requires the approximation of a linear phase characteristic, which suffers from a strong tradeoff between filter bandwidth and maximum delay. For a bandwidth of 5-10 MHz, the delay of an analog filter is typically limited to a few nanoseconds, necessitating extensive cascading to achieve delays in the microsecond range. The S/H approach provides a longer delay up to ∼1 μs [11], which is sufficient for a subarray. However, the S/H cells become large with increasing SNR due to kT/C noise requirements, limiting the number of delay cells for a given area. In addition to the accumulation of noise, previous work also reports SNR degradation due to charge injection and clock feedthrough errors in the memory cells [8]. Generally, from the results seen in the present literature, it is clear that making large delays with high SNR using analog blocks (such as filters or S/H stages) is challenging.
For this reason, commercial US systems have converged toward the scheme in Fig. 3(b). Each channel of a 1-D array is digitized using a Nyquist ADC and the DAS operation is realized in the digital domain, yielding superior SNR, delay range, and programmability. However, as mentioned previously, it has been difficult to extend this scheme toward pitch-constrained 2-D arrays. In search of a solution, the work of [12] multiplexes a single ADC between eight channels to amortize the ADC area, but the result is a per-channel footprint that is approximately eight times larger than our pixel. Similarly, the work of [11] combines analog and digital Nyquist-rate BF, but still results in an area of approximately five times our pixel footprint.
To enable area-efficient digital BF, this work leverages the ΔΣ BF approach of [13]-[15] to perform DAS operations on the single-bit outputs of oversampling modulators [see Fig. 3(c)]. The oversampling of the ΔΣM naturally provides sufficient BF time resolution and further leverages the high sampling rate for noise shaping. This stands in contrast with Nyquist-based systems, where some undesired amount of oversampling is employed just to meet the required time granularity. For example, a timing resolution of 10 ns corresponds to a 100-MS/s Nyquist-based system, suggesting a 5× over-design in sampling rate, since the required signal bandwidth is merely 10 MHz. The shown three-stage BF is similar to [16] and was optimized for power and area. The first stage consists of a cascaded integrator comb (CIC) filter, followed by DAS and second- and third-stage decimation filters (DFs), which are shared within one subarray. Typically, the order of the CIC filter should be at least one order higher than that of the ΔΣM; however, the noise transfer function (NTF) of our modulator shows a second-order slope at high frequencies (>100 MHz), justifying the use of a third-order CIC filter in this paper. For the sake of simplicity, only the CIC filter and the DAS operation are implemented on chip, while the remaining (non-critical) operations are performed in software.
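As an illustration of the first decimation stage, the following is a minimal behavioral sketch of a third-order CIC decimator operating on a single-bit stream. It is not the on-chip implementation: the function name, the ±1 bit encoding, the unit differential delay, and the use of NumPy integers instead of fixed-width modular arithmetic are all assumptions made for readability.

```python
import numpy as np


def cic_decimate(bits, order=3, ratio=8):
    """Behavioral CIC decimator (hypothetical helper, not the chip RTL).

    `order` integrators run at the input rate, the result is downsampled
    by `ratio`, and `order` combs (differential delay of 1) run at the
    output rate. The DC gain is ratio**order.
    """
    x = np.asarray(bits, dtype=np.int64)
    for _ in range(order):            # integrator cascade
        x = np.cumsum(x)
    x = x[ratio - 1::ratio]           # keep every `ratio`-th sample
    for _ in range(order):            # comb cascade
        x = np.diff(x, prepend=0)
    return x


# A constant stream of +1 bits settles to the DC gain 8**3 = 512
# once the filter memory (three cascaded length-8 windows) is filled.
dc_out = cic_decimate(np.ones(64, dtype=np.int64))
print(dc_out)
```

With a decimation ratio of 8, each output word summarizes eight modulator bits per stage, so the delay FIFOs that follow can run at one eighth of the modulator clock at the cost of wider words.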
A known issue for ΔΣ BF is the raised noise floor with dynamic (time-varying) focusing, which causes omission or repetition of bits in the sequence and consequently disturbs the synchronization between the ΔΣMs and the decimation process. Dynamic focusing essentially generates frequency-dependent aliases and thereby causes out-of-band noise to leak into the signal band [17]. To avoid such bit distortion, we employ block-based BF [18], [19], which represents a sample using a sequence of bit streams that are shifted as a complete block during dynamic focusing. Although the block-based BF approach is more complex, it leads to higher fidelity images, which is crucial for medical applications.
Within the on-chip digital block [dashed box in the center of Fig. 3(c)], the BF is placed after the CIC filter as shown in Fig. 4(a), which was identified as the preferred option due to the lower FIFO clock speed and the commensurate reduction in power (see Table I). Conventional block-based BFs share a single CIC filter by performing bit-wise summation on blocks of data [18], [19] as shown in Fig. 4(b), which simplifies the adder (fewer bits) and leads to a smaller gate count. However, we found that the savings are insignificant due to the relatively low complexity of the CIC filter. The overall footprint and power are largely dominated by the shift registers (FIFOs), which have similar sizes in both implementations, as they merely exchange bit width and throughput. The advantages of the DF-first option are expected to become more pronounced for larger arrays, where early clock rate reduction is critical. To ensure sufficient timing resolution, the CIC filter has a decimation factor of 8, providing an output rate of 120 MS/s. The implemented FIFOs have a depth of 2^7, supporting the maximum timing delay (∼940 ns) across the diagonal of our 4 × 4 subarray with ∼10% margin.
Fig. 5 shows the floor plan of the overall system, in which the 16 pixel-size receivers are aligned in a 4 × 4 grid and abutted with the synthesized global digital block. A global clock of 960 MHz is provided externally and distributed to the digital block and the pixel-size receivers. The non-overlapping clock phases for the contained switched-capacitor (SC) circuits are generated locally to manage delay and clock skew, thereby relaxing the matching requirements on the global clock network. The output bitstream from each channel is routed to the digital block with distributed buffers, which are sized to meet the setup/hold time requirement at the input of the digital block.
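The FIFO sizing can be checked from the numbers quoted above (960-MHz modulator clock, decimation by 8, depth of 2^7, ∼940-ns diagonal delay). The short calculation below is only a back-of-the-envelope sketch; the variable names are illustrative.

```python
fs_modulator = 960e6       # modulator sampling rate in Hz (from the text)
decimation = 8             # CIC decimation factor (from the text)
fifo_depth = 2 ** 7        # FIFO entries per channel (from the text)
required_delay = 940e-9    # delay across the subarray diagonal in s (from the text)

fs_bf = fs_modulator / decimation        # rate at which the FIFOs are clocked
delay_step = 1.0 / fs_bf                 # delay resolution of one FIFO entry
max_fifo_delay = fifo_depth * delay_step # longest delay the FIFO can hold
margin = max_fifo_delay / required_delay - 1.0

print(f"FIFO rate   : {fs_bf / 1e6:.0f} MS/s")
print(f"delay step  : {delay_step * 1e9:.2f} ns")
print(f"max delay   : {max_fifo_delay * 1e9:.0f} ns")
print(f"margin      : {margin * 100:.0f} %")
```

The resulting step of roughly 8.3 ns sits below the ∼10-ns resolution target, and the full FIFO spans slightly more than 1 μs, which leaves a margin on the order of the ∼10% quoted above.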
For this design, no data and clock recovery are needed in the DF and BF block since the delay from the single-bit output of the modulators is small. For large array implementations, D-flip-flops can be inserted into the path to ensure well-defined data propagation. As indicated in Fig. 5, four buffer cells are required in each pixel for a 4 × 4 array. Thanks to the employed fine-line process, the area and power of the distributed buffers are insignificant compared to the pixel circuitry. Compared to the implementation of analog BF or digital BF using Nyquist ADCs, the number of required distributed buffers is much reduced due to the single-bit ΔΣM, which not only preserves signal integrity, but also simplifies the task of combining the signals in a global digital block outside the array. More importantly, the pitch-matched implementation of the receiver blocks will allow for a relatively straightforward extension to a large array.
III. PIXEL-SIZE RECEIVER
As shown in Fig. 2, both the signal conditioning circuits and the ΔΣM are embedded inside the pixel-size receiver. Their specifications are determined by the signal characteristics. When acoustic waves and light propagate through the tissue, the signal suffers from energy loss due to scattering and absorption, leading to depth-dependent attenuation. For acoustic waves, the attenuation due to absorption is ∼1 dB/cm/MHz for most soft tissues [20]. On the other hand, the optical properties vary among tissues; in general, the attenuation and scattering of light are more severe than those of acoustic waves, limiting the imaging depth to a few centimeters in clinical trials [21]. To compensate for the depth-dependent attenuation, the front-end gain is increased with time, commonly known as time-gain control in US systems. In this paper, a 30-dB variable gain is designed for an imaging depth of around 2 cm [22]. At 1-cm depth, the laser-induced pressure signal received by the sensor is of the order of a few kilopascals, largely depending on the absorption coefficient of the target and surrounding media. On the other hand, the noise floor of the transducer is around a few pascals [22]; therefore, the instantaneous dynamic range (DR) (essentially the SNR of the ΔΣM) is designed for ∼60 dB. Together with the 30-dB variable gain, this leads to an overall input DR of 90 dB. Besides the area-demanding ΔΣM, both the high DR and the variable gain range impose challenges for the signal conditioning design within the pixel area. Dedicated circuit techniques are applied to meet these requirements, as discussed in this section.
Fig. 6 shows the schematic of the signal conditioning circuit, which includes a preamplifier, low-pass filter (LPF), and variable gain amplifier (VGA). To cover the wide variable gain, the tuning range is distributed among the preamplifier and the VGA based on a coarse and fine gain structure.
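The dynamic-range budget described above reduces to a few decibel conversions. The sketch below assumes representative values of 3 kPa and 3 Pa for "a few kilopascals" and "a few pascals"; only their ratio matters, and these specific numbers are assumptions rather than measured values.

```python
import math


def to_db(amplitude_ratio):
    """Convert an amplitude (pressure or voltage) ratio to decibels."""
    return 20.0 * math.log10(amplitude_ratio)


signal_pa = 3e3   # assumed value for "a few kilopascals" at 1-cm depth
noise_pa = 3.0    # assumed value for "a few pascals" transducer noise floor

instantaneous_dr_db = to_db(signal_pa / noise_pa)   # about 60 dB
variable_gain_db = 30.0                             # from the text
total_input_dr_db = instantaneous_dr_db + variable_gain_db

# One-way acoustic absorption at ~1 dB/cm/MHz for 2 cm at 5 MHz
acoustic_loss_db = 1.0 * 2.0 * 5.0

print(f"instantaneous DR : {instantaneous_dr_db:.0f} dB")
print(f"total input DR   : {total_input_dr_db:.0f} dB")
print(f"acoustic loss    : {acoustic_loss_db:.0f} dB")
```

Under these assumptions the acoustic absorption alone accounts for roughly 10 dB of the 30-dB variable gain; the remainder has to cover the stronger optical attenuation mentioned above.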
The preamplifier is a transimpedance amplifier (TIA) that converts the current generated by the CMUT into a voltage using five different gain levels (6-dB steps). The TIA output is taken against a replica circuit to facilitate supply-noise cancellation as the succeeding LPF performs single-ended to differential conversion. While device variability affects the operating point voltage and inter-channel offset at the TIA input, this has little impact due to the relatively large bias voltage (20 . . . 30 V) across the CMUT and the bandpass nature of the desired signal. In order to perform single-ended to differential conversion, the LPF, implemented as an active RC filter, needs to have good common-mode (CM) rejection, and therefore uses a single-stage fully differential amplifier with resistive load as shown in Fig. 6. The CM feedback is implemented using a self-biased diode connection for its simplicity [23]. Both the TIA and LPF are designed using 1.5-V thick-oxide devices (for large DR), while all other circuits use core devices with a 1-V supply. The VGA uses a Padé approximation [24] to provide fine linear-in-dB gain tuning (5-11 dB in 18 steps) to ensure signal continuity during gain transitions. It is implemented using an SC approach and is designed with a slightly extended gain range to compensate for gain errors due to process variations and non-idealities, such as the finite ON resistance of the switches and the finite loop gain in the TIA. Both the TIA and SC VGA are pseudo-differential and employ inverter-based amplifiers to achieve a compact design.
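To illustrate why a Padé approximation suits linear-in-dB gain control, the sketch below evaluates the first-order Padé approximant of the exponential, (1 + x/2)/(1 − x/2), over a ±3-dB range in 18 steps. The approximant order, the centering of the 6-dB range around its midpoint, and the helper names are assumptions; the text does not specify how the approximation is mapped onto the capacitor ratios.

```python
import math


def pade_exp(x):
    """First-order (1,1) Pade approximant of exp(x); assumed order."""
    return (1.0 + x / 2.0) / (1.0 - x / 2.0)


def gain_error_db(target_db):
    """Difference between the Pade gain and the ideal gain, in dB."""
    x = target_db * math.log(10.0) / 20.0   # exp(x) gives the ideal linear gain
    return 20.0 * math.log10(pade_exp(x)) - target_db


# 18 equal steps across a 6-dB span (the 5-11 dB range, centered at 8 dB)
step_db = 6.0 / 18.0
targets_db = [-3.0 + k * step_db for k in range(19)]
errors_db = [gain_error_db(t) for t in targets_db]
worst_error_db = max(abs(e) for e in errors_db)

print(f"step size   : {step_db:.2f} dB")
print(f"worst error : {worst_error_db:.3f} dB")
```

Because the approximant is a ratio of two linear terms, it maps naturally onto a ratio of switched capacitors, and under these assumptions its deviation from an ideal exponential stays well below one gain step across the range.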
A. Signal Conditioning
The TIA is optimized for DR and noise figure (NF). Fig. 7 shows several popular TIA architectures: common gate (CG), resistive feedback (RF), and capacitive feedback (CF). A higher TIA gain improves the NF at the expense of reduced input current, causing degradation in the DR due to output swing constraints. Fig. 8 illustrates the tradeoff for these three architectures quantitatively (see [22] for further details), assuming that the CMUT contributes an equivalent noise of a 68-kΩ resistor at the source with a 5-MHz center frequency. The CF TIA outperforms the other two options since the current amplifying stage formed by C_1 and C_2 does not contribute noise and attenuates the noise of R_L [25]. However, the area required by C_2 grows significantly and becomes unrealizable under the pixel area constraints. Therefore, the RF TIA was considered the best choice for this work. Nevertheless, to maintain an input DR of 90 dB, a resistive TIA with fixed gain leads to a poor NF performance (>12 dB, outside the range of Fig. 8) due to the reduced voltage swing imposed by our fine-line CMOS process. With variable TIA gain control, the instantaneous DR of the TIA is reduced to 66 dB, avoiding significant NF degradation. The resultant (simulated) NF of the analog front end is 7.8 dB at the highest gain setting (32 kΩ) of the TIA. The inverter-based amplifier of the TIA has the dominant pole at the input and relies on the compensation effect of the g_m load and the feedback zero to achieve stability.
Fig. 9 shows the block diagram of the single-bit, discrete-time ΔΣM architecture [26] used in this work. The coefficients of the loop filter are well defined by capacitor ratios. In addition, the architecture benefits from the oversampling ratio and the noise shaping, making it less sensitive to process variation. The ΔΣM features a third-order NTF to achieve 60-dB peak signal-to-noise-and-distortion ratio (SNDR) over a 10-MHz signal bandwidth with an oversampling ratio (OSR) of 48.
The sampling rate is 960 MHz. Additional feed-forward paths relax the output swing and slew rate requirements in the first and second integrators [27] . The signal transfer function (STF) and NTF of this architecture are expressed as
B. Delta-Sigma Modulator
(1)
Fig. 10 shows the complete pseudo-differential implementation of the modulator with its clock phases [28]. The circuit uses a conventional discrete-time common-mode feedback [26], not shown in the figure. To maximize the signal DR, the input and output common-mode voltages are set to mid-rail. The size of the sampling capacitors is determined by the thermal noise requirement, which for this design amounts to 75% of the total noise budget. The sampling capacitance of the first and second integrators can be estimated using a similar approach as presented in [29] C_S1 ≈ 4.25 · 10
where x_2 = g_m2 R_on2 (4), k = 1.38 × 10^−23 J/K is Boltzmann's constant, T is the absolute temperature in kelvin, and V_nT^2 is the total noise budget at the given resolution and full-scale input. R_on1 and R_on2 are the ON resistances of the switches in the first and the second integrators. The noise contribution of the third integrator is negligible due to the second-order noise shaping of its input signal. For each of the integrators, the amplifier noise is dominated by the first-stage inverter, and its transconductance (g_m1, g_m2) is optimized for both power and noise. Equations (6) and (7) include an additional design parameter P_n, which represents the fraction of noise from the first integrator. With P_n = 78.8%, (6) and (7) achieve their lowest value, minimizing the total capacitance area required by the modulator. For this design, C_S1 = 60 fF and C_S2 = 30 fF, which includes some design margin to mitigate the impact of wiring parasitics. As illustrated in Fig. 10, unit capacitors C_U1 = 30 fF and C_U2 = 10 fF set the coefficients of the modulator. A double-tail latch-type voltage sense amplifier similar to [30] is used as the comparator. It enables a fast response to support the chosen sampling frequency (960 MHz) and is well suited for 1-V operation.
C. Inverter-Based Amplifiers
The amplifier blocks of the SC ΔΣM and VGA rely on inverter-based topologies, which have gained increasing attention in fine-line CMOS due to their compactness and low-voltage compatibility. A variety of inverter-based amplifiers have been proposed to implement active elements in high-performance, power-efficient ADCs. Chae and Han [26] introduce a single-inverter structure for a discrete-time ΔΣ modulator. The inverter can operate as a class-AB or class-C stage when operated at the boundary between weak and strong inversion. This amplifier provides a power-efficient solution; however, the voltage gain of a single inverter is usually small, preventing the use of minimum-length devices. The amplifier in [31] enhances the gain with a three-stage architecture based on single-ended common-source stages, but is relatively inefficient due to class-A operation. The ring amplifier [32] represents an interesting alternative for SC circuits. It is created by splitting a ring oscillator into two paths and embedding different offsets in each path to preserve the bias condition of the last stage. This architecture enables a high gain through the cascade of three stages and at the same time reaps the benefits of efficient slew-based charging with inherent rail-to-rail output swing. A modified version of the ring amplifier was introduced in [33]. It reduces the number of inverters in the second stage and eliminates the external biases; however, it employs high-V_T devices in the last stage to extend the stable offset range and relies on a resistor to define the bias point of the output transistors.
In this paper, a different variant of a power- and area-efficient inverter-based amplifier was developed. Fig. 11 shows its half-circuit (the full circuit is pseudo-differential), along with the integrator in which it is utilized.
Similar to the aforementioned solutions, it employs three gain stages to achieve large voltage gain with minimum gate length, and it is designed to slew for most of the clock period. The large swing at the third-stage input during slewing leads to small devices and a compact layout. As illustrated in Fig. 11, the input signal is sampled onto C_S with respect to the self-bias voltage of the first inverter during φ1. At the same time, the input bias of the third stage is established using diode replicas and stored on C_B. In comparison with [33], this obviates the need for special high-V_T devices and resistors. The currents for the N/P diode replicas originate from the same current reference, providing the same bias current by default. For testing and experimental purposes, they are made independently adjustable; no calibration is performed on individual channels during operation. During φ2, the charge is redistributed between C_S and C_FB to perform integration. The auto-zeroing capacitor³ C_AZ suppresses the amplifier's offset and flicker noise [34]. A similar clock sequence is used within the SC VGA of Fig. 6.
Near the end of the settling process, the employed tri-inverter amplifier exhibits the characteristics of a third-order linear system. To ensure that the loop stabilizes after slewing, the settling performance of the amplifier is optimized based on its open-loop damping factor [35], which for this design is set to about one [22]. To adjust the damping, the tri-inverter amplifier contains g_m loads at the output of the first two inverters (see Fig. 11). These compensation devices are ratiometrically defined using scaled versions of the main inverters, and are thus insensitive to process variation. The effectiveness of the added g_m is illustrated in Fig. 12. This plot compares the transients of the last stage's input and output (V_G and V_O) with and without compensation and illustrates the fast settling with the g_m compensation present. A larger g_m load improves stability by pushing the nondominant poles at the outputs of the first and the second stages to a higher frequency while reducing the loop gain and hence the loop gain-bandwidth product [22]. As a final detail, note that the internal V_G node overshoots significantly, even with g_m compensation.
³ The auto-zeroing capacitors have the same size as the sampling capacitors, due to area constraints.
In a bulk CMOS process and for very large signals, this could lead to a forward-bias condition for the switch junctions. However, in the employed FD-SOI process, this was not a concern due to the oxide-isolated junctions.
IV. EXPERIMENTAL RESULTS
The 4 × 4 US receiver prototype was fabricated in ST's 28-nm Ultra-Thin Body and Buried oxide FD-SOI process. Fig. 13(a) shows the die micrograph, including the floor plan of a single pixel. Fig. 13(b) depicts the chip stack, in which a diced 4 × 4 2-D CMUT array is flip-chip bonded (same approach as in [7]) onto the 28-nm chip. Besides the 16 RX pixel array, an additional test pixel is used to separately evaluate the performance of the TIA-LPF and the SC VGA-ΔΣM cascades. The test structure has the same layout as the functional pixel, but with the signal path between the LPF and Fig. 15(a). Fig. 15(b) shows the measured output spectrum of the VGA-ΔΣM test structure (with the entire chip in full operation), achieving a peak SNR of 59.9 dB and a peak SNDR of 58.9 dB for a 2-MHz input sinusoid, while Fig. 15(c) shows that this performance is maintained up to f_in = 10 MHz. Fig. 16(a) shows the gain sweep of a complete pixel, achieving a variable gain range of 29.1 dB with 0.33-dB steps, which is close to the given specifications. A control code sequence is selected from the default gain sweep (gray) to produce the calibrated output curve (black). The default gain sweep is performed for individual channels by measuring the output signal amplitude with a fixed-amplitude input sinusoid under different control code settings. As shown in Fig. 16(b), the differential nonlinearity (DNL) and integral nonlinearity (INL) after foreground calibration are within 0.46 LSB and 0.65 LSB, respectively. Unfortunately, gain degradation was observed in the measurement of the full signal chain due to a chip fabrication issue, which created a low-impedance load at the output of the LPF, preventing the circuit from operating at the designed bias condition. The measured peak SNDR of a complete pixel is thus degraded to 41.9 dB from the post-layout simulated value of 58 dB.
Nevertheless, using the highest gain setting for each pixel still led to satisfactory imaging results and overall system validation as described in the following.
To evaluate the functionality of the full chip, the 13-bit BF output is measured with different delay code configurations stored in on-chip programmable memory, while a synchronized sinc-like current pulse is injected into the array from a function generator. Fig. 17 shows the two different delay code configurations and the corresponding output results. In the first test, a single pulse is measured at the BF output since all channels receive the same delay code. The measurement of the second test shows five pulses, corresponding to the five different delay codes that were applied (see test 2 delay code map). The fourth and fifth pulses are halved in amplitude since only two (instead of four) elements are summed with these delays. The maximum delay supported in this paper is 1.06 μs as illustrated by the distance between the first and the fifth pulses in the second test.
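The two delay-code tests can be mimicked with a small behavioral model of the DAS operation. This is a sketch under stated assumptions, not the chip's logic: a unit impulse stands in for the sinc-like pulse, the function name is hypothetical, and the specific delay codes are invented for illustration (only the grouping of 4, 4, 4, 2, and 2 elements and the maximum spacing follow from the description above).

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Delay each channel by an integer number of samples, then sum."""
    n_samples = channels.shape[1]
    out = np.zeros(n_samples + max(delays), dtype=channels.dtype)
    for samples, d in zip(channels, delays):
        out[d:d + n_samples] += samples
    return out


# All 16 channels receive the same (synchronized) unit impulse.
pulse = np.zeros(32, dtype=np.int64)
pulse[4] = 1
channels = np.tile(pulse, (16, 1))

# Test 1: identical delay codes, so the pulses add up coherently.
test1 = delay_and_sum(channels, [0] * 16)

# Test 2: five hypothetical delay codes (in samples at 120 MS/s),
# applied to groups of 4, 4, 4, 2, and 2 elements.
codes = [0] * 4 + [30] * 4 + [60] * 4 + [90] * 2 + [127] * 2
test2 = delay_and_sum(channels, codes)

peaks = np.flatnonzero(test2)
print("test 1 peak amplitude :", test1.max())
print("test 2 peak positions :", peaks.tolist())
print("test 2 peak amplitudes:", test2[peaks].tolist())
print("first-to-last spacing :", (peaks[-1] - peaks[0]) / 120e6 * 1e6, "us")
```

In this model the last two pulses have half the amplitude of the first three because only two channels share those codes, and a code of 127 samples at 120 MS/s places the fifth pulse about 1.06 μs after the first.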
The receiver was also tested within a PA imaging setup, where the acoustic signals are induced by laser pulses as illustrated in Fig. 18. The device is mounted on an evaluation board and immersed in an oil tank for acoustic coupling. A laser pulse (λ = 740 nm) is applied from the side of the oil tank, providing an average fluence of 20 mJ/cm² with a 10-ns pulsewidth and a pulse repetition rate of 10 Hz. A phantom with three embedded metal wires is inserted into the lower part of the oil tank, whose shape is designed to accommodate other components on the evaluation board. The signal processed by the silicon chip assembly is captured by a logic analyzer and averaged 30 times for each image data point to compensate for the SNR degradation in the conditioning circuit (caused by a fabrication issue). Both the laser and the logic analyzer are triggered by the same pulse signal for synchronization. Fig. 19(a) shows the measured raw data captured from one pixel, while Fig. 19(b) shows the reconstructed image with dynamic focusing. The cross-sectional view from the yz plane shows three parallel wires at different depths, while the view from the xz plane captures their diagonal placement. The spreading of the image in the xz plane is due to the small subarray size in this design.
Table II compares this work to the state of the art. Since the signal conditioning circuit differs from the designed specifications due to the chip fabrication issue, this comparison mainly focuses on the BF performance, which considers only the ΔΣM and digital blocks. Relative to the hybrid analog/digital BF approach of [11], this paper has comparable delay resolution and power dissipation, while achieving 7.4 times smaller area and an 8-dB improvement in single-channel SNR. The maximum delay range is lower due to the different requirements imposed by the 4 × 4 array, but it can be extended through a longer FIFO.
More recent work using a nonuniform sampling approach [36] demonstrates similar performance as the hybrid design while dissipating 50% less power, showing the advantage of a fully digital BF approach. Compared to [36], our work consumes more power due to a much higher SNR target for PA applications. A direct comparison to the analog BF ICs [8], [9] is more difficult to make, due to the significantly different performance parameters. If the SNR and delay range are reduced to 40 dB and 200 ns, respectively, the power of the ΔΣM and BF is reduced by approximately eight times and five times, respectively. This would yield a BF power of 2.99 mW/channel, which lies between the values seen for [8] and [9]. It is worth noting that the power consumption of the S/H BF [8] is an order of magnitude lower than our projection for a reduced-SNR version of our approach, highlighting the power efficiency of the analog approach when the SNR requirement is less demanding.
To extend this work to a large array, it will be necessary to work on further power reductions. Based on the first results from digital synthesis, a 20% reduction could be achieved by replacing the low threshold voltage devices in the digital block with regular threshold voltage devices. To improve the power efficiency of the tri-inverter amplifier, a diode-connected transistor could be added to the first inverter stage, lowering the effective power supply voltage [33]. Furthermore, as described in [37], a power-down mode can be added to the inverter-based amplifier. The small parasitics of the internal nodes allow fast transitions during ON-OFF power cycling. For a PA system, the imaging speed is often limited by the laser pulse repetition rate, which is around 10 Hz for the high-power nanosecond laser used in our basic laboratory experiment. Low-power nanosecond lasers support a higher repetition rate in the range of a few kHz. While the selection of nanosecond lasers depends on application and imaging depth, the signal period of interest (e.g., ∼32 μs for a 5-cm-deep image) is usually about 10× to 1000× smaller than the repetition period, implying a potential for over an order of magnitude power reduction for a duty-cycled system.
V. CONCLUSION
We presented the first proof-of-concept, pixel pitch-matched subarray BF IC for future 3-D PA imaging systems. Digital beamforming is enabled by employing a BF architecture that substitutes Nyquist ADCs with ΔΣMs and provides both fine delay resolution (<10 ns) and large (∼1 μs) delay range. Dedicated signal conditioning circuits and modulators are optimized for both area and performance. The preamplifier and the VGA realize a coarse/fine gain tuning architecture to accommodate the large-signal DR as well as the wide variable gain required by the application.
By using inverters as the main amplifiers and operating them mostly in the slewing regime, the designed SC ΔΣM achieves the smallest area among published works with similar bandwidth and SNDR. Although the overall signal conditioning circuit fails to meet the designed performance, the demonstration of in-pixel A/D conversion and efficient BF is considered the most important aspect of this paper. The presented approach demonstrates the potential for larger arrays with pitch-matched electronics, high-fidelity readout, and digital subarray BF in fine-line CMOS technologies.
