A variety of emerging applications in medical ultrasound rely on 3D volumetric imaging, calling for dense 2D transducer arrays with thousands of elements. Due to this high channel count, the traditional per-element cable interface used for 1D arrays is no longer viable. To address this issue, recent work has proven the viability of flip-chip bonding [1] or direct transducer integration [2]. This shifts the burden to a CMOS substrate, which must provide dense signal conditioning and processing before the massively parallel image data can be pushed off chip. A common approach for data reduction is to employ subarray beamforming (BF), which applies delay-and-sum operations within a group of pixels. To implement such functionality within the tight pixel pitch, prior works have implemented the delays using simple S/H circuits [2] or analog filters [3], and typically suffer from a combination of issues related to limited delay, coarse delay resolution and limited SNR.
This work leverages the integration density of modern CMOS to demonstrate a pitch-matched digital subarray beamforming receiver (RX) with signal conditioning and ΔΣ modulator (ΔΣM) integrated within a pixel area of 250×250μm² (see Fig. 27.5.1). Our proof-of-concept IC supports a subarray of 4×4 pixels and is flip-chip bonded to a Capacitive Micromachined Ultrasound Transducer (CMUT) chip that is similar to the one used in [1]. Since our application is photoacoustic imaging (receive-only using external laser pulses), we did not integrate a transmitter interface. However, in a large-scale array implementation of our concept, it is conceivable to add this functionality using a subset of the pixels for transmit [2].
Figure 27.5.2 compares our approach with prior art: analog BF [2] [3] and digital BF using a per-channel Nyquist ADC. The latter approach is popular for 1D arrays, but difficult to integrate within a pitch-constrained 2D array. In addition, the Nyquist ADC must typically oversample to provide sufficient timing resolution, which further exacerbates the integration issue. The work of [4] combines analog and digital Nyquist-rate BF, but the area per element is ~5× larger than our pixel size. To enable area-efficient digital BF, this work uses a ΔΣ approach similar to [5]. The oversampling of the ΔΣM naturally provides sufficient timing resolution for BF, enables low-complexity analog design with small passives, and simplifies the signal routing (1b outputs). In our chip, the 16 bitstreams are routed to a global digital block for decimation filtering (DF1) and beamforming (BF = FIFO + summation), followed by a final decimation filter (DF2) off chip. Within the on-chip block, the BF is placed after DF1, which was identified as the preferred option due to the lower FIFO clock speed and the commensurate reduction in power (see Fig. 27.5.3). Placing the BF before DF1 (as in [5]) would lead to a slightly lower gate count (since DF1 is shared), but the savings are insignificant due to the relatively low complexity of the employed cascaded integrator-comb (CIC) filter. We expect the advantages of the DF-first option to become more pronounced for larger arrays, where early clock rate reduction is critical. Despite the decimation by DF1, the delay resolution is still 8.33ns, which is sufficient for a 5MHz CMUT center frequency. The implemented FIFOs have a depth of 2⁷, providing the required delay range for our 4×4 subarray (1.06μs).
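The DF-first ordering described above (CIC decimation per channel, then FIFO delay and summation) can be illustrated with a minimal behavioral sketch. This is not the synthesized block: the function names, the CIC `rate` and `order`, and the use of NumPy are assumptions made for illustration, since the text does not state the decimation ratio or filter order. Only the 8.33ns delay step and the FIFO depth of 2⁷ come from the text. The sketch also assumes that the longest programmable delay is one step less than the FIFO depth.

```python
import numpy as np

DELAY_STEP_NS = 8.33  # delay resolution quoted in the text
FIFO_DEPTH = 2 ** 7   # FIFO depth quoted in the text


def cic_decimate(x, rate, order):
    """CIC decimator: `order` integrators, downsample by `rate`, `order` combs.

    `rate` and `order` are illustrative; the text does not state them.
    The DC gain of this structure is rate ** order.
    """
    y = np.asarray(x, dtype=np.int64)
    for _ in range(order):
        y = np.cumsum(y)           # integrators run at the input rate
    y = y[rate - 1::rate]          # keep every `rate`-th sample
    for _ in range(order):
        y = np.diff(y, prepend=0)  # combs run at the reduced rate
    return y


def delay_and_sum(channels, delays):
    """Delay each channel by an integer number of samples, then sum.

    channels: array of shape (n_channels, n_samples), already decimated.
    delays:   per-channel delays in samples, each in [0, FIFO_DEPTH).
    """
    channels = np.asarray(channels)
    n_samples = channels.shape[1]
    out = np.zeros(n_samples, dtype=channels.dtype)
    for samples, d in zip(channels, delays):
        if not 0 <= d < FIFO_DEPTH:
            raise ValueError("delay exceeds the FIFO depth")
        if d < n_samples:
            out[d:] += samples[:n_samples - d]  # shift right by d samples
    return out


def max_delay_ns(depth=FIFO_DEPTH, step_ns=DELAY_STEP_NS):
    """Longest delay, assuming the last FIFO tap is depth - 1 steps away."""
    return (depth - 1) * step_ns
```

Under that assumption, 127 steps of 8.33ns give roughly 1.06μs, which agrees with the quoted delay range. Because the delays are applied after decimation, each FIFO is clocked at the reduced rate, which is the power argument made in the text.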
Figure 27.5.4 shows the analog front-end. The transimpedance amplifier (TIA) provides five gain levels using a programmable R network. The TIA output is taken against a replica to facilitate supply noise cancellation as the succeeding lowpass filter (LPF) performs single-ended to differential conversion. Both the TIA and LPF are designed using 1.5V thick-oxide devices (for large DR), while all other circuits use core devices (1V supply). The VGA uses a Padé approximation to provide fine linear-in-dB gain tuning. The 1b ΔΣM (see Fig. 27.5.5) uses a 3rd-order architecture with an OSR of 48 to provide 60dB peak SNR in a 10MHz BW. The employed inverter-based SC integrator is similar to [6]. It uses three gain stages to achieve the required gain with minimum L, and it is designed to slew for most of the clock period. The large swing at the 3rd-stage input during slewing leads to small devices and a compact layout. The input bias of the 3rd stage is established using diode replicas and stored on C_b. In comparison to [6], this obviates the need for special high-V_T devices and resistors. As illustrated in Fig. 27.5.5, the designed ADC is the smallest published among designs with similar BW and SNDR.
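To illustrate why a Padé approximation yields linear-in-dB gain control, the sketch below uses the first-order approximant (1 + x)/(1 − x) ≈ exp(2x) and compares it with the ideal exponential on a dB scale. The approximant order, the control variable `x`, and the function names are assumptions for illustration; the text does not specify how the VGA realizes the approximation.

```python
import math


def pade_gain(x):
    """First-order Pade approximant of exp(2x): (1 + x) / (1 - x), |x| < 1."""
    if not -1.0 < x < 1.0:
        raise ValueError("control value must satisfy |x| < 1")
    return (1.0 + x) / (1.0 - x)


def gain_db(x):
    """Approximated gain in dB for control value x."""
    return 20.0 * math.log10(pade_gain(x))


def ideal_db(x):
    """Ideal exponential gain exp(2x) in dB; this is a straight line in x."""
    return 20.0 * math.log10(math.exp(2.0 * x))


if __name__ == "__main__":
    # Print the deviation from the ideal linear-in-dB law over a small range.
    for x in (-0.3, -0.15, 0.0, 0.15, 0.3):
        print(f"x={x:+.2f}  pade={gain_db(x):+.3f} dB  ideal={ideal_db(x):+.3f} dB")
```

The ratio form is antisymmetric in dB around x = 0, and its deviation from the ideal line grows as |x| approaches 1, which is why such approximations are typically used over a limited control range.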
Our chip is fabricated in a 28nm UTBB FD-SOI CMOS process. The 16 RX pixels occupy 1mm² and consume 358mW, while the synthesized digital block occupies 0.4mm² and consumes 173mW. The ΔΣM occupies 1/4 of the pixel area and consumes 6.65mW. The ΔΣM was measured in isolation (test pixel), showing a peak SNR of 59.9dB and a peak SNDR of 58.9dB for a 2MHz input. To evaluate the entire RX, a diced 4×4 2D CMUT array is flip-chip bonded onto the 28nm chip. The receiver is tested within a photoacoustic imaging setup, where the acoustic signals are induced by light-absorbing wire targets (see Fig. 27.5.6). The cross-sectional view from the y-z plane shows three parallel wires at different depths, while the view from the x-z plane captures their diagonal placement.
Figure 27.5.7 shows the top view of the chip stack and the RX chip, along with a comparison to the state of the art (focusing on BF performance). Relative to the hybrid analog/digital BF approach of [4], our work has comparable delay resolution and power dissipation, while achieving 7.4× smaller area and 7dB improvement in single-channel SNR. Our maximum delay range is lower due to the different requirements imposed by our 4×4 array, but it is straightforward to extend it through a longer FIFO. A direct comparison to analog BF ICs [2] [3] is more difficult to make, due to the significantly different performance parameters. If we relax the SNR to 40dB and reduce the delay range to 200ns, we estimate an 8× and 5× power reduction for our ΔΣM and BF, respectively. This would yield a BF power of 2.99mW/channel, which lies between [2] and [3]. In summary, we view the demonstration of in-pixel A/D conversion and efficient ΔΣ BF as the most important aspects of this work. We believe that the presented approach offers a viable path toward larger arrays with pitch-matched electronics, high-fidelity readout and digital subarray BF.
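The relaxed-specification estimate of 2.99mW/channel can be reproduced from the measured numbers under one plausible reading, sketched below. It assumes that the 8× reduction applies to the per-pixel ΔΣM power (6.65mW) and that the 5× reduction applies to the digital block power (173mW) divided evenly over the 16 channels. This breakdown is an assumption; the text does not spell out how the estimate was formed, and the function name is hypothetical.

```python
DSM_POWER_MW = 6.65       # measured ΔΣM power per pixel, from the text
DIGITAL_POWER_MW = 173.0  # measured digital block power, from the text
N_CHANNELS = 16           # 4x4 subarray


def relaxed_bf_power_mw(dsm_reduction=8.0, bf_reduction=5.0):
    """Per-channel power after the assumed 8x (ΔΣM) and 5x (BF) reductions."""
    dsm = DSM_POWER_MW / dsm_reduction
    digital = DIGITAL_POWER_MW / N_CHANNELS / bf_reduction
    return dsm + digital


if __name__ == "__main__":
    print(f"{relaxed_bf_power_mw():.2f} mW/channel")
```

With these inputs the two terms are about 0.83mW and 2.16mW, which sum to the quoted 2.99mW/channel.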
