This paper presents a power- and area-efficient front-end ASIC that is directly integrated with an array of 32 × 32 piezoelectric transducer elements to enable next-generation miniature ultrasound probes for real-time 3-D transesophageal echocardiography. The 6.1 × 6.1 mm² ASIC, implemented in a low-voltage 0.18 μm CMOS process, reduces the number of cables required in the probe's narrow shaft by means of 96 sub-array beamformers, which have a compact element-matched layout and employ mismatch-scrambling to enhance the dynamic range. The ASIC consumes less than 230 mW while receiving, and its functionality has been demonstrated in a 3-D imaging experiment.
Challenges & System Approach
Transesophageal echocardiography (TEE) uses an ultrasound transducer mounted on the tip of a gastroscopic tube to image the heart from the esophagus. Real-time 3-D imaging requires a 2-D array of more than 1000 independent transducer elements, which poses an interconnection challenge because only a limited number of cables fit in the tube (Fig. 1a). Integrating the transducer array with an ASIC that processes the signals locally is an efficient way to reduce the channel count. However, self-heating limits the ASIC's power consumption to about 0.5 mW per channel [1], and the circuits for each element should fit within the λ/2 element pitch (150 μm for our 5-MHz probe) required to minimize grating lobes [2]. Meeting both requirements is beyond the state of the art [3]–[5]. Here, we present an ASIC that is optimized in both architecture and circuit implementation to fulfill these stringent constraints.
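The 150 μm pitch follows from λ = c/f. A minimal check in Python, assuming a speed of sound of 1540 m/s in soft tissue (a common textbook value that the text does not state):

```python
# Assumption: speed of sound in soft tissue; not given in the text.
SPEED_OF_SOUND_M_PER_S = 1540.0
CENTER_FREQUENCY_HZ = 5e6  # 5-MHz probe


def half_wavelength_pitch_um(c: float, f: float) -> float:
    """Return lambda/2 in micrometres for sound speed c (m/s) and frequency f (Hz)."""
    wavelength_m = c / f
    return wavelength_m / 2 * 1e6


pitch = half_wavelength_pitch_um(SPEED_OF_SOUND_M_PER_S, CENTER_FREQUENCY_HZ)
print(f"lambda/2 = {pitch:.0f} um")  # lambda/2 = 154 um
```

A 150 μm pitch therefore sits just below λ/2 under this assumption, which is what keeps grating lobes out of the imaging sector.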
We use an array of 32 × 32 PZT elements with separate transmit and receive elements (Fig. 1b), which connect directly to a matrix of bondpads on the ASIC through an interconnect layer [6] (Fig. 1d). An 8 × 8 central sub-array is wired out to transmit channels in the external imaging system through metal traces in the ASIC that run underneath 96 unconnected elements to bondpads on the chip's periphery. All other 864 elements connect directly to 96 sub-array receiver circuits, whose outputs are fed to the system's receive channels. Despite the missing elements in the receive aperture, the point spread function (PSF) is comparable to that of a fully populated receiver, as shown by simulations in [7]. Because no high-voltage transmit circuitry is needed on the ASIC, this configuration allows the use of a dense low-voltage IC technology, saving power and area. Compared to [4], which uses the majority of elements to transmit and a sparse array to receive, it achieves better receive sensitivity and lower side-lobes. It also helps reduce the overall in-probe heat dissipation, since transmit circuits normally consume more power [5].
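The element bookkeeping above can be checked in a few lines of Python. The cable count printed at the end is an illustrative assumption (one cable per transmit element and one per sub-array output, ignoring supply, control and SPI lines), not a figure from the text.

```python
ARRAY_SIDE = 32          # 32 x 32 PZT matrix
TX_SIDE = 8              # central 8 x 8 transmit sub-array
UNCONNECTED = 96         # elements above the traces routed to the periphery
SUBARRAY_SIZE = 3 * 3    # elements per receive sub-array beamformer

total = ARRAY_SIDE ** 2
tx = TX_SIDE ** 2
rx = total - tx - UNCONNECTED
assert rx % SUBARRAY_SIZE == 0, "receive elements must tile into 3 x 3 sub-arrays"
rx_channels = rx // SUBARRAY_SIZE

print(total, tx, rx, rx_channels)  # 1024 64 864 96
# Assumption: only signal cables are counted here.
print(f"signal cables: {tx + rx_channels} instead of {total}")  # signal cables: 160 instead of 1024
```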
Each of the 96 sub-array receivers interfaces with a 3 × 3 transducer sub-array of 450 μm × 450 μm and delays and sums the associated received signals, thus realizing a 9-fold channel reduction (Fig. 1c), similar to the approach used in [6] for a much smaller array with a larger pitch. The delays are programmable in steps of 30 ns up to 210 ns, allowing the sub-array's directivity to be steered across a range of ±37°. Fig. 2 shows the schematic of a 3 × 3 sub-array receiver, which includes 9 LNAs, 9 analog delay lines, a time-gain compensator (TGC) and a cable driver. The LNA is a revised version of the design described in [8], which uses a compact inverter-based OTA to achieve high power efficiency. The OTA's bias point is set during the transmit phase, when the LNA is not needed. A switchable capacitive feedback network provides 3 gain levels for dynamic range enhancement. Although the LNA is single-ended, its rejection of supply and ground noise is enhanced by sharing a positive and a negative regulator with the other 8 LNAs in the same sub-array.
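To illustrate how a steering direction maps onto the programmable delays, here is a minimal Python sketch under a far-field plane-wave assumption. The sound speed (1540 m/s), the rounding to the nearest 30-ns step, and the function name `subarray_delay_steps` are assumptions for illustration; it does not attempt to reproduce how the ±37° range was determined.

```python
import math

PITCH_M = 150e-6               # element pitch from the text
SOUND_SPEED_M_PER_S = 1540.0   # assumed
STEP_S = 30e-9                 # delay step from the text
MAX_STEPS = 7                  # 7 x 30 ns = 210 ns maximum delay


def subarray_delay_steps(theta_x_deg: float, theta_y_deg: float) -> list:
    """Quantized receive delays (in 30-ns steps) for a 3 x 3 sub-array.

    Far-field model: the element at offset (x, y) from the sub-array centre is
    delayed by (x*sin(theta_x) + y*sin(theta_y)) / c; all delays are then shifted
    so that the smallest is zero. Rows are y positions, columns are x positions.
    """
    u = math.sin(math.radians(theta_x_deg))
    v = math.sin(math.radians(theta_y_deg))
    delays_s = [
        [((ix - 1) * u + (iy - 1) * v) * PITCH_M / SOUND_SPEED_M_PER_S for ix in range(3)]
        for iy in range(3)
    ]
    t_min = min(min(row) for row in delays_s)
    steps = [[round((t - t_min) / STEP_S) for t in row] for row in delays_s]
    if max(max(row) for row in steps) > MAX_STEPS:
        raise ValueError("steering direction needs more than the 210-ns delay range")
    return steps


print(subarray_delay_steps(20, 0))  # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
```

Steering along both axes at once adds the two delay gradients, so the corner elements run out of delay range first; at 45° in both directions, for example, this sketch raises `ValueError`.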
Circuit Implementation
The LNA output is AC-coupled to a flipped source follower that drives the analog delay line. This delay line consists of pipeline-operated S/H memory cells running at a sampling rate of 33 MHz. The outputs of all 9 delay lines are joined together to perform charge summation. A delay-stage index rotator determines the sequence in which the memory cells are used. It consists of an 8-stage shift register (D1–D8) in which the 4-bit binary indices of memory cells 1–8 are rotated. Upon startup, register Dn is preset to n. D1 stores the index of the memory cell used for sampling the input signal, while D2–D8 store the indices of candidate memory cells for readout. A 3-bit selection code, provided by a built-in SPI interface, decides which of these candidates is used, allowing the delay depth of each individual delay line to be programmed. One-hot codes expanded from the selected 4-bit indices are re-timed by non-overlapping clocks to control the S/H switches in the memory cells. The SPI interfaces in all sub-arrays are normally loaded in parallel, but can also be configured as a daisy chain to load different delay patterns into individual sub-arrays, which enables near-field focusing.
The S/H memory cells suffer from charge injection and clock feed-through errors, whose mismatch introduces a ripple pattern with a period of 8 delay steps of 30 ns each at the output of the delay lines. This limits the dynamic range of the signal chain and manifests itself as tones in the output spectrum (Fig. 3b). To mitigate this interference, we propose a mismatch-scrambling technique (Fig. 3a) that adds an extra memory cell and a redundant index register D9. A pseudo-random number generator (PRNG) generates a pseudo-random bit sequence (PRBS) that decides whether the index in D8 or D9 shifts into D1, while the other index shifts into D9. Thus, memory cells are randomly taken out of the sequence and inserted back into it. This randomizes the ripple pattern and converts the interfering tones into broadband noise (Fig. 3c).
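The index rotator and the scrambling step can be explored with a small behavioural model in Python. It is a sketch under stated assumptions, not the circuit: each memory cell is modelled as adding a fixed, hypothetical offset (standing in for charge-injection and clock-feed-through mismatch), the PRNG is assumed to be a PRBS7 linear-feedback shift register (the text does not say which generator is used), and the names `run_delay_line`, `prbs7` and `tone_power` are illustrative. Reading register D(d+1) returns the sample taken d clocks earlier; `tone_power` sums the DFT energy at multiples of fs/8, where the unscrambled ripple concentrates.

```python
import cmath
import random


def prbs7(seed=0x7F):
    """Assumed PRNG: PRBS7 Fibonacci LFSR (x^7 + x^6 + 1)."""
    state = seed & 0x7F
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        yield bit


def run_delay_line(samples, delay_steps, offsets, scramble=False, seed=0x7F):
    """Behavioural model of one S/H delay line driven by the index rotator.

    samples: one input sample per sampling clock (about 30 ns)
    delay_steps: 1..7; the readout cell is taken from register D(delay_steps + 1)
    offsets: hypothetical per-cell error, keyed by cell index 1..9
    Returns one output per clock (None until the first sample emerges).
    """
    if not 1 <= delay_steps <= 7:
        raise ValueError("delay must be 1..7 steps")
    regs = list(range(1, 9))  # D1..D8, preset so that Dn = n
    parked = 9                # D9: the extra cell, only used when scrambling
    stored = {}               # cell index -> held sample (including its error)
    bits = prbs7(seed)
    out = []
    for k, x in enumerate(samples):
        stored[regs[0]] = x + offsets[regs[0]]  # cell in D1 samples the input
        # The cell in D(delay_steps + 1) was written delay_steps clocks ago.
        out.append(stored[regs[delay_steps]] if k >= delay_steps else None)
        # Rotate D1..D7 -> D2..D8; D8 re-enters at D1 unless the PRBS bit
        # swaps it with the parked cell in D9.
        entering = regs[-1]
        if scramble and next(bits):
            entering, parked = parked, entering
        regs = [entering] + regs[:-1]
    return out


def tone_power(errors):
    """Energy in the DFT bins at multiples of fs/8, excluding DC."""
    n = len(errors)
    total = 0.0
    for m in range(1, 8):
        b = m * n // 8
        x = sum(e * cmath.exp(-2j * cmath.pi * b * k / n) for k, e in enumerate(errors))
        total += abs(x) ** 2
    return total


rng = random.Random(1)
OFFSETS = {cell: rng.uniform(-1.0, 1.0) for cell in range(1, 10)}  # hypothetical mismatch
N, DELAY = 1024, 4
plain = run_delay_line([0.0] * (N + DELAY), DELAY, OFFSETS)[DELAY:]
scrambled = run_delay_line([0.0] * (N + DELAY), DELAY, OFFSETS, scramble=True)[DELAY:]
print(f"tone energy, plain:     {tone_power(plain):.1f}")
print(f"tone energy, scrambled: {tone_power(scrambled):.1f}")
```

With zero input the outputs are pure cell errors: the unscrambled sequence repeats every 8 samples, while the scrambled one does not, so its energy spreads across the spectrum instead of piling up in the fs/8 tones. The programmed delay itself is unaffected, because a cell always stays in the register chain for at least 8 clocks before it is overwritten.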
The TGC amplifies the summed signal at the joined delay-line outputs with programmable gain steps that interpolate between the gain steps of the LNA. Finally, a class-AB super source follower drives a cable capacitance of up to 300 pF. Fig. 4 shows photographs of the ASIC and of the fabricated prototype with the integrated PZT matrix transducer.
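A rough sense of what driving 300 pF entails comes from i = C·dv/dt: a sine of amplitude A at frequency f needs a peak current of 2πfAC. The sketch below assumes a 0.5 V output amplitude at 5 MHz; the amplitude is a hypothetical value chosen for illustration, not a figure from the text.

```python
import math


def peak_load_current_a(c_load_f: float, amplitude_v: float, freq_hz: float) -> float:
    """Peak current i = C * dv/dt needed to drive a sine of given amplitude into C."""
    return 2 * math.pi * freq_hz * amplitude_v * c_load_f


# 300 pF cable from the text; 0.5 V amplitude is hypothetical.
i_pk = peak_load_current_a(300e-12, 0.5, 5e6)
print(f"{i_pk * 1e3:.2f} mA")  # 4.71 mA
```

A class-AB stage can source and sink such peaks without having to be biased at the peak current continuously, which is one common motivation for that output topology in a power-constrained design.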
Measurement Results
The measured electrical performance of the ASIC is summarized in Table I. Table II gives a system-level comparison with prior work on ASICs for 3-D ultrasound imaging. This work achieves the best power efficiency in receive mode and the highest integration density, with an element-matched layout at a <λ/2 element pitch. To demonstrate the 3-D imaging capability of the prototype, a pattern of seven needles was placed approximately 16 mm in front of the transducer array (Fig. 5a, b; the dotted circle marks a needle that was slightly behind the others). A spherical wave was transmitted and a 3-D volume image was reconstructed. This volume dataset was rendered to a frontal view (Fig. 5c), clearly showing the needle tips. The measured input-referred noise with mismatch-scrambling enabled varies with the delay pattern because of a systematic mismatch in the layout of the S/H delay lines, which could be reduced by an improved layout.
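The quoted receive power can be related back to the self-heating budget with simple arithmetic. This sketch divides the upper bound of 230 mW over the 864 receive elements and the 96 sub-array channels; it treats the 0.5 mW/channel limit cited earlier as a per-element budget, which is an interpretation rather than a statement from the text.

```python
ASIC_RX_POWER_MW = 230.0      # upper bound while receiving
RX_ELEMENTS = 864             # elements connected to sub-array receivers
SUBARRAYS = 96                # sub-array beamformers
BUDGET_MW_PER_ELEMENT = 0.5   # self-heating limit, read as per element (assumption)

per_element = ASIC_RX_POWER_MW / RX_ELEMENTS
per_subarray = ASIC_RX_POWER_MW / SUBARRAYS
print(f"<= {per_element:.3f} mW per receive element")  # <= 0.266 mW per receive element
print(f"<= {per_subarray:.2f} mW per sub-array")       # <= 2.40 mW per sub-array
print(f"within budget: {per_element < BUDGET_MW_PER_ELEMENT}")  # within budget: True
```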
