Abstract: In real-time catheter-based 3-D ultrasound imaging applications, gathering data from the transducer arrays is difficult, as there is a restriction on cable count due to the diameter of the catheter. Although area- and power-hungry multiplexing circuits integrated at the catheter tip are used in some applications, these are unsuitable for use in small catheters for applications such as intracardiac imaging. Furthermore, the length requirement for catheters and the limited power available to on-chip cable drivers lead to limited signal strength at the receiver end. In this paper, an alternative approach using analog time-division multiplexing (TDM) is presented, which addresses the cable restrictions of ultrasound catheters. A novel digital demultiplexing technique is also described, which allows for a reduction in the number of analog signal processing stages required. The TDM and digital demultiplexing schemes are demonstrated for an intracardiac imaging system that would operate in the 4- to 11-MHz range. A TDM integrated circuit (IC) with an 8:1 multiplexer is interfaced with a fast analog-to-digital converter (ADC) through a microcoaxial catheter cable bundle, and processed with a field-programmable gate array register-transfer level simulation. Input signals to the TDM IC are recovered with −40-dB crosstalk between the channels on the same microcoax, showing the feasibility of this system for ultrasound imaging applications.
elements in the 2-D arrays is either impossible or very limiting due to the size of the catheter, especially in the case of small catheters for intracardiac echography (ICE) [2]. It is partially because of these limitations that the current 3-D ICE probes do not have a large field of view [3]. Furthermore, current ICE and TEE catheters are used in conjunction with harmful X-ray imaging (fluoroscopy) for navigation [4].
To realize an ICE catheter that can be used under MRI guidance, avoiding ionizing radiation, the cable count needs to be reduced significantly so as to reduce RF-induced heating of the metal conductors [5]. Therefore, cable reduction techniques whose electronics complexity, area, and power requirements are suitable for integration at the tip of an ICE catheter would have a significant impact on catheter-based ultrasound imaging applications, both by enabling large field of view ICE catheters and by eliminating the need for fluoroscopy. These techniques can then be implemented using monolithic capacitive micromachined ultrasonic transducer on complementary metal-oxide-semiconductor (CMUT-on-CMOS) technology, or ICs integrated with CMUT or piezoelectric transducer arrays in a multichip package [6]–[9].
In this paper, a time-division multiplexing (TDM) approach with relatively simple electronics on the catheter tip, along with a direct digital demodulation scheme at the back end for signal recovery, is demonstrated. After a review of channel multiplexing methods motivating the need for this approach, the details of the TDM system are presented along with the IC implementation in CMOS. Real-time direct digital demodulation as realized in a field-programmable gate array (FPGA) is described, and finally, the experimental results on the overall system are presented.
II. CHANNEL MULTIPLEXING METHODS
In developing the proposed TDM scheme, several methods for reducing cable count were considered. On-chip digital multiplexing would reduce cable count while maintaining signal integrity, as a digital output is less prone to noise than directly transmitted echo responses. However, such a system would require analog-to-digital conversion of the signal from every receiver element individually. While research into lower power and smaller analog-to-digital converters (ADCs) continues [10], even with current state-of-the-art technology, for example, a 35 mega samples per second (MSPS) SAR ADC consuming 54 mW of power and occupying 0.239 mm² [11], having one ADC per element on-chip for a 96-element transducer would still use upward of 22 mm² of area and consume over 5 W of power. In a size- and power-constrained system such as an imaging catheter, this would be infeasible to implement.
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
Another approach, which has been demonstrated in [12]–[16], involves performing partial beamforming (μ-beamforming) with analog delay chains in the transducer. By performing delay-sum beamforming in the transducer, fewer cables are required, as the raw signals from every element do not need to be transferred to the front end for processing. This approach requires a large number of capacitors and switches for each channel in order to achieve the required analog delays, which makes it unsuitable for systems with tight size restrictions. This method would also be incompatible with systems that use advanced imaging techniques requiring all of the raw echo data.
A third approach for cable reduction is frequency division multiplexing (FDM) [17]. One such method of FDM, using analog modulation (AM), has been shown to be feasible for ICE applications and could be applicable to other ultrasound applications. This design uses AM to multiplex multiple signals onto each cable, allowing all of the raw data to be transferred by making better use of the channel bandwidth. However, this approach requires multiple analog filters and mixers to produce the multiplexed output. Compact implementations of these circuits on silicon are sensitive to process variations, which makes it difficult to ensure uniformity across all channels in applications with hundreds of elements. Furthermore, FDM would also require complex digital signal processing hardware and software to demodulate and recover the signals prior to image reconstruction, especially if real-time imaging or high element counts are required.
Multiplexing the channels in time is also a possibility for reducing the cable count for the system. The approach allows multiple channels to share the same cable by assigning each channel a time slot in which to transmit. TDM can be performed using many schemes, the versatility and simplicity of which have resulted in extensive use in communications and telephony applications [18]–[20], and also in ultrasound systems.
An approach with potential for small-area, low-element-count ultrasound applications is to convert the analog data directly into a digital pulsewidth modulation (PWM) signal, encoding the analog voltage as the duty cycle, which essentially performs on-chip digitization without the need for an ADC [21]. The downside of this approach is the massive bandwidth required of the transmission channel when scaled to larger element counts. The bandwidth is directly proportional to the number of elements and the sampling frequency, and it also increases exponentially with the number of bits of analog resolution required. A system with 16 transducer elements each sampled at 20 MSPS with a 10-b resolution would require over 300 GHz of bandwidth to send down a single cable, which is simply not feasible.
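The 300-GHz figure can be checked with a rough model. The sketch below assumes that resolving b bits of duty cycle within one sample period needs a time step of 1/2^b of that period, and that the required bandwidth is simply the reciprocal of that step multiplied by the number of elements sharing the cable. The function name and this simplified model are illustrative assumptions, not taken from [21].

```python
def pwm_bandwidth_hz(n_elements, sample_rate_hz, bits):
    """Rough PWM channel bandwidth: one time step per duty-cycle level.

    Assumed model: bandwidth = elements * sample rate * 2**bits.
    """
    return n_elements * sample_rate_hz * (2 ** bits)


# Values quoted in the text: 16 elements, 20 MSPS, 10-b resolution.
bw = pwm_bandwidth_hz(16, 20e6, 10)
print(f"{bw / 1e9:.2f} GHz")  # 327.68 GHz
```

Each additional bit of resolution doubles the result, which is the exponential dependence noted above.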
Analog TDM requires a relatively small amount of hardware at the transmitter compared with the other schemes discussed here. At the bare minimum, only an analog multiplexer and simple digital counting logic are required to multiplex analog data streams. The sequencing of the transducer elements can be varied, allowing for different options in assigning time slots based on the application.
In one approach, the time slots are assigned on a per-firing basis whereby each element in the array is connected to the cable for a full firing. On each subsequent firing, the next element is connected, and so on until data for each element have been collected. This approach has been demonstrated in several systems [22]–[24]. While this approach does allow for cable reduction, it requires multiple firings to collect the information, as only one element per cable can be sampled in a given firing. For very high element counts, this will reduce the maximum possible frame rate for imaging, and can lead to more prominent motion artifacts, as the time over which the image is collected is increased.
In cases where data must be collected for all channels simultaneously, perhaps due to high element counts, or if high frame rates are required, the multiplexing scheme can be designed so that the time slots are short enough for each channel to be sampled above the Nyquist rate. In this approach, rather than changing channel on each firing, the multiplexer scans through every element in turn using a sampling clock that is at least n times the Nyquist rate (where n is the number of channels). This approach was briefly presented in [25], in which 12 channels were multiplexed onto a single cable using this technique. The approach has since been demonstrated in [26] and [27], in which 8:1 and 4:1 reductions in cable count were achieved, respectively.
III. SIMULTANEOUS COLLECTION TDM SCHEME
In this paper, the TDM approach for the simultaneous collection of data from all the elements is discussed in detail, building on the system demonstrated in [26]. There are two key areas of focus: the multiplexing electronics that would be located with the transducer, and the direct digital demultiplexing design that makes up the receiver.
A minimal TDM transmit system requires only an analog multiplexer and a buffer for each channel and so requires much less space to implement than the other channel reduction schemes considered. At the receiver end, the signals then need to be demultiplexed, which requires additional hardware. If done in the analog domain, this would require the high-frequency multiplexed signals to be passed through a synchronous demultiplexer, filtered, and then passed into an ADC dedicated to each channel. Each of these stages would add noise to the system, would present additional complexity in matching the circuits across all the channels, and would require a large number of ADCs.
As there is now increasingly widespread availability of very high-speed ADCs with an effective number of bits on the order of ten or more [28], [29], and also of FPGAs capable of performing complex DSP functions on many channels in parallel, much of the receiver hardware complexity can be eliminated by moving all of the demultiplexing requirements from the analog domain into an FPGA and allowing demultiplexing to be done directly in the digital domain.
The clock frequency of the multiplexer will be much higher than the sampling frequency for each channel, and in an eight channel 25 MSPS design as an example, the clock frequency would be 200 MHz. The analog signals are modulated onto pulses, which must rise and stabilize fast enough that the amplitude can be extracted by the ADC correctly. The cables connecting the multiplexer and ADC need sufficient bandwidth to limit the amount of distortion of the high-frequency TDM signals.
In Fig. 1, an overview of the simplified TDM scheme is shown. The receiver end consists only of an LNA and ADC in the analog path, which is then followed by an FPGA to perform the digital demultiplexing. The transmitter end consists of an analog front end (AFE) followed by a sample and hold (S/H) capacitor for each channel, each of which is then connected to an analog multiplexer. The time-gain compensation (TGC) in the AFE can be controlled via a single cable to digitally cycle through fixed levels, or alternatively controlled using an analog reference signal for finer adjustment.
The multiplexing is controlled by digital circuitry, which generates the sample clocks for each channel, and control signals for the multiplexer. This circuitry is controlled by a clock signal generated by the receiver and transmitted over an additional cable. The ADC clock is synchronized to this same clock signal to ensure that each ADC sample corresponds exactly to one channel in the multiplexed data. As a result of this synchronization, demultiplexing theoretically becomes a simple task of separating the data into groups consisting of every nth sample.
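The idealized demultiplexing step, taking every nth sample once the clocks are synchronized, can be sketched as follows. This is a minimal illustration in Python rather than the FPGA firmware; the function names and the `channel1_index` parameter (the position of the first channel-1 sample, as would be found during link training) are hypothetical.

```python
def tdm_clock_hz(n_channels, per_channel_rate_hz):
    """TDM clock needed so that every channel gets one slot per sample period."""
    return n_channels * per_channel_rate_hz


def demultiplex(adc_samples, n_channels, channel1_index=0):
    """Split a synchronized, interleaved ADC stream into per-channel lists.

    channel1_index is a hypothetical parameter: the index of the first
    channel-1 sample in the captured stream.
    """
    aligned = adc_samples[channel1_index:]
    usable = len(aligned) - len(aligned) % n_channels  # drop a partial last frame
    return [aligned[ch:usable:n_channels] for ch in range(n_channels)]


# Toy stream: the value 100*channel + frame makes the origin of each sample visible.
N_CHANNELS = 8
stream = [100 * ch + frame for frame in range(3) for ch in range(N_CHANNELS)]
channels = demultiplex(stream, N_CHANNELS)

print(tdm_clock_hz(N_CHANNELS, 25e6) / 1e6)  # 200.0 (MHz)
print(channels[0], channels[7])              # [0, 1, 2] [700, 701, 702]
```

This only separates the samples; it does not account for the fact that each channel is sampled at a slightly different instant.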
IV. MIXED SIGNAL MULTIPLEXER DESIGN
The multiplexing circuitry is to be placed with the transducer to receive the signals and connect them with a reduced number of connections to the back-end electronics. To directly interface with the transducers, an AFE is required, which is suited to the technology. For example, a transimpedance amplifier-based AFE would be suitable for CMUT designs [22]. An antialiasing filter prior to the signals being sampled is also required. Additional electronics, such as integrated TX/RX switches, may also be needed in the front end. Following the AFE is the sampling circuitry. This consists of an S/H buffer, an analog multiplexer, and sequencing logic. An overview of this is shown in Fig. 2. The sequencing logic generates the sample clocks and the multiplexer select signals for each channel based on a counter, which keeps track of which channel is to be connected. The sequencing signals are shown in Fig. 3.
When a given channel is not being driven onto the cable, the S/H capacitor is connected via a pass transistor to the output of the TGC. This gives time for the capacitor to match the signal voltage and sample the signal [Fig. 3(a)]. Once it is time for that channel to be connected to the output, the S/H capacitor is disconnected from the input, and after a short dead time, is then connected via a buffer and the multiplexer to the cable driver [Fig. 3(b)].
The S/H circuitry has been included in the transmitter to ensure that the signals are correctly sampled and transmitted. Sampling in the transmitter ensures that during each time period on the cable, a constant value is sent. This means that the high-speed cable driver can settle to a constant value after a short switching transient rather than having to precisely follow a time varying signal during each period. This should ensure that an accurate sample is quantized by the ADC for that time period.
A key part of the TDM scheme is alignment between the ADC and the multiplexer. Two types of alignment are required to ensure that the samples are correctly digitized: channel alignment and phase alignment. The former is required to ensure the sample for channel 1 on the multiplexer is known to be channel 1 when demultiplexing; without this alignment, the elements could get mixed up during demultiplexing. The latter is used to ensure that the TDM and ADC clocks are correctly phase aligned to account for the propagation delays in the cabling. This phase alignment is critical to ensure that the ADC is taking samples when the signal on the cable has stabilized rather than during switching transients.
Both alignment requirements are satisfied during an initialization sequence, which consists of a training pattern being generated by the multiplexer and analyzed in the FPGA. During the training sequence, the inputs to the multiplexer are internally tied to bias voltages, with the first channel connected to one voltage level, and all other channels connected to a second voltage level, as shown in Fig. 4 . By analyzing the converted data, it is possible to correctly align the system.
The alignment starts with channel 1 being identified by way of the quantized codes being significantly different in one sample than in the others, as shown in Fig. 4(a). After this, the phase of the TDM clock is adjusted to find the optimal phase shift between the ADC and TDM clocks. The TDM clock phase can be adjusted using a phase-locked loop (PLL) in the FPGA to determine the optimum alignment, as shown in Fig. 4(b). Initially, the phase shift is increased to locate the point, L, at which the difference between channel 1 and the others is reduced by half. The phase shift is then reduced until the point, R, at which the difference drops to the same amount. These two points identify the phases at which the switching transients occur, so the optimal phase at which the sampling should be performed, O, must be the midpoint.
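The phase search can be sketched as a simple sweep. The code below is an illustrative model only: `measure_difference` stands in for whatever the FPGA would read back (the code difference between channel 1 and the other channels at a given PLL phase step), and the trapezoidal `toy_difference` eye is an invented test input. Phase wrap-around is ignored for brevity.

```python
def find_optimal_phase(measure_difference, start_step, max_steps):
    """Return (L, R, O) in phase steps following the half-amplitude search.

    measure_difference(step) is a hypothetical callback returning the code
    difference between channel 1 and the other channels at that phase step.
    start_step is assumed to lie inside the stable part of the time slot.
    """
    threshold = measure_difference(start_step) / 2.0

    l_point = start_step
    while measure_difference(l_point) > threshold:   # increase phase until halved
        l_point += 1
        if l_point - start_step > max_steps:
            raise RuntimeError("no transient found while increasing phase")

    r_point = start_step
    while measure_difference(r_point) > threshold:   # decrease phase until halved
        r_point -= 1
        if start_step - r_point > max_steps:
            raise RuntimeError("no transient found while decreasing phase")

    return l_point, r_point, (l_point + r_point) / 2.0


def toy_difference(step, lo=20, hi=50, ramp=8):
    """Invented eye shape: flat between lo and hi, linear ramps on both sides."""
    if lo <= step <= hi:
        return 1.0
    distance = (lo - step) if step < lo else (step - hi)
    return max(0.0, 1.0 - distance / ramp)


print(find_optimal_phase(toy_difference, start_step=30, max_steps=64))
# (54, 16, 35.0)
```

For the toy eye, the midpoint lands at step 35, the center of the flat region, even though the search started off-center at step 30.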
V. MULTIPLEXER IMPLEMENTATION IN CMOS
A prototype TDM multiplexer stage for eight channels, comprising a single 8:1 multiplexer, has been fabricated in a 0.35-μm 2P4M Taiwan Semiconductor Manufacturing Company (TSMC) process and operates from a 3.3-V supply. The implemented circuitry, shown in Fig. 5, occupies an area of 0.80 mm × 0.26 mm (without the bond pads). The chip consumes on average 3.9 mW of power, assuming a 10% duty cycle on time. The circuitry was designed to run at 200 MHz, sampling each channel at 25 MSPS. Furthermore, as it was designed solely for testing the TDM scheme, there is no AFE present in the design. Instead, the input stage is a unity gain buffer.
The input buffer drives the signal into the S/H capacitor via a pass transistor, which allows the input to be sampled when connected, and then disconnected to hold the value ready to be multiplexed. The S/H capacitor has a capacitance of 400 fF and is implemented as a poly-to-poly device, 19 μm × 19 μm in size. Each input buffer also has a controllable resistor, either pull-up or pull-down, which is enabled when the sequencing logic is switched into the link training mode in order to generate the required pulse sequence. Next, the S/H capacitors are buffered and connected to an analog multiplexer, which consists of a pair of pass transistors for each channel. A digital Gray code counter is implemented, which generates the control signals for these pass transistors to sequence the selection of each channel. Dead time is generated between the channels using digital delay chains in the sequencing logic to delay the rising edge of the control signal separately from the falling edge. The final output stage is a current feedback buffer adopted from [30], which has an output bandwidth of 450 MHz when driving a 75-Ω || 15-pF load.
VI. DIGITAL DEMULTIPLEXER DESIGN
Demultiplexing is performed using an FPGA to replicate in the digital domain, using DSP techniques, all of the filtering that would be required in a traditional analog TDM scheme. For imaging purposes, a large number of receive elements are used, which means that multiple TDM channels would be required, and so to achieve real-time processing of the data, a high-performance FPGA is required. Fig. 6 shows an overview of the operations that the FPGA is configured to perform on the incoming data from the ADCs.
The firmware has been designed to interface with the high-speed ADCs required for the TDM scheme, to complete link training, and to perform demultiplexing. The firmware must also beamform the data streams and transfer images to a display. In this design, the ADC interface uses the JESD204B protocol, which is a recently standardized protocol [31] becoming common with high-speed ADCs. With a standard interface, if a higher sample rate or channel count is required, faster ADCs (or more of them) can simply be substituted into the design. The transfer of image data also uses a standardized protocol, in this case a PCIe link to a workstation computer from which images can be displayed, or on which further processing, such as 3-D rendering, can be performed.
One of the key roles for the FPGA is demultiplexing the TDM streams. On the face of it, this should be a simple task: taking each sample from the ADC and splitting the stream apart on a cyclic basis, so that each group of n samples contains one sample for each of the different channels. However, the task is actually somewhat more complex due to the way in which the channels are sampled by the multiplexer. Referring back to Fig. 3(a), it is apparent that the samples for each channel are not taken simultaneously. While each channel is sampled at the same rate, there is a 45° phase shift from channel to channel; in fact, for any multiplexing factor n, the phase shift between the channels will be 360°/n. This is done to minimize the multiplexer circuit size and to ensure that each sample, once taken, is immediately driven to the cable rather than being held in a leaky capacitor for longer than necessary.
Correction of the channel-to-channel phase shift must be performed before beamforming can take place. If the data streams were directly summed, the phase shift would act as an unwanted beamforming delay, which would be different for each channel. The phase shift must be removed for correct reconstruction; however, to do so would require the data to be shifted in the digital domain by less than one sample, a fractional delay, which is difficult to remove [32] .
The method used in this design is to interpolate the data up to the TDM clock rate using a multirate interpolation finite impulse response (FIR) filter. Once at the TDM clock rate, the required phase shifts become an integer number of samples, where the length depends on the channel: the first channel is delayed by one cycle, the second by two cycles, and so on. With the delays in place, all the channels in the interpolated stream will have been realigned and the phase shift from the multiplexer removed. The resulting data stream can then be demultiplexed and, if required, decimated to reduce the volume of data.
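The interpolate-then-delay idea can be illustrated with a small numerical sketch. It assumes eight channels at 25 MSPS carrying the same 7-MHz tone, each sampled one TDM clock tick later than the previous one, and uses a 129-tap Hamming-windowed sinc as a stand-in interpolation filter. The filter design, its length, and all names are assumptions for illustration; they are not the firmware's actual filter. Channels are indexed from zero here, so the per-channel delay is k ticks, which differs from the one-based description above only by a constant offset common to all channels.

```python
import math

N_CH = 8                      # multiplexing factor n
FS_CH = 25e6                  # per-channel sample rate
F_TDM = N_CH * FS_CH          # 200-MHz TDM clock
F_TONE = 7e6                  # test tone applied to every channel
N_SAMPLES = 64                # samples per channel in this toy run


def source(tick):
    """Analog input evaluated at a TDM clock tick."""
    return math.sin(2 * math.pi * F_TONE * tick / F_TDM)


def interpolation_fir(factor, half_width=8):
    """Hamming-windowed sinc low-pass with cutoff at the original Nyquist rate."""
    m = factor * half_width   # index of the center tap
    taps = []
    for i in range(2 * m + 1):
        x = (i - m) / factor
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (2 * m))
        taps.append(sinc * window)
    return taps


def interpolate(samples, factor, taps):
    """Zero-stuff by `factor`, then FIR filter with the group delay removed."""
    up = [0.0] * (len(samples) * factor)
    up[::factor] = samples
    m = (len(taps) - 1) // 2
    out = []
    for j in range(len(up)):
        acc = 0.0
        for i, h in enumerate(taps):
            idx = j + m - i
            if 0 <= idx < len(up):
                acc += h * up[idx]
        out.append(acc)
    return out


def delay(samples, n):
    """Integer delay of n samples, padding the start with zeros."""
    return [0.0] * n + samples[:len(samples) - n]


taps = interpolation_fir(N_CH)
reference = [source(j) for j in range(N_SAMPLES * N_CH)]
interior = range(100, N_SAMPLES * N_CH - 100)   # skip filter edge effects

worst_uncorrected = 0.0
worst_corrected = 0.0
for k in range(N_CH):
    # Channel k is sampled k TDM ticks after channel 0 (360/n degrees per step).
    held = [source(m * N_CH + k) for m in range(N_SAMPLES)]
    z = interpolate(held, N_CH, taps)
    w = delay(z, k)                              # integer delay at the TDM rate
    worst_uncorrected = max(worst_uncorrected,
                            max(abs(z[j] - reference[j]) for j in interior))
    worst_corrected = max(worst_corrected,
                          max(abs(w[j] - reference[j]) for j in interior))

print(worst_uncorrected, worst_corrected)
```

Without the per-channel delay, the later channels differ from the reference by more than the tone amplitude; with it, the residual error is set by the quality of the interpolation filter.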
The demultiplexer design has been developed primarily using Verilog Hardware Description Language and Qsys (Altera). From this design, a register-transfer level (RTL) simulation model has been generated to allow for design verification and also to allow test processing of data without the need for physical FPGA hardware. The design has also been synthesized to verify that it can be implemented in an FPGA. The design has been built around a Stratix V GS FPGA (5SGSMD5K2F40C2N) from Altera. Using a test design for a 96-channel receiver (12 cables), all of the required hardware can be fit into the design, utilizing roughly 70% of the resources of the FPGA, and capable of achieving a maximum clock frequency of ∼350 MHz, which is more than sufficient for a 200-MHz TDM clock rate. The TDM clock PLL can run with a voltage-controlled oscillator frequency of 1.6 GHz, which would afford a phase resolution of 78.125 ps when adjusting the clock phase during link training.
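The quoted phase resolution is consistent with a phase step of one eighth of the VCO period. That ratio is an assumption made here to reproduce the figure; it is not stated in the text. Under that assumption, the short calculation below also shows how many phase steps span one 200-MHz TDM time slot.

```python
VCO_HZ = 1.6e9                 # PLL VCO frequency quoted in the text
TDM_CLOCK_HZ = 200e6           # TDM clock rate quoted in the text
STEPS_PER_VCO_PERIOD = 8       # assumption: phase step = 1/8 of the VCO period

phase_step_s = 1.0 / (VCO_HZ * STEPS_PER_VCO_PERIOD)
steps_per_slot = round((1.0 / TDM_CLOCK_HZ) / phase_step_s)

print(f"{phase_step_s * 1e12:.3f} ps per step, {steps_per_slot} steps per TDM slot")
# 78.125 ps per step, 64 steps per TDM slot
```

With 64 steps across each 5-ns slot, the midpoint search used in link training has fine granularity relative to the slot width.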
VII. EQUIPMENT SETUP
Dynamic TDM was demonstrated in an earlier paper [26] for the same sampling rate and channel count. However, the earlier setup used only postlayout simulation data, which was fed from an arbitrary waveform generator. To properly characterize the TDM multiplexer design and verify that the approach is viable, data from manufactured silicon are required. In this paper, physical silicon results have been obtained. The setup used is shown in Fig. 7.
The TSMC multiplexer design described earlier in this paper has been fabricated and was used in the experimental setup. The silicon was diced and wire bonded to a chip carrier and connected to a generic printed circuit board (PCB), which brings each of the IC's connections out to SMA-type connectors. The various bias voltages and power supplies required by the IC were connected to carefully decoupled power supplies to minimize noise. As the silicon design has no AFE, each of the inputs must also be biased to avoid saturation of the S/H buffers.
The output of the multiplexer was connected to a 1 m length of 48 AWG μCoax cable with 0.15 mm outer diameter, as could be used in a catheter application, to ensure the results take into account the realistic effects of a bandwidth limited channel. Furthermore, as the TDM clock signal must also be routed to the multiplexer and may potentially present crosstalk issues, the clock in this experiment was fed through a μCoax cable in the same bundle as the analog signal.
At the receiving end, a Texas Instruments LMH5401 LNA is used. This amplifier is connected using an evaluation module, which has been modified to act as an active balun with a 12-dB voltage gain. The amplifier input is ac coupled and configured to be single ended with input impedance approximately matched to the characteristic impedance of the cable. The impedance matching was performed in order to limit signal reflections, which were causing large crosstalk issues in the earlier paper. The output of the amplifier was configured to be fully differential and connected directly to the input of a high-speed ADC on a TI ADC16DX370EVM board. The ADC has an input bandwidth of 800 MHz and a 16-b resolution, of which approximately 10 b are used, though this will depend on the amplitude of the transducer signals.
As the hardware for interfacing the FPGA to the ADC has not yet been implemented, the data from the ADC have been captured using an evaluation module, which interfaces the ADC with a PC via USB. The link training was performed by manually adjusting the phase between the ADC and TDM clocks, which were produced by two high-speed signal generators with a common reference clock. To complete the chain, the demultiplexing and the processing of the data have been performed using the simulation model that was generated from the FPGA firmware design. The simulation model will exhibit exactly the same behavior as the targeted FPGA will, which makes the model a perfectly valid tool for processing experimental data.
For this design, a 200-MHz clock signal is used, which means that each of the eight channels is sampled at 25 MSPS prior to being multiplexed. The frequency was chosen as a tradeoff between the sampling rate and the bandwidth requirements of the cable.
VIII. RESULTS AND COMPARISONS
Using the setup explained in Section VII, several experiments were performed in order to characterize the TDM silicon and the system as a whole. Two questions are of specific interest: first, whether the ADC can be synchronized correctly so that the signals feeding the input of the TDM IC can be recovered using the direct digital demultiplexing approach, and second, how much crosstalk is present between the channels as a result of distortion of the multiplexed signal through the cable.
During each test, signal generators were used to feed signals into either one channel or multiple channels (with the other channels receiving dc). In particular, the tests involved using either a simple tone signal or a more realistic sinc pulse. Using a tone allows any crosstalk to be easily identified by analyzing the spectrum of each recovered signal: a spike at the frequency of the tone indicates that crosstalk is present. The sinc pulse shows the response of the system over the full bandwidth of each channel. Fig. 8 shows the results from one of the experiments, in which only channel 1 of the eight was fed a signal, and the others were tied to a dc bias. Fig. 8(a) and (b) shows the recovered signal for the case of a 7-MHz tone, both in the time and frequency domains. The results are from the output of the FPGA simulation, which, after the interpolation and decimation performed by the FPGA, are sampled at 50 MSPS. Fig. 8(c) and (d) shows the recovered signal for the case of a sinc pulse with a bandwidth of ∼1 MHz, again both in the time and frequency domains.
As shown in Fig. 8(a), the tone has clearly been recovered in the demultiplexing process, with the fast Fourier transform (FFT) of the signal showing a clear spike at 7 MHz. There is some crosstalk occurring between the channels, as evidenced by the 7-MHz spike in Fig. 8(b) on the other channels, which should have no signal. The level of the crosstalk signals is less than −40 dB when compared with the signal on channel 1. It should be noted that this is a measurement of the crosstalk between the signals in the same μCoax cable as a result of the multiplexing scheme. The electrical crosstalk between two separate μCoax cables in a catheter bundle was measured to be below −60 dB at frequencies up to 600 MHz, and therefore, it is the TDM crosstalk that represents the limiting factor in this scheme.
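A crosstalk figure of this kind can be computed by comparing the tone amplitude on a channel with no input against the amplitude on the driven channel. The sketch below uses a single-bin discrete Fourier transform on synthetic data; the signals, the 1% leakage level, and the function names are invented for illustration and are not the measured data.

```python
import cmath
import math


def tone_amplitude(samples, tone_hz, sample_rate_hz):
    """Amplitude of one frequency component via a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * tone_hz * i / sample_rate_hz)
              for i, x in enumerate(samples))
    return 2.0 * abs(acc) / len(samples)


def crosstalk_db(quiet_channel, driven_channel, tone_hz, sample_rate_hz):
    """Tone level on the quiet channel relative to the driven channel, in dB."""
    ratio = (tone_amplitude(quiet_channel, tone_hz, sample_rate_hz)
             / tone_amplitude(driven_channel, tone_hz, sample_rate_hz))
    return 20.0 * math.log10(ratio)


# Synthetic example at the 50-MSPS output rate with a 7-MHz tone.
FS = 50e6
TONE = 7e6
N = 500   # 70 whole tone cycles, so the single bin has no spectral leakage

driven = [math.sin(2 * math.pi * TONE * i / FS) for i in range(N)]
quiet = [0.01 * math.sin(2 * math.pi * TONE * i / FS + 0.3) for i in range(N)]  # invented 1% leak

xt = crosstalk_db(quiet, driven, TONE, FS)
print(f"{xt:.1f} dB")  # -40.0 dB
```

A level of −40 dB corresponds to a leaked amplitude of 1% of the driven channel's amplitude.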
The frequency domain also shows clear spikes at 3, 4, and 11 MHz, with smaller spikes at 8 and 10 MHz. It seems plausible that they are caused by aliasing within the multiplexer. For example, if the 7-MHz tone feeding the IC had some signal present at the second and third harmonics, this would account for two of the spikes (4 and 11 MHz) as a result of aliasing. Investigation into the cause of these spikes was not complete at the time the paper was submitted.
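The aliasing arithmetic behind this suggestion can be checked by folding each harmonic of the 7-MHz tone into the first Nyquist zone of the 25-MSPS per-channel sampling. The helper below is a generic illustration; which harmonics were actually present at the IC input is not known from the text, so the table shows only where each harmonic would land if it were present.

```python
def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which a tone at f_hz appears after sampling."""
    f = f_hz % sample_rate_hz
    return sample_rate_hz - f if f > sample_rate_hz / 2 else f


FS_CHANNEL = 25_000_000   # per-channel sample rate
TONE = 7_000_000          # test tone

folded_mhz = {h: alias_frequency(h * TONE, FS_CHANNEL) / 1e6 for h in range(2, 7)}
print(folded_mhz)
# {2: 11.0, 3: 4.0, 4: 3.0, 5: 10.0, 6: 8.0}
```

The second and third harmonics fold to 11 and 4 MHz, matching the example above. Higher harmonics would fold onto 3, 10, and 8 MHz, but this is arithmetic only and does not establish the cause of the observed spikes.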
Observing the signal recovered from feeding in a sinc pulse [Fig. 8(c)], the signal has also clearly been recovered correctly, with a fairly flat response across the passband of the spectrum. The signals are bandpass filtered in the FPGA with cutoff frequencies of 3 and 10 MHz, resulting in the relatively empty spectrum outside this band. It is difficult to determine from the frequency domain in Fig. 8(d) what level of crosstalk is present between the channels. However, measuring the peaks in the time domain gives a crosstalk of roughly −36 dB, a similar level to the tone test.
These results show a significant improvement over those of the earlier work [26]. This improvement is almost certainly the result of changes made at the receiving end, specifically the inclusion of an amplifier that has an input impedance closely matched to the characteristic impedance of the cable. This matching helps to reduce the amount of signal reflection in the cable, which is one of the major sources of crosstalk between the channels.
The crosstalk levels still appear to be large. However, it is important to place these results in the context of the application. In a practical ultrasound imaging system, the input of the multiplexer would be connected to a transducer array in which each channel on the same multiplexer is an adjacent element. The variation in signal on the adjacent channels of the multiplexer is much less than in these tests, and as a result, the channel-to-channel crosstalk can be significantly less. Furthermore, even at the presented crosstalk levels, the system would still provide suitable performance for ultrasound imaging catheters [33] .
IX. CONCLUSION
To produce 3-D ICE catheters with large element counts, cable reduction is a key requirement not only to minimize the catheter diameter but also to allow operation under MRI, where the risk of burns from induced heating in cables is a concern. While there are currently a large variety of cable reduction strategies, many of these are not suitable for this application due to size, power, or other limitations.
TDM coupled with direct digital demultiplexing presents an alternative reduction strategy, which is well suited to the requirements of ultrasound catheters. This TDM approach has been developed and demonstrated in this paper. In the system design, a cable reduction ratio of at least 8:1 can be achieved while allowing for simultaneous sampling of echo signals by sampling a time-multiplexed signal at a very high sample rate. The signals were then demultiplexed in the digital domain using DSP techniques in the RTL simulation of firmware for a high-performance FPGA.
Experimental results obtained using a CMOS multiplexer fabricated in a 0.35-μm process, coupled with a 1 m length of narrow diameter μCoax cable and a high-speed ADC, have been presented. These results demonstrate that the design is able to achieve a crosstalk between the channels on the same cable of about −40 dB. This suggests that the approach could provide suitable performance in an ultrasound imaging catheter while significantly reducing cable count.
