Introduction
Ultrasound scanners are widely used in medical applications since ultrasound imaging is a very effective and fast diagnostic technique. Traditional static ultrasound scanners are large devices that are plugged into the grid, so they have no power consumption limitation. Consequently, the design tendency is to keep increasing their complexity to obtain better picture quality. The electronics used in static ultrasound scanners are typically discrete components due to their low cost. These components are over-designed and tend to consume considerably more power than needed for a specific application. Nonetheless, this is not an issue due to the practically limitless amount of power available.
Even though static ultrasound scanners are very effective, they have some drawbacks. Firstly, due to their size and complexity, the number of diagnoses that can be performed per unit of time is limited. Furthermore, the number of devices available per hospital is also limited by the cost per scanner. In order to overcome these drawbacks, portable ultrasound devices are being developed. These devices have a much lower cost and allow a significant increase in the number of diagnoses per unit of time. However, portable scanners have power consumption, heat dissipation and area limitations. The design approach for a portable ultrasound scanner is therefore to use the available power budget and area in the most effective way in order to achieve the best picture quality possible. The electronics for the scanner need to be custom designed, which requires an application specific integrated circuit (ASIC) solution. In the last decade, high integration has enabled portable ultrasound scanners to reach a sufficient picture quality, even comparable to the performance of low-end traditional static ultrasound scanners, making them usable for medical applications.
Portable ultrasound scanners consist of hundreds of channels and each of them has a transducer, a high-voltage transmitting circuit (Tx) and a low-voltage receiving circuit (Rx). The Tx provides the high-voltage pulses that the transducer needs to generate ultrasonic waves and the Rx amplifies and digitizes the low-voltage signal induced in the transducer. There are several types of transducers, and the most commonly used are piezoelectric transducers. However, recent studies have shown that capacitive micromachined ultrasonic transducers (CMUTs) have several advantages with respect to piezoelectric ones, such as wider bandwidth, better temporal and axial resolution, and also better thermal and transduction efficiency [1]. Furthermore, CMUTs have high integration compatibility with electronics since their fabrication process is similar to the standard silicon processes used for integrated circuits [2].
CMUTs are composed of a thin movable plate suspended over a small vacuum gap on top of a substrate. They have two terminals, one connected to the substrate and the other connected to the movable plate. When a voltage difference is applied between the two terminals of the CMUT, the thin plate deflects due to an electrostatic force. Ultrasound is generated by applying high-voltage pulses to one of the terminals of the CMUT, which makes the thin plate vibrate [3]. This paper is an extended version of the work [4] published at the 13th IEEE International NEW Circuits And Systems (NEWCAS) conference in 2015. The transmitting circuit design is a new and improved version of the work presented in [5]. Due to the high voltages required by the transducers, the circuitry is implemented in the AMS 0.35 µm high-voltage CMOS process. Designing in high-voltage processes is a challenge because of the very strict design rules needed to avoid voltage breakdown and because of the use of high-voltage devices, which are more complex than those of a standard CMOS process.
Transmitting circuit specifications
The transmitting circuit needs to drive a particular CMUT, therefore its specifications come from the inherent transducer characteristics. The CMUT used in this project has been designed and modeled at the Nanotech department of the Technical University of Denmark (DTU). Although the driving requirements are described here, the electrical equivalent model of the CMUT is confidential and is therefore not presented in this paper. A picture of several of these CMUTs collected in an array is shown in Fig. 1. Each CMUT, which is mainly a capacitive load, has an equivalent capacitance of approximately 30 pF and a resonant frequency of f_t = 5 MHz. In receiving mode, the transducer needs a bias voltage of 80 V. During transmission, the CMUT requires high-voltage pulses from 60 V to 100 V toggling at its resonant frequency and a driving strength corresponding to a slew rate (SR) of 2 V/ns. Ultrasound scanners transmit for a short period of time, 400 ns, and receive for a much longer period of time, 106.4 µs, hence the transmitting duty cycle is 1/266 in this particular application.
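As a rough sanity check of these driving requirements, the following Python sketch derives the number of pulse periods per transmit window, the transmit-to-receive time ratio and the peak charging current of the load. Treating the CMUT as an ideal 30 pF capacitor and ignoring all parasitics is an assumption made here for illustration only, and the variable names are not taken from the design.

```python
# Back-of-the-envelope check of the Tx driving requirements.
# Assumption: the CMUT is treated as an ideal 30 pF capacitor (no parasitics).

C_LOAD = 30e-12      # equivalent CMUT capacitance [F]
F_T = 5e6            # resonant / pulsing frequency [Hz]
SLEW_RATE = 2e9      # required slew rate: 2 V/ns expressed in V/s
T_TX = 400e-9        # transmit time [s]
T_RX = 106.4e-6      # receive time [s]

cycles_per_burst = T_TX * F_T        # pulse periods in one transmit window
tx_rx_ratio = T_TX / T_RX            # quoted in the text as 1/266
peak_current = C_LOAD * SLEW_RATE    # i = C * dv/dt for an ideal capacitor

print(f"pulse periods per burst: {cycles_per_burst:.1f}")
print(f"Tx/Rx time ratio       : 1/{1 / tx_rx_ratio:.0f}")
print(f"peak charging current  : {peak_current * 1e3:.0f} mA")
```

Under this idealization the output stage has to source and sink a peak current of roughly 60 mA during each edge, which is why the size of the output devices is tied so closely to the slew rate requirement.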
Design and implementation of the Tx
The structure of the transmitting circuit designed in this paper is shown in Fig. 2. The Tx consists of a three-level high-voltage output stage that drives the ultrasonic transducer and is controlled with high-voltage signals provided by the level shifters. The low-voltage signals needed for the level shifters' operation are generated by the control logic block. The design approach is to minimize the area and power consumption, therefore no reconfigurability features have been added. The Tx is designed to drive a specific CMUT with the characteristics described in Section 2.
In the next subsections the design of each block of the Tx circuit is presented. The MOS devices used in the schematics have different maximum drain-source (V_DS,max) and gate-source (V_GS,max) breakdown voltages. A summary table with the symbol of each device is shown in Fig. 3. Note that NMOSi stands for an NMOS located in its own P-well, so its bulk terminal can be tied to a different potential than the p-substrate.
Differential output stage
CMUTs are non-polarized devices, therefore they can be driven either single-ended, by pulsing one of the plates and biasing the other, or differentially, by pulsing both terminals, which is the approach used in this design. The most common approach is to use single-ended driving [5], [6]. This topology is shown in Fig. 4 and consists of MOS devices used as switches that connect the output node to three different voltage levels: high (V_H), middle (V_M) and low (V_L). There are several drawbacks when using this topology. Firstly, the size of the circuitry is large since more than one transistor per voltage level is needed. Two transistors are required to connect the output node to V_M, an NMOS to pull down from V_H and a PMOS to pull up from V_L, which occupy extra area. Furthermore, two extra diode-coupled MOS devices are needed in order to avoid short-circuiting the voltage supplies through the body diodes of the MOS transistors connected to V_M. Apart from adding capacitance and area, these diode-coupled MOS devices also introduce a small voltage drop that causes a small offset from the V_M level at the output node.
In order to solve the aforementioned problems and improve the area and power consumption of this block, a new differential output stage topology was designed; its schematic can be seen in Fig. 5. It consists of two two-level output stages, each of them connected to one of the terminals of the transducer, that combined can generate a total of three differential voltage levels. A time diagram of the control signals of the MOS devices and the differential voltage across the CMUT (V_CMUT) is shown in Fig. 6. This topology has several advantages. Firstly, only four transistors are used instead of the six used in the single-ended version, which translates into less area and less parasitic capacitance. The two diode-coupled MOS devices are no longer needed, so there is no voltage offset from the voltage supplies to the output node connected to the CMUT. Secondly, since CMUTs are mainly capacitive loads, the two sides of the output stage are DC isolated. Each side therefore only needs to handle a drain-source voltage of 20 V, whereas in the single-ended version some of the MOS devices of the output stage need to handle the full pulse swing. Since the voltage requirements are lower, the MOS devices can be smaller and have less parasitic capacitance, which improves the area and power consumption. Thirdly, since the CMUT is driven differentially, the slew rate required in each side of the output stage is reduced to 1 V/ns, which is half of the slew rate specified in Section 2. The required slew rate is related to the size of the MOS devices, hence reducing the SR requirement allows for smaller devices. This topology also offers potential advantages such as four-level pulsing, which can be achieved by choosing adequate V_1, V_2, V_3 and V_4 in the Tx. If the voltages are chosen so that (V_1 - V_2) ≠ (V_3 - V_4), four different levels across the CMUT can be obtained.
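The way two two-level stages combine into differential levels can be illustrated with a short Python sketch that enumerates the four switch combinations. The rail assignment used below (one terminal switched between 80 V and 100 V, the other between 0 V and 20 V) is inferred from the 60 V, 80 V and 100 V levels and the 20 V swing per side quoted in the text; it is an assumption, as is the second, purely hypothetical set of rails with unequal swings.

```python
from itertools import product

def differential_levels(high_side, low_side):
    """Return the sorted set of voltages across the load for all switch states.

    high_side and low_side are (upper_rail, lower_rail) tuples in volts,
    one tuple per terminal of the load.
    """
    return sorted({h - l for h, l in product(high_side, low_side)})

# Assumed rails matching the 60/80/100 V pulses: both sides swing 20 V.
equal_swings = differential_levels((100, 80), (20, 0))
# Hypothetical rails with unequal swings (20 V and 10 V), illustration only.
unequal_swings = differential_levels((100, 80), (10, 0))

print(equal_swings)
print(unequal_swings)
```

With equal swings, two of the four switch states give the same 80 V across the load, so only three distinct levels remain. With unequal swings all four states are distinct.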
Increasing the number of voltage levels can be beneficial for the power consumption, as shown in [6], and it will be investigated in the future. There is one consideration to be made regarding the differential topology, which is the need for an extra pad in the integrated circuit, since it has to be connected to the two terminals of the CMUT instead of one. In principle, this would require a full extra high-voltage ESD protected pad, which occupies an area of approximately 0.11 mm². However, the output stage transistors are large, and their inherent ESD protection is estimated, through simulations, to be sufficient to protect the integrated circuit. Consequently, in the full ultrasound scanner system, the ESD protection would not be present since it occupies unnecessary extra space. In order to reduce the risk of having a non-functional integrated circuit, it was decided to include two complete differential Tx circuits in the die, one with ESD protected pads and one with only two small pad openings. These small pad openings of 0.025 mm² are placed on top of the output stage and occupy no additional area. If the non-ESD protected version did not work, the ESD protected version could still be measured, so that some information could be obtained from the integrated circuit.
In order to select the devices for the output stage, the breakdown voltages |V_DS,max| and |V_GS,max| need to be determined. As can be seen from Fig. 5, the |V_DS,max| for all the devices is 20 V, whereas the |V_GS,max| is determined by the swing of the gate signal. The higher the tolerable |V_GS,max|, the bigger the transistor and the more parasitics it has. For this reason, devices with a |V_GS,max| of 5 V are chosen, which is the lowest |V_GS,max| available in this process for high-voltage devices. This device choice also sets the maximum gate signal swing to 5 V.
The MOS devices M_1, M_2, M_3 and M_4 are sized in order to achieve a minimum SR of 1 V/ns in each side of the
Improved pulse-triggered level shifters
The output stage contains four MOS devices, M_1, M_2, M_3 and M_4, and they need to be driven with signals with different high (V_HI) and low (V_LO) voltage levels. Each MOS device requires a level shifter which needs to be optimized and designed for that specific voltage, as shown in Table 1. A low-power pulse-triggered topology is used for the three high-voltage level shifters and a conventional cross-coupled low-voltage topology is used for the 5 V level shifter, since its power consumption and area are negligible (not shown here due to its simplicity).
The pulse-triggered level shifter topology is a well-known topology which is very power efficient since current is consumed only during transitions [7], [8], [9]. It consists of input branches that control a latch in the output using current pulses. Even though this topology is used in circuits with low-power requirements [5], it can present some problems, such as large area due to the high gate-source voltage range, unregulated magnitude of the current pulse that controls the state of the latch, and latch start-up state issues when ramping the high-voltage domain of the level shifter. In order to overcome some of these problems, an improved version of the pulse-triggered level shifter presented in [10] is used and its schematic is shown in Fig. 8. For all the level shifters, M_5 and M_6 should be selected to be able to handle their respective |V_DS,max| = V_HI. Furthermore, in the V_HI = 100 V version, two cascode transistors were added on top of M_5 and M_6 for operation consistency.
The first design consideration is to minimize the gate-source voltage swing V_HI - V_LO. In [5] a V_HI - V_LO = 12.5 V was used. By reducing this voltage to 5 V, MOS devices with thinner gate oxide can be used, which are smaller and have less parasitic capacitance. Furthermore, with these devices the floating current mirror and the latch can be placed in a single deep N-well, which significantly reduces the area of the design. The second improvement over the common topology is the addition of a current mirror formed by M_1a, M_1b, M_1c and M_1d that controls the magnitude of the current pulse that changes the state of the latch. This allows for a smaller current pulse, as it can be controlled from a bias generator with reduced process, voltage and temperature dependence, hence there is no need to overdesign it for the worst case process corner. In order to guarantee that the drain voltage of M_1c does not exceed the V_DS,max of M_1c and M_1d, the maximum gate voltage of M_5 and M_6 is set to 3.3 V. If both M_5 and M_6 are off, the drain of M_1c could theoretically rise above 3.3 V due to the leakage current of M_5 and M_6. However, the bias current flowing through M_1c and M_1d is higher than the leakage current, which ensures that the drain of M_1c does not exceed 3.3 V. The last improvement in the level shifters is the addition of common mode clamping transistors M_7 and M_8 to reduce the common mode current transferred to the latch when the high-voltage domain of the level shifter is ramping [11]. With these two extra MOS devices the design is more robust to high-voltage ramping. It is worth mentioning that, since each level shifter is designed for a different voltage level, the delay from the input to the output of each of them is different. Consequently, the delays need to be compensated in the low-voltage control logic block to avoid shoot-through in the output stage.
Fig. 9 Layout of the improved pulse-triggered level shifters.
The on-chip area occupied by all four level shifters is approximately 0.059 mm² and the corresponding layout is shown in Fig. 9.
Low-voltage control logic
The low-voltage control logic, which is supplied at 3.3 V, consists of three parts which are shown in Fig. 10: synchronization, delay compensation and pulser. Firstly, the input signals, s_i, are synchronized to avoid any effect of external routing and to ensure a 50% pulsing duty cycle even if the input signals s_i are not exact. The synchronization is performed on-chip using standard cell flip-flops clocked at double the frequency of the pulses, f_clk = 2f_t = 10 MHz. Secondly, the synchronized signals s_i are separately delayed in order to compensate for the different delays of the level shifters. A common delay is also added as dead time to avoid shoot-through in the output stage, which would occur if two MOS devices were on at the same time. The delays are implemented with standard cell inverters to reduce area and power consumption. Finally, the synchronized and delay-compensated signals, s_i, are converted into pairs of set/reset signals, s_set,i and s_reset,i, to properly drive the pulse-triggered level shifters. The pulsing circuit used is the same as the one mentioned in [5]. During the design process of the low-voltage control logic, both corner and mismatch simulations were performed to ensure the correct functionality of the block.
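The conversion of a control signal into a set/reset pair can be sketched behaviourally in Python. This is a minimal illustration and not the pulsing circuit of [5]: it assumes that a set pulse is issued on every rising edge and a reset pulse on every falling edge, that each pulse is one sample wide, and that the signal is sampled once per 10 MHz clock period. The function name and the sample vector are hypothetical.

```python
def set_reset_pulses(samples):
    """Convert a sampled logic signal into one-sample-wide set/reset pulses.

    A set pulse marks every rising edge and a reset pulse every falling edge.
    The signal is assumed to be low before the first sample.
    """
    set_pulses, reset_pulses = [], []
    previous = 0
    for current in samples:
        set_pulses.append(int(current and not previous))
        reset_pulses.append(int(previous and not current))
        previous = current
    return set_pulses, reset_pulses

# One sample per 10 MHz clock period: two 5 MHz periods at 50 % duty cycle.
s_i = [0, 1, 0, 1, 0, 0]
s_set, s_reset = set_reset_pulses(s_i)
print(s_set)
print(s_reset)
```

Because the latch in a pulse-triggered level shifter holds its state between pulses, only the edges of s_i need to be transferred to the high-voltage domain, which is what keeps the current consumption limited to the transitions.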
Measurement results
The transmitting circuit was taped out in the AMS 0.35 µm high-voltage process, and the fabrication report received from the factory shows that the 20 received dies are around the typical corner. A picture of the integrated circuit die taken with a microscope can be seen in Fig. 11. The low-voltage control logic is located in area a) with an area of 0.01 mm², the level shifters are situated in area b) with an area of 0.059 mm², and the differential output stage is located in c) and occupies an area of 0.055 mm². The total area of the transmitting circuit, including the routing, is 0.18 mm². As previously mentioned, two full transmitting circuits were included in the die, one with ESD protected pads and a second one with just pad openings. Some initial ESD evaluation tests were performed on the non-ESD protected version, which showed robust and consistent performance even under careless handling of the integrated circuit. Consequently, all measurements were made with the non-ESD protected Tx, since the ESD protection would not be part of the ultrasound scanner system. The complete ESD evaluation will be performed in the future.
In order to assess the performance of the transmitting circuit, a PCB was built to test it. The measurement setup used is shown in Fig. 12. Two Hewlett Packard E3612A voltage supplies were used to generate 20 V and 100 V, and from those voltages the on-board linear regulators generate the rest of the voltage levels used in the Tx: 5 V, 15 V, 80 V, 85 V and 95 V. During the current measurements, only the current from each voltage level fed into the chip was accounted for, hence the current sunk by the linear regulators was not considered. The low-voltage input signals and the low-voltage supply were generated using an external Xilinx Spartan-6 LX45 FPGA with a maximum clock frequency of 80 MHz and 3.3 V operation. The voltage outputs of the Tx connected to the transducer and the current consumption were measured using a Tektronix MSO4104B oscilloscope and a Tektronix TCP202 current probe.
Using the described setup, the integrated circuit was tested with pulses from 60 V to 100 V, a frequency of 5 MHz, a receiving bias voltage of 80 V and an ultrasound scanner transmitting duty cycle of 1/266. The measured voltage of the two terminals of the CMUT and the differential voltage between the plates of the CMUT can be seen in Fig. 13. The bias voltage is stable at 80 V when receiving, and when transmitting it toggles between 60 V and 100 V according to the supplied input signals at a measured frequency of 5 MHz.
The transmitting circuit power consumption is characterized with no load, with the equivalent capacitance of the CMUT connected and with the full electrical model of the CMUT connected. In order to measure the power consumption of the Tx for these three load scenarios, the currents from all the voltage sources supplying the integrated circuit were measured for each case. The measurements are shown in Table 2. The currents measured from the 5 V, 15 V, 85 V and 95 V supplies were negligible compared to the ones measured in the other voltage supplies, so they are counted as zero and are not shown in the table. Using these current measurements, the power consumption is calculated to be 0.056 mW for the non-loaded Tx, 0.754 mW for the Tx with the equivalent capacitance of the CMUT connected and 0.936 mW for the Tx with the electrical model of the CMUT connected. These numbers agree closely with the results of the simulations with parasitics, which are 0.052 mW, 0.712 mW and 0.894 mW respectively. The minimum SR measured at the high-voltage terminal of the Tx is SR_H = 0.91 V/ns and the SR measured at the low-voltage terminal is SR_L = 1.12 V/ns. The resulting differential SR seen from the CMUT load is 2.03 V/ns. These results are slightly below the simulated values with parasitics, which for the typical corner were SR_H = 1.09 V/ns and SR_L = 1.23 V/ns. This slightly reduced slew rate is attributed to the external PCB routing and the capacitance of the measurement probes, which add to the total load capacitance that the Tx has to charge and discharge. In order to compare the simulation results and the measurements accurately, the equivalent capacitances of the probes were added to the simulation testbench of the Tx with extracted parasitics in the typical corner. SR_H and SR_L were simulated again, obtaining 0.97 V/ns and 1.17 V/ns respectively, which are much closer to the measured results.
Repeating this simulation across the corners leads to SR_H = 0.76 V/ns and SR_L = 0.94 V/ns for the slowest corner and SR_H = 1.15 V/ns and SR_L = 1.40 V/ns for the fastest corner. According to these numbers, the received dies seem to be very close to the typical corner, as reported by the factory.
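The agreement between measurement and simulation can be quantified directly from the numbers quoted above. The Python sketch below computes the relative deviation of the measured power from the simulated power for each load case, the differential slew rate, and whether the measured slew rates fall inside the simulated corner range. It only restates values given in the text; the variable names are chosen here for illustration.

```python
# Measured vs. simulated values quoted in the text (simulations with parasitics).
power_mw = {                       # load case: (measured, simulated) in mW
    "no load": (0.056, 0.052),
    "equivalent capacitance": (0.754, 0.712),
    "CMUT model": (0.936, 0.894),
}
deviation_pct = {
    case: 100 * (measured - simulated) / simulated
    for case, (measured, simulated) in power_mw.items()
}
for case, deviation in deviation_pct.items():
    print(f"{case:24s} {deviation:+.1f} % vs. simulation")

sr_measured = {"SR_H": 0.91, "SR_L": 1.12}                   # V/ns
sr_corners = {"SR_H": (0.76, 1.15), "SR_L": (0.94, 1.40)}    # (slowest, fastest)

# The differential slew rate is the sum of the slew rates of both terminals.
sr_differential = sum(sr_measured.values())
within_corners = {
    name: sr_corners[name][0] <= value <= sr_corners[name][1]
    for name, value in sr_measured.items()
}
print(f"differential SR: {sr_differential:.2f} V/ns")
print(within_corners)
```

The measured power is between roughly 5% and 8% above the simulated value in all three load cases, and both measured slew rates lie inside the simulated corner range.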
Even though the received dies are around the typical corner, local process variations generate a spread in the performance of each die. In order to assess this variation on fabricated dies, the minimum SR_H and SR_L of the 20 fabricated integrated circuits were measured and compared to the expected variation from the simulations. In Fig. [...] random points. The equivalent capacitances of the measuring probes were also included in the simulation. The measured results from the 20 dies, which are also close to the typical corner, are plotted on top of the simulated distribution.
Even though the measured sample size is not big enough to draw firm conclusions, it can be seen that for both SR_H and SR_L the samples fall around the expected values. However, it is still unclear how the simulated and measured distributions differ.
Typically, when analyzing samples, it is common to show the ±3σ limits without taking into account the number of samples used and to compare them directly with the expected distribution. This approach is highly problematic due to several false assumptions, as suggested in [12]. In order to show this information more precisely, the approach suggested in [12] is used, resulting in Fig. 15. The SR_H and SR_L of the 20 measured samples (N = 20) and their respective median range M and percentiles P_15.87 and P_84.13 for a confidence level of 95% are shown. In order to compare the measured results with the simulation results, the same information is plotted for the 100 Monte Carlo iterations. As can be seen, there is a good correlation between the results. However, the measured M ranges are 6% to 10% lower than the simulated ones, which is very likely due to the external PCB routing and to the fabrication not being exactly in the typical corner. Furthermore, the measured M ranges are wider due to the lower number of samples compared to the simulations. The percentiles are similarly spread around M for SR_H, but for SR_L the P_84.13 percentile range is much narrower. These results could be caused by variance due to the small sample size. Overall, there is a high correlation between the expected results from simulations and the measurements.
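The dependence of the median range on the number of samples can be illustrated with a distribution-free confidence interval based on order statistics. The Python sketch below is not the procedure of [12], whose details are not reproduced here; it is a standard textbook construction, used under the assumption that it captures the same effect. The function name is hypothetical. For reference, the percentiles P_15.87 and P_84.13 correspond to one standard deviation below and above the mean of a normal distribution.

```python
from math import comb

def median_ci_ranks(n, confidence=0.95):
    """Return 1-based order-statistic ranks (lower, upper) bounding the median.

    Distribution-free interval: the k-th smallest and the k-th largest of n
    sorted samples enclose the true median with at least the requested
    confidence, where k follows from the Binomial(n, 0.5) distribution.
    """
    alpha_half = (1 - confidence) / 2
    cumulative = 0.0
    k = 0
    # Grow k while P(X <= k) stays within alpha/2 for X ~ Binomial(n, 0.5).
    while cumulative + comb(n, k) / 2**n <= alpha_half:
        cumulative += comb(n, k) / 2**n
        k += 1
    if k == 0:
        raise ValueError("too few samples for the requested confidence")
    return k, n - k + 1

for n in (20, 100):
    lower, upper = median_ci_ranks(n)
    print(f"N = {n:3d}: median lies between sorted samples {lower} and {upper} "
          f"({(upper - lower) / n:.0%} of the sorted data)")
```

For N = 20 the interval spans the 6th to the 15th sorted sample, which is a much larger fraction of the data than for N = 100. This is consistent with the wider measured M ranges compared to the simulated ones.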
Discussion
The design presented cannot be compared directly with state of the art transmitting circuits, since the references found either do not specify the driving conditions, area and power consumption, or only state the full channel consumption, including the receiving circuitry [13], [14], [15]. Nevertheless, a comparison with the single-ended driving topology in [5] can be performed, since both area and power consumption with a capacitive load are stated. The operating conditions in [5] are different: the pulse voltage swing is 50 V, the duty cycle is 50% and the load is 15 pF. In order to compare the topologies accurately, the same operating conditions should be defined. The conditions chosen are the ones closest to the operation of an ultrasound scanner, as defined in this paper: a pulse voltage range of 40 V, a pulsing frequency of 5 MHz, a transmitting duty cycle of 1/266 and a capacitive load of 30 pF, which is the equivalent capacitance of the CMUT. After adjusting the power consumption of [5] to the operating conditions of an ultrasound scanner, a comparison can be performed, and a summary is shown in Table 3. The differential Tx presented in this paper achieves an area reduction of 80.8% and a power consumption reduction of 58.2%. The measurements show a good correlation with the simulated results, which increases the reliability of the simulations. Even though the measured sample size is limited to the number of dies received, the design proves to be solid and functional across local process variations. The Tx can probably be expected to behave according to the simulations in other process corners. However, in order to prove that, the design would have to be fabricated under the specific corner conditions to be tested. Nevertheless, due to the good correlation between simulations and measurements, any future tapeout with an improved Tx has a lower risk of resulting in a non-functional integrated circuit.
The next step for the Tx would be to implement on-chip voltage regulation. As mentioned before, the number of voltage levels required in the Tx is high and a lot of extra external circuitry is required to generate them. With internal voltage regulation only one high-voltage supply would be needed. Furthermore, the high-voltage ramping of all the level shifters would be better controlled.
Conclusions
In this paper a differential integrated high-voltage transmitting circuit for CMUTs is successfully designed and implemented in the AMS 0.35 µm high-voltage process. The circuit supplies pulses with a frequency of 5 MHz, voltage levels of 60 V, 80 V and 100 V and a measured slew rate of 2.03 V/ns with the load connected. The transmitting circuit is measured under the operating conditions of an ultrasound scanner in order to accurately assess the performance of the circuitry. The total operating power consumption measured on the integrated circuit is 0.936 mW and the circuit occupies an on-chip area of 0.18 mm², resulting in a small and efficient transmitting circuit that is very suitable for portable ultrasound scanner applications. The design proves to be robust across local process variations, and a high correlation between measurements and simulations is found.
