Department of Bioengineering

Imperial College London

# Front-End Receiver for Miniaturised Ultrasound Imaging

**Graham Peyton** 

June 2018

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of Imperial College London and the Diploma of Imperial College London.

I herewith certify that all material in this dissertation is a product of my own work or, if not, it has been appropriately referenced.

Graham Peyton

The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work.

### Abstract

Point of care ultrasonography has been the focus of extensive research over the past few decades. Miniaturised, wireless systems have been envisaged for new application areas, such as capsule endoscopy, implantable ultrasound and wearable ultrasound. The hardware constraints of such small-scale systems are severe, and tradeoffs between power consumption, size, data bandwidth and cost must be carefully balanced.

To address these challenges, two synthetic aperture receiver architectures are proposed and compared. The architectures target highly miniaturised, low cost, B-mode ultrasound imaging systems. The first architecture utilises quadrature (I/Q) sampling to minimise the signal bandwidth and computational load. Synthetic aperture beamforming is carried out using a single-channel, pipelined protocol in order to minimise system complexity and power consumption. A digital beamformer dynamically apodises and focuses the data by interpolating and applying complex phase rotations to the I/Q samples. The beamformer is implemented on a Spartan-6 FPGA and consumes $296\,mW$ for a frame rate of $7\,Hz$. The second architecture employs compressive sensing within the finite rate of innovation (FRI) framework to further reduce the data bandwidth. Signals are sampled below the Nyquist frequency, and then transmitted to a digital back-end processor, which reconstructs I/Q components non-linearly, and then carries out synthetic aperture beamforming.
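The idea of focusing I/Q data with complex phase rotations can be illustrated with a minimal sketch. It is not the beamformer described in this thesis: the sampling rate, carrier frequency, moving-average lowpass filter and the helper names `tone`, `iq_demodulate` and `phase_rotate` are all assumptions chosen for illustration. The sketch demodulates a steady tone into I/Q and shows that a sub-sample delay of a narrowband signal is reproduced by rotating the baseband sample.

```python
import cmath
import math

FS = 40e6   # assumed sampling rate in Hz (illustrative only)
F0 = 2.5e6  # assumed carrier frequency in Hz (illustrative only)


def tone(amplitude, phase, delay, n_samples):
    """Real RF test tone at F0, delayed by `delay` seconds."""
    return [amplitude * math.cos(2 * math.pi * F0 * (n / FS - delay) + phase)
            for n in range(n_samples)]


def iq_demodulate(rf, taps=16):
    """Mix RF down to baseband, then lowpass with a moving average.

    With FS / F0 = 16, a 16-tap average spans two full periods of the
    2*F0 mixing product, so that product cancels for a steady tone.
    """
    mixed = [2.0 * x * cmath.exp(-2j * math.pi * F0 * n / FS)
             for n, x in enumerate(rf)]
    return [sum(mixed[max(0, n - taps + 1):n + 1]) / taps
            for n in range(len(mixed))]


def phase_rotate(iq_sample, delay):
    """Approximate a fine delay of a narrowband signal by a phase rotation."""
    return iq_sample * cmath.exp(-2j * math.pi * F0 * delay)


reference = iq_demodulate(tone(1.0, 0.3, 0.0, 64))
delayed = iq_demodulate(tone(1.0, 0.3, 30e-9, 64))

# Once the filter has filled, rotating the reference I/Q sample matches the
# I/Q sample of the delayed tone (30 ns is not a whole sample at 40 MHz).
error = abs(phase_rotate(reference[40], 30e-9) - delayed[40])
print(f"rotation error: {error:.2e}")
```

For a real pulse the envelope also moves with the delay, which is why the abstract pairs the phase rotation with interpolation of the I/Q samples; the steady tone used here has a constant envelope, so the rotation alone suffices.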

Both architectures were tested in hardware using a single-channel analogue front-end (AFE) that was designed and fabricated in AMS $0.35\,\mu m$ CMOS. The AFE demodulates RF ultrasound signals sequentially into I/Q components, and comprises a low-noise preamplifier, mixer, programmable gain amplifier (PGA) and lowpass filter. A variable gain low noise preamplifier topology is used to enable quasi-exponential time-gain control (TGC). The PGA enables digital selection of three gain values (15 dB, 22 dB and 25.5 dB). The bandwidth of the lowpass filter is also selectable between 1.85 MHz, 510 kHz and 195 kHz to allow for testing of both architectural frameworks. The entire AFE consumes $7.8\,mW$ and occupies an area of $1.5 \times 1.5\,mm$. In addition to the AFE, this thesis also presents the design of a pseudodifferential, log-domain multiplier-filter or "multer" which demodulates low-RF signals in the current-domain. This circuit targets high impedance transducers such as capacitive micromachined ultrasound transducers (CMUTs) and offers a 20 dB improvement in dynamic range over the voltage-mode AFE. The bandwidth is also electronically tunable. The circuit was implemented in $0.35\,\mu m$ BiCMOS and was simulated in Cadence; however, no fabrication results were obtained for this circuit.

B-mode images were obtained for both architectures. The quadrature synthetic aperture beamforming (SAB) method yields a higher image SNR and a 9% lower root mean squared error, measured against the RF-beamformed reference image, than the compressive SAB method.

Thus, while both architectures achieve a significant reduction in sampling rate, system complexity and area, the quadrature SAB achieves better image quality. Future work may involve the addition of multiple receiver channels and the development of an integrated system-on-chip.

### Acknowledgement

The past four years have been one of the most wonderful, eventful, trying and rewarding periods of my life. While the successful completion of a PhD is an immense personal victory, I am deeply humbled and thankful for the support of those who have helped me along the way.

I cannot express enough thanks to my supervisors, Manos Drakakis and Martyn Boutelle. Manos, you have been a father to me throughout my time at Imperial. I arrived in the UK "wet behind the ears", and through your wise supervision, I avoided many pitfalls and succeeded in fulfilling my dream of completing a PhD in engineering. I remember the hours of head-scratching over tough engineering problems, and also the occasional cogitation over ancient Greek history and postmodern philosophy in your office! You added a human aspect through difficult times, and because of you, I will always look back with fond memories on my time at Imperial. Martyn, thank you too for patiently directing me in the beginning, and for your continual support, guidance and encouragement throughout. I am grateful for the many occasions on which you offered wise counsel when I faced difficult decisions. I am indebted to the donors of the Imperial College PhD Scholarship Fund for financial support throughout my PhD. This work would not have been possible without their generous contributions.

To my colleagues: thank you for the many hours of patient assistance in every aspect of my work, and for your friendship and company. Special thanks go to Behzad Farzaneh, my "analogue mentor", for teaching me the tricks of the trade, and for helping me to successfully tape out my first chip! Thanks also go to Ilias Pagkolas, who provided much assistance in learning to use Cadence. Finally, to Hamid Soleimani, my "digital mentor" - thank you for not only illuminating the magical world of digital design, but for being a brother, friend and companion through the highs and lows.

I wish to dedicate this thesis to my parents, Derek and Kathy. I am deeply thankful as I remember my dear mother's unflinching love that transcended continents. I sorely miss her as I remember how much she desired to see the completion of this work. Dad, thank you for your love and encouragement not only during my PhD, but from the very first day of my life! I am what I am today by the grace of God shown through your and mom's guidance, encouragement, counsel and support.

To my beloved wife, Megan: we embarked on this great adventure three years ago as newlyweds, and you have been a pillar of support to me through the struggles and joys of life in the UK. You knew just how to encourage my weary soul after the long and frustrating days at work! We now embark on a new adventure with *three* of us! Lastly, and most importantly, I am ever grateful to God, not only for the opportunity to study at Imperial College London, but also for enabling and sustaining me through the long journey that is a PhD.

#### SOLI DEO GLORIA

### **Publications**

- **G. Peyton**, M.G. Boutelle, E.M. Drakakis, "Front-End Receiver Architecture for Miniaturised Ultrasound Imaging", The 3rd World Congress on Electrical Engineering and Computer Systems and Science, Rome, Italy, June 2017.
- **G. Peyton**, B. Farzaneh, H. Soleimani, M.G. Boutelle, E.M. Drakakis, "Quadrature Synthetic Aperture Beamforming Front-End for Miniaturised Ultrasound Imaging", IEEE Transactions on Biomedical Circuits and Systems, Accepted May 2018.
- **G. Peyton**, M.G. Boutelle, E.M. Drakakis, "Comparison of Synthetic Aperture Architectures for Miniaturised Ultrasound Imaging Front-Ends", BioMedical Engineering Online, Accepted June 2018.

# Contents

- Abstract
- Acknowledgement
- Publications
- 1 Introduction
  - 1.1 Background and Problem Statement
  - 1.2 Aims and Objectives
  - 1.3 Contributions
  - 1.4 Thesis Organisation
- 2 Ultrasound Fundamentals
  - 2.1 Ultrasound Imaging Basics
  - 2.2 Beamforming
    - 2.2.1 Overview of Beamforming Architectures
    - 2.2.2 Synthetic Aperture Beamforming (SAB)
    - 2.2.3 Frame Rate
    - 2.2.4 Spatial Compounding
    - 2.2.5 Second Harmonic / Multi-Frequency Imaging
  - 2.3 Summary
- 3 Synthetic Aperture Imaging Architectures
  - 3.1 Proposed Beamforming Architectures
  - 3.2 Architecture 1: Quadrature Synthetic Aperture Beamforming
  - 3.3 Architecture 2: FRI Compressive Synthetic Aperture Beamforming
    - 3.3.1 Introduction to Compressive Sensing within the FRI Framework
    - 3.3.2 Sampling Signals with Finite Rate of Innovation
    - 3.3.3 FRI Compressive Sensing Simulations
  - 3.4 Summary
- 4 Digital Beamforming Implementation
  - 4.1 Digital Beamforming Algorithm
  - 4.2 Digital Design Tradeoffs
  - 4.3 FPGA Implementation
  - 4.4 ASIC Implementation
    - 4.4.1 Synthesis
    - 4.4.2 Physical Implementation
    - 4.4.3 System-Level Power Estimation
  - 4.5 Summary
- 5 Analogue Front-End of Ultrasound Receiver
  - 5.1 Overview and Requirements Analysis
    - 5.1.1 Design Overview
    - 5.1.2 Requirements Analysis
  - 5.2 Preamplifier
    - 5.2.1 Prior Art
    - 5.2.2 Design and Simulations Results
  - 5.3 Mixer
    - 5.3.1 Prior Art
    - 5.3.2 Performance Specifications
    - 5.3.3 Design and Simulation Results
  - 5.4 Programmable Gain Amplifier (PGA)
  - 5.5 Image-Reject Filter
  - 5.6 Biasing Circuitry
  - 5.7 Layout
  - 5.8 Summary
- 6 System Integration and Validation
  - 6.1 Experimental Setup
  - 6.2 AFE Performance
    - 6.2.1 Preamplifier
    - 6.2.2 Mixer
    - 6.2.3 Programmable Gain Amplifier (PGA)
    - 6.2.4 Lowpass Filter
    - 6.2.5 Transient Analysis
    - 6.2.6 Performance Comparison
  - 6.3 Quadrature SAB Results
  - 6.4
    - 6.4.1 Ultrasound Signal Reconstruction
    - 6.4.2 B-Mode Imaging
  - 6.5 Summary
- 7 Log Domain Demodulator
  - 7.1 Introduction
  - 7.2 Current-Mode Analogue Demodulation
    - 7.2.1 Pseudodifferential Demodulator Implementation
    - 7.2.2 Biquadratic Implementation
    - 7.2.3 Distortion and Noise Characteristics
  - 7.3 Circuit Non-Idealities
  - 7.4 Simulated Performance
  - 7.5 Summary
- 8 Summary and Future Work
  - 8.1 Summary
  - 8.2 Future Work
  - 8.3 Conclusion
- A Monte Carlo Analysis
- References

# **List of Tables**

- 2.1 Performance comparison of analogue delay implementations.
- 3.1 Comparison of different sampling kernels.
- 4.1 Device utilisation summary on a Spartan-6 FPGA for $N_p = 16$, frame rate = $7\,Hz$, pixel resolution = $64 \times 352$ and $i_{max} = 48$ angles.
- 5.1 Requirements specification for the analogue front-end.
- 5.2 Preamplifier device sizes.
- 5.3 Transistor sizes for the two-stage operational amplifier forming the core of the PGA.
- 5.4 Truth table defining the logical functionality of the unary decoder shown in figure 5.12.
- 5.5 Resistor and capacitor values for various lowpass filter bandwidths.
- 6.1 Summary of performance for the active stages (preamplifier, PGA and lowpass filter) in the analogue front-end.
- 6.2 Performance comparison for various ultrasound analogue front-ends.
- 6.3 Performance comparison for beamforming architectures targeting various applications.
- 6.4 Parameters for FRI compressive sensing experiments demonstrating low-rate sampling and reconstruction.
- 7.1 Simulated performance summary for the log domain demodulator circuit in figure 7.4.

# **List of Figures**

| 2.1 | Generalised ultrasound system architecture with analogue beam-       |
|-----|----------------------------------------------------------------------|
|     | former (adapted from [17])                                           |
| 2.2 | Block diagrams of typical beamforming systems. (a) Analogue          |
|     | beamforming - reflected signals are delayed with analogue delay      |
|     | lines, and then summed and digitised to form scan lines. (b) Digital |
|     | beamforming - reflected signals are first amplified and then sam-    |
|     | pled. Digital delays are applied prior to summation in order to form |
|     | scanlines. (Adapted from [17])                                       |
| 2.3 | Subarray / microbeamforming. The beamforming process is split        |
|     | between the probe and ultrasound machine. Signals are pre-           |
|     | beamformed using subarrays of analogue/digital delay units which     |
|     | apply fine resolution time delays. This effectively reduces the the  |
|     | number of digital lines and transmission bandwidth. Course delays    |
|     | are applied in the digital back-end prior to summation. (Adapted     |
|     | from [34])                                                           |

- 2.5 Scatter spectrum showing the fundamental, second harmonic, subharmonic and ultraharmonic components (figure taken from [44]). 23

- 3.4 (a) Simulated noiseless stream of random Dirac impulses modulated at a carried frequency of 1.7 MHz. (b) Output after filtering signal with the 2<sup>nd</sup> order LPF and sampling the result (L = 5, F = 4). (c) Original versus reconstructed signal (2<sup>nd</sup> order LPF kernel). (d) Original versus reconstructed signal (4<sup>th</sup> order LPF kernel). 41

| 3.5  | Original verses reconstructed signal when using (a) a sinc filter (b)              |    |
|------|------------------------------------------------------------------------------------|----|
|      | a fourth order cascaded LPF. The output in (b) is shifted after re-                |    |
|      | construction by $5.5 \mu s$ to correct the time delay introduced by the            |    |
|      | filter                                                                             | 42 |
| 3.6  | An example of a Dirac pulse train with $L = 4$ plotted against the                 |    |
|      | reconstructed signal (without time shifting) when the oversampling                 |    |
|      | factor is 8                                                                        | 43 |
| 3.7  | Performance of various sampling kernels in the presence of noise:                  |    |
|      | (a) $L = 4$ Dirac pulses, (b) $L = 20$ Dirac pulses. In both cases, the            |    |
|      | oversampling factor is 4                                                           | 44 |
| 3.8  | Time and amplitude estimation errors for oversampling factors of                   |    |
|      | 1, 2, 4 and 8. In this case, a $2^{nd}$ order LPF is used as the sampling          |    |
|      | kernel                                                                             | 46 |
| 3.9  | Time and amplitude errors for various sampling kernels when $L =$                  |    |
|      | 4, $F = 8$ . Note how the error for the 4 <sup>th</sup> order cascade follows that |    |
|      | of the sinc kernel for low SNR values                                              | 47 |
| 3.10 | Original ultrasound signal used for testing the FRI CS sampling and                |    |
|      | reconstruction method.                                                             | 49 |
| 3.11 | Reconstructed I component versus original/ideal I component,                       |    |
|      | demonstrating the accuracy of the FRI CS reconstruction algorithm                  |    |
|      | (with $K = 20$ and $F = 4$ ) on real ultrasound data for various sam-              |    |
|      | pling kernels: (a) sum-of-sincs kernel (b) ideal sinc kernel (c) a                 |    |
|      | cascaded second order LPF (d) biquad filter                                        | 50 |

| 4.1 | Finite state machine block diagram for the digital beamforming al-                   |    |
|-----|--------------------------------------------------------------------------------------|----|
|     | gorithm.                                                                             | 53 |
| 4.2 | Receive synthetic aperture imaging protocol (adapted from [48]).                     | 54 |
| 4.3 | Illustration of a binary dynamic apodisation window, where grey                      |    |
|     | pixels hold a value of 1, and other pixels hold a value of 0. The                    |    |
|     | width of the window is a function of $z_k$ , and defines whether upcon-              |    |
|     | verted RF values are added or not added to the image sum at each                     |    |
|     | pixel location.                                                                      | 58 |
| 4.4 | Frame rate (left axis) and clock frequency (right axis) vs. the num-                 |    |
|     | ber of transmit positions $i_{max}$ . The number of parallel analogue                |    |
|     | channels, $N_a$ , defines the region of operation                                    | 62 |
| 4.5 | Layout of the beamforming ASIC, synthesized in Cadence using                         |    |
|     | AMS 0.18 $\mu m$ CMOS. The following parameters were used: $N_p =$                   |    |
|     | 1, frame rate = $4Hz$ , pixel resolution = $32 \times 352$ and $i_{max} = 8$ angles. | 66 |
| 5.1 | High-level block diagram showing the various subsystems consti-                      |    |
|     | tuting the analogue front-end (AFE). The AFE amplifies and de-                       |    |
|     | modulates ultrasound signals, which are then sampled externally                      |    |
|     | and processed by a digital beamformer.                                               | 71 |
| 5.2 | Ultrasound transceiver (taken from [8]) (a) Preamplifier core (b)                    |    |
|     | Variable gain amplifier.                                                             | 77 |
| 5.3 | Preamplifier with variable gain. The gain may is controlled by vary-                 |    |
|     | ing $V_c$ (the gate voltage of $M_5$ ).                                              | 79 |
| 5.4 | Small signal equivalent model of the differential pair and MOSFET                    |    |
|     | load                                                                                 | 80 |

| 5.5  | Modeling of noise sources present in the variable gain preamplifier.     | 81 |
|------|--------------------------------------------------------------------------|----|
| 5.6  | Theoretical and simulated plots of gain $(A_v)$ versus the control volt- |    |
|      | age $(V_c)$ . Both cases demonstrate a hyperbolic response, which        |    |
|      | closely approximates a linear-in-dB response for $V_c = 0 - 1V$          | 84 |
| 5.7  | Gilbert cell multipliers (a) Single-balanced version (b) double bal-     |    |
|      | anced version.                                                           | 87 |
| 5.8  | Passive MOS Ring Mixer                                                   | 88 |
| 5.9  | Simulated output power versus input power for the passive mixer.         | 91 |
| 5.10 | High level schematic of the programmable gain amplifier (PGA).           |    |
|      | The gain of the amplifier is adjusted by switching between series        |    |
|      | combinations of resistors $R_{1a}$ , $R_{1b}$ and $R_{1c}$               | 92 |
| 5.11 | Schematic of the PGA core: a classic two-stage differential ampli-       |    |
|      | fier with Miller-compensation.                                           | 93 |
| 5.12 | Unary decoder used to program the gain of the PGA                        | 94 |
| 5.13 | Bode diagram illustrating the effect of altering $Z = (R_1 + R_2)/R_1$ . | 96 |
| 5.14 | Frequency response of the PGA for three gain settings $(14 dB, 20 dB)$   |    |
|      | and $26 dB$ ).                                                           | 97 |
| 5.15 | Fully differential active RC lowpass filter topology used in the AFE.    | 98 |
| 5.16 | Frequency response of the LPF for three bandwidth settings               |    |
|      | (200 kHz, 500 kHz and 1.25 MHz)                                          | 99 |
| 5.17 | Central biasing circuitry with LDO regulator feedback loop used to       |    |
|      | general biasing current $I_o$ and common mode voltage $V_{CM}$ 1         | 00 |
| 5.18 | ESD protection diodes used in IO pads                                    | 02 |

| 5.19 | The ultrasound AFE layout designed using Cadence Layout Editor in AMS $0.35\,\mu m$ technology |
|------|-----------------------------------------------------------------------|
| 6.1  | Block diagram representing the SAB receiver experimental setup, which illustrates the relationship between the AFE, PCB components and external devices |
| 6.2  | Photograph of the PCB used for testing the AFE and beamforming algorithm on FPGA: (1) AFE (2) Spartan-6 on EFM-02 development board (3) UART FT232 chip USB connector (4) ADC10D020 Dual-Channel ADC (5) ADM7155 voltage regulators |
| 6.3  | Preamplifier experimental results: (a) Gain versus control voltage $(V_c)$ for the preamplifier in 11 different chips. Time-gain control is implemented by sweeping the control voltage linearly over time, yielding a quasi-exponential gain response. (b) Input referred noise spectrum. (c) Total harmonic distortion versus the input voltage |
| 6.4  | Output power versus input power at $2.5\,MHz$. There is a linear relationship between the input and output for the fundamental, up until the $1\,dB$ compression point. The measured results also demonstrate third order intermodulation distortion and the extrapolated IP3 point |
| 6.5  | Frequency response of the lowpass filter for three bandwidth settings: $f_c = 195\,kHz$, $f_c = 510\,kHz$ and $f_c = 1.85\,MHz$ |
| 6.6  | Transient plots of the I/Q envelope signals from 11 chips overlaid against the original RF signal |

| 6.7  | Images of a phantom containing $8 \times 3$ cross-sectional wires. In (a) and (b), quadrature beamforming is carried out with $i_{max} = 8$ and 48 transmit elements respectively ($f\# = 2.5$, $N_a = 1$). In (c) beamforming is carried out in the RF domain with 48 elements ($f\# = 2.5$) |
|------|----------------------------------------------------------------------------------------------------------------------------------------------|
| 6.8  | Images of a phantom containing a hyperechoic cyst. In (a), beamforming is carried out in the RF domain with 48 transmissions and $F\# = 2.5$. In (b)-(c), quadrature beamforming is carried out with 48 and 16 transmissions respectively ($F\# = 2.5$) |
| 6.9  | Lateral beamplots for 3, 8 and 48 transmitter positions ($f\# = 2.5$, $z = 66.5\,mm$) |
| 6.10 | Lateral beamplots for $f\# = 0.5$, 2 and 3 ($z = 66.5\,mm$, $i_{max} = 48$) |
| 6.11 | Contrast relative to $-50\,dB$ for various $f\#$ values ($z = 66.5\,mm$, $i_{max} = 48$) |
| 6.12 | In (a), the original RF signal is overlaid against the ideal I/Q envelope generated in software. Low-rate samples are obtained using the hardware front-end and the I/Q envelope is reconstructed using FRI CS with the following parameters: (b) $L = 7$ (c) $L = 17$ (d) $L = 60$ |
| 6.13 | Lateral beamplots ($i_{max} = 48$, $f\# = 2.5$, $z = 66.5\,mm$) demonstrating the effect of $L$ on the lateral resolution and magnitude of the main lobe |

| 6.14 | Images of a phantom containing $8 \times 3$ cross-sectional wires. Compressive SAB was carried out with 48 transmit elements ($F\# = 2.5$), and (a) $L = 7$ and (b) $L = 17$. In (c) beamforming is carried out in the RF domain with 48 elements |
|------|-----------------------------------------------------------------------------------------------|
| 7.1  | Geometric mean current splitter |
| 7.2  | High level architecture of class AB demodulator / CS kernel |
| 7.3  | Log-domain demodulator circuit which multiplies currents $I_1^v$, $I_2^v$ and $I_1^L$, $I_2^L$, sums their products and filters the result |
| 7.4  | Log-domain demodulator circuit showing the current multiplier, second order companding filters and current sink. The currents $I_{1,2}^L$ and $I_{1,2}^H$ are derived from two current splitters |
| 7.5  | Biquadratic log-domain demodulator circuit. The circuit multiplies currents $I_1^v$, $I_2^v$ and $I_1^L$, $I_2^L$, sums their products and filters the result by means of a biquadratic lowpass filter. Note that $\omega_o$ and $Q$ may be adjusted independently using the currents $I_{o1}$, $I_{d2}$ and $I_{o2}$ |
| 7.6  | (a) $\beta$-compensation using an NMOS device to replace the diode connection. (b) The effect of $\beta$-compensation on the magnitude of the transfer function. The ideal response has a DC gain of $0\,dB$ |
| 7.7  | Bernoulli cell with non-idealities (adapted from [104, 105]) |

- 7.8 (a) Addition of a trimming current to address current mismatches. (b) The effect of trimming on the transfer function of a second order cascaded lowpass filter with an ideal cutoff frequency f<sub>c</sub> = 387 kHz. Without trimming, I<sub>d</sub> = 1.265 μA and f<sub>c</sub> = 304 kHz. By trimming I<sub>d</sub> to 1.75 μA, the cutoff frequency f<sub>c</sub> tends towards 387 kHz. With the other parameters fixed, the gain decreases as I<sub>d</sub> increases.
- 7.9 Transient analysis of the log-domain demodulator circuit. A simple A-line signal is demodulated into I/Q components in order to form an envelope. (a) Demodulated I component (b) Demodulated Q component (c) RF input versus simulated envelope (d) Simulated envelope (generated using Cadence) versus the ideal envelope (generated using Matlab).

- A.3 Monte Carlo simulation results for the lowpass filter and central bias. (a) Differential gain (b) Common-mode rejection ratio (c) 3 dB bandwidth (d) Filter core amplifier tail current (e) Central bias reference current (f) Central bias common mode voltage.

# Chapter 1

# Introduction

#### **1.1 Background and Problem Statement**

Medical ultrasound imaging has been used extensively as a diagnostic tool for over four decades. The popularity of ultrasound is largely due to its affordability in comparison to other modalities (e.g. X-ray, CT, MRI), and the fact that it offers reasonable imaging resolution while being harmless to human health. Advances in transducer technology, beamforming algorithms and electronics have paved the way for portable systems that are increasingly powerful and versatile. A large number of commercial, hand-held devices already exist on the market, such as the GE VScan [1] and Philips Lumify [2]. These devices offer significant improvements in portability, with real-time B-mode imaging and Doppler flow capabilities. They are also more affordable than larger, bed-side devices, making ultrasound imaging more accessible in low-resource clinical settings.

These developments indicate a trend towards highly integrated ultrasound imaging systems. Recent years have seen a proliferation of research focusing on the development of novel beamforming strategies, as well as integrated solutions such as single-FPGA beamformers [3], mixed-signal beamformers [4, 5, 6, 7], and ultrasound systems-on-chip (SoCs) [8, 9]. Commercially, the general trend is away from dedicated beamformer ASICs and DSPs and towards FPGAs, which offer greater flexibility and scalability [10].

Despite these advances, there is still much scope for the development of further miniaturised systems. The objective is to reduce the system complexity and cost without significantly degrading imaging quality. This opens up novel application areas. For example, small-scale, wireless systems have been conceptualised, such as capsule endoscopes [11, 12], implantable ultrasound devices [13] and wearable ultrasound devices [14]. However, the translation of these ideas into practical hardware is exceedingly difficult, and little progress has been made beyond the development of hand-held systems. Miniaturising an ultrasound imaging device is a difficult challenge because of the multidimensional tradeoffs inherent in the design. Area and power consumption are major constraints, particularly with a large number of channels (modern systems have upwards of 128 channels). Since ultrasound imaging utilises high frequency signals in the megahertz range, the data rate after sampling is excessive, and the data transmission bandwidth (or cabling requirement) becomes a major limitation.

In this work, two architectural frameworks are proposed for small-scale ultrasound imaging systems. The objective is to aggressively reduce the size, power and cost of the device. These solutions are validated using simulations and then tested in hardware. B-mode imaging results are obtained in order to compare the performance of the two approaches.

#### **1.2 Aims and Objectives**

The following aims and objectives were identified:

- Identify system-level hardware architectures for miniaturised ultrasound imaging.
  - Review literature addressing efficient beamforming approaches and hardware architectures for B-mode ultrasound imaging.
  - Specify system-level and circuit-level requirements.
  - Complete system-level simulations to validate the chosen architectures in light of requirements.
- Implement the selected architectures in hardware.
  - Implement an efficient analogue front-end (AFE) to interface with a piezoelectric transducer.
  - Implement a digital beamformer on FPGA/ASIC.
  - Integrate analogue and digital components on a custom circuit board.
- Obtain measured hardware results to validate the functionality of individual analogue/digital components.
- Obtain system-level imaging results to quantify image quality and system performance.

#### **1.3 Contributions**

#### Architectural and Algorithm Contributions

At a system level, two architectural frameworks are proposed. The novelty of these architectures lies in the combination of synthetic aperture beamforming (SAB) with two efficient sampling techniques: quadrature sampling and sub-Nyquist sampling (i.e. compressed sensing). SAB is used to aggressively reduce system complexity, as only a single channel (or group of channels) is required in the receiver. Because signals are processed sequentially, frame rate is necessarily traded off. The following architectural contributions are made:

- 1. **Quadrature SAB**. This technique aims at processing I/Q signals in the baseband using a digital beamformer in the hardware front-end. This effectively "compresses" data through the formation of a B-mode image, thereby easing the constraints on the transmission link as less data must be transmitted from the receiver to the display. A novel beamforming algorithm was implemented in RTL and tested using an FPGA. The algorithm was also synthesised in 0.18 μm CMOS with a view to creating an integrated ultrasound receiver system-on-chip (SoC) in the future. This is the first time a complete SAB algorithm has been implemented in hardware for *real-time* operation. Future completion of an integrated SoC would allow for unprecedented miniaturisation in order to target new application areas such as capsule endoscopy or wearable ultrasound.
- 2. **FRI Compressive SAB**. This technique combines SAB with compressive sensing within the finite rate of innovation (FRI) framework. Demodulated signals are constrained in bandwidth and sampled below the Nyquist frequency in order to reduce the data rate to the digital processor. I/Q components are reconstructed in software and processed sequentially by the beamformer to form an image. Again, this is the first time that compressive sensing has been applied to sequential processing of ultrasound signals in a synthetic aperture beamformer.

#### **Circuit-level Contributions**

The following circuit-level contributions are made:

- 1. **Fully differential analogue front-end (AFE)**. The proposed AFE functions as a voltage-domain amplifier/demodulator, and comprises a variable gain low-noise amplifier (VG-LNA), mixer, programmable gain amplifier (PGA) and lowpass filter. A novel VG-LNA circuit topology is proposed, with a quasi-exponential gain response that is used for time-gain control (TGC). The circuit was implemented in $0.35\,\mu m$ CMOS and was fabricated and physically tested.
- 2. **Current-mode, log-domain demodulator**. A novel pseudo-differential, log-domain topology was adapted from the "multer" topology in [15]. The proposed circuit offers a 20 dB improvement in dynamic range over the voltage-mode demodulator and is electronically tunable using the bias currents. The circuit was implemented in $0.35\,\mu m$ BiCMOS rather than subthreshold CMOS to allow for high frequency operation, and was validated using simulations in Cadence.

#### **1.4 Thesis Organisation**

This thesis is organised into the following chapters:

Chapter 2 introduces fundamental concepts relating to ultrasound imaging. A generalised system-level architecture is discussed in the context of both analogue and digital beamforming strategies. Particular emphasis is given to SAB as this technique forms the basis for the architectural frameworks proposed in this work.

Chapter 3 begins with an analysis of existing ultrasound architectures specifically targeting portable, small-scale applications. Two novel architectures are then proposed and preliminary simulation results are presented. Architecture 1 (quadrature SAB) is presented in section 3.2, and architecture 2 (compressive SAB) is presented in section 3.3. These architectural frameworks form the basis for the rest of the work presented in the following chapters.

Chapter 4 describes implementation details for the digital SAB algorithm. A high level description of the algorithm is presented in section 4.1, and various tradeoffs inherent in the design are discussed in section 4.2. Lastly, both FPGA and ASIC implementations are presented in sections 4.3 and 4.4 respectively.

Chapter 5 presents the design of the analogue front-end (AFE), which functions as an I/Q demodulator. The chapter first discusses the system-level requirements for each component in the AFE, and then proceeds with a detailed analysis of each component. These include the VG-LNA/preamplifier (section 5.2), mixer (section 5.3), programmable gain amplifier (section 5.4), lowpass filter (section 5.5), and biasing circuitry (section 5.6). Finally, the layout of the design is presented in section 5.7.

Chapter 6 begins with a description of the integrated experimental setup for testing the AFE and digital beamformer, as well as the entire system. The performance of the AFE is quantified in section 6.2. In sections 6.3 and 6.4, imaging results are presented for each architecture, and a comparison is made between the two.

Chapter 7 presents an alternative, log-domain circuit topology that performs demodulation in the current domain rather than the voltage domain. The circuit is analysed in section 7.2, and circuit non-idealities are discussed in section 7.3. While the circuit was not fabricated, extensive simulations were carried out to validate its functionality (section 7.4).

The thesis is summarised in chapter 8 and recommendations are made for future work.

# Chapter 2

# **Ultrasound Fundamentals**

In this chapter, the basic principles of ultrasound imaging are introduced. Section 2.1 begins with a brief review of fundamental principles, imaging modes and a description of a generalised ultrasound system. Various beamforming methods and hardware topologies are discussed in section 2.2, with particular emphasis upon the synthetic aperture method. Advanced ultrasound techniques are also introduced, including second harmonic imaging and multi-frequency beamforming. The fundamental concepts outlined in this chapter lay the theoretical foundation underpinning the architectural solutions in chapter 3.

#### 2.1 Ultrasound Imaging Basics

Ultrasound imaging is a ubiquitous imaging modality used extensively in medical diagnostics. High frequency ultrasound waves (usually in the megahertz range) are generated using an ultrasound transducer. These waves propagate into the tissue at the speed of sound, $c = 1540\,m\,s^{-1}$ [16], reflecting off tissue interfaces with variable reflection coefficients. As the ultrasound wave propagates into the tissue, it attenuates exponentially due to absorption, scattering and conversion of acoustic energy into heat. The attenuation of ultrasound waves in tissue may be expressed as [17]:

$$p(z) = p_0 e^{-a(f)z} \tag{2.1}$$

where $p_0$ is the initial pressure amplitude, $a(f)$ is the frequency-dependent attenuation coefficient, and $z$ is the depth. The attenuation rate is approximately 0.5 dB/MHz/cm in soft tissue [18]. For this reason, time-gain compensation (TGC) is usually applied to compensate for signal attenuation as a function of depth [17, 16].
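To give a feel for the magnitudes involved, the following minimal sketch evaluates the 0.5 dB/MHz/cm rule of thumb quoted above. It assumes a linear frequency dependence and, by default, a round-trip path of twice the imaging depth; the function names and the example frequencies are illustrative choices rather than values taken from this work.

```python
ALPHA_DB_PER_MHZ_CM = 0.5  # soft-tissue attenuation rate quoted in the text [18]


def attenuation_db(freq_mhz, depth_cm, round_trip=True):
    """Attenuation in dB, assuming it scales linearly with frequency and path length."""
    # Assumption: a pulse-echo signal travels to the reflector and back (2 * depth).
    path_cm = 2.0 * depth_cm if round_trip else depth_cm
    return ALPHA_DB_PER_MHZ_CM * freq_mhz * path_cm


def pressure_ratio(att_db):
    """Convert an attenuation in dB to the amplitude ratio p/p0."""
    return 10.0 ** (-att_db / 20.0)


if __name__ == "__main__":
    for f_mhz in (2.5, 5.0, 10.0):  # illustrative centre frequencies
        a_db = attenuation_db(f_mhz, depth_cm=10.0)
        print(f"{f_mhz:4.1f} MHz at 10 cm: {a_db:5.1f} dB, p/p0 = {pressure_ratio(a_db):.2e}")
```

The dB figure doubles whenever either the frequency or the depth doubles, which is why the TGC gain range has to grow with both.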

Ultrasound systems use various types of "modes" depending on the diagnostic application. For instance, amplitude-mode (**A-mode**) scanners employ a single transducer to transmit and receive. The signal amplitude is plotted as a function of depth (time delay). Brightness-mode (**B-mode**) imaging relies upon a pulse-echo approach, where a 1D ultrasound array is used to generate ultrasound pulses and to receive reflections. Modern systems apply both transmit and receive beamforming to form scan/beam lines, as discussed in section 2.2. Scan lines are combined to form a cross-sectional, 2D greyscale image of the medium. The amplitude of the scan line corresponds to the brightness of the image; thus, the image mode is called brightness-mode. The axial and lateral resolution of a B-mode imaging device is proportional to the wavelength of the ultrasound wave [17]. Therefore, higher frequencies are used to achieve a higher resolution. However, according to equation 2.1, this limits the imaging depth due to tissue attenuation, indicating a tradeoff between imaging resolution and depth.



Figure 2.1: Generalised ultrasound system architecture with analogue beamformer (adapted from [17]).

In motion-mode (**M-mode**), pulses are emitted in quick succession to form either A-mode scans or B-mode images. The image is updated continuously at the pulse-repetition frequency (PRF). The pulse repetition period (PRP), T, must be long enough to allow echoes to propagate back to the receiver [17, 16]. For example, for an imaging depth of D = 10 cm, the minimum PRP is  $T = 2D/c = 130 \mu s$ . The acquisition rate fundamentally limits the system frame rate, depending on what beamforming algorithm is used.
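The PRP figure above can be checked with a short calculation. The sketch below also derives an upper bound on the frame rate when one frame needs several firings; the value of 48 firings per frame is an illustrative assumption, and processing time is ignored.

```python
C_TISSUE = 1540.0  # speed of sound in soft tissue, m/s (value used in the text)


def min_prp(depth_m, c=C_TISSUE):
    """Minimum pulse repetition period T = 2D/c, in seconds."""
    return 2.0 * depth_m / c


def max_frame_rate(depth_m, firings_per_frame, c=C_TISSUE):
    """Upper bound on frame rate when each frame requires several firings."""
    return 1.0 / (firings_per_frame * min_prp(depth_m, c))


if __name__ == "__main__":
    t = min_prp(0.10)  # D = 10 cm, as in the example above
    print(f"Minimum PRP: {t * 1e6:.1f} us, maximum PRF: {1.0 / t:.0f} Hz")
    # 48 firings per frame is a hypothetical value chosen for illustration.
    print(f"Frame-rate bound for 48 firings: {max_frame_rate(0.10, 48):.1f} Hz")
```

This bound reflects acoustic propagation only; a practical system may run well below it once data transfer and beamforming time are included.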

Figure 2.1 shows a block diagram of a generalised B-mode ultrasound system [17, 16]. In the top pathway, a transmit beamformer produces time-delayed excitation pulses which are driven through transmit/receive switches to the transducer elements. On the receiver end, reflected ultrasound signals from each channel are amplified using a low-noise amplifier (LNA) and variable-gain amplifier (VGA). The VGA provides a time-varying gain (time gain control) to account for attenuation of the signal as a function of depth. An analogue beamformer then focuses and steers the received ultrasound beam. The result is sampled and processed by a control host which carries out image processing to yield an image. The host processor may also perform beamforming if a digital approach is preferred. The processor controls the entire system to function in the correct modes and provides a control interface to the front-end electronics [17, 16].

#### 2.2 Beamforming

Beamforming is one of the most important functions of an ultrasound system. A beamformer imparts directivity to the transducer, enhancing its gain, and defines a focal point within the imaged medium [16]. Beamforming is done by applying precisely timed delays to the transmitted and received signals for each element in the ultrasound array in order to steer and focus the ultrasound beam at an angle. After the delay is applied, the signals are then summed to form a scan line. In phased array systems, the time delay to a point  $P = (R, \theta)$  may be calculated by dividing the distance to/from the imaging point by the speed of sound in the medium, *c*. In polar coordinates, *R* is the range and  $\theta$  the steering angle. The time delay has been derived in [19]:

$$\Delta t_n = \frac{R}{c} \left[ \sqrt{1 + \left(\frac{nd}{R}\right)^2 - 2\left(\frac{nd}{R}\right)\sin\theta} - 1 \right] + t_0 \tag{2.2}$$

where $\Delta t_n$ is the steering plus focusing time delay for the $n^{th}$ transducer element, *c* is the speed of sound in the medium, *d* is the element spacing (pitch), and $t_0$ is a constant time period large enough to avoid negative time delays.
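As an illustration, equation 2.2 can be evaluated directly for every element of an array. This is a minimal sketch under stated assumptions: the element index $n$ is measured from the array centre (so it takes half-integer values for an even element count), and $t_0$ is chosen as the smallest offset that makes all delays non-negative. The function names and the array parameters used in any call are hypothetical.

```python
import math

C_TISSUE = 1540.0  # speed of sound, m/s


def focus_delay(n, pitch_m, range_m, theta_rad, c=C_TISSUE, t0=0.0):
    """Steering-plus-focusing delay of element n from equation 2.2, in seconds."""
    x = n * pitch_m / range_m  # normalised element offset nd/R
    root = math.sqrt(1.0 + x * x - 2.0 * x * math.sin(theta_rad))
    return (range_m / c) * (root - 1.0) + t0


def array_delays(num_elements, pitch_m, range_m, theta_rad, c=C_TISSUE):
    """Delays for all elements, indexed symmetrically about the array centre.

    Assumption: t0 is set to the smallest value that avoids negative delays.
    """
    half = (num_elements - 1) / 2.0
    raw = [focus_delay(i - half, pitch_m, range_m, theta_rad, c)
           for i in range(num_elements)]
    t0 = -min(raw)
    return [d + t0 for d in raw]


if __name__ == "__main__":
    # Hypothetical array: 8 elements, 0.3 mm pitch, focus at R = 50 mm, 20 degrees.
    for i, d in enumerate(array_delays(8, 0.3e-3, 50e-3, math.radians(20.0))):
        print(f"element {i}: {d * 1e9:7.1f} ns")
```

With $\theta = 0$ the delay profile is symmetric about the array centre, which is a quick sanity check on any implementation.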

In general, beamforming may be carried out using an analogue or digital approach, as illustrated in figure 2.2. Numerous analogue/digital beamforming strategies have been proposed to balance the multidimensional tradeoff between power, data bandwidth, area, image quality and size. The following section presents a brief overview of prior art.


Figure 2.2: Block diagrams of typical beamforming systems. (a) Analogue beamforming - reflected signals are delayed with analogue delay lines, and then summed and digitised to form scan lines. (b) Digital beamforming - reflected signals are first amplified and then sampled. Digital delays are applied prior to summation in order to form scanlines. (Adapted from [17]).

### 2.2.1 Overview of Beamforming Architectures

Large-scale commercial ultrasound machines are generally not critically constrained in terms of power consumption and area. Parallelised RF data is processed using an analogue front-end, and dynamic beamforming is carried out using a separate, high-speed digital processor (e.g. DSP, FPGA or ASIC). Processing is typically carried out on a large number of parallel channels. Images are formed by sampling analogue signals, applying digital delays, and then summing digitally [17]. The popularity of digital beamformers has grown since the 1990s with advances in high speed A/D converters and the dramatic rise in gate counts in ASICs and FPGAs [20].

In systems with many channels, the processing requirements for both the front-end and back-end are immense, so it is not practical or cost effective to build both functions into a single device. A multi-chip solution is typically employed, with high bandwidth requirements between the components. The back-end is typically implemented in a few components, whereas the front-end is implemented in many, often one per eight channels. To manage the complexity and space constraints of portable ultrasound systems, designs are moving away from dedicated beamformer ASICs and DSPs and towards FPGAs [10].

Numerous techniques have been proposed to alleviate the high bandwidth constraint on the transmission line between the front-end and back-end. Insoo *et al.* have suggested implementing analogue multiplexing with a single ADC [17, 8]. This reduces the power consumption, but also the frame rate, as the shared ADC architecture must perform 16 iterative operations for each scan. Low-power, single-bit oversampling delta-sigma ADCs have also been proposed as an alternative to multi-bit ADCs [21, 22, 23]. Oversampling at high frequencies removes the necessity for complicated fine delay generation methods. However, this method is only suitable for low bandwidth (< 5 MHz) applications due to the need for high oversampling rates [5]. Power consumption is a critical issue in digital beamformers, particularly those utilising one ADC per channel. For instance, a typical 40 MSPS commercial ADC such as the ADS5121 consumes 62.5 mW per channel. With full parallelisation, the combined sampling rate and transmission line bandwidth become excessive.
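The scale of the problem can be illustrated with a rough budget for a fully parallel receiver. The sketch below uses the 40 MSPS and 62.5 mW per channel figures quoted above and the 128-channel count mentioned in chapter 1; the 10-bit word length is an assumption made purely for illustration, and the function name is hypothetical.

```python
def aggregate_adc_budget(channels, sample_rate_sps, bits_per_sample, power_per_channel_w):
    """Return (raw data rate in bit/s, total ADC power in W) for one ADC per channel."""
    data_rate_bps = channels * sample_rate_sps * bits_per_sample
    total_power_w = channels * power_per_channel_w
    return data_rate_bps, total_power_w


if __name__ == "__main__":
    # 128 channels, 40 MSPS and 62.5 mW per channel come from the text;
    # the 10-bit resolution is an assumed value.
    rate, power = aggregate_adc_budget(128, 40e6, 10, 62.5e-3)
    print(f"Raw data rate: {rate / 1e9:.1f} Gbit/s, ADC power: {power:.1f} W")
```

Even before amplification and beamforming are considered, figures of this order are incompatible with a battery-powered or wireless device, which motivates the channel-reduction techniques discussed next.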

Various analogue **partial beamforming** architectures have been proposed to address this problem. Analogue partial beamforming topologies apply delays in the analogue domain using analogue delay lines. The delayed signals are summed and digitised to form a scan line [17], thereby reducing the number of digital channels. Continuous-time analogue delays can be generated using cascaded low-pass (LP) or all-pass (AP) unit-delay cells. The input signal is connected to an arbitrary input tap on the delay line. The line is composed of *M* cascaded low input impedance filters. A comparison of different filter cell implementations is shown in table 2.1. To date, the most power-efficient delay architecture with the largest bandwidth (150 MHz) is demonstrated in [5], where 1.7-2.5 ns delay cells are used in the delay chain. This topology employs all-pass analogue delay cells to carry out dynamic receive beamforming with an annular transducer array. The main disadvantage is that analogue delay lines tend to be bulky, power hungry and prone to phase errors without proper matching. The number of cells required increases rapidly with the number of channels (proportional to $N^2$), resulting in excessive power requirements and pulse distortion for a practical array [24].

A digital partial beamforming topology is proposed in [32, 33], where RF signals are processed in parallel, and beamforming is carried out in the digital domain using a digital delay line and adder, leading to a reduction in the overall data rate.

| Ref.   | Process            | Supply (V) | Unit delay power (mW) | Unit delay (ns) | Filter type                                        | Bandwidth (MHz) |
|--------|--------------------|------------|-----------------------|-----------------|----------------------------------------------------|-----------------|
| [25]   | 0.35 µm CMOS       | ±3         | 12                    | 0.86-2.4        | 1<sup>st</sup> order AP                            | 75 for 20 ns    |
| [26]   | 0.7 µm BiCMOS      | 5          | 7.5                   | 1.67            | 1<sup>st</sup> order LP + 2<sup>nd</sup> order LP  | 50 for 20 ns    |
| [27]   | 0.5 µm CMOS        | 3          | 13                    | 1.65            | 2<sup>nd</sup> order LP                            | 140 for 20 ns   |
| [28]   | 0.35 µm CMOS       | ±1.5       | 16.8                  | 0.42-0.59       | 3<sup>rd</sup> order LP                            | 70 for 5 ns     |
| [29]   | 0.5 µm BiCMOS      | 5          | N/A                   | 4.0-6.0         | 4<sup>th</sup> order LP                            | 100 for 26 ns   |
| [30]   | 0.35 µm SiGe       | 3.3        | 10.9                  | 1-5             | 2<sup>nd</sup> order AP                            | 50 for 10 ns    |
| [4, 5] | 0.35 µm CMOS       | 3.3        | 2.1                   | 1.7-2.5         | 1<sup>st</sup> order AP                            | 150 for 35 ns   |
| [31]   | 60 GHz SiGe BiCMOS | 2.5        | 3.5                   | 57.3            | Log-domain 1<sup>st</sup> order AP                 | 20 for 60 ns    |

Table 2.1: Performance comparison of analogue delay implementations.

Scan lines are formed sequentially, and full reconstruction is carried out off-chip. However, parallel ADCs are still required, and the required data rate (280 Mbit/s) exceeds the capacity of a wireless link (5 - 10 Mbit/s), thus necessitating the use of memory to buffer the data.

Figure 2.3: Subarray / microbeamforming. The beamforming process is split between the probe and ultrasound machine. Signals are pre-beamformed using subarrays of analogue/digital delay units which apply fine resolution time delays. This effectively reduces the number of digital lines and transmission bandwidth. Coarse delays are applied in the digital back-end prior to summation. (Adapted from [34]).

**Sub-array** or **"microbeamforming"** techniques have also been proposed, whereby signals are pre-beamformed using subarrays of analogue/digital delay units which apply fine resolution time delays. A generalised sub-array beamformer is illustrated in the block diagram in figure 2.3. This approach reduces the digital channel count, the power consumption of the front-end, and ultimately the computational burden on the back-end. There are numerous examples of commercial systems employing this technique, e.g. Philips X4-1/X3-1 and X7-2t TEE; Siemens 4Z1C; GE 3V probe. Sub-array beamformers have been implemented using analogue "memory", or switched capacitor networks, and have been applied to both 2D and 3D imaging systems [6, 7]. A bank of capacitors and switches is used to store sequential signal samples, which are read out after precisely controlled time delays. Chen *et al.* employ programmable analogue subarrays to partially beamform the data [7]. This minimises the data bandwidth and cabling requirements, but does not address the area and complexity problem associated with many parallel channels. Wygant *et al.* multiplex through the entire array using a single channel, and beamform using the synthetic aperture method [35]. This effectively reduces hardware complexity, but limits frame rate due to the finite reflection period and acquisition time. Chen *et al.* attempt to alleviate this problem by multiplexing sequentially through multiple sub-arrays, striking a balance between complexity and acquisition rate [36]. The drawback of this approach is that the digital control signals can produce spectral components within the ultrasound passband, resulting in undesired stationary "pattern" noise on the final image. To maintain acceptable side-lobe levels, the delay resolution must be at least on the order of 1/20th of the wavelength of the fundamental [5], meaning that the sampling frequency should be very high. However, the obvious advantage is that time delays can be very precisely controlled.
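The delay-resolution requirement mentioned above translates directly into an equivalent sampling rate. The sketch below interprets "1/20th of the wavelength" as 1/20th of the period of the fundamental, which is an assumption; the 2.5 MHz centre frequency in the example is illustrative and the function names are hypothetical.

```python
def required_delay_resolution(f0_hz, fraction=20):
    """Delay step equal to 1/fraction of the period of the fundamental, in seconds."""
    return 1.0 / (fraction * f0_hz)


def equivalent_sampling_rate(f0_hz, fraction=20):
    """Sampling rate whose sample period equals the required delay step, in Hz."""
    return fraction * f0_hz


if __name__ == "__main__":
    f0 = 2.5e6  # illustrative centre frequency
    print(f"Delay step: {required_delay_resolution(f0) * 1e9:.1f} ns")
    print(f"Equivalent sampling rate: {equivalent_sampling_rate(f0) / 1e6:.1f} MHz")
```

The rate computed here applies when delays are realised purely by sample shifts; techniques such as interpolation or phase rotation are commonly used to obtain fine delays at lower sampling rates.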

**Pseudo-dynamic, extended aperture (EA) beamforming** is another technique proposed in [3] aimed at reducing hardware complexity without significantly reducing the frame rate. Focused delays are calculated at a predetermined number of focal points depending on the number of focal zones. An extended aperture of 32 elements is formed by performing two transmissions and sequentially receiving using 16 channels, halving the frame rate but also the area.

### 2.2.2 Synthetic Aperture Beamforming (SAB)

Synthetic aperture beamforming (SAB) was originally proposed in the 1950s for high resolution radar imaging of terrain. Since the 1980s, extensive research has focused on applying SAB to ultrasound imaging as a means of reducing system complexity, and the technique has been recommended for use in systems where size and cost are severe limitations [37, 38]. Synthetic aperture methods are very effective for special applications such as intravascular imaging, where both system and probe simplicity are mandatory [37, 39]. Several synthetic aperture techniques have been proposed:

- Synthetic aperture focusing technique (SAFT) is the simplest SAB method whereby a single array element transmits and receives at a time. As figure 2.4(a) shows, all the elements are excited sequentially one after the other, and the echoes received are recorded and stored in memory [40, 41]. This data can be used for making a series of low resolution images, which, when combined, form a higher resolution image. SAFT significantly reduces system complexity, because only a single hardware channel is required. However, the technique usually requires data storage (multiple RF scans must be buffered prior to beamforming), and significant computational resources [37]. This problem can be alleviated through the use of sparse transmit arrays [41], high speed DSPs or FPGAs, or by forming the image dynamically without storing the entire RF dataset for each image frame. The other problem is that SAFT generally yields poor SNR, i.e. $10 \log M$, where *M* is the number of subapertures used. To overcome low SNR, a multi-element synthetic aperture focusing (M-SAF) method has been proposed [37], as discussed below.
- **Multi-element synthetic aperture focusing** (M-SAF) is an adaptation of SAFT [37]. This method uses a defocused active transmit subaperture (parabolic defocusing lens) to emulate a high power, single element transmitter. A small subaperture size (less than 12) should be utilised to emulate the transmit response of a single element. The defocusing lens may be realised using the following delays [37]:

$$\tau_n = \frac{1}{c} \frac{x_n^2}{2z_d} \tag{2.3}$$

where  $x_n$  is the distance of the  $n^{th}$  element from the subaperture center,  $z_d$  is the distance of the "defocal" point from the subaperture, and c is the sound velocity  $(1540 m s^{-1})$ . The lateral spread of the defocused beam is inversely proportional to the defocal length. The beam angle can be approximated by  $2 \arctan\left(\frac{K_t d}{2z_d}\right)$ , where d is the inter-element spacing, and  $K_t$  is the number of elements in the active transmit subaperture [37]. Echoes are recorded by receive subapertures that are stepped across the array. The acoustic power and SNR are increased compared to SAFT, but the method generally requires memory for data recordings. Because this method requires multiple transmission firings to form an image, the system is also susceptible to motion artifacts.

- Synthetic receive aperture (SRA) [41, 42] was proposed to improve lateral resolution. As figure 2.4(b) shows, a large transmit aperture is used together with multiple smaller receive apertures. This drastically reduces the number of parallel receive channels.
- Synthetic transmit aperture (STA) splits the transmit aperture into multiple smaller subapertures [41]. At each firing step, one subaperture transmits a pulse and all elements receive the echo signals. This increases the frame rate significantly compared to the conventional phased array method, making it suitable for real-time 3D imaging and Doppler flow imaging [43].
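The M-SAF defocusing delays of equation (2.3) and the approximate beam angle can be sketched numerically as follows. This is a minimal illustration only: the function names, the 8-element subaperture, the 300 µm pitch and the 5 mm defocal distance are hypothetical values chosen for the example and do not come from [37] or from the present work.

```python
import math

C_SOUND = 1540.0  # sound velocity in m/s, as quoted with equation (2.3)

def defocus_delays(num_elements, pitch, z_d, c=C_SOUND):
    """Delays tau_n = x_n^2 / (2 * z_d * c) for a symmetric subaperture."""
    centre = (num_elements - 1) / 2.0
    delays = []
    for n in range(num_elements):
        x_n = (n - centre) * pitch  # signed distance from the subaperture centre
        delays.append(x_n ** 2 / (2.0 * z_d * c))
    return delays

def beam_angle(num_elements, pitch, z_d):
    """Approximate full beam angle 2 * arctan(K_t * d / (2 * z_d)) in radians."""
    return 2.0 * math.atan(num_elements * pitch / (2.0 * z_d))

# Hypothetical example: 8 elements, 300 um pitch, defocal point at 5 mm.
delays = defocus_delays(num_elements=8, pitch=300e-6, z_d=5e-3)
angle_deg = math.degrees(beam_angle(8, 300e-6, 5e-3))
```

The outer elements receive the largest delays, so the centre of the subaperture fires first and the emitted wavefront diverges. Halving `z_d` in this sketch widens the beam angle, consistent with the inverse relationship described above.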



Figure 2.4: Synthetic aperture imaging method (taken from [41]). (a) The classical Synthetic Aperture Focusing Technique (SAFT) with N firing/transmission steps. (b) The Synthetic Receive Aperture (SRA) method with  $N_S$  firing/transmission steps.

### 2.2.3 Frame Rate

Frame rate is a significant constraint on all beamforming systems. In conventional phased array systems, the frame rate is simply a function of the number of beam/scan lines per image frame. In order to achieve a frame rate of  $F_{FR}$ , the number of beams per frame is:

$$M = F_{PRF} / F_{FR} \tag{2.4}$$

where  $F_{PRF}$  is the pulse repetition frequency. However, the frame rate decreases when *composite focusing* is used. This technique uses multiple focal points to increase the image resolution and depth of field. Multiple beams with different focal points are transmitted one by one. The frame rate naturally decreases with the number of focal zones, indicating that there is a tradeoff between image quality and frame rate. Dynamic focusing is an alternative method whereby focusing is carried out on reception (not transmission), so the depth of field can be extended without a reduction in frame rate. In synthetic aperture imaging, the frame rate depends upon the specific technique that is used. The STA method achieves a higher frame rate than conventional phased array imaging by using fewer firing steps and receiving over the whole aperture. However, when a single element or sub-array of elements is used, multiple transmissions are required to receive over the entire aperture. The frame rate is reduced, but with a corresponding reduction in system complexity.
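The relationship in equation (2.4) can be illustrated with a short calculation. The sketch below assumes, for illustration only, that composite focusing with several focal zones requires one transmission per zone per scan line, so that the frame rate divides by the number of zones; the function names and the numerical values (5 kHz pulse repetition frequency, 128 scan lines) are hypothetical.

```python
def beams_per_frame(f_prf, f_fr):
    """Equation (2.4): number of beams M available per frame."""
    return f_prf / f_fr

def frame_rate(f_prf, num_beams, num_focal_zones=1):
    """Frame rate when every beam is fired once per focal zone (assumption)."""
    return f_prf / (num_beams * num_focal_zones)

# Hypothetical example: 5 kHz PRF and 128 scan lines per frame.
single_zone = frame_rate(5000.0, 128)                    # one focal zone
four_zones = frame_rate(5000.0, 128, num_focal_zones=4)  # composite focusing
```

Under these assumed values, moving from one to four focal zones reduces the frame rate by a factor of four, which is the tradeoff between image quality and frame rate described above.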

### 2.2.4 Spatial Compounding

Lateral/axial resolution and signal-to-noise ratio (SNR) should be carefully considered when designing an ultrasound imaging system. Spatial compounding is used to increase the SNR, i.e. averaging multiple images of the same field of view from different directions/angles [16]. In traditional phased array beamforming, this is done by steering the beam in various directions, and then summing/averaging the resultant images. In synthetic aperture imaging, multiple transmission positions or angles are used. Because the image contains the same spatial information for different angles/positions, the SNR will improve by [16]:

$$SNR_{compound} = SNR_0 \cdot \sqrt{M} \tag{2.5}$$

where M is the number of compounded images, and  $SNR_0$  and  $SNR_{compound}$  are the signal-to-noise ratios for single and compounded images respectively. Naturally, M must be maximised to increase the SNR. However, this leads to a reduction in the frame rate, as additional time must be spent on acquiring reflections and carrying out beamforming operations. Thus, there is an inherent tradeoff between M (or SNR) and the overall frame rate and power consumption of the system.
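As a brief numerical illustration of equation (2.5), the sketch below evaluates the compounded SNR and the equivalent gain in decibels. It assumes that the SNR in (2.5) is an amplitude ratio, so that the gain is $20\log_{10}\sqrt{M} = 10\log_{10} M$ dB; the function names are hypothetical.

```python
import math

def compound_snr(snr_0, num_images):
    """Equation (2.5): SNR after compounding M images."""
    return snr_0 * math.sqrt(num_images)

def compound_gain_db(num_images):
    """SNR improvement in dB, treating SNR as an amplitude ratio (assumption)."""
    return 20.0 * math.log10(math.sqrt(num_images))

snr_four = compound_snr(10.0, 4)   # doubling of the single-image SNR
gain_16 = compound_gain_db(16)     # roughly 12 dB for 16 compounded images
```

Because the gain grows only with the square root of M, each doubling of the SNR requires four times as many acquisitions, which is why the frame rate penalty becomes significant.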

### 2.2.5 Second Harmonic / Multi-Frequency Imaging

Most modern ultrasound scanners use a technique called *second harmonic imaging*. This technique is based on non-linear interaction of ultrasound waves with tissue, which results in the generation of new waves whose frequency is a multiple of the fundamental [16]. That is, if the fundamental frequency is  $f_o$ , the waves generated due to non-linearity will have frequencies given by:

$$f' = N.f_o \tag{2.6}$$

An example of a typical spectrum from a reflected wave with a center frequency at 3.5 MHz is shown in figure 2.5. The second harmonic (N = 2) may be filtered out and used to form a high harmonic image. The amplitude of the second harmonic is lower than the fundamental, leading to lower SNR. However, because the contrast is based on non-linearity in the tissue, this technique provides added value for diagnosis [16]. Second harmonic imaging is easily implemented using a bandpass or high pass filter to extract the band of frequencies centered around the second harmonic. Often, the second harmonic image is calculated consecutively or in parallel with the first harmonic image. The second harmonic can also be extracted using I/Q demodulation, i.e. mixing the signal with a reference signal at  $N.f_o$ , and filtering the baseband signal [16]. In multi-frequency imaging, the same principle is extended to include other spectral bands as well. For instance, two images may be constructed using high-frequency and low-frequency bands. As discussed, scattering of the ultrasound waves depends on both wavelength and tissue properties. Thus, there will be a discrepancy between the two resultant images, which may be used for tissue characterisation [16].



Figure 2.5: Scatter spectrum showing the fundamental, second harmonic, subharmonic and ultraharmonic components (figure taken from [44]).

### 2.3 Summary

This chapter introduces basic concepts relating to ultrasound imaging. Fundamental imaging principles are reviewed and a generalised ultrasound system architecture is described. Particular emphasis is given to beamforming, the process of focusing ultrasound signals in transmission or reception in order to form an image. Various analogue and digital beamforming architectures exist, including phased array beamformers and synthetic aperture beamformers. Synthetic aperture techniques are discussed in detail, including the synthetic aperture focusing technique, multi-element synthetic aperture focusing and synthetic receive/transmit aperture imaging. These techniques are adapted in the present work for use in small-scale systems. Finally, imaging parameters affecting performance and techniques used to improve image quality are considered, i.e. composite focusing, spatial compounding and harmonic imaging. The principles and techniques introduced in this chapter lay the foundation for the architectures presented in the following chapter.

## **Chapter 3**

# Synthetic Aperture Imaging Architectures

In this chapter, two system-level architectures are proposed: quadrature synthetic aperture beamforming and compressive synthetic aperture beamforming. These architectures are introduced in section 3.1 and discussed in detail in sections 3.2 and 3.3. Extensive simulation results for the second technique (FRI compressive sensing) are presented.

### 3.1 Proposed Beamforming Architectures

Figure 3.1: Synthetic Aperture Beamforming Architectures. (a) FRI compressive beamforming architecture. Signals are demodulated by the analogue front-end (AFE). The bandwidth of the lowpass filter is reduced below the I/Q Nyquist cutoff frequency, and low-rate samples (quasi-I/Q) are then transmitted to a computational back-end for processing and display. (b) I/Q beamforming architecture. Signals are demodulated by the AFE to form I/Q components. The digital processor carries out synthetic aperture beamforming (SAB). The resultant image is transmitted to a back-end for post-processing and display.

Small-scale, portable ultrasound systems are severely constrained in power, data bandwidth, area and size. Large-scale parallelisation is not suitable due to the excessive power consumption of parallel ADCs. Partial beamforming architectures present a significant improvement in terms of reducing transmission line bandwidth, but do not address the parallelisation/area/complexity problem aggressively enough to enable further miniaturisation. The proposed solution to this problem is to apply synthetic aperture beamforming (SAB), which has been proposed as an effective means of minimising system complexity in small-scale systems [45, 37]. Signals from each element in the transducer are multiplexed through a single channel (or sub-aperture of channels) in order to synthetically form a larger aperture. Two architectural variations of SAB are proposed:

1. **FRI Compressive SAB**. This architecture employs compressive sensing within the Finite Rate of Innovation (FRI) framework [46] to further reduce the I/Q bandwidth prior to sampling. This not only reduces the ADC power consumption, but also the data rate of the transmission link. Beamforming is not carried out in the hardware front-end; instead, low-rate I/Q samples are transmitted to a computational back-end for processing. The hardware front-end used for analogue processing is identical to the quadrature SAB case, but the bandwidth of the lowpass filter should be set below the I/Q Nyquist frequency, as discussed in section 3.3.2. The FRI compressive SAB architecture is illustrated in figure 3.1(a).

2. **Quadrature SAB**. This architecture processes signals in the baseband in order to reduce bandwidth and memory requirements. This is effectively phase-error-free quadrature sampling [45], where I/Q components are obtained by mixing with a reference signal. Direct sampled I/Q beamforming is a similar method which employs second-order sampling to obtain I/Q components directly from RF signals [47]; digital focusing is implemented via phase rotation of the I/Q data. In the proposed architecture, I/Q components are derived sequentially using an analogue demodulator, and image reconstruction is carried out using SAB in the hardware front-end. The quadrature SAB architecture is illustrated in figure 3.1(b).

Both architectures are described in further detail below and are compared in chapter 6 in terms of hardware efficiency and image quality.

### 3.2 Architecture 1: Quadrature Synthetic Aperture Beamforming

A high level block diagram representing the quadrature SAB architecture is presented in figure 3.1(b). A bandpass amplitude modulated ultrasound signal may be represented as follows:

$$R(t) = A(t)\cos(\omega_c t + \phi) \tag{3.1}$$

where A(t) is the envelope,  $\omega_c$  the carrier frequency in radians per second, and  $\phi$  the phase [16]. Expansion of R(t) yields

$$R(t) = A_I(t)\cos(\omega_c t) - A_Q(t)\sin(\omega_c t) \tag{3.2}$$

where  $A_I(t) = A(t) \cos \phi$  and  $A_Q(t) = A(t) \sin \phi$  are the in-phase and quadrature components respectively. These may be obtained by mixing with a reference signal in the analogue domain and filtering the result. Since  $A_I(t)$  and  $A_Q(t)$  are baseband signals, they may be sampled at a lower rate, which reduces the data rate and computational burden on the digital processor. After sampling, the next step is to phase-rotate the I/Q data for focusing.
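The mixing step can be made explicit with a short worked derivation based on (3.2). The reference amplitude of 2 and the negative sign of the quadrature reference are assumptions chosen here so that the filtered outputs equal $A_I(t)$ and $A_Q(t)$ exactly; the scaling used by the actual demodulator is not specified at this point.

$$2R(t)\cos(\omega_c t) = A_I(t) + A_I(t)\cos(2\omega_c t) - A_Q(t)\sin(2\omega_c t)$$

$$-2R(t)\sin(\omega_c t) = A_Q(t) - A_Q(t)\cos(2\omega_c t) - A_I(t)\sin(2\omega_c t)$$

Both products contain a baseband term and terms centred at $2\omega_c$. A lowpass filter with a cutoff well below $2\omega_c$ removes the double-frequency terms, leaving $A_I(t)$ and $A_Q(t)$ respectively.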

According to the synthetic aperture focusing method, for a given pixel location  $\overrightarrow{r_p}$  at depth index k, the required time instance  $t_p(i, j)$  at which to take the signal value for summation is calculated by dividing the total propagation distance (from the transmitting element to the pixel and back to the receiving element) by the speed of sound in the medium [40]:

$$t_p(i,j) = \frac{|\overrightarrow{r_p} - \overrightarrow{r_e}(i)| + |\overrightarrow{r_p} - \overrightarrow{r_r}(j)|}{c} \tag{3.3}$$

where  $r_e(i)$  is the location of the  $i^{th}$  transmitting element and  $r_r(j)$  the location of the  $j^{th}$  receiving element. A corresponding discretised delay index  $I_p(i, j)$  may then be calculated. An interpolation factor, *K*, is applied to increase the delay resolution. If  $N_S$  sample points are obtained, then there are as many as  $K \times N_S$  index locations between 1 and  $I_p(i, j)_{max}$ . The index value is read from a lookup table that is calculated *a priori*, based on the locations of each pixel  $(\overrightarrow{r_p})$  and transmitting (*i*) or receiving (*j*) element. For each index location  $I_p(i, j)$ , the I or Q data are then interpolated on-the-fly using any standard technique such as linear or quadratic interpolation.
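The delay calculation in (3.3) and the subsequent interpolation can be sketched as follows. This is a minimal illustration under stated assumptions, not the lookup-table implementation used in hardware: the function names are hypothetical, indices are zero-based (rather than starting at 1), the index is obtained by rounding to the nearest point on the *K*-times finer grid, and linear interpolation is used.

```python
import math

def delay_index(pixel, tx_pos, rx_pos, f_s, interp_factor, c=1540.0):
    """Round-trip delay of (3.3) converted to an index on the fine grid."""
    t_p = (math.dist(pixel, tx_pos) + math.dist(pixel, rx_pos)) / c
    return round(t_p * f_s * interp_factor)

def linear_interp(samples, fine_index, interp_factor):
    """Linearly interpolate the coarse samples at fine-grid index I_p."""
    pos = fine_index / interp_factor       # position in units of coarse samples
    if pos < 0 or pos > len(samples) - 1:
        return 0.0                         # outside the recorded window
    lo = min(int(pos), len(samples) - 2)   # left neighbour on the coarse grid
    frac = pos - lo
    return (1.0 - frac) * samples[lo] + frac * samples[lo + 1]

# Hypothetical example: pixel 15.4 mm below a co-located transmit/receive
# element, 5 MHz baseband sampling and an interpolation factor K = 4.
idx = delay_index((0.0, 0.0154), (0.0, 0.0), (0.0, 0.0), f_s=5e6, interp_factor=4)
mid = linear_interp([0.0, 1.0, 2.0, 3.0], fine_index=6, interp_factor=4)
```

In this example the round trip takes 20 µs, which corresponds to 100 coarse samples or 400 fine-grid positions. In a hardware implementation the index would be read from the precomputed table rather than evaluated for every pixel.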

If the delay is applied directly to the I/Q data, frequency-dependent phase errors distort the final image [45]. Therefore, I/Q sample points are remodulated or upconverted back to RF by mixing the interpolated result with new discrete reference signals:

$$I_{ref}[n] = \cos\left[\omega_c n\right] \tag{3.4}$$

$$Q_{ref}[n] = \sin[\omega_c n] \tag{3.5}$$

where  $\omega_c$  is the carrier frequency and *n* is the discretised time index. Again,  $I_{ref}[n]$  and  $Q_{ref}[n]$  are calculated *a priori*. The interpolated *I* and *Q* values are multiplied by the reference signals at  $n = I_p$  and then summed to yield the RF amplitude:

$$R[n] = A_{I}[n] \cos [\omega_{c}n] - A_{Q}[n] \sin [\omega_{c}n] \tag{3.6}$$

$$= A[n] \cos \phi [n] \cos [\omega_{c}n] - A[n] \sin \phi [n] \sin [\omega_{c}n]$$

$$= A[n] \cos [\omega_{c}n + \phi[n]] \tag{3.7}$$

Figure 3.2: Receive synthetic aperture imaging protocol (adapted from [48]).

The resulting RF value R[n] is then accumulated at the pixel location, and the process is repeated for all i, j and n values, resulting in a low-resolution image. These low-resolution images are summed or averaged to obtain a higher resolution image, which may then be transmitted via a wireless transmission link to an external post-processor. The iterative process for a single transmission position, *i*, is illustrated in figure 3.2. The final focused signal  $y_f(\overrightarrow{r_p})$  expressed mathematically is:

$$y_f(\overrightarrow{r_p}) = \sum_{j=1}^N \sum_{i=1}^M a(I_p(i,j)) R(I_p(i,j)) \tag{3.8}$$

where  $a(I_p(i, j))$  is the apodisation (weighting) function,  $R(I_p(i, j))$  is the phase-shifted I/Q sum evaluated at  $I_p(i, j)$ , N is the number of transducer elements and M is the number of transmissions.

The proposed algorithm inherently lends itself to an iterative, pipelined approach that may easily be implemented in a hardware description language (HDL).

Calculations for parallel groups of pixels may be pipelined during the reflection period, and the only memory required is for the image frame (which is updated dynamically), a single delay matrix, an array of sine/cosine values and dynamic apodisation constants. In synthetic aperture beamforming, the image quality is dependent on the number of transmissions  $(i_{max})$  / size of the synthetic transmit aperture, and the number of receivers  $(j_{max})$  / size of the receive aperture. A larger value of  $i_{max}$  implies better spatial compounding and SNR. Similarly, the lateral resolution is a function of the size of the receive aperture, so increasing  $j_{max}$  improves the image quality. These tradeoffs are discussed in further detail in chapter 4, which addresses the implementation of the digital beamformer.
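The accumulation described by (3.6) to (3.8) can be sketched for a single pixel as follows. This is an illustrative software model only, not the pipelined HDL implementation: the function names and data layout are hypothetical, the sine and cosine references are evaluated directly instead of being read from a precomputed table, and linear interpolation is assumed.

```python
import math

def lerp(samples, fine_index, interp_factor):
    """Linear interpolation of coarse samples at a fine-grid index."""
    pos = fine_index / interp_factor
    if pos < 0 or pos > len(samples) - 1:
        return 0.0
    lo = min(int(pos), len(samples) - 2)
    frac = pos - lo
    return (1.0 - frac) * samples[lo] + frac * samples[lo + 1]

def focus_pixel(i_data, q_data, delay_idx, apod, f_c, f_s, interp_factor):
    """Evaluate (3.8) for one pixel.

    i_data[i][j], q_data[i][j]: baseband samples for transmit i, receive j.
    delay_idx[i][j]: fine-grid delay index I_p(i, j) for this pixel.
    apod[i][j]: apodisation weight applied to that contribution.
    """
    # Carrier phase advance per fine-grid step (radians).
    omega = 2.0 * math.pi * f_c / (f_s * interp_factor)
    total = 0.0
    for i, row in enumerate(delay_idx):
        for j, n in enumerate(row):
            a_i = lerp(i_data[i][j], n, interp_factor)
            a_q = lerp(q_data[i][j], n, interp_factor)
            # Remodulation of (3.6): I*cos - Q*sin at the delayed index n.
            rf = a_i * math.cos(omega * n) - a_q * math.sin(omega * n)
            total += apod[i][j] * rf
    return total

# Hypothetical check: constant envelope A = 1 with phase 0.3 rad.
phi = 0.3
i_data = [[[math.cos(phi)] * 4]]
q_data = [[[math.sin(phi)] * 4]]
value = focus_pixel(i_data, q_data, [[1]], [[1.0]],
                    f_c=1e6, f_s=4e6, interp_factor=1)
```

With these assumed values the carrier advances by $\pi/2$ per sample, so the single contribution equals $\cos(\pi/2 + 0.3)$, in agreement with (3.7).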

### 3.3 Architecture 2: FRI Compressive Synthetic Aperture Beamforming

In this architecture, RF signals are also demodulated into I/Q components. However, beamforming is not carried out in the front-end. Instead, compressive sensing is applied to reduce the bandwidth of the signal in the analogue domain prior to sampling. This leads to a significant reduction in data bandwidth and power consumption, as the power budget is dominated by the power consumption of the ADC and transmission link. By compressing the signal in the analogue domain, the computational burden is shifted to the digital back-end, which carries out reconstruction of the original I/Q signals and finally baseband beamforming. In this architecture, compressive sensing within the FRI framework is combined for the first time with synthetic aperture beamforming. The compressive sensing framework is discussed in further detail below, and simulation results are presented.

### 3.3.1 Introduction to Compressive Sensing within the FRI Framework

In classical sampling theory, the well-known Shannon-Nyquist theorem states that a bandlimited signal whose maximum frequency is  $f_{max}$  must be sampled at or above the Nyquist rate of  $2f_{max}$  for perfect reconstruction of the signal. Higher bandwidths therefore require higher Nyquist rates and expensive sampling hardware, even if the actual information content of the signal is low. Unfortunately, in many applications such as medical imaging, an excessive number of samples leads to a transmission link bottleneck and increases the computational load on the digital processor.

Compression techniques have been devised in an attempt to address problems associated with high-dimensional data. For example, *sparse approximation* forms the foundation of the transform coding schemes included in the JPEG, MPEG and MP3 standards. The process is typically lossy, meaning that the compressed signal quality is lower than the original.

Compressed sensing (CS) is a new framework leveraging the idea of transform coding. CS differs from conventional compression techniques in that it attempts to directly sense the data in a compressed form, rather than first sampling at a high rate and then compressing the data. The framework was originally proposed in 2006 by Candès, Romberg and Tao [49, 50, 51] and Donoho [52], who showed that signals having a sparse representation can be recovered using a small set of linear, non-adaptive measurements. In other words, compressive sensing aims at capturing only the essential information of the signal. The sparsity of the signal is exploited to recover it from fewer samples than required by the Shannon-Nyquist theorem.

Prior to this work on CS, Vetterli *et al.* developed sampling methods for certain classes of parametric signals [46]. Parametric signals with k parameters may be sampled and reconstructed using only 2k samples. These signals have what is termed a *finite rate of innovation* (FRI) and appear in many applications such as biomedical imaging and radar. The sampling scheme in [46] was applied to periodic and finite streams of FRI signals such as Dirac impulses, nonuniform splines, and piecewise polynomials. Sinc and Gaussian kernels are used to extract a set of Fourier coefficients which are then used to obtain an annihilating filter. The locations and amplitudes of the pulses are finally determined [53, 46].

More recently, Tur, Eldar and Friedman extended this work by providing a unified framework called *Xampling* for sampling and recovery of multi-band and FRI signals in noise-free and noisy settings [53, 54, 55]. The primary goal of Xampling is to enable the implementation of theoretical ideas in hardware so that they may be applied in real-world applications. Tur *et al.* were the first to demonstrate a simple, generalised implementation of the FRI sampling scheme in hardware for sub-Nyquist radar [55, 56]. They also applied the FRI framework in software to ultrasound [54, 57], enabling a substantial reduction in the sampling rate. Wagner *et al.* [57] took this work further and demonstrated that ultrasound beamforming may be carried out in the frequency domain using low-rate FRI samples. Both Wagner *et al.* [57] and Chernyakova *et al.* [58] demonstrate that frequency-beamformed low-rate signals can be reconstructed using compressed sensing (CS) techniques. Leveraging these findings, Spaulding *et al.* [59] proposed a mixer-based hardware architecture for sub-Nyquist subarray ultrasound beamforming. When applied to waveforms taken from a commercial ultrasound machine, this method reduces the total data rate by a factor of 54 with only minor degradation in image quality. While the architecture represents a significant step forward, it does trade off hardware complexity and speed against image quality. In conventional beamforming approaches, the image is acquired one scan line at a time. The architecture in [59] beamforms the signal for each subarray using a quadrature mixing scheme with digital, phase-varying I/Q mixing signals for each element. After sub-Nyquist sampling, the subarray signals are combined to form a scan line using the frequency-domain techniques in [58, 57]. The digital scan line is transmitted or stored before the I/Q phases are changed and the next scan line is obtained. The overall complexity of the system scales linearly with the number of elements (N), while the frame rate decreases with the square of N [40].

In [60], a different approach is proposed: combining compressed sensing with synthetic transmit aperture (STA) imaging. Key benefits include a significant reduction in system complexity and data acquisition time. However, as with all synthetic aperture architectures, image quality, data volume and susceptibility to motion artifacts are generally traded against system simplicity, speed and power consumption. In the system proposed in the present work, this tradeoff is entirely necessary in order to satisfy the bandwidth and power constraints, provided that the resultant image quality/frame rate is suitable for diagnostic purposes. The proposed architecture utilises a combination of FRI compressed sensing [46] and the synthetic aperture beamforming method, where only a single element transmits and receives at a time. A brief review of the sampling scheme in [46] is provided in section 3.3.2, and simulation results are presented in section 3.3.3.

### **3.3.2** Sampling Signals with Finite Rate of Innovation

By definition, a signal with a *finite rate of innovation*  $\rho$  is one that is characterised by a finite number of free parameters or degrees of freedom per unit time. For example, a series of pulses may be viewed as a parametric signal defined by the amplitudes and time delays of the pulses. If a signal has *K* amplitudes and *K* time delays, it has 2*K* degrees of freedom per period. Two examples of signals with parametric representations are [53]:

• Streams of Dirac impulses with amplitudes  $\{a_k\}_{k=0}^{K-1}$  and time locations  $\{t_k\}_{k=0}^{K-1}$:

$$y(t) = \sum_{k=0}^{K-1} a_k \delta(t - t_k) \tag{3.9}$$

• Streams of pulses with pulse shape p(t), amplitudes  $\{a_k\}_{k=0}^{K-1}$  and time locations  $\{t_k\}_{k=0}^{K-1}$:

$$x(t) = \sum_{k=0}^{K-1} a_k p(t - t_k) \tag{3.10}$$

As figure 3.3 illustrates, a reflected ultrasound signal comprises a set of wideband pulses which have a known pulse shape and a set of amplitudes and delays. Therefore, x(t) may be used to model an ultrasound signal. Since the pulse shape p(t) is known *a priori* to be Gaussian [16, 53], the only free parameters of the signal are the amplitude coefficients and time shifts. Note that  $t_k \in [0, \tau)$, where  $\tau$  is the total period of the reflection or A-mode beam. Since the signal has 2*K* degrees of freedom per  $\tau$, we would therefore expect the minimum number of samples to be 2*K*.

Figure 3.3: Transmitted and reflected ultrasound signals modeled as a series of Gaussian pulses (adapted from [54]).

The sample values are obtained by filtering the signal with a sampling kernel, such as a sinc, Gaussian or sum-of-sincs kernel [46, 53]. A sinc kernel may be defined in the time domain as follows:

$$h_B(t) = B\operatorname{sinc}(Bt) \tag{3.11}$$

where the bandwidth B = 1/T. Uniform sampling with a sampling interval T leads to samples given by:

$$y_n = \langle h_B(t - nT), x(t) \rangle, \ n = 0, ..., N - 1 \tag{3.12}$$

Given that  $c_k$  is the weight of each pulse in the signal  $y_n$ , this is equivalent to:

$$y_n = \sum_{k=0}^{K-1} c_k B \operatorname{sinc}\left(\frac{t_k}{T} - n\right) \tag{3.13}$$

$$= (-1)^{n} \sum_{k=0}^{K-1} \frac{c_{k} B \sin\left(\frac{\pi t_{k}}{T}\right)}{\pi\left(\frac{t_{k}}{T} - n\right)} \tag{3.14}$$

$$\iff (-1)^n y_n = \frac{1}{\pi} \sum_{k=0}^{K-1} c_k B \sin\left(\frac{\pi t_k}{T}\right) \frac{1}{\left(\frac{t_k}{T} - n\right)} \tag{3.15}$$

Since the signal has 2*K* degrees of freedom, we require  $N \ge 2K$  samples to sufficiently recover the signal. The reconstruction method requires two systems of linear equations: one for the locations of the Gaussian pulses involving a matrix *V*, and one for the weights of the pulses involving a matrix *A* [46]. Define a Lagrange polynomial  $L_k(u) = P(u)/(u - t_k/T)$  of degree K - 1, where  $P(u) = \prod_{k=0}^{K-1} (u - t_k/T)$ . Multiplying both sides of (3.15) by P(n) yields an expression in terms of the interpolating polynomials:

$$\underbrace{(-1)^{n+1}P(n)y_n}_{\mathbf{Y}_n} = \sum_{k=0}^{K-1} c_k \underbrace{B\sin\left(\frac{\pi t_k}{T}\right)\frac{L_k(n)}{\pi}}_{[\mathbf{A}]_{nk}} \tag{3.16}$$

$$\iff \mathbf{Y} = \mathbf{A} \cdot \mathbf{c} \tag{3.17}$$

To find the *K* locations  $t_k$ , we begin by deriving an annihilating equation (equivalent to the annihilating filter in [46]) to find the roots of P(u). Since the right hand side of (3.16) is a polynomial of degree K - 1 in the variable *n*, applying *K* finite differences makes the left hand side vanish, i.e.,  $\Delta^K ((-1)^n P(n) y_n) = 0, n = K, ..., N - 1$ . Letting  $P(u) = \sum_k p_k u^k$  leads to the annihilating filter equation:

$$\sum_{k=0}^{K} p_k \underbrace{\Delta^K \left( (-1)^n n^k y_n \right)}_{[\mathbf{V}]_{nk}} = 0 \tag{3.18}$$

$$\iff \mathbf{V} \cdot \mathbf{p} = 0 \tag{3.19}$$

where **V** is an  $(N-K) \times (K+1)$  matrix. The system has a solution when Rank(**V**)  $\leq K$  and  $N \geq 2K$ . Thus, (3.18) may be used to find the K + 1 unknowns  $p_k$ , which leads to the K locations  $t_k$ , as these are the roots of P(u). Once the locations have been determined, the weights of the Gaussian pulses  $c_k$  may be found by solving the system in (3.17) for n = 0, ..., K - 1. The system has a unique solution if Rank(**A**) = K, where **A**  $\in \mathbb{R}^{K \times K}$  is defined by (3.16). A more detailed discussion of annihilating filters is given in [46]. Theoretically, the result does not depend on the sampling period T. However, **V** may be poorly conditioned if T is not chosen appropriately. As the simulation results below show, oversampling yields an increase in the SNR of the reconstructed result. The sampling period is defined as  $T = \frac{\tau}{N}$ , where  $N = 2L \times F$ , *L* is the number of pulses (denoted *K* above) and *F* is the *oversampling factor* [54].
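The two-step recovery can be illustrated for the smallest non-trivial case, K = 2 Dirac pulses sampled with the ideal sinc kernel at the minimum rate (N = 2K = 4, no oversampling, no noise). The sketch below is an illustration under these assumptions only: all function names and numerical values are hypothetical, the null vector of **V** is obtained with a cross product (which only works because **V** has two rows and three columns when K = 2), and the weights are found by solving (3.13) directly for n = 0, 1 instead of forming the matrix **A** of (3.16). It is not the MATLAB code used for the simulations in section 3.3.3.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_samples(t_k, c_k, T, N):
    """Samples y_n of equation (3.13) with B = 1/T."""
    B = 1.0 / T
    return [sum(c * B * sinc(t / T - n) for t, c in zip(t_k, c_k))
            for n in range(N)]

def recover_two_pulses(y, T):
    """Recover K = 2 pulse locations and weights from N = 4 samples."""
    B = 1.0 / T
    # Rows of V from (3.18): second finite difference of (-1)^n n^k y_n.
    rows = []
    for n in (2, 3):
        row = []
        for k in range(3):
            f = [(-1) ** m * m ** k * y[m] for m in range(4)]
            row.append(f[n] - 2.0 * f[n - 1] + f[n - 2])
        rows.append(row)
    (a0, a1, a2), (b0, b1, b2) = rows
    # Two equations, three unknowns: the null vector p is the cross
    # product of the two rows (special case K = 2 only).
    p0 = a1 * b2 - a2 * b1
    p1 = a2 * b0 - a0 * b2
    p2 = a0 * b1 - a1 * b0
    # Roots of P(u) = p0 + p1 u + p2 u^2 give u_k = t_k / T.
    root = math.sqrt(max(p1 * p1 - 4.0 * p2 * p0, 0.0))
    u = sorted([(-p1 - root) / (2.0 * p2), (-p1 + root) / (2.0 * p2)])
    # Weights: solve y_n = sum_k c_k B sinc(u_k - n) for n = 0, 1 (Cramer).
    m00, m01 = B * sinc(u[0]), B * sinc(u[1])
    m10, m11 = B * sinc(u[0] - 1), B * sinc(u[1] - 1)
    det = m00 * m11 - m01 * m10
    c = [(y[0] * m11 - m01 * y[1]) / det, (m00 * y[1] - y[0] * m10) / det]
    return [v * T for v in u], c

# Hypothetical example: tau = 1, N = 4, so T = 0.25.
T = 0.25
y = sinc_samples([0.175, 0.6], [1.0, 0.5], T, N=4)
t_hat, c_hat = recover_two_pulses(y, T)
```

In this noise-free setting the four samples are sufficient to recover both locations and both weights to numerical precision. For larger K or noisy samples, a general null-space computation (for example via a singular value decomposition) and oversampling would be needed, since the powers $n^k$ in **V** quickly degrade its conditioning.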

It is also important to note that the sinc kernel described above has infinite time support and is non-causal. In the frequency domain, it is represented by an ideal lowpass filter with an infinitely steep roll-off. Practically, the sinc kernel may be approximated in hardware by means of a high-order analogue lowpass filter. Simulations below demonstrate the performance of multiple filter types and orders, and a comparison is made to other kernel types suggested in [54].

### **3.3.3 FRI Compressive Sensing Simulations**

MATLAB simulations are used to demonstrate the sampling scheme on finite streams of pulses which resemble A-line ultrasound signals. The sampling scheme is first demonstrated on ideal and noisy streams of Gaussian and Dirac pulses, and then on real ultrasound data. The MATLAB code used in these simulations is adapted from the code provided in [54, 61]. These simulations provide a basis for further hardware-level tests in chapter 6.

#### **Noiseless Case**

The first simulation uses a noiseless input signal x(t) comprising L = 5 delayed and weighted versions of a Gaussian pulse:

$$h(t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(\frac{-t^2}{2\sigma^2}\right) \tag{3.20}$$

where  $\sigma = 3 \times 10^{-3}$  and period  $\tau = 1$ . The time delays and amplitudes are allocated randomly to give the signal x(t) shown in figure 3.4a. x(t) is modulated at a carrier frequency of 1.7 MHz to mimic an ultrasound pulse. The signal is then passed through a second order lowpass filter. Choosing F = 4 with L = 5, the low-rate sampling frequency is  $f_s = \frac{N}{\tau} = \frac{2F \times L}{\tau} = \frac{2(4)(5)}{2.03 \times 10^{-4}} = 197 kHz$ , which implies that the bandwidth of the filter must be  $f_c = f_s/2 = 98.5 kHz$ . Note that the bandwidth is lower than that of the signal ( $f_b > 100 kHz$ ). The output of the second order filter is shown in figure 3.4b. Finally, the envelope is reconstructed using the low-rate samples. The results for both second and fourth order lowpass filters are plotted in figures 3.4c and 3.4d. Compare this to the reconstructed signal obtained after using an ideal sinc filter (figure 3.5a). When using the sinc filter, the reconstructed signal is exact to *numerical precision*. However, the cascaded lowpass filter introduces a time delay between the input and reconstructed pulses. This is due to the causal nature of the filter, and may be corrected digitally after reconstruction. Figure 3.5b shows the signal in 3.4d shifted by a fixed value of  $5.5 \mu s$  after digital reconstruction. This time shift value is calculated by subtracting the time location of the first pulse in the original signal from that of the reconstructed signal. The resultant signal closely matches the original pulse train. The time shift error and amplitude error may be reduced by increasing the order of the filter or by increasing the oversampling factor, *F*, as shown in the following section.



Figure 3.4: (a) Simulated noiseless stream of random Dirac impulses modulated at a carrier frequency of 1.7 MHz. (b) Output after filtering the signal with the 2<sup>nd</sup> order LPF and sampling the result (L = 5, F = 4). (c) Original versus reconstructed signal (2<sup>nd</sup> order LPF kernel). (d) Original versus reconstructed signal (4<sup>th</sup> order LPF kernel).



Figure 3.5: Original versus reconstructed signal when using (a) a sinc filter (b) a fourth order cascaded LPF. The output in (b) is shifted after reconstruction by  $5.5 \,\mu s$  to correct the time delay introduced by the filter.

#### **Noisy Case**

Gaussian noise with variance  $\sigma_n^2$  was added to the samples to test the performance of compressive sensing in non-ideal conditions. The SNR is defined as [54]:

$$SNR = \frac{\frac{1}{N} \|c\|_2^2}{\sigma_n^2} \tag{3.21}$$

where *c* denotes the clean samples. For each SNR value in a range of 5-35 dB, 400 experiments are carried out with unique noise vectors. Two test scenarios are considered where the test signals are comprised of a series of L = 4 and L = 20 Dirac pulses with an amplitude of unity. An example of a pulse train with L = 4 is shown in figure 3.6. The pulses are distributed uniformly in the time window  $[0, \tau)$ , where  $\tau = 1$ . The time and amplitude errors are defined as the average of  $||t - \hat{t}||_2^2$  and  $||a - \hat{a}||_2^2$ . Figure 3.7 shows the errors for various sampling kernels as a function of SNR. The comparison in [54] is extended to include a simple sinc



Figure 3.6: An example of a Dirac pulse train with L = 4 plotted against the reconstructed signal (without time shifting) when the oversampling factor is 8.

kernel as well as causal lowpass filters (cascaded first and second order filters, and a biquad filter with feedback). Since the causal filters introduce a time delay, the error is calculated *after* shifting to correct the delay. In this analysis, Gaussian and Spline kernels are not considered as they are unstable for L > 9 [54].
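The noise experiment described above can be sketched in a few lines. This is a minimal illustration rather than the simulation code used in this work: it assumes the SNR in (3.21) is converted to decibels as a power ratio ($10\log_{10}$), and the helper names and the test signal are hypothetical.

```python
import math
import random


def noise_variance_for_snr(clean, snr_db):
    """Return sigma_n^2 such that (1/N)*||c||_2^2 / sigma_n^2 equals the target SNR.

    Assumption: snr_db is a power ratio expressed in dB (10*log10).
    """
    n = len(clean)
    signal_power = sum(abs(c) ** 2 for c in clean) / n
    return signal_power / (10.0 ** (snr_db / 10.0))


def add_gaussian_noise(clean, snr_db, rng):
    """Add zero-mean white Gaussian noise to real-valued clean samples."""
    sigma = math.sqrt(noise_variance_for_snr(clean, snr_db))
    return [c + rng.gauss(0.0, sigma) for c in clean]


def squared_error(true_values, estimates):
    """Squared l2 distance, the quantity averaged for the time and amplitude errors."""
    return sum((t - e) ** 2 for t, e in zip(true_values, estimates))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    clean = [math.sin(0.3 * k) for k in range(41)]  # hypothetical clean samples
    for snr_db in range(5, 40, 5):
        noisy = add_gaussian_noise(clean, snr_db, rng)
        print(snr_db, squared_error(clean, noisy) / len(clean))
```

In a full experiment, the noisy samples would be passed to the reconstruction algorithm and `squared_error` would be averaged over the repeated trials for each SNR value.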

The simulated filter properties are listed below:

- Sinc filter:  $s(t) = \operatorname{sinc}(Bt)$ , where B = 1/T.
- SoS filter:  $s(t) = \operatorname{rect}\left(\frac{t}{\tau}\right) \sum_{k \in K} b_k e^{j2\pi kt/\tau}$ . In the frequency domain,  $S(\omega) = \frac{\tau}{\sqrt{2\pi}} \sum_{k \in K} b_k \operatorname{sinc}\left(\frac{\omega}{2\pi/\tau} - k\right)$ . The coefficients  $b_k$  are set to 1, and  $K = \{-L, ..., L\}$ .
- Cascaded lowpass filter:  $S(s) = \frac{k}{(1 + \frac{s}{\omega_c})^2}$ . The gain k = 1, and cutoff  $\omega_c = 2\pi \left(\frac{B}{2}\right) = \pi B$ .
- Biquad filter:  $S(s) = \frac{k\omega_o^2}{s^2 + \frac{\omega_o}{Q}s + \omega_o^2}$ , where  $\omega_o$  is set to give the -3dB cutoff  $\omega_c$ .
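The two causal transfer functions listed above can be evaluated numerically to compare their roll-off. The following is a minimal sketch; the function names are hypothetical, and the biquad quality factor $Q = 1/\sqrt{2}$ (a Butterworth response, for which $\omega_o$ coincides with the -3 dB cutoff) is an assumption made for illustration only.

```python
import math


def cascaded_lpf_gain(f_hz, f_c_hz, order=2, k=1.0):
    """|S(j*2*pi*f)| for identical cascaded first-order stages: k / (1 + s/w_c)^order."""
    s = 1j * 2.0 * math.pi * f_hz
    w_c = 2.0 * math.pi * f_c_hz
    return abs(k / (1.0 + s / w_c) ** order)


def biquad_gain(f_hz, f_o_hz, q=1.0 / math.sqrt(2.0), k=1.0):
    """|S(j*2*pi*f)| for k*w_o^2 / (s^2 + (w_o/Q)*s + w_o^2).

    Assumption: Q defaults to 1/sqrt(2), which is not stated in the text.
    """
    s = 1j * 2.0 * math.pi * f_hz
    w_o = 2.0 * math.pi * f_o_hz
    return abs(k * w_o ** 2 / (s ** 2 + (w_o / q) * s + w_o ** 2))


if __name__ == "__main__":
    f_c = 98.5e3  # cutoff used in the noiseless example
    for f in (0.0, 0.5 * f_c, f_c, 2.0 * f_c, 4.0 * f_c):
        print(f, cascaded_lpf_gain(f, f_c), biquad_gain(f, f_c))
```

Note that with two identical first-order stages the gain at $\omega_c$ is 0.5 (about -6 dB), whereas the biquad under the assumed $Q$ gives $1/\sqrt{2}$ (-3 dB) at the same frequency.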

Evidently, the simple sinc kernel is even more robust than the SoS kernel when the SNR is less than 33 dB. For SNR values of less than 20-25 dB, the response of the SoS kernel is unstable, whereas that of the other kernels is stable. Also note that the time and amplitude errors tend toward fixed values for the causal filters. In figure 3.7a, the time and amplitude estimation errors for the biquad with feedback and second order cascaded filter are nearly identical.



Figure 3.7: Performance of various sampling kernels in the presence of noise: (a) L = 4 Dirac pulses, (b) L = 20 Dirac pulses. In both cases, the oversampling factor is 4.

When the order of the cascaded filter is increased to n = 4, the error begins to decrease. The error settles at a fixed value which represents the minimum error estimate of the kernel.

On a hardware level, the cascaded topology is advantageous in that first order stages may be easily cascaded to form higher order filters. The error of higher order filters more closely matches the ideal 'sinc' response. However, each cascaded stage adds a time delay which results in a systematic time delay error that increases as a function of the filter order. This may be corrected in software after reconstruction, as demonstrated in the previous section.

### The Effect of Oversampling

The reconstruction accuracy may be improved at the expense of oversampling (and hence, increasing the hardware power consumption and data bandwidth). Increasing *F* implies that the number of samples is increased while *L* remains the same. Figure 3.8 shows how the time and amplitude estimation errors change for different oversampling factors over a range of SNR values. In this case, 500 experiments were carried out for each oversampling factor, and the number of pulses was set to L = 4 (see figure 3.6, which shows an example of the original pulse train versus the reconstructed signal when the oversampling factor is set to 8). Clearly, the time and amplitude estimation errors decrease as the oversampling factor increases. For SNR values above 40 dB, the time error decreases from  $6.4 \times 10^{-5} \tau$  (F = 4) to  $4.5 \times 10^{-8} \tau$  (F = 8) and the amplitude error from 0.208 (20.8%) to 0.04 (4%). One would expect the response of the 'non-ideal' LPF to match that of the ideal sinc filter as *F* increases. This is evident in figure 3.9, where the time and amplitude errors closely follow those of the sinc kernel until the SNR increases beyond a point.

The results of the experiment are summarised in table 3.1. Spline and Gaussian features are derived from [54]. The most significant advantage of the sinc and SoS kernels is their reconstruction accuracy. However, hardware implementation poses a challenge. In the case of a sinc kernel, exact hardware implementation is impossible as the sinc function is non-causal. A practical approximation of a sinc kernel is a high order Butterworth lowpass filter or a Chebyshev filter with a steep rolloff.



Figure 3.8: Time and amplitude estimation errors for oversampling factors of 1, 2, 4 and 8. In this case, a  $2^{nd}$  order LPF is used as the sampling kernel.



Figure 3.9: Time and amplitude errors for various sampling kernels when L = 4, F = 8. Note how the error for the 4<sup>th</sup> order cascade follows that of the sinc kernel for low SNR values.

#### **Reconstruction of Real Ultrasound Data**

Simulations were carried out using real ultrasound data recorded using GE Healthcare's Vivid-i portable ultrasound imaging system (data derived from [54]). The center frequency of the probe is  $f_c = 1.7021 MHz$ , the width of the transmitted Gaussian pulse is  $\sigma = 3 \times 10^{-7}$  and the depth of imaging is R = 0.16 m, corresponding to a time window of  $\tau = 2.08 \times 10^{-4} s$ . The original ultrasound signal is shown in figure 3.10. The signal was demodulated into "quasi-I/Q" components, with the bandwidth of the lowpass filter limited below the I/Q Nyquist frequency. The original I/Q components were reconstructed using the algorithm described above. The reconstructed I channel signal is presented in figure 3.11, demonstrating the reconstruction accuracy for different filter types.

| Feature | Spline Filter | Gaussian Filter | Sinc Filter | SoS Filter | Cascaded LPF | Biquad |
|---|---|---|---|---|---|---|
| Degrees of Freedom | 2L | 2L | 2L | 2L | 2L | 2L |
| No. of samples | 2L | 2L | 2L+1 | 2L | 2L+1 | 2L+1 |
| Noise robustness | Low | Low | High | High | Very high | Very high |
| Stability (F = 4) | Unstable for L > 5 | Unstable for L > 6 | Stable for high L. Unstable for SNR < 7 dB | Stable for high L. Unstable for SNR < 19 dB | Stable for high L and low SNR. | Stable for high L and low SNR. |
| Hardware Implementation | NA | NA | NA | Multi-channel approximation in [53, 56, 54, 62, 55] | Single-channel LPF approximation | Single-channel LPF approximation with feedback |
| Advantages | - | - | Lowest estimation error for SNR < 30 dB. | Lowest estimation error for SNR > 30 dB. | Single channel. Stable for all SNRs. Very simple hardware implementation and low power consumption. | Single channel. Stable for all SNRs. Simple hardware implementation and low power consumption. Q factor may be tuned. |
| Disadvantages | Hardware implementation not feasible | Hardware implementation not feasible | Hardware implementation not feasible | Multi-channel approach increases power consumption and overall sampling rate (2L × no. of channels). | Lower accuracy than using SoS/sinc. Requires oversampling to achieve reasonable reconstruction accuracy. | Lower accuracy than using SoS/sinc. Requires oversampling to achieve reasonable reconstruction accuracy. |

Table 3.1: Comparison of different sampling kernels.




Figure 3.10: Original ultrasound signal used for testing the FRI CS sampling and reconstruction method.

The minimum number of samples per time window  $\tau$  is N = 2L [46], where L is the number of Gaussian pulses per period. For this simulation, L = 20 and F = 4, but larger values may be used for greater resolution/accuracy. Therefore, the low-rate sampling frequency is  $f_s = \frac{N}{\tau} = \frac{2FL + 1}{\tau} = \frac{2(4)(20) + 1}{2.08 \times 10^{-4}} = 774 \, kHz$ , and the bandwidth of the filter is  $f_c = f_s/2 = 387 \, kHz$ . Considering both I and Q channels, the sampling rate is reduced by a factor of 13 from the original  $20 \, MHz$ .

Compressive sensing only presents a significant advantage over quadrature sampling if the sampling rate is lower. For example, if the baseband pulse stream has a bandwidth of 1 MHz, the I/Q sampling rate should be at least  $f_D = 2 MHz$ . In general, the following condition is required:

$$F.L < \frac{1}{2} \left( f_D \tau - 1 \right) \tag{3.22}$$



Figure 3.11: Reconstructed I component versus original/ideal I component, demonstrating the accuracy of the FRI CS reconstruction algorithm (with K = 20 and F = 4) on real ultrasound data for various sampling kernels: (a) sum-of-sincs kernel (b) ideal sinc kernel (c) a cascaded second order LPF (d) biquad filter.

In the example above, the necessary condition is:

$$F.L < 207.5 \tag{3.23}$$

*L* and *F* must then be tuned to maximise performance while satisfying condition (3.23). When L = 20, F = 4, the condition is satisfied with  $f_s = 774 kHz < f_D$ . However, when *L* is increased beyond 51 (with F = 4) or beyond 25 (with F = 8), then  $f_s > f_D$ . At this point, it would be more advantageous to use conventional demodulation alone.
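The arithmetic behind condition (3.22) can be checked with a short sketch. It assumes $N = 2FL + 1$ samples per window, which is the count implied by (3.22) and which reproduces the quoted rates of 197 kHz and 774 kHz; the function names are hypothetical.

```python
def fri_sampling_rate(num_pulses, oversampling, tau):
    """Low-rate sampling frequency, assuming N = 2*F*L + 1 samples per window tau."""
    return (2 * oversampling * num_pulses + 1) / tau


def max_pulses(oversampling, f_d, tau):
    """Largest integer L satisfying F*L < (f_D*tau - 1)/2, i.e. condition (3.22)."""
    bound = 0.5 * (f_d * tau - 1.0)
    l = int(bound // oversampling)
    # The inequality is strict, so step back if F*L lands exactly on the bound.
    if l * oversampling >= bound:
        l -= 1
    return l


if __name__ == "__main__":
    tau = 2.08e-4   # time window in seconds
    f_d = 2.0e6     # I/Q sampling rate for a 1 MHz baseband bandwidth
    f_s = fri_sampling_rate(20, 4, tau)
    print("f_s [kHz]:", f_s / 1e3)
    print("reduction versus 20 MHz (I and Q):", 20e6 / (2.0 * f_s))
    print("largest L for F = 4:", max_pulses(4, f_d, tau))
    print("largest L for F = 8:", max_pulses(8, f_d, tau))
```

Under these assumptions the largest admissible values are L = 51 for F = 4 and L = 25 for F = 8, in agreement with the limits stated above.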

The above discussion highlights a fundamental tradeoff between the reconstruction accuracy and sampling rate. To achieve higher accuracy, both L and F must be large, but this results in an increased sampling rate and power consumption.

# 3.4 Summary

Two ultrasound imaging architectures are introduced in this chapter: quadrature synthetic aperture beamforming, and compressive synthetic aperture beamforming. In the former approach, signals are demodulated and synthetic aperture beamforming is carried out in the baseband. This effectively reduces the memory and logic capacity requirements of the beamformer. The following chapter presents the digital design of the beamformer and its implementation on FPGA/ASIC. The latter architecture employs compressed sensing within the finite rate of innovation (FRI) framework to reduce the signal/data bandwidth further. Extensive simulation results are provided, highlighting the effect of various parameters and types of sampling kernel on the performance of the FRI signal reconstruction algorithm and on image quality.

# **Chapter 4**

# Digital Beamforming Implementation

This chapter details the design and hardware implementation of the digital quadrature SAB algorithm proposed in chapter 3. As discussed, the architecture targets highly miniaturised applications, and thus the digital beamforming algorithm is optimised to minimise the hardware complexity, area and power consumption while maintaining sufficient image quality. Synthetic aperture algorithms typically require large memory capacities as RF data are collected from the entire aperture prior to beamforming. However, in small-scale systems, memory capacity is a significant constraint. Thus, careful attention was given to memory constraints, and calculations are pipelined and serialised through multiple combinational blocks in order to achieve real-time operation. This approach inherently lends itself to implementation in a hardware description language (HDL) on either an FPGA or ASIC.

This chapter begins with a detailed description of the digital SAB algorithm in section 4.1. Tradeoffs relating to area, power, frame rate and image quality are then



Figure 4.1: Finite state machine block diagram for the digital beamforming algorithm.

discussed in section 4.2 in the context of multiple applications (mobile devices, capsule endoscopes and wearables). Lastly, FPGA and ASIC implementations are presented in sections 4.3 and 4.4 respectively.

# 4.1 Digital Beamforming Algorithm

As illustrated in figure 4.1, the beamforming algorithm functions as a finite state machine that iterates through three states: READ, CALCULATE and WRITE. A master process controls the program state, the timing of the reflection period, and iteration of the transmit and receive element index variables i and j.

### State 1: Read

After initialisation, the first transmission is carried out and the program enters the first state (READ) where I/Q signals are sampled and read into distributed memory (not BRAM) to allow for an enhanced access speed and parallel memory operations.

## **State 2: Calculate**

At the end of the reflection period, the program enters the second state (CALCULATE). Calculations on parallel groups of pixels are pipelined over multiple clock cycles. The number of parallel operations that may be carried out depends on the logic capacity of the device, and this in turn determines the maximum frame rate and image size.

**Time delay calculation** As discussed in chapter 3, the required time instance  $t_p(i, j)$  to take the signal value for summation is calculated by dividing the distance



Figure 4.2: Receive synthetic aperture imaging protocol (adapted from [48]).

by the speed of sound in the medium [40].

$$t_p = \frac{|\overrightarrow{r_p} - \overrightarrow{r_e}(i)| + |\overrightarrow{r_p} - \overrightarrow{r_r}(j)|}{c} \tag{4.1}$$

The time delay is thus a function of the geometric distance from the transmit element (index *i*), to the imaging point,  $r_p$ , and back to the receive element (index *j*), as shown in figure 4.2. The discretised time delay may therefore be expressed in terms of the element indices and inter-element spacing  $x_d$  and spacing  $z_d$  in the z-direction:

$$t_p = \frac{1}{c} \left[ \sqrt{x_d^2 \left( i - k \right)^2 + z_d^2 l^2} + \sqrt{x_d^2 \left( j - k \right)^2 + z_d^2 l^2} \right] \tag{4.2}$$

$$= \frac{z_d}{c} \left[ \sqrt{a(i-k)^2 + l^2} + \sqrt{a(j-k)^2 + l^2} \right] \tag{4.3}$$

where  $a = (x_d/z_d)^2$ . The maximum value of *l* must be an integer multiple  $(n_p)$  of the number of pixels in the z-dimension of the final image. In order to improve the image quality, linear interpolation is applied over a finer time scale. If the I/Q sampling frequency is  $f_s$ , the interpolation time step is  $1/(n_p f_s)$  (where  $n_p$  is the interpolation factor). The values in the look-up table are found by first converting the delay  $t_p$  to an index  $I_p$  which is rounded to the nearest value on the interpolated scale:

$$I_p = \operatorname{round} \left\{ n_p f_s t_p \right\} \tag{4.4}$$

$$= \operatorname{round}\left\{\frac{n_{p}f_{s}D}{cl_{max}}\left[\sqrt{a(i-k)^{2}+l^{2}}+\sqrt{a(j-k)^{2}+l^{2}}\right]\right\} \tag{4.5}$$

Implementing square root operations in HDL translates to a large number of digital gates. These operations may be numerically approximated or calculated *a priori* and read from a look-up table. The latter approach was taken in this work. The lookup table is a 2D matrix containing discrete delay indices  $I_p$ . The table is read twice using the variables *l* and *m*, where *l* is the depth index and m = |i-k| for the time of flight (TOF) from the transmitter to  $r_p$ , and m = |j-k| for the TOF from  $r_p$  to the receiving element. Thus, two clock cycles are required for one complete delay calculation.
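The look-up table approach may be illustrated with a minimal software sketch. It is not the HDL used in this work: the function names are hypothetical, and it assumes that each table entry stores the rounded one-way delay index for a (depth, lateral offset) pair, so that the two table reads are simply added. Rounding each leg separately can differ by one index from rounding the sum as written in (4.5).

```python
import math


def build_delay_lut(l_max, m_max, a, scale):
    """Build lut[l][m] = round(scale * sqrt(a*m^2 + l^2)), one leg of the time of flight.

    `scale` stands for n_p * f_s * D / (c * l_max), the number of interpolated
    sample steps per depth step. All argument names are illustrative.
    """
    return [
        [math.floor(scale * math.sqrt(a * m * m + l * l) + 0.5) for m in range(m_max + 1)]
        for l in range(l_max + 1)
    ]


def delay_index(lut, i, j, k, l):
    """Two table reads: transmit leg (element i) plus receive leg (element j)."""
    return lut[l][abs(i - k)] + lut[l][abs(j - k)]


if __name__ == "__main__":
    # Hypothetical geometry, chosen only to exercise the functions.
    n_p, f_s, depth, c, l_max = 4, 5.0e6, 0.10, 1540.0, 352
    x_d = 0.3e-3
    z_d = depth / l_max
    lut = build_delay_lut(l_max, 63, (x_d / z_d) ** 2, n_p * f_s * z_d / c)
    print(delay_index(lut, i=10, j=20, k=15, l=176))
```

Because the table depends only on the depth index and the absolute lateral offset, its size grows with $l_{max} \times j_{max}$ rather than with the number of transmit/receive pairs.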

**Interpolation and Remodulation** After the delay index is calculated, the I/Q magnitude values must be interpolated at the index. Numerous methods may be used, including linear, spline or quadratic interpolation. In this work, linear interpolation was applied as follows. First, the I/Q samples  $S_1$  and  $S_2$  below and above the delay index are found (i.e. the samples at indices floor  $\{I_p/n_p\}$  and floor  $\{I_p/n_p\}+1$ ). Next, the gradient  $a_{int}$  of the straight line between  $S_1$  and  $S_2$  is calculated. The interpolated value is then  $I_{int} = S_1 + a_{int}I_r$  (similarly for Q), where  $I_r = I_p - n_p$ .floor  $\{I_p/n_p\}$  is defined as the rotation index.

If these delayed and interpolated I/Q values are added to the image sum directly, blurring of the final image results due to phase-induced errors [45]. To correct this, I/Q data samples are remodulated by multiplying by discrete sine and cosine carrier reference signals. Now, with  $n_p$  chosen conveniently to be 4, the interpolated frequency is  $4f_s$ . Thus, there are four possible sine/cosine reference values per I/Q sampling period. Assuming an initial phase of zero for each reflection, these values are conveniently  $\{0, 1, 0, -1\}$ . Thus, upconversion is carried out by simply multiplying the interpolated value by the carrier reference at the rotation index. The reconstructed RF signal is therefore:

$$R[n] = A_I[n] \cos[\omega_c n] - A_Q[n] \sin[\omega_c n] \tag{4.6}$$

It should be emphasised again that the initial phase of each reflected signal must be exactly zero to ensure correlation between reference carriers and thus prevent blurring of the image. Therefore, it is crucial to properly align the phase of reflected signals. This may be done using a precisely controlled sampling protocol. Secondly, note that  $n_p$  may be increased to create a finer interpolation scale. This results in better image quality, but requires a larger memory capacity.
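The interpolation and remodulation steps can be summarised in a short sketch for $n_p = 4$. The function names are hypothetical, and the reference sequences are an assumption: the sine reference is taken as $\{0, 1, 0, -1\}$ as stated above, with the matching cosine reference $\{1, 0, -1, 0\}$, both indexed by the rotation index.

```python
N_P = 4                      # interpolation factor n_p
COS_REF = (1, 0, -1, 0)      # assumed cosine carrier reference at rotation index 0..3
SIN_REF = (0, 1, 0, -1)      # sine carrier reference at rotation index 0..3


def interpolate(samples, delay_index, n_p=N_P):
    """Linearly interpolate `samples` at position delay_index / n_p.

    Returns the interpolated value and the rotation index I_r. The caller must
    ensure that floor(delay_index / n_p) + 1 is a valid sample index.
    """
    base = delay_index // n_p            # floor{I_p / n_p}
    rot = delay_index - n_p * base       # rotation index I_r
    s1, s2 = samples[base], samples[base + 1]
    gradient = (s2 - s1) / n_p           # a_int, slope per interpolated step
    return s1 + gradient * rot, rot


def remodulate(i_samples, q_samples, delay_index):
    """Return R = I*cos - Q*sin at the rotation index, following (4.6)."""
    i_int, rot = interpolate(i_samples, delay_index)
    q_int, _ = interpolate(q_samples, delay_index)
    return i_int * COS_REF[rot] - q_int * SIN_REF[rot]


if __name__ == "__main__":
    i_samples = [0.0, 4.0, 8.0]   # hypothetical I samples
    q_samples = [8.0, 4.0, 0.0]   # hypothetical Q samples
    for idx in range(8):
        print(idx, remodulate(i_samples, q_samples, idx))
```

Since the reference values are restricted to 0 and ±1, the multiplications reduce to selecting, negating or discarding the interpolated I or Q value, which is what makes $n_p = 4$ convenient in hardware.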

**Dynamic Apodisation** Dynamic apodisation is used to maintain a constant F-number (f#) over the imaging depth. The F-number is defined as the ratio of the imaging depth, z, to the aperture size,  $\alpha$  [63]. The synthetic aperture is dynamically grown as a function of the imaging depth in order to keep the f# constant. The number of lines l to consider in a window for focusing to a depth z is calculated using the following expression [63]:

$$l = \frac{z_k}{(f\#) \, \Delta x} \tag{4.7}$$

where  $z_k$  is the pixel depth and  $\Delta x$  is the inter-element spacing. This equation is used to derive a set of *a priori* constants that are stored in distributed memory to allow for real-time dynamic apodisation of the receive aperture. The simplest case is where the apodisation constant is a 1-bit value - i.e. 0 or 1. This effectively defines a rectangular window over the aperture that is a function of depth, as illustrated



Figure 4.3: Illustration of a binary dynamic apodisation window, where grey pixels hold a value of 1, and other pixels hold a value of 0. The width of the window is a function of  $z_k$ , and defines whether upconverted RF values are added or not added to the image sum at each pixel location.

in figure 4.3. Practically, the apodisation process described above is carried out by simply summing (1) or not summing (0) the upconverted value at each pixel at depth  $z_k$ , as explained below.
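A binary apodisation window of this kind may be sketched as follows. The function names are hypothetical, and the sketch assumes that the window of $l$ lines from (4.7) is centred on the image line $k$, which is one plausible reading of figure 4.3 rather than a statement of the implemented rule.

```python
def aperture_lines(z_k, f_number, dx):
    """Number of lines l = z_k / (f# * dx) in the window at depth z_k, equation (4.7)."""
    return z_k / (f_number * dx)


def apodisation_bit(j, k, z_k, f_number, dx):
    """Return 1 if receive element j lies inside the window around line k, else 0.

    Assumption: the window is symmetric about k with half-width l / 2.
    """
    half_width = aperture_lines(z_k, f_number, dx) / 2.0
    return 1 if abs(j - k) <= half_width else 0


if __name__ == "__main__":
    dx, f_number = 0.3e-3, 2.0   # hypothetical element pitch and F-number
    for z_k in (0.003, 0.006, 0.012):
        row = "".join(str(apodisation_bit(j, 32, z_k, f_number, dx)) for j in range(64))
        print(f"{z_k:.3f} {row}")
```

Printing one row per depth shows the window widening with depth, which is the behaviour that keeps the F-number constant.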

**Summation** Finally, after the final pixel value is calculated, it is added to a global 2D image array stored in simple dual port BRAM. The memory size depends on the size of the image - in this case,  $352 \times 96 = 33792$  elements. Since simple dual port BRAM is addressable only one register at a time, the number of parallel memory operations may be increased by distributing the required memory across multiple BRAM modules. On the FPGA, 16 BRAM modules are used, allowing for 16 parallel pixel calculations to be carried out simultaneously. This increases the frame rate, but also the power consumption and core utilisation of the FPGA, or the silicon area if implemented on an ASIC. These parametric tradeoffs are discussed further in section 4.2.

After summation, the program enters the read state again and the master process increments the transmit and receive element indices i and j. Each iteration incrementally improves the SNR of the final image through the process of linear superposition. The final image sum is expressed in equation (3.8) in chapter 3.

# State 3: Write

At the end of one iterative cycle, a high resolution frame is formed, and the algorithm enters a WRITE state, where the image is transmitted to the back-end for postprocessing and displaying. A universal asynchronous receiver/transmitter (UART) is used to transmit each pixel value to the receiver by means of a standard RS-232 adapter on PCB. The transmission operation involves first reading each 14-bit pixel value from BRAM, and transmitting the value in two 7-bit packets with a start bit (0) and two stop bits (1). Pixels are transmitted in a predefined order according to pixel location and read into Matlab, where final post-processing operations are carried out.
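The packet format described above can be modelled in software as follows. This is a sketch under stated assumptions: the upper seven bits are sent first, data bits are sent least significant bit first (the usual UART convention), and the pixel is treated as a raw 14-bit pattern. None of these details are specified in the text, and the function names are hypothetical.

```python
def frame_7bit(value):
    """Frame a 7-bit value: start bit 0, seven data bits (LSB first), two stop bits 1."""
    if not 0 <= value < 128:
        raise ValueError("value must fit in 7 bits")
    data_bits = [(value >> b) & 1 for b in range(7)]
    return [0] + data_bits + [1, 1]


def pixel_to_frames(pixel):
    """Split a 14-bit pixel into two framed 7-bit packets (upper half first, assumed)."""
    if not 0 <= pixel < (1 << 14):
        raise ValueError("pixel must fit in 14 bits")
    return [frame_7bit(pixel >> 7), frame_7bit(pixel & 0x7F)]


def frames_to_pixel(frames):
    """Inverse of pixel_to_frames, as the receiving side might reassemble a pixel."""
    halves = []
    for frame in frames:
        if frame[0] != 0 or frame[8:] != [1, 1]:
            raise ValueError("bad start or stop bits")
        halves.append(sum(bit << b for b, bit in enumerate(frame[1:8])))
    return (halves[0] << 7) | halves[1]


if __name__ == "__main__":
    frames = pixel_to_frames(10837)
    print(frames)
    print(frames_to_pixel(frames))
```

With this framing each pixel occupies 20 bits on the line (two packets of ten bits), so the serial link rate sets an upper bound on how quickly a complete frame can be transferred.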

# **Image Post-Processing and Displaying**

Since the interpolated I/Q signals are remodulated back to RF prior to summation, envelope detection must be carried out on the final image. Envelope detection is carried out in Matlab using the Hilbert transform. Display parameters are then calculated and the image is logarithmically compressed and displayed as a B-mode image.

# 4.2 Digital Design Tradeoffs

In chapter 3, tradeoffs and constraints affecting the architecture of the design are discussed. These tradeoffs include silicon area / FPGA core utilisation, power consumption, frame rate, image quality/size, transmission line bandwidth and system complexity/cost. In this section, these tradeoffs are considered within the context of the design of the digital beamforming algorithm.

In synthetic aperture beamforming, the image quality is dependent on the number of transmissions  $(i_{max})$  / size of the synthetic transmit aperture, and the number of receivers  $(j_{max})$  / size of the receive aperture. A larger value of  $i_{max}$  implies better spatial compounding and SNR. Similarly, the lateral resolution is a function of the size of the receive aperture, so increasing  $j_{max}$  improves the image quality.

However, increasing  $i_{max}/j_{max}$  leads to an increase in data acquisition time and therefore a reduction in the *maximum* frame rate (*FR*<sub>max</sub>), which is a function of the time of flight  $t_f = 2D/c$ :

$$FR_{max} = \frac{N_a c}{2Di_{max}j_{max}} \tag{4.8}$$

The maximum frame rate is linearly proportional to the number of parallel receiver channels  $N_a$ .  $FR_{max}$  effectively defines the boundary of a region of operation for various values of  $N_a$ . For example, in figure 4.4, three regions of operation are defined for  $N_a = 1$ ,  $N_a = 2$  and  $N_a = 8$ . As a practical example, with a single channel ( $N_a = 1$ ), and D = 10 cm, c = 1540 m/s,  $i_{max} = 30$  and  $j_{max} = 64$ , the maximum frame rate is 4 Hz, which is acceptable for capsule endoscopy but not for a portable scanner. In this case, either the image size/quality may be reduced, or more channels must be used at the expense of increased power consumption.
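The practical example above follows directly from (4.8), as the following minimal sketch shows. The function name is hypothetical; the parameter values are those quoted in the text.

```python
def max_frame_rate(n_a, depth_m, i_max, j_max, c=1540.0):
    """FR_max = N_a * c / (2 * D * i_max * j_max), equation (4.8)."""
    return n_a * c / (2.0 * depth_m * i_max * j_max)


if __name__ == "__main__":
    for n_a in (1, 2, 8):
        print(n_a, max_frame_rate(n_a, depth_m=0.10, i_max=30, j_max=64))
```

For $N_a = 1$ the result is approximately 4 Hz, and it scales linearly with the number of parallel receiver channels.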

Within the regions of operation discussed above, the frame rate is a function of other digital design parameters such as clock frequency,  $f_{clk}$ , and the degree of parallelisation (i.e. the number of parallel delay calculations per clock edge,  $N_p$ ). In order to increase the frame rate up to the maximum in (4.8),  $f_{clk}$  and/or  $N_p$  must be increased at the expense of power and/or area (or logic utilisation). In full dynamic receive beamforming, this relationship is expressed in the following equation:

$$FR = \frac{N_p f_{clk}}{2i_{max} j_{max}^2 z_{max}} \tag{4.9}$$

where  $z_{max}$  is the number of pixels in the axial dimension of the image. A factor of two in the denominator is introduced to account for serialising send and receive operations in hardware over two clock cycles. The multidimensional tradeoff inherent in (4.9) must be carefully balanced to maximise the frame rate/image quality/image size and minimise the operating frequency/area/power consumption. The relationship in (4.9) is illustrated in figure 4.4, where frame rate is plotted against the number of transmit positions,  $i_{max}$ , for various clock frequencies and  $j_{max} = 64$  channels,  $z_{max} = 350$ ,  $N_a = 1$  and  $N_p = 8$ . Figure 4.4 also demonstrates the relationship between the clock frequency and  $i_{max}$  for a constant frame rate of 5 Hz. The clock frequency may be increased at the expense of power up to the maximum operating frequency of the digital circuit.
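Equation (4.9) can be rearranged to give the clock frequency needed for a target frame rate, which is the relationship plotted on the right axis of figure 4.4. The sketch below uses the figure parameters ($j_{max} = 64$, $z_{max} = 350$, $N_p = 8$); the function names are hypothetical and the choice of $i_{max} = 30$ is only an illustrative operating point.

```python
def frame_rate(n_p, f_clk, i_max, j_max, z_max):
    """FR = N_p * f_clk / (2 * i_max * j_max^2 * z_max), equation (4.9)."""
    return n_p * f_clk / (2.0 * i_max * j_max ** 2 * z_max)


def clock_for_frame_rate(fr, n_p, i_max, j_max, z_max):
    """Equation (4.9) solved for f_clk at a given frame rate."""
    return fr * 2.0 * i_max * j_max ** 2 * z_max / n_p


if __name__ == "__main__":
    f_clk = clock_for_frame_rate(5.0, n_p=8, i_max=30, j_max=64, z_max=350)
    print("f_clk [MHz]:", f_clk / 1e6)
    print("check FR [Hz]:", frame_rate(8, f_clk, 30, 64, 350))
```

Under these assumptions a 5 Hz frame rate with 30 transmit positions requires a clock of roughly 54 MHz, and the required clock grows linearly with $i_{max}$.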



Figure 4.4: Frame rate (left axis) and clock frequency (right axis) vs. the number of transmit positions  $i_{max}$ . The number of parallel analogue channels,  $N_a$ , defines the region of operation.

# 4.3 FPGA Implementation

The beamforming algorithm was implemented in Verilog and synthesized in Xilinx ISE® Design Suite with a Spartan-6LX® FPGA (XC6SLX150-3FGG484I). The device utilisation summary is provided in table 4.1 for the following parameters:  $f_{clk} = 20MHz$ , frame rate = 7Hz,  $N_p = 16$ ,  $i_{max} = 48$ ,  $j_{max} = 64$ ,  $k_{max} = 352$  (i.e. pixel resolution of  $64 \times 352$ ). On-chip BRAM was used to store the image and beamforming/apodisation parameters.

| Logic Utilisation              | Units | Device Utilisation |
|--------------------------------|-------|--------------------|
| Slice Registers                | 7576  | 4%                 |
| Slice Look-up Tables (LUTs)    | 28017 | 30%                |
| LUT-FF pairs                   | 1926  | 3%                 |
| Block RAM/FIFO                 | 32    | 11%                |
| DSP48A1s                       | 4     | 0.1%               |
| Global Buffers (BUFG/BUFGCTRL) | 1     | 25%                |

Table 4.1: Device utilisation summary on a Spartan-6 FPGA for  $N_p = 16$ , frame rate = 7 Hz, pixel resolution =  $64 \times 352$  and  $i_{max} = 48$  angles.

The on-chip power consumption is proportional to the system clock frequency  $(f_{clk})$ . For  $f_{clk} = 20MHz$ , the power consumption is estimated to be  $296\,mW$  (static power  $172\,mW$ , dynamic power  $124\,mW$ ) by the Xilinx power estimator. This works out to be an equivalent power consumption of  $4.6\,mW$ /channel across the entire synthetic aperture (64 elements). Doubling the clock frequency allows for better spatial compounding (larger  $i_{max}$ ), as well as a higher frame rate or pixel resolution, at the expense of doubled power consumption and half the battery life.

# 4.4 ASIC Implementation

The Verilog algorithm was also synthesized in Cadence® Encounter using AMS 0.18 $\mu m$  CMOS technology. In this case, we constrain the area and power for a "worst-case" application such as capsule endoscopy. First, we assume that off-chip SRAM will be used to store the image. The following parameters are chosen:  $f_{clk} = 24MHz$ , frame rate = 4Hz,  $N_p = 1$ ,  $i_{max} = 8$ ,  $j_{max} = 32$ ,  $k_{max} = 352$  (i.e. pixel resolution of  $32 \times 352$ ).

### 4.4.1 Synthesis

The first step in the digital design flow was synthesis, which was carried out using the following steps:

- 1. **Constraints entry and checking**. First, the clock is constrained and modelled at  $f_{clk} = 24 MHz$ , and input/output constraints are applied. These timing constraints are initially checked and validated using the Conformal Constraint Designer.
- 2. **Mapping and optimisation**. The first step in the optimisation process is to apply generic optimisation (syn\_generic). This performs redundancy removal, removal of unloaded logic and various datapath optimisations. Next, technology mapping and optimisation is carried out (syn\_map), which maps the design using cells from the standard cell library and optimises the logic to meet timing constraints. Lastly, incremental optimisation is applied (syn\_opt), which optimises each block and performs area enhancement/restructuring.
- 3. **Power Optimisation**. This step was not completed during synthesis of the ultrasound ASIC, but is included here for completeness. Power may be optimised with the addition of clock gating and operand isolation, or by using multi- $V_t$  threshold libraries. Clock gating saves power by "pruning" the clock tree or reducing the switching activity on the clock network.
- 4. **Physical Synthesis**. The design is synthesized into a physical layout using standard cells. The design is then optimised again using syn\_opt to meet timing constraints. Typically, this involves buffering, but can also include complex re-structuring or logic resynthesis. After synthesis, the placement is transferred to the place and route tool for physical implementation.

# 4.4.2 Physical Implementation

After synthesis, physical implementation was carried out using the Cadence place and route tools:

- 1. **Floor Planning**. In this work, floor planning simply involved placing the synthesised design in the correct position in the core area. Power rings were also placed for VDD, VSS and VSUB. IO pads were not used as the design was not fabricated.
- 2. **Early design rule check (DRC)**.
- 3. **Placement** of well-tap cells, and standard cells to minimise overall chip size and ensure routability.
- 4. **Clock Tree Synthesis**. The goal of this step is to generate a clock tree that is distributed evenly to all sequential elements in a design. This involves insertion of buffers or inverters along the clock paths of the ASIC design in order to achieve zero/minimum skew or balanced skew.
- 5. **Routing**. The first step in routing is power routing, where power and ground pins are routed to nearby rings and stripes. Thereafter, the full routing pass includes global routing, final (detailed) routing and search-and-repair routing which eliminates routing violations from the previous steps. Post-route timing analysis is also carried out to ensure the absence of timing violations.



Figure 4.5: Layout of the beamforming ASIC, synthesized in Cadence using AMS 0.18 $\mu m$  CMOS. The following parameters were used:  $N_p = 1$ , frame rate = 4*Hz*, pixel resolution =  $32 \times 352$  and  $i_{max} = 8$  angles.

- 6. **Verification**. Final verification checks were not carried out, but these would include checks for open/short circuits, antenna violations, DRC checks and IR drop / electromigration (EM) analysis.

The final ASIC layout is presented in figure 4.5. The dimensions of the ASIC are  $1.35 mm \times 1.35 mm$ , and the estimated power-consumption is 14.9 mW for the parameters presented above.

### 4.4.3 System-Level Power Estimation

The total projected power consumption of the receiver is dominated by the digital beamformer, analogue-to-digital converter and RF transceiver. An estimate of power consumption is based upon the following system components:

- ADS900 ADCs operating at 5MHz:  $\sim 15 mW$ .
- Two  $6 \times 8 mm$ ,  $128K \times 8$  SRAM modules (ISSI 62WV 1288DBLL):  $\sim 12 mW$ .
- MICS-band wireless transceiver (Zarlink ZL70102) operating at 800 kbps: ~ 16.5 mW.
- Analogue front-end (AFE) consumes 8 mW during the reflection period.

Thus, the projected power consumption of the receiver is 65 mW. The power dissipated during emission is unknown because a transmission circuit was not designed and a physical transducer was not used for testing. However, previous work involving the use of PMUT/CMUT transducers indicates a power consumption range between 1 - 18 mW/channel, depending on the transmission topology [64, 65]. Based on this, a conservative estimate for transmission power is 20 mW/channel. Thus, the total system power would be approximately 85 mW. The system could run off two 1.55 V, 175 mAh SR44SW button cell batteries for around 6.8 hours continuously at a frame rate of 4 Hz, with image quality comparable to figure 6.7(a) (see following chapter). As stated before, a low frame rate is only acceptable for capsule endoscopy (not portable scanners), which is less susceptible to motion artifacts due to slow movement through the small intestine. Better image quality may be achieved by decreasing the frame rate, and the battery life could be extended by imaging non-continuously. The system diameter would be approximately 11 mm using stacked PCB units.

# 4.5 Summary

This chapter presents the design and hardware implementation of the digital quadrature synthetic aperture beamformer. Tradeoffs relating to area, power, frame rate and image quality are considered in the context of small-scale applications such as capsule endoscopy. These factors are used to optimise the performance of the system. Synthetic aperture algorithms typically require large memory capacities as RF data are collected from the entire aperture prior to beamforming. However, in the proposed system, calculations are pipelined and serialised through multiple combinational blocks in order to achieve real-time operation. This approach inherently lends itself to implementation in a hardware description language (HDL). Thus, the beamformer was implemented on an FPGA. ASIC synthesis and physical implementation were also carried out to investigate the resultant size and power consumption in silicon. The entire beamforming process is carried out digitally in the baseband. Signals are demodulated in the analogue domain prior to sampling. This is the subject of the following chapter, which discusses the design and implementation of the analogue front-end.

# Chapter 5

# Analogue Front-End of Ultrasound Receiver

In chapter 3, two architectural frameworks were proposed for small-scale systems: quadrature and compressive synthetic aperture beamforming (SAB). This chapter continues by discussing the circuit-level implementation of the analogue front-end (AFE). Section 5.1 begins with a general overview of the AFE and its context within the entire system. Section 5.2 presents the first stage in the AFE, a low-noise preamplifier, which amplifies RF ultrasound signals without degrading the signal-to-noise ratio. Demodulation is carried out by a passive mixer and programmable lowpass filter, discussed in sections 5.3 and 5.5 respectively. The bandwidth of the filter is digitally selectable such that the AFE may be used for either quadrature or compressive SAB. The gain is also variable by means of a programmable gain amplifier (PGA), presented in section 5.4. Supply/common mode voltages and biasing currents are generated using the biasing circuitry presented in section 5.6. Finally, the physical layout of the AFE is briefly discussed in section 5.7.

# **5.1** Overview and Requirements Analysis

### 5.1.1 Design Overview

Before discussing each subsystem, we begin with a general overview of the frontend. The AFE is illustrated in figure 5.1 within the context of the broader system. Using synthetic aperture beamforming, a single channel may be used to process the signals from the entire array. This is done using an external multiplexer (MUX), which switches the channel between different transducer elements or synthetic ultrasound signals generated using a DAC. The firing/transmission sequence is precisely controlled using digital control signals generated using an FPGA. While only a single analogue processing channel was fabricated in this work, the system is scalable to any number of parallel channels. As discussed in the previous chapter, this would allow for a higher frame rate (due to a reduced acquisition time) at the expense of greater power consumption.

The first stage in the AFE is a preamplifier, or more specifically, a low-noise, variable gain amplifier (LN-VGA). It is important to minimise the input referred noise of the preamplifier as the noise is injected directly into the receiving signal. Time gain control (TGC) is achieved by increasing the gain of the amplifier during each reflection period, in order to counteract the effect of signal attenuation as a function of imaging depth. The LN-VGA sweeps the gain exponentially over time, thereby shifting the noise floor to an appropriate level. After the signal has been amplified, it is downconverted to baseband using a passive mixer and split into I/Q components. The signal is then amplified again using a programmable gain amplifier (PGA), which is placed after the mixer to reduce bandwidth requirements.



Figure 5.1: High-level block diagram showing the various subsystems constituting the analogue front-end (AFE). The AFE amplifies and demodulates ultrasound signals, which are then sampled externally and processed by a digital beamformer.

The PGA allows the user to set a suitable gain value, depending on the maximum amplitude of the transducer output voltage. The lowpass filter attenuates unwanted LO clock feedthrough and image frequencies resulting from the mixing operation. When used for compressive SAB, this filter also functions as a compressive sensing filter kernel with selectable bandwidth. Three bandwidth selections are provided in order to test the efficacy of the reconstruction algorithm. Finally, after sampling, the discretised I/Q signals are processed by the digital beamformer described in chapter 4.

# 5.1.2 Requirements Analysis

A full requirements specification is provided in table 5.1. These requirements are discussed below.

| Parameter                                        | Constraint/Requirement     |
|--------------------------------------------------|----------------------------|
| Supply voltage                                   | 3.3 V                      |
| Transducer center frequency $(f_c)$              | 2.5 MHz                    |
| Transducer bandwidth                             | $100\% f_c$                |
| Filter cutoff                                    | 200 kHz, 500 kHz, 1.25 MHz |
| Gain                                             | $32 \pm 6$ dB              |
| Input referred dynamic range at 1 kHz (THD < 1%) | 58 dB                      |
| Input referred noise floor                       | 17.6 µV                    |
| ADC resolution                                   | 10 bits (60 dB)            |

Table 5.1: Requirements specification for the analogue front-end.

### **Dynamic Range and Noise**

Ultrasound signals typically have a large dynamic range (100 - 120 dB). However, modern ultrasound systems generally have a display resolution of 25 - 30 dB [66, 67], as the human eye can distinguish only around 30 grey levels [67]. Adding an image saturation allowance of 6 dB and noise threshold of 6 dB to the minimum display resolution (assumed to be 30 dB) yields a minimum dynamic range of 42 dB at the output. However, this specification is only valid under the assumption that scanlines are formed directly through the delay and summation of sampled RF signals. In this system, SAB degrades SNR, depending on how many transmission lines/angles are used. Thus, an additional 6 dB is specified, such that the minimum output dynamic range is 48 dB.

To find the overall input referred dynamic range, the attenuation rate of ultrasound in soft tissue must be taken into account. Assuming an attenuation rate of 0.5 dB/MHz/cm, then the total attenuation is 25 dB at 2.5 MHz for a penetration depth of 10 cm (total signal path of 20 cm, considering signal reflection). Thus, the dynamic range (or maximum SNR) for the receiver must be at least 48+25 = 73 dB. However, achieving this would require a high resolution ADC ( $\geq 12$  bit). Alternatively, to compensate for 25 dB signal attenuation, time-gain compensation should be applied first using a variable gain amplifier (VGA). The VGA sweeps the gain exponentially over time to compensate for tissue attenuation. With a VGA gain range of 0 - 15 dB, the ADC resolution should be at least 73 - 15 = 58 dB. This requirement is satisfied by a 10 bit ADC (60 dB).
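The dynamic range budget above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the figures quoted in this section; the variable names are illustrative and the ideal-quantiser rule of roughly 6.02 dB per bit is assumed for the ADC.

```python
import math

# Figures quoted in the requirements analysis above.
DISPLAY_DR_DB = 30.0         # minimum display resolution
SATURATION_MARGIN_DB = 6.0   # image saturation allowance
NOISE_MARGIN_DB = 6.0        # noise threshold
SAB_MARGIN_DB = 6.0          # allowance for SNR degradation due to SAB
ATTEN_DB_PER_MHZ_CM = 0.5    # attenuation rate in soft tissue
F_C_MHZ = 2.5                # transducer center frequency
DEPTH_CM = 10.0              # penetration depth
VGA_RANGE_DB = 15.0          # time-gain compensation range
ADC_BITS = 10

# Minimum dynamic range at the output.
output_dr_db = (DISPLAY_DR_DB + SATURATION_MARGIN_DB
                + NOISE_MARGIN_DB + SAB_MARGIN_DB)

# Round-trip attenuation: the signal travels to the reflector and back.
round_trip_cm = 2.0 * DEPTH_CM
attenuation_db = ATTEN_DB_PER_MHZ_CM * F_C_MHZ * round_trip_cm

# Receiver dynamic range without and with time-gain compensation.
receiver_dr_db = output_dr_db + attenuation_db
adc_dr_required_db = receiver_dr_db - VGA_RANGE_DB

# Assumption: ideal quantiser, 20*log10(2^N) dB of dynamic range.
adc_dr_available_db = 20.0 * math.log10(2 ** ADC_BITS)

print(f"output DR   : {output_dr_db:.0f} dB")
print(f"attenuation : {attenuation_db:.0f} dB")
print(f"receiver DR : {receiver_dr_db:.0f} dB")
print(f"ADC required: {adc_dr_required_db:.0f} dB, "
      f"available: {adc_dr_available_db:.1f} dB")
```

Keeping each allowance as a named constant makes it straightforward to see how a different penetration depth or VGA range would change the required ADC resolution.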

### Gain

Since a transducer has not been implemented, the maximum preamplifier input is estimated to be  $40 mV_{pp}$  based on literature [68, 69, 70]. In this work, the ADC aperture is chosen to be  $1.5V_{pp}$ . The total gain is determined by the required input and output voltages, as well as the estimated gain/loss of each stage in the analogue front-end:

$$\text{Total Gain} = 20\log\left(\frac{\text{ADC aperture}}{\text{Max preamp input}}\right) \pm (\text{Design margin}) \tag{5.1}$$

$$= 20 \log\left(\frac{1.5\,V}{40\,mV}\right) \pm (\text{Design margin}) \tag{5.2}$$

$$\approx 32 \, dB \pm (\text{Design margin}) \tag{5.3}$$

A design margin of $\pm 6$ dB is included to account for variability in the design of the transducer. Therefore, the overall gain of the receiver must be variable between 26 dB and 38 dB. In many ultrasound systems, the gain is split between the preamplifier and a programmable gain amplifier (PGA) stage to allow the user to set the desired gain. This approach is also taken in the present work, where 20 dB is provided by the preamplifier and the remaining gain is set by the PGA. The nominal gain for the PGA is calculated as follows:

$$\text{PGA Gain} = A_{total} - A_{preamp} - A_{mixer} - A_{LPF} \pm (\text{Design margin}) \tag{5.4}$$

$$= 32 - 20 - (-3.9) - 0 \pm 6 \, dB \tag{5.5}$$

$$\approx 16 \pm 6 \, dB \tag{5.6}$$

Note that the gain of the passive mixer,  $A_{mixer}$ , is -3.9 dB as discussed in section 5.3, and the gain of the lowpass filter,  $A_{LPF}$ , is unity. A design margin of 6 dB allows for variations in the output voltage of the transducer. Based upon this estimate, three selectable PGA gain values are conveniently chosen to be 5 (14*dB*), 10 (20*dB*) and 20 (26*dB*).
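As a quick check of the gain budget, the sketch below evaluates the total gain and the nominal PGA gain from the values quoted in this section. The helper `to_db` and the variable names are illustrative only. Note that the raw ratio evaluates to about 31.5 dB, which the text takes as a nominal 32 dB.

```python
import math


def to_db(voltage_ratio):
    """Convert a voltage ratio to decibels."""
    return 20.0 * math.log10(voltage_ratio)


# Values quoted in the text.
ADC_APERTURE_VPP = 1.5        # ADC input range
MAX_PREAMP_INPUT_VPP = 0.040  # estimated maximum transducer output
TOTAL_GAIN_DB = 32.0          # nominal total gain used in the text
A_PREAMP_DB = 20.0
A_MIXER_DB = -3.9             # passive mixer conversion gain
A_LPF_DB = 0.0                # unity-gain lowpass filter
MARGIN_DB = 6.0

# Raw ratio of ADC aperture to maximum preamplifier input.
computed_total_db = to_db(ADC_APERTURE_VPP / MAX_PREAMP_INPUT_VPP)

# Nominal PGA gain: whatever the other stages do not provide.
pga_nominal_db = TOTAL_GAIN_DB - A_PREAMP_DB - A_MIXER_DB - A_LPF_DB
pga_min_db = pga_nominal_db - MARGIN_DB
pga_max_db = pga_nominal_db + MARGIN_DB

print(f"computed total gain: {computed_total_db:.1f} dB")
print(f"nominal PGA gain   : {pga_nominal_db:.1f} dB "
      f"({pga_min_db:.1f} to {pga_max_db:.1f} dB with margin)")
```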

#### **Input Referred Noise**

The input referred noise floor for the preamplifier is the difference between the maximum input and the input referred dynamic range.

$$\text{IR Noise Floor} \leq 20\log(0.02)\,dBV_p - 58\,dB \approx -92\,dBV_p \approx 25\,\mu V_p \tag{5.7}$$
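The conversion in (5.7) between decibels and volts can be reproduced as follows. This is a small sketch with illustrative variable names, using the 20 mV peak input and 58 dB dynamic range specified above.

```python
import math

MAX_INPUT_VP = 0.020   # maximum preamplifier input (peak), from the text
INPUT_DR_DB = 58.0     # input referred dynamic range, from the text

# Express the maximum input in dBV, subtract the dynamic range,
# then convert the result back to volts.
max_input_dbv = 20.0 * math.log10(MAX_INPUT_VP)
noise_floor_dbv = max_input_dbv - INPUT_DR_DB
noise_floor_vp = 10.0 ** (noise_floor_dbv / 20.0)

print(f"max input  : {max_input_dbv:.2f} dBV")
print(f"noise floor: {noise_floor_dbv:.2f} dBV "
      f"= {noise_floor_vp * 1e6:.1f} uV (peak)")
```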

### Bandwidth

The required bandwidth of the receiver is determined by the center frequency and bandwidth of the ultrasound transducer. In this work, the center frequency of the transducer used to obtain test signals is $f_c = 2.5$ MHz, and the bandwidth is from 1.25 - 3.75 MHz (i.e. 100% $f_c$). Thus, the bandwidth of the receiver must be at least 3.75 MHz.

# 5.2 Preamplifier

The design of the preamplifier is critical to ensure good noise performance and dynamic range. Typically, the first stage dominates the sensitivity and noise performance of the entire signal chain. It has been shown that the noise factor of a two-stage system, F, is equal to

$$F = F_1 + \frac{F_2 - 1}{G_1} \tag{5.8}$$

where $F_1$ and $F_2$ are the noise factors of the first and second stage respectively, and $G_1$ is the power gain of the first stage [71]. Hence, special care must be taken when designing the first stage so as to minimise the input referred noise.
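To illustrate why the first stage dominates, (5.8) can be evaluated for a two-stage chain. The stage values below (3 dB and 10 dB noise figures, 20 dB first-stage power gain) are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
import math


def db_to_power_ratio(value_db):
    """Convert decibels to a linear power ratio."""
    return 10.0 ** (value_db / 10.0)


def power_ratio_to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)


def cascaded_noise_factor(f1, f2, g1):
    """Noise factor of two cascaded stages, as in (5.8).

    f1, f2 are linear noise factors; g1 is the linear power gain
    of the first stage.
    """
    return f1 + (f2 - 1.0) / g1


# Hypothetical stage parameters, for illustration only.
NF1_DB, NF2_DB, G1_DB = 3.0, 10.0, 20.0

f_total = cascaded_noise_factor(
    db_to_power_ratio(NF1_DB),
    db_to_power_ratio(NF2_DB),
    db_to_power_ratio(G1_DB),
)
nf_total_db = power_ratio_to_db(f_total)

print(f"system noise figure: {nf_total_db:.2f} dB")
```

With these example values the noisy second stage adds only about 0.2 dB to the system noise figure, because its contribution is divided by the first-stage gain.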

In this section, we begin with a brief analysis of prior art, and then proceed to detail the design of the preamplifier. Particular emphasis is given to minimising noise and maximising linearity, while maintaining low power consumption and a sufficient bandwidth.

## 5.2.1 Prior Art

A tabulated summary of state-of-the-art preamplifier topologies is provided in table 6.2 in chapter 6, where these topologies are compared against hardware results. These topologies are briefly reviewed below.

The choice of design topology depends upon the electrical impedance of the transducer. Capacitive micromachined ultrasound transducers (CMUTs) typically have a large impedance [72, 7] and interface with low input-impedance transimpedance amplifiers (TIAs) which convert current to voltage. Examples of transimpedance amplifiers are found in [36, 73, 65, 74, 75, 35, 76].

However, the present design targets piezoelectric transducers, which typically have a small impedance in the order of a few kiloohms near the resonant frequency [72, 7]. TIAs would require a very large gain-bandwidth product (GBP) to minimise the input impedance, since $Z = R_f f/GBP$ [7, 77]. The design would therefore fail to optimise the power/noise tradeoff, since excess power would be spent on increasing the GBP rather than minimising input-referred noise.

A more appropriate design choice is to use a voltage amplifier to sense voltage, not current. Various topologies have been proposed in prior art using operational amplifiers with feedback or in open loop configuration [7, 78, 79, 8]. Voltage feedback is employed in [79, 80, 81] and current feedback in [82]. The design in [7] utilises a capacitive voltage feedback amplifier which offers a midband voltage gain equal to the ratio of the input to feedback capacitance: $A_M = C_I/C_F$. The input impedance of this design is in the order of tens of kiloohms. In [80], a three stage, Miller-compensated topology is used for the amplifier core. Similarly, in [78], a three stage OTA with feedback is proposed, with a folded cascode amplifier input stage and a class AB output stage. In this case, the preamplifier has a bandwidth of 32 MHz, gain of 12 dB and consumes 20 mW of power.

Other designs use preamplifier cores without feedback to maximise the input impedance. For example, a $33 \mu W$ low noise amplifier with a sub-3 dB noise figure and 10.5 MHz bandwidth is proposed in [83]. The preamplifier is a basic transconductor cell with resistive degeneration to maximise the linearity. A capacitive attenuator is used to implement variable gain control. In [8], the design of a wideband (75 MHz) receiver for high resolution, high-frequency ultrasonic imaging systems is presented. A single stage, differential preamplifier core is used, as shown in figure 5.2a. $R_1$ and $R_2$ function as common-mode feedback resistors and may be used to adjust the gain, which is expressed as $g_{m1,2}(r_{o1,2}||r_{o3,4}||R_{1,2})$. The preamplifier in this design is followed by a Gilbert-type four-quadrant multiplier for variable gain control, as shown in figure 5.2b.

Figure 5.2: Ultrasound transceiver (taken from [8]) (a) Preamplifier core (b) Variable gain amplifier.

Matching is an important consideration as the preamplifier interfaces directly with the piezoelectric transducer. In [82], the input impedance of the LNA is matched with the internal resistance of the transducer by means of a tunable resistance that biases the non-inverting op-amp input. Similarly, in [8], impedance matching is done by terminating the receiver inputs with resistors that have the same impedance as the transducer. It has been shown that tight coupling between the electronics and transducer eliminated the need for broadband electrical matching networks, particularly when operating at low frequencies [69]. However, if the receiver is connected to transducers requiring high voltage excitation, T/R switches and coaxial connections, an impedance matching network should be carefully designed [8]. In this work, a matching network was not implemented as a physical transducer was not used and the design was tested using an existing database of ultrasound signals.

### 5.2.2 Design and Simulation Results

The proposed design is adapted from the differential configuration discussed above [8, 84], and is presented in figure 5.3. The resistors  $R_1$  and  $R_2$  in [8] are replaced with a MOS device operating in the triode region, such that the load is equal to the parallel combination between  $r_{o1,2}$ ,  $r_{o3,4}$  and the triode resistance. The core of the amplifier is a differential pair connected to a source follower, which buffers the output from the resistive load of the third stage (PGA). The common-mode output level of the output,  $V_{out}$ , is stabilised using the common mode (CM) feedback loop, and node *P* is virtual ground. The common mode amplifier provides sufficient open loop gain to force the common mode voltage  $V_{cm} = \frac{1}{2}(V_{out}^+ + V_{out}^-)$  equal to the CM reference  $V_{ref} = 1.6V$ .

Figure 5.3: Preamplifier with variable gain. The gain is controlled by varying $V_c$ (the gate voltage of $M_5$).

The small signal equivalent model of the differential pair and MOS load is shown in figure 5.4. For low gain values, the PMOS device, $M_5$, is in triode since $V_{SD5} < V_{SG5} - |V_{thP}|$. The resistance of $M_5$ is therefore:

$$r_t = \frac{1}{\mu_o C_{OX} \left( W/L \right)_t \left( -V_{GS5} + V_{thP} \right)} \tag{5.9}$$

$$=\frac{1}{\mu_o C_{OX} \left(W/L\right)_t \left(v_{o2} - V_c + V_{thP}\right)} \tag{5.10}$$

where  $\mu_o$  is the charge-carrier effective mobility,  $(W/L)_t$  is the ratio of the gate width to the gate length and  $C_{OX}$  is the gate oxide capacitance per unit area. Using the concept of "half circuits", the small signal gain for the left side of the differential amplifier is:

$$A_{v} = g_{m1,2}\left(r_{o1,2}||r_{o3,4}||R_{t}\right) \tag{5.11}$$

$$=g_{m1,2}\left(\frac{r_{o1,2}r_{o3,4}R_t}{R_t r_{o1,2} + R_t r_{o3,4} + r_{o1,2} r_{o3,4}}\right) \tag{5.12}$$



Figure 5.4: Small signal equivalent model of the differential pair and MOSFET load.

where $R_t = r_t/2$ and the transconductance of $M_1$ and $M_2$ is $g_{m1,2} = \mu_o C_{OX} (W/L)_{1,2} (v_{GS1,2} - v_{thN})$. The output resistance of transistors $M_1$ and $M_2$ is $r_{o1,2} = \frac{1}{\lambda I_{1,2}}$, and similarly for transistors $M_3$ and $M_4$.

### **Noise Analysis**

We now turn to analyse the noise performance of the preamplifier in order to gain a better understanding of how to optimise the design. In this analysis, we do not consider the noise contribution of the voltage regulator in section 5.6. The principal sources of noise include thermal and flicker noise generated by transistors in the amplifier core. Since all noise sources are independent, their individual noise contributions can be calculated and added together by the superposition principle. Individual noise sources may be modeled using voltage sources, $\overline{V_n}$, as shown in figure 5.5 [84].



Figure 5.5: Modeling of noise sources present in the variable gain preamplifier.

The analysis begins by considering the noise contribution of $M_3$ and $M_4$ at the output of the differential pair [84]:

$$\overline{V_{n3,XY}^2} = g_{m3}^2 R_O^2 \overline{V_{n3}^2} + g_{m4}^2 R_O^2 \overline{V_{n4}^2} \tag{5.13}$$

$$=2g_{m3}^2 R_O^2 \overline{V_{n3}^2} \tag{5.14}$$

where $R_O = r_{o1} || r_{o3} || R_t = r_{o2} || r_{o4} || R_t$ and $\overline{V_{n3}^2}$ is the total thermal and flicker noise referred to the gate of $M_3$:

$$\overline{V_{n3}^2} = \underbrace{4kT\frac{2}{3}\frac{1}{g_{m3}}}_{\text{Thermal Noise}} + \underbrace{\frac{K_P}{C_{OX}(WL)_3 f}}_{\text{Flicker Noise}} \tag{5.15}$$

where k is Boltzmann's constant, T is the absolute temperature and f is the frequency. The noise from $M_5$ may be modeled using the small signal equivalent in figure 5.4, where $R_t$ is in parallel with $r_{o1,2}$, $r_{o3,4}$. Assume the flicker noise current from $M_5$ is small, since $g_{m5}$ is small in the triode region. In order to obtain the noise voltage at the output of the differential pair, the thermal noise current of $M_5$ is multiplied by $R_O$ to yield:

$$\overline{V_{n5,XY}^2} = \frac{4kT}{R_t} R_O^2 \tag{5.16}$$

Next, the noise contribution from the output stage (source follower) must be considered. The noise voltage referred to the output of the differential pair is:

$$\overline{V_{n6,8,XY}^2} = \underbrace{4kT\frac{2}{3}\left(\frac{1}{g_{m6}} + \frac{g_{m8}}{g_{m6}^2}\right)}_{\text{Thermal Noise}} + \underbrace{\frac{K_N}{C_{OX}f}\left(\frac{1}{(WL)_6} + \frac{g_{m8}^2}{(WL)_8g_{m6}^2}\right)}_{\text{Flicker Noise}} \tag{5.17}$$

The total noise at the output $XY$ is then:

$$\overline{V_{n,out}^2} = 2\left[\overline{V_{n3,XY}^2} + \overline{V_{n5,XY}^2} + \overline{V_{n6,8,XY}^2}\right] \tag{5.18}$$

The factor of two is added because the topology is differential. The noise may be referred to the input by dividing by the gain of the differential pair and adding $\overline{V_{n1}^2}$, the input referred noise voltage from $M_1$, $M_2$:

$$\overline{V_{n,in}^{2}} = 2\left[\overline{V_{n1}^{2}} + \overline{V_{n,out}^{2}} / \left(g_{m1}^{2}R_{O}^{2}\right)\right] \tag{5.19}$$

$$\begin{aligned} = {} & 8kT \left[\frac{2}{3g_{m1}} + \frac{2g_{m3}}{3g_{m1}^{2}} + \frac{2}{3g_{m1}^{2}R_{O}^{2}}\left(\frac{1}{g_{m6}} + \frac{g_{m8}}{g_{m6}^{2}}\right) + \frac{1}{2R_{t}g_{m1}^{2}}\right] \\ & + \frac{2K_{N}}{C_{OX}f} \left(\frac{1}{(WL)_{1}} + \frac{1}{(WL)_{6}} + \frac{g_{m8}^{2}}{(WL)_{8}g_{m6}^{2}}\right) + \frac{2K_{P}}{C_{OX}f} \left(\frac{g_{m3}^{2}}{(WL)_{3}g_{m1}^{2}}\right) \end{aligned} \tag{5.20}$$

From (5.20), observe that $g_{m3}$, $g_{m8}$ should be small, and $g_{m1}$ should be made as large as possible in order to minimise thermal noise. This may be done by increasing the drain current, $I_D$, or device width, since $g_m = \sqrt{2\mu_n C_{OX} \frac{W}{L} I_D}$ in strong inversion and saturation. However, a higher $I_D$ results in greater power consumption and limited output voltage swings, while a larger W results in larger input and output capacitances, and therefore a reduced speed. These tradeoffs should therefore be carefully balanced, based upon the requirement specifications. We also note that the thermal noise contribution of $M_5$ is insignificant due to the presence of $g_{m1}^2$ in the denominator of the contributing term in (5.20).

For 1/f noise, the best approach is to increase the area of M1, M2 and M7, M9 with W/L constant. Thus, the transconductance and thermal noise do not change, but device capacitance increases, which again highlights tradeoffs inherent in the design.

#### Simulations

The LN-VGA was implemented in AMS 0.35 $\mu m$ CMOS and simulated using Cadence. The sizes of the devices used in the preamplifier core are provided in table 5.2. Based on (5.12), the sizes of $M_3$, $M_4$ and $M_5$ were chosen to provide a gain that is variable between 20 - 35 dB as $V_c$ is increased from 0 - 1.5 V. A theoretical plot of the gain, $A_v$, versus the control voltage, $V_c$, is presented in figure 5.6 and compared against the simulated result. The following parameters were used in the theoretical analysis: $V_T = 25 mV$, $K_P = \mu_o C_{OX} = 60 \mu A/V^2$, $V_{thP} = -0.8V$, $\lambda_N = \lambda_P = 0.04V^{-1}$, $I_1 = I_2 = 75 \mu A$. Both cases demonstrate a hyperbolic response, which approximates a linear-in-dB response for $V_c = 0 - 1V$. Over this region, the response may be termed "quasi-exponential" and suitably compensates for exponential attenuation as a function of tissue depth.

Table 5.2: Preamplifier device sizes.

Figure 5.6: Theoretical and simulated plots of gain $(A_v)$ versus the control voltage $(V_c)$. Both cases demonstrate a hyperbolic response, which closely approximates a linear-in-dB response for $V_c = 0 - 1V$.
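The hyperbolic shape of the gain curve follows directly from (5.10) and (5.11), and can be sketched numerically. The values of $K_P$, $V_{thP}$, $\lambda$ and the branch current below are those quoted in the text. The remaining three values (`WL_T`, `GM_12` and `V_O2`) are not given in this section and are purely hypothetical placeholders, picked so that the sketch spans roughly the quoted 20 - 35 dB range; they should not be read as the actual device parameters.

```python
import math

# Parameters quoted in the text for the theoretical analysis.
K_P = 60e-6        # mu_o * C_OX [A/V^2]
V_THP = -0.8       # PMOS threshold voltage [V]
LAMBDA = 0.04      # channel-length modulation coefficient [1/V]
I_BRANCH = 75e-6   # I_1 = I_2 [A]

# Hypothetical placeholders, NOT taken from the thesis.
WL_T = 0.154       # assumed (W/L)_t of the triode load M5
GM_12 = 0.34e-3    # assumed transconductance of M1, M2 [S]
V_O2 = 2.3         # assumed DC voltage at the source of M5 [V]


def parallel(*resistances):
    """Parallel combination; infinite resistances contribute nothing."""
    return 1.0 / sum(1.0 / r for r in resistances)


def triode_resistance(v_c):
    """r_t as in (5.10); infinite once M5 has no overdrive left."""
    overdrive = V_O2 - v_c + V_THP
    if overdrive <= 0.0:
        return math.inf
    return 1.0 / (K_P * WL_T * overdrive)


def gain_db(v_c):
    """Half-circuit gain as in (5.11), in dB, with R_t = r_t / 2."""
    r_o = 1.0 / (LAMBDA * I_BRANCH)   # r_o1,2 and r_o3,4 taken as equal
    r_load = parallel(r_o, r_o, triode_resistance(v_c) / 2.0)
    return 20.0 * math.log10(GM_12 * r_load)


# Sweep the control voltage from 0 V to 1.5 V in 0.1 V steps.
sweep = [(step / 10.0, gain_db(step / 10.0)) for step in range(16)]
for v_c, g_db in sweep:
    print(f"V_c = {v_c:.1f} V  ->  A_v = {g_db:.1f} dB")
```

Because $r_t$ is inversely proportional to the overdrive voltage, the load resistance and hence the gain rise hyperbolically with $V_c$ until the output resistances of the differential pair limit the gain.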

Using large input devices  $(M_1, M_2)$  improves matching and the noise performance of the amplifier. Furthermore,  $M_6$  was sized to provide a sufficiently large tail current  $(150 \,\mu A)$  to meet the bandwidth, noise and linearity/DR specification. The simulated bandwidth is  $17.8 \,MHz$  and the input-referred noise is  $5.42 \,nV/\sqrt{Hz}$  at  $1 \,MHz$ , or  $7.49 \,\mu V$  integrated from  $1.25 - 3.75 \,MHz$ . Although LNAs are primarily concerned with amplifying weak signals that are just above the noise floor, the presence of larger signals causing intermodulation distortion must be considered. Distortion may be quantified by calculating the total harmonic distortion (THD), which is defined as the ratio of the RMS amplitudes of a set of higher harmonic frequencies to the RMS amplitude of the fundamental:

$$THD_{dB} = 20\log\left[\frac{\sqrt{V_2^2 + V_3^2 + \dots + V_N^2}}{V_1}\right] \tag{5.21}$$

The upper limit of the dynamic range (DR) is defined as the maximum input amplitude resulting in an acceptable level of distortion at the output (in this case, 1% or -40 dB). The simulated input voltage at 1% THD at the output (considering the first five harmonics) is $19.2 mV_p$, which is close to the specified value in section 5.1.2 (i.e. 20 mV).
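The THD definition in (5.21) translates directly into code. The sketch below is illustrative: the function names and the example harmonic amplitudes are made up for demonstration and are not simulation results from this work.

```python
import math


def thd_ratio(fundamental_rms, harmonic_rms):
    """Ratio of the RMS sum of the harmonics to the fundamental."""
    return math.sqrt(sum(v * v for v in harmonic_rms)) / fundamental_rms


def thd_db(fundamental_rms, harmonic_rms):
    """Total harmonic distortion in dB, as in (5.21)."""
    return 20.0 * math.log10(thd_ratio(fundamental_rms, harmonic_rms))


# Hypothetical example: 1 V fundamental with two harmonics whose
# RMS sum is 10 mV, i.e. 1 % distortion.
example_harmonics = [0.006, 0.008]
example_percent = 100.0 * thd_ratio(1.0, example_harmonics)
example_db = thd_db(1.0, example_harmonics)

print(f"THD = {example_percent:.2f} % = {example_db:.1f} dB")
```

This makes the equivalence between the 1% limit and -40 dB explicit.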

Process variations should be considered due to variability in the size of  $M_5$ . Monte Carlo (MC) simulations were carried out to investigate the robustness of the design to process variations. Multiple circuit parameters were calculated using 100 MC samples. The results are presented graphically in figure A.1, appendix A, together with MC results from the other subcircuits. At  $V_c = 0V$ , the mean differential gain is 19.4*dB* and  $\sigma = 0.65 dB$  (i.e. 3% variation at 1 $\sigma$ ). Various other parameters were simulated, including the tail current ( $\mu_o = 155 \mu A$ ,  $\sigma = 1.5 \mu A$ ), CMFB voltage ( $\mu_o = 1.6V$ ,  $\sigma = 8.3 mV$ ) and CMRR ( $\mu_o = 82 dB$ ,  $\sigma = 8.4 dB$ ) indicating acceptable robustness to process variations.

## 5.3 Mixer

## 5.3.1 Prior Art

Mixers are non-linear devices used to translate one frequency to another. The input is applied to the mixer's RF port, and a mixing signal to the LO port. The output signal appears at the intermediate frequency (IF) port.

Multiplicative electronic mixers may be implemented in a wide variety of ways. Each design involves compromises between power consumption, noise figure (NF) and conversion gain. Two popular mixer types are considered below: the active, double balanced Gilbert cell and passive MOS ring mixers [85, 86].

Gilbert cells may be used as single and double balanced mixers, as shown in figure 5.7. The single-balanced version has a single-ended RF input, and generates both even and odd-order harmonics, while the balanced LO input suppresses even-order LO harmonics. Double balanced Gilbert cells have the advantage that even and odd harmonics are suppressed and that LO feedthrough is negligible. However, the penalty is higher power consumption and complexity [85].

For a fully differential Gilbert cell, the ideal voltage gain (or conversion gain) of the mixer is [84]:

$$A_{v} = \frac{V_{out,IF}}{V_{RF}} = \frac{2}{\pi} g_{m1} R_{L} \tag{5.22}$$



Figure 5.7: Gilbert cell multipliers (a) Single-balanced version (b) double balanced version.

where

$$g_{m1} = \sqrt{2\mu_n C_{ox} I_{D1} \left( W/L \right)_1} \tag{5.23}$$

These parameters must be chosen carefully so as to optimise the power consumption, noise figure (NF) and conversion gain.  $M_1$  represents the main noise source, which produces noise that is multiplied by the voltage gain in (5.22) and frequency translated from RF to IF. Thermal noise also appears at the output and is a function of the size of the resistor  $R_L$ .

The passive, double-balanced MOS "ring" mixer shown in figure 5.8 [86], also known as the FET-quad mixer, operates using four switches that turn ON and OFF. The RF signal is effectively multiplied by  $\pm 1$  at a rate determined by the LO signal.

Consequently, the output contains many mixing products that result from the odd-harmonic Fourier components of the square wave. These must be filtered out by the following stages.

Two significant advantages of the MOS ring mixer include high linearity and low power consumption as the circuit requires no bias current. However, since the circuit is passive, the conversion gain, $g_c$, is always less than 0 dB. Theoretically, $g_c = 2/\pi = -3.9$ dB due to the IF energy splitting evenly between the sum and difference components [86]. However, $g_c$ is generally slightly larger due to the finite switching time of the LO signal, and due to the finite "on" resistance of the MOS devices operating in the triode region [86]:

$$R_{ON} = r_{ds} = \frac{1}{\mu_n C_{OX} \left( W/L \right) \left( V_{GS} - V_{TH} \right)} \tag{5.24}$$

To decrease  $R_{ON}$ , a large LO amplitude and W/L ratio should be used. It should be noted that increasing W/L causes the bandwidth of the circuit to decrease, and necessitates the use of an even larger LO magnitude due to larger transistor capacitance. Larger capacitance also leads to increased LO clock feedthrough, so care should be taken in selecting W/L to balance these tradeoffs.
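The theoretical conversion gain of $2/\pi$ for an ideal switching mixer can be verified numerically. The sketch below is a simplified model under stated assumptions: the switches are ideal (zero "on" resistance, instantaneous transitions), and the LO square wave has the same frequency and phase as a unit-amplitude RF sinusoid, so the wanted output is the DC (baseband) term. The function name is illustrative.

```python
import math


def ideal_switching_mixer_gain(n_samples=100_000):
    """Average of sin(theta) multiplied by a +/-1 square wave.

    Models an ideal ring mixer: the RF sinusoid is multiplied by +1
    or -1 at the LO rate, and the mean over one period is the
    baseband output for a unit-amplitude input.
    """
    total = 0.0
    for k in range(n_samples):
        theta = 2.0 * math.pi * (k + 0.5) / n_samples
        rf = math.sin(theta)
        lo = 1.0 if math.sin(theta) >= 0.0 else -1.0  # ideal square wave
        total += rf * lo
    return total / n_samples


mixer_gain = ideal_switching_mixer_gain()
mixer_gain_db = 20.0 * math.log10(mixer_gain)

print(f"conversion gain = {mixer_gain:.4f} ({mixer_gain_db:.2f} dB)")
print(f"2/pi            = {2.0 / math.pi:.4f}")
```

The numerical average converges to $2/\pi \approx 0.637$, or about -3.9 dB, in agreement with the theoretical value quoted above.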



Figure 5.8: Passive MOS Ring Mixer.

## 5.3.2 Performance Specifications

The performance of the mixer may be quantified by measuring the conversion loss, 1 dB compression point, third-order intercept point (IP<sub>3</sub>) and noise factor.

### **Noise Figure**

Noise figure is commonly used to describe RF systems as it provides a means of determining the impact of noise on sensitivity. The *noise factor* is the power ratio of the signal-to-noise ratio (SNR) at the input ($SNR_I$) divided by the SNR at the output ($SNR_O$) [86]:

$$F = \frac{SNR_I}{SNR_O} = 1 + \frac{N_A}{N_I} \tag{5.25}$$

where  $N_I$  is the noise delivered to the input from the source and  $N_A$  is the input noise of the device. The *noise figure* is the decibel equivalent of the noise factor.

Input referred noise is also a useful metric when the noise at the input is undefined. Input referred noise is often specified as RMS spectral noise density (units  $nV/\sqrt{Hz}$ ), and is equal to the output noise divided by the circuit gain [86]. To get the total *noise power* (i.e. *noise floor*), the spectral density should be integrated over the bandwidth of the circuit.

### **Conversion loss**

Conversion gain is defined as the difference in power between the input RF power level and the output IF frequency power level. A negative gain implies conversion *loss*.

#### Linearity, 1 dB Compression Point and Intermodulation Distortion

Most linear systems have a fixed gain for a given frequency range, and there is a linear relationship between the input power and output power. However, as the input power continues to increase, a non-ideal mixer goes into compression where no further output increase occurs for an input increase. The 1 dB compression point is defined as the output power at which the amplifier's gain is 1 dB lower than the linear gain specification. Beyond this point, distortion begins to dominate the output.

Two-tone $IMD_3$ is the measure of the third-order intermodulation distortion products produced by a nonlinear device when two tones closely spaced in frequency are fed into its input. If $f_1$ and $f_2$ are the frequencies of the two tones, then the third-order distortion products occur on both sides of these tones at $2f_2-f_1$ and $2f_1-f_2$. The IMD<sub>3</sub> spectral components at the IF output are generated as a result of the third-order frequency terms, $|(2 \times f_1 - f_2) - f_{LO}|$ and $|(2 \times f_2 - f_1) - f_{LO}|$. Assuming that the amplitudes of these two tones are equal, the IMD<sub>3</sub> level is the difference between the power of the fundamental signals and the third-order products. The third-order intercept (TOI) method is a popular means of measuring the capability of the mixer to suppress two-tone $IMD_3$ as a function of input power. The IP<sub>3</sub> point is a theoretical location on the IF output versus RF input curve where the output signal and the third-order distortion products become equal in power, as RF input power is raised.
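The location of the third-order products is easy to tabulate. In the sketch below, the two test tones (2.4 MHz and 2.6 MHz) and the 2.5 MHz LO are hypothetical values chosen for illustration, and frequencies are kept as integers in kHz to avoid rounding issues.

```python
def imd3_products(f1, f2, f_lo):
    """Return third-order product frequencies at RF and at IF.

    RF products lie at 2*f1 - f2 and 2*f2 - f1; after mixing they
    appear at the absolute difference from the LO frequency.
    """
    rf_products = (2 * f1 - f2, 2 * f2 - f1)
    if_products = tuple(abs(f - f_lo) for f in rf_products)
    return rf_products, if_products


# Hypothetical two-tone test, frequencies in kHz.
F1_KHZ, F2_KHZ, F_LO_KHZ = 2400, 2600, 2500

rf_imd3, if_imd3 = imd3_products(F1_KHZ, F2_KHZ, F_LO_KHZ)
if_tones = (abs(F1_KHZ - F_LO_KHZ), abs(F2_KHZ - F_LO_KHZ))

print(f"IMD3 at RF: {rf_imd3} kHz")
print(f"IMD3 at IF: {if_imd3} kHz (wanted tones at {if_tones} kHz)")
```

In this example the distortion products fall at 300 kHz in the IF band, close to the wanted tones at 100 kHz, which is why they cannot simply be removed by filtering.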

### **5.3.3** Design and Simulation Results

A passive, double-balanced mixer was chosen over an active topology (Gilbert cell) due to its simplicity, high linearity and ease of operation using digital switching



Figure 5.9: Simulated output power versus input power for the passive mixer.

signals. A 3.3V LO signal is generated using an FPGA, and the W/L ratio is (10/0.35), which yields an "on" resistance of  $120 \Omega$ . This resistance is sufficiently low to prevent loading of the first stage. Simulation results indicate insignificant clock feedthrough and a MOS transition frequency of 5.5 GHz. The simulated conversion loss is -2.9 dB, which is slightly larger than the theoretical value due to the finite switching time of the LO signal and the finite "on" resistance,  $R_{ON}$ , of the MOS devices operating in the triode region. The simulated 1 dB compression point is at an input power of 8.6 dBm, and the IP<sub>3</sub> point is at 18 dBm. A graph of simulated output power versus RF input power is provided in figure 5.9.



Figure 5.10: High level schematic of the programmable gain amplifier (PGA). The gain of the amplifier is adjusted by switching between series combinations of resistors  $R_{1a}$ ,  $R_{1b}$  and  $R_{1c}$ .

## **5.4** Programmable Gain Amplifier (PGA)

In section 5.1.2, the PGA gain settings were specified to be 14dB, 20dB and 26dB, allowing the user to select the gain that maximises the dynamic range of the signal at the input to the ADC. The PGA is fully differential, and may be programmed using signals generated using digital control circuitry (FPGA/ASIC). Analogue MOSFET switches control a series combination of polysilicon resistors forming  $R_1$ , as shown in figure 5.10. The overall gain of the PGA is  $-R_2/R_1$ , where  $R_2 = 200k\Omega$ , and  $R_1 = 10k\Omega$ ,  $20k\Omega$  or  $40k\Omega$ . Thus, there are three gain settings: 5, 10 and 20. The digital circuitry controlling the gain of the PGA was designed by translating truth table 5.4 into the digital logic shown in figure 5.12. This circuit is a simple unary decoder that converts the digital code [AB] into the digital control signals  $[Q_1Q_2Q_3]$  which control switches  $S_1$ ,  $S_2$  and  $S_3$ . All transistor sizes are presented in table 5.3.
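As a quick check of the gain settings, the sketch below evaluates $R_2/R_1$ for the three resistor values quoted above and converts each ratio to decibels. The function and constant names are illustrative only.

```python
import math

R2_OHM = 200e3
R1_OPTIONS_OHM = (40e3, 20e3, 10e3)

def pga_gain(r1_ohm, r2_ohm=R2_OHM):
    """Magnitude of the inverting gain -R2/R1, as a ratio and in dB."""
    ratio = r2_ohm / r1_ohm
    return ratio, 20.0 * math.log10(ratio)

gains = [pga_gain(r1) for r1 in R1_OPTIONS_OHM]
for r1, (ratio, db) in zip(R1_OPTIONS_OHM, gains):
    print(f"R1 = {r1 / 1e3:.0f} kOhm -> gain {ratio:.0f} V/V ({db:.1f} dB)")
```

The three ratios round to the 14 dB, 20 dB and 26 dB settings specified in section 5.1.2.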



Figure 5.11: Schematic of the PGA core: a classic two-stage differential amplifier with Miller-compensation.

The core of the PGA shown in figure 5.11 is a classic two-stage, Miller-compensated operational amplifier (differential amplifier followed by a common-source stage). The amplifier is compensated with $R_x = 27.5 k\Omega$ and $C_x = 1 pF$ to improve the phase margin and ensure stability. However, the stability of the closed loop system is not only affected by the internal poles/zeros. The non-dominant pole introduced by output impedance $R_o$ and the load capacitance $C_L$ varies depending on the magnitude of the feedback resistor $R_2$. Increasing $R_2$ moves the output pole closer to the origin of the complex frequency plane, thereby adding a phase shift to the system and degrading the phase margin. However, reducing $R_2$ while keeping $R_1$ constant decreases the closed-loop gain of the system. In this work, $R_2$ is kept constant at $200k\Omega$ and the input resistance, $R_1$, is varied to control the gain. The effect of altering $R_1$ on the stability of the amplifier may be quantified by examining

Table 5.3: Transistor sizes for the two-stage operational amplifier forming the core of the PGA.

| Transistor         | W/L   |
|--------------------|-------|
| M1, M2             | 600/4 |
| M3, M4             | 320/2 |
| M5, M6             | 480/8 |
| M7, M8             | 120/1 |
| M9, M10            | 180/2 |
| M11                | 480/2 |
| M12                | 320/1 |
| M13, M14, M15, M16 | 320/2 |

Table 5.4: Truth table defining the logical functionality of the unary decoder shown in figure 5.12.

| Gain | A | B | $S_1$ | $S_2$ | $S_3$ |
|------|---|---|-------|-------|-----------------------|
| 5    | 0 | 0 | 0     | 0     | 0                     |
| 10   | 0 | 1 | 1     | 0     | 0                     |
| 20   | 1 | 0 | 1     | 1     | 0                     |
| N/A  | 1 | 1 | 1     | 1     | 1                     |



Figure 5.12: Unary decoder used to program the gain of the PGA.
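The mapping in table 5.4 can be expressed with three Boolean terms. The sketch below is one possible realisation that reproduces the truth table. It is an assumption for illustration and not necessarily the gate-level circuit drawn in figure 5.12.

```python
def unary_decoder(a, b):
    """Thermometer-code the 2-bit word [A B] into (Q1, Q2, Q3)."""
    q1 = a | b   # closed for every code except 00
    q2 = a       # closed for codes 10 and 11
    q3 = a & b   # closed only for code 11
    return q1, q2, q3

truth_table = {(a, b): unary_decoder(a, b) for a in (0, 1) for b in (0, 1)}
for code, switches in truth_table.items():
    print(code, "->", switches)
```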

the loop gain for the circuit (derived in [87]):

$$A\beta = \frac{aR_1}{R_1 + R_2} \left(\frac{1}{R_o C_L s + 1}\right) \tag{5.26}$$

$$=\frac{a}{Z}\left(\frac{1}{R_o C_L s+1}\right) \tag{5.27}$$

where  $Z = (R_1 + R_2)/R_1$  and *a* is the second order model for the op amp:

$$a = \frac{K}{(s + \tau_1)(s + \tau_2)} \tag{5.28}$$

The stability is therefore affected by the gain of the amplifier, and the closed-loop system may be compensated by altering $R_1$ and $R_2$. With $R_2$ fixed, decreasing the value of $R_1$ increases the closed-loop gain and shifts the loop-gain intercept down on the Bode plot, therefore improving the phase margin and stability of the circuit. This effect is illustrated in figure 5.13. Conversely, when $R_1$ is increased, the closed-loop gain and phase margin decrease. For this reason, simulations were carried out to check the stability of the op amp for all gain settings. For a minimum gain of 5, the bandwidth is 4.6MHz and the phase margin (PM) is $59.8^{\circ}$, indicating that the amplifier is stable for all modes of operation. The common-mode feedback loop was also checked for stability ($PM = 59.3^{\circ}$).
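To make the dependence on $R_1$ concrete, the sketch below evaluates $Z = (R_1 + R_2)/R_1$ from equation (5.27) for the three resistor settings, together with the corresponding reduction in low-frequency loop gain, $20\log_{10} Z$. The function name is illustrative, and the op amp parameters $K$, $\tau_1$ and $\tau_2$ are not modelled because their values are not given here.

```python
import math

R2_OHM = 200e3

def z_factor(r1_ohm, r2_ohm=R2_OHM):
    """Z = (R1 + R2) / R1, the factor dividing the op amp gain a in (5.27)."""
    return (r1_ohm + r2_ohm) / r1_ohm

z_values = {r1: z_factor(r1) for r1 in (40e3, 20e3, 10e3)}
for r1, z in z_values.items():
    drop_db = 20.0 * math.log10(z)
    print(f"R1 = {r1 / 1e3:.0f} kOhm: Z = {z:.0f}, loop gain lowered by {drop_db:.1f} dB")
```

The smallest $R_1$ gives the largest $Z$, so the loop-gain curve sits lowest for the highest closed-loop gain setting.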

Linearity and noise performance are also key criteria affecting the dynamic range of the circuit. For a total static supply current of $180 \mu A$ and gain of 20 dB, the integrated noise floor from 10 mHz to 1.25 MHz is $31 \mu V$ and the total harmonic distortion (THD) is 40 dB for a full-scale input voltage ($V_{in} = 330 mV$, f = 100 kHz). Thus, the maximum SNR or dynamic range is 80 dB. Simulated



Figure 5.13: Bode diagram illustrating the effect of altering  $Z = (R_1 + R_2)/R_1$ .

frequency responses for each gain value (14dB, 20dB and 26dB) are presented in figure 5.14.
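The simulated dynamic range quoted above can be reproduced from the two voltages in the text. The sketch assumes that the 31 µ figure is an integrated RMS noise voltage and that the dynamic range is the ratio of the full-scale input to that noise floor. The variable names are illustrative.

```python
import math

V_FULL_SCALE = 330e-3   # full-scale input voltage from the text (V)
V_NOISE_FLOOR = 31e-6   # integrated noise, 10 mHz to 1.25 MHz (V), as assumed above

dynamic_range_db = 20.0 * math.log10(V_FULL_SCALE / V_NOISE_FLOOR)
print(f"simulated dynamic range = {dynamic_range_db:.1f} dB")
```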

As with the preamplifier, Monte Carlo analysis was used to verify the robustness of the PGA design to process variations. These results are presented in figure A.2(a)-(f), appendix A.

## 5.5 Image-Reject Filter

The topic of filter realisation is broad, encompassing many different design strategies and circuit topologies. Generally, there are two main techniques for realising integrated analogue filters. The first is using a switched-capacitor topology to implement a discrete filter. Because switched-capacitor filters require a clock frequency of at least twice the signal bandwidth, they are limited in their ability to process high-frequency signals [88, 89]. The second most popular technique is



Figure 5.14: Frequency response of the PGA for three gain settings (14dB, 20dB and 26dB).

using continuous-time filters such as active RC filters or transconductance-C ( $G_m$ -C) filters. These filters have advantages over sampled-data filters in terms of high speed and low power dissipation, but generally exhibit poorer linearity and noise performance.

The proposed fully-differential filter was adapted from the single-ended active RC design in [90]. A classic, Miller-compensated two-stage operational amplifier core is used, similar to the topology used in the PGA. Three second order, Butterworth lowpass filters with the topology displayed in figure 5.15 were cascaded to form a sixth order lowpass filter. This yields sufficient attenuation (> 60 dB) of the IF image band and 2.5 MHz LO feedthrough. The transfer function of the circuit is



Figure 5.15: Fully differential active RC lowpass filter topology used in the AFE.

[90]:

$$H(s) = \frac{-1}{s^2 R_1 R_3 C_1 C_2 + s C_1 \left(R_1 + R_3 + \frac{R_1 R_3}{R_2}\right) + \frac{R_1}{R_2}} \tag{5.29}$$

The corner frequency must be low enough to attenuate unwanted frequencies, while remaining larger than the IF bandwidth (1.25MHz). For quadrature demodulation, the LPF corner frequency is chosen to be 1.3MHz, and resistor/capacitor values are chosen accordingly. Two other bandwidths (500kHz and 200kHz) may also be selected by switching between different $C_1$ and $C_2$ values. The unary decoder described in section 5.4 is used to control a bank of switches which select the capacitors. This functionality is provided so as to test both the compressive SAB and quadrature SAB algorithms in hardware. All resistor/capacitor combinations are provided in table 5.5.

As with the previous stages, simulations were carried out in Cadence to quantify the performance of the filter. Simulated bode plots for each cutoff frequency are presented in figure 5.16. The combined sixth order filter draws  $188 \,\mu A$  with a 3.3V

| Bandwidth | $R_1, R_2$   | $R_3$        | $C_1$  | $C_2$  |
|-----------|--------------|--------------|--------|--------|
| 100 kHz   | $36 k\Omega$ | $72 k\Omega$ | 2.4 pF | 8.4 pF |
| 500 kHz   | $36 k\Omega$ | $72 k\Omega$ | 800 fF | 2.4 pF |
| 1.3 MHz   | $36 k\Omega$ | $72 k\Omega$ | 400 fF | 1.2 pF |

Table 5.5: Resistor and capacitor values for various lowpass filter bandwidths.

supply, such that the power consumption is $0.62 \, mW$. Since a sixth order filter is used, the rolloff is $120 \, dB/decade$. Monte Carlo analysis was used to investigate the effect of process variations on bandwidth (e.g. $\mu_o = 1.26 \, MHz$, $\sigma = 178 \, kHz$), gain ($\mu_o = -0.2 \, dB$, $\sigma = 78 \, mdB$) and tail current ($\mu_o = 30.5 \, \mu A$, $\sigma = 0.25 \, \mu A$). These results are presented graphically in figure A.3.



Figure 5.16: Frequency response of the LPF for three bandwidth settings (200 kHz, 500 kHz and 1.25 MHz).

# 5.6 Biasing Circuitry

The central biasing circuit in figure 5.17 is used to generate a common mode voltage, $V_{CM}$, and biasing current, $I_o$. A low-dropout (LDO) regulator feedback loop holds the bandgap reference voltage (1.2*V*) across resistor $R_A = 120 k\Omega$, producing a bias current $I_o$ equal to $10 \mu A$. A standard Cadence library bandgap reference was used to this end. $I_o$ is mirrored to various branches supplying the preamplifier, PGA and lowpass filter for both I/Q signal paths. It is also mirrored through resistor $R_B = 160 k\Omega$ to create the common mode voltage $V_{CM} = 1.6V$. Both $R_A$ and $R_B$ are high precision off-chip resistors (0.05% tolerance). Monte Carlo results are presented in figure A.3, appendix A, indicating sufficient robustness to process variations: over 100 MC samples, the mean bias current is $10.88 \mu A$ ($\sigma = 0.18 \mu A$) and the mean CM voltage is 1.61V ($\sigma = 16 mV$).
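The nominal bias values follow directly from Ohm's law. The sketch below repeats that arithmetic with the component values quoted above, assuming an ideal 1:1 current mirror into $R_B$. The constant names are illustrative.

```python
V_BANDGAP = 1.2   # bandgap reference voltage (V)
R_A = 120e3       # off-chip resistor setting the bias current (ohm)
R_B = 160e3       # off-chip resistor setting the common-mode voltage (ohm)

i_o = V_BANDGAP / R_A   # bias current held by the LDO feedback loop
v_cm = i_o * R_B        # common-mode voltage, assuming an ideal 1:1 mirror

print(f"I_o = {i_o * 1e6:.1f} uA, V_CM = {v_cm:.2f} V")
```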



Figure 5.17: Central biasing circuitry with LDO regulator feedback loop used to generate biasing current $I_o$ and common mode voltage $V_{CM}$.

## 5.7 Layout

The ultrasound AFE layout shown in figure 5.19 was designed using Cadence Layout Editor in AMS  $0.35 \,\mu m$  CMOS technology. The dimensions of the AFE (including the padring) are  $1.5 \times 1.5 \,mm$ .

**Matching** Special layout techniques were used to improve the matching between differential components. Particular care was taken in matching differential pairs and current mirrors. In circuits using these structures, device threshold differences of only a few millivolts can determine the performance and yield of a design. The common centroid technique was used extensively to improve matching: MOSFETs are split into "fingers" and then laid out in a symmetrical pattern. For example, if two devices, A and B, are to be matched, they may be split into two fingers and aligned using an A-B-B-A pattern. Dummy transistors are placed around the perimeter of grouped transistors in order to minimise the effect of asymmetric surroundings and unwanted etching at the edges. At a higher level, I and Q channels are laid out symmetrically so as to improve matching between the two channels and minimise phase errors.

**Analogue/digital Separation** Analogue and digital supplies were separated so as to isolate the analogue supply from digital noise. However, since a single substrate technology was used, noise may be seen at the bulk connection of the analogue circuitry. In order to alleviate this problem, critical noise-sensitive components such as input differential pairs were placed within guard rings which connect the substrate to a low-noise ground connection. Furthermore, a three bus model was used, which enables one to separate the substrate from the noisy digital VSS. Despite these efforts, minor digital clock noise can be seen at the output of the AFE.

**Pad Rings** The core area is surrounded by IO pads forming the pad-ring. Analogue pads are equipped with electrostatic discharge (ESD) protection diodes, as depicted in figure 5.18. This protects on-chip devices from large voltages - when the magnitude of the input voltage  $V_{in}$  exceeds the supply voltage by more than the diode threshold voltage, the pad will be shorted to either the positive or negative supply. The resistor protects the chip from large currents.



Figure 5.18: ESD protection diodes used in IO pads.

# 5.8 Summary

The design of the analogue front-end (AFE) is presented in this chapter. A general overview of the AFE is provided within the broader system-level context, followed by an analysis of the first stage in the AFE, a low-noise preamplifier, which is designed for time-gain control. Demodulation is carried out by a passive mixer and programmable lowpass filter. The bandwidth of the filter is digitally selectable such



Figure 5.19: The ultrasound AFE layout designed using Cadence Layout Editor in AMS  $0.35 \,\mu m$  technology.

that the AFE may be used for either quadrature or compressive SAB. The gain may also be varied using a programmable gain amplifier (PGA). Supply/common mode voltages and biasing currents are generated using the biasing circuitry presented in section 5.6. Finally, the physical layout of the AFE is discussed. The chip was fabricated and tested on a custom PCB. Measured results are presented in the following chapter.

# Chapter 6

# **System Integration and Validation**

This chapter reports on the system-level physical implementation of the SAB receiver, and presents experimental results at a system and circuit level. The experimental setup is described in section 6.1. The performance of each stage in the AFE is presented in section 6.2, together with a full transient analysis for multiple chips. In section 6.3, system-level results for the quadrature SAB method are presented. Lastly, section 6.4 presents system-level tests validating the functionality of the compressive SAB algorithm. Signal reconstruction as well as full B-mode imaging are demonstrated using the method.

# 6.1 Experimental Setup

A block diagram representing the SAB receiver experimental setup is shown in figure 6.1, illustrating the relationship between the AFE, PCB components and external devices. In order to perform measurements, a dedicated 2-layer printed circuit board (PCB) was designed to interface with the AFE integrated circuit (IC),



Figure 6.1: Block diagram representing the SAB receiver experimental setup, which illustrates the relationship between the AFE, PCB components and external devices.

as shown in figure 6.2. The board is powered by three AA batteries, which supply 4.5V to the board. Off-chip regulators (Analog Devices ADM7155) are used to generate the supply voltage (3.3V) for the IC.

The PCB hosts a Cesys EFM-02 embedded FPGA module based on the Xilinx Spartan-6LX® FPGA (XC6SLX150-3FGG484I). The FPGA is used to generate control signals for the IC (PGA gain, filter bandwidth and output stage selection), as well as digital mixing signals. The FPGA controls whether the IC uses a fixed internal gain control voltage for the preamplifier, or whether an external waveform generator is used. The FPGA also communicates with an ADC10D020 dual 10-bit ADC, which samples I and Q channels separately at 2.5*MHz*. The ADC communicates with the FPGA via a 20-bit parallel output bus. The quadrature SAB method and an internal UART module were implemented on the FPGA. This module connects to an external FT232 USB to serial UART interface controlling communication with



Figure 6.2: Photograph of the PCB used for testing the AFE and beamforming algorithm on FPGA: (1) AFE, (2) Spartan-6 on EFM-02 development board, (3) UART FT232 chip USB connector, (4) ADC10D020 dual-channel ADC, (5) ADM7155 voltage regulators.

the PC. All post-processing is carried out in MATLAB, which also handles PC-side serial communications.

A custom MATLAB program was written to control an external PicoScope® 5442B oscilloscope/arbitrary waveform generator (AWG). For *circuit-level tests*, the PicoScope directly records the output of the IC. Differential I/Q outputs and an analogue MUX output enable measurement of each stage in the AFE. For *system-level testing*, a custom MATLAB script was written to control the arbitrary waveform generator of the PicoScope. The script sequentially updates the AWG with RF

|                     | Preamplifier                    | PGA                           | Lowpass Filter                | Entire AFE                    |
|---------------------|---------------------------------|-------------------------------|-------------------------------|-------------------------------|
| Supply Voltage      | 3.3 V                           | 3.3 V                         | 3.3 V                         | 3.3 V                         |
| Power               | 0.9 mW                          | 0.6 mW                        | 1.5 mW ($6^{th}$ order)       | 7.9 mW                        |
| 3-dB bandwidth      | 6.6 MHz                         | 2.6 MHz                       | 1.85 MHz / 195 kHz / 510 kHz  | 1.85 MHz / 195 kHz / 510 kHz  |
| Gain                | 16.5 - 31 dB                    | 15 dB / 22 dB / 25.5 dB       | 1 dB                          | 31.5 - 53 dB                  |
| Input Ref. Noise    | $5.42\,nV/\sqrt{Hz}$ (2.5 MHz)  | $105\,nV/\sqrt{Hz}$ (1 kHz)   | $202\,nV/\sqrt{Hz}$ (1 kHz)   | $15.1\,nV/\sqrt{Hz}$ (2.5 MHz) |
| Integrated Noise    | 7.49 µV (1.25 - 3.75 MHz)       | 0.11 mV (10 mHz - 1.25 MHz)   | 0.46 mV (10 mHz - 1.25 MHz)   | 15.6 µV (1.25 - 3.75 MHz)     |
| $V_{in}$ (THD = 1%) | 34.5 mV                         | 304 mV                        | 1.1 V                         | 11 mV                         |
| Dynamic range       | 67.2 dB                         | 57 dB                         | 62 dB                         | 57 dB                         |

Table 6.1: Summary of performance for the active stages (preamplifier, PGA and lowpass filter) in the analogue front-end.

signals obtained from a synthetic aperture database stored on the PC. These data were previously captured on a Verasonics Vantage $256^{TM}$ system using a P4-1 phased array (central frequency at 2.5MHz) with 96 active elements, with the assistance of Dr. Matthieu Toulemonde from the ULIS group. The single-ended RF signals from the AWG are fed through a PWB2010LB balun, which converts them into differential signals for the IC.

# 6.2 AFE Performance

Experiments using the setup described above were carried out to quantify the performance of each stage in the AFE. A summary of the performance for each active stage is provided in table 6.1. The experimental method and conditions for each measurement are discussed below.

### 6.2.1 Preamplifier

As explained in chapter 5, the preamplifier functions as a low-noise amplifier with variable gain. Time-gain control is implemented by sweeping the control voltage, $V_c$, linearly over time, yielding a quasi-exponential gain response. Results were obtained for 11 different chips, as shown in figure 6.3a. A linear-in-dB response may be approximated for a control voltage ranging from $V_c = 0 - 1V$. The gain tails off as the device enters saturation. The relative difference in gain between all 11 chips does not exceed 1.02 dB, indicating that the design is robust to process variations. However, the gain is on average 2.9 dB lower than the nominal simulated response, which may be due to a systematic difference in W/L between the simulated and fabricated circuit. This difference is inconsequential, as it may be compensated for by altering the gain of the PGA.

The output referred noise of the amplifier was determined by grounding the input to the amplifier and measuring the output. The input referred (IR) noise is found by dividing the output referred noise by the gain of the amplifier (in this experiment, 16.5 dB at $V_c = 0V$). The single-sided input referred noise spectrum is shown in figure 6.3b. At 2.5 MHz, the input referred noise is $5.42 nV_p / \sqrt{Hz}$ (see table 6.1). The noise floor may be found by integrating the input referred noise (in $nV / \sqrt{Hz}$) over the bandwidth of interest. Assuming a bandwidth of 1.25 - 3.75 MHz, the noise floor is calculated to be $0.41 \mu V_p$.

The upper limit of the dynamic range (DR) is defined as the maximum input amplitude resulting in an acceptable level of distortion at the output (in this case, 1% or 40*dB*). Considering 5 harmonics in this analysis, the THD reaches 40*dB* at an input amplitude of $34.5 mV_{pp} = 17.25 mV_p$. The dynamic range is then the *dB* 



Figure 6.3: Preamplifier experimental results: (a) Gain versus control voltage  $(V_c)$  for the preamplifier in 11 different chips. Time-gain control is implemented by sweeping the control voltage linearly over time, yielding a quasi-exponential gain response. (b) Input referred noise spectrum. (c) Total harmonic distortion versus the input voltage.

ratio between the largest amplitude (at 1% THD) and the noise floor. The DR is therefore equal to 67.2 dB, which meets the DR requirement in chapter 5.
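The dynamic range figure can be cross-checked against table 6.1. The sketch below is one reading that reproduces the reported value. It assumes the peak amplitude is half of the 34.5 mV peak-to-peak input at 1% THD and that the noise floor is the 7.49 µV integrated noise listed for the preamplifier in table 6.1.

```python
import math

V_IN_PP = 34.5e-3      # input at 1% THD, peak-to-peak (V), from table 6.1
V_NOISE = 7.49e-6      # integrated noise, 1.25 - 3.75 MHz (V), from table 6.1

v_in_peak = V_IN_PP / 2.0
preamp_dr_db = 20.0 * math.log10(v_in_peak / V_NOISE)
print(f"preamplifier dynamic range = {preamp_dr_db:.1f} dB")
```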

### 6.2.2 Mixer

The performance of the mixer was quantified by measuring the conversion loss, 1 dB compression point and the third-order intercept point (IP<sub>3</sub>).

### **Conversion loss**

The conversion loss was found by calculating the difference in power between the input RF power level and the output IF power. As stated in chapter 5, the ideal gain of a passive, double-balanced mixer is -3.9 dB. However, the measured conversion loss was -2.8 dB. The measured value is larger than the theoretical value due to the finite switching time of the LO signal and the finite "on" resistance,  $R_{ON}$ , of the MOS devices operating in the triode region.
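The ideal figure of -3.9 dB is consistent with the fundamental Fourier coefficient of a square-wave switching function, $2/\pi$, which sets the conversion gain of an ideal switching mixer. The sketch below evaluates this standard result. It is included as a cross-check and is not taken from the chapter 5 derivation.

```python
import math

# Voltage conversion gain of an ideal switching mixer: 2/pi.
ideal_gain_db = 20.0 * math.log10(2.0 / math.pi)
measured_gain_db = -2.8   # measured value reported in the text

print(f"ideal conversion gain = {ideal_gain_db:.2f} dB")
print(f"measured minus ideal  = {measured_gain_db - ideal_gain_db:.2f} dB")
```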

### Linearity, 1 dB Compression Point and Intermodulation Distortion

The mixer's 1 *dB* compression point and IP<sub>3</sub> point were measured by increasing the voltage at the input to the preamplifier, and then measuring the output of the preamplifier (input to the mixer) and the output of the mixer. These outputs had to be measured separately using the analogue MUX. The input testing range is limited by the linear region of the preamplifier ($V_{in} < 34 \, mV$), meaning that the mixer could only be tested over an input range of 0 - 230 mV. Therefore, while the fundamental and third order harmonic curves in figure 6.4 do provide a measure of the linearity of the mixer, the 1 *dB* compression point could not be properly quantified as the mixer



Figure 6.4: Output power versus input power at 2.5MHz. There is a linear relationship between the input and output for the fundamental, up until the 1 dB compression point. The measured results also demonstrate third order intermodulation distortion and the extrapolated IP3 point.

could not be tested over a large enough input range. However, by extrapolating the curves, the IP<sub>3</sub> point may be approximated to be around -2dBm.

## 6.2.3 Programmable Gain Amplifier (PGA)

The gain of the PGA may be selected digitally to compensate for differences in transducers and to ensure that image saturation does not occur. The gain of the PGA for each setting in table 5.4 was measured using a 1 mV test signal at the input to the preamplifier. In order to calculate the gain, signals at the output of the mixer (input to the PGA) and at the output of the PGA were measured using the analogue MUX over the frequency range of 0 - 10 MHz. DC gain values are reported in table 6.1. The input referred noise ($105 nV/\sqrt{Hz}$) was found by grounding the input to the preamplifier and measuring the same output signals. Note that the output noise from the preamplifier had to be subtracted from the input referred noise of the PGA to ensure that only the noise contribution from the PGA is considered. Since the signal occupies the baseband at this stage, the noise is integrated from 10mHz - 1.25MHz to yield a noise floor of 0.11 mV.

Similarly, THD was calculated by sweeping the input voltage of the preamplifier over its linear region, and measuring the output of the mixer and the PGA. THD is less than 1% for an input voltage range of $0 - 152 \, mV$. The dynamic range of the PGA is therefore $20 \log \left(\frac{152 \, mV}{0.11 \, mV}\right) \approx 57 \, dB$. This dynamic range figure is significantly lower than the simulated result (80 dB). This is because the input referred noise was approximately 10 times larger than the simulated value. However, the overall performance meets the required specification in table 5.1.

### 6.2.4 Lowpass Filter

The performance of the lowpass filter was measured using a similar experimental procedure to the method outlined in section 6.2.3. The results are reported in table 6.1. The bandwidth of the lowpass filter is digitally programmable with three settings. The transfer function of the filter for each setting was measured by sweeping the frequency of a  $1 \, mV$  test signal at the preamplifier input and measuring the magnitude at the input and output of the filter at the downconverted frequency. For example, if the frequency of the test signal at the preamplifier input was  $2.4 \, MHz$ , the magnitude of the frequency component at  $2500 - 2400 = 100 \, kHz$  was measured.



Figure 6.5: Frequency response of the lowpass filter for three bandwidth settings:  $f_c = 195 kHz$ ,  $f_c = 510 kHz$  and  $f_c = 1.85 MHz$ .

The transfer functions for each bandwidth setting are shown in figure 6.5; the respective bandwidths are 195kHz, 510kHz and 1.85MHz. While the first two settings are close to the simulated values (200kHz and 500kHz), the third bandwidth differs by 550kHz from the ideal value (1.3MHz), which is due to process variations. The rolloff is between 23 - 25.5dB/oct, which is smaller than the simulated value for a sixth order filter ($\sim 36dB/oct$). This is because measurements were taken close to the cutoff due to the limited bandwidth of the measurement device (PicoScope 5442B) and preamplifier. Furthermore, device or measurement instrument parasitics may affect the measured rolloff.
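The asymptotic rolloff figures can be related to filter order with a short calculation. The sketch below assumes an all-pole lowpass response, for which each pole contributes 20 dB/decade (about 6.02 dB/octave) far above the cutoff. The helper names are illustrative.

```python
import math

DB_PER_OCTAVE_PER_POLE = 20.0 * math.log10(2.0)   # about 6.02 dB

def asymptotic_rolloff(order):
    """Stopband slope of an all-pole lowpass, in dB/decade and dB/octave."""
    return 20.0 * order, DB_PER_OCTAVE_PER_POLE * order

def effective_order(per_octave_db):
    """Order whose asymptotic slope would equal a given dB/octave figure."""
    return per_octave_db / DB_PER_OCTAVE_PER_POLE

dec6, oct6 = asymptotic_rolloff(6)
measured_orders = [effective_order(s) for s in (23.0, 25.5)]
print(f"sixth order: {dec6:.0f} dB/decade, {oct6:.1f} dB/octave")
print("effective order of measured slopes:", [round(n, 2) for n in measured_orders])
```

The measured slopes correspond to an effective order of roughly four, which is in keeping with measurements taken close to the cutoff, where the response has not yet reached its asymptote.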



Figure 6.6: Transient plots of the I/Q envelope signals from 11 chips overlaid against the original RF signal.

## 6.2.5 Transient Analysis

Transient analysis was used to validate the functionality of the AFE as a whole and the robustness of the design to process variations. The bandwidth of the filter was set to 1.85 MHz, and a raw RF ultrasound signal was used as the input to the AFE. The differential I/Q output signals, $I^+$, $I^-$, $Q^+$ and $Q^-$, were recorded during the reflection period, and the I/Q envelope was calculated as follows:

$$\text{Envelope} = \sqrt{(I^+ - I^-)^2 + (Q^+ - Q^-)^2} \tag{6.1}$$

Measured results for 11 different ICs were obtained and overlaid against the original RF signal, as shown in figure 6.6. The degree of transient variance is minimal, indicating that the design is robust to process variations.
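The envelope calculation in equation (6.1) can be sketched as follows. The sample values are synthetic and purely illustrative: a constant 0.5 V envelope with rotating phase, riding on a 1.6 V common-mode level. The function name `iq_envelope` is hypothetical and this is not the MATLAB post-processing script used for the measurements.

```python
import math

def iq_envelope(i_pos, i_neg, q_pos, q_neg):
    """Envelope of differential I/Q samples, following equation (6.1)."""
    return [
        math.hypot(ip - im, qp - qm)
        for ip, im, qp, qm in zip(i_pos, i_neg, q_pos, q_neg)
    ]

# Synthetic test data: the subtraction removes the common-mode level.
v_cm, amp = 1.6, 0.5
phases = [k * math.pi / 8 for k in range(16)]
i_pos = [v_cm + 0.5 * amp * math.cos(p) for p in phases]
i_neg = [v_cm - 0.5 * amp * math.cos(p) for p in phases]
q_pos = [v_cm + 0.5 * amp * math.sin(p) for p in phases]
q_neg = [v_cm - 0.5 * amp * math.sin(p) for p in phases]

envelope = iq_envelope(i_pos, i_neg, q_pos, q_neg)
print([round(e, 3) for e in envelope])
```

Because each differential pair is subtracted before squaring, the result is independent of the common-mode voltage.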

| Donon | Voor | Drogogg      | Target | AFE                | Power/          | Preamplifier    |                |                |                       |
|-------|------|--------------|--------|--------------------|-----------------|-----------------|----------------|----------------|-----------------------|
| raper | Tear | rrocess      | Trans- | Components         | channel         | Gain            | Dynamic        | Band-          | Input-                |
|       |      |              | ducer  |                    |                 |                 | Range          | width          | referred              |
|       |      |              |        |                    |                 |                 |                |                | Noise                 |
| This  | 2017 | 0.35 µm      | PZT    | LN-VGA,            | 7.9 <i>mW</i>   | 16.5-           | 67.2 <i>dB</i> | 6.6 <i>MHz</i> | $5.42  nV/\sqrt{H}$   |
| work  |      | CMOS         |        | mixer, PGA,        |                 | -31 dB          |                |                | (2.5MHz)              |
|       |      |              |        | filter             |                 |                 |                |                |                       |
| [7]   | 2017 | 0.18µm       | PZT    | Preamplifier,      | 0.135 <i>mW</i> | 24 <i>dB</i>    | 81 <i>dB</i>   | 9.8 <i>MHz</i> | $5.5  nV/\sqrt{Hz}$   |
|       |      | CMOS         |        | mixed-mode         |                 |                 |                |                | (5MHz)                |
|       |      |              |        | beamformer         |                 |                 |                |                |                       |
| [36]  | 2016 | 0.18µm       | CMUT   | Transimpedance     | 1.4 <i>mW</i>   | 104-            | -              | 8MHz           | $410 fA/\sqrt{Hz}$    |
|       |      | CMOS         |        | amplifier, filter, |                 | 116 <i>dB</i> Ω |                |                | (8MHz)                |
|       |      |              |        | ADC                |                 |                 |                |                |                       |
| [73]  | 2014 | 0.35 µm      | PMUT   | Transimpendance    | 0.8 <i>mW</i>   | 106 <i>dB</i>   | 50 <i>dB</i>   | 40 <i>MHz</i>  | $310 fA/\sqrt{Hz}$    |
|       |      | CMOS         |        | amplifier          |                 |                 |                |                | (20MHz)               |
| [65]  | 2013 | 0.35 µm      | CMUT   | Transimpedance     | 9 <i>mW</i>     | 107 <i>dB</i>   | -              | 25MHz          | -                     |
|       |      | CMOS         |        | amplifier, buffer  |                 |                 |                |                |                       |
| [79]  | 2011 | 0.35 µm      | LiNbO3 | Preamplifier,      | 49.53 <i>mW</i> | 25.8 <i>dB</i>  | -              | 82 <i>MHz</i>  | 2.9 <i>dB</i>         |
|       |      | BiC-         |        | active LP filter   |                 |                 |                |                |                       |
|       |      | MOS          |        |                    |                 |                 |                |                |                       |
| [74]  | 2009 | 90 <i>nm</i> | CMUT   | Transimpedance     | 598 µW          | 18.9 <i>dB</i>  | -              | 45 MHz         | $42 nV/\sqrt{Hz}$     |
|       |      | CMOS         |        | amplifier          |                 |                 |                |                | (30MHz)               |
| [75]  | 2009 | 1.5 µm       | CMUT   | Transimpedance     | 9 <i>mW</i>     | 106 <i>dB</i>   | -              | 25 MHz         | $280 fA/\sqrt{Hz}$    |
|       |      | CMOS         |        | amplifier          |                 |                 |                |                | (25MHz)               |
| [8]   | 2009 | 0.35 µm      | CMUT   | Preamplifier,      | 16.8 <i>mW</i>  | 5-20 <i>dB</i>  |                | 250 MHz        | -                     |
|       |      | CMOS         |        | VGA, ADC,          |                 |                 |                |                |                       |
|       |      |              |        | memory, filter,    |                 |                 |                |                |                       |
|       |      |              |        | transmitter        |                 |                 |                |                |                       |
| [35]  | 2008 | 1.5 µm       | CMUT   | Transimpedance     | 4  mW           | 112 <i>dB</i>   |                | 10 <i>MHz</i>  | -                     |
|       |      | CMOS         |        | amplifier          |                 |                 |                |                |                       |
| [91]  | 2005 | $0.8\mu m$   | CMUT   | Transimpedance     | 2 mW            | 16 <i>dB</i>    |                | 11 <i>MHz</i>  | $6.5  nV / \sqrt{Hz}$ |
|       |      | CMOS         |        | amplifier          |                 |                 |                |                | (1 <i>MHz</i> )       |
| [78]  | 2004 | 0.35 µm      | PVDF   | Preamplifier       | 20 <i>mW</i>    | 12 <i>dB</i>    |                | 35 MHz         | $6.3  nV / \sqrt{Hz}$ |
|       |      | CMOS         |        |                    |                 |                 |                |                | (10 <i>MHz</i> )      |
| [76]  | 2002 | $0.8\mu m$   | CMUT   | Transimpedance     | -               | 22 <i>dB</i>    |                | 6.5 MHz        | $9.4 nV/\sqrt{Hz}$    |
|       |      | CMOS         |        | amplifier          |                 |                 |                |                | (10MHz)               |

Table 6.2: Performance comparison for various ultrasound analogue front-ends.

### 6.2.6 Performance Comparison

As explained in the introduction, demodulation is not usually carried out in the front-end, except in the case of phase-rotation beamforming [20] and in CW Doppler systems. Typical pulse-echo, B-mode receiver AFEs only incorporate preamplifiers/LNAs, lowpass filters and analogue-to-digital converters (ADCs). Furthermore, there is a great diversity of transducer types (CMUT/PMUT) requiring either voltage-mode or current-mode processing. Different target applications also require different center frequencies, which significantly affects power consumption and bandwidth. Table 6.2 highlights these differences. Power/channel is compared on a system-level basis. Comparative performance specifications are also provided for the preamplifier, as it is the only component common to all of the cited works. Note that the bandwidth of the proposed system is lower than that of the other works because the transducer center frequency is lower (2.5 MHz). The input-referred (IR) noise of the proposed preamplifier/LNA is marginally lower than that of the state-of-the-art PZT receiver in [7]. The noise of the PGA referred to the input of the LNA is  $2.96 nV / \sqrt{Hz}$  at the largest gain value, indicating that the front-end noise is dominated by that of the preamplifier. The DR of the proposed design (67.2 dB) is 13.8 dB lower than that of [7]. This is not a significant issue, as the dynamic range meets the required specification for this application. The power/channel is comparatively higher than in [7, 92, 75, 76], given the operating frequency. This is largely because a sixth-order filter was required to achieve proper image rejection after the mixing operation. However, as discussed in the following section, the overall power consumption per channel (including beamforming) is comparable to that of state-of-the-art mixed-signal front-ends. Furthermore, the area, complexity and cost are reduced with only a single channel.

## 6.3 Quadrature SAB Results

System-level SAB tests were carried out using RF data captured with a Verasonics Vantage  $256^{TM}$  system (central frequency of 2.5MHz) with 96 active elements. The synthetic aperture method was used during transmission/reception, and RF signals were sampled at 10MHz. Two phantoms were used for imaging tests: a wire phantom containing  $8 \times 3$  cross-sectional wires (figure 6.7) and a hyperechoic cyst (figure 6.8). All signals were multiplexed through the AFE, and the resultant I/Q signals were sampled at 2.5MHz. Quadrature SAB was carried out on a Spartan-6 FPGA, as explained in section 4.3, chapter 4. The resultant images were post-processed in MATLAB: a Hilbert transform was used to perform envelope detection, and the image was then logarithmically compressed.
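The post-processing step can be illustrated with a minimal Python sketch (the thesis itself used MATLAB). The function name `envelope_log_compress`, the 50 dB display range and the synthetic test pulse are assumptions made for illustration only; they are not taken from the original processing scripts.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_log_compress(beamformed, dynamic_range_db=50.0):
    """Envelope-detect each scan line (axis 0 = depth) and log-compress.

    The 50 dB display range is an assumed value, not one from the thesis.
    """
    # Magnitude of the analytic signal gives the envelope of each column.
    envelope = np.abs(hilbert(beamformed, axis=0))
    envelope /= envelope.max()                      # normalise so the peak is 0 dB
    floor = 10.0 ** (-dynamic_range_db / 20.0)      # clip to the display range
    return 20.0 * np.log10(np.maximum(envelope, floor))

# Synthetic example: two scan lines holding a Gaussian pulse at 2.5 MHz,
# sampled at 10 MHz; the second line has half the amplitude of the first.
fs, fc = 10e6, 2.5e6
t = np.arange(512) / fs
pulse = np.exp(-((t - 25.6e-6) / 1e-6) ** 2) * np.cos(2 * np.pi * fc * t)
lines = np.stack([pulse, 0.5 * pulse], axis=1)
image_db = envelope_log_compress(lines)
print(image_db.shape, image_db.max(), image_db.min())
```

Because the Hilbert transform is linear, the half-amplitude line peaks about 6 dB below the full-amplitude line after compression.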

In figure 6.7, B-mode images of the phantom are compared for RF-domain beamforming and quadrature beamforming. The normalised root-mean-square error (NRMSE) may be used as a quantitative measure of image quality. The NRMSE is computed on a column-wise (scan-line) basis by comparing each pixel in the RF-beamformed image,  $g_{j,k}$ , to that of the quadrature image,  $f_{j,k}$ , as follows:

$$NRMSE = \frac{1}{K} \sum_{k=1}^{K} \frac{\sqrt{\frac{1}{J} \sum_{j=1}^{J} (f_{j,k} - g_{j,k})^2}}{\max(g_{j,k}) - \min(g_{j,k})}$$
(6.2)

where  $\max(g_{j,k})$  and  $\min(g_{j,k})$  represent the maximum and minimum values of each column in  $g_{j,k}$  respectively. For  $N_t = 48$ , the NRMSE is 12.5%. When decreasing the number of transmissions to 16, the NRMSE increases to 16.5% due to a reduction in the SNR caused by larger sidelobes and increased speckle noise. The reduction in image quality is qualitatively evident in figure 6.7a and in the lateral



Figure 6.7: Images of a phantom containing  $8 \times 3$  cross-sectional wires. In (a) and (b), quadrature beamforming is carried out with  $i_{max} = 8$  and 48 transmit elements respectively (f# = 2.5,  $N_a = 1$ ). In (c) beamforming is carried out in the RF domain with 48 elements (f# = 2.5).


Figure 6.8: Images of a phantom containing a hyperechoic cyst. In (a), beamforming is carried out in the RF domain with 48 transmissions and F# = 2.5. In (b)-(c), quadrature beamforming is carried out with 48 and 16 transmissions respectively (F# = 2.5).



Figure 6.9: Lateral beamplots for 3, 8 and 48 transmitter positions (f # = 2.5, z = 665 mm).

beamplots in figure 6.9, where the greyscale magnitude is plotted against lateral width. Again, observe that decreasing  $i_{max}$  leads to a reduction in the SNR and poorer lateral resolution. Furthermore, the SNR and resolution decrease as a function of depth. However, according to equation (4.9), there is an inverse relationship between  $i_{max}$  and the frame rate. Decreasing  $i_{max}$  from 48 to 8 elements leads to an increase in frame rate from 2.5Hz to 7.5Hz. Similarly, for a constant frame rate, a sixfold reduction in the number of transmissions leads to a proportional decrease in power consumption, since the system clock frequency may be decreased, or the area/logic capacity reduced, as fewer pixels are calculated in parallel.

The f# is also an important parameter affecting the width of the main lobe, and thus the lateral resolution. In figure 6.10, it can be seen that the main lobe width increases as f# increases from 0.5 to 3. Thus, better focusing is achieved with a



Figure 6.10: Lateral beamplots for f # = 0.5, 2 and 3 (z = 665 mm,  $i_{max} = 48$ ).

smaller *f*#. The *f*# also affects the relative contrast, as shown in figure 6.11. For RF and quadrature beamforming, the contrast increases to a maximum of 40 dB and 40.5 dB respectively at *f*# = 2.2, after which it gradually decreases.

A comparison of various state-of-the-art beamforming architectures is provided in table 6.3, which does not include software-level beamformers. The key advantage of the proposed architecture is a reduced number of analogue receiver (Rx) channels and significantly reduced system complexity. Delay resolution is lower than that of prior art due to a relatively low oversampling factor. Future work could involve increasing the delay resolution by increasing the interpolation factor so as to achieve better focusing. The natural tradeoff for decreased system complexity is frame rate: for  $i_{max} = 16$ , the frame rate is 3 to 4 times lower than prior art. For a single analogue channel ( $N_a = 1$ ), the maximum frame rate is limited by the reflection

Table 6.3: Performance comparison for beamforming architectures targeting various applications.

| Paper | Technology | Year | Channels | Beamformer Architecture | Center Freq | Delay Resolution | Frame Rate | Power/channel* | Target Application |
|-------|------------|------|----------|-------------------------|-------------|------------------|------------|----------------|--------------------|
| This work | Spartan-6<sup>TM</sup> FPGA | 2017 | 64 (1 Rx channel) | Quadrature, single channel, digital SAB | 2.5 MHz | 100 ns | 7 Hz | 4.1 mW | Miniature, B-mode |
| [7] | 0.18 μm CMOS | 2017 | $32 \times 32$ | Analogue, sample-and-hold sub-array beamformer | 5 MHz | 30 ns | 44.4 vol/s | 0.27 mW | 3D imaging probe |
| [6] | 0.13 μm NAND | 2015 | 32 | Parallel digital delay-and-sum | 3.5 MHz | 17.8 ns | 30 Hz | 9.5 mW | Portable ultrasound |
| [93] | 0.13 μm CMOS | 2015 | 32 | Analogue, sample-and-hold sub-array beamformer | 1.25 MHz | 1.75-2.5 ns | - | 8.9 mW | Portable/3D ultrasound |
| [4, 5] | 0.35 μm CMOS | 2012 | 8 | Parallel delay-and-sum using analogue delay cells | 30-50 MHz | 1.75-2.5 ns | - | 8.4 mW | Intravascular ultrasound |
| [3] | Spartan-3 FPGA | 2012 | 32 (16 Rx channels) | Pseudo-dynamic, extended aperture, digital beamformer | 3.5 MHz | | 30 Hz | | Portable ultrasound |

\*Beamformer only.




Figure 6.11: Contrast relative to -50 dB for various f# values (z = 665 mm,  $i_{max} = 48$ ).

or acquisition time. The frame rate can only be increased for the same  $i_{max}$  if the number of parallel analogue channels ( $N_a$ ) is increased to 2 or more at the expense of increased power consumption.

In this work, images were formed using the fundamental (2.5 MHz). However, *second harmonic imaging* may also be carried out by doubling the mixing frequency to 5MHz and filtering the result. As discussed in chapter 2, this improves the contrast of the image. However, second harmonic imaging results were not obtained here, as the sampling rate of the ultrasound dataset was insufficiently high. Future work could involve obtaining a second harmonic image using RF data sampled at a higher frequency (e.g. 20MHz).

## 6.4 FRI Compressive SAB Results

#### 6.4.1 Ultrasound Signal Reconstruction

In order to validate the functionality of the compressive sensing algorithm in hardware (prior to beamforming), experiments were carried out using a single A-mode signal derived from the database described above. The A-line signal may be modeled as a 1D stream of Gaussian pulses with width  $\sigma = 3 \times 10^{-7}$ . After demodulating and filtering below the Nyquist frequency, each I/Q signal is sampled at frequency  $f_s$  and then reconstructed using the method in [46]. The results of the experiment are shown in figure 6.12. The number of samples per time window  $\tau$  is N = 2L, where L is the number of Gaussian pulses per period. Three experiments were carried out for each cutoff frequency in the AFE. The parameters for these experiments are defined in table 6.4.

The original RF sampling frequency is 10MHz. Thus, the sampling rate (for both I and Q combined) is reduced by a factor of 12.8, 4.9 and 1.4 for experiments 1, 2 and 3 respectively. For the quadrature method described above, the combined I/Q sampling rate is 5MHz, i.e. a reduction by a factor of 2 from the RF sampling rate. Thus, for L = 60, there is no advantage in using compressive sensing, as the sampling rate is higher than the ideal I/Q Nyquist sampling rate. There is thus a tradeoff between

Table 6.4: Parameters for FRI compressive sensing experiments demonstrating low-rate sampling and reconstruction.

|       | Experiment 1              | Experiment 2              | Experiment 3              |
|-------|---------------------------|---------------------------|---------------------------|
|       | $(f_c = 195  kHz, F = 4)$ | $(f_c = 510  kHz, F = 4)$ | $(f_c = 1.85 MHz, F = 4)$ |
| $f_s$ | 390 <i>kHz</i>            | 1.02 <i>MHz</i>           | 3.7 <i>MHz</i>            |
| L     | 7                         | 17                        | 60                        |



Figure 6.12: In (a), the original RF signal is overlaid against the ideal I/Q envelope generated in software. Low-rate samples are obtained using the hardware front-end and the I/Q envelope is reconstructed using FRI CS with the following parameters: (b) L = 7 (c) L = 17 (d) L = 60.

L, the reconstruction accuracy and the sampling rate. To achieve higher accuracy, both L and F should be large, but this results in an increased sampling rate and power consumption. The circuit topology realising the CS framework should be tuneable to maximise the performance and minimise the sampling rate.
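The reduction factors quoted above follow directly from table 6.4. The short Python sketch below reproduces the arithmetic, assuming that the quoted factors compare the 10 MHz RF rate against the combined I and Q rate ($2 f_s$); the variable and function names are illustrative only.

```python
RF_RATE_HZ = 10e6  # original RF sampling rate

# Per-channel low-rate sampling frequencies f_s from table 6.4.
experiments = {
    "Experiment 1 (L = 7)": 390e3,
    "Experiment 2 (L = 17)": 1.02e6,
    "Experiment 3 (L = 60)": 3.7e6,
}

def reduction_factor(fs_hz, rf_rate_hz=RF_RATE_HZ):
    """Ratio of the RF sampling rate to the combined I + Q rate (2 * f_s)."""
    return rf_rate_hz / (2.0 * fs_hz)

factors = {name: reduction_factor(fs) for name, fs in experiments.items()}
quadrature_factor = reduction_factor(2.5e6)  # Nyquist quadrature sampling

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x")
print(f"Quadrature SAB: {quadrature_factor:.1f}x")
```

Any experiment whose factor falls below the quadrature value of 2 samples faster than plain I/Q sampling, which is the case for L = 60.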

#### 6.4.2 B-Mode Imaging

Finally, the compressive SAB architecture was evaluated by producing a full B-mode image using the RF dataset. The RF signals were sequentially demodulated using the cutoff frequencies in experiments 1 and 2, corresponding to *L* values of 7 and 17. The I and Q components were reconstructed, and beamforming was carried out in MATLAB using the quadrature SAB algorithm. The lateral beamplots and reconstructed images are presented in figures 6.13 and 6.14 respectively. For L = 7, the SNR and image contrast are poor since fewer Gaussian pulses are used to reconstruct the I/Q signals. Increasing the number of Gaussian pulses *L* increases the reconstruction accuracy and thus the lateral resolution and image quality. However, increasing *L* eventually pushes the low-rate sampling frequency above the Nyquist quadrature sampling frequency.

The NRMSE may be used to quantify the image quality in order to compare it to that of the quadrature SAB architecture. Before the NRMSE is calculated, the systematic time delay error introduced by the filter is eliminated by time shifting the data in order to align it with the RF data. For L = 7 and L = 17, the NRMSE is 26% and 22% respectively, in comparison with the "ideal" RF case. Recall that the NRMSE for the quadrature architecture was 13%, i.e. 9 percentage points lower than the L = 17 result for the same  $i_{max}$  value.
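As an illustration of the metric in equation (6.2), the following Python sketch computes the column-wise NRMSE between a reference image $g$ and a test image $f$. It assumes both images are already time-aligned and stored as arrays with depth along axis 0 and scan lines along axis 1; the function name and the toy data are hypothetical.

```python
import numpy as np

def nrmse(f, g):
    """Column-wise NRMSE of equation (6.2); g is the reference (RF) image."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    # RMSE over the J pixels of each column k.
    rmse_per_column = np.sqrt(np.mean((f - g) ** 2, axis=0))
    # Normalise by the range of each reference column, then average over K.
    range_per_column = g.max(axis=0) - g.min(axis=0)
    return float(np.mean(rmse_per_column / range_per_column))

# Toy example: a constant offset of 0.1 on columns with ranges 1 and 2.
g_ref = np.array([[0.0, 0.0],
                  [1.0, 2.0]])
f_test = g_ref + 0.1
score = nrmse(f_test, g_ref)
print(f"NRMSE = {100 * score:.1f}%")
```

In the toy example the per-column errors are 0.1/1 and 0.1/2, so the mean is 7.5%.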



Figure 6.13: Lateral beamplots ( $i_{max} = 48$ ,  $f^{\#} = 2.5$ , z = 66.5 mm) demonstrating the effect of L on the lateral resolution and magnitude of the main lobe.

The compressive SAB architecture uses the same analogue hardware components as the quadrature SAB architecture. However, the front-end does not carry out beamforming computations, but merely compresses the signal in the analogue domain, and transmits low-rate samples to a computational back-end for image reconstruction. This effectively lowers the sampling rate, power consumption and data bandwidth requirements of the transmission link. However, the experimental results indicate that *L* should be increased beyond 17 in order to achieve image quality that is comparable to the quadrature SAB case. This in turn increases the bit rate and power consumption of the transmission link. For instance, for L = 17 ( $f_s = 1.05 MHz$ ), the required bit rate is  $2 \times 1.05 \times 10 = 21 Mbps$ , which is 4.7 times lower than the 100 Mbps required to transmit RF samples. Transmission at this bit rate is feasible using a typical 2.4 GHz 802.11g transceiver, for example, which operates up to a maximum of 54 Mbps. Beyond this, the transmission link constrains the frame rate. As discussed before, the frame rate is also



Figure 6.14: Images of a phantom containing  $8 \times 3$  cross-sectional wires. Compressive SAB was carried out with 48 transmit elements (F # = 2.5), and (a) L = 7 and (b) L = 17. In (c) beamforming is carried out in the RF domain with 48 elements.

SAB case). This limits the frame rate to 4Hz, a significant disadvantage of compressive SAB compared to quadrature SAB.
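The link budget quoted above can be checked with a few lines of Python. This is a sketch under one assumption: the factor of 10 in the bit-rate expression is taken to be the number of bits per sample, which is inferred from the quoted 100 Mbps for 10 MHz RF sampling rather than stated explicitly in this section. The names used are illustrative.

```python
BITS_PER_SAMPLE = 10        # assumed: inferred from 10 MHz -> 100 Mbps
WIFI_80211G_MAX_MBPS = 54   # maximum 802.11g rate quoted in the text

def link_bit_rate_mbps(fs_mhz, channels, bits=BITS_PER_SAMPLE):
    """Raw bit rate for `channels` streams sampled at fs_mhz."""
    return channels * fs_mhz * bits

compressive_mbps = link_bit_rate_mbps(1.05, channels=2)   # I and Q, L = 17
rf_mbps = link_bit_rate_mbps(10.0, channels=1)            # raw RF samples
ratio = rf_mbps / compressive_mbps
fits_link = compressive_mbps <= WIFI_80211G_MAX_MBPS

print(f"compressive: {compressive_mbps:.1f} Mbps, RF: {rf_mbps:.1f} Mbps")
print(f"reduction: {ratio:.2f}x, fits 802.11g: {fits_link}")
```

Raising $f_s$ in `link_bit_rate_mbps` shows how quickly a larger *L* consumes the remaining link margin.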

## 6.5 Summary

This chapter reports on the hardware implementation of the SAB receiver, and presents experimental imaging and circuit-level results. First, the experimental setup is described in section 6.1. A custom PCB was designed to interface with the AFE and host the FPGA beamformer. Detailed circuit-level results are presented for each stage in the AFE (preamplifier, mixer, programmable gain amplifier and lowpass filter), together with a full transient analysis for multiple chips. System-level results for the quadrature SAB method are also presented in order to validate the resultant image quality. Results were also obtained for the compressive SAB architecture, and compared against the quadrature architecture. In general, the quadrature SAB approach yields superior image quality, with greater flexibility to adapt the parameters of the imaging algorithm in hardware. Before concluding, we now proceed to discuss an alternative circuit-level topology using a pseudodifferential log-domain demodulator. This topology complements the signal chain presented in the preceding chapters, and provides an alternative means of processing signals in the current domain.

# Chapter 7

# **Log Domain Demodulator**

# 7.1 Introduction

Most ultrasound receivers employ a low-noise preamplifier or transimpedance amplifier in the first stage of the signal processing chain. The design in chapter 5 followed this approach, where well-established voltage-mode signal processing blocks are employed. However, a strong case can be made for the utility of current-input, current-output (current-mode) circuits based upon their enhanced dynamic range, tunability and high operating frequency range in BiCMOS [94, 15, 95]. In this chapter, we explore a current-mode, log-domain circuit architecture which employs *companding*, a well-known and widely used principle in communication systems [96]. Companding systems are a subclass of externally linear internally nonlinear (ELIN) systems [97]. The signal is compressed prior to entering the signal processor, in order to fit within the limited dynamic range of the processor. After processing, the signal is expanded again to occupy a large dynamic range. Log-domain companding circuits exploit the exponential I-V characteristics of BJTs or MOS devices in weak inversion, thereby reducing the voltage swings at internal nodes [95]. This allows for a high dynamic range with low supply voltages [95]. Such circuits operate in accordance with the translinear principle [98] and may be synthesized using a range of systematic methods [99, 95, 100, 101].

In section 7.2, a novel log-domain demodulator design is proposed based upon the "multer" topology in [15]. Input signals are compressed logarithmically and then filtered by a non-linear filtering block. Multiplication of two current signals may be carried out by simply adding their logarithmically compressed base-emitter voltages. The input stage to the low-pass filter is therefore modified to add the base-emitter voltages of two input tones, which results in a single stage providing multiplication and filtering. The topology is therefore termed a "multer" for short, and is an effective means of demodulating AM signals. However, in class-A operation, RF and LO current signals are offset by a DC bias value. Since the circuit implements multiplication of the RF and LO inputs, this leads to unwanted DC terms in the mixing product. We propose a multer-based demodulator using a pseudodifferential class-AB architecture to maximise the dynamic range and avoid multiplying bias currents. The demodulator employs a geometric mean current splitter [102], such that positive and negative portions of the signal are processed separately and combined after demodulation. The circuit is simulated in a commercially available  $0.35 \,\mu m$  BiCMOS technology rather than CMOS. The maximum useful frequency for MOS devices is the transition frequency, which in weak inversion is typically less than a few MHz. BJTs provide a wide-bandwidth capability (GHz), which is suitable for RF ultrasound applications. A key advantage of the proposed topology is that the bandwidth/gain may be tuned electronically, based upon the requirements of the application.

In section 7.3, non-idealities affecting the performance of the circuit are considered, such as finite  $\beta$  gain,  $V_{BE}$  mismatch and parasitic resistance. Finally, simulation results are presented in section 7.4, including a full transient analysis on real ultrasound data.

# 7.2 Current-Mode Analogue Demodulation

For a simple A-line scan, the reflected signal comprises a series of N received echoes given by:

$$R(t) = \sum_{n=1}^{N} R_n(t)$$
(7.1)

The reflected pulse waveform  $R_n(t)$  has an envelope  $Ae^{-\beta t^2}$  modulated at carrier frequency  $\omega_c$ . Demodulation may be accomplished by first mixing the signal,  $I_1$ , with a reference carrier,  $I_2$ , and then filtering the result. In class A operation, the input signals  $I_1$  and  $I_2$  are biased by  $I_{b1}$  and  $I_{b2}$ , such that:

$$I_1 = Ae^{-\beta t^2} \cos(\omega_c t) + I_{b1}$$
(7.2)

$$I_2 = B\cos\left(\omega_c t\right) + I_{b2} \tag{7.3}$$



Figure 7.1: Geometric mean current splitter.

Multiplication yields the following product:

$$\begin{aligned}
I_{1} \times I_{2} &= \left(Ae^{-\beta t^{2}}\cos(\omega_{c}t) + I_{b1}\right) \left(B\cos(\omega_{c}t) + I_{b2}\right) \\
&= ABe^{-\beta t^{2}}\cos(\omega_{c}t)\cos(\omega_{c}t) + \left(Ae^{-\beta t^{2}}I_{b2} + BI_{b1}\right)\cos(\omega_{c}t) + I_{b1}I_{b2} \\
&= \frac{1}{2}ABe^{-\beta t^{2}} + \frac{1}{2}ABe^{-\beta t^{2}}\cos(2\omega_{c}t) + \left(Ae^{-\beta t^{2}}I_{b2} + BI_{b1}\right)\cos(\omega_{c}t) + I_{b1}I_{b2}
\end{aligned} \tag{7.4}$$

The last two terms in (7.4) are non-idealities caused by the bias currents  $I_{b1}$  and  $I_{b2}$ . The first term is the desired envelope and the second term is the  $2\omega_c$  image component.
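The decomposition in (7.4) can be verified numerically. The Python sketch below builds biased class A inputs and compares their product against the sum of the four terms. All numerical values (amplitudes, $\beta$, bias currents, time span) are arbitrary, normalised choices made for illustration and do not correspond to the simulated circuit.

```python
import numpy as np

fc = 2.5e6                         # carrier frequency in Hz
wc = 2 * np.pi * fc
t = np.linspace(-4e-6, 4e-6, 4001)

# Hypothetical, normalised signal parameters.
A, B, beta = 1.0, 1.0, 1e12
Ib1, Ib2 = 1.5, 1.5                # class A bias currents (normalised)

envelope = A * np.exp(-beta * t ** 2)
I1 = envelope * np.cos(wc * t) + Ib1   # equation (7.2)
I2 = B * np.cos(wc * t) + Ib2          # equation (7.3)
product = I1 * I2

# The four terms on the last line of (7.4).
baseband = 0.5 * B * envelope                          # desired envelope
image = 0.5 * B * envelope * np.cos(2 * wc * t)        # 2*wc image component
bias_carrier = (envelope * Ib2 + B * Ib1) * np.cos(wc * t)
bias_dc = Ib1 * Ib2                                    # unwanted DC offset

max_error = np.max(np.abs(product - (baseband + image + bias_carrier + bias_dc)))
print(f"max deviation from (7.4): {max_error:.2e}")
print(f"DC offset from bias currents: {bias_dc}")
```

With these values the DC offset (2.25) is larger than the peak of the wanted baseband term (0.5), which illustrates why the bias products are undesirable.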

In class B operation, the circuit does not require bias currents, as the bipolar input current is decomposed into two positive currents,  $I_+$  and  $I_-$ , for separate processing. This may be realised using a translinear geometric-mean current splitter [102], as shown in figure 7.1, which produces the two positive output currents from the input current  $I_{in}$ :

$$I_{+,-} = \pm \frac{I_{in}}{2} + \sqrt{\left(\frac{I_{in}}{2}\right)^2 + I_q^2}$$
(7.5)

where  $I_q$  is the quiescent current of  $I_{+,-}$ . This equation may be realised under the following two conditions:

$$I_{in} = I_{+} - I_{-} \tag{7.6}$$

$$I_q^2 = I_+ I_- \tag{7.7}$$
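A quick numerical check shows that the outputs defined by (7.5) satisfy both conditions (7.6) and (7.7) and remain strictly positive. This Python sketch models the ideal splitter equation only, not the transistor-level circuit; the function name, the 1 µA quiescent current and the input range are hypothetical.

```python
import numpy as np

def geometric_mean_split(i_in, i_q):
    """Ideal geometric-mean current splitter, equation (7.5)."""
    root = np.sqrt((i_in / 2.0) ** 2 + i_q ** 2)
    i_plus = i_in / 2.0 + root
    i_minus = -i_in / 2.0 + root
    return i_plus, i_minus

i_q = 1e-6                                  # assumed quiescent current (A)
i_in = np.linspace(-10e-6, 10e-6, 201)      # bipolar input current (A)
i_plus, i_minus = geometric_mean_split(i_in, i_q)

difference_ok = np.allclose(i_plus - i_minus, i_in, rtol=1e-9, atol=1e-18)
product_ok = np.allclose(i_plus * i_minus, i_q ** 2, rtol=1e-9, atol=0.0)
all_positive = bool(np.all(i_plus > 0) and np.all(i_minus > 0))
print(difference_ok, product_ok, all_positive)
```

At zero input both outputs equal $I_q$, which is why $I_q$ is referred to as the quiescent current.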

Therefore, if the bipolar currents,  $I_1$  and  $I_2$ , are split into positive and negative components, multiplication yields:

$$\begin{aligned}
I_{1} \times I_{2} &= (I_{1+} - I_{1-}) (I_{2+} - I_{2-}) \\
&= I_{1+}I_{2+} - I_{1+}I_{2-} + I_{1-}I_{2-} - I_{1-}I_{2+} \\
&= (I_{1+}I_{2+} + I_{1-}I_{2-}) - (I_{1+}I_{2-} + I_{1-}I_{2+})
\end{aligned} \tag{7.8}$$

Practically, the four terms in (7.8) may be implemented using four current multipliers, one for each term, and summing their output currents, as illustrated in figure 7.2. The two sums of products are filtered using two identical class A log-domain filters, and then subtracted by means of a current sink. This topology is termed a *pseudodifferential log-domain demodulator*. A full analysis of the circuit-level dynamics is presented below.
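The behaviour described by (7.8) can be sketched numerically: both inputs are split into positive components, the two sums of products are formed, and their difference recovers the product of the unbiased bipolar signals. The Python code below is an idealised model (no filtering, ideal splitters and multipliers), and all signal values are hypothetical.

```python
import numpy as np

def split(i_in, i_q):
    """Ideal geometric-mean splitter of equation (7.5)."""
    root = np.sqrt((i_in / 2.0) ** 2 + i_q ** 2)
    return i_in / 2.0 + root, -i_in / 2.0 + root

fc = 2.5e6
wc = 2 * np.pi * fc
t = np.linspace(-4e-6, 4e-6, 4001)
i_q = 0.1e-6                                               # assumed quiescent current

rf = 1e-6 * np.exp(-1e12 * t ** 2) * np.cos(wc * t)        # bipolar echo, no bias
lo = 1e-6 * np.cos(wc * t)                                 # bipolar reference carrier

rf_p, rf_m = split(rf, i_q)
lo_p, lo_m = split(lo, i_q)

# The two brackets of (7.8); each is a sum of strictly positive products.
same_sign = rf_p * lo_p + rf_m * lo_m
cross_sign = rf_p * lo_m + rf_m * lo_p
mixed = same_sign - cross_sign

matches_ideal = np.allclose(mixed, rf * lo, rtol=1e-9, atol=1e-20)
print(matches_ideal, same_sign.min() > 0, cross_sign.min() > 0)
```

Because each bracket is strictly positive, it can be processed by a class A filter, while the subtraction removes the common-mode content so that no bias products appear in the result.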



Figure 7.2: High level architecture of class AB demodulator / CS kernel.

#### 7.2.1 Pseudodifferential Demodulator Implementation

The "multer" topology proposed in [15] provides a compact means of multiplying and then filtering currents. By superposition, each term in (7.8) may be implemented separately using a multer block. However, further simplification may be achieved by combining two multer blocks to form the new circuit topology in figure 7.3.

The transfer function for this circuit may be derived using the Bernoulli Cell Formalism [95]. We begin by recognising that the BJT collector current,  $I_{C1}$ , is exponentially related to the base-emitter voltage, as described by the Shockley equation for transistor  $Q_6$ :

$$I_{C1} = I_{S}e^{\frac{V_{BE}}{V_{T}}} = I_{S}e^{\frac{V_{B}-V_{E}}{V_{T}}}$$
(7.9)

where  $I_S$  is the diode's reverse saturation current,  $V_{BE}$  is the base-emitter voltage,  $V_T$  is the thermal voltage (approximately  $25 \, mV$  at room temperature) and  $I_{C1}$  is the



Figure 7.3: Log-domain demodulator circuit which multiplies currents  $I_1^v$ ,  $I_2^v$  and  $I_1^L$ ,  $I_2^L$ , sums their products and filters the result.

collector current. Differentiating yields:

$$\dot{I_{C1}} = I_{C1} \frac{\dot{V_B} - \dot{V_E}}{V_T} = I_{C1} \frac{\dot{V_B}}{V_T} - I_{C1} \frac{I_{cap}}{C_1 V_T}$$
(7.10)

where  $I_{cap}$  is the current through  $C_1$ . Writing KCL at node 1 gives:

$$I_{C1} + v = I_{d1} + I_{cap} \tag{7.11}$$

Rearranging (7.11) for  $I_{cap}$  and substituting into (7.10) yields:

$$\dot{I_{C1}} = I_{C1} \frac{\dot{V_B}}{V_T} - \frac{I_{C1} \left( I_{C1} + v - I_{d1} \right)}{C_1 V_T}$$
(7.12)

$$\dot{I_{C1}} - \left[\frac{\dot{V_B}}{V_T} + \frac{I_{d1} - v}{C_1 V_T}\right] I_{C1} + \frac{I_{C1}^2}{C_1 V_T} = 0$$
(7.13)

$$\dot{T}_1 + \left[\frac{\dot{V}_B}{V_T} + \frac{I_{d1} - v}{C_1 V_T}\right] T_1 - \frac{1}{C_1 V_T} = 0$$
(7.14)

where the substitution  $I_{C1} = 1/T_1$  is made. Now,  $V_B$  is the sum of the logarithmically compressed currents  $I_1^L$  and  $I_2^L$ :

$$V_B = V_T \ln\left(\frac{I_1^L}{I_S}\right) + V_T \ln\left(\frac{I_2^L}{I_S}\right) = V_T \ln\left(\frac{I_1^L I_2^L}{I_S^2}\right)$$
(7.15)

$$\Rightarrow \dot{V}_B = V_T \frac{d}{dt} \left\{ \ln \left( I_1^L I_2^L \right) \right\}$$
(7.16)

Substituting (7.16) into (7.14) and rearranging gives:

$$C_1 V_T \frac{d}{dt} \left\{ \ln \left( T_1 I_1^L I_2^L \right) \right\} + (I_{d1} - v) = \frac{1}{T_1}$$
(7.17)

The second Bernoulli cell (BC) is characterised by a BJT with collector current  $I_{C2} = 1/T_2$  and a grounded capacitor,  $C_2$ . Using the procedure above, one may derive the dynamical equation for the second BC:

$$C_2 V_T \frac{d}{dt} \left\{ \ln \left( T_2 T_1 I_1^L I_2^L \right) \right\} + I_{d2} = \frac{1}{T_2}$$
(7.18)

A new set of state variables is now defined as follows:

$$w_1 = T_1 I_1^L I_2^L \tag{7.19}$$

$$w_2 = T_2 T_1 I_1^L I_2^L = T_2 w_1 \tag{7.20}$$

Recognising that  $\frac{d}{dt} \{\ln w\} = \frac{\dot{w}}{w}$ , and substituting the state variables into equations (7.17) and (7.18), we obtain the following system of linear ODEs:

$$C_1 V_T \dot{w}_1 + (I_{d1} - v) w_1 = I_1^L I_2^L \tag{7.21}$$

$$C_2 V_T \dot{w}_2 + I_{d2} w_2 = w_1 \tag{7.22}$$

The translinear loops  $Q_0Q_1Q_2Q_6Q_7Q_8$  and  $Q_8Q_7Q_6Q_3Q_4Q_5$  are described by the following equations:

$$I_1^L I_2^L v = I_1^v I_2^v \frac{1}{T_1}$$
(7.23)

$$I_1^L I_2^L I_{o1} = \frac{1}{T_1} \frac{1}{T_2} I_{out}$$
(7.24)

Hence, one may rewrite equations (7.21) and (7.22) as:

$$C_1 V_T \dot{w}_1 + I_{d1} w_1 = I_1^L I_2^L + I_1^v I_2^v \tag{7.25}$$

$$C_2 V_T \dot{w}_2 + I_{d2} w_2 = w_1 \tag{7.26}$$

Applying the Laplace Transform, solving for  $W_1(s)$  and  $W_2(s)$ , and substituting in (7.24) finally yields the following lowpass transfer function:

$$I_{out}(s) = \frac{\frac{I_{o1}}{C_1 C_2 V_T^2}}{\left(s + \frac{I_{d1}}{C_1 V_T}\right) \left(s + \frac{I_{d2}}{C_2 V_T}\right)} \mathscr{L}\left\{I_1^L I_2^L + I_1^v I_2^v\right\}$$
(7.27)

where  $\mathscr{L}\left\{I_{1}^{L}I_{2}^{L}+I_{1}^{v}I_{2}^{v}\right\}$  is the input to the filter. It is evident that the filter comprises two first order, cascaded lowpass stages. The bandwidth of each stage may be adjusted independently by varying the currents  $I_{d1}$  and  $I_{d2}$ . The gain may be adjusted by varying  $I_{o1}$ . Note that the input  $\mathscr{L}\left\{I_{1}^{L}I_{2}^{L}+I_{1}^{v}I_{2}^{v}\right\}$  corresponds to the first bracket in (7.8). The second bracket may be implemented by duplicating the circuit in figure 7.3. The output current of one circuit is added and the current from the other circuit is subtracted by means of a current sink. The entire circuit-level implementation is presented in figure 7.4.
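To make the tuning behaviour of (7.27) concrete, the following Python sketch evaluates the transfer function for one set of component values. The capacitances and bias currents are hypothetical examples, not the values used in the simulated circuit. Note that the input is a product of currents, so the DC gain $I_{o1}/(I_{d1} I_{d2})$ has units of $A^{-1}$.

```python
import numpy as np

V_T = 25e-3  # thermal voltage at room temperature (V)

def multer_response(f_hz, i_d1, i_d2, i_o1, c1, c2, v_t=V_T):
    """Evaluate the cascaded first-order lowpass response of (7.27) at f_hz."""
    s = 2j * np.pi * np.asarray(f_hz, dtype=float)
    pole1 = i_d1 / (c1 * v_t)            # rad/s, set by I_d1
    pole2 = i_d2 / (c2 * v_t)            # rad/s, set by I_d2
    gain = i_o1 / (c1 * c2 * v_t ** 2)   # numerator of (7.27), set by I_o1
    return gain / ((s + pole1) * (s + pole2))

# Hypothetical values: 10 pF capacitors and 1 uA bias currents.
c1 = c2 = 10e-12
i_d1 = i_d2 = i_o1 = 1e-6

pole_hz = i_d1 / (2 * np.pi * c1 * V_T)
dc_gain = abs(multer_response(0.0, i_d1, i_d2, i_o1, c1, c2))
attenuation = abs(multer_response(pole_hz, i_d1, i_d2, i_o1, c1, c2)) / dc_gain

print(f"pole frequency: {pole_hz / 1e3:.1f} kHz")
print(f"DC gain: {dc_gain:.3e} 1/A, relative gain at the pole: {attenuation:.3f}")
```

With two coincident poles the response is 6 dB down at the pole frequency; changing `i_d1` or `i_d2` moves the corresponding pole in proportion, which is the electronic tunability noted above.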

#### 7.2.2 Biquadratic Implementation

Instead of cascading two first order lowpass filters, a biquadratic architecture may also be used, as shown in figure 7.5. The main advantage of a biquad is that the Q-factor may be altered to achieve a faster response. Current-mode topologies are



Figure 7.4: Log domain demodulator circuit showing the current multiplier, second order companding filters and current sink. The currents  $I_{1,2}^L$  and  $I_{1,2}^H$  are derived from two current splitters.

advantageous in this regard, as the gain, natural frequency and *Q*-factor may easily be adjusted by tuning currents, as the following analysis shows.

The derivation for the transfer function for a biquad proceeds in a similar fashion to the derivation in section 7.2.1, where  $I_{d1}$  is replaced with the current *u*:

$$C_1 V_T \dot{w}_1 + (u - v) w_1 = I_1^L I_2^L \tag{7.28}$$

$$C_2 V_T \dot{w}_2 + I_{d2} w_2 = w_1 \tag{7.29}$$

The translinear loops  $Q_6Q_7Q_8Q_9$ ,  $Q_0Q_1Q_2Q_{10}Q_{11}Q_{12}$  and  $Q_{12}Q_{11}Q_{10}Q_3Q_4Q_5$  are described by the following equations:

$$u I_{C2} = u\frac{1}{T_2} = I_{oz}^2 \tag{7.30}$$

$$I_1^L I_2^L v = I_1^v I_2^v \frac{1}{T_1}$$
(7.31)

$$I_1^L I_2^L I_{o1} = \frac{1}{T_1} \frac{1}{T_2} I_{out}$$
(7.32)

Hence, one may rewrite equations (7.28) and (7.29) as:

$$C_1 V_T \dot{w}_1 + I_{oz}^2 w_2 = I_1^L I_2^L + I_1^v I_2^v$$
(7.33)

$$C_1 V_T \dot{w}_2 + I_{d2} w_2 = w_1 \tag{7.34}$$

Applying the Laplace Transform, and solving for  $W_2(s)$  and substituting in (7.32) finally yields the following lowpass biquadratic transfer function:

$$I_{out}(s) = \frac{\frac{I_{o1}}{C_1 C_2 V_T^2}}{s^2 + \frac{I_{d2}}{C_2 V_T} s + \frac{I_{o2}^2}{C_1 C_2 V_T^2}} \mathscr{L}\left\{I_1^L I_2^L + I_1^v I_2^v\right\}$$
(7.35)



Figure 7.5: Biquadratic log-domain demodulator circuit. The circuit multiplies currents  $I_1^v$ ,  $I_2^v$  and  $I_1^L$ ,  $I_2^L$ , sums their products and filters the result by means of a biquadratic lowpass filter. Note that  $\omega_o$  and Q may be adjusted independently using the currents  $I_{o1}$ ,  $I_{d2}$  and  $I_{oz}$ .

where  $\mathscr{L}\left\{I_1^L I_2^L + I_1^v I_2^v\right\}$  is the input corresponding to the first bracket in (7.8). It is useful to compare (7.35) to the standard form for a lowpass biquadratic transfer function, i.e.:

$$H(s) = \frac{\omega_o^2}{s^2 + \frac{\omega_o}{Q}s + \omega_o^2}$$
(7.36)

Comparing terms, the natural frequency is set by  $I_{oz}$  and, for a given  $I_{oz}$ , the *Q*-factor is set by  $I_{d2}$ , so the two may be adjusted independently. The current  $I_{o1}$  scales the gain without affecting either.
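For reference, matching the coefficients of (7.35) against the standard form (7.36) gives the tuning relations explicitly. This is a short worked step rather than a result quoted from the literature, and it takes  $I_{oz}$  to be the current appearing in the constant term of the denominator, as in (7.33):

$$\omega_o^2 = \frac{I_{oz}^2}{C_1 C_2 V_T^2} \quad\Rightarrow\quad \omega_o = \frac{I_{oz}}{V_T\sqrt{C_1 C_2}}$$

$$\frac{\omega_o}{Q} = \frac{I_{d2}}{C_2 V_T} \quad\Rightarrow\quad Q = \frac{I_{oz}}{I_{d2}}\sqrt{\frac{C_2}{C_1}}$$

$$\left.\frac{I_{out}(s)}{\mathscr{L}\left\{I_1^L I_2^L + I_1^v I_2^v\right\}}\right|_{s=0} = \frac{I_{o1}}{I_{oz}^2}$$

With  $C_1 = C_2$  the *Q*-factor reduces to  $I_{oz}/I_{d2}$ , so  $I_{oz}$  can be chosen first to fix the natural frequency,  $I_{d2}$  then sets *Q*, and  $I_{o1}$  restores the required DC gain.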

#### 7.2.3 Distortion and Noise Characteristics

As already discussed, one of the advantages of log-domain circuits is that they permit the use of as much as 20 dB of additional headroom, and therefore provide a larger dynamic range [89]. The upper limit of the dynamic range is set by an acceptable level of distortion at the output relative to the bias current  $I_o$ . As a rule, the peak current swing should be less than the DC bias current to prevent distortion. In this work, we define the maximum level of total harmonic distortion (THD) at the output to be 1%, and it is usual to quote the distortion at frequencies lower than 1/3 of the cutoff. The magnitude of the input signal may be expressed as the ratio of its peak value to the bias current  $I_o$ . This ratio,  $m = I_{in,peak}/I_o$ , is called the modulation index [103].

The lower bound of the dynamic range, and hence the SNR, is set by the noise floor. Shot noise typically dominates in log-domain circuits, and the quiescent noise current in a transistor may be accurately approximated using the following formula [89, 94]:

$$i_n = \sqrt{2qI_o\Delta f} = \sqrt{2kT\Delta f/r} \tag{7.37}$$

where  $I_o$  is the DC bias current,  $\Delta f$  is the noise measurement bandwidth, *q* is the electronic charge, *k* is Boltzmann's constant, *T* is the absolute temperature in Kelvin, and  $r = kT/(qI_o)$  is the dynamic impedance of the transistor operating with a bias current  $I_o$ . The maximum signal-to-noise ratio is then the ratio of the peak RMS signal to the noise:

$$SNR_{max} = \frac{I_o/\sqrt{2}}{\sqrt{2qI_o\Delta f}} = \frac{\sqrt{I_o}}{2\sqrt{q\Delta f}} = \frac{\sqrt{kT}}{2q\sqrt{r\Delta f}} \tag{7.38}$$

To use a practical example, consider the case where  $I_o = 100\,\mu A$  and  $\Delta f = 1\,MHz$ . Assuming room temperature, the maximum SNR is 82 dB [89]. However, this applies only to a single device. As the simulation results below show, the overall SNR in a complete filter circuit will be lower. A reasonable rough estimate is that the noise power scales with the number of transistors in the signal path. In the example above, if the circuit has 10 transistors, then the SNR will be around 72 dB [89].

Equation (7.38) highlights the tradeoffs inherent in the design of a log-domain circuit. Improvements in dynamic range come with a significant penalty in power consumption: since the SNR is proportional to  $\sqrt{I_o}$ , a 3 dB increase in SNR requires the bias current, and hence the power consumption, to be doubled [89]. Similarly, the SNR is related to the dynamic impedance *r*. In order to improve the SNR by 3 dB without affecting the filter cutoff frequencies, ultimately all capacitor sizes and the power dissipation must be doubled. This is a more serious tradeoff than what is normally encountered in voltage-domain filters [89].
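The figures quoted above can be checked numerically. The following is a minimal sketch of (7.38), assuming shot noise only and treating the "noise power scales with the number of transistors" rule as a simple multiplier. The function name `max_snr_db` and its `n_transistors` argument are illustrative and do not come from the original analysis.

```python
import math

Q_E = 1.602176634e-19  # electronic charge in coulombs


def max_snr_db(i_o, delta_f, n_transistors=1):
    """Shot-noise-limited SNR of (7.38) in dB.

    The noise power is multiplied by n_transistors, which is the rough
    estimate used in the text for a complete circuit (an approximation).
    """
    noise_rms = math.sqrt(2.0 * Q_E * i_o * delta_f * n_transistors)
    signal_rms = i_o / math.sqrt(2.0)  # peak swing equal to the bias current
    return 20.0 * math.log10(signal_rms / noise_rms)


single = max_snr_db(100e-6, 1e6)                    # one device
ten = max_snr_db(100e-6, 1e6, n_transistors=10)     # ten devices in the path
doubled = max_snr_db(200e-6, 1e6)                   # bias current doubled

print(f"single device:  {single:.1f} dB")
print(f"10 transistors: {ten:.1f} dB")
print(f"gain from doubling I_o: {doubled - single:.2f} dB")
```

The first two results agree with the 82 dB and 72 dB values quoted from [89] to within rounding, and the last line shows the 3 dB improvement obtained by doubling the bias current.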



Figure 7.6: (a)  $\beta$ -compensation using an NMOS device to replace the diode connection. (b) The effect of  $\beta$ -compensation on the magnitude of the transfer function. The ideal response has a DC gain of 0dB.

## 7.3 Circuit Non-Idealities

**Beta Compensation** In the analysis above, we have assumed that the transistor current gain,  $\beta$ , is very large. In practice this will not be the case: translinear circuits using BJTs are susceptible to errors caused by finite  $\beta$  values, as the base current for one device must "rob" some of the driving current to another device [104]. The circuit topology in figure 7.3 is susceptible to  $1/\beta$  errors [105], which may be mitigated to an extent using buffered feedback. After replacing the diode connections at the buffering nodes with BJTs (as shown in figure 7.6a), the errors become proportional to  $1/\beta^2$  [105]. As figure 7.6b shows, adding buffering leads to a response that is closer to ideal (a DC gain of 0 dB). Without buffering, the DC gain is higher than expected (4.6 dB).

 $V_{BE}$  **Mismatch** Non-ideal emitter area ratios often occur in implementation, leading to  $V_{BE}$  mismatch. The well-known translinear loop equation is modified to include emitter areas [104]:

$$\prod_{CW} \frac{1}{A_k} \prod_{CW} I_{Ck} = \prod_{CCW} \frac{1}{A_k} \prod_{CCW} I_{Ck}$$
(7.39)

where  $I_{Ck}$  is the collector current and  $A_k$  the emitter area, assuming there are an equal number of clockwise-facing (*CW*) and counterclockwise-facing (*CCW*)  $V_{BE}$  junctions in the loop. Hence:

$$\prod_{CW} I_{Ck} = \lambda \prod_{CCW} I_{Ck} \tag{7.40}$$

$$\lambda = \frac{\prod_{CW} A_k}{\prod_{CCW} A_k} \tag{7.41}$$

where  $\lambda$  is the "area-ratio factor". Equivalently, in terms of the current densities  $J_k = I_{Ck}/A_k$ , (7.39) states that  $\prod_{CW} J_k = \prod_{CCW} J_k$ . Ideally,  $\lambda$  should be as close to unity as possible. However, unintentional errors in emitter area ratios (" $V_{BE}$  mismatch") occur in the implementation.
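To give a feel for the size of this effect, the sketch below evaluates the area-ratio factor of (7.41) for a hypothetical four-transistor loop. The emitter areas are invented for illustration (nominally unit areas with one device 2 % oversized), and the helper name `area_ratio_factor` is not taken from the original work.

```python
import math


def area_ratio_factor(cw_areas, ccw_areas):
    """Area-ratio factor of (7.41): product of CW areas over product of CCW areas."""
    if len(cw_areas) != len(ccw_areas):
        raise ValueError("a translinear loop needs equal numbers of CW and CCW junctions")
    return math.prod(cw_areas) / math.prod(ccw_areas)


# Hypothetical loop: two CW and two CCW junctions, one CW emitter 2 % too large.
lam = area_ratio_factor([1.02, 1.0], [1.0, 1.0])
deviation_percent = (lam - 1.0) * 100.0

print(f"lambda = {lam:.3f} (deviation from unity: {deviation_percent:.1f} %)")
```

Because the areas enter as a product, errors on the same side of the loop multiply while equal errors on opposite sides cancel, which is one reason matched, symmetrical layouts are preferred.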

 $V_{BE}$  mismatch may also be caused by local variations in junction doping or by thermal gradients on the chip. Heat from the output stage may cause a fixed thermal gradient which disrupts the operation of the translinear core. This effect is referred to as "thermal feedback" and leads to signal distortion and a lowering of the low-frequency open-loop gain. To counteract this effect, a highly symmetrical layout is required, and critical pairs of transistors should be arranged as cross-connected quads [104].



Figure 7.7: Bernoulli cell with non-idealities (adapted from [104, 105]).



Figure 7.8: (a) Addition of a trimming current to address current mismatches. (b) The effect of trimming on the transfer function of a second order cascaded lowpass filter with an ideal cutoff frequency  $f_c = 387 kHz$ . Without trimming,  $I_d = 1.265 \mu A$  and  $f_c = 304 kHz$ . By trimming  $I_d$  to  $1.75 \mu A$ , the cutoff frequency  $f_c$  tends towards 387 kHz. With the other parameters fixed, the gain decreases as  $I_d$  increases.

**Current Source Mismatches** The error due to mismatches of the current sources biasing the BJTs is denoted by  $\delta$  in figure 7.7. This error, while not affecting the linearity of the circuit, does alter the  $\omega_c$  value (in the case of a cascaded lowpass filter) or the  $\omega_o$  and *Q* values (in the case of a biquad with feedback). To correct this, small trimming currents may be added at the integrating nodes (see figure 7.8a), or the currents  $I_o$  and  $I_d$  may be tuned to match the ideal response.

**Parasitic Base and Emitter Resistance** Parasitic base and emitter resistances  $(r_B \text{ and } r_E)$  introduce an additional voltage drop in the translinear loop ("excess voltage") leading to a reduction in the DC gain and cutoff frequency, and increased harmonic distortion. These parasitic resistances are depicted in figure 7.7. A full analysis of the harmonic distortion components of lossy log-domain integrators is presented in [105]. The effect of parasitic resistance may be compensated by tuning the bias current from  $I_o$  to  $I_{comp}$  [106]:

$$I_{comp} = \frac{V_T}{V_T - \frac{r_B I_o}{\beta}} I_o \tag{7.42}$$
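As a rough illustration of (7.42), the sketch below computes the compensated bias current for one set of device values. The base resistance, current gain and thermal voltage used here are hypothetical and are not parameters reported for the simulated process.

```python
def compensated_bias(i_o, r_b, beta, v_t=0.02585):
    """Compensated bias current I_comp of (7.42).

    v_t defaults to roughly kT/q at 300 K, which is an assumption.
    """
    excess = r_b * i_o / beta  # voltage dropped across r_B by the base current
    if excess >= v_t:
        raise ValueError("r_B * I_o / beta must be smaller than V_T")
    return v_t / (v_t - excess) * i_o


# Hypothetical device values, for illustration only.
i_o = 10e-6
i_comp = compensated_bias(i_o, r_b=200.0, beta=100.0)

print(f"I_comp = {i_comp * 1e6:.4f} uA "
      f"({(i_comp / i_o - 1.0) * 100.0:.3f} % above I_o)")
```

For these values the correction is well below 1 %, and it grows with both  $r_B$  and  $I_o$ , so the compensation matters most at high bias currents.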

## 7.4 Simulated Performance

The geometric mean splitter in figure 7.1 and log domain demodulator in figure 7.4 were simulated together in  $0.35 \,\mu m$  AMS SiGe BiCMOS technology using Cadence. Transient simulations were carried out using a real ultrasound A-line signal with a 1.7 MHz centre frequency. The signal was demodulated into I/Q components using 1.7 MHz sine and cosine current reference signals. The values of currents  $I_{d1}$  and  $I_{d2}$  required to obtain the desired cutoff frequency (900 kHz) are calculated from the relation between  $I_{d1}$ ,  $I_{d2}$ ,  $C_1$ ,  $C_2$ ,  $V_T$  and  $\omega_c$  as per (7.27):

$$\omega_{c2} = \frac{I_{d1}}{C_1 V_T} \tag{7.43}$$

$$\omega_{c1} = \frac{I_{d2}}{C_2 V_T} \tag{7.44}$$

Based on these relations, the parameters set for this simulation are:  $I_{d1} = I_{d2} = 5.6 \,\mu A$ ,  $C_1 = C_2 = 40 \,pF$ ,  $I_{INDC} = 2 \,\mu A$ . Normalised transient results are shown in figure 7.9. The I/Q envelope was calculated as follows:

$$\text{Envelope} = \sqrt{I_I^2 + I_Q^2} \tag{7.45}$$
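The envelope calculation in (7.45) can be illustrated with a short behavioural model. This is a sketch only: it replaces the log-domain filters with an idealised averaging lowpass, uses a constant-amplitude tone instead of a real A-line, and assumes a 34 MHz sampling rate so that each carrier period contains exactly 20 samples. The function name `iq_envelope` and the test amplitude and phase are illustrative.

```python
import numpy as np


def iq_envelope(rf, fs, f0):
    """Mix rf with cosine and sine references at f0 and return (I, Q, envelope).

    The lowpass stage is an idealised average over the whole record, which
    assumes the record spans an integer number of carrier periods.
    """
    t = np.arange(rf.size) / fs
    i_comp = 2.0 * np.mean(rf * np.cos(2.0 * np.pi * f0 * t))
    q_comp = 2.0 * np.mean(rf * np.sin(2.0 * np.pi * f0 * t))
    return i_comp, q_comp, np.hypot(i_comp, q_comp)


fs, f0 = 34e6, 1.7e6            # 20 samples per carrier period (assumed rate)
t = np.arange(200) / fs         # exactly 10 carrier periods
amplitude, phase = 0.8, 0.6     # hypothetical echo amplitude and phase (rad)
rf = amplitude * np.cos(2.0 * np.pi * f0 * t + phase)

i_comp, q_comp, envelope = iq_envelope(rf, fs, f0)
print(f"I = {i_comp:.3f}, Q = {q_comp:.3f}, envelope = {envelope:.3f}")
```

The recovered envelope equals the tone amplitude regardless of the carrier phase, which is the property that makes the I/Q representation suitable for B-mode envelope detection.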

Table 7.1 summarises the circuit performance under two conditions: (A)  $I_o = 4\mu A$  and (B)  $I_o = 10\mu A$ . The static power consumption, IP3 levels and 1-dB compression points are reported for the entire circuit. A key advantage of the proposed topology is enhanced dynamic range - i.e. 20 dB larger than the voltage mode topology in chapter 5. Furthermore, the bandwidth/gain may be tuned electronically, based upon the requirements of the application. The transition frequency, gain and dynamic range may be increased at the expense of higher power consumption by increasing the bias currents. The bandwidth may be adjusted by tuning the biasing currents or by changing the capacitance, *C*.

However, the topology requires two filters and thus needs 2n integrating capacitors to implement an  $n^{th}$  order filter. It also needs double the number of transistors in comparison to the Class-A log-domain filter (for processing positive and negative components). This could lead to severely increased chip area, especially when large capacitors are needed to implement low frequency poles.
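As a numerical check on the tuning relations (7.43) and (7.44), the sketch below converts between the damping current and the pole frequency of one first order stage. It assumes  $V_T = 25\,mV$  (the value used in the simulations is not stated) and treats the 900 kHz target as the pole frequency of each stage. The function names are illustrative.

```python
import math


def damping_current(f_c, c, v_t=0.025):
    """Bias current I_d = 2*pi*f_c*C*V_T that places a first order pole at f_c."""
    return 2.0 * math.pi * f_c * c * v_t


def pole_frequency(i_d, c, v_t=0.025):
    """Inverse relation: pole frequency in Hz for a given I_d and C."""
    return i_d / (2.0 * math.pi * c * v_t)


i_d = damping_current(900e3, 40e-12)     # target cutoff with C = 40 pF
f_c = pole_frequency(5.6e-6, 40e-12)     # value used in the simulation

print(f"I_d for 900 kHz: {i_d * 1e6:.2f} uA")
print(f"pole frequency for 5.6 uA: {f_c / 1e3:.0f} kHz")
```

Under these assumptions the required current is close to the  $5.6\,\mu A$  used in the simulation. The same relation shows why low frequency poles are costly in area: for a fixed current, halving the pole frequency requires doubling the capacitance.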



Figure 7.9: Transient analysis of the log-domain demodulator circuit. A simple A-line signal is demodulated into I/Q components in order to form an envelope. (a) Demodulated I component (b) Demodulated Q component (c) RF input versus simulated envelope (d) Simulated envelope (generated using Cadence) versus the ideal envelope (generated using Matlab).

| Parameter                                 | (A) $I_o = 4\,\mu A$ | (B) $I_o = 10\,\mu A$ |
|-------------------------------------------|----------------------|-----------------------|
| Supply voltage                            | 3.3 V                | 3.3 V                 |
| Static power consumption                  | 375 µW               | 1.1 mW                |
| 3 dB bandwidth ($f_c$)                    | 910 kHz              | 905 kHz               |
| Min transition frequency ($f_T$)          | 1.81 GHz             | 7.4 GHz               |
| DC gain                                   | 0 dB                 | 1 dB                  |
| Input dynamic range @ 1 MHz (THD < 1%)    | 76.3 dB              | 78.1 dB               |
| Input referred noise floor                | $39\,pA/\sqrt{Hz}$   | $4.8\,pA/\sqrt{Hz}$   |
| 1-dB compression point                    | 27 dBµA              | 38 dBµA               |
| Third order input intercept (IP3)         | 51 dBµA              | 57 dBµA               |

Table 7.1: Simulated performance summary for the log domain demodulator circuit in figure 7.4.

The noise performance of the circuit is generally poorer than that of the current-mode designs reviewed in section 6.2. The lowest reported input referred noise is  $280\,fA/\sqrt{Hz}$  (25 MHz) [75], which is an order of magnitude lower than that of the proposed design. However, the power consumption per channel is lower, i.e.  $375\,\mu W$  ( $f_c = 1\,MHz$ ) as opposed to 9 mW ( $f_c = 25\,MHz$ ) [75]. Increasing the bias current  $I_o$  naturally leads to a reduction in input referred noise, as is evident in table 7.1. Increasing  $I_o$  to  $85\,\mu A$  pushes the power consumption up to 9 mW, which matches [75], but the noise is only reduced to  $1.2\,pA/\sqrt{Hz}$ . Thus, for an equal power consumption, the voltage mode topology in [75] achieves better noise performance than the proposed design. Nevertheless, since the proposed topology processes signals in the current domain, it conveniently targets high impedance, current-output transducers such as capacitive micromachined ultrasound transducers (CMUTs). This approach differs from traditional I-V converters, offering a compact means of both demodulating and amplifying signals.

Process variations are an important consideration in log-domain circuits. Monte Carlo simulations were carried out to investigate the robustness of the design to process variations. Specifically, the bandwidth and gain were analysed using 100 Monte Carlo sample points. The mean gain is 0.7 dB ( $\sigma = 1.5$ ), and the mean bandwidth is 956 kHz ( $\sigma = 76\,kHz$ ). This simulation assumes poor matching of the biasing currents  $I_o$  and of the two class-A filters in the circuit. To obtain a distortionless output, devices must be properly matched.
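The reported statistics come from Cadence Monte Carlo runs with foundry statistical models. The sketch below is only a behavioural stand-in that shows how bias current and capacitor spread propagate to the pole frequency of one stage. The 5 % relative standard deviations, the 25 mV thermal voltage and the sample count are assumptions chosen for illustration, and the function name `monte_carlo_cutoff` is hypothetical.

```python
import numpy as np


def monte_carlo_cutoff(n_samples, i_d=5.6e-6, c=40e-12, v_t=0.025,
                       sigma_i=0.05, sigma_c=0.05, seed=0):
    """Sample the pole frequency I_d / (2*pi*C*V_T) under Gaussian mismatch.

    sigma_i and sigma_c are assumed relative standard deviations; they are
    illustrative and are not taken from the foundry statistical models.
    """
    rng = np.random.default_rng(seed)
    i_samples = i_d * (1.0 + sigma_i * rng.standard_normal(n_samples))
    c_samples = c * (1.0 + sigma_c * rng.standard_normal(n_samples))
    return i_samples / (2.0 * np.pi * c_samples * v_t)


f_c = monte_carlo_cutoff(10_000)
nominal = 5.6e-6 / (2.0 * np.pi * 40e-12 * 0.025)

print(f"nominal {nominal / 1e3:.0f} kHz, "
      f"mean {f_c.mean() / 1e3:.0f} kHz, "
      f"sigma {f_c.std() / 1e3:.0f} kHz")
```

With two independent 5 % contributions the relative spread of the pole frequency is about 7 %, which is of the same order as the ratio of 76 kHz to 956 kHz reported above, although the comparison is indicative only.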

## 7.5 Summary

A current-input, current-output demodulator circuit is presented. The circuit is adapted for processing ultrasound signals, and complements the linear, voltage-domain design presented in Chapter 5. In section 7.2.1, the demodulator circuit is analysed and the transfer function is derived. A biquadratic version of the circuit is also presented. Circuit characteristics and non-idealities are discussed in detail, including distortion, noise,  $1/\beta$  errors and the effect of mismatch and parasitic resistances. The circuit was simulated in  $0.35 \,\mu m$  AMS SiGe BiCMOS, and results highlight key advantages of the proposed topology: enhanced dynamic range, tunability and a high operating frequency range. However, noise performance is poorer than that of the voltage-mode topology for the same power consumption. Overall, the approach is a promising alternative for processing ultrasound signals in the current domain.

# **Chapter 8**

# **Summary and Future Work**

## 8.1 Summary

In this work, two architectural solutions are proposed to enable aggressive miniaturisation of ultrasound imaging systems applied in new applications such as capsule endoscopes, implantable ultrasound devices and wearable ultrasound devices. Both proposed architectures employ the synthetic aperture beamforming (SAB) method to form 2D, B-mode images. A novel method is employed which combines aspects of the synthetic aperture focusing technique (SAFT), multi-element synthetic aperture focusing (M-SAF) and synthetic receive aperture (SRA) beamforming. Transmission is carried out *n* times for all receive elements, and reflected signals are multiplexed through a single receive channel, which significantly reduces system complexity and size. Spatial compounding across multiple transmit positions increases the SNR. Although only a single channel is used, the entire system is scalable to any number of channels, depending on what frame rate is required. The first architectural solution combines SAB with a well-known technique in RF systems: quadrature sampling. RF signals are demodulated to form I/Q components, which are processed sequentially to form a B-mode image. This effectively halves the bandwidth compared to RF-domain beamforming, thereby lowering the power consumption and required logic capacity of the system.

The second architecture employs compressive sensing within the finite rate of innovation (FRI) framework to reduce the sampling rate below the Nyquist frequency. The bandwidth of the signal is constrained prior to sampling in order to overcome the data bandwidth constraint of the transmission link between the front-end and digital processor. Signals are reconstructed non-linearly to form I/Q components, which are sequentially processed using SAB.

Extensive simulations were carried out to validate the functionality of these architectures. However, the primary objective was to translate theoretical constructs into hardware, and to obtain B-mode images with sufficient quality while reducing size and power consumption. Therefore, further work was done on the design and implementation of an analogue front-end (AFE) and digital beamformer.

The analogue front-end was designed to interface with a piezoelectric transducer. It functions as a fully-differential amplifier and demodulator comprising a low-noise preamplifier, mixer, programmable gain amplifier and lowpass filter. The circuit is implemented in  $0.35 \,\mu m$  CMOS and was fabricated and tested. The AFE has a total power consumption of  $7.9 \,mW$  with a 3.3 V supply and occupies an area of  $2.25 \,mm^2$ . The input referred noise of the preamplifier ( $5.42 \,nV/\sqrt{Hz}$ ) is marginally better than the state-of-the-art, and the dynamic range is 67.2 dB. A novel preamplifier circuit topology is used to achieve quasi-exponential time-gain control (TGC) (16.5 - 31 dB) by varying the control voltage. A programmable gain amplifier enables digital selection of three gain values (15 dB, 22 dB and 25.5 dB). The lowpass filter also has three selectable bandwidths (195 kHz, 510 kHz and 1.85 MHz) to allow for testing of both architectural frameworks.

We also propose a second circuit topology in chapter 7: a log-domain demodulator. This circuit targets high impedance transducers such as capacitive micromachined ultrasound transducers (CMUTs) using a current-mode approach. The circuit employs a pseudodifferential, log-domain topology adapted from the "multer" topology in [15]. The proposed circuit offers a 20 dB improvement in dynamic range over the voltage-mode demodulator and is electronically tunable by adjusting the bias currents. It was implemented in  $0.35 \,\mu m$  BiCMOS and validated using simulations in Cadence, but no fabrication results were obtained.

The *digital beamformer* was implemented and tested using a Spartan-6 FPGA. The system was specifically designed to carry out calculations dynamically, thereby reducing the memory requirements and enabling real-time operation. For a frame rate of 7 Hz, the power consumption is 4.6 mW/channel across an aperture of 64 elements. The system was tested offline using a database of signals derived from a commercial ultrasound machine. The RTL design was also synthesised in Cadence® Encounter using AMS  $0.18\,\mu m$  CMOS technology. The dimensions of the ASIC are  $1.35\,mm \times 1.35\,mm$ , and the estimated power consumption is 14.9 mW. This is the first reported SAB ASIC with power consumption per channel comparable to state-of-the-art mixed signal beamformers. However, no fabrication results have yet been obtained for the ASIC.

System-level experiments were carried out to compare the image quality produced by both architectures. The normalised root mean squared error (NRMSE) between the quadrature SAB image and the RF reference image was 13%, while the compressive SAB error was 22% for the same number of transmission angles. This indicates that better image quality may be achieved using the quadrature architecture. The frame rate of the compressive SAB architecture is also constrained by the maximum data rate of the transmission link. In quadrature SAB, the frame rate is a function of the number of parallel receiver channels.
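For readers who wish to reproduce this kind of comparison, a minimal NRMSE sketch is given below. Several normalisation conventions exist. This version divides the RMSE by the dynamic range of the reference image, which is an assumption and may differ from the convention used to obtain the figures above. The function name `nrmse` and the small test arrays are illustrative.

```python
import numpy as np


def nrmse(image, reference):
    """Root mean squared error normalised by the dynamic range of the reference."""
    image = np.asarray(image, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if image.shape != reference.shape:
        raise ValueError("images must have the same shape")
    rmse = np.sqrt(np.mean((image - reference) ** 2))
    return rmse / (reference.max() - reference.min())


# Tiny hypothetical 2x2 "images" to show the call.
reference = np.array([[0.0, 1.0], [2.0, 4.0]])
test_image = reference + 0.4      # uniform error of 0.4 on a range of 4

print(f"NRMSE = {nrmse(test_image, reference) * 100:.0f} %")
```

In practice both images would be envelope-detected and log-compressed in the same way before the comparison, so that the error reflects differences in the displayed B-mode image.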

## 8.2 Future Work

In light of the above conclusions, the following recommendations are made for future work:

#### 1. Second Generation AFE

One of the most important tradeoffs in the design is frame rate. As mentioned above, the frame rate of the quadrature SAB design is limited by the extended acquisition time. While this frame rate is acceptable for capsule endoscopy, it is not acceptable for portable, B-mode imaging. The obvious solution is to increase the number of channels in the AFE and beamformer. Specifically, as chapter 4 explains, doubling the number of channels would allow one to increase the frame rate to around 15 Hz with sufficient image quality, but at the expense of increased power consumption and silicon area. A smaller feature size could be used (e.g.  $0.18 \mu m$  CMOS) to help alleviate these problems.

Having validated both SAB architectures, the AFE area should be optimised by selecting a single, fixed capacitance for the lowpass filters. Since the quadrature SAB architecture performs better than the FRI compressive SAB architecture, the bandwidth should be 1.3 MHz. Eliminating the other capacitors used for the lower bandwidths will save significant silicon area.

#### 2. Integration with Transducer

In this work, the proposed architectures were tested offline using data captured from a commercial ultrasound machine. The system should now be tested with a physical transducer. The design targets piezoelectric transducers with a small impedance in the order of a few kiloohms near the resonant frequency. Piezoelectric transducers are available commercially and should be coupled with the front-end via an impedance matching network. In order to excite the transducer, a high-voltage excitation or pulser circuit should be designed and integrated with the AFE.

#### 3. Integrated System-on-Chip (SoC)

The ultimate vision is to create a fully integrated system-on-chip incorporating transmission circuitry, analogue-to-digital conversion, beamforming and IO functionality. Some work has already gone into synthesizing a standalone digital ASIC in  $0.18\,\mu m$  CMOS. In order to integrate analogue and digital components, either the digital beamformer would need to be resynthesized in  $0.35\,\mu m$  CMOS or the AFE would need to be redesigned in  $0.18\,\mu m$  CMOS. Furthermore, an on-chip ADC should be designed to operate in the range of 5 to 10 MHz (depending on whether I/Q channels are multiplexed), with a resolution of 10 bits. Given these specifications, either a successive approximation register (SAR) or pipelined architecture should be used.

A fully integrated SoC would pave the way for the development of a complete, miniaturised device targeting small-scale applications.

#### 4. Second Harmonic Imaging

Second harmonic imaging is commonly used in commercial systems and may be applied using the proposed system in order to enhance image contrast. The technique should be tested using a new ultrasound database with a sufficiently high sampling rate. This could be done using identical hardware by simply doubling the mixing frequency.

## 8.3 Conclusion

Two architectures have been proposed and implemented in hardware, paving the way for the development of small-scale, wireless applications such as capsule endoscopy. Both architectures achieve a significant reduction in sampling rate, system complexity and physical area, allowing for aggressive miniaturisation of the imaging front-end. The quadrature SAB technique in particular achieves the highest degree of image quality for a given frame rate. While significant progress has been made in this direction, many avenues of future work exist, including integration with a transducer, and integration of both analogue and digital components on a single chip.

# **Appendix A**

# **Monte Carlo Analysis**

When an integrated circuit is fabricated, device mismatches and small random process variations may result in non-ideal behaviour in the manufactured chip. Monte Carlo (MC) analysis is commonly used to statistically model the effect of parameter variations on circuit behaviour. MC results are presented below for the preamplifier (figure A.1), PGA (figure A.2), lowpass filter (figure A.3(a)-(d)) and central bias (figure A.3(e)-(f)). The number of MC samples used in each experiment is indicated in each figure.



Figure A.1: Preamplifier Monte Carlo simulation results. (a) Differential gain (b) Common-mode rejection ratio (c) Phase margin (degrees) (d) 3 dB bandwidth (e) Tail current through  $M_6$  (f) Common mode voltage.



Figure A.2: PGA Monte Carlo simulation results. (a) Common mode loop phase margin (degrees) (b) Closed loop phase margin (degrees) (c) Core amplifier tail current through  $M_{11}$  (d) Common mode rejection ratio (e) Closed loop bandwidth (f) Positive power supply rejection ratio.



Figure A.3: Monte Carlo simulation results for the lowpass filter and central bias. (a) Differential gain (b) Common-mode rejection ratio (c) 3 dB bandwidth (d) Filter core amplifier tail current (e) Central bias reference current (f) Central bias common mode voltage.

## References

- [1] GE. Vscan | GE Healthcare. Retrieved May 25, 2017, from http://www3.gehealthcare.com/en/Products/Categories/Ultrasound/Vscan_Family/Vscan.
- [2] Philips. Philips Lumify | Portable Ultrasound Machine. Retrieved May 25, 2017, from https://www.lumify.philips.com/web/.
- [3] Gi-duck Kim, Changhan Yoon, Sang-bum Kye, Youngbae Lee, and Jeeun Kang. A Single FPGA-Based Portable Ultrasound Imaging System for Point-of-Care Applications. *IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control*, 59(7):1386–1394, 2012.
- [4] Gokce Gurun, Jaime Zahorian, Coskun Tekes, Mustafa Karaman, Paul Hasler, and F. Levent Degertekin. An analog beamformer for integrated high-frequency medical ultrasound imaging. *2011 IEEE Biomedical Circuits and Systems Conference (BioCAS)*, pages 65–68, nov 2011.
- [5] Gokce Gurun, Jaime S Zahorian, Alper Sisman, Mustafa Karaman, Paul E Hasler, and F Levent Degertekin. An analog integrated circuit beamformer for high-frequency medical ultrasound imaging. *IEEE transactions on biomedical circuits and systems*, 6(5):454–67, oct 2012.

- [6] Zili Yu, Michiel a.P. Pertijs, and Gerard C. M. Meijer. A programmable analog delay line for Micro-beamforming in a transesophageal ultrasound probe. 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology, pages 299–301, nov 2010.
- [7] Chao Chen, Zhao Chen, Deep Bera, Shreyas B Raghunathan, Maysam Shabanimotlagh, Emile Noothout, Zu-yao Chang, Jacco Ponte, Christian Prins, Hendrik J Vos, Johan G Bosch, Martin D Verweij, Nico de Jong, and Michiel A P Pertijs. A Front-End ASIC With Receive Sub-array Beamforming Integrated With a 32 x 32 PZT Matrix Transducer for 3-D Transesophageal Echocardiography. *IEEE Journal of Solid-State Circuits*, pages 1–13, 2017.
- [8] Insoo Kim, Hyunsoo Kim, F Griggio, R L Tutwiler, T N Jackson, S Trolier-McKinstry, and Kyusun Choi. CMOS Ultrasound Transceiver Chip for High-Resolution Ultrasonic Imaging Systems. *IEEE transactions on biomedical circuits and systems*, 3(5):293–303, oct 2009.
- [9] Jeeun Kang, Changhan Yoon, Jaejin Lee, Sang-Bum Kye, Yongbae Lee, Jin Ho Chang, Gi-duck Kim, Yangmo Yoo, and Tai-kyong Song. A System-on-Chip Solution for Point-of-Care Ultrasound Imaging Systems: Architecture and ASIC Implementation. *IEEE Transactions on Biomedical Circuits and Systems*, 10(2):412–423, 2016.
- [10] Jon Alexander. Xilinx Devices in Portable Ultrasound Systems. Technical report, Xilinx Inc., 2013.

- [11] John H. Lee, Giovanni Traverso, Carl M. Schoellhammer, Daniel Blankschtein, Robert Langer, Kai E. Thomenius, Duane S. Boning, and Brian W. Anthony. Towards wireless capsule endoscopic ultrasound (WCEU). *IEEE International Ultrasonics Symposium, IUS*, pages 734–737, 2014.
- [12] João Correia. TROY: Endoscope Capsule Using Ultrasound Technology, Final Report. Technical report, IAITI, SA, 2009.
- [13] Abhishek Basak and Vaishnavi Ranganathan. Implantable Ultrasonic Imaging Assembly for Automated Monitoring of Internal Organs. *IEEE transactions on biomedical circuits and systems*, 8(6):881–890, 2015.
- [14] Andrzej P Mierzwa, Sean P Huang, Kristen T Nguyen, Martin O Culjat, and Rahul S Singh. Wearable Ultrasound Array for Point-of-Care Imaging and Patient Monitoring. *Studies in health technology and informatics*, 220:241– 4, 2016.
- [15] G. Kathiresan, E.M. Drakakis, and C. Toumazou. A highly linear front-end based on a logarithmic multiplier-filter. In *Proceedings of the 2003 International Symposium on Circuits and Systems.*, volume 1, pages 9–12, 2003.
- [16] Haim Azhari. Basics of Biomedical Ultrasound for Engineers. Wiley-IEEE Press, 1st edition, 2010.
- [17] Troy Farncombe and Kris Iniewski. Medical imaging: technology and applications. CRC Press, 2014.

- [18] Avinash C. Kak and Kris A. Dines. Signal Processing of Broadband Pulsed Ultrasound: Measurement of Attenuation of Soft Biological Tissues. *Biomedical Engineering, IEEE Transactions on*, BME-25(4):321–344, 1978.
- [19] J. H. Kim, T. K. Song, and S. B. Park. Pipelined Sampled-Delay Focusing in Ultrasound Imaging Systems. *Ultrasonic Imaging*, 9(2):75–91, apr 1987.
- [20] K.E. Thomenius. Evolution of ultrasound beamformers. 1996 IEEE Ultrasonics Symposium. Proceedings, 2:1615–1622, 1996.
- [21] S R Freeman, M K Quick, M a Morin, R C Anderson, C S Desilets, T E Linnenbrink, and M O'Donnell. Delta-sigma oversampled ultrasound beamformer with dynamic delays. *IEEE transactions on ultrasonics, ferroelectrics, and frequency control*, 46(2):320–32, jan 1999.
- [22] Borislav Gueorguiev Tomov and Jørgen Arendt Jensen. Compact FPGA-based beamformer using oversampled 1-bit A/D converters. *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, 52(5):870–880, 2005.
- [23] Pengyu Song, Kei Tee Tiew, Yvonne Lam, and Liang Mong Koh. A CMOS 3.4 mW 200 MHz continuous-time delta-sigma modulator with 61.5 dB dynamic range and 5 MHz bandwidth for ultrasound application. *Midwest Symposium on Circuits and Systems*, pages 152–155, 2007.
- [24] JR Talman and SL Garverick. Unit-delay focusing architecture and integrated-circuit implementation for high-frequency ultrasound. *IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control*, 50(11):1455–1463, 2003.

- [25] J R Talman, S L Garverick, and G R Lockwood. Integrated circuit for high-frequency ultrasound annular array. *Custom Integrated Circuits Conference 2003, Proceedings of the IEEE 2003*, pages 477–480, 2003.
- [26] Roberto Alini, Giorgio Betti, Ivan Bietti, and Giacomino Bollati. A 200-MSample/s trellis-coded PRML read/write channel with analog adaptive equalizer and digital servo. *IEEE Journal of Solid-State Circuits*, 32(11):1824–1838, 1997.
- [27] E. Burlingame and R. Spencer. An analog CMOS high-speed continuous-time FIR filter. *Proceedings of the 26th European Solid-State Circuits Conference*, 2000.
- [28] David Hernandez-Garduno and Jose Silva-Martinez. A CMOS 1Gb/s 5-tap transversal equalizer based on inductorless 3rd-order delay cells. *Digest of Technical Papers - IEEE International Solid-State Circuits Conference*, pages 2005–2007, 2007.
- [29] William R Forni, David Harnishfeger, Scott Kaylor, J P Micheal, Narendra Rao, Mark Rohrhaugh, Mike Ross, L Gary, Kavch Parsi, Robert P Burns, Alan Chaiken, J Mark, and O Perez. A PRML Read/Write Channel IC: Using Analog Signal Processing for 200 Mb/s HDD. *IEEE Journal of Solid-State Circuits*, 31(11), 1996.

- [30] YW Chang and CN Kuo. Tunable delay compensation circuit in polar loop transmitter for WiMAX applications. In 2010 Asia-Pacific Microwave Conference Proceedings (APMC), pages 426–429, 2010.
- [31] T Halvorsrod, W Luzi, and TS Lande. A log-domain µbeamformer for medical ultrasound imaging systems. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 52(12):2563–2575, 2005.
- [32] F Memon, G Touma, J Wang, S Baltsavias, A Moini, C Chang, M F Rasmussen, A Nikoozadeh, Jung Woo Choe, A Arbabian, R B Jeffrey, E Olcott, and B T Khuri-Yakub. Capsule ultrasound device. *Ultrasonics Symposium* (*IUS*), 2015 IEEE International, pages 1–4, 2015.
- [33] F Memon, G Touma, J Wang, S Baltsavias, A Moini, C Chang, M F Rasmussen, A Nikoozadeh, Jung Woo Choe, A Arbabian, R B Jeffrey, E Olcott, and B T Khuri-Yakub. Capsule ultrasound device: further developments. *Ultrasonics Symposium (IUS), 2015 IEEE International*, pages 1–4, 2015.
- [34] Robert Lang, Steven A Goldstein, Itzhak Kronzon, Bijoy K Khandheria, and Victor Mor-Avi. *ASE's Comprehensive Echocardiography*. Elsevier, second edition, 2015.
- [35] Ira O Wygant, Xuefeng Zhuang, David T Yeh, Omer Oralkan, A Sanli Ergun, Mustafa Karaman, and Butrus T Khuri-Yakub. Integration of 2D CMUT arrays with front-end electronics for volumetric ultrasound imaging. *IEEE transactions on ultrasonics, ferroelectrics, and frequency control*, 55(2):327–42, feb 2008.

- [36] Kailiang Chen, Hae Seung Lee, and Charles G. Sodini. A Column-Row-Parallel ASIC Architecture for 3-D Portable Medical Ultrasonic Imaging. *IEEE Journal of Solid-State Circuits*, 51(3):738–751, 2016.
- [37] M. Karaman and M. O'Donnell. Synthetic aperture imaging for small scale systems. *IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control*, 42(3):429–442, may 1995.
- [38] Choye Kim, Changhan Yoon, Jong-ho Park, Yuhwa Lee, Won Hwa Kim, Jung Min Chang, and Byung Ihn Choi. Evaluation of Ultrasound Synthetic Aperture Imaging Using Bidirectional Pixel-Based Focusing: Preliminary Phantom and In Vivo Breast Study. 60(10):2716–2724, 2013.
- [39] Matthew O'Donnell and L. J. Thomas. Efficient Synthetic Aperture Imaging from a Circular Aperture with Possible Application to Catheter-Based Imaging. *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, 39(3):366–380, 1992.
- [40] Jørgen Arendt Jensen, Svetoslav Ivanov Nikolov, Kim Løkke Gammelmark, and Morten Høgholm Pedersen. Synthetic aperture ultrasound imaging. *Ultrasonics*, 44 Suppl 1:e5–15, dec 2006.
- [41] Milen Nikolov and Vera Behar. Analysis and Optimization of Medical Ultrasound Imaging Using the Effective Aperture Approach. *Cybernetics and information technologies*, 5(2), 2005.
- [42] L F Nock and G E Trahey. Synthetic receive aperture imaging with phase correction for motion and for tissue inhomogeneities. I. Basic principles. *IEEE transactions on ultrasonics, ferroelectrics, and frequency control*, 39(4):489–95, jan 1992.

- [43] Jean Provost, Clement Papadacci, Juan Esteban Arango, Marion Imbault, Mathias Fink, Jean-Luc Gennisson, Mickael Tanter, and Mathieu Pernot. 3D ultrafast ultrasound imaging in vivo. *Physics in medicine and biology*, 59(19):L1–L13, oct 2014.
- [44] Peter J A Frinking, Ayache Bouakaz, Johan Kirkhorn, Folkert J. Ten Cate, and Nico De Jong. Ultrasound contrast imaging: Current and new potential methods. *Ultrasound in Medicine and Biology*, 26(6):965–975, 2000.
- [45] Seong Ho Chang and Gyu Hyoung Cho. Phase-Error-Free Quadrature Sampling Technique in the Ultrasonic B-Scan Imaging System and Its Application to the Synthetic Focusing System. *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, 40(3):216–223, 1993.
- [46] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. *IEEE Transactions on Signal Processing*, 50(6):1417–1428, jun 2002.
- [47] Karthik Ranganathan, Mary K. Santy, Travis N. Blalock, John A. Hossack, and William F. Walker. Direct sampled I/Q beamforming for compact and very low-cost ultrasound imaging. *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, 51(9):1082–1094, 2004.

- [48] Kim Løkke Gammelmark and Jørgen Arendt Jensen. Multielement synthetic transmit aperture imaging using temporal encoding. *IEEE Transactions on Medical Imaging*, 22(4):552–563, 2003.
- [49] Emmanuel J. Candès. Compressive sampling. *Proceedings of the International Congress of Mathematicians*, 3:1433–1452, Madrid, Spain, 2006.
- [50] E Candes, J Romberg, and T Tao. Stable Signal Recovery from Incomplete and Inaccurate Measurements. *Communications on Pure and Applied Mathematics*, LIX:1207–1223, 2006.
- [51] Emmanuel J. Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. *IEEE Transactions on Information Theory*, 52(2):489–509, 2006.
- [52] David L. Donoho. Compressed sensing. *IEEE Transactions on Information Theory*, 52(4):1289–1306, 2006.
- [53] Yonina C. Eldar and Gitta Kutyniok. *Compressed sensing: theory and applications*. Cambridge University Press, 2012.
- [54] Ronen Tur, Yonina C. Eldar, and Zvi Friedman. Innovation rate sampling of pulse streams with application to ultrasound imaging. *IEEE Transactions on Signal Processing*, 59(4):1827–1842, 2011.
- [55] Moshe Mishali and Yonina C. Eldar. From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals. *IEEE Journal on Selected Topics in Signal Processing*, 4(2):375–391, 2010.

- [56] Eliahu Baransky, Gal Itzhak, Noam Wagner, Idan Shmuel, Eli Shoshan, and Yonina Eldar. Sub-Nyquist radar prototype: Hardware and algorithm. *IEEE Transactions on Aerospace and Electronic Systems*, 50:809–822, 2014.
- [57] Noam Wagner, Yonina C. Eldar, and Zvi Friedman. Compressed beamforming in ultrasound imaging. *IEEE Transactions on Signal Processing*, 60(9):4643–4657, 2012.
- [58] Tanya Chernyakova and Yonina Eldar. Fourier-domain beamforming: The path to compressed ultrasound imaging. *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, 61(8):1252–1267, 2014.
- [59] Jonathon Spaulding, Yonina C Eldar, and Boris Murmann. Mixer-based subarray beamforming for sub-Nyquist sampling ultrasound architectures. In *2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, pages 882–886, South Brisbane, Queensland, Australia, 2015. IEEE.
- [60] R Agilesh and Raghav Menon. Design of 2D ultrasound Scanner Using Compressed Sensing and Synthetic Aperture (CS-SA) technique. *International Journal of Engineering Research & Technology*, 3(4):1604–1608, 2014.
- [61] Yonina Eldar. Innovation Rate Sampling of Pulse Streams with Application to Ultrasound Imaging. Retrieved October 13, 2015, from http://webee.technion.ac.il/people/YoninaEldar/xampling\_pulse\_streams\_medical\_imaging.php, 2010.

- [62] Moshe Mishali and Yonina C. Eldar. Xampling: Compressed Sensing of Analog Signals. In *Compressed Sensing: Theory and Applications*, page 58. Cambridge University Press, mar 2011.
- [63] P. Acevedo, A. Durán, and E. Rubio. Image Quality Improvement Performance Using the Synthetic Aperture Focusing Technique Data. In *Acoustical Imaging, Volume 30*, pages 325–333. 2011.
- [64] Hao Yen Tang, Yipeng Lu, Stephanie Fung, David A. Horsley, and Bernhard E. Boser. Integrated ultrasonic system for measuring body-fat composition. In *IEEE International Solid-State Circuits Conference*, volume 58, pages 210–211. IEEE, feb 2015.
- [65] Anshuman Bhuyan, Jung Woo Choe, Byung Chul Lee, Ira O Wygant, Amin Nikoozadeh, Omer Oralkan, and Butrus T. Khuri-Yakub. Integrated circuits for volumetric ultrasound imaging with 2-D CMUT arrays. *IEEE Transactions on Biomedical Circuits and Systems*, 7(6):796–804, 2013.
- [66] Krzysztof Iniewski. *Integrated Microsystems: Electronics, Photonics, and Biotechnology*. CRC Press, 2011.
- [67] Penelope Allisy-Roberts and Jerry R. Williams. *Farr's Physics for Medical Imaging*. Elsevier Health Sciences, 2007.
- [68] David E Dausch, John B Castellucci, Derrick R Chou, and Olaf T von Ramm. Theory and operation of 2-D array piezoelectric micromachined ultrasound transducers. *IEEE transactions on ultrasonics, ferroelectrics, and frequency control*, 55(11):2484–92, nov 2008.

- [69] Jonny Johansson and Jerker Delsing. Energy and pulse control possibilities using ultra-tight integration of electronics and piezoelectric ceramics. *IEEE Ultrasonics Symposium*, 3(c):2206–2210, 2004.
- [70] Jonny Johansson, Martin Gustafsson, and Jerker Delsing. Ultra-low power transmit/receive ASIC for battery operated ultrasound measurement systems. *Sensors and Actuators A: Physical*, 125(2):317–328, jan 2006.
- [71] Thomas H. Lee. *The Design of CMOS Radio-Frequency Integrated Circuits*. Cambridge University Press, 2004.
- [72] Ira Wygant. A comparison of CMUTs and piezoelectric transducer elements for 2D medical imaging based on conventional simulation models. *IEEE International Ultrasonics Symposium, IUS*, pages 100–103, 2011.
- [73] Gokce Gurun, Coskun Tekes, Jaime Zahorian, Toby Xu, Sarp Satir, Mustafa Karaman, Jennifer Hasler, and F Levent Degertekin. Single-chip CMUT-on-CMOS front-end system for real-time volumetric IVUS and ICE imaging. *IEEE transactions on ultrasonics, ferroelectrics, and frequency control*, 61(2):239–50, feb 2014.
- [74] Linga Reddy Cenkeramaddi, Tajeshwar Singh, and Trond Ytterdal. Inverter-based 1V transimpedance amplifier in 90nm CMOS for medical ultrasound imaging. In *Norchip*, 2009.
- [75] Ira O. Wygant, Nafis S. Jamal, Hyunjoo J. Lee, Amin Nikoozadeh, Omer Oralkan, Mustafa Karaman, and Butrus T. Khuri-Yakub. An integrated circuit with transmit beamforming flip-chip bonded to a 2-D CMUT array for 3-D ultrasound imaging. *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, 56(10):2145–2156, 2009.

- [76] R.A. Noble, R.R. Davies, D.O. King, M.M. Day, A.R.D. Jones, J.S. McIntosh, D.A. Hutchins, and P. Saul. Low-temperature micromachined cMUTs with fully-integrated analogue front-end electronics. *2002 IEEE Ultrasonics Symposium, 2002. Proceedings.*, 2(c):1045–1050, 2002.
- [77] Chao Chen, Zhao Chen, Zu Yao Chang, and Michiel A P Pertijs. A compact 0.135-mW/channel LNA array for piezoelectric ultrasound transducers. *European Solid-State Circuits Conference*, 2015-Octob:404–407, 2015.
- [78] L.L. Lay, S.J. Carey, and J.V. Hatfield. Pre-amplifier arrays for intra-oral ultrasound probe receiving electronics. *IEEE Ultrasonics Symposium*, 2004, 00(c):1753–1756, 2004.
- [79] Hojong Choi, Xiang Li, Sien-Ting Lau, ChangHong Hu, Qifa Zhou, and K Kirk Shung. Development of integrated preamplifier for high-frequency ultrasonic transducers and low-power handheld receiver. *IEEE transactions on ultrasonics, ferroelectrics, and frequency control*, 58(12):2646–58, 2011.
- [80] J. Morizio, S. Guhados, J. Castellucci, and O. von Ramm. 64-Channel Ultrasound Transducer Amplifier. In *Southwest Symposium on Mixed-Signal Design*, 2003, pages 228–232, 2003.
- [81] Y. Yañez, M.J. Garcia-Hernandez, J. Salazar, A. Turo, and J.A. Chavez. Designing amplifiers with very low output noise for high impedance piezoelectric transducers. *NDT & E International*, 38(6):491–496, 2005.

- [82] Mashhour Bani Amer. Novel design of low noise preamplifier for medical ultrasound transducers. *Journal of Medical Systems*, 35(1):71–77, 2011.
- [83] Hans Herman Hansen. A 33 µW Sub-3 dB Noise Figure Low Noise Amplifier for Medical Ultrasound Applications. PhD thesis, Norwegian University of Science and Technology, 2011.
- [84] Behzad Razavi. *Design of analog CMOS integrated circuits*. McGraw-Hill Education, 1 edition, 2001.
- [85] Behzad Razavi. *RF Microelectronics*. Prentice Hall, 2nd edition, 2011.
- [86] Thomas H. Lee. *The Design of CMOS Radio-Frequency Integrated Circuits*. Cambridge University Press, 2004.
- [87] Ron Mancini. Voltage-Feedback Op Amp Compensation. In *Op Amps for Everyone*, pages 77–97. Texas Instruments, 2003.
- [88] Kendall L. Su. *Analog Filters*. Springer Science & Business Media, 2002.
- [89] Yichuang Sun. *Design of high frequency integrated analogue filters*. IET, jan 2002.
- [90] M Tavakoli, L Turicchia, and R Sarpeshkar. An ultra-low-power pulse oximeter implemented with an energy-efficient transimpedance amplifier. *IEEE transactions on biomedical circuits and systems*, 4(1):27–38, 2010.
- [91] Ihsan Ciçek, Ayhan Bozkurt, and Mustafa Karaman. Design of a front-end integrated circuit for 3D acoustic imaging using 2D CMUT arrays. *IEEE transactions on ultrasonics, ferroelectrics, and frequency control*, 52(12):2235–2241, 2005.

- [92] Teng-Chuan Cheng and Tsung-Heng Tsai. CMOS Ultrasonic Receiver With On-Chip Analog-to-Digital Front End for High-Resolution Ultrasound Imaging Systems. *IEEE Sensors Journal*, 16(20):7454–7463, oct 2016.
- [93] Ji Yong Um, Yoon Jee Kim, Seong Eun Cho, Min Kyun Chae, Byungsub Kim, Jae Yoon Sim, and Hong June Park. A single-chip 32-channel analog beamformer with 4-ns delay resolution and 768-ns maximum delay range for ultrasound medical imaging with a linear array transducer. *IEEE Transactions on Biomedical Circuits and Systems*, 9(1):138–151, 2015.
- [94] Barrie Gilbert. Current Mode, Voltage Mode, or Free Mode? A Few Sage Suggestions. *Analog Integrated Circuits and Signal Processing*, 38:83–101, 2004.
- [95] Emmanuel Michael Drakakis, Alison J. Payne, and Chris Toumazou. Log-domain filtering and the Bernoulli cell. *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, 46(5):559–571, 1999.
- [96] Y.P. Tsividis, V. Gopinathan, and L. Toth. Companding in signal processing. *Electronics Letters*, 26(17):1331, 1990.
- [97] Yannis Tsividis. Externally linear, time-invariant systems and their application to companding signal processors. *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, 44(2):65–85, 1997.

- [98] B. Gilbert. Translinear circuits: a proposed classification. *Electronics Letters*, 11(1):14, 1975.
- [99] D. R. Frey. Log-domain filtering: an approach to current-mode filtering. *IEE Proceedings G Circuits, Devices and Systems*, 140(6):406–416, 1993.
- [100] Emmanuel M Drakakis, Alison J Payne, and Chris Toumazou. "Log-Domain State-Space": A Systematic Transistor-Level Approach for Log-Domain Filtering. *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, 46(3):290–305, 1999.
- [101] D. Perry and G.W. Roberts. The design of log-domain filters based on the operational simulation of LC ladders. *Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on*, 43(11):763–774, 1996.
- [102] D. Frey. Current mode class AB second order filter. *Electronics Letters*, 30(3):205, 1994.
- [103] A Katsiamis, E Drakakis, and R Lyon. A biomimetic, 4.5µW, 120dB, log-domain cochlea channel with AGC. *IEEE Journal of Solid-State Circuits*, 44(3):1006–1022, 2009.
- [104] Chris Toumazou, F.J. Lidgey, and David Haigh. *Analogue IC Design: The Current-mode Approach*. IET, 1992.
- [105] E. M. Drakakis, A. J. Payne, C. Toumazou, A. E J Ng, and J. I. Sewell. High-order lowpass and bandpass elliptic log-domain ladder filters. In *ISCAS 2001: 2001 IEEE International Symposium on Circuits and Systems, Conference Proceedings*, volume 1, pages 141–144, 2001.

- [106] Vincent W. Leung and Gordon W. Roberts. Analysis and Compensation of Log-Domain Biquadratic Filter Response Deviations due to Transistor Nonidealities. In *Research Perspectives on Dynamic Translinear and Log-Domain Circuits*, pages 41–56. Springer US, Boston, MA, 2000.