Single-FPGA 3D Ultrasound Beamformer by Yüzügüler, Ahmet Caner et al.
Single-FPGA 3D Ultrasound Beamformer
A. C. Yüzügüler∗, W. Simon∗, A. Ibrahim∗, F. Angiolini∗, M. Arditi∗, J.-P. Thiran∗† and G. De Micheli∗
∗ École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
†Department of Radiology, University Hospital Center (CHUV) and University of Lausanne (UNIL), Switzerland
ABSTRACT
In medical diagnosis, ultrasound (US) imaging is one of the
most common, safe, and powerful techniques. Volumetric (3D)
US imaging, an emerging technique, is even more attractive than
standard 2D imaging, as it allows for imaging without the local
presence of a trained sonographer finely positioning the probe. This
would be particularly useful in rescue operations, remote areas and
developing countries. Unfortunately, present-day 3D imagers are
expensive, bulky and power-hungry, confining them to hospitals.
There is therefore a strong motivation to develop efficient electronics
to enable a portable US platform that is small, cheap, and battery-
operated. Beamforming (BF) is the most computationally expensive
stage of 3D imaging. Both commercial [1] and research [2] imagers
have dealt with the challenge by reducing the number of receive
channels, hence simplifying the computation through the usage of
far fewer elements. This comes at the cost of image quality, and the
resulting machines are nonetheless still non-portable and expensive.
In turn, the bottleneck of the BF process is the calculation of
acoustic delays, which requires up to trillions of square roots per
second. We propose a drastically more efficient architecture [3]. With
geometric considerations, each delay is calculated from a small set
of square roots (mapped onto CORDICs), plus two additions. In this
demo, we will show the reconstruction of a 2.5M-voxel volume,
supporting a transducer with 32×32 receive channels. We have fitted
the architecture into a single Kintex UltraScale KU040 [4], which
is unprecedented. We also extrapolated the utilization of an 80×80
instance on a Virtex UltraScale XCVU190 [4]. Table I shows the
implementation results. Fig. 1 shows our beamformer custom block
connected to the other FPGA subsystems. The delay calculation
architecture is shown in Fig. 2. The demo setup is presented in Fig. 3,
where the 3D beamformer is implemented on the FPGA, while the
pre- and post-processing stages are currently performed in MATLAB.
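To give a feel for the scale of the delay calculation, the following back-of-the-envelope sketch counts the receive delays a naive beamformer would need per second, assuming one square root per (voxel, channel) pair. The voxel count and volume rate are taken from the text and Table I; applying the same 2.5M-voxel volume to the 80×80 instance is an assumption made here for illustration, and the function name is hypothetical.

```python
def delays_per_second(voxels: int, channels_x: int, channels_y: int, volume_rate: int) -> int:
    """Naive count: one receive delay per (voxel, channel) pair per volume."""
    return voxels * channels_x * channels_y * volume_rate


VOXELS = 2_500_000  # 2.5M-voxel volume, from the text
VOLUME_RATE = 50    # volumes per second, from Table I

small = delays_per_second(VOXELS, 32, 32, VOLUME_RATE)
# Assumption: the 80x80 instance reconstructs the same volume size.
large = delays_per_second(VOXELS, 80, 80, VOLUME_RATE)

print(f"32x32 channels: {small:.2e} delays per second")
print(f"80x80 channels: {large:.2e} delays per second")
```

Under these assumptions the count is in the hundreds of billions per second and approaches a trillion for the 80×80 case, which is why replacing most square roots with additions matters.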
[Figure 1 diagram not reproduced. Recoverable block labels: MicroBlaze, Local Memory (128k), Clock, Interrupt Controller, Debug, Reset, AXI Interconnect, UART, Memory Interface, Ethernet Subsystem, Ethernet DMA, AXI Timer, and BEAMFORMER.]
Fig. 1. FPGA diagram: Beamformer custom IP and support subsystems.
TABLE I
BEAMFORMER ARCHITECTURE RESULTS.
Supported Channels | Logic LUTs | Regs | BRAM | DSP  | Clock   | Volume Rate
32×32∗             | 78%        | 25%  | 100% | 0.3% | 125 MHz | 50 vps
80×80∗∗            | 86%        | 19%  | 43%  | 0.3% | 125 MHz | 50 vps
∗Kintex UltraScale KU040 implementation results.
∗∗Virtex UltraScale XCVU190 extrapolated results.
[Figure 2 diagram not reproduced. Recoverable labels: (a) Delay Generator: CORDIC SQRT, reference delay, DelaySteer 0 to DelaySteer 31, coefficient LUTs (C1 coefs, C2 coef), Adders, BRAM, reg[0] to reg[31], outputs delay[0][31] to delay[31][31]. (b) Delay Generator, delay[32][32], readaddr, BRAM[0][0] to BRAM[31][31], dout, Adder[0] to Adder[31], Voxel Adder, AXI BUS. Multiplicity labels x32 and x16 appear in both panels.]
Fig. 2. Proposed architecture of the delay computation block. The receive
delay is computed by applying steering coefficients to a reference delay (a),
then the calculated 32×32 delays are used to reconstruct a voxel (b).
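The idea of steering a reference delay can be illustrated with a short numerical sketch. It assumes a first-order (far-field) expansion of the voxel-to-element distance around the array centre, which is one common way to obtain a "reference plus two additions" form; the exact coefficients used in the architecture are not given here, so the function names, the element pitch, the speed of sound, and the coefficient formulas below are all illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, typical soft-tissue value (assumed, not from the source)


def exact_rx_delay(voxel, element):
    """Exact receive delay in seconds: one square root per (voxel, element) pair."""
    sx, sy, sz = voxel
    ex, ey = element  # element lies in the z = 0 plane of the transducer
    return math.sqrt((sx - ex) ** 2 + (sy - ey) ** 2 + sz ** 2) / SPEED_OF_SOUND


def steering_coefs(voxel):
    """Reference delay (one square root) plus per-axis steering slopes for a voxel."""
    sx, sy, sz = voxel
    r = math.sqrt(sx * sx + sy * sy + sz * sz)  # a CORDIC would compute this in hardware
    return r / SPEED_OF_SOUND, -sx / (r * SPEED_OF_SOUND), -sy / (r * SPEED_OF_SOUND)


def steered_rx_delays(voxel, xs, ys):
    """Delays for every element (xs[i], ys[j]) using two additions per element."""
    ref, slope_x, slope_y = steering_coefs(voxel)
    c1 = [slope_x * x for x in xs]  # per-column terms, could be table lookups
    c2 = [slope_y * y for y in ys]  # per-row terms, could be table lookups
    return [[ref + c1[i] + c2[j] for j in range(len(ys))] for i in range(len(xs))]


PITCH = 0.2e-3  # element pitch in metres (hypothetical)
coords = [(k - 15.5) * PITCH for k in range(32)]  # 32 elements centred on the origin
voxel = (0.010, 0.005, 0.050)  # voxel position in metres (hypothetical)

approx = steered_rx_delays(voxel, coords, coords)
worst = max(
    abs(approx[i][j] - exact_rx_delay(voxel, (coords[i], coords[j])))
    for i in range(32)
    for j in range(32)
)
print(f"worst-case error over 32x32 elements: {worst * 1e9:.1f} ns")
```

In this sketch a single square root per voxel serves all 1024 elements, and each element's delay costs two additions. The price is an approximation error that grows with the element's distance from the array centre and shrinks with voxel depth, so a real design has to check it against the sampling period.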
[Figure 3 diagram not reproduced. Recoverable labels: Pre-processing and post-processing; "or Ethernet" (link label, first option not recoverable); FPGA: 3D Beamformer, Kintex UltraScale KU040.]
Fig. 3. The setup of the beamformer demo. The beamformer is implemented
on a Kintex UltraScale KU040 FPGA [4].
ACKNOWLEDGMENT
The authors acknowledge Swiss Confederation funding through the
UltrasoundToGo project of the Nano-Tera.ch initiative.
REFERENCES
[1] Philips Electronics N.V., “iE33 xMATRIX echocardiography system,”
www.healthcare.philips.com.
[2] J. Jensen, H. Holten-Lund, R. Nilsson, M. Hansen, U. Larsen, R. Domsten,
B. Tomov, M. Stuart, S. Nikolov, M. Pihl, Y. Du, J. Rasmussen, and
M. Rasmussen, “Sarus: A synthetic aperture real-time ultrasound system,”
IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control,
vol. 60, no. 9, pp. 1838–1852, Sep 2013.
[3] W. Simon, A. C. Yüzügüler, A. Ibrahim, F. Angiolini, M. Arditi, J.-P.
Thiran, and G. De Micheli, “Single-FPGA, scalable, low-power, and
high-quality 3D ultrasound beamformer,” in The 26th International Conference
on Field-Programmable Logic and Applications (FPL), 2016.
[4] Xilinx Inc., “Ultrascale FPGA: Product tables and product selection guide,” 2016,
http://www.xilinx.com/support/documentation/selection-guides/ultrascale-fpga-product-selection-guide.pdf#KU.
