A new architecture for a single-chip multi-channel beamformer based on a standard FPGA by Tomov, Borislav Gueorguiev & Jensen, Jørgen Arendt
A new architecture for a single-chip multi-channel beamformer based on a standard
FPGA
Tomov, Borislav Gueorguiev; Jensen, Jørgen Arendt
Published in:
Proceedings of IEEE Ultrasonics Symposium
Link to article, DOI:
10.1109/ULTSYM.2001.992011
Publication date:
2001
Document Version
Publisher's PDF, also known as Version of record
Citation (APA):
Tomov, B. G., & Jensen, J. A. (2001). A new architecture for a single-chip multi-channel beamformer based on a
standard FPGA. In Proceedings of IEEE Ultrasonics Symposium (Vol. 2, pp. 1529-1533). IEEE. DOI:
10.1109/ULTSYM.2001.992011
A new architecture for a single-chip multi-channel beamformer based
on a standard FPGA
Borislav Gueorguiev Tomov∗ and Jørgen Arendt Jensen
Center for Fast Ultrasound Imaging, Ørsted•DTU, Build. 348,
Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
Abstract
A new architecture for a compact medical ultrasound
beamformer has been developed. It combines novel and
known principles, leading to low processing power
requirements and simple analog circuitry. Using a field
programmable gate array (FPGA) for the digital signal
processing provides programming flexibility.
First, sparse sample processing is performed by
generating the in-phase and quadrature beamformed
signals. Hereby only 512 samples are beamformed
for each line in an image. That leads to a 15-fold
decrease in the number of operations and enables the
use of Delta-Sigma (∆Σ) modulation analog-to-digital
converters (ADC).
Second, a simple second-order ∆Σ modulation ADC
with a classic topology is used. This allows for simple
analog circuitry and a very compact design. Several
tens of these, together with the corresponding
preamplifiers, can be fitted onto a single analog
integrated circuit.
Third, parameter-driven delay generation is used,
with 3 input parameters per line per channel for either
linear array imaging or phased array imaging. The
delays are generated on the fly. The delay generation
logic also determines the digital apodization by using
2 additional parameters. The control logic consists of
a few adders and counters and requires very limited
resources.
Fourth, the beamformer is fully programmable.
Any channel can be set to use an arbitrary delay curve,
and any number of these channels can be used together
in an extendable modular multi-channel system.
A prototype of the digital logic is implemented using
a Xilinx Virtex-E series FPGA. A 5 MHz center
frequency is used along with an oversampling ratio of
14. The sampling clock frequency is 140 MHz, and a
single Xilinx 1 million gate FPGA (XCV600E) holds
32 channels. The beamformer utilizes all of the
BlockRAM of the device and 33 % of its Configurable
Logic Block (CLB) resources.
Both simulation results and processed echo data
from a phantom are presented.
1 Introduction
Making sophisticated technology more accessible is an im-
portant consideration in system design. Since digital elec-
tronics is rapidly evolving, moving processing functions from
analog to digital electronics is a powerful approach which al-
lows for increased flexibility and compactness of the mixed-
signal devices.
∆Σ modulation (DSM) [1] is one of the techniques that
make it possible to decrease the complexity of the analog
interface electronics by using digital logic. A DSM ADC
consists of a modulator followed by consecutive stages of
low-pass filters and decimators. The reconstructed samples
represent the input signal at equidistant time instants.
Ultrasound beamformers, however, require non-uniform
sampling because of the delay profiles in receive. A number
of researchers have tried to incorporate DSM ADCs into
ultrasound beamformers. Freeman et al. [2] developed a
modified modulator architecture that accommodates the delay
profiles in the beamforming without interrupting the
modulation process. Since the oversampling ratio (OSR) is
crucial for the amplitude resolution of a DSM ADC, the same
research group suggested base-band demodulation [3]. Kozak
and Karaman [4] proposed a beamformer featuring DSM with a
non-uniform sampling clock.
In the present paper, several novel techniques are
combined in a new beamformer architecture. First, sparse
sample processing is employed, leading to an approximately
15-fold decrease in the necessary operations, as only 512
samples per image line are processed. The samples are taken
at the precise time instants, discretized by the sampling
frequency of the DSM. Second, each channel uses a circular
buffer at the output of the DSM for extraction of the
necessary data. Thus, the structure does not impose any
restrictions or requirements on the order or topology of the
DSM, allowing for flexibility and interchangeability of the
analog front-end. Third, the delay generation is parametric
and allows independent on-the-fly delay generation for each
channel. The calculation scheme is inspired by the Bresenham
drawing algorithm [5].
∗E-mail: bt@oersted.dtu.dk
Figure 1: Signal processing of the proposed beamformer, illustrated with 4 channels.
Panels: input signals $s_1(t)$ to $s_4(t)$, one-bit signals $q_1[n]$ to $q_4[n]$ after the DSM, and
filtered signals; the sum $r[n]$ is filtered by $h_I[n]$ and $h_Q[n]$ to give $\hat{s}_I[n]$ and $\hat{s}_Q[n]$.
2 Principles behind the suggested
beamformer architecture
2.1 Rationale for the sparse sample processing
Ultrasound images are displayed on raster devices (CRT
or LCD displays), which rarely use a resolution beyond that
of a TV (525 lines for NTSC) or VGA (640x480 pixels).
Therefore, on such displays a beamformed line in an image is
represented by no more than 512 points. Thus, it is
sufficient to have the correct envelope and phase of the RF
signal at 512 equidistant points along the beamformed line
to present a correct B-mode or color-flow image. A similar
approach was proposed in a different context by Karaman et
al. [6].
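As a rough check of the 15-fold figure quoted in the Introduction, the number of RF samples along one line in a conventional beamformer can be compared with the 512 points kept here. The paper does not state which baseline the figure refers to, so the sketch below makes two assumptions: a conventional sampling rate of 40 MHz (the rate of the recorded data in Section 5) and the 0.15 m depth range used as an example in Section 4.

```python
# Rough operation-count comparison; the baseline values are assumptions.
SPEED_OF_SOUND = 1540.0      # m/s, from the parameter table in Section 4
DEPTH_RANGE = 0.15           # m, example depth range from Section 4
CONVENTIONAL_FS = 40e6       # Hz, assumed conventional RF sampling rate
POINTS_PER_LINE = 512        # sparse samples kept per image line


def samples_per_line(depth_m, fs_hz, c=SPEED_OF_SOUND):
    """Number of RF samples covering the round trip to depth_m."""
    round_trip_time = 2.0 * depth_m / c
    return round_trip_time * fs_hz


conventional = samples_per_line(DEPTH_RANGE, CONVENTIONAL_FS)
reduction = conventional / POINTS_PER_LINE
print(f"{conventional:.0f} samples per line, reduction factor {reduction:.1f}")
```

Under these assumptions the ratio comes out slightly above 15, which is consistent with the quoted decrease; other baseline choices would give other ratios.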
2.2 Usage of the Delta-Sigma modulator in the
beamformer
The principle of the DSM implies that the appropriately
filtered output of a DSM approximates the input signal, and
the approximation improves with increasing oversampling
ratio (OSR). If a filter is applied directly to the DSM
output stream, valid output samples can be reconstructed at
any clock cycle. In this way, the delay resolution in the
beamforming process is equal to the period of the inherently
high modulation frequency.
The signal processing is illustrated in Fig. 1. The analog
input signals $s_k(t)$ ($k$ is the channel index) from the
different channels are modulated into bit streams $q_k[n]$
in the DSM. In order to perform beamforming at a given
point, indicated by arrows in the plots of $s_k(t)$,
sequences of bits (shown in black) are extracted from the
streams $q_k[n]$ at places that correspond to the
appropriate channel delays. The length of the sequences is
equal to the length of the reconstruction filters that will
be used. The selected sequences are summed into the sequence
$r[n]$. The latter is then weighted by the in-phase $h_I[n]$
and quadrature $h_Q[n]$ filters to yield selected samples of
the in-phase $\hat{s}_I[n]$ and quadrature $\hat{s}_Q[n]$
reconstructed streams. The matched filter from classic
beamforming is used as the in-phase filter and its Hilbert
transform is used as the quadrature filter, since they
suppress the quantization noise to a sufficient degree.
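The processing chain of Section 2.2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `second_order_dsm`, `sparse_iq_sample` and `windowed_iq_filters` are hypothetical names, the modulator is a generic textbook two-integrator loop, and a Hamming-windowed cosine/sine pair is assumed as a stand-in for the matched filter and its Hilbert transform.

```python
import math


def second_order_dsm(samples):
    """One-bit second-order delta-sigma modulator (textbook two-integrator loop).

    Input samples should stay well inside (-1, 1); output is a list of +1/-1.
    """
    w1 = w2 = 0.0
    y_prev = 0.0
    bits = []
    for x in samples:
        w2 += w1 - y_prev            # second (delaying) integrator
        y = 1 if w2 >= 0 else -1     # one-bit quantizer
        w1 += x - y                  # first integrator with feedback
        y_prev = y
        bits.append(y)
    return bits


def sparse_iq_sample(bitstreams, starts, h_i, h_q):
    """Reconstruct one in-phase/quadrature sample from one-bit streams.

    bitstreams: one +1/-1 list per channel
    starts:     per-channel start index of the extracted sequence (the delay)
    h_i, h_q:   in-phase and quadrature filter coefficients (same length)
    """
    length = len(h_i)
    # Sum the delayed sequences across channels into r[n].
    r = [sum(q[s + m] for q, s in zip(bitstreams, starts)) for m in range(length)]
    # Weight r[n] by each filter to obtain a single output sample.
    s_i = sum(a * b for a, b in zip(r, h_i))
    s_q = sum(a * b for a, b in zip(r, h_q))
    return s_i, s_q


def windowed_iq_filters(cycles, samples_per_cycle):
    """Hamming-windowed cosine/sine pair (an assumed, illustrative filter pair)."""
    n = cycles * samples_per_cycle
    win = [0.54 - 0.46 * math.cos(2 * math.pi * m / (n - 1)) for m in range(n)]
    h_i = [w * math.cos(2 * math.pi * m / samples_per_cycle) for m, w in enumerate(win)]
    h_q = [w * math.sin(2 * math.pi * m / samples_per_cycle) for m, w in enumerate(win)]
    return h_i, h_q


if __name__ == "__main__":
    spc = 28                         # 140 MHz clock / 5 MHz center frequency
    delays = [0, 7]                  # hypothetical channel delays in clock cycles
    channels = [
        second_order_dsm([0.5 * math.cos(2 * math.pi * (n - d) / spc) for n in range(600)])
        for d in delays
    ]
    h_i, h_q = windowed_iq_filters(3, spc)
    s_i, s_q = sparse_iq_sample(channels, [200 + d for d in delays], h_i, h_q)
    print("envelope estimate:", math.hypot(s_i, s_q))
```

Because the sequences are summed before filtering, the two filters run once per output point rather than once per channel, which is where the saving in operations comes from.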
2.3 Delay generation
A delay calculation scheme is suggested that allows
on-the-fly delay generation. It approximates the analytic
delay curve for a given imaged line and receiver element. A
similar approach with a different calculation scheme has
been suggested by Feldkämper et al. [7] for increasing the
delay resolution in beamformers.
The geometry behind the delay calculation algorithm is
shown in Fig. 2. The distance to the focus point P along the
scan line is denoted d and the echo path is denoted dr. The
full path of the ultrasound wave is denoted p. The aperture
distance between the emission center and the receiving ele-
ment is denoted with x and the angle between the scan line
and the normal to the transducer surface is denoted ϕ.
Figure 2: Delay calculation geometry (transducer, scan line, focus point P, distances d, p and dr, aperture distance x, angle ϕ)
The echo path $d_r$ can be expressed as:

$$d_r = p - d = \sqrt{(x - d\sin\varphi)^2 + (d\cos\varphi)^2}. \tag{1}$$

After some transformations the equation

$$p^2 - 2d(p - x\sin\varphi) - x^2 = 0 \tag{2}$$

is obtained, which describes the imaged line. The term
$x\sin\varphi$ is constant for a given line inclination and
element and is denoted $k$. For converting the variables
into units of clock cycles (for calculations in hardware),
both sides of (2) have to be multiplied by $(f_s/c)^2$,
where $f_s$ is the sampling frequency and $c$ is the speed
of sound. Eq. (2) becomes:

$$f(p_N, d_N) \equiv p_N^2 - 2d_N(p_N - k_N) - x_N^2 = 0, \tag{3}$$

where the index $N$ denotes that the variable unit is clock
cycles.
In order to keep the focus on the imaged line, the delay
generation logic has to keep $f(p_N, d_N)$ as close to 0 as
possible, therefore it should increase $p_N$ by 1 or 2 for
each unit increase of $d_N$. The choice¹ is made by
evaluating the sign of the function $f(p_N+1, d_N+1)$. It
can be seen that:

$$f(p_N+1, d_N+1) = f(p_N, d_N) - 2d_N + 2k_N - 1 < f(p_N, d_N) \tag{4}$$

and

$$f(p_N+2, d_N+1) = f(p_N, d_N) + 2p_N - 4d_N + 2k_N > f(p_N, d_N). \tag{5}$$

Therefore the following algorithm is suggested:
1. The initial values $d_N(1) = d_{start} f_s/c$,
$p_N(1) = p_{start} f_s/c$, and $k_N = x\sin\varphi \, f_s/c$
are supplied.
2. If $f(p_N+1, d_N+1) > 0$, then $p_N(n+1) = p_N(n)+1$,
else $p_N(n+1) = p_N(n)+2$.
3. If the end of the line is not reached, go to 2.
The described algorithm approximates the analytical
dynamic delay curve within ±1 clock cycle.
Figure 3: Beamformer architecture (blocks: transducer, TGC amplifier, DSM, buffer, delay generator, apodization, sum, and filters $h_I[n]$ and $h_Q[n]$)
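The delay generation algorithm of Section 2.3 can be sketched as follows, using the incremental updates of Eqs. (4) and (5). This is a minimal sketch under stated assumptions: all quantities are integers in clock cycles, the start depth satisfies $d_N \geq k_N$ (so the tilted-line case of footnote 1 does not arise), and the initial $p_N$ is taken as the smallest integer above the analytic value, whereas in the paper it is supplied as a parameter. The function names are hypothetical.

```python
import math


def f_value(p, d, k, x):
    """f(p_N, d_N) from Eq. (3); all arguments are integers in clock cycles."""
    return p * p - 2 * d * (p - k) - x * x


def generate_path_lengths(d_start, k, x, n_points):
    """Return p_N for d_N = d_start, d_start + 1, ... via Eqs. (4) and (5).

    Assumes d_start >= k, so that p_N grows by 1 or 2 per step.
    """
    d = d_start
    # Initial p_N: smallest integer above the analytic value (an assumed choice).
    p = d + math.isqrt(x * x - 2 * k * d + d * d) + 1
    f = f_value(p, d, k, x)
    result = [p]
    for _ in range(n_points - 1):
        f_step1 = f - 2 * d + 2 * k - 1          # f(p+1, d+1), Eq. (4)
        if f_step1 > 0:
            p, f = p + 1, f_step1
        else:
            f = f + 2 * p - 4 * d + 2 * k        # f(p+2, d+1), Eq. (5)
            p += 2
        d += 1
        result.append(p)
    return result


def analytic_path_length(d, k, x):
    """Exact p = d + d_r from Eq. (1), in clock cycles."""
    return d + math.sqrt(x * x - 2 * k * d + d * d)


if __name__ == "__main__":
    d0, k, x = 100, 50, 200
    p_values = generate_path_lengths(d0, k, x, 2000)
    worst = max(abs(p - analytic_path_length(d0 + i, k, x)) for i, p in enumerate(p_values))
    print("largest deviation in clock cycles:", worst)
```

After the initial evaluation of $f$, each step needs only additions and doublings (shifts), which matches the paper's remark that the control logic consists of a few adders and counters.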
3 Beamformer architecture
The suggested beamformer architecture is shown in Fig. 3.
The received RF signal $s(t)$ from every active transducer
element is amplified by a variable-gain amplifier (used for
time-gain compensation) and is converted into a high-rate
1-bit digital signal $q[n]$ in the DSM. That data stream is
written to a circular buffer. At time instants determined by
the delay generation logic, sequences from the stream are
read. After multiplication by the apodization coefficient
(the weight) of that channel, the aligned sequences
corresponding to a given line point are summed across all
channels. The result is then filtered to extract the
in-phase and quadrature components $\hat{s}_I[n]$ and
$\hat{s}_Q[n]$.
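The buffer, apodization and sum stages described above can be modelled in software as shown below. This is only a behavioural sketch: the class `CircularBuffer` and the function `beamform_sequence` are hypothetical names, and the absolute stream indexing is an assumption made for readability; it is not a description of the FPGA logic.

```python
class CircularBuffer:
    """Fixed-size buffer holding the most recent samples of one DSM stream."""

    def __init__(self, size):
        self.size = size
        self.data = [0] * size
        self.written = 0                      # total samples written so far

    def write(self, sample):
        self.data[self.written % self.size] = sample
        self.written += 1

    def read(self, start, length):
        """Return stream samples start .. start + length - 1 (absolute indices)."""
        if start < 0 or start + length > self.written:
            raise IndexError("requested sequence has not been written yet")
        if start < self.written - self.size:
            raise IndexError("requested sequence has been overwritten")
        return [self.data[(start + m) % self.size] for m in range(length)]


def beamform_sequence(buffers, starts, weights, length):
    """Apodize and sum aligned sequences from all channels into r[n]."""
    r = [0] * length
    for buf, start, weight in zip(buffers, starts, weights):
        for m, bit in enumerate(buf.read(start, length)):
            r[m] += weight * bit              # bit is +1/-1: add or subtract the weight
    return r


if __name__ == "__main__":
    a, b = CircularBuffer(4), CircularBuffer(4)
    for bit in (1, -1, 1, 1):
        a.write(bit)
    for bit in (-1, -1, 1, -1):
        b.write(bit)
    print(beamform_sequence([a, b], [0, 1], [3, 2], 3))
```

The buffer only has to be deep enough to cover the largest delay difference plus the filter length, since older samples are never read again.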
4 Implementation tradeoffs and
choices
The described architecture was implemented in hardware
with the following beamformer parameters:
Parameter                   Value
Speed of sound              1540 m/s
Center frequency f0         5 MHz
Excitation                  2 sinusoids at f0
Oversampling ratio (OSR)    14
Number of channels          32
¹ In the case of a tilted line, $p_N$ could also stay unchanged for some increases of $d_N$. The decision requires evaluation of the sign of $f(p_N, d_N+1)$.
Figure 4: Frequency response of the perfect matched filter (solid
line) and a shortened one (dashed line), along with a typical DSM
output spectrum (gray). Axes: frequency, MHz; normalized amplitude, dB.
An important design decision is the choice of the in-phase
and the quadrature filters. The matched filter from classic
beamforming (the time-reversed excitation convolved twice
with the impulse response of the transducer) provides
excellent suppression of the quantization noise. The length
of the filters, however, is constrained by the number of
clock cycles that are available for producing a
reconstructed sample. That number is inversely proportional
to the density of the beamformed points. For instance, if
512 points should represent a depth range of 0.15 m, there
are between 26 and 53 clock cycles available to the filter
block for producing the in-phase and quadrature
reconstructed samples.
In the current design, the filtering operation is
parallelized in four, so in-phase and quadrature filters
with a length of up to 104 can be used. A perfect matched
filter for the simulation setup has a length of 168.
Therefore, a number of pseudo-matched filters were
investigated. The frequency responses of the perfect matched
filter and of a shortened one (3 center-frequency sinusoids,
Hamming-window weighted) are shown in Fig. 4.
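The figures of 26 to 53 clock cycles and a maximum filter length of 104 can be reproduced with a short calculation. The sketch assumes that the lower bound corresponds to one depth step per output point (in units of $d_N$) and the upper bound to two, following the rule that $p_N$ advances by 1 or 2 per unit step of $d_N$; the paper does not spell out this derivation, so it is an interpretation.

```python
# Clock-cycle budget per beamformed point; a worked check of the figures above.
SPEED_OF_SOUND = 1540.0     # m/s
SAMPLING_FREQ = 140e6       # Hz, modulator clock
DEPTH_RANGE = 0.15          # m
POINTS_PER_LINE = 512
PARALLEL_FILTER_UNITS = 4   # the filtering is parallelized in four

# Depth step between neighbouring points, expressed in clock cycles (d_N units).
depth_step_cycles = DEPTH_RANGE * SAMPLING_FREQ / SPEED_OF_SOUND / POINTS_PER_LINE

# Assumption: p_N advances by 1 to 2 cycles per unit step of d_N, so the time
# between consecutive output points lies between one and two depth steps.
min_cycles = int(depth_step_cycles)
max_cycles = int(2 * depth_step_cycles)
max_filter_length = PARALLEL_FILTER_UNITS * min_cycles

print(min_cycles, max_cycles, max_filter_length)   # prints: 26 53 104
```

With four parallel filter units and at least 26 cycles per point, 4 x 26 = 104 coefficients can be processed, which is shorter than the 168-tap perfect matched filter and motivates the shortened filters.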
The point spread functions (PSF) of single scatterers at
the transmit focal point were obtained through simulations
using the two mentioned filters. They were compared against
the reference beamforming PSF (Fig. 5).
The implementation target is the Xilinx Virtex-E FPGA
device family, which features a large number of fast
dual-ported SRAM blocks that can be used as buffers for the
DSM output stream. A number of specific design choices are
made:
1. The output word of the delay buffer is wider than the
input word. This speeds up the reading and allows for
parallel processing of the data at the highest possible
clock rate.
2. Each calculation cycle in the delay generation requires
two clock cycles; therefore a more aggressive
computational scheme is employed: $d_N$ is increased by 2
in each computation cycle, and the signs of
$f(p_N+1, d_N+2)$, $f(p_N+2, d_N+2)$ and $f(p_N+3, d_N+2)$
determine whether $p_N$ should be increased by 1, 2, 3,
or 4.
3. The output data from the DSM is 1 bit wide in the
current implementation, so the apodization does not
require multiplications. It uses one register instead.
4. The sum operation across all channels is pipelined in
order to incorporate numerous inputs and to process
them at a high clock frequency. The multiplication
operation is also pipelined and works at the modulation
clock frequency.
5. A chain of beamformers can be used, each of them
receiving a partially beamformed sample from a neighbor,
summing it with its own partially beamformed sample,
and passing it on.
Figure 5: Simulated PSF: classic beamforming (solid
line), ideal matched filter (dashed line), shortened filter
(grey line). Panels: Lateral (Distance, mm) and Axial (Depth, cm);
vertical axis: normalized amplitude, dB.
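The two-step delay generation of design choice 2 can be sketched as below. The selection rule used here (take the smallest increment $j$ in 1..3 for which $f(p_N+j, d_N+2) > 0$, otherwise 4) is an assumed reading of the sign test, chosen to mirror the single-step algorithm of Section 2.3; the function names are hypothetical, and $f$ is evaluated directly for clarity, whereas hardware would presumably update it incrementally.

```python
import math


def f_value(p, d, k, x):
    """f(p_N, d_N) = p_N^2 - 2 d_N (p_N - k_N) - x_N^2, Eq. (3)."""
    return p * p - 2 * d * (p - k) - x * x


def initial_path_length(d, k, x):
    """Smallest integer p_N above the analytic path length (an assumed start value)."""
    return d + math.isqrt(x * x - 2 * k * d + d * d) + 1


def generate_path_lengths_two_step(d_start, p_start, k, x, n_steps):
    """Advance d_N by 2 per computation cycle and p_N by 1, 2, 3, or 4.

    The increment is the smallest j in 1..3 with f(p_N + j, d_N + 2) > 0,
    and 4 if none of the three values is positive.
    """
    d, p = d_start, p_start
    result = [(d, p)]
    for _ in range(n_steps):
        d += 2
        step = 4
        for j in (1, 2, 3):
            if f_value(p + j, d, k, x) > 0:
                step = j
                break
        p += step
        result.append((d, p))
    return result


if __name__ == "__main__":
    x, k, d0 = 200, 50, 100
    pairs = generate_path_lengths_two_step(d0, initial_path_length(d0, k, x), k, x, 5)
    print(pairs)
```

Three sign evaluations are enough to choose among four increments, because the fourth case is implied when none of the three values is positive.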
5 Phantom data processing results
A set of element traces sampled at 40 MHz was obtained
using the experimental sampling system RASMUS [8]. The data
was resampled at 140 MHz and 200 MHz and beamformed
according to the suggested architecture. A comparison
between the beamforming approaches is shown in Fig. 6. The
number of elements is 32 and the F-number is between 2.5 and
10. It can be seen that the quantization noise of the DSM
limits the image contrast and that increasing the OSR
improves it. An improvement can also be achieved by
employing a more sophisticated modulator architecture.
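The effect of the OSR on the quantization noise can be estimated with the textbook white-noise approximation for the peak signal-to-quantization-noise ratio of an ideal modulator of order $L$ with an $N$-bit quantizer: $6.02N + 1.76 - 10\log_{10}(\pi^{2L}/(2L+1)) + (20L+10)\log_{10}(\mathrm{OSR})$ dB. This formula is not taken from the paper and is derived for low-pass signals, so the sketch below is only indicative of the trend between the OSR = 14 and OSR = 20 images.

```python
import math


def ideal_dsm_sqnr_db(order, osr, bits=1):
    """Textbook peak SQNR of an ideal delta-sigma modulator (white-noise model)."""
    return (6.02 * bits + 1.76
            - 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
            + (20 * order + 10) * math.log10(osr))


gain_db = ideal_dsm_sqnr_db(2, 20) - ideal_dsm_sqnr_db(2, 14)
print(f"predicted SQNR gain from OSR 14 to OSR 20: {gain_db:.1f} dB")
```

For a second-order modulator the model predicts 15 dB per doubling of the OSR, so going from 14 to 20 gives a gain of a little under 8 dB.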
6 Conclusion
A novel flexible beamformer architecture utilizing DSM is
suggested. The beamformer can be housed in one standard
FPGA, which can easily be programmed and upgraded. Com-
bined with a simple analog front end, the whole design can
be implemented by three chips (one of them containing the
transmit amplifiers). A standard portable PC can be used for
display, making it a very inexpensive system.
Figure 6: Images created with classic beamforming and with
the suggested beamforming at different OSR. The dynamic
range is 40 dB. Panels: reference beamforming; DSM beamforming,
OSR=14; DSM beamforming, OSR=20. Axes: lateral distance, mm;
axial distance, mm.
Acknowledgment
This work was supported by grants 9700883 and 9700563
from the Danish Science Foundation, by B-K Medical A/S,
Gentofte, Denmark, by the Thomas B. Thrige Center for
Microinstruments, and by the Danish Research Academy.
The authors would like to thank all colleagues at the
Center for Fast Ultrasound Imaging for their cooperation and
valuable comments on the paper.
References
[1] J. C. Candy and G. C. Temes. Oversampling Delta-Sigma
Data Converters - Theory, Design and Simulation. IEEE
Press, 1992.
[2] S. R. Freeman, M. K. Quick, M. A. Morin, R. C. Ander-
son, C. S. Desilets, T. E. Linnenbrink, and M. O’Donnell.
Delta-sigma oversampled ultrasound beamformer with
dynamic delays. IEEE Trans. Ultrason., Ferroelec., Freq.
Contr., 46:320–332, 1999.
[3] S. R. Freeman, M. K. Quick, M. A. Morin, R. C. Ander-
son, C. S. Desilets, T. E. Linnenbrink, and M. O’Donnell.
Heterodyning technique to improve performance of
delta-sigma-based beamformers. IEEE Trans. Ultrason.,
Ferroelec., Freq. Contr., 46:771–790, 1999.
[4] M. Kozak and M. Karaman. Digital phased array beam-
forming using single-bit delta-sigma conversion with
non-uniform oversampling. IEEE Trans. Ultrason., Fer-
roelec., Freq. Contr., 48:922–931, 2001.
[5] J. E. Bresenham. Algorithm for computer control of a
digital plotter. IBM Systems J., 4:25–30, 1965.
[6] M. Karaman, A. Atalar, and H. Köymen. VLSI circuits
for adaptive digital beamforming in ultrasound imaging.
IEEE Trans. Med. Imag., 12:711–720, 1993.
[7] H. T. Feldkämper, R. Schwann, V. Gierenz, and T. G.
Noll. Low power delay calculation for digital beamforming
in handheld ultrasound systems. Proc. IEEE Ultrason.
Symp., 2:1763–1766, 2000.
[8] J. A. Jensen, O. Holm, L. J. Jensen, H. Bendsen, H. M.
Pedersen, K. Salomonsen, J. Hansen, and S. Nikolov.
Experimental ultrasound system for real-time synthetic
imaging. Proc. IEEE Ultrason. Symp., 2:1595–1599, 1999.
