A new architecture for a compact medical ultrasound beamformer has been developed. It combines novel and known principles, which leads to low processing power requirements and simple analog circuitry. Using a field-programmable gate array (FPGA) for the digital signal processing provides programming flexibility.
Introduction
Making sophisticated technology more accessible is an important consideration in system design. Since digital electronics is evolving rapidly, moving processing functions from analog to digital electronics is a powerful approach that allows for increased flexibility and compactness of mixed-signal devices.
∆Σ modulation (DSM) [1] is one of the techniques that make it possible to decrease the complexity of the analog interface electronics by using digital logic. A DSM ADC consists of a modulator followed by consecutive stages containing low-pass filters and decimators. The reconstructed samples represent the input signal at equidistant time instances. Ultrasound beamformers, however, require non-regular sampling because of the delay profiles in receive. A number of researchers have tried to incorporate DSM ADCs into ultrasound beamformers. Freeman et al. [2] have developed a modified modulator architecture in order to accommodate the delay profiles in the beamforming without interrupting the modulation process. Since the oversampling ratio (OSR) is crucial for the amplitude resolution of a DSM ADC, the same research group suggested base-band demodulation [3]. Kozak and Karaman [4] have proposed a beamformer featuring DSM with a non-uniform sampling clock.
In the present paper, several novel techniques are combined in a new beamformer architecture. First, sparse sample processing is employed, leading to an approximately 15-fold decrease in the number of necessary operations, as only 512 samples per image line are processed. The samples are chosen at the precise time instances, discretized by the sampling frequency of the DSM. Second, each channel uses a circular buffer at the output of the DSM for extraction of the necessary data. Thus, the structure does not impose any restrictions or requirements on the order or topology of the DSM, allowing for flexibility and interchangeability of the analog front-end. Third, the delay generation is parametric and allows independent on-the-fly delay generation for each channel. The calculation scheme is inspired by the Bresenham drawing algorithm [5].
Figure 1: Signal processing of the proposed beamformer, illustrated with 4 channels (panels: input signals, one-bit signals, filtered signals).
Principles behind the suggested beamformer architecture
Rationale for the sparse sample processing
Ultrasound images are displayed on raster devices (CRT or LCD displays), which rarely use a resolution beyond that of a TV (525 lines for NTSC) or VGA (640x480 pixels). Therefore, on such displays a beamformed line in an image is represented by no more than 512 points. Thus, it is sufficient to have the correct envelope and phase of the RF signal at 512 equidistant points along the beamformed line to present a correct B-mode or color-flow image. A similar approach was proposed in a different context by Karaman et al. [6].
Usage of the Delta-Sigma modulator in the beamformer
The principle of the DSM implies that the appropriately filtered output of a DSM approximates the input signal, and that the approximation improves as the oversampling ratio (OSR) increases. If a filter is applied directly to the DSM output stream, valid output samples can be reconstructed at any clock cycle. In this way, the delay resolution in a beamforming process is equal to the period of the inherently high modulation frequency.
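The principle can be illustrated with a minimal sketch. It assumes a first-order modulator and a plain moving-average reconstruction filter; neither is prescribed by the paper, and the function names, the test signal and the window length of 64 are hypothetical choices made only for illustration.

```python
import math

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator (assumed topology).

    Input samples are expected in [-1, 1]; the output is a list of +1/-1 bits.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback                     # accumulate the error
        feedback = 1.0 if integrator >= 0 else -1.0    # 1-bit quantizer
        bits.append(int(feedback))
    return bits

def reconstruct_at(bits, center, length):
    """Reconstruct one sample by averaging `length` bits around `center`.

    The window can be placed at any modulator clock cycle, which is what
    gives the fine delay resolution.
    """
    start = center - length // 2
    window = bits[start:start + length]
    return sum(window) / len(window)

# Hypothetical test signal: a slow sine, heavily oversampled by the modulator.
signal = [0.5 * math.sin(2 * math.pi * n / 2048) for n in range(4096)]
bits = delta_sigma_1bit(signal)

# Worst-case reconstruction error over many arbitrary clock cycles.
worst = max(abs(reconstruct_at(bits, c, 64) - signal[c])
            for c in range(100, 4000))
```

A longer window (a higher effective OSR) lowers the error further, at the cost of more additions per reconstructed sample.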
The signal processing is illustrated in Fig. 1. The analog input signals $s_k(t)$ ($k$ is the channel index) from different channels are modulated into bit streams $q_k[n]$ in the DSM. In order to perform beamforming at a given point, indicated by arrows in the plots of $s_k(t)$, sequences of bits (shown in black) are extracted from the streams $q_k[n]$ at places that correspond to the appropriate channel delays. The length of the sequences is equal to the length of the reconstruction filters that will be used. The selected sequences are summed into the sequence $r[n]$. The latter is then weighted by the in-phase $h_I[n]$ and quadrature $h_Q[n]$ filters to yield selected samples of the in-phase $\hat{s}_I[n]$ and quadrature $\hat{s}_Q[n]$ reconstructed streams. The matched filter from classic beamforming is used as the in-phase filter and its Hilbert transform is used as the quadrature filter, since they suppress the quantization noise to a sufficient degree.
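The processing of a single image point can be sketched as follows. This is an illustration under assumptions, not the hardware implementation: the function name `beamform_point`, the toy bit pattern, the delays and the four-tap filters are all hypothetical, and bits are represented as +1/-1.

```python
def beamform_point(streams, starts, h_i, h_q):
    """Reconstruct one in-phase/quadrature sample from one-bit streams.

    streams : per-channel lists of +1/-1 bits (q_k[n])
    starts  : per-channel index of the first extracted bit (encodes the delay)
    h_i/h_q : in-phase and quadrature filter coefficients of equal length
    """
    length = len(h_i)
    # Sum the aligned bit sequences across channels into r[n].
    r = [0] * length
    for q, start in zip(streams, starts):
        for n in range(length):
            r[n] += q[start + n]
    # Only one output sample per filter is needed, so each filter
    # reduces to a single inner product with r[n].
    s_i = sum(h * v for h, v in zip(h_i, r))
    s_q = sum(h * v for h, v in zip(h_q, r))
    return s_i, s_q

# Toy data: the same 4-bit pattern arrives on 4 channels with different delays.
pattern = [1, 1, -1, 1]
delays = [0, 2, 5, 3]
streams = [[-1] * d + pattern + [-1] * 8 for d in delays]

result = beamform_point(streams, delays, [1, 1, 1, 1], [1, -1, 1, -1])
```

Because the channels are summed before filtering, the two filters run once per image point rather than once per channel.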
Delay generation
A delay calculation scheme is suggested that allows on-the-fly delay generation. It approximates the analytic delay curve for a given imaged line and receiver element. A similar approach with a different calculation scheme has been suggested by Feldkämper et al. [7] for increasing the delay resolution in beamformers.
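The geometry behind the delay calculation algorithm is shown in Fig. 2. The distance to the focus point P along the scan line is denoted $d$ and the echo path is denoted $d_r$. The full path of the ultrasound wave is denoted $p$, so that $p = d + d_r$. The aperture distance between the emission center and the receiving element is denoted $x$, and the angle between the scan line and the normal to the transducer surface is denoted $\phi$.
[Figure 2: geometry for the delay calculation; labels recovered from the figure: scan line, transducer.]
With $x$ taken as positive on the side toward which the scan line is steered, the law of cosines gives the echo path $d_r$ as:
$$d_r = \sqrt{d^2 + x^2 - 2 x d \sin\phi} \tag{1}$$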
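After some transformations (substituting $d_r = p - d$ and squaring both sides), the equation
$$p^2 - 2pd + 2dx\sin\phi - x^2 = 0 \tag{2}$$
is obtained, which describes the imaged line. The term $x \sin\phi$ is constant for a given line inclination and element and is denoted $k$. For converting the variables into units of clock cycles (for calculations in hardware), both sides of (2) have to be multiplied by $(f_s/c)^2$, where $f_s$ is the clock frequency and $c$ is the speed of sound:
$$f(p_N, d_N) \equiv p_N^2 - 2 p_N d_N + 2 k_N d_N - x_N^2 = 0 \tag{3}$$
where the index $N$ denotes that the variable unit is the clock cycle. In order to keep the focus on the imaged line, the delay generation logic has to keep $f(p_N, d_N)$ as close to 0 as possible; therefore it should increase $p_N$ by 1 or 2 for each unit increase of $d_N$. The choice is made by evaluating the sign of the function $f(p_N + 1, d_N + 1)$. It can be seen that:
$$f(p_N + 1, d_N + 1) = f(p_N, d_N) - 2 d_N + 2 k_N - 1 \tag{4}$$
and
$$f(p_N + 2, d_N + 1) = f(p_N, d_N) + 2 p_N - 4 d_N + 2 k_N \tag{5}$$
Therefore the following algorithm is suggested:
1. Set the initial values of $d_N$ and $p_N$ for the start of the line and calculate $f(p_N, d_N)$.
2. Calculate $f(p_N + 1, d_N + 1)$ from (4). If it is positive, increase $p_N$ by 1; otherwise increase $p_N$ by 2 and update $f$ according to (5). Increase $d_N$ by 1.
3. If the end of the line is not reached, go to 2.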
The described algorithm approximates the analytical dynamic delay curve within ±1 clock cycle.
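The incremental scheme can be sketched in a few lines. The sketch assumes the implicit curve $f(p_N, d_N) = p_N^2 - 2 p_N d_N + 2 k_N d_N - x_N^2 = 0$, which follows from the geometry when $x$ is taken as positive toward the steered side, and it assumes $d_N \ge k_N$ so that $p_N$ grows by between 1 and 2 per step. The function name, the rounding of the initial value and the numeric parameters are hypothetical.

```python
import math

def delay_curve(x_n, k_n, d_start, n_points):
    """Track p_N for d_N = d_start, d_start + 1, ... using additions only.

    All quantities are in clock cycles. Only the starting point needs a
    square root; in hardware it would be a precomputed parameter.
    """
    d = d_start
    p = round(d + math.sqrt(d * d - 2 * k_n * d + x_n * x_n))
    f = p * p - 2 * p * d + 2 * k_n * d - x_n * x_n
    path = [p]
    for _ in range(n_points - 1):
        f_plus_1 = f - 2 * d + 2 * k_n - 1       # f(p + 1, d + 1)
        if f_plus_1 > 0:
            p, f = p + 1, f_plus_1               # p + 1 already reaches the curve
        else:
            f = f + 2 * p - 4 * d + 2 * k_n      # f(p + 2, d + 1)
            p += 2
        d += 1
        path.append(p)
    return path

# Hypothetical element: x_N = 300 cycles, steering angle 30 degrees (k_N = 150).
X_N, K_N, D_START = 300, 150, 200
path = delay_curve(X_N, K_N, D_START, 2000)
```

As in Bresenham's line algorithm, the decision variable is updated with integer additions only, so no multiplier or square-root unit is needed per channel during the line.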
Beamformer architecture
The suggested beamformer architecture is shown in Fig. 3. The received RF signal $s(t)$ from every active transducer element is amplified by a variable-gain amplifier (used for time-gain compensation) and is converted into a high-frequency 1-bit digital signal $q[n]$ in the DSM. That data stream is written to a circular buffer. At time instances determined by the delay generation logic, sequences from the stream are read. After multiplication by the apodization coefficient (the weight) of that channel, the aligned sequences corresponding to a given line point are summed across all channels. The result is then filtered for extracting the in-phase and the quadrature components.
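The buffering and apodization steps can be sketched as below. This is a software illustration under assumptions rather than a description of the FPGA design: the class and function names, the buffer size and the weights are hypothetical, and bits are represented as +1/-1.

```python
class BitCircularBuffer:
    """Fixed-size circular buffer for a one-bit stream (software sketch)."""

    def __init__(self, size):
        self.size = size
        self.data = [0] * size
        self.count = 0                      # total number of bits written so far

    def write(self, bit):
        self.data[self.count % self.size] = bit
        self.count += 1

    def read(self, start, length):
        """Return the bits with absolute stream indices start..start+length-1."""
        if start + length > self.count:
            raise IndexError("requested bits have not been written yet")
        if start < self.count - self.size:
            raise IndexError("requested bits have already been overwritten")
        return [self.data[(start + n) % self.size] for n in range(length)]

def apodized_sum(windows, weights):
    """Sum aligned one-bit windows across channels with per-channel weights.

    For +1/-1 data the product weight * bit is just +weight or -weight,
    so the weighting needs an add/subtract rather than a multiplier.
    """
    total = [0] * len(windows[0])
    for window, w in zip(windows, weights):
        for n, bit in enumerate(window):
            total[n] += w if bit > 0 else -w
    return total

stream = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1]
buf = BitCircularBuffer(8)
for b in stream:
    buf.write(b)

window = buf.read(6, 4)                     # bits 6..9 are still in the buffer
summed = apodized_sum([[1, -1, 1, 1], [-1, -1, 1, 1]], [3, 2])
```

The buffer only has to be deep enough to cover the largest delay difference plus one filter length, since older bits are never read again.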
Implementation tradeoffs and choices
The described architecture was implemented in hardware with the following beamformer parameters:
An important design decision is the choice of the in-phase and the quadrature filters. The matched filter from classic beamforming (time-reversed excitation convolved twice with the impulse response of the transducer) provides excellent suppression of the quantization noise. The length of the filters, though, is constrained by the number of clock cycles that are available for producing a reconstructed sample. That number is inversely proportional to the density of the beamformed points. For instance, if 512 points should represent a depth range of 0.15 m, there are between 26 and 53 clock cycles available to the filter block for producing the in-phase and the quadrature reconstructed samples.
In the current design, the filtering operation is parallelized four ways, so in-phase and quadrature filters with a length of up to 104 can be used. A perfect matched filter for the simulation setup has a length of 168. Therefore, a number of pseudo-matched filters were investigated. The frequency responses of the perfect matched filter and of a shortened one (3 periods of a sinusoid at the central frequency, weighted by a Hamming window) are shown in Fig. 4.
The point spread functions (PSF) of single scatterers in the transmit focal point were obtained through simulations using the two mentioned filters. They were compared against the reference beamforming PSF (Fig. 5).
The implementation target is the Xilinx Virtex-E FPGA device family, which features a large number of dual-ported fast SRAM blocks that can be used as buffers for the DSM output stream. A number of specific design choices are made:
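The quoted cycle budget can be checked with a short calculation. The sketch assumes a speed of sound of 1540 m/s and a 140 MHz modulation clock; the first value is not stated in the text and the second is taken from the resampling rate used for the phantom data, so both are assumptions here. The function name is hypothetical.

```python
def cycles_per_point(depth_m, n_points, f_clk_hz, c_m_s=1540.0):
    """Clock cycles between consecutive beamformed points.

    The spacing of the points along the line, expressed in clock cycles,
    is the step in d_N. The full path p_N advances by between 1 and 2
    times that step, which gives the lower and upper bound.
    """
    d_step = depth_m / n_points * f_clk_hz / c_m_s
    return d_step, 2 * d_step

low, high = cycles_per_point(0.15, 512, 140e6)
max_filter_length = 4 * int(low)            # four-way parallel filtering
```

Under these assumptions the bounds come out at roughly 26.6 and 53.3 cycles, and four filter taps per cycle at the lower bound give the maximum filter length of 104.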
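A shortened filter pair of the kind described can be sketched as follows. The sketch reads "3 central frequency sinusoids" as three periods at the center frequency, approximates the Hilbert transform of the in-phase filter by a sine under the same window, and uses a hypothetical 5 MHz center frequency with a 140 MHz clock. None of these choices is confirmed by the text, and the function name is hypothetical.

```python
import math

def pseudo_matched_filters(f0_hz, fs_hz, n_cycles=3):
    """Hamming-weighted cosine (in-phase) and sine (quadrature) filters."""
    length = round(n_cycles * fs_hz / f0_hz)
    h_i, h_q = [], []
    for n in range(length):
        # Hamming window over the full filter length
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        # Phase measured from the filter center, so that h_i is symmetric
        # and h_q is antisymmetric
        phase = 2 * math.pi * f0_hz * (n - (length - 1) / 2) / fs_hz
        h_i.append(window * math.cos(phase))
        h_q.append(window * math.sin(phase))
    return h_i, h_q

h_i, h_q = pseudo_matched_filters(5e6, 140e6)
filter_length = len(h_i)
cross_term = sum(a * b for a, b in zip(h_i, h_q))
```

With these assumed values the filter has 84 taps, which fits within the 104-tap limit, and the symmetric/antisymmetric construction keeps the two filters orthogonal.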
1. The output word of the delay buffer is wider than the input one. This speeds up the reading and allows for parallel processing of the data at the highest possible clock rate.
3. The output data from the DSM is 1 bit wide in the current implementation, so the apodization does not require multiplications. It uses one register instead.
4. The sum operation across all channels is pipelined in order to incorporate numerous inputs and to process them at a high clock frequency. The multiplication operation is also pipelined and works at the modulation clock frequency.
5. A chain of beamformers can be used, each of them receiving a partially beamformed sample from a neighbor, summing it with its own partially beamformed sample, and passing it on.
Phantom data processing results
A set of element traces sampled at 40 MHz was obtained using the experimental sampling system RASMUS [8]. That data was resampled at 140 MHz and 200 MHz, and was beamformed according to the suggested architecture. A comparison between the beamforming approaches is shown in Fig. 6. The number of elements is 32 and the F-number is between 2.5 and 10. It can be seen that the quantization noise of the DSM limits the image contrast and that increasing the OSR improves it. Improvement can also be achieved by employing a more sophisticated modulator architecture.
Conclusion
A novel flexible beamformer architecture utilizing DSM is suggested. The beamformer can be housed in one standard FPGA, which can easily be programmed and upgraded. Combined with a simple analog front end, the whole design can be implemented by three chips (one of them containing the transmit amplifiers). A standard portable PC can be used for display, making it a very inexpensive system. 
