Abstract: A compact medical ultrasound beamformer architecture that uses oversampled 1-bit analog-to-digital (A/D) converters is presented. Sparse sample processing is used: the echo signal for each image line is reconstructed at 512 equidistant focal points along the line, in terms of its in-phase and quadrature components. That information is sufficient for presenting a B-mode image and creating a color flow map. The high sampling rate provides the necessary delay resolution for the focusing. The low channel data width (1 bit) makes it possible to construct compact beamformer logic. The signal reconstruction is done using finite impulse response (FIR) filters, applied on selected bit sequences of the delta-sigma modulator output stream. The approach allows a multichannel beamformer to fit in a single field programmable gate array (FPGA) device. A 32-channel beamformer is estimated to occupy 50% of the available logic resources in a commercially available midrange FPGA and to be able to operate at 129 MHz. Simulation of the architecture at 140 MHz provides images with a dynamic range approaching 60 dB for an excitation frequency of 3 MHz.
conversion using little chip area, provides robust performance, and is compatible with the digital CMOS fabrication process. The dynamic range of the conversion depends to a great extent on the selected oversampling ratio. Presently, converters based on the DSM principle are widely used in audio applications, and their extensive use in video and high-frequency applications is a matter of time, depending to a large extent on the progress in integrated circuit technology.
In this paper, a novel extendable beamformer architecture for use with oversampled 1-bit A/D converters will be presented. It allows a complete 32-channel beamformer to be implemented using a single, standard field programmable gate array (FPGA) chip.
In Section II, the memory requirements and the necessary processing power are assessed for a conventional digital beamformer architecture. The principles behind the new architecture are described in Section III. In Section IV, the performance of the architecture is compared to that of a conventional beamformer by processing synthetic and real ultrasound echo data. The implementation choices are described in Section V, and the potential benefits and limitations of the architecture are discussed in Section VI.
II. Conventional Beamformer Architecture
In the commonly used ultrasound scanners, images are created line by line. A focused ultrasound pulse with central frequency $f_0$ of 3 to 12 MHz (for general applications) is transmitted into the tissue along a particular beam line. An image line is then created by continuously focusing along that beam line in receive.
A typical architecture of a modern digital receive beamformer is shown in Fig. 1. The received echoes are digitized at a frequency of 20 to 60 MHz (usually four times $f_0$) and stored in a delay buffer. At each clock cycle, appropriately delayed samples from each channel are chosen and combined, using a weighted sum, into a focused sample. The delay applied to each channel is calculated as the difference between the time of flight from the current focal point to the receive element for that channel and the time of flight from the focal point to the phase center of the aperture. The delay resolution, the quantizer precision, and the apodization of the aperture determine the quality of the beamforming [5]-[7]. To achieve sufficient delay resolution, interpolation between samples is used. After summing, the samples pass through a matched filter whose function is to maximize the signal-to-noise ratio (SNR) of the signal. The envelope of the signal is calculated as the square root of the sum of the squares of the in-phase and the quadrature (90-degree phase-shifted) components. The most accurate way of obtaining the quadrature component is to pass the echo signal through a Hilbert transform filter, because it provides a 90-degree phase shift at all frequencies. After that stage, decimation may be applied so that less data have to be processed in the subsequent stages. The envelope is then compressed logarithmically and stored in an image buffer as an image line. An image typically consists of 100 or more lines. Scan conversion is applied to map the data to the rectangular image display on a screen. The in-phase (original) and quadrature components are also used for flow estimation.
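The envelope-detection chain described above (quadrature component by Hilbert transform, envelope as $\sqrt{I^2 + Q^2}$, then logarithmic compression) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the scanner's implementation: the Hilbert transform is computed with an FFT-based analytic-signal construction, and the function names, the 60 dB display range, and the test pulse are hypothetical choices made for the example.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Return x + j*H{x}, with the Hilbert transform H computed via the FFT."""
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC
    if n % 2 == 0:
        gain[1:n // 2] = 2.0           # double positive frequencies
        gain[n // 2] = 1.0             # keep the Nyquist bin
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)   # negative frequencies zeroed


def log_envelope(rf: np.ndarray, dynamic_range_db: float = 60.0) -> np.ndarray:
    """Envelope sqrt(I^2 + Q^2), log-compressed and clipped to the display range."""
    z = analytic_signal(rf)
    in_phase, quadrature = z.real, z.imag     # original signal and its Hilbert transform
    envelope = np.sqrt(in_phase**2 + quadrature**2)
    env_db = 20.0 * np.log10(envelope / envelope.max() + 1e-12)
    return np.clip(env_db, -dynamic_range_db, 0.0)


# Example (assumed values): a Gaussian-windowed 3 MHz pulse sampled at 40 MHz.
fs, f0 = 40e6, 3e6
t = np.arange(400) / fs
t0 = t[200]
rf = np.exp(-((t - t0) / 0.5e-6) ** 2) * np.cos(2 * np.pi * f0 * (t - t0))
line_db = log_envelope(rf)
```

The real part of the analytic signal reproduces the input exactly, so the in-phase channel is the original RF line, as stated in the text; the peak of the compressed envelope falls at the pulse center.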
To perform dynamic receive focusing, a digital beamformer needs one sample index and one inter-sample precision parameter per produced sample for every contributing channel. These two parameters can merge naturally into one index with subsample precision that will be decoded by the focusing logic. Another parameter is the weighting (apodization) coefficient for each channel. To maintain a constant F-number and minimize sidelobe levels, the apodization function changes with depth.
A beamformer that reconstructs all samples along the beam axis has to produce

$$P = \frac{2 d f_s}{c}$$

samples, where $f_s$ is the sampling frequency of the analog-to-digital converters (ADCs), $c$ is the speed of sound, and $d$ is the image depth. If no memory optimizations are used, an $N$-channel digital beamformer that produces $L$ lines per image, with image depth $d$, has to store $PNL$ index values and $PNL$ weight coefficients. The necessary calculations per channel are as follows: in the case of linear interpolation, two multiplications per channel per sample (that is, per clock cycle) and one addition for producing the contribution from that channel. If better interpolation is desired, an interpolation filter is used, and more multiplications and additions are necessary. The apodization can be implemented either by using one additional multiplication or by including the apodization coefficient in the interpolation coefficients. The sum of all channels is obtained using an inverted binary tree of pipelined adders, which for $N$ channels contains $N - 1$ two-input adders. The matched filtering is performed using an FIR filter with $K$ coefficients, so $K$ multiplications and $K - 1$ additions are needed per reconstructed in-phase sample. If the quadrature component is created using a Hilbert-transformed matched filter, the same number of operations is needed for that too. Because the in-phase and quadrature signals can be used directly in an autocorrelation blood velocity estimation scheme, the further processing for generating flow estimation data will not be considered. The reconstruction of the in-phase and quadrature components requires $2KPNL$ multiplications and $(K - 1)PNL$ additions per image.
In a typical imaging situation, the image depth could be up to 20 cm. For a sampling frequency of 20 MHz and speed of sound c = 1540 m/s, the number of samples to beamform is P = 5195. The corresponding amount of memory for a 64-channel system (making 100 lines per image, using 8-bit coefficients and 16-bit indices) is approximately 126 MB. In practice, algorithmic approaches [8]-[10] are applied that reduce the memory requirements by several orders of magnitude.
If each transmission follows immediately after the echoes from 20 cm depth have been received, the pulse repetition frequency (image line rate) is $c/(2d) = 3850$ Hz, and the frame rate for 100 lines per image is 38.5 Hz.
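The figures in the last two paragraphs follow from $P = 2 d f_s / c$ and from allowing one round trip to the maximum depth per image line. The helper below, a hypothetical name written only to check the arithmetic, reproduces them for the stated 20 cm depth, 20 MHz sampling, and 100 lines per image.

```python
def scan_budget(depth_m: float, fs_hz: float, c: float = 1540.0, lines: int = 100):
    """Samples per line, line rate, and frame rate for a full-depth conventional beamformer."""
    samples_per_line = round(2 * depth_m * fs_hz / c)   # P = 2 d f_s / c
    prf_hz = c / (2 * depth_m)                           # one round trip per image line
    frame_rate_hz = prf_hz / lines
    return samples_per_line, prf_hz, frame_rate_hz


P, prf, frame_rate = scan_budget(depth_m=0.20, fs_hz=20e6)
stored_indices = P * 64 * 100   # P*N*L index values for N = 64 channels, L = 100 lines
```

This gives P = 5195, a line rate of about 3850 Hz, a frame rate of about 38.5 Hz, and 33,248,000 index values (and as many weight coefficients) per image.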
The matched filter for the received echo signal in a conventional imaging situation is the time-reversed emitted signal convolved twice with the impulse response of the transducer. If the excitation is two cycles of a sinusoid at 5 MHz and the transducer has a 60% fractional bandwidth around 5 MHz, the length of the matched filter K is 37. With the given parameters, the beamforming requires ≈ 2.57·10^9 multiplications per second and ≈ 2.52·10^9 additions per second. The matched filtering requires 2(K + 1)PNL ≈ 97·10^9 multiplications per second and KPNL ≈ 47.8·10^9 additions per second. Real-time processing, therefore, is possible today only with dedicated integrated circuits.
III. Techniques for Compact and Efficient Beamforming
Digital beamformers offer high image quality and flexibility at the expense of using a lot of computational resources. Optimization of the signal processing can lead to significant savings in power, chip area, and cost. In this section, the principles behind a new, efficient architecture will be described.
A. Sparse Sample Processing
In modern scanners, much more information is processed than what is actually displayed on screen. The ultrasound images are shown on raster displays, which rarely have a vertical resolution beyond that of a television (525 lines for NTSC) or VGA (640 × 480 pixels). Therefore, on such displays an image line is represented by no more than several hundred pixels. Sparse sample processing in the form of pixel-based focusing was proposed by Karaman et al. [11] . According to that approach, samples are produced for focal points that correspond to raster display pixels. These focal points generally do not lie on the straight line representing the beam axis, except for linear array imaging; therefore, the information for focusing is hard to derive in a recursive fashion. The present approach processes samples that correspond to equidistant focal points lying on the beam axis. In this way, it is possible to calculate the focusing information in a recursive fashion.
The achievable image depth with an ultrasound scanner is determined by its transmit power, the level of the noise introduced by the analog front end, the number of channels, and the filters used. These determine the SNR budget of the sampling system. For an imaging situation in which the frequency-dependent propagation attenuation in tissue is $b$ dB/(cm·MHz) and the SNR budget is $A$ decibels, the image depth at which the SNR becomes 0 dB is:

$$d = \frac{A}{2 b f_0},$$

where $f_0$ is the excitation frequency (in MHz, giving $d$ in cm). The achievable image depth can be expressed in wavelengths ($\lambda = c/f_0$) as follows:

$$d_\lambda = \frac{d}{\lambda} = \frac{A}{2 b c},$$

with $c$ expressed in cm/µs, so that $d_\lambda$ does not depend on the excitation frequency.
The axial resolution of an imaging system can be evaluated by creating the image (point spread function) of a point reflector. The expected echo signal in that situation is the excitation waveform convolved twice with the transducer impulse response. The matched filter applied to the received radio frequency (RF) data in this calculation is the time reversal of the expected echo signal. The image data are produced by filtering the echo signal with the matched filter and calculating the envelope of the result. For a case in which the excitation waveform is one period of a sinusoid at a central frequency $f_0$ and the transducer has a 60% bandwidth, measured at −6 dB around the same frequency, the axial resolution of the imaging system is approximately 1.87λ at −3 dB and approximately 2.67λ at −6 dB.
To avoid loss of signal information, the distance between the reconstructed samples has to be less than the calculated axial resolution. From the calculations for the achievable image depth and for the axial resolution, the necessary number of samples per line can be calculated. For a sampling system with an SNR budget of 150 dB, operating at 3 MHz in a medium with b = 1 dB/(cm·MHz) and c = 1540 m/s, $d_\lambda$ is approximately 487. Sampling that distance once per wavelength requires 487 samples to be reconstructed along the image line.
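The depth budget can be checked numerically. The sketch below assumes the round-trip model in which the attenuation $2 b f_0 d$ (in dB) uses up the whole SNR budget $A$; the function name is hypothetical. It reproduces the 25 cm depth at 3 MHz and the 487-wavelength figure quoted above, and shows that the depth in wavelengths is the same at another excitation frequency.

```python
def achievable_depth(snr_budget_db: float, b_db_per_cm_mhz: float, f0_mhz: float,
                     c_m_per_s: float = 1540.0):
    """Depth (cm) at which round-trip attenuation uses up the SNR budget, and that depth in wavelengths."""
    depth_cm = snr_budget_db / (2.0 * b_db_per_cm_mhz * f0_mhz)   # 2 b f0 d = A
    wavelength_cm = 100.0 * c_m_per_s / (f0_mhz * 1e6)            # lambda = c / f0
    return depth_cm, depth_cm / wavelength_cm


depth_cm, depth_wl = achievable_depth(150.0, 1.0, 3.0)   # 25 cm, about 487 wavelengths
samples_per_line = round(depth_wl)                       # one reconstructed sample per wavelength
depth_wl_5mhz = achievable_depth(150.0, 1.0, 5.0)[1]     # same depth in wavelengths at 5 MHz
```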
As is the case for the conventional beamforming, the reconstruction of the envelope of the signal requires its in-phase and quadrature components. In the sparse sample processing approach, the quadrature signal cannot be produced by filtering of the in-phase signal because the latter is an undersampled representation of the echo signal. Therefore, both components have to be created at the same stage. This is achieved by using in-phase and quadrature reconstruction filters, as explained below.
B. Beamforming Using Oversampled Signals
A delta-sigma modulator approximates the input signal by feeding back the error into the decision loop, and shaping the quantization noise spectrum away from the band of interest. Appropriate filtering applied on the modulator output bit stream suppresses the noise, and valid samples can be reconstructed at the same or lower sampling rate.
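A behavioral model makes the feedback and noise-shaping idea concrete. The sketch below is a textbook second-order, 1-bit loop whose noise transfer function is $(1 - z^{-1})^2$ (a low-pass noise shaper); the modulator topology used in this work is not specified here, so the structure, the stability bound on the input, and the name `dsm2` are assumptions.

```python
import numpy as np


def dsm2(x: np.ndarray) -> np.ndarray:
    """1-bit second-order delta-sigma modulator with NTF (1 - z^-1)^2.

    The output takes the values +1 and -1. The input should stay well inside
    (-1, 1), e.g. |x| <= 0.5, to keep this loop stable.
    """
    i1 = i2 = 0.0
    y = np.empty(len(x))
    for n, xn in enumerate(x):
        y[n] = 1.0 if i2 >= 0.0 else -1.0   # 1-bit quantizer (decision)
        i1 += xn - y[n]                      # first integrator: error fed back
        i2 += i1 - y[n]                      # second integrator: error fed back again
    return y


bits = dsm2(np.full(10_000, 0.3))
```

Because the first integrator stays bounded, the average of the bit stream tracks the input: for a constant input of 0.3, `bits.mean()` is close to 0.3, and low-pass filtering of the stream recovers the signal while the shaped quantization noise sits at high frequencies.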
Because performing a large number of multiplications and summations at a high clock frequency is not economical, and the target data rate is much lower than the DSM sampling frequency due to the necessary oversampling, the output samples are usually produced by passing the modulator output through consecutive stages of comb filtering and decimation.
Oversampling conversion offers several advantages over multibit ADCs for ultrasound beamforming. First, delta-sigma modulators can be integrated in large numbers on a chip, each requiring only one input and one 1-bit output. Second, the intersample interpolation that is used with multibit flash ADCs can be avoided, because the delay resolution of a DSM beamformer is determined by the sampling rate of the modulators, which is inherently high. Third, time-gain compensation and/or channel weighting can be incorporated in the modulator to a certain degree (a gain range of 25 dB has been demonstrated [12]) by varying the amplitude of the feedback voltage.
The reconstruction process in DSM beamformers can take place after summation of the aligned echo signals from the channels. Although the reconstruction in this case is applied on a multibit stream, the implementation is still more compact than in the case in which separate sample reconstruction is performed on each channel.
In dynamic receive focusing, only one channel (corresponding to the transducer element from which the beam/line originates) can have a linear delay development in time. All other channels have nonlinear delay development; therefore, samples from the DSM output streams have to be skipped or repeated. This introduces errors in the reconstructed values from the beamformer.
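The nonlinear delay development can be illustrated numerically. The sketch below assumes a simple geometry that is not taken from the paper: the beam starts at the array center and runs along the axis, one element sits at a 10 mm lateral offset, and focal points are spaced so that the center channel advances exactly one 140 MHz sample per output sample. The function name is hypothetical.

```python
import numpy as np


def focusing_indices(element_x_m, depths_m, fs_hz, c=1540.0):
    """Sample index, in each channel's stream, of the echo from each on-axis focal point.

    Transmit is assumed to start at the array center (x = 0) and travel along the
    beam axis; the echo returns to the element at lateral position x.
    """
    x = np.asarray(element_x_m, dtype=float)[:, None]
    z = np.asarray(depths_m, dtype=float)[None, :]
    t_flight = (z + np.sqrt(z**2 + x**2)) / c     # transmit path + receive path
    return np.rint(t_flight * fs_hz).astype(int)


fs = 140e6
dz = 1540.0 / (2 * fs)                 # focal spacing: one sample of round trip on axis
depths = 5e-3 + dz * np.arange(2000)
idx = focusing_indices([0.0, 10e-3], depths, fs)
steps_center = np.diff(idx[0])         # all 1: linear delay development
steps_outer = np.diff(idx[1])          # mix of 0 and 1: some samples must be repeated
```

For the off-axis element, the index advances by less than one sample per output sample, so some input samples are used twice; this is the skipping/repeating that disturbs the DSM bit stream when every sample is reconstructed.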
Previous Approaches for Oversampled Beamforming:
Freeman et al. [13] developed a modified modulator architecture in which the amount of feedback of the modulator is controlled by the delay logic of the beamformer in order to compensate for the skipped/repeated samples due to focusing. Such an architecture requires specially designed modulators and, therefore, cannot be easily upgraded with improved generic modulators.
Kozak and Karaman [14] proposed sampling with a nonuniform sampling clock, specific to each channel, so that the delays are incorporated and all channels produce the same number of samples per image line. That solution requires a large memory for controlling the sampling clock. Also, it does not attempt to compensate for the discontinuities introduced in the DSM bit streams.
Both approaches come close to exploiting the performance potential of oversampled converters, but at the expense of a more complex beamformer structure and of disrupting the modulation process.
Approach with Preserved Modulation Process:
The new oversampled beamformer architecture differs from the previously developed ones in that only the number of samples necessary for display is reconstructed, using FIR filters that yield the in-phase and quadrature signal components.
The signal processing is illustrated in Fig. 2. The analog input signals $s_k(t)$ ($k$ being the channel index) from different channels are modulated into bit streams $q_k[n]$ in the DSM. In order to sum the echoes coming from a certain focal point (indicated by arrows in the plots of $s_k(t)$), sequences of bits (shown in black) are selected from the streams $q_k[n]$ at positions that correspond to the appropriate channel delays. The length of the sequences is equal to the length of the reconstruction filters that will be used. The selected sequences are summed into the sequence $r[n]$, which is then filtered by the in-phase $h_I[n]$ and quadrature $h_Q[n]$ filters to yield the in-phase $\hat{s}_I[n]$ and quadrature $\hat{s}_Q[n]$ components of the signal from the chosen focal point. In Fig. 2, all possible reconstructed in-phase and quadrature samples are shown in gray. In accordance with the sparse sample reconstruction approach, only one out of several tens of possible samples is reconstructed. The in-phase and quadrature components of the signal convey information about its phase. Subsequent sample reconstructions for the same position reveal the presence and the amount of phase change in the echo signal from that position and can be used for velocity estimation.
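The selection, summation, and filtering steps can be sketched in floating point as follows. This is a minimal model, not the FPGA datapath: the function name and arguments are hypothetical, the start indices are assumed to already include each channel's focusing delay (and any filter group-delay offset), and every $K$-bit window is assumed to lie inside the stored stream. Only one output of each FIR filter is computed per focal point, which is the essence of sparse sample processing.

```python
import numpy as np


def sparse_dsm_beamform(bitstreams, start_idx, h_i, h_q, weights=None):
    """Reconstruct one I/Q sample per focal point from 1-bit DSM streams.

    bitstreams : (N, M) array of modulator outputs q_k[n] (values +1/-1)
    start_idx  : (N, F) start of the selected bit sequence for channel k, focal point f
    h_i, h_q   : in-phase and quadrature reconstruction filters, both of length K
    weights    : optional apodization per channel (applied to the bits)
    """
    q = np.asarray(bitstreams, dtype=float)
    starts = np.asarray(start_idx, dtype=int)
    h_i = np.asarray(h_i, dtype=float)
    h_q = np.asarray(h_q, dtype=float)
    n_ch, n_focal = starts.shape
    k = h_i.size
    w = np.ones(n_ch) if weights is None else np.asarray(weights, dtype=float)
    iq = np.empty(n_focal, dtype=complex)
    for f in range(n_focal):
        r = np.zeros(k)                                  # summed sequence r[n]
        for ch in range(n_ch):
            s = starts[ch, f]
            r += w[ch] * q[ch, s:s + k]                  # delayed K-bit window of q_k[n]
        # A single output of each FIR filter: the convolution evaluated at the window end.
        iq[f] = np.dot(h_i[::-1], r) + 1j * np.dot(h_q[::-1], r)
    return iq
```

Because the filtering is linear, a channel whose stream is a delayed copy of another contributes coherently once its window is shifted by that delay: the two-channel result is twice the single-channel result.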
C. Reconstruction Filters
In general, the DSM reconstruction filter has to be inversely matched to the noise transfer function (NTF) of the modulator, e.g., if the NTF is band-rejecting (pushing noise away from a given center frequency), the filter should be band-pass with the same center frequency.
The best filter for a known signal in the presence of white noise is the matched filter, which is a time-reversed and delayed version of the expected signal [15] . In an ultrasound beamformer, the expected signal from a point reflector is the transmitted excitation convolved twice with the impulse response of the transducer. Because the amplifiers in transmit and receive have much greater bandwidth than the transducer, their impulse response is not a limiting factor and is not taken into account.
The matched filter for the expected echo signal should be able to filter out the quantization noise because it has a band-pass transfer function centered around the central frequency of the useful signal, as shown in Fig. 3. The transfer function of the matched filter drops below −60 dB for frequencies above twice the central frequency. Therefore, the matched filter, sampled at the DSM sampling frequency, is used as the in-phase reconstruction filter, and its Hilbert transform is used as the quadrature reconstruction filter.
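A filter pair of this kind can be generated as sketched below. The signal models are assumptions made for illustration, not the exact ones used in this work: a one-cycle sine excitation at $f_0$, a Gaussian-windowed transducer impulse response with 60% fractional bandwidth at −6 dB, sampling at 140 MHz, and truncation of the tails 40 dB below the peak. The Hilbert transform is computed with an FFT-based construction; the function names are hypothetical.

```python
import numpy as np


def hilbert_fir(h: np.ndarray) -> np.ndarray:
    """Hilbert transform of a filter, via the FFT (imaginary part of the analytic signal)."""
    n = h.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[1:n // 2] = 2.0
        gain[n // 2] = 1.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(h) * gain).imag


def reconstruction_filters(f0=3e6, fs=140e6, frac_bw=0.6, trunc_db=40.0):
    """In-phase (matched) and quadrature reconstruction filters sampled at fs (assumed models)."""
    a = (np.pi * frac_bw * f0) ** 2 / (4 * np.log(2))   # Gaussian: -6 dB at f0 +/- frac_bw*f0/2
    t = np.arange(-3e-6, 3e-6, 1 / fs)
    transducer = np.exp(-a * t**2) * np.cos(2 * np.pi * f0 * t)
    excitation = np.sin(2 * np.pi * f0 * np.arange(0, 1 / f0, 1 / fs))
    echo = np.convolve(np.convolve(excitation, transducer), transducer)  # two-way response
    keep = np.nonzero(np.abs(echo) >= np.abs(echo).max() * 10 ** (-trunc_db / 20))[0]
    echo = echo[keep[0]:keep[-1] + 1]            # truncate the tails
    h_i = echo[::-1] / np.abs(echo).max()        # matched filter: time-reversed expected echo
    h_q = hilbert_fir(h_i)                       # quadrature filter: Hilbert transform of h_i
    return h_i, h_q


h_i, h_q = reconstruction_filters()
```

The two filters have the same length and are orthogonal, which is what lets the pair deliver in-phase and quadrature components from the same summed bit sequence.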
IV. Image Quality
The image quality of the proposed beamformer was compared to that of a conventional digital beamformer. First, the necessary oversampling ratio (OSR) was calculated. Second, echo signals were processed using oversampled beamforming and using conventional beamforming.
A. Calculating the Necessary Sampling Frequency
The target image quality parameters and the number of channels are shown in Table I .
The delay resolution of a beamformer has a high impact on its ability to focus in a given direction, while rejecting signals from other directions. According to the most restrictive of the published formulae, given in [6] , the worst-case discrete quantization sidelobe level (due to periodic phase errors over the array) in a beamformer is described as:
where $\mathrm{IPG} = \sum_{n=1}^{N} w_n^2$ is the incoherent power gain of an $N$-element array with apodization coefficients $w_n$, $n = 1, \ldots, N$; $\varphi$ is the beam angle from the normal; $\lambda$ is the wavelength; $m = f_s/f_0$ is the ratio of the sampling frequency to the central frequency; $L$ is the aperture size; and $r$ is the distance along the beam.
The maximum random quantization sidelobe level (due to random phase errors over the array) is:
where $\mathrm{ENBW} = \mathrm{IPG}/\mathrm{CPG}$ is the equivalent noise bandwidth (CPG being the coherent power gain of an $N$-element array). The maximum sidelobe level is $SL_{\max} = \max(SL_{\mathrm{focus}}, SL_{\mathrm{peak}})$.
Calculating these values for the particular case gives the following results. In the near field, the random quantization sidelobes are prevalent, and to achieve a sidelobe level of −60 dB, the delay resolution must be 25.6 times finer than the period of the ultrasound pulse, i.e., $f_s = 25.6 f_0 = 76.8$ MHz for $f_0 = 3$ MHz. This sampling frequency provides a −30 dB sidelobe level in transmit and a −30 dB sidelobe level in receive.
Apart from the sidelobe level, the sampling frequency also determines the level of the quantization noise. The quantization noise power of a multibit ADC with quantization step δ, assuming white noise, is:

$$P_q = \frac{\delta^2}{12}.$$
The coherent sum of the signals across the array adds up the signal amplitudes but only the (uncorrelated) channel noise powers. Therefore, the SNR improvement in the summed signal is determined by the apodization profile as follows:

$$G_{SNR} = \frac{\left(\sum_{n=1}^{N} w_n\right)^2}{\sum_{n=1}^{N} w_n^2}.$$
A 64-channel array with Hamming apodization can provide $G_{SNR}$ ≈ 16.7 dB, while uniform apodization yields 18 dB. That gain in SNR relaxes the requirement on the sampling frequency.
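The two array gains can be checked directly from the apodization weights, assuming uncorrelated channel noise of equal power in every channel. `np.hamming` is NumPy's symmetric Hamming window; the helper name is hypothetical.

```python
import numpy as np


def snr_gain_db(weights) -> float:
    """Array SNR gain: coherent amplitude sum squared over the incoherent noise-power sum."""
    w = np.asarray(weights, dtype=float)
    return 10.0 * np.log10(w.sum() ** 2 / np.sum(w ** 2))


g_hamming = snr_gain_db(np.hamming(64))   # about 16.7 dB
g_uniform = snr_gain_db(np.ones(64))      # 10*log10(64), about 18.1 dB
```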
In the following calculations, formulae for the SNR of a DSM modulator from Johns and Martin [16] and Norsworthy et al. [17] are used.
Given the requirement of 60 dB signal SNR after summation and an array contribution of 16.7 dB, the channel SNR has to be 60 − 16.7 = 43.3 dB. For a second-order modulator, with the oversampling ratio defined as $\mathrm{OSR} = f_s/(2 f_{\mathrm{high}})$ [17], the necessary OSR is estimated to be about 19. For the application considered in this paper, $f_0 = 3$ MHz and the upper limit of the bandwidth of interest is $f_{\mathrm{high}} = 1.3 f_0 = 3.9$ MHz. Therefore, the sampling frequency needed to meet this requirement, which is stricter than the sidelobe requirement, is $f_s \approx 2 \cdot 19 \cdot 3.9$ MHz $= 148.2$ MHz.
Because an expanding aperture will be used, only a few channels are combined in the near field, and these must still provide sufficient SNR. The initial number of channels was chosen to be four; summing their signals improves the SNR by 6 dB. The remaining 54 dB of SNR can be obtained with an oversampling ratio of 32 (using Fig. 4.13 in [17]), which translates to a sampling frequency of 249.6 MHz.
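The conversions between oversampling ratio and sampling frequency used above follow from $\mathrm{OSR} = f_s/(2 f_{\mathrm{high}})$ with $f_{\mathrm{high}} = 1.3 f_0$. The short check below (hypothetical helper name) reproduces the 148.2 MHz and 249.6 MHz figures, the 6 dB gain from summing four channels, and the OSR obtained at the 140 MHz clock.

```python
import math


def fs_for_osr(osr: float, f0_hz: float, high_ratio: float = 1.3) -> float:
    """Sampling frequency that gives OSR = fs / (2 * f_high), with f_high = high_ratio * f0."""
    return 2.0 * osr * high_ratio * f0_hz


f0 = 3e6
fs_all_channels = fs_for_osr(19, f0)        # 148.2 MHz
fs_four_channels = fs_for_osr(32, f0)       # 249.6 MHz
gain_four_channels_db = 10 * math.log10(4)  # about 6 dB from summing four channels
osr_at_140mhz = 140e6 / (2 * 1.3 * f0)      # about 17.9
```

The OSR at 140 MHz is slightly below the estimated 19, which is consistent with an image SNR expected to approach, rather than exceed, 60 dB.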
The chosen target sampling frequency for the simulations and implementation was 140 MHz. At that frequency, the image SNR was expected to be close to 60 dB when all channels are in use.
B. Simulation Results
The ultrasound field simulation program Field II [18] was used for generating echo data from scatterers at different depths. The simulation parameters are given in Table II . The echo signals then were beamformed using floating-point beamforming and using a DSM beamformer. The apodization was applied before DSM (i.e., in the analog domain), and was not quantized. It did not vary with depth.
Point Spread Function:
The point spread functions (PSFs) obtained by conventional and oversampled beamforming are shown in Figs. 4 and 5. As can be seen, the resolution is approximately the same, and the noise level in the DSM-beamformed image lies at about −60 dB due to quantization noise.
Blood Flow Simulation:
Due to the sparse sample processing, flow estimation on DSM-beamformed data can be performed only with an autocorrelation approach. The suitability of DSM beamforming for flow estimation was evaluated by simulating parabolic flow below a transducer and creating the velocity profile along the normal to the transducer. The parameters of the imaging setup, including excitation and matched filters, are the same as in the PSF simulation. The characteristics of the simulated flow phantom and the pulse repetition frequency are given in Table III. The phantom did not contain any stationary scatterers. Before conventional beamforming, the echo signals were quantized equivalently to a 12-bit ADC.
The echo signals were scaled to −30 dB relative to the maximum possible input signal amplitude for the corresponding A/D converters. No stationary echo canceling was applied, as there were no stationary scatterers. The results from flow estimation using conventionally beamformed data and DSM-beamformed data are shown in Fig. 6. It can be seen that the shapes of the velocity profiles obtained through oversampled and conventional beamforming for a given number of firings are similar, which shows that DSM beamforming with sparse sample processing can successfully replace conventional beamforming.
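Because one I/Q sample per focal point and firing is available, the autocorrelation approach reduces to estimating the phase of the lag-one autocorrelation across firings. The sketch below shows the standard estimator $v = c f_{prf} \arg R(1) / (4 \pi f_0)$; the pulse repetition frequency of 5 kHz, the sign convention (positive for increasing phase), and the function name are assumptions for illustration, since the actual values are listed in Table III.

```python
import numpy as np


def autocorr_velocity(iq: np.ndarray, f_prf: float, f0: float, c: float = 1540.0) -> float:
    """Axial velocity from successive I/Q samples of one focal point (lag-one autocorrelation).

    iq holds one complex sample per emission, in firing order.
    """
    r1 = np.sum(iq[1:] * np.conj(iq[:-1]))        # lag-one autocorrelation estimate
    return c * f_prf * np.angle(r1) / (4 * np.pi * f0)


# Synthetic check: a constant phase step of 0.5 rad per emission.
f_prf, f0 = 5e3, 3e6                              # assumed values
iq = np.exp(1j * 0.5 * np.arange(10))
v = autocorr_velocity(iq, f_prf, f0)              # about 0.102 m/s
```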
C. Phantom Image Comparison
A set of echo RF data, sampled at 40 MHz, was obtained using the experimental sampling system RASMUS [19]. The target was a tissue-mimicking phantom model 525 (Danish Phantom Design, Jyllinge, Denmark) with an attenuation coefficient of 0.5 dB/(MHz·cm). The phantom consisted of randomly distributed background scatterers (backscattering material) and wire targets. The transducer was a Vermon PA35/3D (Vermon, Tours, France). It is a rotating phased array, used here without rotation. An aperture of 40 adjacent elements was used. These data were resampled at 140 MHz and beamformed according to the suggested architecture. The result, along with a conventionally beamformed image, is shown in Fig. 7.
V. Implementation
In order to obtain performance and logic utilization figures for the suggested architecture, it was implemented in the hardware description language VHDL and synthesized for the target FPGA device XCV2000E-7 (Xilinx, Inc., San Jose, CA). The functional blocks were tested for correct operation only individually. In this section, the implementation parameters, choices, and results are described.
The structure of the beamformer is illustrated in Fig. 8 . The functional blocks of a channel are sample buffer, apodization multiplier, and delay/weight generator. The channel outputs are connected to a pipelined adder, followed by in-phase and quadrature filters. The target beamformer parameters are shown in Table IV. The excitation was chosen to be the same as in the simulations.
The length of the filters for the in-phase and quadrature components is constrained by the number of 140 MHz clock cycles that are available for producing a sample. That number (denoted from here on as $N_r$) is inversely proportional to the density of the beamformed samples. For illustration purposes, its maximum value for a given imaging setup can be calculated as:

$$N_r = \frac{2 d_{\max} f_s}{c N_s},$$

where $d_{\max}$ is the image depth, $N_s$ is the number of reconstructed samples, $f_s$ is the sampling (clock) frequency, and $c$ is the speed of sound. The minimum number of available clock cycles occurs for the outer channels, between the first and the second read operations they have to perform. That minimum is the number used to determine the size of the reconstruction filters.
With the given image geometry and sampling rate, the minimum number of available clock cycles between two consecutive reconstructed samples is 33, when using expanding aperture (maintaining F-number of 1 until all elements are used).
The FIR filters would need 168 coefficients to represent the matched filter for the chosen excitation and transducer impulse response; that length was obtained by truncating the tails of the matched filter 40 dB below its maximum amplitude. Because 168 exceeds the 33 available clock cycles, the processing path is parallelized by four, which allows shorter, approximately matched FIR filters of up to 4 × 33 = 132 coefficients to be used. The options for the parallelization factor are discussed further in Section VI.
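The relation between the available clock cycles, the parallelization factor, and the longest filter that can be evaluated is simple arithmetic, sketched below under the assumption that each parallel path performs one multiply-accumulate per clock cycle. The restriction to powers of two reflects the Block SelectRAM+ word widths discussed in Section VI; the helper names are hypothetical.

```python
def max_filter_length(n_r_min: int, parallel: int) -> int:
    """Longest FIR filter evaluated in n_r_min clock cycles with `parallel` taps per cycle."""
    return n_r_min * parallel


def smallest_pow2_parallel(desired_len: int, n_r_min: int) -> int:
    """Smallest power-of-two parallelization factor that fits the desired filter length."""
    p = 1
    while max_filter_length(n_r_min, p) < desired_len:
        p *= 2
    return p


length_x4 = max_filter_length(33, 4)       # 132 taps, shorter than the 168-tap matched filter
p_full = smallest_pow2_parallel(168, 33)   # 8, i.e. up to 264 taps, would fit the full filter
```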
A. Delay Buffer
For the FPGA implementation of the sparse sample processing beamformer, a Xilinx FPGA device is beneficial because it incorporates a large number of dual-ported memory blocks, called Block SelectRAM+, that provide simultaneous read and write capability with different word sizes. In the 4× parallelized case, the single-bit samples are written one at a time but are read four samples at a time.
Because the requested start address (from the delay generator) for the read operation is specified with one-sample precision, an alignment unit has to be used so that the first four-sample word produced from the buffer memory contains samples 1 to 4, starting with the specified address; the second, samples 5 to 8; and so on. Such an alignment unit is created using a set of eight two-stage latches. The structure of the sample buffer and the alignment unit is shown in Fig. 9. The two least significant bits of the start address determine the multiplexer positions in the alignment unit during the present read sequence. The more significant address bits are used as a read address for the four-sample words and are incremented by one in each clock cycle. In the first clock cycle after a valid address is selected, an initial 4-bit word is read into the alignment register. On every following clock cycle, the four bits that are read from the sample buffer are shifted by four positions. That register provides a valid, aligned 4-bit word after the second clock cycle. Thus, in 33 clock cycles, up to 128 samples can be read.
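A behavioral model of the alignment unit (not the VHDL) may help clarify the read sequence. The sketch assumes an 8-bit alignment register holding two consecutive 4-bit words and a multiplexer that picks the 4-bit window selected by the two least significant address bits; circular-buffer wrap-around is not modeled, and the function name is hypothetical.

```python
def aligned_read(bits, start, n_words):
    """Bit-level model of reading aligned 4-sample words from a 1-sample start address.

    bits    : buffer contents, written one sample (bit) at a time
    start   : start address with one-sample precision
    n_words : number of aligned 4-bit words to produce
    Returns (list of 4-bit words, clock cycles used).
    """
    words = [bits[i:i + 4] for i in range(0, len(bits), 4)]   # read port: 4 bits wide
    offset = start & 0b11          # two LSBs: multiplexer setting for this read sequence
    addr = start >> 2              # remaining bits: word address, incremented every cycle
    register = list(words[addr])   # cycle 1: initial word loaded into the alignment register
    cycles = 1
    out = []
    for _ in range(n_words):
        addr += 1
        register = register[-4:] + list(words[addr])   # shift by four, append the new word
        out.append(register[offset:offset + 4])         # aligned 4-bit word
        cycles += 1
    return out, cycles


buffer_bits = [i % 2 for i in range(200)]
words_out, n_cycles = aligned_read(buffer_bits, 1, 32)   # 32 words = 128 samples in 33 cycles
```

The model reproduces the cycle count stated above: one cycle to preload the register, then one aligned word per cycle.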
B. Delay Generation Logic
The authors presented several delay generation techniques with reduced memory requirements in [10] , and an analytical recursive delay generation algorithm developed by Feldkämper et al. [9] was adopted. Efficient approximate recursive algorithms are also known [20] .
The delay generator logic generates independent sample indexes for each channel. These sample indexes are used as start addresses for reading from the sample buffer. Because the sample buffers are organized as circular buffers, care must be taken not to overwrite data that are still needed at a later time. This is done either by using sufficiently large sample buffers or by limiting the maximum delay (index difference) between channels. Using an expanding aperture in receive effectively accomplishes the latter.
The computation logic for the delay generator consists of four adders and one comparator plus control circuitry. The number of parameters per line per channel is four (12-bit words).
C. Channel Apodization
The apodization (aperture smoothing, tapering) can be applied either on the digital data or in the analog domain before DSM. Varying the gain of the preamplifiers or the DSM on each channel requires one additional digital-to-analog converter (oversampled or using pulse-width modulation) and output line per channel, which complicates the beamformer structure.
Applying the weight coefficient in the digital domain means that, after the apodization block, the channel data bit width is equal to that of the weight coefficient (the DSM stream is one bit wide before that). Thus, the sum across the channels is performed on multibit numbers rather than on 1-bit numbers. To maintain a high operating speed, the adders are pipelined, at the cost of increased latency.
The channel weighting block consists of two 5-bit registers containing representations of the current weighting coefficient and its 2's complement. The value of the modulated signal (1 or 0) determines which register content is passed on to the subsequent summation across channels. The weighting coefficients are generated in a recursive fashion, using the same calculation scheme and entry parameters as the delay generation logic [21].
D. Sum Across the Channels
The sum across all channels is pipelined in order to accommodate numerous inputs and to process them at a high clock frequency. The first stage in the pipeline contains 5-bit adders that sum the weighted outputs from the channels. The adder pipeline is five levels deep, and the output is 10 bits wide (the 5-bit inputs grow by one bit per level).
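The weighting and summation can be modeled in integer arithmetic to confirm the bit widths for 32 channels. The sketch assumes the coefficient is held as a non-negative value that fits a 5-bit signed register (0 to 15), so that a 1 selects +coefficient and a 0 selects its 2's complement; the function names are hypothetical.

```python
def weight_bit(bit: int, coeff: int, width: int = 5) -> int:
    """Channel weighting: bit 1 selects +coeff, bit 0 selects its 2's complement (-coeff)."""
    if not 0 <= coeff <= (1 << (width - 1)) - 1:
        raise ValueError("coefficient does not fit the signed register width")
    return coeff if bit else -coeff


def adder_tree(values):
    """Pairwise (binary-tree) sum; returns the total and the number of adder levels."""
    level, vals = 0, list(values)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0)                      # pad an odd level with a zero input
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
        level += 1
    return vals[0], level


total, levels = adder_tree(weight_bit(1, 15) for _ in range(32))   # worst case: all ones
out_width = 5 + levels                                             # one bit of growth per level
```

The worst-case sum, ±480, fits in a 10-bit signed word (−512 to 511), matching the five-level, 10-bit adder pipeline.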
E. In-Phase and Quadrature Filters
The 120-tap filter structure is illustrated in Fig. 10 . The 10-bit coefficients are stored in random access memory (RAM) blocks of the FPGA and can be reloaded from an external source, for example a computer. The quadrature filters use the same filter structure and are applied simultaneously to the sum data.
F. Implementation Results
The software package Xilinx ISE Series 4.2i was used in combination with Synopsys FPGA Express (Synopsys, Inc., Mountain View, CA) for compiling the VHDL code. After compilation, the estimated gate count for the 32-channel beamformer is 1,274,116.
The estimated maximum operating frequency of a 32-channel beamformer for the target device Virtex-E XCV2000E-7BG560C by Xilinx, Inc., is 129 MHz. That estimate takes into account only the logic switching delays; when the signal routing delays are also taken into account, the estimated maximum operating frequency is 71.6 MHz. The estimated power consumption of the beamformer logic for a clock frequency of 140 MHz is 1.4 W.
The highest number of beamformer channels that fit in the XCV2000E device is 57, at which point the beamformer suffers a severe performance drop due to complex and suboptimal signal routing.
Several approaches exist for achieving a higher operating frequency. One is to use a faster FPGA device. Another is to exert more control over the placement process in order to keep the routing lengths (and signal delays) low. If close placement of logic blocks is not possible but increased latency is acceptable, registers can be inserted manually at appropriate places.
VI. Discussion
From the simulation plots, it can be seen that the quantization noise of the DSM limits the image contrast. Improvement can be achieved by using a more sophisticated modulator architecture or by increasing the OSR. With the suggested beamformer architecture, it is easy to connect higher-order delta-sigma modulators with the same output data size without changes in the beamformer. The data flow principle allows straightforward expansion (reimplementation) to accommodate modulator data widths of two or more bits. Multiple beams can be formed in one firing cycle by connecting several beamformers in parallel. In such a setup, each 1-bit modulator output should be connected to the corresponding inputs of the different beamformers. The fact that single-bit digital signals are propagated is very convenient for this kind of expansion.
Using longer excitations or a full-length matched filter would require wider parallelization, i.e., the matched filtering would have to be implemented with more multiplication blocks working in parallel. In the selected target FPGA, the next suitable parallelization factor after four is eight, because the output word widths available for the Block SelectRAM+ can only be powers of two. Other parallelization factors can be implemented if the memory is read at a higher rate and more complex alignment logic is used. With increased filter length, the architecture allows beamforming with coded excitation signals, e.g., chirps. Because of the higher noise level in the image beamformed using oversampling, the corresponding velocity estimates contain larger errors than those obtained with conventional imaging. Improvement in this area can be achieved by using samples with a lower level of quantization noise, in the ways outlined above.
The operating speed (and the OSR) of the architecture can be increased by using a faster FPGA device. The large difference between the operating frequency estimates shows that the size of the design has a negative influence on the achievable performance, unless manual placement is used to minimize the longest paths.
VII. Conclusions
A novel, flexible beamformer architecture using oversampling has been presented. A 32-channel beamformer can be implemented in one standard FPGA, which can be easily programmed and upgraded. Such a beamformer offers significant space reductions compared to a conventional multibit beamformer and can be used for building an efficient and compact ultrasound scanner.

He is also the developer of the Field II simulation program. He has been a visiting scientist at Duke University, Stanford University, and the University of Illinois at Urbana-Champaign. He is currently full professor of Biomedical Signal Processing at the Technical University of Denmark at Ørsted•DTU and head of the Center for Fast Ultrasound Imaging. He has given courses on blood velocity estimation at both Duke University and the University of Illinois and teaches biomedical signal processing and medical imaging at the Technical University of Denmark. He has given several short courses on simulation, synthetic aperture imaging, and flow estimation at international scientific conferences. He is also the co-organizer of a new biomedical engineering education program offered by the Technical University of Denmark and the University of Copenhagen. His research is centered around simulation of ultrasound imaging, synthetic aperture imaging and blood flow estimation, and constructing systems for such imaging.
