Abstract: This work presents a novel method for digital ultrasound beamforming based on programmable look-up tables, in which vectors containing coded focusing information are stored efficiently, achieving an information density of a fraction of a bit per acquired sample. Timing errors at the foci are kept within half the period of a master clock of arbitrarily high frequency, which improves image quality with low resource requirements. The technique is applicable to conventional as well as to ∆Σ converters.
I. Introduction
In the last two decades, many factors have motivated research on digital ultrasound beamforming for phased arrays. Among them, the increasing availability of high-performance analog and digital electronic devices has allowed the migration from the pioneering technologies to the digital domain. Furthermore, many research works have been driven by the demand for more compact and less power-consuming equipment, the need for higher integration levels, especially for two-dimensional (2-D) transducer arrays and portable instruments, and the objective of reaching the highest image quality in real time at low cost.
Most beamformers operate in emission by appropriately delaying and, possibly, weighting the excitation of the transducer elements to build a narrow ultrasound beam steered in a given direction [1]. In reception, the function of a digital beamformer is to combine the signals received by the set of transducer elements to build an A-scan focused at all depths in a given direction (dynamic focusing with steering). To this end, the timing of the receiving electronics should ideally be varied continuously, so that the contributions of every element to the coherent sum correspond to echoes coming from the same spatial point.
In practice, there is no known means to perform this function continuously, and several approaches have been proposed to approximate the ideal by time quantization. Several authors have addressed the timing error allowed for a given image quality [2]-[5]. From these studies, it is commonly assumed that the timing resolution must be kept below 1/32 to 1/16 of the fundamental period of the signal, which is equivalent to about four to eight times the Nyquist sampling frequency. Because of the increased cost and power consumption of parallel multibit A/D converters at such high sampling rates, other approaches have been followed. A widespread technique samples the radio frequency (RF) signal at the Nyquist rate and uses interpolation to obtain the required timing resolution [6]-[8]. When applied to base-band signals, a reduction of the sampling rate by a factor of three or four is achieved. However, this requires complex demodulation for the highest performance [9], [10], or quadrature sampling if narrowband transducers are used or if a decrease in image quality is acceptable [11]. In these cases, the acquired samples are passed through an interpolator or a phase rotator and are not used for direct summation.
A different approach generates an individual sampling clock for every acquisition channel, whose phase is varied so that the signal is sampled at the expected arrival instant of the echo from every focus in the steered direction [12], [13]. With this method, no redundant information is acquired or processed, no interpolation is required, and small distributed first-in first-out (FIFO) memories automatically perform the signal alignment for the coherent summation [14].
More recently, single-bit delta-sigma (∆Σ) modulators have been proposed, offering several advantages, among them the possibility of integrating the A/D converters of several channels together with the digital beamforming circuits in a single chip [15], [16]. The high oversampling rate used (on the order of 32) allows one to extract bit sequences long enough to reconstruct the instantaneous analog value at the focusing points [17]. As in the former case, a controlled-phase clock can be used to fix the start of the bit sequence.
These phased clocks require a time resolution on the order of 1/32 of the signal period for a good image dynamic range. Note, however, that the time interval between output samples can be kept at the Nyquist limit, about 1/4 of the signal period. This reduces the amount of information to be processed, as well as the speed required of the electronics.
A simple approach to phased clock generation is to build a 1-bit look-up table for every channel and steering direction, marking the sampling instants with a 1. However, this requires a large amount of fast focusing memory [18].
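To give a sense of scale, the size of such a 1-bit table can be estimated. The sketch below borrows the parameters of the example in Section III (f_R = 5 MHz, a 1/32-period timing resolution and therefore a 160 MHz master clock, c = 1470 m/s, 150 mm maximum depth, 128 elements) and assumes a hypothetical 128 steering directions; it is an order-of-magnitude estimate, not a figure reported in this work.

```python
import math

F_R = 5e6             # transducer fundamental frequency (Hz)
F_X = 32 * F_R        # master clock giving a 1/32-period timing resolution
F_S = 4 * F_R         # output rate at about four samples per period
C = 1470.0            # sound velocity (m/s)
R_MAX = 0.150         # maximum inspection depth (m)
N_ELEMENTS = 128      # array elements (channels)
N_LINES = 128         # assumed number of steering directions (hypothetical)


def lut_bits_per_channel_line(r_max=R_MAX, c=C, f_x=F_X):
    """One table bit per master-clock tick over the full round-trip time."""
    return math.ceil(2 * r_max / c * f_x)


bits = lut_bits_per_channel_line()
total_bits = bits * N_ELEMENTS * N_LINES
print(f"master clock {F_X / 1e6:.0f} MHz, output rate {F_S / 1e6:.0f} MHz")
print(f"{bits} bits per channel and line, {total_bits / 1e6:.0f} Mbit in total")
```

Each channel and line needs roughly 32 kbit, and a 128 × 128 configuration needs several hundred Mbit, all of which must be read at the master-clock rate.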
An elegant solution computes the sampling instants on the fly using a set of logic blocks involving adders, registers, and multiplexers [19], [20]. Because it is a logic-intensive, hardwired algorithm, its main drawback is the lack of flexibility for some applications. For example, phase aberration correction [21] would require small modifications of the sampling instants to compensate for small velocity variations along the propagation path, and layered structures present discontinuities that invalidate the method.
We propose here a different approach that takes advantage of the availability of memory together with logic in field programmable gate arrays (FPGAs). The balanced use of memory and logic leads to an efficient utilization of silicon area and provides a higher degree of freedom than other options.
The technique, called progressive focusing correction (ProFoC), is based on coding the sampling instants in a very compact form. By dynamically changing the distance between foci, the depth of focus can be exploited to bring the focusing information down to a very small fraction of a bit per sample. The ProFoC technique can be used with conventional A/D converters as well as with ∆Σ modulators, and it is suitable for 1-D and 2-D array transducers. This paper focuses on the timing errors resulting from the ProFoC technique under different conditions, the required resources, and some operating strategies. In particular, the ProFoC technique is suited for implementation in commercially available FPGAs with reconfiguration capabilities (i.e., SRAM based), exploiting their flexibility and operating modes to address different application needs as well as research purposes.
To demonstrate these capabilities, a prototype was built that houses all the beamforming functions for eight elements in a single FPGA (Spartan-II XC2S200, Xilinx Inc., San Jose, CA) using conventional 10-bit A/D converters (AD9318, Analog Devices, Norwood, MA). The system is scalable to any number of array elements by module pipelining.
The theoretical basis of the new method is described in Section II. Section III shows its strengths and limitations. Section IV addresses the dynamic version of the ProFoC technique, which achieves a significant reduction in focusing memory requirements. In Section V, the imaging and research possibilities offered by the new technique are addressed. Implementation is the subject of Section VI, and conclusions are summarized in Section VII.
II. The Static Progressive Focusing Correction Technique
A two-step interpolation procedure is commonly followed to achieve the high time resolution required by wide-band beamformers with a large dynamic range. A coarse delay is provided first, with a time resolution of one sampling period; then a fine delay is introduced by up-sampling by a factor M (typically M = 8 to 16). This step is usually carried out by inserting M − 1 zero-valued samples between two consecutive samples, performing the coherent sum, low-pass filtering the output sequence, and down-sampling by decimation [Fig. 1(a), for a single element]. In this process, the circuits in the shaded area must operate at high speed, that is, at M times the sampling frequency. Furthermore, coding the coarse and fine delays usually requires multibit control words for every focus and/or rather complex circuitry.
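The two-step procedure can be sketched for a single element as follows. This is an illustrative assumption rather than the filter of any particular beamformer: the coarse delay shifts whole samples, and the fine delay is applied after zero insertion, with a triangular kernel (equivalent to linear interpolation) standing in for the low-pass filter before decimation. In a real beamformer, the coherent sum of all channels would be formed on the up-sampled streams before filtering.

```python
def delay_by_interpolation(samples, delay, M):
    """Delay `samples` by `delay` / M sampling periods with the two-step method.

    Coarse step: shift by delay // M whole samples at the sampling rate.
    Fine step:   insert M - 1 zeros between samples, shift by delay % M ticks
                 of the up-sampled stream, low-pass filter with a triangular
                 kernel of length 2M - 1 (illustrative choice) and decimate by M.
    """
    n = len(samples)
    coarse, fine = divmod(delay, M)
    # Coarse delay: whole-sample shift, keeping the original length.
    x = ([0.0] * coarse + list(samples))[:n]
    # Up-sampling by zero insertion (this stream runs at M times f_S).
    up = []
    for v in x:
        up.append(v)
        up.extend([0.0] * (M - 1))
    # Fine delay in ticks of the up-sampled stream.
    up = ([0.0] * fine + up)[: n * M]
    # Low-pass filter evaluated only at the positions kept by decimation.
    out = []
    for k in range(n):
        acc = 0.0
        for j in range(-(M - 1), M):
            idx = k * M - j
            if 0 <= idx < len(up):
                acc += (1 - abs(j) / M) * up[idx]
        out.append(acc)
    return out


ramp = [float(i) for i in range(10)]
print(delay_by_interpolation(ramp, 2, 8)[:4])   # ramp delayed by 2/8 sample
```

Even in this reduced form, every output sample needs up to 2M − 1 taps taken from a stream running M times faster than the sampling rate, which is the high-speed part that the approach below avoids.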
By contrast, in the approach followed in this work, the signal received by every element is sampled at time instants determined by an individual clock generator that follows the timing of the ultrasound propagation path from the array to every focus and back to every element [Fig. 1(b)]. Besides requiring no interpolation, the focusing information can be coded with a single bit, as shown below. This can be stored in a small look-up table with very simple decoding circuitry, which is the only part requiring high-speed devices; the remaining components operate at the sampling frequency.
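For simplicity, consider an $N$-element linear array in a homogeneous medium with sound velocity $c$. Foci are defined at a constant interval $\Delta R$ starting at some minimum range $R_0$ (Fig. 2). The beamformer operates with a master clock whose period $T_X$ determines the timing resolution. It is convenient to choose:
$$\Delta R = \frac{\nu c T_X}{2} \tag{1}$$
where $\nu$ is an integer indicating the number of master clock periods between two consecutive foci. The output sampling period $T_S$ should be chosen as an integer multiple of the master clock period, $T_S = sT_X$. In this way, the number of samples acquired between two consecutive foci is also an integer, $m = \nu/s$. When $m = 1$, all the output samples are focused. The round-trip time from the origin to a focus $F_i$ at $(R_i, \theta)$ and back to the element at $x_k$ is:
$$T_{ik} = \frac{1}{c}\left(R_i + \sqrt{R_i^2 + x_k^2 - 2x_k R_i \sin\theta}\right) \tag{2}$$
The time interval at element $k$ between the arrival of echoes from two consecutive ranges $R_i - \Delta R$ and $R_i$ is:
$$\Delta T_k(R_i) = T_{ik} - T_{(i-1)k} \tag{3}$$
With $\Delta R$ small enough:
$$\Delta T_k(R_i) \approx \frac{\partial T_{ik}}{\partial R_i}\,\Delta R = \frac{\Delta R}{c}\left(1 + \frac{R_i - x_k\sin\theta}{\sqrt{R_i^2 + x_k^2 - 2x_k R_i\sin\theta}}\right) \tag{4}$$
and, by substitution of (1):
$$\Delta T_k(R_i) \approx \frac{\nu T_X}{2}\left(1 + \frac{R_i - x_k\sin\theta}{\sqrt{R_i^2 + x_k^2 - 2x_k R_i\sin\theta}}\right) \tag{5}$$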
This function grows monotonically from $R_i = 0$ to $R_i \to \infty$ and is independent of the sound velocity. Dropping the index $i$, the limits are:
$$\Delta T_k(0) = \frac{\nu T_X}{2}\left(1 - \frac{x_k}{|x_k|}\sin\theta\right) \tag{6}$$
$$\lim_{R\to\infty} \Delta T_k(R) = \nu T_X \tag{7}$$
The value $x_k/|x_k|$ in (6) denotes the sign of $x_k$, and $\theta$ enters with its own sign. In particular, if no steering is carried out ($\theta = 0$), the function (5) increases from $\nu T_X/2$ to $\nu T_X$ as $R$ grows from zero to infinity; at the other extreme, $0 \le \Delta T_k \le \nu T_X$ for a ±90° steering angle. Therefore, the sampling instant at a focus can be determined by waiting a number of master clock periods in the range $[0, \nu]$ from the sampling instant of the preceding focus. If $b$ bits are available to code this interval, with $a = 2^b - 1$, it will be shown that, for ranges above a certain minimum value $R_0$, the time interval between two consecutive foci is between $\nu - a$ and $\nu$ master clock periods, that is:
$$(\nu - a)\,T_X \le \Delta T_k \le \nu T_X \tag{8}$$
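The right-hand side of (8) is the same as (7). To verify the left-hand side, it is required that:
$$\frac{\nu T_X}{2}\left(1 + \frac{R - x_k\sin\theta}{\sqrt{R^2 + x_k^2 - 2x_k R\sin\theta}}\right) \ge (\nu - a)\,T_X \tag{9}$$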
Using the identity:
which, with:
simplifies to:
and:
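By substitution of (12), it is found:
$$R_0(x_k, \theta) = x_k\sin\theta + |x_k|\cos\theta\,\frac{\nu - 2a}{2\sqrt{a(\nu - a)}} \tag{15}$$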
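This equation yields the minimum range at which the time interval between two consecutive foci can be coded with $b$ bits, with $a = 2^b - 1$. Choosing $R_0(\theta) = \max_k[R_0(x_k, \theta)]$, the first sample of element $k$ for steering angle $\theta$ is obtained at:
$$T_{0k} = \frac{1}{c}\left(R_0(\theta) + \sqrt{R_0^2(\theta) + x_k^2 - 2x_k R_0(\theta)\sin\theta}\right) \tag{16}$$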
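This guarantees that all consecutive foci can be acquired with a timing error below half a period of the master clock for all the channels. The set of $T_{0k}$ values can be used to implement a dynamic aperture function, thus extending the minimum range to nearly zero. For every focus $F_i$ at $(R_i, \theta)$, the sampling instant at element $k$ is coded as:
$$Q_{ik} = \nu - \left(\left[\frac{T_{ik}}{T_X}\right]_{\uparrow\downarrow} - \left[\frac{T_{(i-1)k}}{T_X}\right]_{\uparrow\downarrow}\right) \tag{17}$$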
where $[\cdot]_{\uparrow\downarrow}$ denotes rounding to the nearest integer and $T_{ik}$ is given by (2). The focusing code $Q_{ik}$ is a $b$-bit integer that indicates by how many master clock periods the sampling instant for focus $i$ at element $k$ must be advanced with respect to the nominal value $\nu$. Provided the range is greater than the minimum given by (15), the timing errors are kept below half a master clock period at every focus. In practical situations, however, a common minimum range is chosen for all the steering angles and elements. As $R_0(\theta)$ increases with $x_k$, if the transducer is centered on $x = 0$ the worst case is $x_k = D/2$, that is, half the full aperture. The variation of $R_0(\theta)$ with $\theta$ is found by partial differentiation of (15) and equating to zero:
$$\left.\frac{\partial R_0}{\partial\theta}\right|_{\theta_m} = x_k\left(\cos\theta_m - \frac{\nu - 2a}{2\sqrt{a(\nu - a)}}\sin\theta_m\right) = 0 \;\Rightarrow\; \tan\theta_m = \frac{2\sqrt{a(\nu - a)}}{\nu - 2a} \tag{18}$$
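where $\theta_m$ is the steering angle that maximizes $R_0(\theta)$. With the identities:
$$\sin\theta_m = \frac{2\sqrt{a(\nu - a)}}{\nu} \tag{19}$$
$$\cos\theta_m = \frac{\nu - 2a}{\nu} \tag{20}$$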
and substituting in (15) with $\max(x_k) = D/2$, it is found, for all $\theta$:
$$R_0 = \frac{D\,\nu}{4\sqrt{a(\nu - a)}} \tag{21}$$
This expression yields the conservative minimum range beyond which the ProFoC technique can be applied successfully for all the elements and steering angles using the full aperture. The initial delay required to start using the Q-code vectors for every array element can be found by substituting $R_0$ in (16). Table I shows the value of F# = $R_0/D$ for several values of $b$ and $\nu$. The blank entries in Table I correspond to values for which $\nu < 2a$. Note that function (21) has a minimum at $\nu = 2a$, so no gain in F# is obtained by increasing the bit width of the focusing codes in the range $a < \nu < 2a$.
Increasing the width of the focusing codes with $\nu > 2a$ allows larger values of $\nu$ to be chosen with only a marginal effect on the F# value. Furthermore, focusing memory requirements can be minimized by an appropriate selection of $b$ and $\nu$. For example, the number of focusing memory bits used for a given range with $b = 2$, $\nu = 16$ is half that required with $b = 1$, $\nu = 4$, for a similar minimum F#. However, several other factors must be considered when setting these parameters, as will be explained.
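The F# values quoted in the text and the memory comparison above can be cross-checked numerically. This sketch assumes that (21) takes the closed form F# = ν/(4√(a(ν − a))), obtained by maximizing (15) over θ with $x_k = D/2$, and compares it with a brute-force search that uses (5) alone to find the smallest range at which the interval between foci reaches $\nu - a$ master clock periods. Returning `None` for $\nu < 2a$ mirrors the blank entries of Table I; all function names are illustrative.

```python
import math


def f_number_closed_form(b, nu):
    """Conservative minimum F# = R0 / D (closed form assumed for (21))."""
    a = 2 ** b - 1
    if nu < 2 * a:
        return None                        # blank entry in Table I
    return nu / (4 * math.sqrt(a * (nu - a)))


def interval(R, x, theta, nu):
    """Interval between consecutive foci in master-clock periods, eq. (5)."""
    s = math.sin(theta)
    return nu / 2 * (1 + (R - x * s) / math.sqrt(R * R + x * x - 2 * x * R * s))


def f_number_numeric(b, nu, steps=900):
    """Brute force: worst element x = D/2 with D = 1, scan theta, bisect on R."""
    a = 2 ** b - 1
    worst = 0.0
    for i in range(steps):                 # theta from 0 up to just below 90 deg
        theta = math.radians(90.0 * i / steps)
        lo, hi = 0.0, 100.0                # interval() grows monotonically with R
        for _ in range(60):
            mid = (lo + hi) / 2
            if interval(mid, 0.5, theta, nu) >= nu - a:
                hi = mid
            else:
                lo = mid
        worst = max(worst, hi)
    return worst                           # equals R0 / D because D = 1


def bits_per_range(b, nu):
    """Focusing bits per unit range: one b-bit code every nu*c*T_X/2, eq. (1)."""
    return b / nu


for b in (1, 2, 3):
    row = [f_number_closed_form(b, nu) for nu in (4, 8, 16, 32)]
    print(f"b={b}:", ["-" if f is None else f"{f:.2f}" for f in row])
```

The ratio `bits_per_range(2, 16) / bits_per_range(1, 4)` is 0.5, which is the halving of focusing memory mentioned above.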
III. Application of the Static Approach
It can be useful to show, by means of a typical example, the strengths and limitations of the static approach just described, as well as the general procedure to set up a given application. To this end, consider a 128-element array ($N = 128$) with fundamental frequency $f_R = 5$ MHz and an inter-element pitch $d = \lambda/2$, in a medium with sound velocity $c = 1470$ m/s. The master clock frequency is chosen to be $f_X = 160$ MHz, the maximum inspection depth is $R_{max} = 150$ mm, and the maximum steering angle is $\theta_{max} = \pm 45°$. The relative time resolution $\mu = f_R/f_X = 1/32$ is low enough for high dynamic range imaging.
From a practical point of view, the sampling frequency $f_S$ is chosen at about four times the transducer fundamental frequency, just to meet the Nyquist criterion, and, conveniently, as a submultiple of the master clock frequency. In this case, $f_S = 20$ MHz satisfies both criteria, the sampling period being $T_S = sT_X$ with $s = 8$. Since $\nu = m \cdot s$, where $m$ is the number of samples per focus, the value of $s$ sets the minimum value of $\nu$. When all the samples are focused, $\nu = s$.
The width of the focusing codes can be chosen as an application parameter. Here, for $m = 1$, the set $b = 1$, $\nu = 8$ allows operation with F# = 0.76, and $b = 2$, $\nu = 8$ allows imaging with F# = 0.52, for better resolution in the near field. The cost is that the focusing memory capacity has to be doubled, because two bits are required for every focusing code. However, setting $b = 2$, $\nu = 16$ gives F# = 0.64, which is slightly better than the value obtained in the first case with the same memory requirements. Now, however, only one out of two samples is strictly focused, because $m = \nu/s = 2$.
With $b = 1$, $\nu = 8$ ($m = 1$), the minimum range is $R_0 = 14.2$ mm and the number of acquired samples up to $R_{max} = 150$ mm is $M = 3695$. Fig. 3 shows the maximum timing error for every transducer element at four ranges, $R_0$, $R_{max}/3$, $2R_{max}/3$, and $R_{max}$, for $\theta = 45°$. In all cases, the timing error is within ±3.125 ns (half the master clock period). Fig. 4 shows a fragment of the focusing memory for one out of every 16 elements, for $\theta = 0°$ and $\theta = 45°$. Every 1-valued code represents a correction from the nominal $\nu$ master clock periods to $\nu - 1$. Some elements require frequent corrections, whereas others can be sampled at the nominal frequency over the range shown (about 11% of the full range). Every steering direction has a different set of focusing codes. However, the ProFoC technique can take advantage of the depth of focus to share every focal correction code among a given number $m$ of samples. In this way, only $\eta = b/m$ bits of focusing information are needed per sample. Because $\nu$ increases with $m$, the minimum range $R_0$ increases as well (15). The number of acquired samples decreases as $R_0$ increases if the maximum range is kept constant, so the number of foci and, therefore, of focusing codes is reduced by a factor slightly greater than $m$.
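The coding rule and the half-period error bound can be exercised on this example. The sketch below assumes that (17) takes the rounded-difference form $Q_{ik} = \nu - ([T_{ik}/T_X]_{\uparrow\downarrow} - [T_{(i-1)k}/T_X]_{\uparrow\downarrow})$, so that every focus is sampled at the master clock tick nearest to its exact arrival time, and that the common minimum range is $D\nu/(4\sqrt{a(\nu - a)})$. Element positions and helper names are illustrative.

```python
import math

C = 1470.0                      # sound velocity (m/s)
T_X = 1 / 160e6                 # master clock period (s)
NU, B = 8, 1                    # periods per focus (m = 1, s = 8), code width
A = 2 ** B - 1
N = 128
PITCH = C / 5e6 / 2             # lambda / 2 at 5 MHz
D = N * PITCH                   # full aperture
DR = NU * C * T_X / 2           # focus spacing, eq. (1)
R0 = D * NU / (4 * math.sqrt(A * (NU - A)))   # assumed closed form, ~14.2 mm
R_MAX = 0.150


def round_trip(R, x, theta):
    """Origin -> focus (R, theta) -> element at x, eq. (2), in T_X units."""
    d = math.sqrt(R * R + x * x - 2 * x * R * math.sin(theta))
    return (R + d) / C / T_X


def focusing_codes(x, theta):
    """First sampling tick and b-bit codes of one element (assumed form of 17)."""
    n_prev = round(round_trip(R0, x, theta))     # T0 rounded to a tick
    n0, codes = n_prev, []
    i = 1
    while R0 + i * DR <= R_MAX:
        n = round(round_trip(R0 + i * DR, x, theta))
        codes.append(NU - (n - n_prev))           # advance from nominal nu
        n_prev = n
        i += 1
    return n0, codes


def worst_error(x, theta):
    """Decode the codes as the hardware would and compare with exact times."""
    n, codes = focusing_codes(x, theta)
    worst = abs(n - round_trip(R0, x, theta))
    for i, q in enumerate(codes, start=1):
        n += NU - q                                 # wait nu - q master clocks
        worst = max(worst, abs(n - round_trip(R0 + i * DR, x, theta)))
    return worst, codes


for deg in (0, 45):
    theta = math.radians(deg)
    for k in range(0, N, 16):
        x = (k - (N - 1) / 2) * PITCH
        err, codes = worst_error(x, theta)
        print(f"theta={deg:2d} k={k:3d} codes={len(codes)} "
              f"ones={sum(codes)} max err={err * T_X * 1e9:.2f} ns")
```

Because each focus is rounded independently, the errors do not accumulate along the scan line; the codes fit in $b$ bits only because the first focus lies at or beyond $R_0$. The `ones` column gives a quick view of how often each element needs a correction, in the spirit of Fig. 4.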
In the example, with $m = 1$, 3695 focusing codes were required for an equal number of acquired samples. If $m = 4$, the number of focusing codes required drops to 837, less than 1/4 of the above figure, because the minimum range has grown from 14.2 to 27 mm. The number of acquired samples has now decreased to 3346.
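The sample and code counts quoted for this example can be reproduced approximately with a few lines. The sketch assumes one output sample per $cT_S/2$ of range between $R_0$ and $R_{max}$ (on-axis spacing), one $b$-bit code per focus, and the same closed-form minimum range assumed for (21); the helper name is illustrative.

```python
import math

C, F_R, F_X, S = 1470.0, 5e6, 160e6, 8
T_X = 1 / F_X
D = 128 * (C / F_R) / 2          # full aperture, N = 128, pitch lambda / 2
R_MAX = 0.150


def static_counts(b, m):
    """Minimum range, acquired samples, focusing codes and bits per sample."""
    a = 2 ** b - 1
    nu = m * S
    r0 = D * nu / (4 * math.sqrt(a * (nu - a)))      # assumed form of (21)
    samples = round((R_MAX - r0) / (C * S * T_X / 2))  # assumed on-axis spacing
    codes = math.ceil(samples / m)                    # one b-bit code per focus
    return r0, samples, codes, b * codes / samples    # last item is close to b/m


for m in (1, 2, 4, 8):
    r0, samples, codes, eta = static_counts(1, m)
    print(f"m={m}: R0={r0 * 1e3:.1f} mm, samples={samples}, codes={codes}, "
          f"eta={eta:.3f} bit/sample")
```

The growth of $R_0$ with $m$ is what makes the code count fall slightly faster than $1/m$.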
The timing error at intermediate samples when $m > 1$ is not constrained to half a master clock period, as it is at the foci. However, this error is bounded, because the incremental delay between samples at every element is a monotonically increasing function of range and the error at the ends of the chord joining two consecutive foci is limited to $\pm T_X/2$. The impact of these errors on image quality is, moreover, very limited, as has been shown [22]. The maximum and rms timing errors have also been plotted for increasing values of $m$. The maximum error grows quickly from $0.5T_X$ to $0.9T_X$ as $m$ is raised from 1 to 2, and then stabilizes at about $T_X$ for all values of $m$. The rms error remains nearly constant, at approximately $0.15T_X$, for $4 \le m \le 64$, and is lower for $m < 4$.
IV. The Dynamic Progressive Focusing Correction Technique
The minimum range for applying the ProFoC technique to all the array elements and steering angles is given by (21). With $\nu = m \cdot s$:
$$R_0(m) = \frac{D\,m\,s}{4\sqrt{a(ms - a)}} \tag{22}$$
where $a = 2^b - 1$, as before. Then, the set of ranges $\rho = \{R_0(1), R_0(2), \ldots, R_0(n)\}$ is created by applying (22) with $m = 1, 2, \ldots, n$, with $R_0(n) \ge R_{max}$. These ranges are the breakpoints at which the number of samples per focus, $m$, is increased by 1. Every breakpoint delimits a section of variable length, within which a constant number of samples per focus is acquired. In this way, the full range from $R_0$ to $R_{max}$ is covered with the minimum F#, while the number of focusing codes required is simultaneously reduced.
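The set $\rho$ is straightforward to generate. The sketch below assumes the closed form $R_0(m) = Dms/(4\sqrt{a(ms - a)})$ for (22) and the parameters of Section III. It lists only the theoretical breakpoints; in practice each section starts at a focus position, so the actual section boundaries fall at or beyond these values.

```python
import math


def breakpoints(b, s, D, r_max):
    """Ranges R0(1), R0(2), ... at which m (samples per focus) grows by 1."""
    a = 2 ** b - 1
    if s < 2 * a:
        raise ValueError("this sketch requires nu = m*s >= 2a from m = 1")
    rho, m = [], 1
    while True:
        nu = m * s
        rho.append(D * nu / (4 * math.sqrt(a * (nu - a))))   # assumed form of (22)
        if rho[-1] >= r_max:
            return rho
        m += 1


D = 128 * (1470.0 / 5e6) / 2      # full aperture of the Section III example
for b in (1, 2):
    rho = breakpoints(b, 8, D, 0.150)
    print(b, [round(r * 1e3, 1) for r in rho[:5]], "...", len(rho), "values")
```

Because $R_0(m)$ grows only as about $\sqrt{m}$, the sections become progressively longer with depth, which is what lets the focusing codes thin out.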
From every breakpoint onward, all the sampling instants at every focus and element can be represented with the specified number of bits. Moreover, the timing error is kept below half a master clock period at the foci and, at the intermediate samples, the maximum error is bounded. The cost is that some additional information must be provided to set the starting point of every new section, although this information can be shared by all the elements and all the steering angles, as will be explained later.
In the example considered, with $b = 1$, a total of $S = 71$ sections are required to cover the range from 14.2 to 150 mm for all steering angles, with F# = 0.76. The same 3695 samples are acquired as with the static approach, but the number of foci NF has been reduced from that figure to 392, nearly an order of magnitude lower.
With $b = 2$, a total of $S = 82$ sections are required, covering the range from 9.7 to 150 mm for all steering angles, with F# = 0.52. Now 3818 samples are acquired, owing to the lower minimum range, and NF has been reduced from that figure to 232, about 1/16 of the focusing memory required by the static approach.
Because two bits are now used to represent every focusing code, 464 bits are required in the latter case, about 18% more than with $b = 1$ (392 bits). However, in some applications the increased memory requirement would be largely compensated by a significantly lower F# (from 0.76 to 0.52). Using $b = 3$ is not advised in this case, because $\nu = 8$ is less than $2a = 14$.
The dynamic ProFoC (D-ProFoC) technique shows a high information density $\eta$, which in this example is on the order of 0.1 bit/sample. This density can be increased further if the application supports a higher F#, as is frequently the case, by dropping one or more sections at the beginning of the ordered set $\rho$. Table II gives some figures that show how the minimum range can be traded off against memory requirements. In Table II, the first row represents the case in which all the sections in set $\rho$ are used, that is, the first section starts with $m_0 = 1$. In the next row, section 1 is dropped, so acquisition starts with $m_0 = 2$. The other two rows consider the cases in which sections 2 and 3, respectively, are also dropped. The figures show the minimum range $R_{min}$ in millimeters, the corresponding F#, and the number of foci NF in the range $[R_{min}, R_{max}]$ with $R_{max} = 150$ mm. The column Q-bits indicates the number of bits required to store all the focusing information of a single scan line for every element, apart from the information required to delimit sections.
In general, focusing memory requirements are low, and the achievable F# is quite acceptable, especially when $b = 2$. It is worth noting that, for an equal range (for example, 14.2 to 150 mm), two-bit focusing codes provide a higher information density than $b = 1$ (278 bits versus 392 bits). Fig. 7 shows the number of foci NF in every section for the example considered with $b = 1$. It decreases rapidly, taking the values 142, 56, 31, 21, 15, and so on, and reaches a constant value of 1 at section 30. From this breakpoint onward, every section has a single focus, a saturation condition. The first section accounts for 36% of the total memory size, so dropping it reduces the memory requirements accordingly.
The breakpoints closely follow the theoretical limits given by (22) until saturation is reached (Fig. 8). From that point on, the start of every section is farther from the minimum theoretical value than strictly required. This suggests that the number of foci could be decreased even further, for example, by increasing the number $m$ of samples shared by every focus by two or more at a time. However, decoding would be more complicated, and this option has not been implemented. The number of samples acquired in every section, $M_S$, follows the function $M_S(m) = m \cdot NF(m)$, where $m$ is the section number, which equals the number of samples per focus. In the case considered, the values are 142, 112, 93, 84, 75, ..., as shown in Fig. 9. This function is rather irregular because the factor $m$ magnifies small variations of NF. The growth from $m = 30$ onward is a consequence of the saturation condition. These irregularities have little impact on performance and do not affect timing errors.
Finally, the maximum and rms timing errors over the whole range when using the dynamic approach are considered. Fig. 10 shows the maximum timing error at every sample for the steering angles $\theta = 0°$ and $\theta = 45°$, with $b = 1$. The error is below $1.4T_X$ in the first case and about $T_X$ in the latter, and it is limited to $0.5T_X$ along the first section, as expected. Fig. 11 shows the rms timing error for the same conditions, which is a small fraction of the master clock period.
V. Imaging and Research Possibilities
So far, it has been shown that the ProFoC technique offers a large number of possibilities in both its static and dynamic variants. These are useful for exploring new imaging strategies and for a variety of applications. For example, the static approach can be used to obtain the "gold standard" image using $m = 1$, that is, with all the acquired samples focused, changing the value of $b$ to obtain a given F#. This is the most resource-consuming method, requiring nearly 3700 $b$-bit focusing codes for a single steering direction in the example considered.
Current state-of-the-art FPGAs include tens to hundreds of Block RAMs, with capacities ranging from 4 Kb to 18 Kb each. In a minimal, low-cost case, a full Block RAM would be required to store the focusing codes for a single steering direction and element. However, an image is composed of hundreds of steering directions, and arrays are made of hundreds of elements, so only the higher-end devices would provide enough resources to store all the focusing information. Nevertheless, strict real time is not required when acquiring the "gold standard" image, especially when operating with static phantoms to compare research results. A new set of focusing codes can then be loaded between scan-line acquisitions, which can be very fast: for a 4 K × 1 bit memory organized as 128 × 32 bits, 16 K writes are required per scan line for a 128-element array. On a standard PCI bus this can be achieved in about 500 µs, which is about twice the round-trip ultrasound time of flight to 190 mm in a biological medium. This is fast enough for research purposes and even for some real-world applications (nondestructive evaluation, for example). At the other extreme, the dynamic approach with $b = 2$ or more and a limited F# has the lowest memory requirements. For example, $b = 2$ and F# = 0.85 require only 244 bits of memory for dynamic focusing in every steering direction. A 4 Kbit memory is enough to hold 16 Q-vectors for as many steering directions, and a 32 Kbit memory would store all the information for a 128-scan-line sector. Embedded memory in FPGAs is continuously increasing, which makes the ProFoC technique quite attractive. However, for 3-D imaging with 2-D arrays, the available memory resources may not be enough to store all the required focusing information.
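The memory and loading figures in this paragraph follow from simple arithmetic, reproduced below as a sketch. The 1540 m/s sound velocity for soft tissue is our assumption; the remaining numbers are taken from the text.

```python
C_TISSUE = 1540.0                 # assumed sound velocity in soft tissue (m/s)

# Reloading static focusing codes between scan lines.
words_per_element = 4096 // 32    # 4 K x 1 bit memory seen as 128 x 32-bit words
writes_per_line = words_per_element * 128          # 128-element array
load_time = 500e-6                                  # quoted PCI loading time (s)
time_per_write = load_time / writes_per_line        # about 30 ns per 32-bit write
round_trip_190mm = 2 * 0.190 / C_TISSUE             # about 247 us

# Dynamic approach, b = 2 and F# = 0.85: 244 bits per steering direction.
q_vector_bits = 244
vectors_per_4kbit = 4096 // q_vector_bits           # Q-vectors per 4 Kbit block
bits_for_sector = 128 * q_vector_bits               # 128-scan-line sector

print(writes_per_line, f"{time_per_write * 1e9:.1f} ns/write",
      f"{load_time / round_trip_190mm:.2f}x time of flight")
print(vectors_per_4kbit, bits_for_sector, bits_for_sector <= 32 * 1024)
```

About 30 ns per 32-bit write is roughly one clock cycle of a 33 MHz PCI bus, so the quoted loading time presumes efficient (burst) transfers.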
A possible solution is found using a modified version of the binomial approximation [23]. In our case, a sector image is decomposed into several subsectors, each one covering an angle of ±α around a main direction at angle θ (Fig. 12).
The sampling instants in a main direction, given by (2), determine the focusing code vector for every array element and steering angle; (2) is repeated here as $T_Q$ for range $R$ and the element at $x$:
$$T_Q(R, x) = \frac{1}{c}\left(R + \sqrt{R^2 + x^2 - 2xR\sin\theta}\right) \tag{23}$$
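The sampling instants at a secondary direction $\theta_S$ such that $\theta - \alpha \le \theta_S \le \theta + \alpha$ are:
$$T_S(R, x) = \frac{1}{c}\left(R + \sqrt{R^2 + x^2 - 2xR\sin\theta_S}\right) \tag{24}$$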
The binomial approximation uses the first two terms of the infinite series expansion:
$$\sqrt{1 + u} \approx 1 + \frac{u}{2}, \qquad |u| \ll 1 \tag{25}$$
applied to (24) yields:
That is, the sampling instants at a secondary direction are obtained from the Q-vector of the main direction after an initial delay $T_{ini}$:
where $T_0$ is given by (16). In this way, the code vector of a main direction θ is shared by all the steering directions within the subsector θ ± α, simply by changing the initial delay. For example, a ±45° sector could be covered by a set of 15 vectors, with α = 3°. Fig. 13 shows the maximum timing errors, in master clock periods, when using this approach for offset angles of 1°, 2°, and 3° from two main directions (θ = 0° and θ = 42°) as a function of R/D, using the full aperture and $m = 1$. At the main directions (offset angle = 0°), the maximum error is below $\pm 0.5T_X$. The errors in the near field for all other offset angles are mostly due to the extreme elements and can be reduced by using a dynamic aperture. Fortunately, in a sector image the near field is oversampled in the lateral direction, so several techniques can be used to reduce the effect of these errors on the image [24].
A distinctive feature of the ProFoC technique, in all the variants described, is the possibility of modifying the focusing algorithm for special applications, which derives from the programmable look-up table nature of the method. For example, sampling instants can be advanced or delayed to compensate for velocity variations in tissue. Also, especially with the dynamic approach, section boundaries can be defined at the interfaces of a layered structure, perhaps modifying the algorithm used to set the number $m$ at these boundaries. These possibilities require further research, but the harder task of providing hardware support for this purpose has been solved, as presented in Section VI.
VI. Implementations
Several architectures can be devised to implement the ProFoC technique. A block schematic of a simple, modular implementation, which can be used for both the static and the dynamic approach, is shown in Fig. 14. The dynamic approach is discussed first.
The fundamental control element is the focusing memory (FOCMEM). At every address, the $b$-bit focusing code $Q$ corresponding to a focus is stored together with a single-bit field $P$ indicating whether this focus is the last one in the current section. The sequence of $P$-code bits is zero at all the foci except at the one corresponding to an end of section, where it is set to one. The bit $P$ is used to increment the content $m$ of counter-M every time the end of a section is reached (shown with a dotted line). Note, however, that a significant improvement can be achieved if the single-bit sequence $P$ is stored in an external section memory, SECMEM, common to all the elements (shown with a continuous line), instead of in FOCMEM. This is possible because, owing to (22), the section boundaries are at the same focus index for all the elements and steering angles. The section memory SECMEM must be large enough to store the maximum number of foci in a given application, typically well below 1 Kbit (see Table II).
The small 1-bit-wide FIFO shown in Fig. 14 compensates for differences in the acquisition instant of a given focus at different elements. To reduce the FIFO length, an efficient implementation shares a common SECMEM among a number of adjacent elements that are beamformed in the same device (a submodule). Every time any of the involved FIFOs becomes empty, a new $P$-code bit is read from SECMEM and written into all the submodule FIFOs.
In the upper part of Fig. 14, a programmable-length shift register (SHR), driven by the master clock, is used to generate the sampling clock CKS. Logic to avoid the blocking condition has been omitted for clarity. The sampling interval equals the programmed length $L$ times the master clock period.
A programmable modulus-M counter produces an overflow signal incF every $m$ sampling clocks, which increments counter-NF, holding the focus number. The content nf of this counter addresses FOCMEM, which in turn provides the corresponding focusing code $q$ to the (bus-wide) AND gate. Simultaneously, every time a new focus is reached (incF is asserted), the flip-flop (FF) is set high, which makes the gate output $q' = q$, the output of the focusing memory. In the next sampling clock, FF is reset so that $q' = 0$, and it remains in this state until a new focus is reached.
In this way, the length $L$ of the SHR (and, hence, the sampling interval) is set to $s - q$ master clock periods at the focus and to $s$ at all the other samples, so that consecutive foci are $ms - q = \nu - q$ master clock periods apart.
In the static approach, counter-M is replaced by a register, and bit $P$ is not used at all. Other improvements can be made to this basic architecture, for example, distributing the $q$ correction cycles uniformly among the $m$ samples [25].
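The interplay of counter-M, counter-NF, FOCMEM, the flip-flop, and the SHR can be summarized by a behavioural model. The sketch below is hypothetical (not the authors' HDL): it assumes that the correction $q$ is applied to the first sampling interval after each focus and that the P bits come from SECMEM, and it ignores the FIFOs and the blocking-avoidance logic. With all P bits at zero, it reduces to the static approach.

```python
def sampling_ticks(q_codes, p_bits, s, m0=1, t0=0):
    """Master-clock ticks of the sampling clock CKS for one element.

    Hypothetical behavioural model of the sampling-clock generator.
    q_codes[i]: focusing code that shortens the interval ending at focus i+1
    p_bits[i]:  1 if focus i+1 closes its section (SECMEM content)
    s:          nominal SHR length L, i.e. the sampling period in T_X units
    m0:         initial content of counter-M (samples per focus)
    t0:         tick of the first focused sample, from the initial delay
    Returns (all sampling ticks, ticks of the focused samples).
    """
    ticks, foci = [t0], [t0]
    m = m0
    for q, p in zip(q_codes, p_bits):
        ticks.append(ticks[-1] + s - q)   # FF set: L = s - q after the focus
        for _ in range(m - 1):            # FF reset: L = s for the other samples
            ticks.append(ticks[-1] + s)
        foci.append(ticks[-1])            # counter-M overflow asserts incF
        if p:                             # end of section: one more sample
            m += 1                        # per focus from now on
    return ticks, foci


ticks, foci = sampling_ticks([1, 0, 1], [0, 1, 0], s=8)
print(ticks, foci)                        # [0, 7, 15, 22, 30] [0, 7, 15, 30]
```

The focus-to-focus spacing comes out as $ms - q = \nu - q$, which is exactly the quantity the Q codes encode.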
To verify the proposed method experimentally, a basic beamformer with eight 10-bit A/D channels and all the beamforming functions implemented in a single low-cost FPGA has been developed. A 32-channel system is built with four basic beamformer submodules plugged into a motherboard (Fig. 15), in which every submodule takes the beamformed data from the preceding one, performs the coherent sum with its own data, and passes the result to the next one in a pipelined chain. Likewise, 32-channel systems can be pipelined to build larger systems in a fully scalable approach. The architecture described in [26] is used as an efficient support that guarantees that all operations are synchronized within a ±2 ns tolerance (accounting for clock skew, routing delays, etc.).
An internal master clock of 160 MHz is obtained from the 40 MHz system clock inside every submodule FPGA, keeping the sampling timing errors below 3.125 ns (plus tolerance). All parameters are programmable, for research as well as application purposes, through a specifically developed Windows-based graphical user interface. Fig. 16 shows the internal architecture of a submodule. The amplified and filtered RF signals $V_1$ to $V_8$ enter the A/D converters and are digitized at the time instants determined by the sampling clocks CKS1 to CKS8. The sampled data are first multiplied by programmable weighting factors $w_1$ to $w_8$ ($0 \le w_i \le 1$) to perform the apodization function. The sampling clocks are generated in the logic block named SCG, using any of the techniques described, from the focusing code vectors $Q_1$ to $Q_8$ stored in the corresponding focusing memories. These memories are implemented in single 4 K × 1 bit Block RAMs, with different organizations to match the width of the focusing codes.
A tree-like structure of FIFOs and adders carries out the coherent summation in a pipeline. In this way, data storage requirements are minimized, with FIFO lengths increasing by a factor of two at every stage. A final adder with a larger FIFO combines the submodule results with those provided by the preceding submodule. The length of this FIFO is enough to store the results of the current submodule while waiting for those provided by the preceding submodules on the motherboard. A similar arrangement has been set up on the motherboard to combine its results with those of the preceding motherboard. The system is thus a hierarchical set of pipelines, for increased throughput, scalability, and best resource utilization. All the logic and memory are integrated in the FPGAs (five devices for every 32 channels). A sustained processing rate of 40 MS/s is achieved, independently of the number of channels configured in the application.
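The arithmetic performed by a submodule and by the chain of submodules can be sketched as follows. This is a hypothetical behavioural model that only captures the data flow: apodization weights, a binary adder tree (three stages for eight channels), and the final adder that accumulates the partial sum received from the preceding submodule. The FIFOs that align the data in time are not modelled, because in this sketch all inputs are assumed to be already aligned.

```python
def adder_tree(values):
    """Sum values pairwise, stage by stage, as a binary adder tree."""
    stage = list(values)
    if not stage:
        return 0.0, 0
    stages = 0
    while len(stage) > 1:
        if len(stage) % 2:
            stage.append(0.0)             # pad an odd stage with a zero input
        stage = [stage[i] + stage[i + 1] for i in range(0, len(stage), 2)]
        stages += 1
    return stage[0], stages


def submodule(prev_sum, samples, weights):
    """Apodize the focused samples, sum them, add the upstream partial sum."""
    local, _ = adder_tree(w * x for w, x in zip(weights, samples))
    return prev_sum + local


def motherboard(channel_samples, weights, n_per_module=8):
    """Chain submodules: each one adds its channels to the running sum."""
    total = 0.0
    for i in range(0, len(channel_samples), n_per_module):
        total = submodule(total, channel_samples[i:i + n_per_module],
                          weights[i:i + n_per_module])
    return total


print(adder_tree(range(1, 9)), motherboard(list(range(32)), [0.5] * 32))
```

The chain adds one final-adder stage per submodule, which is why the throughput stays independent of the number of channels once the pipeline is full.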
Control logic to perform dynamic aperture, initial delay, and other processing functions, which has been included in the submodules, is not shown in the block diagram. Fig. 17 shows the measured errors at a focus arbitrarily chosen at R ≈ 50 mm, θ = 15°, with respect to the exact, nonquantized values for the 32 channels of a motherboard. Some small deviations from the theoretical maximum of ±3.125 ns are found (in elements 4, 11, and 14, with values of 4.8, −3.9, and 3.5 ns, respectively), which are probably caused by differences in routing delays and clock skew.
VII. Conclusions
The ProFoC technique has been described in several variants. It is an ultrasound beamforming method that allows hardware resources to be traded for the minimum usable F# and timing errors. The ProFoC technique opens new opportunities for dynamic focusing in heterogeneous media such as biological tissue or layered structures. In this sense, it is a research tool, although it can also be used directly in any conventional application. Its operation is based on coding, in a very efficient way, the sampling instants at every channel for every focus at polar coordinates (R, θ): only a small fraction of a bit per acquired sample is required in most cases. Every focusing code can be shared among a programmable number of samples and, with the dynamic approach, the progressive increase of the depth of focus is exploited to further reduce the focusing memory. Timing errors are kept bounded and, particularly at the foci, they are limited to half the period of a master clock of arbitrarily high frequency.
Being based on programmable look-up tables, the ProFoC technique is far more flexible than approaches based on hardware-implemented algorithms. Moreover, because memory is the most regular integrated structure, a higher integration level is expected. The method has been tested by developing a hardware submodule prototype that integrates in a low-cost FPGA all the beamforming functions for eight channels, together with apodization, dynamic aperture, and some other processing functions. Submodules can be chained to build a system for any number of array elements. Measurements have been carried out on a fully populated 32-channel motherboard, verifying that the hardware behaves as expected from theory.
