Recent developments in the use of composite ceramics for high frequency arrays have reached the stage where large area, high frequency sonar receiver arrays can be produced cost effectively [1] . Typically these consist of hundreds of individual sensors with excellent element-to-element matching, in both gain and phase, over bandwidths in excess of an octave centred on frequencies in the range 30 to 500 kHz (See Figure 1) . This composite sensor technology allows high performance sonar imaging systems to be developed with good spatial and angular resolution and wide angular cover. A typical system may use a hydrophone array of say 200 staves, each quantised to around 18-20 bits and processed in the frequency band 100 to 200 kHz to provide 1 degree wide beams over say a +/-80 degree angle of cover. Replica correlation must be provided on each beam output for target detection, followed by some overall feature extraction and image display system.
The aggregate processing load presented by such real-time data acquisition, processing and display systems is high. For the generic system outlined in the previous paragraph, hardware throughputs around 5.1O lO arithmetic operations per second are needed to implement simplistic time domain DSP algorithms in real time (see Figure 2 ). Significant 'crunching power' is also required in the data acquisition systems area for signal conditioning, band definition and data communications. In practical applications, the beamformer needs to perform dynamic focusing in both range and bearing to minimise echo feature smearing. These additional processing functions can increase the DSP processing load by several orders of magnitude. Figure 3 outlines some of the problems encountered in these types of system. In many applications, space and power for systems is limited, particularly if the system is mounted on an underwater vehicle (maybe autonomous, maybe remote controlled). Typical power/space envelopes available in such applications are of the order 100 watts and one cubic foot for the complete electronics system. The following paragraphs outline ongoing development work in this high resolution, high frequency sonar area, with particular emphasis on practical systems using a mixture of custom silicon, high density FPGAs and application specific DSP devices.
Data Acquisition Systems.
The electronic noise floor from the sensor arrays outlined above falls well below the ambient acoustic noise floor in the ocean in the frequency bands of interest, whilst maximum signal levels are determined by the transmit source level used for the system. To support both extremes it is necessary in practice to handle instantaneous dynamic ranges in excess of 100 dB so that the highlight structure details (and hence the ability to classify) of close-in bottom features are maintained, particularly when a number of systems operate in consort or in close proximity. Hence the data acquisition problem in HF imaging sonars is particularly acute. The wide bandwidths and high dynamic range required rule out the use of commercially available data acquisition components. A number of novel techniques based on the use of custom silicon band-pass sigma-delta converters have been developed for these types of application.
The classical low-pass sigma-delta converter [2] is shown in Figure 4 -it consists basically of a three-port noise shaper, an analogue to digital converter (ADC), a digital to analogue converter (DAC) and a digital decimator/filter. The noise shaper is an analogue circuit designed so that the forward gain in the band of interest is very high. Hence the quantisation noise introduced by the ADC is reduced at the output by the loop gain of the system, in the much the same way that output distortion is reduced in an operational amplifier circuit by heavy negative feedback.
The oversampled ADC output data is filtered and decimated to select the required signal band -over this band, the effective dynamic range is increased over and above that of the basic ADC by the high loop gain and feedback action around the noise shaper outlined above.
In most applications, for example for digital audio, the noise shaper response characteristic is designed to provide a low pass signal transfer function but for this application noise-shaping that maximises effective dynamic range in a band of frequencies remote from dc is required. This band-pass design can be achieved by first designing a low pass noise shaping system with the required bandwidth and then applying classical analogue filter low-pass to band-pass transformation techniques. The design of bandpass converters raises some interesting problems. For example, the ADC/DAC system must be oversampled to maintain Nyquist stability around the loop: in order to minimise data transfer bandwidths from the array, some DSP is needed locally to the converters to band-shift the element data and generate complex base-band signals that are only modestly oversampled. A number of techniques are available to realise this band-shift/decimate process if the fractional bandwidths are small but for wider bandwidths we require here, we have developed novel techniques based on a combination of low complexity digital filtering, sparse multipliers and digital noise shaping. These together provide alias-and image free signal digitisation (to the 18 to 20 bit level) with relatively low complexity and minimal transfer bandwidths -see Figure 5 . Figure 6 shows plots of the converter output with full scale signal, open circuit and short circuit inputs. These indicate a dynamic range equivalent to around 19 bits. Figure 7 shows the partially filtered bandpass converter spectrum -the increasing quantisation noise on either side of the required signal pass band is clearly visible. The plot shows the results from 128 individual converters fitted in a composite array in shallow water (14 feet depth) overlaid. The signal excitation in the pass band in the pass band is from a transmitter about 100 feet away from the array. A fast FM chirp waveform was used to minimise the effects of local multi-path signals. Figure 8 tabulates the gain and phase measured at each element in this array -it can be seen that the array element/band-pass ADC combination provides excellent channel to channels matching (around 0.5 dB rms in gain and 1.2 degrees rms in phase). Figure 9 plots the theoretical transient response of the ADC.
The digitised element data feeds to the signal processing sub-system for beamforming and detection processing. For the type of system considered, the data rates are fairly high, typically around 30 to 40 million data samples per second (90-120 megabytes/second).
Beamforming and Detection Processing.
The beamforming/detection system must be capable of providing sustained real-time throughputs to match the digitised element data rate. If simplistic time domain beamforming techniques are used, based on interpolated space-time methods [3] , then each beam sample requires around N*10 complex arithmetic (multiply-accumulate) operations -where N is the number of elements used in the array in this case 200. To provide reasonable angular cover, upward of 200 individual beams may be needed, giving an aggregate processing load in excess of 5.10 10 operations per second. For a typical current generation DSP processor, running at around 500 MHz, this implies that the beamforming system would need around 100 processors -such a system is interesting from the engineering viewpoint but may not be particularly practical. (Trying to make it work would certainly be a good way to keep large numbers of DSP programmers warm, well fed and off the streets).
Some way to reduce the processing load is needed. The complexity of the time domain approach outlined above is of the order (N 2 ): using a 2D-FFT based approach can potentially reduce this complexity to the order (N log N) but has some problems.
Direct implementations based on the 2D-FFT can most easily be applied to planar arrays and even then the "beam outputs" from the process have a maximum response axis (MRA) that is a function of frequency (i.e. the 2D-FFT maps from element-space to k-space, rather than to beam-space). In the classical 2D-FFT beamformer, the 2D-FFT process is often followed by an interpolation scheme, usually using 2D-FIRs, that map from k-space to beam-space. This produces beams approximating to those from an equivalent time domain beamformer (i.e. MRAs that are invariant with frequency with beamwidths that decrease with frequency). However, in practical hardware terms this 2D-FIR interpolation is often as processor intensive as the time domain process that we are trying to avoid, so again can be of limited practical utility.
The fundamental problem with using 2D-FFTs for beamforming results from the fact that the discrete Fourier transform is based on the integral roots of unity exp(-j2.pi/N). Hence all FFT calculations are constrained to traverse around the unit circle in the s-domain. If the transform could be modified to allow arbitrary integration paths in the s-domain, the 2D-FFT algorithm could be modified to map directly into beam-space. This modification can be achieved by defining a fractional Fourier transform based on the fractional roots of unity exp (-j 2.pi/N.alpha), where alpha is arbitrary. This transform, the fractional DFT [4] , is mathematically similar to the chirp-Z transform and can be implemented using a fast algorithm, with complexity approaching the FFT algorithm. Applying this fractional DFT approach to our current beamforming problem is straightforward. Figure 10 shows comparative outputs from systems using the straightforward 2D-FFT method and the 2D-fractional FFT approach. In both cases, the arrays are excited by broadband energy sources at broadside and at +/-45 degrees. For the 2D-FFT, it can be seen that the effective MRA changes with frequency whilst for the 2D-fractional FFT scheme the MRAs are constant with frequency whilst the beam-widths increase as the frequency falls, identical to those from a time domain system. Both frequency domain implementations require approximately the same processing power.
The replica correlation processing load can also be reduced using a fast frequency domain implementation. As the beamforming process requires time-to-frequency domain and frequency-to-time domain transformations, the replica correlation processing can conveniently be wrapped up into the beamforming process. The overall processing flow for this simple beamforming and detection processing is shown in Figure 11 .
The processing required is essentially a block process -for each transmission received echo data from each hydrophone is gathered and processed. For a typical system, range swathes out to around 500 to 1000 metres would provide sufficient safe clearance. Depending on actual system parameters used, this requires that data blocks of around 64k to 128k complex samples for each element in the array must be processed for each transmission. We will assume for the rest of this discussion that 64k complex blocks are to be processed.
Although these 64k data blocks could be segmented and processed with small (1 to 4k point) FFTs, using conventionaI overlap-and-add transform techniques, it is more efficient to process the complete data block directly. Hence for each array element we calculate a 64k point FFT to get into the frequency domain. The fractional FFT beamforming algorithm is then implemented using a 256 point complex fast convolution across the elements for each frequency cell. This process generates frequency domain beam data, so we can implement fast replica correlation using a 64k point vector multiply on each beam output, followed by a 64k point IFFT for each beam to get back to the time domain.
The more complex data flow for a system with dynamic focusing in both range and bearing is shown in Figure 12 , with correlation performed directly on the element data and additional processing steps for included in the beamforming for the array dynamic focussing.
It can be seen in both cases that the processing schemes have fairly regular structures and require processor architectures that supports large FFT block sizes efficiently (ideally 64k points complex and bigger) as well as the small transforms and vector operations for the fractional FFT beamforming.
It will be appreciated that the aggregate processing load is high (Figure 13) , remembering that the complete process flow shown in Figures 11/12 must be performed for each outgoing transmission (maybe every second or less). The total processing load for the dynamically focused system amounts to the equivalent of around 285,000 1k point complex FFTs per second, i.e. a 1k point FFT time of around 3 microseconds. It is enlightening to compare candidate processors for this application. Table 1 lists a number of available DSP devices ranked on the time they take to calculate a 1k point complex FFT. The times quoted for commercially available processors have been estimated in the main using manufacturers published data, so may be in error. Processing times for the faster processors are considered in more detail in Table 2 . For the times calculated in Table 2 , it has been assumed that the ADI and TI processors can perform 64k point complex FFTs with no memory wait states, so the values quoted are possibly optimistic (see manufacturers data sheets for more detail regarding size of on-chip memory, etc.). The BCVP (Butterfly Complex Vector Processor -outlined in Appendix 1) and the NCVP are application specific sonar processing modules, based on an initial custom silicon design [5] .
Practical Systems.
A number of systems for various operational scenarios have been developed and deployed on both surface and submarine platforms over the last few years. All use the techniques outlined above and have been used to process both planar and conformal array geometries.
A typical system consisted of four data acquisition modules, feeding data via FDDI links to a compact PCI [6] 3U rack containing FDDI receivers, memory, an NCVP processor and a Pentium 4-based CPU for data display on a passive compact PCI backplane (See Figure 14) . System throughputs have been optimised by writing custom PCI-busmaster drivers that allow sustained data transfers between PC cards at full PCI bus rates. The system MMI and display uses a commercial RAD environment running under an open operating system.
The overall system throughput is currently constrained by available PCI bus bandwidths. Current work is concentrating on extending the system capability by using fabric backplane technology [7] that supports simultaneous concurrent PCI transfers between multiple processors.
Measured beam-patterns from a system are shown in Figure 15 . These were measured in deep quiet water and show excellent side-lobe performance from the system. Figure 16 shows the correlator output response with low bandwidth transmit signals (9.375kHz BW, 13.6533milliseconds pulse length) whilst Figure 17 shows the results from increasing transmit pulse bandwidth to 75 kHz. These measurements were made in shallow water (around 14 feet depth) and show the clear advantage of wide bandwidth in resolving multi-path signals. Figure 18 shows the results a submarine mounted system looking at a point target at around 1200 yards. The top traces show the correlator outputs, whilst the bottom ones show focused beam outputs for an angular cover of around +/-45 degrees, over a range bracket of 375 yards starting at 1000 yards. 
