Abstract-Ultrasonic imaging algorithms, including detection and compression, are computationally complex and difficult to implement in hardware for real-time applications. In this paper, we present an ultrasonic reconfigurable subband decomposition processor (RSDP) that can employ wavelet filters for frequency diverse signal processing. This architecture enables parallel implementation of a lifting-based discrete wavelet transform. The configurability of the architecture applies to the selection of wavelet kernels and scales for subband decomposition, thresholding operation for compression, and the postprocessing detection algorithm. The underlying hardware design makes use of the fact that both compression and detection applications share the same algorithm fundamentals. A unified architecture has been designed that implements signal decomposition and reconstruction with forward and inverse discrete wavelet transforms. After the forward transform step, a windowing operation is applied to discriminate frequency bands for target detection. Using the same architecture, a thresholding operation is applied to wavelet coefficients for data compression. The flexibility and the modular design make this reconfigurable architecture an effective and practical solution for real-time ultrasonic imaging applications. The resulting architecture is adaptable, fast, and suitable for a system-on-a-chip implementation that requires minimal logic resources.
Dynamically Reconfigurable Architecture Design for Ultrasonic Imaging
Erdal Oruklu, Senior Member, IEEE, and Jafar Saniie, Senior Member, IEEE
Index Terms-Compression, detectors, field-programmable gate array (FPGA), reconfigurable architectures, wavelet transforms.
I. INTRODUCTION

Ultrasonic target detection and classification in the presence of high scattering noise (clutter) is a significant and challenging problem. Another challenge for a real-time ultrasonic imaging application is the large amount of data that must be processed and compressed for image formation and/or image transmission for remote analysis by experts through wireless or wired communication channels or computer networks. In this paper, we present a reconfigurable architecture for ultrasonic signal compression and target detection. This design is based on the development of a run-time configurable architecture, which provides increased flexibility and adaptability. In addition, this architecture manages the high computational load of real-time applications while minimizing area and power consumption. Target detection algorithms are based on the premise that clutter echoes exhibit randomness and are more sensitive to frequency shifts than target echoes [1]. Therefore, frequency diverse signal decomposition methods such as the discrete wavelet transform (DWT) can be used for differentiating the target information from the clutter echoes.
In an earlier work [2], a comparative study of DWT kernels for target detection was performed. The results indicate a target-to-clutter ratio (TCR) enhancement of 6-13 dB when the measured TCR is 0 dB or less. Wavelet filters can also benefit ultrasonic signal compression due to their energy compaction properties. It has been shown [3], [4] that the DWT can achieve signal compression ratios of up to 90% with high fidelity when applied to ultrasonic broadband echoes. For hardware realization of ultrasonic detection and compression applications, we have designed a reconfigurable subband decomposition processor (RSDP) that can implement various wavelet kernels for subband decomposition of the ultrasonic data. An earlier version of this reconfigurable architecture was proposed in [5]. In this paper, an improved architecture is presented with a case study using a field-programmable gate array (FPGA) implementation. Furthermore, detailed discussions of the wavelet-based detection algorithm and the RSDP hardware architecture are given. The advantage of the RSDP architecture is the freedom to fine-tune the target detection and compression algorithms for different environments. The DWT component of this architecture is based on the lifting scheme [6]. The lifting scheme requires two to four times fewer arithmetic operations than the conventional filter convolution architecture for the DWT [7].
Section II describes the algorithms that are used for a frequency-diverse ultrasonic target detection, which involves subband decomposition and frequency band selection for TCR enhancement, as well as DWT-based compression methods. A novel, reconfigurable, and pipelined architecture that is capable of implementing detection and compression is presented in Section III. A case study is also given in Section III-E. Section IV concludes this paper.
II. BACKGROUND
A. Subband Decomposition and Target Detection
In the ultrasonic imaging of materials, an effective method of obtaining frequency diverse information is through split spectrum processing (SSP) of the broadband echoes [1] , [8] . SSP can be implemented via subband signal decomposition. In Rayleigh scattering, where the signal wavelength is significantly larger than the microstructure of materials that consists of randomly distributed reflectors and grains, the detected echoes exhibit randomness in amplitude and are sensitive to shifts in the transmitted frequency. In contrast, targets are often larger in size and less vulnerable to variation in the transmitted frequency. In general, target echoes exhibit different distributions as a function of frequency when compared with microstructure scattering. Therefore, at any given time, the outputs of bandpass filters can be represented as a random feature vector that contains information that is related to target and grain echoes.
The SSP procedure has several steps (see Fig. 1 ). The first step is data acquisition. The experimental setup for data acquisition utilizes a pulse generator to produce the electrical impulses to drive the ultrasonic transducer. The pulse receiver is used to receive the backscattered ultrasonic echoes. The received signal is then digitized and passed through several bandpass filters using forward and inverse fast Fourier transforms (FFTs) to split the spectrum into different subbands (i.e., observation channels). The output signals from the subbands are then passed into a postdetection processor for target detection [1] , [8] . This processor can employ different techniques such as signal averaging, Bayesian classifiers, and order statistic filters, including minimization. For the minimization algorithm, minimum amplitudes of the observation channels are obtained for each particular time, i.e.,
$$y(n) = \min_{i} \left| x_i(n) \right|$$
where $y(n)$ is the postprocessor output and $x_i(n)$ is the observation channel output corresponding to a certain frequency band. In [2], the DWT has been shown to offer similar subband decomposition performance with efficient hardware implementations and, therefore, is a viable alternative to the FFT in detection algorithms.
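The minimization rule described above can be illustrated with a minimal Python sketch. It assumes that the minimum is taken over absolute amplitudes (the "absolute minimization" used later in the performance results); the function name `minimization_postprocessor` and the toy channel data are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def minimization_postprocessor(channels: Sequence[Sequence[float]]) -> List[float]:
    """Return, for every time index n, the smallest absolute amplitude
    found across all observation channels."""
    if not channels:
        raise ValueError("at least one observation channel is required")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [min(abs(ch[n]) for ch in channels) for n in range(length)]


# Toy data (hypothetical): the echo at n = 2 is present in every channel,
# whereas the clutter at other indices fluctuates from channel to channel.
channels = [
    [0.9, -0.1, 1.0, 0.2],
    [-0.1, 0.8, -0.9, 0.1],
    [0.2, 0.1, 1.1, -0.7],
]
y = minimization_postprocessor(channels)
print(y)
```

In this toy case, only the sample that is consistently large in all channels survives the minimization, which is the behavior that the frequency-diverse detection premise relies on.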
B. DWT for SSP
Since the DWT does not have a prefixed kernel as in the FFT, a wavelet kernel can be chosen based on the detection or compression performance of the wavelet filters depending on the ultrasonic application. In particular, the compactness properties of the DWT allow a region of interest to be determined in a time-frequency representation, which is essential for target detection [2] . TCR enhancement is governed by the degree of the compactness of the target echo. The clear benefit of using the DWT is the capability of fine-tuning the wavelet kernel for compacting the target echo information while spreading the clutter energy over a 2-D plane.
The DWT decomposes the digitized ultrasonic signal into subbands and provides a time-frequency representation. The task of the target detection algorithm is to select a number of windows to discriminate the target echoes from the clutter echoes. Here, a window represents a group of wavelet scales that function as a bandpass filter similar to bandpass filtering in SSP. Inverse DWT is applied to each window operation, and the resulting time-domain signals are then fed into the postprocessing block. The postprocessor in the final stage is a decision block that reconstructs the time-domain signal from the incoming channels according to order statistics rules. In the reconstructed signal, the target echo is made more visible due to the vulnerability of clutter echoes to the change in wavelet scales.
For ultrasonic experimental results, we have used 2048 data points, which correspond to a maximum of 11 wavelet scales in the wavelet transform domain. An important question is determining the frequency bands (scales) to be used for postprocessing. Since the clutter echo spectrum is shifted toward higher frequencies [8] , the target echo is expected to be the dominant information in lower frequencies. Wavelet domain scales in Fig. 2 confirm that the lowest scales (high frequencies) are mostly clutter information, whereas the higher scales represent the low-pass approximation of the ultrasonic data. Inspection of Fig. 2 also shows that the intermediate scales 3-6 contain dominating target information. Therefore, intermediate scales are a desirable choice for postprocessing. Increasing the number of scales to be used in the algorithm increases the chance of integrating scales that do not carry target information into the reconstruction step.
The following steps are critical for target detection applications that incorporate DWT decomposition.
1) Choose an appropriate wavelet kernel to maximize the target echo compactness.
2) Identify the wavelet scales that carry target echo spectrum information.
3) Determine how many windows are to be utilized for signal reconstruction.
4) Find the number of scales that has to be integrated in each window.
It is important to evaluate the performance of the wavelet kernels for target detection since they differ in their compactness properties. Here, compactness means that the signal can be represented by fewer wavelet coefficients and scales. To understand the compactness of the kernel, we examined the similarity between the signal echo and the wavelet kernel. If the wavelet kernel is similar to the ultrasonic target echo, then the target echo becomes dominant in a particular scale. A desirable property of wavelet kernels is retaining target echo information in as many frequency scales (subbands) as possible. This is an important property for the minimization detection algorithm.
C. Performance Results
For performance analysis and testing, the experimental A-Scan data from a steel block (type 1018, a grain size of 50 μm) are acquired and analyzed. A Panametric (type 5052) pulser/receiver is used to drive the ultrasonic transducers, and it is used to receive the ultrasonic echoes. The received echo signals are then converted to digital data for SSP. The A-scan measurements were conducted using a broadband unfocused ultrasonic transducer of 0.5-in diameter with a 5-MHz center frequency. Data were acquired with a 100-MHz sampling rate, and each sample is 8 bits. The steel block has several holes (1.5 mm in diameter) at known separate locations. All the A-scan measurements probe the hole positions within the steel block. For performance analysis, the TCR is calculated by finding the maximum target echo (reflection from hole location) amplitude in the reconstructed signal, i.e.,
$$\mathrm{TCR} = 20 \log_{10}\left(\frac{T}{C}\right)$$
where $T$ is the maximum target echo amplitude, and $C$ is the maximum clutter echo amplitude; that is, the target peak is compared with the largest amplitude of the clutter echoes. Table I and Fig. 3 show the performance of Daubechies (D4, D10), Symmlet (8,10), Coiflet (1,5), Battle-Lemarie (1,5), and Vaidyanathan wavelets and the FFT on different sets of experimental ultrasonic data consisting of a target echo that is highly masked by clutter (i.e., microstructure scattering) for detection analysis. The number of filters in the filter bank for the FFT is eight. For the DWT, three windows are used for signal reconstruction. For postprocessing, absolute minimization is applied to the filter-bank outputs. It can be seen that the detection performance of wavelet transforms closely matches that of the FFT-based SSP method, and, for any given experimental data, up to 13-dB TCR enhancement is possible using wavelet decomposition.
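As a rough illustration of the TCR measurement, the following Python sketch assumes the usual amplitude-ratio definition in decibels, TCR = 20 log10(T/C), and assumes that the target location is known as a sample gate (as it is for the drilled holes in the test block). The function name `tcr_db` and the sample values are hypothetical.

```python
import math
from typing import Sequence


def tcr_db(signal: Sequence[float], target_start: int, target_end: int) -> float:
    """Target-to-clutter ratio in dB.

    T is the peak absolute amplitude inside the gate [target_start, target_end);
    C is the peak absolute amplitude everywhere else.
    """
    target = [abs(v) for v in signal[target_start:target_end]]
    clutter = [abs(v) for v in signal[:target_start]]
    clutter += [abs(v) for v in signal[target_end:]]
    if not target or not clutter:
        raise ValueError("both target and clutter regions must be non-empty")
    t_peak, c_peak = max(target), max(clutter)
    if t_peak == 0.0 or c_peak == 0.0:
        raise ValueError("TCR is undefined for a zero peak amplitude")
    return 20.0 * math.log10(t_peak / c_peak)


# Hypothetical reconstructed signal: target peak 1.0, largest clutter peak 0.5.
reconstructed = [0.1, -0.2, 1.0, 0.4, -0.5]
print(round(tcr_db(reconstructed, 2, 3), 2))  # amplitude ratio of 2, about 6.02 dB
```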
In general, the algorithm performs well for multiple targets as long as the target echoes are not within each other's proximity. If multiple target echoes are close enough to interfere with each other, then the performance of the algorithm may deteriorate. The minimization postprocessing algorithm detects the presence of the targets but may not differentiate individual targets. Nevertheless, other postprocessing techniques such as neural networks can be employed to resolve the existence of multiple targets [9] .
D. Ultrasonic Signal Compression
In ultrasonic imaging applications, it is desirable to use data compression techniques to reduce the data size while maintaining the signal integrity. Fig. 4 shows the ultrasonic data compression algorithm. The data compression of a given signal $x(n)$ is successful when the redundant and noise components of $x(n)$ are reduced or removed. The signal $\hat{x}(n)$ is the compressed representation of $x(n)$.
Thresholding can be applied to the transform coefficients of the original ultrasonic signal for data compression. In the hard thresholding method, all coefficients whose amplitudes are smaller than $\tau$ are set to zero, and all other coefficients are kept unchanged, i.e.,
$$\hat{X}(k) = \begin{cases} X(k), & |X(k)| \geq \tau \\ 0, & |X(k)| < \tau \end{cases}$$
For a signal that is corrupted by white Gaussian noise with variance $\sigma^2$, it has been shown [10] that the optimal threshold for $N$ sample points is given by
$$\tau = \sigma \sqrt{2 \ln N}$$
This threshold value can be used as the initial threshold value. Based on the compression ratios achieved, this value can dynamically be adjusted. Data compression performance of the DWT depends on the wavelet kernel and its compactness properties. The data compression performance of six different wavelet kernels [3] is shown in Fig. 5 .
This figure shows how much energy is concentrated in the five most dominant coefficients of the DWT as a function of the bandwidth of the ultrasonic signal. These results indicate that the Daubechies (Daub20) wavelet kernel has the best data compression performance, whereas the Haar wavelet kernel has the worst data compression performance. Furthermore, a robust compression architecture must be reconfigurable and must support multiple wavelet kernels to achieve optimal performance for different ultrasonic testing environments.
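The thresholding step can be sketched in Python as follows. The sketch assumes that the initial threshold is the universal threshold σ√(2 ln N) for white Gaussian noise, and it measures compression as the fraction of coefficients set to zero. The function names, the noise level, and the coefficient values are hypothetical.

```python
import math
from typing import List, Sequence


def universal_threshold(sigma: float, n_points: int) -> float:
    """Initial threshold for N samples in white Gaussian noise of std sigma
    (assumed form: sigma * sqrt(2 ln N))."""
    return sigma * math.sqrt(2.0 * math.log(n_points))


def hard_threshold(coeffs: Sequence[float], tau: float) -> List[float]:
    """Zero every coefficient whose amplitude is below tau; keep the rest."""
    return [c if abs(c) >= tau else 0.0 for c in coeffs]


def discarded_fraction(coeffs: Sequence[float]) -> float:
    """Fraction of coefficients that are zero after thresholding."""
    return sum(1 for c in coeffs if c == 0.0) / len(coeffs)


# Hypothetical transform coefficients and noise level.
coeffs = [4.0, -0.3, 0.2, -3.5, 0.1, 0.05, 2.9, -0.15]
tau = universal_threshold(sigma=0.2, n_points=len(coeffs))
kept = hard_threshold(coeffs, tau)
print(round(tau, 4), kept, discarded_fraction(kept))
```

If the resulting ratio is too low or too high for the application, the threshold can be raised or lowered and the same function applied again, which mirrors the dynamic adjustment described above.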
III. RSDP ARCHITECTURE
Multipurpose design of the ultrasonic processor demands a reconfigurable architecture that is capable of realizing both target detection and data compression algorithms. In addition, this architecture requires adaptable wavelet-based subband decomposition for optimal performance. Therefore, the RSDP architecture has been designed to accommodate these objectives.
The RSDP design is based on the development of a dynamically configurable architecture, which provides increased flexibility and adaptability to ultrasonic imaging applications. The configurability of the architecture applies to the application type (i.e., compression or detection), selection of the subband transform method, algorithm parameters, thresholding operations, and the postprocessing algorithm. Fig. 6 shows the system components of the ultrasonic processor architecture. The input memory holds the ultrasonic data. The ultrasonic data are fed into the forward DWT block using processing elements (PEs). The intermediate results are stored in a buffer. If the selected operation mode is data compression, then a hard threshold is applied to the amplitudes of the coefficients by a sequencer block. Since a major portion of the transform coefficients are below the threshold value, this results in a significant reduction of the data size. If the operation mode is target detection, the intermediate results are processed by the sequencer, which selects certain wavelet scales based on the windowing method. This windowing operation discriminates those subbands where target information is dominant, and they are selected for signal reconstruction (up to three windows for the DWT). Therefore, the same sequencer block is utilized for both compression and detection. The inverse transform block uses the same hardware resources as the forward transform block, and they can be reconfigured for multiple inverse DWT operations. The postprocessing block applies order statistics methods such as minimization. The outcome is stored in the output memory to be transmitted or displayed.
The embedded processor core in Fig. 6 acts as a control logic unit and is designed to perform the following tasks: 1) determine the type of the operation that can either be data compression or target detection; 2) select the filter coefficients (i.e., the wavelet kernel) for the DWT; 3) apply amplitude thresholding and wavelet scale selection parameters; 4) reallocate hardware resources for inverse transform channels for target detection.
The framework of the wavelet kernel implementation in the RSDP is based on the lifting scheme [11] , [12] . The lifting scheme offers several advantages compared with conventional filter convolution architectures. It requires fewer arithmetic operations, and in-place calculation and integer-based operations are possible [7] , [12] .
A. Lifting Scheme Implementation of the DWT
The lifting scheme is computationally less complex, and any wavelet filter can be decomposed into a finite sequence of simple filtering steps, which are called lifting steps [13] .
This decomposition is a factorization of the polyphase matrix of the wavelet filter into prediction and update steps, which are implemented as alternating upper and lower triangular matrices and a constant diagonal matrix, i.e.,
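$$P(z) = \left( \prod_{i=1}^{m} \begin{bmatrix} 1 & s_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ t_i(z) & 1 \end{bmatrix} \right) \begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix}$$
where $s_i(z)$ and $t_i(z)$ are Laurent polynomials, $K$ is the scaling factor, and $m$ is determined by the wavelet kernel factorization. For example, the Cohen-Daubechies-Feauveau 9/7 tap wavelet filter (CDF 9/7) can be factored into lifting steps as
$$P(z) = \begin{bmatrix} 1 & \alpha(1 + z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \beta(1 + z) & 1 \end{bmatrix} \begin{bmatrix} 1 & \gamma(1 + z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \delta(1 + z) & 1 \end{bmatrix} \begin{bmatrix} \zeta & 0 \\ 0 & 1/\zeta \end{bmatrix}$$
where $\alpha = -1.58613$, $\beta = -0.05298$, $\gamma = 0.88291$, $\delta = 0.44351$, and $\zeta = 1.14960$.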
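To make the four lifting steps concrete, the following Python sketch implements one level of a CDF 9/7 forward and inverse transform. It is a software illustration under stated assumptions, not the RSDP data path: periodic boundary extension is assumed for simplicity, the step ordering and the scaling convention (approximation multiplied by ζ, detail divided by ζ) follow a commonly used implementation, and the constants are the commonly tabulated five-decimal CDF 9/7 lifting values (β ≈ −0.05298). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Commonly tabulated CDF 9/7 lifting constants, rounded to five decimals.
ALPHA, BETA, GAMMA, DELTA, ZETA = -1.58613, -0.05298, 0.88291, 0.44351, 1.14960


def cdf97_forward(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One decomposition level: returns (approximation s, detail d).
    Assumes an even-length signal and periodic extension at the borders."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("signal length must be even and at least 2")
    s, d = list(x[0::2]), list(x[1::2])  # split into even and odd samples
    h = len(s)
    for c_odd, c_even in ((ALPHA, BETA), (GAMMA, DELTA)):
        # Lifting step on the odd samples, then on the even samples.
        d = [d[i] + c_odd * (s[i] + s[(i + 1) % h]) for i in range(h)]
        s = [s[i] + c_even * (d[i] + d[i - 1]) for i in range(h)]
    return [ZETA * v for v in s], [v / ZETA for v in d]


def cdf97_inverse(s: Sequence[float], d: Sequence[float]) -> List[float]:
    """Undo cdf97_forward by running the lifting steps backwards."""
    h = len(s)
    s = [v / ZETA for v in s]
    d = [v * ZETA for v in d]
    for c_odd, c_even in ((GAMMA, DELTA), (ALPHA, BETA)):
        s = [s[i] - c_even * (d[i] + d[i - 1]) for i in range(h)]
        d = [d[i] - c_odd * (s[i] + s[(i + 1) % h]) for i in range(h)]
    x = [0.0] * (2 * h)
    x[0::2], x[1::2] = s, d  # merge even and odd samples
    return x


signal = [float((3 * n) % 7) for n in range(16)]
approx, detail = cdf97_forward(signal)
restored = cdf97_inverse(approx, detail)
print(max(abs(a - b) for a, b in zip(signal, restored)))
```

Because every lifting step is undone by subtracting exactly what was added, reconstruction is exact up to floating-point rounding regardless of the boundary rule, which is one reason the scheme maps well onto a chain of identical data-path elements.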
B. PE Design for the Lifting Scheme
The RSDP architecture consists of PE arrays that operate as multifunction data-path elements. Each PE is designed to be capable of realizing a single lifting step computation. Therefore, in theory, even a single PE is sufficient to complete a single wavelet decomposition level by sequential operations. However, the throughput can effectively be increased by introducing arrays of PEs into the system, where arrays simultaneously execute multiple decomposition levels.
For a universal PE design, it is important to analyze different types of update and predict filters $s_i(z)$ and $t_i(z)$, respectively, in the lifting factorization. These filters may have different implementations and data-path requirements depending on the wavelet kernel [14]-[16], as shown in Fig. 7. To realize the circuits in Fig. 7, the PE architecture is designed to include two adder units, two registers, four multiplexers, and a multiplier unit as a single data-path element (see Fig. 8). Control signals for multiplexers enable the desired type of lifting circuits. The registers are used to provide the filter taps. If there are more than two taps in either $s_i(z)$ or $t_i(z)$, these filters require a combination of more than one PE for a single lifting step.
C. System Data-Path Design
The RSDP architecture supports lifting factorizations with a variable number of lifting steps through the dynamic reconfiguration of the PE interconnection network. Although a crossbar network, the Clos network, or tree structures [17] can be implemented for PE interconnection, these network types are costly in terms of bus size and logical units. Consequently, a simpler network communication is designed for the RSDP architecture. In this network, arrays of PEs are formed by feeding the output result either to the next neighboring PE or to an output buffer memory. Each PE generates two outputs-s (approximation or a low-pass result) and d (detail or a high-pass result). For the last PE in an array, the d output is stored in the wavelet coefficient memory, and the s output is forwarded to the beginning of the next PE array (see Fig. 9 ). To achieve this I/O operation, multiplexers are used for controlling the input and output destinations.
In summary, subband decomposition using the RSDP has the following steps.
1) The wavelet filter is decomposed into lifting steps, and lifting factorization is obtained.
4) For a faster implementation, more PE arrays can be constructed, which can execute different wavelet decomposition stages in parallel. The low-pass output signal from each array is fed into the next array input. Each array expects two samples from the previous array, and the sampling rate is half of the previous stage. Therefore, execution at each stage can simultaneously be finished with a minimum amount of latency.
5) A splitter block at the beginning of each PE array (see Fig. 9) is required to split the input signal into even and odd samples. For an inverse DWT, a merger block is used to combine the coefficients from the wavelet memory and the output of the previous stage.
6) For each PE array, a buffer memory is used to store the high-pass signal outputs of each stage (see Fig. 9).
7) Windowing operation for detection is accomplished by zero-padding the coefficients outside the window using the sequencer block in Fig. 10. The width and the location of windows are also configurable based on the processing profiles that are stored in the memory.
8) For compression applications, the sequencer can be utilized to perform thresholding on the amplitudes, as shown in Fig. 10. By retaining only the coefficients above the threshold and adjusting this threshold value, desired compression ratios can be achieved.
9) For an inverse transform, the ultrasonic target detection algorithm requires up to three inverse DWT channels. Therefore, the available PE resources should be partitioned for each channel for concurrent operation. In the parallel-mode operation, the output of each channel can immediately be fetched to the postprocessing block, and the decision results can be obtained without waiting for all subsequent inverse transform channels to finish.
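The flow of the steps above (split, cascade of decomposition stages, windowing by zeroing coefficients, and merge in the inverse transform) can be sketched in Python. To keep the example short, a Haar kernel in lifting form stands in for the configured wavelet kernel; this substitution, the decision to zero the final approximation band inside `window`, and all function names are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def haar_forward(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    s, d = list(x[0::2]), list(x[1::2])               # splitter: even / odd
    d = [di - si for si, di in zip(s, d)]             # predict step
    s = [si + di / 2.0 for si, di in zip(s, d)]       # update step
    return s, d


def haar_inverse(s: Sequence[float], d: Sequence[float]) -> List[float]:
    s = [si - di / 2.0 for si, di in zip(s, d)]       # undo update
    d = [di + si for si, di in zip(s, d)]             # undo predict
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d                           # merger: interleave
    return x


def decompose(x: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Cascade of stages: each stage consumes the low-pass output of the
    previous one; details[0] is scale 1 (finest)."""
    if len(x) % (1 << levels):
        raise ValueError("length must be divisible by 2**levels")
    s, details = list(x), []
    for _ in range(levels):
        s, d = haar_forward(s)
        details.append(d)                             # high-pass output buffer
    return s, details


def window(approx, details, first_scale: int, last_scale: int):
    """Sequencer-style windowing: keep scales first..last (1-based, inclusive)
    and zero everything else, including the approximation band."""
    kept = [d if first_scale <= k <= last_scale else [0.0] * len(d)
            for k, d in enumerate(details, start=1)]
    return [0.0] * len(approx), kept


def reconstruct(approx, details) -> List[float]:
    s = list(approx)
    for d in reversed(details):
        s = haar_inverse(s, d)
    return s


x = [float(v) for v in (2, 4, 6, 8, 1, 3, 5, 7, 0, 2, 4, 6, 9, 7, 5, 3)]
approx, details = decompose(x, 3)
channel_a = reconstruct(*window(approx, details, 1, 1))   # window on scale 1
channel_b = reconstruct(*window(approx, details, 2, 3))   # window on scales 2-3
print(channel_a[:4], channel_b[:4])
```

Each windowed channel is one inverse-transform output; in the detection mode these channels would then be passed to the order statistics postprocessor.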
D. Control Logic
Each PE unit includes FSM logic for dynamic reconfiguration by an external control unit or a CPU. The FSM logic is responsible for loading the constant lifting coefficient from the coefficient memory, selecting the correct lifting circuit type, and choosing the source input and the destination output for the desired PE array size. The FSM is connected to the embedded processor unit through a control bus, and the instructions that are issued by the processor are received by all PEs via a shared bus. Fig. 11 shows the state diagram for the PE FSM logic. In the idle state, the PE waits for the assertion of either an execute signal for starting the execution or a configure signal for reconfiguring the PE execution parameters.
In the idle state, output latches are disabled, and dynamic power consumption is minimum in the data-path elements. If the execute signal is asserted, the PE changes its state to execute and starts processing the data. If the configure signal is asserted, the PE enters the reconfigure mode. Initially, the PE checks the control bus and reads the control instruction word. The format of this instruction is shown in Fig. 12. The most significant m bits are used to address PEs ($2^m$ PEs are supported). If this address does not match the PE's inherent address, the PE returns to the idle state. If the address is correct, the PE latches the configuration data. The next control bits IN and OUT are input and output selection bits for the multiplexers: IN = 0 selects the previous PE as the input source. IN = 1 selects the memory buffer as the input source. OUT = 0 selects the next PE for output destination. OUT = 1 selects the memory buffer. The next bits are arithmetic-logic unit configuration bits, which control the multiplexers in Fig. 12 for selecting the lifting step circuits. The least significant n bits contain the fixed lifting coefficient. After this coefficient is stored to a register, the PE sends an acknowledgement signal to the control unit and returns to the idle state. The control unit sends configuration instructions to all the required PEs and updates the resource table of PEs with the incoming acknowledgement signals. After all the PEs are configured, it issues the execute signal. In this setup, the reconfiguration overhead is directly proportional to the number of PEs that is required by the algorithm. If the configuration instructions are stored in a context memory, fetching and issuing instructions can be done very quickly.
Therefore, run-time reconfiguration of PEs is feasible and efficient in the RSDP architecture.
Fig. 9. Implementations of concurrent wavelet decompositions using PE arrays.
Fig. 10. Sequencer: thresholding and windowing operations for compression and detection applications.
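The layout of the control instruction word (address, IN, OUT, ALU configuration, coefficient) can be sketched as a bit-packing routine in Python. The field widths are not given in the text, so the values used here are hypothetical: m = 5 address bits, 4 ALU configuration bits (one per multiplexer), and an n = 16 bit coefficient in a Q2.14 fixed-point format chosen to match the 16-bit data path. All names are likewise illustrative.

```python
from dataclasses import dataclass

ADDR_BITS = 5     # m: hypothetical, 2**5 = 32 addressable PEs
ALU_BITS = 4      # hypothetical: one select bit per data-path multiplexer
COEFF_BITS = 16   # n: hypothetical, matches a 16-bit data path
FRAC_BITS = 14    # hypothetical Q2.14 fixed point, range [-2, 2)
COEFF_MASK = (1 << COEFF_BITS) - 1


@dataclass(frozen=True)
class PEConfig:
    address: int   # which PE the word is meant for
    in_sel: int    # 0 = previous PE, 1 = memory buffer
    out_sel: int   # 0 = next PE, 1 = memory buffer
    alu: int       # lifting-circuit selection bits
    coeff: int     # raw two's-complement lifting coefficient


def to_fixed(value: float) -> int:
    raw = round(value * (1 << FRAC_BITS))
    if not -(1 << (COEFF_BITS - 1)) <= raw < (1 << (COEFF_BITS - 1)):
        raise ValueError("coefficient does not fit the fixed-point format")
    return raw & COEFF_MASK


def from_fixed(raw: int) -> float:
    if raw >= 1 << (COEFF_BITS - 1):
        raw -= 1 << COEFF_BITS            # sign-extend
    return raw / (1 << FRAC_BITS)


def encode(cfg: PEConfig) -> int:
    """Pack fields from most to least significant: address, IN, OUT, ALU, coeff."""
    word = cfg.address & ((1 << ADDR_BITS) - 1)
    word = (word << 1) | (cfg.in_sel & 1)
    word = (word << 1) | (cfg.out_sel & 1)
    word = (word << ALU_BITS) | (cfg.alu & ((1 << ALU_BITS) - 1))
    return (word << COEFF_BITS) | (cfg.coeff & COEFF_MASK)


def decode(word: int) -> PEConfig:
    coeff = word & COEFF_MASK
    word >>= COEFF_BITS
    alu = word & ((1 << ALU_BITS) - 1)
    word >>= ALU_BITS
    out_sel = word & 1
    word >>= 1
    in_sel = word & 1
    word >>= 1
    return PEConfig(word & ((1 << ADDR_BITS) - 1), in_sel, out_sel, alu, coeff)


def pe_accepts(word: int, my_address: int) -> bool:
    """A PE latches the configuration only if the address field matches."""
    return decode(word).address == my_address


cfg = PEConfig(address=3, in_sel=1, out_sel=0, alu=0b0101, coeff=to_fixed(-1.58613))
word = encode(cfg)
print(hex(word), pe_accepts(word, 3), pe_accepts(word, 4))
```

Since every PE sees the same word on the shared bus, the address comparison in `pe_accepts` models the point at which a nonmatching PE returns to the idle state.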
E. Case Study
The repetition rate in ultrasonic imaging systems dictates the processing time for real-time target detection or compression. For real-time systems, a typical value for a repetition rate is 1000 Hz, resulting in 1-ms time intervals for processing the acquired data. Fig. 13 shows the timing requirements for a typical application. Data acquisition takes 10 μs (considering 1-K samples acquired at a 100-MHz sampling rate). Consequently, the RSDP architecture has to process the data and store or transmit the results in 990 μs.
A hardware/software codesign scheme is utilized to obtain the proposed adaptable ultrasonic imaging architecture. The target platform is an FPGA chip with an embedded PowerPC CPU core. Fig. 14 shows the Xilinx Virtex-II Pro FPGA [21] device that is configured for RSDP operation. In this system, the embedded core processor is for preprocessing and synchronizing the streaming input data that are acquired through an ultrasonic sensor and an A/D block. The on-chip peripheral bus [22] allows communication between the processor, the on-chip memory, and the accelerator block. The accelerator block implements the required DWT data-path functions and the thresholding operations for compression via PEs. PEs are configured to establish systolic arrays for the desired DWT lifting scheme.
For this case study, the accelerator block incorporates 24 PEs to implement the ultrasonic imaging system (see Fig. 9). The CDF 9/7 biorthogonal wavelet filter [23] is used for subband decomposition. This filter is known to work best with compression applications (e.g., JPEG2000 [24], [25]), and many efficient hardware realizations of the 2-D CDF 9/7 transform have been proposed particularly for image compression algorithms [26]. Ultrasonic data (1024 points) that are sampled at 100 MHz are processed by an array of PEs, which carry out the data-path computations that are required for each lifting step. Each PE has two 16-bit adder units and a 16-bit multiplier. PEs are pipelined to achieve higher throughput. These elements are configured during run-time initialization for different wavelet kernels. Depending on the wavelet kernel and the corresponding lifting factorization, a number of PEs are cascaded to form an array of PEs. For example, 9/7 filter factorization requires four lifting steps [13], which are implemented by four PEs. For 1024 points, up to ten wavelet decomposition stages have to be completed for all the wavelet coefficients. If more than one PE array is available, the throughput can significantly be increased by concurrent operations. The lifting algorithm is sequential in nature: Second-stage calculations can immediately start after two low-pass results are generated from the first stage.
TABLE II. FPGA SYNTHESIS RESULTS
Once the pipeline is full, all the PE arrays start concurrent execution, and multiple wavelet decomposition levels are simultaneously computed. Since 24 PEs are available, six PE arrays can be used for 100% PE utilization (see Fig. 9). If the number of lifting steps is not equal to four, it is still possible to use more than one PE array by readjusting the size of the PE arrays. The number of windows that are used in the 2-D windowing stage determines the number of inverse transforms that are required for signal reconstruction. This number also dictates the throughput of the RSDP. In this case study, three windows are used; therefore, three inverse transforms are required. The same 24 PEs that are used in the forward DWT are reconfigured for inverse transform operations. In this case, two PE arrays are allocated for each inverse transform block. They operate in parallel, and all channel outputs are passed into a minimization postprocessing block.
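The PE allocation arithmetic in this case study can be checked with a short Python sketch. It assumes one PE per lifting step and an even split of arrays among the inverse-transform channels; the function name `partition_pes` and the five-step example are hypothetical.

```python
from typing import Tuple


def partition_pes(num_pes: int, lifting_steps: int, channels: int = 1) -> Tuple[int, int, float]:
    """Return (total arrays, arrays per channel, PE utilization).

    One array holds one PE per lifting step; arrays are divided evenly
    among the requested number of concurrent transform channels.
    """
    if min(num_pes, lifting_steps, channels) < 1:
        raise ValueError("all arguments must be positive")
    arrays = num_pes // lifting_steps
    per_channel = arrays // channels
    used = per_channel * channels * lifting_steps
    return arrays, per_channel, used / num_pes


print(partition_pes(24, 4))       # forward DWT with the four-step CDF 9/7 kernel
print(partition_pes(24, 4, 3))    # three inverse-transform channels
print(partition_pes(24, 5))       # hypothetical kernel with five lifting steps
```

With a kernel whose step count does not divide 24, some PEs remain idle, which is the situation in which readjusting the array size becomes relevant.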
F. Implementation Results
The PE accelerator block for the RSDP case study has been synthesized for a Xilinx Virtex-II Pro (XC2VP30-7) FPGA using the Xilinx ISE 8.2 software. Table II shows the implementation results. The logic required for the PE fabric consumes up to 40% of the available FPGA resources. This indicates that more complex postprocessing methods, such as neural networks, can also be realized within the RSDP FPGA architecture [9]. Furthermore, the RSDP data-path design achieves a clock rate of 141 MHz, which meets the real-time operation requirements of ultrasonic imaging, including data acquisition, data processing, and data storage. The proposed architecture can execute the SSP algorithm (one forward DWT, three inverse DWTs, thresholding, and postprocessing) in less than 100 μs. For comparison, a software-based implementation of the SSP algorithm (using FFTs for frequency decomposition) was run on an embedded Microblaze microprocessor running at 100 MHz in a Virtex-II Pro FPGA. This software implementation requires 37.72 ms to execute the SSP algorithm, which clearly exceeds the 1-ms processing interval imposed by the repetition rate (see Fig. 13).
G. Throughput Analysis of RSDP Wavelet Implementation
For forward DWT implementation, the required clock cycles are computed according to the following equations for direct or parallel implementation using the RSDP (see Fig. 9 ). This relies on the assumption that each PE operation can be executed in a single clock cycle. The computational parameters are:
• number of data points $N$;
• number of wavelet decomposition stages $NS$ (maximum number of stages $= \log_2 N$);
• number of PEs $NPE$.
Using six arrays of four PEs, 568 clock cycles are required for 1024-point and 10-level wavelet decomposition (see Fig. 9), i.e.,
$$NPass = \left\lceil \frac{10}{6} \right\rceil = 2 \tag{10}$$
$$Total\_ClockCycles = 2^9 + 2^3 + (6 \times 4 \times 2) = 568.$$
Further improvement can be obtained by increasing the PE arrays; however, the throughput gain will be less significant. The throughput of the forward DWT could be the basis for evaluating the total performance for the RSDP architecture for a complex system.
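The worked example above can be reproduced with a small Python sketch. The general formula used here is inferred from the 1024-point example rather than taken from the text: each pass streams two samples per cycle into its first stage, and a pipeline latency of (number of arrays × PEs per array) cycles is added per pass. The function name and this generalization are assumptions.

```python
import math
from typing import Tuple


def forward_dwt_cycles(n_points: int, stages: int, num_arrays: int,
                       pes_per_array: int) -> Tuple[int, int]:
    """Return (passes, total clock cycles) for a forward DWT, assuming one
    clock cycle per PE operation."""
    passes = math.ceil(stages / num_arrays)
    # The first stage of pass p sees n_points / 2**(p * num_arrays) samples
    # and consumes them two at a time.
    streaming = sum(n_points >> (p * num_arrays + 1) for p in range(passes))
    latency = num_arrays * pes_per_array * passes
    return passes, streaming + latency


passes, cycles = forward_dwt_cycles(1024, 10, num_arrays=6, pes_per_array=4)
print(passes, cycles)                          # case-study configuration
print(cycles / 141e6 * 1e6)                    # microseconds at a 141-MHz clock
print(forward_dwt_cycles(1024, 10, 10, 4))     # hypothetical: ten arrays
```

Under this model, going from six to ten arrays saves only 16 cycles, which is consistent with the remark that adding further PE arrays yields a less significant throughput gain.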
IV. CONCLUSION
In summary, the RSDP realization offers several benefits to ultrasonic imaging applications. The detection or compression algorithms can be time-multiplexed to perform forward transform, bandpass filtering, thresholding, inverse transform, and postprocessing. Data-path elements have dynamically been reallocated and reconfigured for each task. The type of the DWT kernel is an important factor that affects the performance of both ultrasonic data detection and compression algorithms. With the reconfigurable data-path logic, several wavelet kernels can be supported and implemented during the algorithm run time. The window size and scale selections can adaptively be adjusted for better target detection performance. These concepts are crucial for a high-performance and adaptable ultrasonic imaging system. Therefore, the RSDP FPGA prototype that has been presented in this paper offers significant adaptability and area improvements compared with conventional logic implementations, and it is feasible to use it in portable ultrasonic devices for real-time applications. An application-specific integrated circuit implementation with integration of ultrasonic sensors can result in a very compact system-in-package solution for industrial and medical ultrasonic applications.
