Abstract-In this research, novel architectures based on different design approaches and arithmetic techniques, such as direct mapping implementation, the dynamic partial reconfiguration (DPR) mechanism, distributed arithmetic (DA) and systolic arrays (SA), will be developed for a three-dimensional (3D) medical image compression system. Moreover, solutions for processing large medical volumes will be investigated, and power modelling of the architectures developed will be carried out on different field programmable gate array (FPGA) platforms. The ultimate aim of this research is to identify the most efficient reconfigurable architectures for 3D medical image compression. In this paper, the research framework and a case study addressing the performance of the 3D Haar wavelet transform (HWT) with the DPR mechanism are discussed and evaluated. The results obtained show the advantages offered by DPR and point to a promising solution for implementing computationally intensive applications such as 3D medical image compression.
I. INTRODUCTION
Various three-dimensional (3D) medical imaging modalities, such as magnetic resonance imaging (MRI), computerised tomography (CT) and positron emission tomography (PET), are widely used, especially for cancer diagnosis [1]. As the number of people diagnosed with cancer increases, the volume of medical image data generated in hospitals has grown considerably. Medical image compression is therefore imperative, since in numerous medical applications both efficient storage and transmission of data through high bandwidth digital communication lines are crucial [2]. Moreover, the complexity of data addressing and access, the massive amount of data to be processed and the several building blocks required by the computationally intensive compression operations severely restrict the hardware implementation of 3D medical image compression.
Hardware acceleration has become inevitable for providing the performance demanded by medical image compression applications. Given the challenges explained above, reconfigurable architectures for medical image compression have attracted much attention in research and development, for the following reasons. First, medical image compression is significant for dealing with the increasing number of medical modalities and for medical image data management; second, hardware accelerators have become immensely important for providing the necessary solution for medical image compression; finally, field programmable gate arrays (FPGAs) offer various advantages [3] that contribute significantly towards processing large medical volumes.
In this research, novel architectures based on different design approaches and arithmetic techniques, such as direct mapping implementation, the dynamic partial reconfiguration (DPR) mechanism, distributed arithmetic (DA) and systolic arrays (SA), will be developed for 3D medical image compression. Moreover, solutions for processing large medical volumes will be investigated, and power modelling of the architectures developed will be carried out on different FPGA platforms. The ultimate aim of this research is to identify the most efficient reconfigurable architectures for 3D medical image compression. In this paper, a case study addressing the performance of the 3D HWT with the DPR mechanism is discussed and evaluated.
II. CASE STUDY: 3D HWT WITH DPR
In this section, the proposed system architecture depicted in Fig. 1(a) to (d) is briefly explained, including the implementation of the 3D wavelet compression system, the computation process of the 3D HWT with transpose-based computation, the flow diagram of the pipelined direct mapping implementation of the one-dimensional (1D) HWT and the top-level architecture for the 3D HWT using DPR.
A. 3D Wavelet Compression System
In a 3D medical image compression system, a 3D wavelet transform is first applied to the 3D image data, resulting in a 3D coefficient representation of the image. The coefficients are then quantized, and finally entropy coding is applied to code the quantized data. In this research, the main concern is efficient reconfigurable architectures with various design approaches for the transform block, in order to accelerate the processing of large medical volumes.
B. 3D HWT and Transpose
In the 3D HWT computation, the input to the first 1D HWT is read row by row, and the 1D HWT is performed on each input vector as it is provided. The calculated values are sent to the transpose module T1, which calculates the memory addresses for the transposition and stores the data in memory. The transpose T1 acts as a memory forwarder and performs a matrix transpose, since row vectors are provided by the 1D HWT. After transposition of the resultant matrix, another 1D HWT is performed on the coefficients stored in memory to yield the two-dimensional (2D) HWT coefficients. This is the conventional row-column 2D HWT computation. The 2D HWT computation is performed on each sub-image S0 to S7 for N = 8, where S0 is the first sub-image and S7 is the eighth sub-image of the input volume. The output coefficients of the 2D HWT are sent to the second transpose, T2. As described above, all coefficients are stored in memory; likewise, the transposed outputs of T2 are stored in memory after transformation.
Instead of using logic and other embedded resources for the transpose implementation, optimised use of block random access memory (BRAM) has been considered in this work. This approach significantly improves the utilisation of available storage resources, optimises system performance and meets the design goals.
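The row-column computation with the two transposes described in this subsection can be mimicked in software. The following is a minimal sketch only, not the hardware design: it assumes a single decomposition level, an unnormalised average/difference form of the Haar filter, and a volume indexed as [sub-image, row, column]. The function names `haar_1d` and `haar_3d` are hypothetical and do not appear in the original work.

```python
import numpy as np

def haar_1d(x):
    """One level of a 1D Haar transform along the last axis.

    Assumption: unnormalised averages and differences; the paper does not
    state which scaling its hardware uses.
    """
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / 2.0   # low-pass (average) coefficients
    detail = (even - odd) / 2.0   # high-pass (difference) coefficients
    return np.concatenate([approx, detail], axis=-1)

def haar_3d(volume):
    """Single-level 3D Haar transform of a volume indexed [sub-image, row, column]."""
    # First 1D HWT: every row of every sub-image.
    a = haar_1d(volume)
    # T1: transpose each sub-image so that columns become rows.
    a = a.transpose(0, 2, 1)
    # Second 1D HWT: completes the 2D transform of each sub-image.
    b = haar_1d(a)
    # T2: bring the sub-image axis to the last position.
    b = b.transpose(2, 1, 0)
    # Third 1D HWT: across the sub-images.
    c = haar_1d(b)
    # Restore the [sub-image, row, column] ordering of the output.
    return c.transpose(2, 0, 1)

# Example: an 8 x 8 x 8 volume, matching the N = 8 case with sub-images S0 to S7.
volume = np.arange(8 * 8 * 8, dtype=float).reshape(8, 8, 8)
coefficients = haar_3d(volume)
```

In this sketch the transposes are array views, whereas in the hardware described above they correspond to address calculation and storage in memory.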
C. Reconfigurable and Static Area
There are two areas in the DPR framework: reconfigurable and static. The reconfigurable areas have been declared for 1D HWT and different transposition modules, while the static area consists of the data fetch unit and the memory controller.
The proposed system is implemented with the partial reconfiguration suites ISE 9.2PR and PlanAhead 10.1 from Xilinx [4]. It uses module-based DPR, in which configuration frames are reconfigured and bus macros are used to connect the DPR areas with the static area [5]. This methodology has the restriction that all design files and reconfigurable modules must be available to the build environment in order to build the partial modules.
The main advantage of DPR is that an implementation of a given design can be integrated into a smaller FPGA. This reduces cost, package size and power [6] , [7] .
In the 3D HWT case, both the transposition module and the 1D HWT module can be changed. During image calculation, the transposition module is changed three times for each sub-image. The first transposition, T1, performs the row-to-column transposition and remains active until a sub-image has been transposed. After the T1 sub-image transposition, the DPR area is reconfigured with the T2 transposition, which saves the sub-images; these operations are repeated for all sub-images. After all sub-images have been computed and transposed with T2, the transposition DPR area is reconfigured with the straight transposition, and the last 1D HWT is performed on all T2 sub-images. The HWT DPR area can be reconfigured to switch between different transform sizes. The dependency on the transform size N is propagated from the HWT module to all connected modules, which offers the advantage that no other logic changes are necessary. Fig. 1(d) illustrates the details of the working system for the implementation of the 3D HWT with DPR. The DPR modules are connected through simple bus interfaces. The data fetch unit and the HWT DPR area are connected by a bus of defined data bit width, a request line and a returning free signal. The fetch unit sends data to the HWT core as long as the free signal is active. The HWT and transposition modules are connected by the bus of defined data bit width and an enable signal. In each cycle in which the enable signal is active, data are transposed and written into memory.
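The order in which the transposition DPR area is reconfigured can be written out as a simple schedule. The sketch below is one possible reading of the description above, not the authors' control logic: it assumes that T1 is loaded before the first 1D HWT of a sub-image, T2 before the second, and the straight transposition once after all sub-images. The function name `dpr_schedule` and the step labels are hypothetical.

```python
def dpr_schedule(num_sub_images):
    """List the steps of one 3D HWT run as (kind, detail, sub_image) tuples.

    Assumption: one reconfiguration of the transposition area precedes each
    1D HWT pass whose output it has to transpose.
    """
    steps = []
    for s in range(num_sub_images):
        # Row pass: the 1D HWT output is transposed row-to-column by T1.
        steps.append(("reconfigure", "T1", s))
        steps.append(("hwt_1d", "rows", s))
        # Column pass: the 2D coefficients are passed to T2 and saved.
        steps.append(("reconfigure", "T2", s))
        steps.append(("hwt_1d", "columns", s))
    # After all sub-images: straight transposition and the last 1D HWT.
    steps.append(("reconfigure", "straight", None))
    steps.append(("hwt_1d", "third dimension", None))
    return steps

# Example: the schedule for two sub-images, printed one step per line.
for step in dpr_schedule(2):
    print(step)
```

Writing the schedule this way makes it easy to count reconfigurations for a given number of sub-images and to weigh the partial configuration time against the computation time of each pass.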
III. EXPERIMENTAL RESULTS

A. FPGA Implementation
The implementation of the 3D HWT with the DPR mechanism on the Xilinx University Program XUPV5-LX110T Development System provides significant results, as illustrated in Fig. 2: area and power consumption are reduced by 1.27% and 13.96% respectively. In terms of maximum frequency, the DPR mechanism yields a 17.216% higher maximum frequency than the implementation without DPR.
Concerning the generated bitstream files and the configuration times required, a full bitstream of 3,889,941 bytes is required for the 3D HWT configuration, and the shortest configuration time it needs, 4.8 ms, is also the worst. In contrast, the full partial bitstreams generated are significantly smaller, which reduces the storage space required for the various bitstreams. The results show that the file size of the full partial bitstreams is reduced by about 86.95% relative to a full bitstream, and the configuration time is also reduced by 86.88% (N = 64).
In summary, comparing the file sizes of the bitstreams shows that DPR yields more compact bitstreams and, as demonstrated, a smaller bitstream decreases the configuration time.
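The absolute figures implied by the reported percentages can be worked out directly. The short calculation below is only illustrative arithmetic: it assumes that both reductions are measured relative to the full bitstream values quoted above (3,889,941 bytes and 4.8 ms) for N = 64. The resulting numbers are derived here and are not reported measurements.

```python
# Values quoted in the text for the full bitstream.
FULL_BITSTREAM_BYTES = 3_889_941
FULL_CONFIG_TIME_MS = 4.8

# Reported reductions for N = 64, expressed as fractions.
SIZE_REDUCTION = 0.8695
TIME_REDUCTION = 0.8688

def reduced(value, reduction):
    """Return the value that remains after a fractional reduction."""
    return value * (1.0 - reduction)

# Assumption: the percentages apply to the full bitstream figures above.
partial_bytes = reduced(FULL_BITSTREAM_BYTES, SIZE_REDUCTION)
partial_time_ms = reduced(FULL_CONFIG_TIME_MS, TIME_REDUCTION)

print(f"implied partial bitstream size: {partial_bytes:.0f} bytes")
print(f"implied partial configuration time: {partial_time_ms:.2f} ms")
```

Under these assumptions the partial bitstreams come to roughly 0.5 MB and the configuration time to roughly 0.63 ms, which is consistent with the observation that configuration time scales with bitstream size.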
B. Discussions
In order to evaluate the relationship between the transform size and the area, power consumption and maximum speed, different transform sizes (N = 8, 16, 32, 64 and 128) have been used for the FPGA implementation. The various transform sizes reflect the various sizes of volume data in 3D medical imaging.
The influence of transform size on area, power consumption and maximum frequency is depicted in Fig. 3. For ease of visualisation, the graphs are plotted on a base-10 logarithmic scale. The results indicate that the proposed 3D HWT without DPR requires more area, while using DPR achieves an area saving of between 2.75% and 12.87%. In terms of power consumption, non-partial reconfiguration consumes up to 1377.96 mW for N = 64, and partial reconfiguration saves 4.20% to 18.81%.
The comparative study of the non-partial and partial reconfiguration processes leads to an important conclusion concerning the advantages offered by DPR, especially in processing large medical volumes. Analysis of the performance achieved for different parameters, such as area utilised, power consumed and maximum frequency achieved, clearly reveals that with DPR, complex designs can be implemented on limited hardware resources, leading to better performance.
IV. CONCLUSIONS AND FUTURE WORK
This paper explains the research framework of efficient reconfigurable architectures for 3D medical image compression and discusses the findings for the 3D HWT implementation with DPR. The results obtained show the advantages offered by DPR and point to a promising solution for implementing computationally intensive applications such as 3D medical image compression. Ongoing research is focusing on different design architecture strategies, including DA and SA.
