ABSTRACT: Digital image processing (DIP) is the use of computer algorithms to perform image processing on digital images. A simple digital camera converts light energy into electrical energy, converts that into digital form, and then applies a compression algorithm to reduce the memory required to store the image. Because this compression algorithm is invoked every time an image is captured and stored, this motivates the development of an efficient compression algorithm that gives the same results as existing algorithms with lower power consumption. Image compression is useful because it reduces the use of expensive resources such as memory or transmission bandwidth. On the downside, compression techniques introduce distortion, and additional computational resources are required for compression and decompression of the medical image data.
I. INTRODUCTION
Compression, the art and science of reducing the amount of data required to represent an image, is one of the most useful and commercially successful technologies in the field of digital image processing. Digital image and video compression is now essential, and biomedical image compression would not be feasible unless a high degree of compression is achieved. Compression is useful because it reduces the use of expensive resources such as memory (hard disks) or transmission bandwidth; in a competitive era in which everything keeps getting smaller, smaller is better. On the downside, compression techniques introduce distortion, and additional computational resources are required for compression and decompression of the data. The compression ratio $C$ is defined as the ratio of the size of the compressed data to that of the uncompressed data:
$$C = \frac{\text{size of compressed data}}{\text{size of uncompressed data}} \qquad (1)$$
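As a small worked example of the compression-ratio definition above, together with the redundancy $R = 1 - C$ defined in the next paragraph, the following Python sketch uses made-up sizes (a hypothetical 512x512 image with 8 bits per pixel, i.e. 262,144 bytes, compressed to 32,768 bytes). The function names are illustrative only.

```python
def compression_ratio(compressed_bytes: int, uncompressed_bytes: int) -> float:
    """C = size of compressed data / size of uncompressed data (smaller is better)."""
    return compressed_bytes / uncompressed_bytes


def redundancy(compressed_bytes: int, uncompressed_bytes: int) -> float:
    """R = 1 - C: fraction of the original size removed by compression."""
    return 1.0 - compression_ratio(compressed_bytes, uncompressed_bytes)


# Hypothetical sizes: a 512x512 image with 8 bits per pixel, compressed to 32,768 bytes.
original = 512 * 512
compressed = 32_768
print(compression_ratio(compressed, original))  # 0.125
print(redundancy(compressed, original))         # 0.875
```

Many texts define the compression ratio the other way round (uncompressed over compressed); with the definition used here, a value of $C$ below 1 means the data shrank.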
Redundancy $R$ is the reduction in size relative to the uncompressed size:
$$R = 1 - C \qquad (2)$$
In the image compression process, the image $f(x, y)$ is first mapped to a format that reduces spatial redundancy; here the discrete Hartley transform is used for this mapping. Next, quantization is performed, and this is where information is lost. Because quantization is irreversible, it can be omitted for a lossless coding technique. The final step is symbol coding, where various coding techniques can be used to represent the information in the minimum possible number of bits [1]. This process is shown in figure 1.
II. IMAGE PROCESSING
Image compression involves minimizing the size in bytes of a graphics file without degrading the quality of the image to an unacceptable level. The reduction in file size allows more images to be stored in a given amount of disk or memory space, and it reduces the time and bandwidth required to send images over the Internet or download them from Web pages.
There are several ways in which image files can be compressed. For Internet use, the two most common compressed image formats are JPEG and GIF. JPEG is more often used for photographs, while GIF is commonly used for line art and other images with relatively simple geometric shapes.
The steps involved in image compression are as follows:
1. First, the image is divided into blocks of 8x8 pixel values. These blocks are fed to the encoder, which produces the compressed image.
2. Next, the pixel intensity values are mapped to another domain. The mapper transforms the image into a (usually non-visual) format designed to reduce spatial and temporal redundancy.
3. The mapping can be done by applying various transforms to the image; here the discrete Hartley transform (DHT) is applied to the 8x8 blocks.
4. The transformed coefficients are quantized, which discards information that is irrelevant for the specified purpose.
5. Source coding encodes the information using fewer bits (or other information-bearing units) than an unencoded representation would use, through specific encoding schemes.
To retrieve the image, the steps of the forward process are reversed. First the data are decoded by the decoder. Next the inverse transform (IDHT) is computed to recover the 8x8 blocks, which are then joined to form the final image. The reconstructed pixel values show that some of the high-frequency components are preserved, which indicates that the edges of the image are preserved. The main advantage of the DHT over the DCT is that it can reduce the memory content by up to 50%, since the inverse transform is identical to the forward transform apart from a 1/N scale factor. The DHT also retains the higher-frequency components, which restores the detail of the image. Since the DHT is real-valued, unlike the DFT, its computational complexity is also lower than that of DFT algorithms [2].
Fig. 4. Flow chart of the 8-point DHT in pipelined approach with delays
For the 8-point DHT, the multiplication by $1/\sqrt{2}$ can be read from a ROM, while a block of pipelined adders performs the additions. The DHT is computed in 5 pipelined stages. The first two stages consist of two 4-point DHT modules that receive the odd- and even-indexed subsequences from the input buffer. In the third pipelined stage, the required coefficients (coefficients 1 and 3 of the odd-indexed half) are multiplied by $1/\sqrt{2}$. They are then added and subtracted in the fourth stage. During stages 3 and 4 the remaining coefficients pass through a delay, which consists simply of registers: the values are stored in registers and passed to the next stage. Finally, the fifth pipelined stage is a parallel adder block that adds/subtracts the coefficients to give the desired output [3]. The block diagram of the described method is given in figure 2.
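The staged 8-point computation described above can be checked numerically. The following Python sketch is an illustration under assumptions, not the authors' hardware design: it computes the DHT directly from its definition, builds the 8-point DHT from two 4-point DHTs in the five-stage order described above (the radix-2 decomposition places the $1/\sqrt{2}$ products on coefficients 1 and 3 of the odd-indexed half), and confirms the self-inverse property behind the memory-saving claim. The names `dht` and `dht8_staged` are hypothetical, and floating point stands in for the fixed-point ROM values; the 2-D transform of the 8x8 blocks is not shown.

```python
import math


def dht(x):
    """Direct N-point DHT: H(k) = sum_n x[n] * cas(2*pi*n*k/N), with cas = cos + sin."""
    n_pts = len(x)
    return [
        sum(x[n] * (math.cos(2 * math.pi * n * k / n_pts) + math.sin(2 * math.pi * n * k / n_pts))
            for n in range(n_pts))
        for k in range(n_pts)
    ]


def dht8_staged(x):
    """8-point DHT built from two 4-point DHTs, mirroring the five pipelined stages."""
    assert len(x) == 8
    # Stages 1-2: two 4-point DHTs on the even- and odd-indexed subsequences.
    even = dht(x[0::2])
    odd = dht(x[1::2])
    # Stage 3: only odd-half coefficients 1 and 3 are multiplied by 1/sqrt(2)
    # (in hardware this constant product can come from a ROM); the rest are delayed.
    r = 1 / math.sqrt(2)
    a1, a3 = r * odd[1], r * odd[3]
    # Stage 4: add/subtract the scaled pair; coefficients 0 and 2 pass through unchanged.
    t = [odd[0], a1 + a3, odd[2], a1 - a3]
    # Stage 5: parallel adder block forms H(k) and H(k + 4) for k = 0..3.
    return [even[k] + t[k] for k in range(4)] + [even[k] - t[k] for k in range(4)]


x = [1, 2, 3, 4, 5, 6, 7, 8]
print([round(v, 6) for v in dht8_staged(x)])
# Self-inverse property: applying the DHT twice returns N * x.
print([round(v / 8, 6) for v in dht(dht(x))])
```

Because `dht(dht(x))` equals `8 * x`, the same transform hardware (plus a scale by 1/8) serves as the inverse, which is the basis of the memory saving mentioned above.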
V. CORDIC BASED DHT
CORDIC stands for COordinate Rotation DIgital Computer. CORDIC uses simple shift and add operations for several computing tasks, and it is generally faster than other approaches when no hardware multiplier is available. In recent years CORDIC algorithms have been used extensively in various applications, especially in FPGA implementations. CORDIC provides an iterative solution for rotating a vector by an arbitrary angle using only shifts and adds, and it can be operated in either vectoring mode or rotation mode [4].
Generalization of the CORDIC algorithm: The generalized CORDIC is formulated as follows:
$$x_{i+1} = x_i - m\,\sigma_i\,2^{-i}y_i, \qquad y_{i+1} = y_i + \sigma_i\,2^{-i}x_i, \qquad z_{i+1} = z_i - \sigma_i\,\alpha_i$$
where $\sigma_i = \pm 1$ is the rotation direction and $m = 1, 0, -1$ selects the circular ($\alpha_i = \tan^{-1}2^{-i}$), linear ($\alpha_i = 2^{-i}$) or hyperbolic ($\alpha_i = \tanh^{-1}2^{-i}$) mode.
CORDIC is a family of hardware-efficient algorithms for computing trigonometric and other elementary functions. The CORDIC algorithms for trigonometric functions were designed by Jack E. Volder in 1959; J. Walther extended the scheme to other functions in 1971. Depending on the configuration, the resulting module implements a parallel-pipelined, word-serial or bit-serial architecture in one of two modes: rotation or vectoring. In rotation mode, CORDIC rotates a vector by a given angle; this mode is used to convert polar to Cartesian coordinates. For example, consider the multiplication of two complex numbers $x + jy$ and $r(\cos\theta + j\sin\theta)$. The result $x' + jy'$ can be obtained by rotating the 2-D vector $[x\ y]^T$ through the angle $\theta$ and then scaling it by the factor $r$. CORDIC achieves this in a three-step procedure: angle conversion, vector rotation and scaling. Radix 2 is used because the elementary rotation angles are chosen so that $\tan\alpha_i = 2^{-i}$, which turns the multiplications in the rotation equations into shifts; hence a CORDIC iteration can be realized with shifters and adders only. Figure 5 shows the structure of a processing element that implements one CORDIC iteration.
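To illustrate the shift-and-add rotation and the three-step complex multiplication described above, here is a minimal Python sketch of circular rotation mode only ($m = 1$). It assumes a hypothetical fixed-point format with 20 fractional bits and 20 iterations; the names `cordic_rotate` and `complex_multiply` are illustrative, not from the paper. Python's `>>` on negative integers is an arithmetic shift, like a hardware shifter on two's-complement data.

```python
import math

FRAC_BITS = 20                  # hypothetical fixed-point format: values scaled by 2**20
ONE = 1 << FRAC_BITS
N_ITER = 20                     # hypothetical number of iterations
# Elementary angles alpha_i = atan(2**-i) as fixed-point integers (a small ROM in hardware).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
# CORDIC gain K = prod_i sqrt(1 + 2**(-2i)), about 1.6468 for many iterations.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_rotate(x, y, z):
    """Circular rotation mode: rotate integer vector (x, y) by fixed-point angle z.

    Valid for |z| below about 1.74 rad. The returned vector is scaled by GAIN,
    and the residual angle z is driven towards zero.
    """
    for i in range(N_ITER):
        sigma = 1 if z >= 0 else -1   # direction; only selects add or subtract
        x, y, z = x - sigma * (y >> i), y + sigma * (x >> i), z - sigma * ATAN_TABLE[i]
    return x, y, z


def complex_multiply(x, y, r, theta):
    """(x + jy) * r(cos theta + j sin theta) via angle conversion, rotation, scaling."""
    # 1. Angle conversion: express theta in the fixed-point angle format.
    z0 = round(theta * ONE)
    # 2. Vector rotation using only shifts and additions/subtractions.
    xr, yr, _ = cordic_rotate(round(x * ONE), round(y * ONE), z0)
    # 3. Scaling: apply r and undo the CORDIC gain in one constant multiply.
    scale = r / GAIN
    return complex(xr / ONE * scale, yr / ONE * scale)


print(complex_multiply(1.0, 2.0, 2.0, 0.5))
```

For example, `complex_multiply(1.0, 2.0, 2.0, 0.5)` approximates $(1 + 2j)\cdot 2e^{j0.5}$; the final scaling by $r/K$ is the only multiplication in the sketch.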
In rotation mode, the aim is to rotate the input vector $(x_0, y_0)$ by a given angle $z_0$. After $n$ iterations, $z$ is driven to zero and the total accumulated rotation angle equals the desired angle.
The parallel-pipelined CORDIC architecture is an unrolled version of the sequential CORDIC algorithm. Instead of reusing the same hardware for all iteration stages, the parallel architecture provides a separate processor for every iteration. An example of the parallel CORDIC architecture for rotation mode is shown in figure 6. Each of the n processors in the block performs a specific iteration, and a particular processor always performs the same iteration. Every shifter performs a fixed shift, so it can be implemented as simple wiring in an FPGA [5]. Every processor uses its own arctan value, which can be hardwired to the input of its angle accumulator; no state machine is needed, which keeps this type of architecture simple. The parallel architecture is much faster than the sequential architecture described in the "iterative word-serial architecture" in figure 6: it accepts new input data and produces a result on every clock cycle, with a latency of n clock cycles. The DHT design uses this parallel-pipelined architecture because it provides high throughput and low power consumption. We can then apply the above conditions to the DHT equation. The DHT is given by
$$H(k) = \sum_{n=0}^{N-1} x[n]\left[\cos\left(\frac{2\pi nk}{N}\right) + \sin\left(\frac{2\pi nk}{N}\right)\right], \qquad k = 0, 1, \ldots, N-1 \qquad (30)$$
[Figure: … generation using circular and linear mode]
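The throughput and latency claims for the parallel-pipelined architecture can be seen in a small cycle-level model. This Python sketch assumes 12 stages and a Q16 fixed-point format (both hypothetical); each `processor(i, ...)` always performs iteration i with a fixed shift and its own constant arctan value, and `run_pipeline` advances all pipeline registers once per clock. It models timing behaviour only, not the authors' RTL.

```python
import math

N_STAGES = 12                   # hypothetical pipeline depth: one processor per iteration
ONE = 1 << 16                   # hypothetical fixed-point scale (Q16)
# Each processor owns a hardwired arctan constant; no shared ROM or state machine is needed.
ATAN = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_STAGES)]


def processor(i, x, y, z):
    """CORDIC processor i: always performs iteration i with a fixed shift of i bits."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ATAN[i]
    return x + (y >> i), y - (x >> i), z + ATAN[i]


def run_pipeline(samples):
    """Cycle-level model: one (x, y, z) sample enters per clock; returns (cycle, result) pairs."""
    regs = [None] * N_STAGES    # pipeline register after each processor (None = empty slot)
    results = []
    stream = list(samples) + [None] * N_STAGES   # trailing empty slots flush the pipeline
    for cycle, sample in enumerate(stream, start=1):
        # Move data forward, last stage first, so each register reads its predecessor's old value.
        for i in range(N_STAGES - 1, 0, -1):
            regs[i] = processor(i, *regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = processor(0, *sample) if sample is not None else None
        if regs[-1] is not None:
            results.append((cycle, regs[-1]))
    return results


outs = run_pipeline([(ONE, 0, round(a * ONE)) for a in (0.1, 0.5, -0.7)])
print([cycle for cycle, _ in outs])
```

Feeding three samples on consecutive clocks yields results on cycles 12, 13 and 14: one result per clock after an initial latency of n = 12 cycles. As in the sequential version, each result is still scaled by the CORDIC gain.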
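Equation (30) can be implemented by CORDIC as follows. To compute the term $\cos(2\pi nk/N) + \sin(2\pi nk/N)$, the initial conditions are set to $x_0 = 1/K$, $y_0 = 0$ and $z_0 = 2\pi nk/N$, where $K$ is the CORDIC gain, so that after the rotation $x_n = \cos z_0$ and $y_n = \sin z_0$, whose sum is the required term. Applying these conditions to the block diagram gives the term, as shown in figure 7.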
The multiplication of $x[n]$ by this term, and its summation according to the above equation, are shown in figure 8.
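Equation (30) with CORDIC-generated cas terms can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' Verilog: it uses a hypothetical fixed-point format with 20 fractional bits and 20 iterations, pre-scales the start vector by $1/K$ so the gain cancels ($x_0 = 1/K$, $y_0 = 0$, $z_0 = 2\pi nk/N$), adds a quadrant-reduction step (not described in the text) so that every angle falls inside CORDIC's convergence range of roughly ±1.74 rad, and performs the multiplication by $x[n]$ and the summation with ordinary arithmetic rather than the linear-mode CORDIC that the figure caption mentions. The names `cordic_cos_sin`, `cas` and `dht_cordic` are hypothetical.

```python
import math

FRAC_BITS = 20                  # hypothetical fixed-point format
ONE = 1 << FRAC_BITS
N_ITER = 20                     # hypothetical number of CORDIC iterations
ATAN = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
K = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))   # CORDIC gain


def cordic_cos_sin(theta):
    """cos and sin of theta via rotation mode with x0 = 1/K, y0 = 0, z0 = theta."""
    # Quadrant reduction (an added assumption): cos(t) = -cos(t - pi), sin(t) = -sin(t - pi).
    theta = math.remainder(theta, 2 * math.pi)   # now within [-pi, pi]
    sign = 1
    if theta > math.pi / 2:
        theta, sign = theta - math.pi, -1
    elif theta < -math.pi / 2:
        theta, sign = theta + math.pi, -1
    x, y, z = round(ONE / K), 0, round(theta * ONE)
    for i in range(N_ITER):
        d = 1 if z >= 0 else -1
        x, y, z = x - d * (y >> i), y + d * (x >> i), z - d * ATAN[i]
    return sign * x / ONE, sign * y / ONE


def cas(theta):
    """cas(theta) = cos(theta) + sin(theta), both taken from the CORDIC outputs."""
    c, s = cordic_cos_sin(theta)
    return c + s


def dht_cordic(x):
    """Equation (30): H(k) = sum_n x[n] * cas(2*pi*n*k/N), cas terms from CORDIC."""
    n_pts = len(x)
    return [sum(x[n] * cas(2 * math.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


print([round(v, 3) for v in dht_cordic([1, 2, 3, 4, 5, 6, 7, 8])])
```

The residual error comes from the finite number of iterations and the fixed-point rounding of the angle table, which is the same kind of ROM quantization effect discussed in the conclusion.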
Initially the DHT was decomposed into cosine and sine terms by using Euler's formula; the CORDIC processor is then used to compute these trigonometric components. For the hardware implementation, Verilog code was developed and compiled using ModelSim, then simulated and synthesized using the Xilinx ISE Design Suite version 12.0 and implemented on a Spartan-6 FPGA. Finally, the synthesis and delay reports were recorded. The results show that the total real time taken for execution is 1.00 s and the total CPU time taken for execution is 0.94 s. According to the macro statistics, the CORDIC design requires a single ROM (one 4x8-bit ROM), 20 adders/subtractors (including three 8-bit adders and one 8-bit subtractor), 32 registers (eight 2-bit registers and 24 8-bit registers) and 2 multiplexers. The simulation results are shown in figure 9.
Fig. 9. Simulation results

VII. CONCLUSION
In the present work, the Discrete Hartley Transform of an input matrix was implemented on an FPGA using VHDL. The DHT was also calculated for an 8-point input using two algorithms and their effectiveness was discussed; the work primarily focuses on image compression with less computation and low power. The simulation results and design summary of the DHT were obtained, and they show that the implemented architecture is an efficient method that uses limited space and time. Hardware utilization is close to optimal, and power analysis shows that the power requirement is also close to optimal. However, if the input values are large, they tend to overflow the registers, which causes errors; this can be rectified by storing the transformed coefficients in larger registers. Also, due to quantization of the ROM contents, the even-numbered outputs deviate more from the desired results than the odd-numbered outputs.
