Steganography has become a widely employed information security technique. Although software methods working in the spatial and transform domains offer many data hiding options, a steganographic system on a hardware platform shows considerable potential through advantages such as high speed embedding and specific hardware dependency. In this paper we propose an image steganographic architecture that embeds data in square cover image blocks stored in the internal or external memory of a processor and of reconfigurable hardware. The traversal path for information hiding in the cover image follows an equisum arrangement of the quad block locations, which is used as a Look up Table in our approach. The K-bit message is hidden row by row and column by column in the order suggested by the quad block. The proposed approach is implemented on a Cyclone II EP2C20F484C7 FPGA as well as on an ARM7 RISC processor. The timing analysis, hardware consumption and error metrics after implementation on both platforms have been estimated.
Introduction:
Information security manifests itself in many ways according to the situation and requirement. Regardless of who is involved, to one degree or another, all parties to a transaction must have confidence that certain objectives associated with information security have been met. Over the centuries, an elaborate set of protocols and mechanisms has been created to deal with information security issues when
Sundararaman Rajagopalan et al. / Procedia Engineering 30 (2012) 806 -813
the information is conveyed by physical documents or by covers such as image, video or audio files. Often the objectives of information security cannot be achieved through mathematical algorithms and protocols alone, but require procedural techniques and abidance of laws to achieve the desired result [1]. Steganography is the art of hiding information in a fictitious cover. Since one of the main objectives of steganography is to carry the maximum secret payload, images have been the most widely used carriers of the secret message. The order in which pixels of the cover image are chosen for information hiding is formulated in many ways. In one method, the whole image is considered and the pixel locations are chosen pseudo randomly, i.e., a PN generator decides the order of pixels to be used for data embedding. A few other methods look for specific pixel thresholds in the cover image or for edges in the image, or use other image processing based steganography approaches. R. Amirtharajan et al. [2] discuss space filling curve based random colour image steganography in the spatial domain, in which the traversal for information hiding is worked out using two methods, namely Hilbert and Moore curves. Random image steganography can be performed in the spatial [3, 4] or transform domain [5], where the former gives high capacity and the latter offers better robustness. Based on the method used, image steganography is broadly classified into substitution [2] [3] [4], transform [5], spread spectrum [6, 7], distortion, statistical and cover generation methods.
Hardware steganographic modules implemented on a processor platform or on a reconfigurable hardware platform such as an FPGA add some important benefits to those of software based stego systems. In particular, the embedding rate is very high compared with systems in the software domain. A few hardware based stego implementations have been reported in the past. Hala A. Farouk and Magdy Saeb [8] discuss the design and implementation of a secret key steganographic micro-architecture employing an FPGA, in which a secret key steganographic technique is implemented on a Xilinx XC2S100tq144-6 FPGA. Amirtharajan et al. [9] proposed a hardware stego method using 2D image processing with an LSB substitution approach. R. Sundararaman et al. [14] proposed an LFSR based hardware stego on chip architecture on a Cyclone II FPGA, with an analysis of the Logic Element consumption and timing issues. The RISC processor chosen for our implementation is the LPC 2136, which contains the ARM7TDMI-S as its core and has 32KB of on-chip SRAM. This is sufficient for storing the maximum image size of 128×128 taken for our implementation [11] [12]. Eliminating the need for external RAM and using internal SRAM for the storage of cover and embedding data greatly reduces the time taken for embedding [13]. In the proposed method, a block based steganography algorithm has been implemented on both RISC processor and reconfigurable hardware platforms to analyze the performance of the stego system on two different chips. Detailed timing analysis, architectural difference analysis and hardware consumption for data embedding are some of the objectives of this work.
The rest of the paper is organised as follows. Section 2 discusses the quad block equisum algorithm, which is used as the information hiding technique in our approach. Section 3 explains the hardware architectures on the ARM and FPGA platforms that use the quad block approach. The performance analysis of both platforms is elaborated in Section 4. Error metrics are discussed in Section 5, and Section 6 concludes the paper and outlines possible future work with this approach.
Quad block equisum stego algorithm:
This section discusses the quad block equisum algorithm. The cover image is considered to be a grayscale image whose dimensions are divisible by 16. The cover image is split into groups of 4×4 blocks of pixels, so a quad block contains 16 pixels. The quad blocks are arranged in such a way that the individual blocks contain non-overlapping rows or columns of pixels. The positions of pixels are assigned consecutively in left to right order in each row. Figure 1(a) shows the structure of the quad block after this initial numbering. The initial block numbering is then shuffled in order to achieve an equal sum of location numbers in all directions, namely row wise, column wise and diagonal wise. Figure 1(b) shows the final modified structure of the quad block. After the shuffling, the positions in each row, each column and each diagonal add up to the same sum. For example, when the second row positions are added, the resulting sum is 5+11+10+8 = 34. Similarly, when the diagonal positions are added together the sum is 16+11+6+1 = 34. That is why this technique is called the equisum quad block approach. The shuffled quad block decides the order in which the message is embedded in the pixels of the 4×4 cover image block. In this way the quad block numbers act as a guiding model for traversing the 4×4 image block to hide the information. In order to hide the entire 64 bits in the 4×4 block, we use the equisum quad block to fix the order in which the information bits are hidden. The pixel in position (4, 4), i.e. the fourth row and fourth column, is chosen first for embedding according to the equisum quad block position. In the example cover block, the value of the pixel in that location is 58. With K = 4, the first four bits of the secret message have to be hidden by replacing the last four bits of the pixel having value 58. The binary representation of 58 is 00111010.
After inserting the first four bits of the secret message X, i.e. 0110, the pixel is modified to 00110110 in binary, which is equivalent to 54 in decimal. Figure 2(b), which is the stego image block, shows this transformation. The traversal continues in the order shown by the equisum quad block in Figure 2(c) until secret message bits have been hidden in all 16 locations. This process is repeated for all the 4×4 blocks of the grayscale image. For example, a grayscale image of size 64×64 contains 256 4×4 blocks, so the quad block algorithm has to be repeated 256 times in order to cover all the pixels.
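The equisum property and the K = 4 substitution described in this section can be checked with a short software model. The sketch below is illustrative only: the function names are hypothetical, and the shuffle rule (replacing each diagonal entry n of the initial numbering by 17 - n) is an assumption chosen because it reproduces the sums 5+11+10+8 and 16+11+6+1 quoted above. The actual arrangement in Figure 1(b) may differ.

```python
def equisum_block():
    """Build a 4x4 equisum block from the initial row-major numbering 1..16.

    Assumption: entries on the two diagonals are replaced by 17 - n. This
    matches the sums quoted in the text but is not taken from Figure 1(b).
    """
    block = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
    for r in range(4):
        for c in range(4):
            if r == c or r + c == 3:
                block[r][c] = 17 - block[r][c]
    return block


def traversal_order(block):
    """Return zero-based (row, col) pairs sorted by the position number in the block."""
    cells = [(block[r][c], r, c) for r in range(4) for c in range(4)]
    return [(r, c) for _, r, c in sorted(cells)]


def embed_k_bits(pixel, bits, k):
    """Replace the k least significant bits of an 8-bit pixel with `bits`."""
    mask = (1 << k) - 1
    return (pixel & ~mask & 0xFF) | (bits & mask)


block = equisum_block()

# Collect all row, column and diagonal sums of the shuffled block.
sums = [sum(row) for row in block]
sums += [sum(block[r][c] for r in range(4)) for c in range(4)]
sums += [sum(block[i][i] for i in range(4)), sum(block[i][3 - i] for i in range(4))]

print(block)
print(set(sums))
print(traversal_order(block)[0])       # zero-based position of location number 1
print(embed_k_bits(58, 0b0110, 4))     # the worked example from the text
```

Under this assumed arrangement, location number 1 falls in the fourth row and fourth column, which agrees with the pixel at (4, 4) being visited first.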
Hardware stego architectures for data embedding:

FPGA based architecture for Quad block Equisum Algorithm Implementation:
Figure 3 shows the proposed architecture implemented on the Cyclone II FPGA. The FPGA used for the hardware implementation is the Cyclone II EP2C20F484C7. The cover image is stored in an external SRAM chip which is interfaced with the FPGA. For this hardware implementation we have considered 16×16, 32×32, 64×64 and 128×128 cover images, and the algorithm is applied on each image up to its full embedding capacity. The high speed asynchronous CMOS static RAM IS61LV25616 is employed in this architecture to store a number of cover images. This SRAM contains 256K locations, each capable of storing 16 bits, i.e. two 8-bit pixels. The 18 address lines from the FPGA are used to access the memory locations of the SRAM. Initially the cover image is split into groups of 4×4 blocks. A 16×16 image is split into 16 such blocks; similarly, a 128×128 cover image is grouped into 1024 4×4 blocks. The algorithm is tested on various cover images. Eight locations are needed to store a 4×4 block in SRAM, since a location contains two grayscale pixels. Location 1 contains 40 as its upper byte (UB) and 82 as its lower byte (LB), which are the first two pixels of the 4×4 cover image block. To store a 128×128 image, 8192 locations of SRAM have to be utilized. The block RAM, which is the internal RAM of the FPGA, can store 239616 bits of secret data, approximately 30,000 characters. After the cover image is stored as 4×4 blocks in external SRAM, the equisum quad block hiding algorithm is applied on the cover image. The LUT present in the FPGA directs the embedding logic to generate the pixel traversal order. The LUT produces an 18 bit address corresponding to the count value on a 50 MHz clock pulse edge. Once the 18 bit address is generated, the next clock pulse reads the pixel data from the SRAM. The third clock pulse is needed to modify the corresponding pixel and to store it back into the location corresponding to the stego image.
The pixel to be modified in an SRAM location is indicated to the embedding logic by an internal signal called "u". Signal "u" is Logic "1" if the upper byte has to be modified and Logic "0" if the lower byte has to be modified by inserting the secret bit or bits. In total, three clock pulses are needed to choose a pixel and hide information in it, so 48 clock pulses are required to hide information in an entire 4×4 cover image block. The procedure is repeated until all the pixels are covered. The embedding starts as soon as the EMBED flag rises to Logic "1". The secret message characters are stored in internal RAM, and the bits of the secret data are selected from internal RAM by counter logic.
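The storage figures and the role of signal "u" can be modelled in software as follows. This is a behavioural sketch and not the HDL used on the FPGA. The function names are hypothetical, and the packing rule (even-indexed pixels in the upper byte, odd-indexed pixels in the lower byte) is an assumption inferred from the UB = 40, LB = 82 example.

```python
PIXELS_PER_WORD = 2      # one 16-bit SRAM location holds two 8-bit pixels
CYCLES_PER_PIXEL = 3     # address generation, read, then modify and write back


def sram_location(pixel_index):
    """Map a pixel index to (word address, u).

    Assumption: even-indexed pixels sit in the upper byte (u = 1) and
    odd-indexed pixels in the lower byte (u = 0).
    """
    return pixel_index // PIXELS_PER_WORD, 1 if pixel_index % 2 == 0 else 0


def modify_word(word, u, bits, k):
    """Embed k secret bits into the byte of a 16-bit word selected by u."""
    mask = (1 << k) - 1
    shift = 8 if u else 0
    cleared = word & ~(mask << shift) & 0xFFFF
    return cleared | ((bits & mask) << shift)


def words_needed(width, height):
    """Number of 16-bit SRAM locations required for a grayscale image."""
    return width * height // PIXELS_PER_WORD


word = (40 << 8) | 82                    # location 1 from the text: UB = 40, LB = 82
print(sram_location(0), sram_location(1))
print(words_needed(4, 4), words_needed(128, 128))
print(16 * CYCLES_PER_PIXEL)             # clock pulses per 4x4 block
print(239616 // 8)                       # block RAM capacity in 8-bit characters
print(hex(modify_word(word, 0, 1, 1)))   # set the LSB of the lower byte
```

The printed counts follow directly from the figures in the text: eight locations per 4×4 block, 8192 locations for a 128×128 image, 48 clock pulses per block and 29952 characters of block RAM.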
RISC Processor Implementation of Quad block Equisum Algorithm:
The embedded chip used is the NXP (Philips) LPC 2136, a 32-bit RISC processor based on the ARM7TDMI-S architecture. The processor contains 256KB of on-chip non-volatile FLASH program memory for code storage. On-chip SRAM of 32KB caters to the need for temporary data storage. Fig. 4(a) and 4(b) show the schematic of the on-chip memory and the memory map details. Memory can be accessed as a byte, a 16-bit halfword or a 32-bit word. Although the processor operating frequency is limited to 30 MHz when using the on-chip integrated oscillator, high speed operation up to 50 MHz is possible with an external oscillator source. The proposed algorithm has been implemented with a 50 MHz clock frequency. The on-chip FLASH code area available in the LPC2136 is much larger than the space required for our stego algorithm implementation.
Pseudo code for Algorithm in ARM Processor:
Choose the cover image and divide it into groups of 4x4 blocks.
Initialize LUT according to Equisum quad block traversal path.
do {
    {
        If (LSB of secret byte == 1'b1)
            Do bitwise ORing with cover pixel to insert Logic '1';
        Else
            Do bitwise ANDing with cover pixel to insert Logic '0';
        Rotate secret byte towards right for one bit;
    } Repeat the process 8 times for a secret byte;
} while (remaining cover pixels exist)

The extraction of the secret data can be done by grouping the stego image quad blocks together and applying the LUT based algorithm developed on the LabVIEW GUI.
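The pseudo code can be exercised with a small software model for K = 1. The sketch below is not the Embedded C code run on the LPC 2136. The traversal table `EQUISUM_ORDER`, the cover pixel values and the function names are all assumptions made for illustration, and the extraction routine simply reverses the embedding and is not the LabVIEW implementation.

```python
# Assumed traversal order for one 4x4 block stored row-major (indices 0..15).
EQUISUM_ORDER = [15, 1, 2, 12, 4, 10, 9, 7, 8, 6, 5, 11, 3, 13, 14, 0]


def embed_bytes(pixels, order, secret):
    """Hide each secret byte LSB first, one bit per pixel, visiting pixels in `order`."""
    stego = list(pixels)
    positions = iter(order)
    for byte in secret:
        for _ in range(8):                              # 8 bits per secret byte
            idx = next(positions)
            if byte & 1:
                stego[idx] |= 0x01                      # OR inserts Logic 1
            else:
                stego[idx] &= 0xFE                      # AND inserts Logic 0
            byte = ((byte >> 1) | (byte << 7)) & 0xFF   # rotate right by one bit
    return stego


def extract_bytes(stego, order, count):
    """Rebuild `count` secret bytes by reading pixel LSBs in the same order."""
    out = bytearray()
    positions = iter(order)
    for _ in range(count):
        value = 0
        for bit in range(8):
            value |= (stego[next(positions)] & 1) << bit
        out.append(value)
    return bytes(out)


# Made-up cover block; with K = 1 its 16 pixels hold exactly two secret bytes.
cover = [40, 82, 61, 90, 77, 35, 120, 201, 14, 66, 93, 150, 58, 99, 180, 58]
secret = b"Hi"
stego = embed_bytes(cover, EQUISUM_ORDER, secret)
recovered = extract_bytes(stego, EQUISUM_ORDER, len(secret))
print(stego)
print(recovered)
```

Because only the least significant bit of each pixel is touched, no stego pixel differs from its cover pixel by more than one grey level in this model.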
Performance Analysis:
This section analyses the hardware consumption and timing of both the FPGA and the ARM7 RISC processor implementations. The detailed hardware consumption on the FPGA is shown in Table 1.
4.1 Algorithm on FPGA:
4.2 Algorithm on ARM Processor:
The memory footprint for implementing the Equisum quad block algorithm on the ARM7 RISC processor is given in Table 1. It has been observed that the memory footprint is equal for all the different sizes of cover images.
On the FPGA, the time taken for embedding 2 bits (K = 2) and 4 bits (K = 4) per pixel is equal to the time taken for K = 1, due to the parallel processing nature of the FPGA. For a 16×16 cover image, 32 bytes of characters have been used to embed information in all 256 pixels, considering K = 1.
Stego chip master clock frequency = 50 MHz
Clock cycles required to hide one bit = 3 cycles of 50 MHz
Clock cycles required to hide one character = 3 × 8 = 24 cycles of 50 MHz
Time taken to hide the entire information in the cover image stored in SRAM = 32 × 24 × (1 / (50 MHz)) = 15.36 microseconds.
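The same arithmetic can be written as a small helper. The function name and its default parameters are hypothetical. Only the 16×16 figure of 15.36 microseconds comes from the text; values for the larger image sizes are extrapolations under the same assumption of three clock cycles per pixel, not measured results.

```python
def fpga_embedding_time_us(width, height, clock_hz=50e6, cycles_per_pixel=3):
    """Estimated time in microseconds to visit every pixel of the cover image once.

    Assumes three clock cycles per pixel regardless of K, as stated for the FPGA.
    """
    return width * height * cycles_per_pixel / clock_hz * 1e6


# With K = 1, a 16x16 image holds 256 bits = 32 characters of 24 cycles each.
reference_us = 32 * 24 / 50e6 * 1e6

for size in (16, 32, 64, 128):
    print(size, round(fpga_embedding_time_us(size, size), 2))
```

For the 16×16 case the per-pixel estimate and the per-character calculation agree, since 256 pixels × 3 cycles equals 32 characters × 24 cycles.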
4.3.2 ARM Timing Report:
The Embedded C based Equisum quad block algorithm has been tested on the ARM7 using the KEIL µVision4 Integrated Development Environment (IDE). It has been inferred that when the size of the cover image and the number of embedding bits per pixel are kept constant, changing the cover image or the embedded information does not have any effect on the embedding time. Using compiler optimization, the implementation was optimized to reduce the execution time. Although this time optimization increases the code size, the memory footprint of the algorithm for the various image sizes and K values occupies less than 1% of the available code area.
Error Metrics - MSE and PSNR:
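The Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR) are calculated as

$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( S_{xy} - C_{xy} \right)^{2}$$

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{C_{max}^{2}}{\mathrm{MSE}} \right) \ \mathrm{dB}$$

where x and y are the image coordinates, M and N are the dimensions of the image, $S_{xy}$ is the generated stego image and $C_{xy}$ is the cover image. $C_{max}$ holds the maximum pixel value, which is 255 for grayscale images.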
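As an illustration of these metrics, the sketch below computes MSE and PSNR for images given as lists of rows, assuming the standard definitions (mean of squared pixel differences, and 10 log10 of the squared maximum value over the MSE). The function names and the 4×4 test block are hypothetical. The block is built so that every pixel differs by 15, the largest change a 4-bit substitution can cause.

```python
import math


def mse(cover, stego):
    """Mean square error between two equally sized grayscale images."""
    total = sum(
        (s - c) ** 2
        for c_row, s_row in zip(cover, stego)
        for c, s in zip(c_row, s_row)
    )
    return total / (len(cover) * len(cover[0]))


def psnr(cover, stego, c_max=255):
    """Peak signal to noise ratio in dB; infinite when the images are identical."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(c_max ** 2 / error)


# Hypothetical worst case for K = 4: low nibble 0000 replaced by 1111 everywhere.
cover = [[16] * 4 for _ in range(4)]
stego = [[31] * 4 for _ in range(4)]
print(mse(cover, stego))
print(round(psnr(cover, stego), 2))
```

With every pixel off by 15, the MSE is 225 and the PSNR evaluates to about 24.61 dB, which matches the worst case value quoted for K = 4.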
Table 3. MSE and PSNR for Manchain Images
Sample Cover and stego images:
Fig. 5(a), 5(b) and 5(c) show the original and stego images of the Manchain 128×128 image. The worst case PSNR for K = 4 is 24.61 dB [14]. The results obtained show an average PSNR of 32.5 dB for K = 4, which is well above this worst case.
Conclusion and Future work:
The major objective of our work has been to analyse the feasibility of implementing such an algorithm on hardware platforms. While the ARM platform is suitable for information hiding in images or videos on a mobile device, the FPGA can be employed in high speed network applications in sectors such as banking and defense, where a high payload has to be protected from intruders. Future work can focus on testing the stego algorithm at higher frequencies and on reducing the number of logic elements used by this type of stego algorithm in FPGAs. Finding techniques for the parallel embedding of multiple pixels using ARM processors with SIMD instruction capability might enhance the performance considerably.
