Research in digital watermarking is mature. Several software implementations of watermarking algorithms are described in the literature, but few attempts have been made to describe hardware implementations. The ultimate objective of the research presented in this paper was to develop low-power, high-performance, real-time, reliable and secure watermarking systems, which can be achieved through hardware implementations. In this paper, we discuss the development of a very-large-scale integration architecture for a high-performance watermarking chip that can perform both invisible robust and invisible fragile image watermarking in the spatial domain. We prototyped the watermarking chip in two ways: (i) by using a Xilinx field-programmable gate array and (ii) by building a custom integrated circuit.
Various transforms that have been used extensively as alternatives to the spatial domain include the discrete cosine transform (DCT), the Fourier transform (FT), and the wavelet transform (WT). Frequency-domain methods have several advantages over spatial-domain methods [10], [11], [12]. For example, DCT-domain techniques are more robust to attacks, and the perceptual quality of DCT-domain watermarked images is better. On the other hand, spatial-domain watermarking algorithms have lower computational overhead than frequency-domain algorithms. They can also be faster in terms of computation time and are hence more suitable for real-time applications. We have therefore focused on spatial-domain watermarking, because our ultimate goal is to develop VLSI architectures and chips that make real-time watermarking possible within electronic components.
Digital watermarks can be divided into visible and invisible types, based on human perception [7], [13], [14], [15]. A visible watermark is a secondary translucent image overlaid onto the primary image [16], [17]. An invisible watermark, on the other hand, is completely imperceptible. An invisible robust watermark is embedded in such a way that the alterations made to the pixel values are not noticeable, and the watermark can be recovered only with the appropriate decoding mechanism [12]. An invisible fragile watermark is embedded in such a way that any manipulation or modification of the image alters the watermark [18]. In this paper, we deal with both invisible robust and invisible fragile watermarking algorithms and discuss the development of a watermarking chip with these two capabilities.
Each of these types of watermarking has its own applications; thus, all are equally important. Over the past decade, numerous watermarking algorithms have been invented, and their software implementations have been demonstrated [19], [20]. However, only a few hardware implementations are presented in the literature. A hardware-based watermarking system can be designed on a field-programmable gate array (FPGA) board, a Trimedia processor board [21], or a custom integrated circuit (IC) [22]. The choice between an FPGA and a cell-based IC is a trade-off among cost, power consumption, and performance [22], [23]. In this paper, we present both an FPGA-based prototype and a custom IC design to facilitate high-performance, real-time watermarking at the source end, when the image is captured by an electronic component such as a digital camera.
The rest of the paper is organized as follows. Section II highlights the contributions of this paper. Section III discusses hardware-based watermarking systems described in the current literature. Section IV discusses the watermarking algorithms in detail. The architecture proposed for implementing the algorithms is described in Section V. Section VI discusses the FPGA prototype of the proposed watermarking chip. Section VII discusses the design and implementation of the custom VLSI chip based on the proposed architecture. The subsequent sections, Section VIII and Section IX, discuss the experimental results and conclusions, respectively.
II. CONTRIBUTIONS OF THIS PAPER
The contributions of this paper are multifold. First, we introduce an invisible robust image watermarking algorithm and an invisible fragile image watermarking algorithm, both of which are spatial domain watermarking algorithms.
The algorithms were first validated in MATLAB simulations to evaluate their performance. They were developed so that their hardware implementation is simple yet provides high performance, enabling the hardware to perform watermarking in real time. Two hardware prototypes were developed: (1) an FPGA-based implementation and (2) a custom IC implementation. We followed a top-down approach in our FPGA-based prototyping. For a compact custom IC design with minimal silicon area, we followed a bottom-up approach.
June 19, 2007 DRAFT
We synthesized the prototype watermarking encoder chip on a Xilinx Virtex FPGA. For the custom IC design, we used various Cadence tools with 0.35 µm complementary metal-oxide-semiconductor (CMOS) technology. A typical watermarking chip has four distinct modules: a watermark encoder, a watermark decoder, a watermark generator, and a controller that coordinates their operation. Most of the invisible watermarking algorithms described in the current literature, as well as the algorithms presented in this paper, use pseudorandom numbers as watermarks. Therefore, we focused on the structural design aspects of a watermark generator that uses linear feedback shift registers (LFSRs). The encoder chip implemented in 0.35 µm CMOS technology consumes 2.0 mW when operated at a 3.3 V supply and a frequency of 0.5 GHz.
III. RELATED RESEARCH
The current literature is rich in watermarking algorithms, and their software implementations, for various types of media such as image, video, audio, and text data. The algorithms work in various domains, such as the spatial, DCT, and wavelet domains, and insert and extract different types of watermarks, including invisible robust, invisible fragile, and visible watermarks. These algorithms primarily work offline; i.e., the images are first acquired, and the watermarks are then inserted before the watermarked images are made available to the user. Thus, in this approach, there is a gap between image capture and image transmission. The objective of this research was to develop a hardware-based watermarking system that bridges this gap. The watermarking chip can be fitted in any electronic component that acquires images, so that the images are watermarked in real time as they are captured. In this section, we briefly discuss the few hardware-based watermarking systems reported in the current literature. These systems were designed and implemented on an FPGA board, a Trimedia processor board, or a custom IC using different CMOS technologies.
Strycker et al. [24] have proposed a real-time spatial-domain watermarking algorithm for television broadcast monitoring. They address the implementation of a real-time watermark embedder and detector on the Trimedia TM-1000 very long instruction word (VLIW) processor developed by Philips Semiconductors. In the insertion procedure, pseudorandom numbers are added to the incoming video stream based on the luminance value of each frame, and watermark detection is based on the calculation of correlation values. Mathai et al. [23] describe a VLSI chip designed in 0.18 µm CMOS technology that implements this video watermarking algorithm.
A DCT-domain invisible watermarking chip is presented by Tsai and Lu [25]. The system embeds a pseudorandom sequence of real numbers in a selected set of DCT coefficients, and the watermark is extracted without using the original image. The chip is implemented in TSMC 0.35 µm technology, has a die size of 3.064 × 3.064 mm², and contains 46,374 gates. It is estimated to consume 62.78 mW when operated at 50 MHz with a 3.3 V supply.
Garimella et al. [26] have proposed a VLSI architecture for invisible fragile watermarking in the spatial domain. In this scheme, the differential error is encrypted and interleaved along with the first sample. The watermark can be extracted by accumulating the consecutive least significant bits (LSBs) of the pixels and then decrypting them. The extracted watermark is then compared with the original watermark for image authentication.
The application-specific integrated circuit (ASIC) is implemented in 0.13 µm technology. The chip area is 3453 × 3453 µm², and the chip consumes 37.6 µW when operated at 1.2 V. The critical path delay of the circuit is 5.89 ns.
Mohanty et al. [22] have proposed another watermarking hardware architecture that can insert two visible watermarks in images in the spatial domain. This architecture can insert either of the two watermarks, depending on the user's requirements. The chip is implemented in 0.35 µm technology, occupies an area of 3.34 × 2.89 mm², and consumes 6.9286 mW when operated at 3.3 V and 292.27 MHz.
Fan et al. [27] have proposed a visible watermarking design based on an adaptive discrete wavelet transform (DWT). Starting from an existing algorithm, they propose techniques that reduce the number of operations and share hardware resources.
The host image and the watermark are transformed into three-level multiresolution structures. The host image signal is divided into two sequences with the same pattern length. Processing time is reduced by a two-path parallel processing architecture, in which demultiplexers send the signal to different processing elements. The watermark image is embedded by modifying the wavelet coefficients of the host image.
In this paper, we describe a VLSI architecture that implements both invisible robust and invisible fragile watermarking functionalities in the spatial domain. In invisible robust watermarking, a ternary watermark is embedded in the original image with an encoding function that involves the addition of a scaled gray value of neighboring pixels.
A binary watermark generated from pseudorandom numbers is XORed with the original image bit plane in the invisible fragile watermarking algorithm. The VLSI architecture is prototyped with a Xilinx FPGA and a custom IC design.
IV. INVISIBLE WATERMARKING ALGORITHMS
In this section, we present an invisible robust image watermarking algorithm and an invisible fragile image watermarking algorithm whose VLSI architecture and chips are described in subsequent sections. The algorithms selected are simple and effective and, with modifications, can result in high-performance hardware that can perform watermarking in real time. We discuss the insertion and detection methods in brief, with the modifications necessary to facilitate hardware implementation. The notations used in our description of the algorithms are listed and defined in Table I .
In both algorithms described in the following subsections, we used a binary pseudorandom number sequence as the watermark. Pseudorandom number sequences with large periods have excellent randomness and correlation properties [28], [29], [30]. We anticipate that a watermark created with such a pseudorandom number sequence will have several distinct advantages, including the following:
• An authorized user who knows the watermark key that includes the initial sequence and the number of cycles can exactly reconstruct the original watermark whenever needed.
• An attacker who tries to tamper with the watermark can only succeed in swapping the pseudorandom number sequence without affecting its correlation properties. Thus, meaningful watermark detection is still possible.
• The techniques used for generating such pseudorandom number sequences are all compatible with hardware implementation. Such implementations would be capable of online, real-time algorithm execution, the primary objective of this paper. 
A. Invisible Robust Watermarking Algorithm
Our invisible robust image watermarking is based on the widely accepted algorithm in [31], [32], which works in the spatial domain. This algorithm is claimed to be robust to various major attacks, including geometric attacks, as indicated by benchmark tests such as StirMark.
The watermark insertion process is demonstrated with the help of the block diagram shown in Fig. 1(a) . In the first step of the insertion process, a pseudorandom number generator generates a random sequence of ternary data.
The watermark W is constructed from this random sequence; thus, W is a ternary image (i.e., an image with three gray levels) with pixel values in {0, 1, 2}. The pseudorandom number generator uses a digital watermark key K as the initial sequence. Using encoding functions E_1 and E_2, the watermark is inserted by altering the pixels of the original image I as follows:
to obtain the watermarked image I_W. The encoding functions E_1 and E_2 are functions of the original image I and its neighborhood image I_N. For watermark strength factors α_1 and α_2, these encoding functions are defined as follows:
where α_1 and α_2 satisfy 0 < α_1 < 1 and −1 < α_2 < 0. These conditions guarantee that the encoded pixel always has a positive value. The factor (1 − α_1) scales I_N so that the watermarked gray value I_W typically does not exceed 255, the maximum gray value in an 8-bit representation (a pure white pixel). On rare occasions, when I(i, j) is close to 255, the watermarked pixel I_W(i, j) may exceed the maximum value; in that case, I_W(i, j) is truncated to 255. Our experimental results indicate that this truncation does not create a perceptible change in the image.
For a given pixel location in the original image, the neighborhood image gray value is calculated from the gray values of the neighboring pixels. The neighborhood radius r determines how many neighboring gray values are used in this calculation, and it also determines the upper bound of the watermarked pixels in an image. For the smallest neighborhood radius, r = 1, I_N can be computed as the average of the three other pixel gray values:
This averaging requires division by three, which is a costly operation in hardware, whereas we aim to build fast and simple hardware.
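To make the accuracy concern concrete, the sketch below contrasts an exact integer average of three 8-bit neighbors with the cheap alternative of dividing their sum by four through two 1-bit right shifts. The function names and sample values are hypothetical; this is not the paper's Eqn. 4, whose exact form is not reproduced here.

```python
def avg3_exact(a: int, b: int, c: int) -> int:
    """Floor average of three 8-bit neighbors; needs a divide-by-3 circuit."""
    return (a + b + c) // 3


def avg3_shift(a: int, b: int, c: int) -> int:
    """Divide the 3-neighbor sum by 4 using two 1-bit right shifts (cheap but biased low)."""
    return ((a + b + c) >> 1) >> 1
```

For three neighbors that are all 200, the exact average is 200, while the shift-based version yields 150: dividing a three-term sum by four systematically underestimates I_N by about a quarter.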
Division by four could easily be implemented with two 1-bit right shifts. However, dividing the sum of three neighborhood gray values by four would reduce the accuracy of I_N. Thus, to balance accuracy, computational cost, and hardware cost, we propose the following function for the calculation of I_N:
This averaging method simplifies the hardware implementation because division by two can be implemented with a 1-bit right shift. The block diagram for watermark detection is provided in Fig. 1(b). In the first phase of the detection process, the original ternary watermark W is generated with a pseudorandom number generator using the same digital watermark key K and the same number of cycles used during insertion. The next step is the calculation of a difference image I_D using the detection function shown below:
In this function, I_W is the watermarked image under test, and I_N is the neighborhood image, calculated from the original host image using the function presented in Eqn. 4. After the original watermark image W and the difference image I_D have been created, the next step is the creation of a binary watermark image W* as follows:
Using the original ternary watermark W and the constructed binary watermark W*, a detection ratio is determined as
The detection ratio D_R is, in essence, the ratio of the correctly detected pixels to the total number of watermarked pixels in the image. Ownership can be established when the detection ratio exceeds a predefined detection threshold.
Before proceeding with the development of the architecture and its FPGA and custom IC implementations, we simulated the above algorithm. We used MATLAB as the simulation environment and tested the algorithm on various test images. For simplicity, we chose α_1 = −α_2 and performed our experiments with values ranging from α_1 = 0.98 (weak watermark embedding) to α_1 = 0.1 (very strong watermark embedding). The corresponding peak signal-to-noise ratio (PSNR) values ranged from 56 dB to 21 dB, with no visual degradation.
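For reference, PSNR for 8-bit images is conventionally 10·log10(255² / MSE), where MSE is the mean squared pixel difference. A minimal pure-Python sketch follows, assuming images are given as equal-sized lists of rows of integer gray values; the function name is hypothetical and this is not the MATLAB code used for the experiments.

```python
import math
from typing import Sequence


def psnr(original: Sequence[Sequence[int]],
         marked: Sequence[Sequence[int]],
         peak: int = 255) -> float:
    """Peak signal-to-noise ratio (dB) between two equal-sized grayscale images."""
    sq_errors = [
        (a - b) ** 2
        for row_o, row_m in zip(original, marked)
        for a, b in zip(row_o, row_m)
    ]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher values mean less distortion, which is why stronger embedding (smaller α_1) drives the PSNR down.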
B. Invisible Fragile Watermarking Algorithm
The invisible fragile watermark insertion was performed as shown in Fig. 2(a). The first step was the generation of a pseudorandom binary sequence {0, 1} of period N using a pseudorandom number generator.
The period N is equal to the number of pixels (N_W × N_W) of the watermark image that the user intends to create. The watermark image was constructed by arranging the binary sequence into 8 × 8 blocks, and its size was assumed to be the same as that of the host image I. The bit planes of the grayscale input image were then derived: a grayscale image I is represented as eight binary images I[7], I[6], ..., I[0], with I[k] denoting a specific bit plane. The binary watermark is inserted in the bit plane that keeps the PSNR above a predefined threshold. Assuming that the watermark is inserted in the kth bit plane, the insertion process is given by the following expression:
$$I_W = \{\, I[7], \ldots, I[k+1],\; I[k] \oplus W,\; I[k-1], \ldots, I[0] \,\}$$
where the lower-order bit planes range from 0 to k − 1 and the higher-order bit planes range from k + 1 to 7, k being the bit plane in which the watermark is inserted.
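The bit-plane insertion can be sketched in a few lines: XORing each pixel with (W << k) flips bit k exactly where the watermark bit is 1 and leaves every other plane untouched, which is the merge described above. The sketch also estimates the PSNR cost of using plane k, under the assumption (ours, not the paper's) that watermark bits are independent and equiprobable, so each pixel changes by exactly 2^k with probability 1/2. Function names are hypothetical.

```python
import math
from typing import List

Image = List[List[int]]


def embed_fragile(image: Image, watermark: Image, k: int = 2) -> Image:
    """Set bit plane k to I[k] XOR W; all other bit planes are left unchanged."""
    return [
        [pixel ^ ((bit & 1) << k) for pixel, bit in zip(img_row, wm_row)]
        for img_row, wm_row in zip(image, watermark)
    ]


def bit_plane(image: Image, k: int) -> Image:
    """Extract the binary image I[k], i.e. bit k of every pixel."""
    return [[(pixel >> k) & 1 for pixel in row] for row in image]


def expected_psnr_db(k: int, peak: int = 255) -> float:
    """Expected PSNR if bit k of each pixel is flipped with probability 1/2.

    A flip changes the pixel by exactly 2**k, so MSE = 0.5 * (2**k) ** 2.
    """
    mse = 0.5 * (2 ** k) ** 2
    return 10.0 * math.log10(peak ** 2 / mse)
```

Under this assumption the estimate is roughly 51, 45, 39, and 33 dB for k = 0, 1, 2, and 3, which can be compared with the typical PSNR values quoted for these bit planes in this subsection.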
After all the bit planes are merged, the watermarked image I_W is obtained. This algorithm, which involves converting a grayscale image into binary images, is inherently suitable for hardware implementation, because the hardware represents numbers in (two's complement) binary form. Identifying the candidate bit plane for watermark insertion is an iterative process. However, to develop area-efficient and high-performance hardware, we decided to select a fixed bit plane and eliminate the iterative step. After running simulations with several hundred different images, we concluded that the 3rd bit plane (k = 2) is the best candidate for watermark insertion from the PSNR point of view. The typical PSNR values for k = 0, 1, 2, and 3 were found to be 50 dB, 46 dB, 39 dB, and 33 dB, respectively. Thus, we decided to use the 3rd bit plane (k = 2) for watermarking, to avoid any perceptible impact on image quality.

Fig. 2(b) highlights the steps of our fragile detection process. The first step consists of generating a pseudorandom binary sequence, followed by creating the original binary watermark W with the same key and approach used during insertion. We then created the watermarked image I_W following the same steps as in the insertion process. Next, the cross-correlation of the watermarked image I_W, the test watermarked image I_W*, and the binary watermark image W was calculated, followed by a test statistic computation. The value of the test statistic determines the severity and extent of forgery of the test watermarked image. Following the approach in [30], we calculated the spatial cross-correlation function of images I and J as:
Assuming that W_k is the watermark image block, I_W,k is the watermarked image block, and I_W*,k is the watermarked image block that might be forged, the test statistic δ_k for the block is calculated as follows [30]:
The average test statistic δ over all blocks is obtained as
$$\delta = \frac{1}{N} \sum_{k=1}^{N} \delta_k,$$
where N denotes the number of blocks.
To determine threshold values of δ for a testing paradigm, we performed extensive simulations in MATLAB.
The watermarked image was tampered with using image editing software and the built-in functions of MATLAB, and the average test statistic was calculated in every case. Typical test statistics for the "Lena" image are shown in Table II for various Joint Photographic Experts Group (JPEG) quality factors (QFs). The testing paradigm can be established as follows:
• If δ < 9.0, the watermarked image under test is perceptually identical to the original watermarked image. It is fully authentic.
• If 9.0 ≤ δ < 50.0, the watermarked image under test is forged, but it is still authentic.
• If 50.0 ≤ δ ≤ 70.0, the watermarked image under test is heavily forged, but it is still authentic.
• If δ > 70.0, the watermarked image under test does not belong to the owner or has been severely tampered with.
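The decision rule above is a simple threshold ladder on the block-averaged statistic. A minimal sketch, assuming the per-block values δ_k have already been computed as in [30]; the function names and outcome strings are hypothetical.

```python
from typing import Sequence


def average_test_statistic(block_stats: Sequence[float]) -> float:
    """delta = (1/N) * sum of the per-block statistics delta_k over N blocks."""
    return sum(block_stats) / len(block_stats)


def classify(delta: float) -> str:
    """Map the average test statistic to the authentication outcome."""
    if delta < 9.0:
        return "fully authentic"
    if delta < 50.0:
        return "forged but authentic"
    if delta <= 70.0:
        return "heavily forged but authentic"
    return "not the owner's image or severely tampered"
```

For example, two blocks with δ_k = 8.0 and 10.0 average to 9.0, which falls into the "forged but authentic" band because the lower bound of that band is inclusive.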
V. PROPOSED VLSI ARCHITECTURES FOR INVISIBLE WATERMARKING
In this section, we discuss the architectures proposed in this paper for implementing the two algorithms discussed in Section IV. We first present individual architectures for the invisible robust algorithm and for the invisible fragile algorithm. Then we describe a unified architecture that can perform both invisible robust and invisible fragile watermarking. While developing the unified architecture, we shared the different modules so that individual modules could be used for performing either of the watermarking operations. Each of the proposed architectures (datapath and controller) has three distinct units or modules: watermark generation unit, watermark insertion unit, and controller unit that executes the overall watermarking process. We used embedded on-chip memory in each of the datapaths to temporarily buffer the data for computation. However, the datapaths can be developed without the memory units if they directly access off-chip memory.
A. Architecture for Robust Watermarking
The datapath for invisible robust watermarking is shown in Fig. 3 . The image random access memory (RAM) is used to store the original image, which is to be watermarked. The image data can be written to the image RAM by activating proper control signals. The watermark RAM serves as a storage space for the watermark data. The watermark data can either be generated using the linear feedback shift register (LFSR) or given as an external
B. Architecture for Fragile Watermarking
The datapath for fragile watermark insertion is shown in Fig. 4. The original image is stored in the image RAM, and the watermark is created in the same way as for robust watermarking and stored in the watermark RAM. For watermark insertion, the 3rd bit line of each image pixel is fed to an XOR gate together with the corresponding watermark bit. The output of the XOR gate is written back to the image RAM, overwriting the 3rd bit line when the appropriate control signals are asserted.
C. Watermark Generation Unit
Most of the invisible watermarking algorithms described in the current literature, including the algorithm discussed in this paper, insert pseudorandom numbers into the host data. Therefore, we focused on the structural design aspects of watermark generators using LFSRs. The ternary watermark is generated by a pseudorandom sequence generator.
The watermark generation unit consists of an LFSR. The LFSR has a multitude of uses in digital system design and is crucial to watermark security and detection. It is a sequential shift register with combinational feedback logic around it that causes it to cycle pseudorandomly through a sequence of binary values. We have studied the design challenges of LFSRs and have taken appropriate measures to ensure a quality design [33], [34], [35].
The LFSR consists of flip-flops (FFs) as sequential elements with feedback loops. The feedback around an LFSR comes from a selected set of points called taps in the FF chain; these taps are fed back to the FFs after either XORing or XNORing.
The design aspects considered when modeling LFSRs are as follows [33] , [34] , [35] .
• Using XOR or XNOR feedback gates: The feedback path may consist of either all XOR gates or all XNOR gates. They are interchangeable: for given tap settings, the LFSR will sequence through the same number of values before the loop repeats itself; the only difference is the sequence itself.
• Choosing a one-to-many or many-to-one feedback structure: Both one-to-many and many-to-one feedback structures can be implemented with XOR or XNOR gates and use the same number of logic gates. A one-to-many structure always has a shorter worst-case clock-to-clock path delay, because the path passes through only a single two-input XOR (XNOR) gate instead of the tree of XOR (XNOR) gates in the many-to-one structure.
• Avoiding the prohibited (lock-up) state: With XOR gates, the LFSR will not sequence through the binary value in which all bits are logic zero. Should it find itself with all bits at logic zero, it will continue to shift all zeros indefinitely. Therefore, the LFSR must be prevented from randomly initializing to all zeros at power-up. Similarly, an XNOR-based LFSR will not sequence through the value in which all bits are logic one and must be prevented from randomly initializing to all ones at power-up. This random-initialization problem can be overcome by (1) using a reset to either preset or clear the individual register FFs to a known good value (in this case, the value is hardwired and cannot be changed); (2) providing a means of loading an initial seed value into the register, either in parallel or serially; or (3) modeling extra circuitry that allows all 2^n values to be included in the sequence.
• Ensuring a sequence of all 2^n values: If taps for a maximal-length sequence are used, the LFSR configurations described so far will sequence through (2^n − 1) binary values. The feedback path can be modified with extra circuitry to ensure that all 2^n binary values are included in the sequence.
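The lock-up states and the equal XOR/XNOR periods described in the list above can be checked with a short behavioral model of an 8-bit many-to-one (Fibonacci) LFSR. The tap positions (bits 7, 5, 4, 3, which realize the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1) are our choice for illustration; the paper does not state its taps here, and the function names are hypothetical.

```python
MASK8 = 0xFF


def fib_step(state: int, xnor: bool = False) -> int:
    """One clock of an 8-bit many-to-one LFSR (shift left, feedback into bit 0).

    Assumed taps at bits 7, 5, 4, 3 give a maximal-length (255-state) sequence.
    With an even number of taps, a chain of 2-input XNOR gates equals NOT(XOR of taps).
    """
    parity = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    feedback = parity ^ 1 if xnor else parity
    return ((state << 1) | feedback) & MASK8


def period(seed: int, xnor: bool = False, limit: int = 1024) -> int:
    """Clocks until the register returns to `seed`; 1 means it is stuck (lock-up)."""
    state, n = fib_step(seed, xnor), 1
    while state != seed and n < limit:
        state, n = fib_step(state, xnor), n + 1
    return n
```

In this model the XOR version cycles through 255 states from any nonzero seed but stays at 0x00 forever once there, while the XNOR version has the same period of 255 with its lock-up state at 0xFF, and the two produce different sequences from the same seed.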
An 8-bit LFSR is modeled with a one-to-many feedback structure and has been modified for a 2^n looping sequence. It calculates and holds the next value of the LFSR, which is then assigned to the output signal WM_DATA after each clock edge. The NOR of all LFSR bits except the most significant bit, i.e., LFSR_REG(6:0), forms the extra circuitry needed to include all 2^n sequence values.
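A behavioral sketch of such an 8-bit one-to-many (Galois) LFSR extended to all 2^8 states follows. XORing the NOR of LFSR_REG(6:0) into the feedback bit diverts the transition 0x80 to 0x00 and then 0x00 to the state that 0x80 would normally reach, splicing the all-zeros state into the cycle. The feedback polynomial x^8 + x^4 + x^3 + x^2 + 1 and the function names are assumptions for illustration, not the paper's VHDL.

```python
POLY_LOW = 0x1D  # x^8 + x^4 + x^3 + x^2 + 1 without the x^8 term (assumed taps)


def lfsr8_step(reg: int) -> int:
    """One clock of an 8-bit one-to-many LFSR modified to visit all 256 states."""
    msb = (reg >> 7) & 1
    nor_low = 1 if (reg & 0x7F) == 0 else 0  # NOR of LFSR_REG(6:0)
    feedback = msb ^ nor_low                 # true only outside the 0x80 -> 0x00 splice
    nxt = (reg << 1) & 0xFF
    if feedback:
        nxt ^= POLY_LOW  # one feedback bit fanned out to several XOR taps
    return nxt


def wm_data(seed: int, count: int) -> list:
    """Successive register values, one per clock edge, as they would appear on WM_DATA."""
    values, reg = [], seed & 0xFF
    for _ in range(count):
        reg = lfsr8_step(reg)
        values.append(reg)
    return values
```

Because each clock passes the feedback through at most one two-input XOR per tap, this structure keeps the short clock-to-clock path noted for one-to-many designs; the only added logic is the 7-input NOR and one XOR.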
D. Overall Architecture of the Proposed Watermarking Chip
The combined datapath for both robust and fragile watermarking is shown in Fig. 5(a). It is obtained by stitching together the two datapaths of Fig. 3 and Fig. 4 using multiplexers, which in turn give rise to additional control signals. The controller that drives the datapath is shown in Fig. 5(b). The control signals and their descriptions are given in Table III.
presented in Fig. 6(a). We first divided each architectural unit logically and structurally into several modules. We then individually tested and verified these modules through simulation and synthesis of their register-transfer-level (RTL) descriptions in the VHSIC hardware description language (VHDL). Once the individual modules were verified to be functionally correct, they were stitched together. Next, a controller was designed to drive the datapath and ensure that the unit performed its operations. are not included here for brevity. During the simulation process, it was assumed that the image data were read from, and stored to, the hard drive.
From the synthesis results, we derived the macro statistics and timing report of the units listed in Table IV. The minimum period indicates the longest clock-to-clock timing path in the design. The minimum period is reported for both the generation unit and the encoder, whereas the critical path delay is reported for the insertion unit, which is fully combinational. The cell usage lists the logic cells, the basic elements of the target technology, used by each unit.
VII. CUSTOM IC IMPLEMENTATION OF THE PROPOSED WATERMARKING CHIP
In this section, we describe a custom IC design of the proposed architecture that can perform both invisible robust and invisible fragile watermarking. As demonstrated in Fig. 6 (b), we followed a modular design approach to generate the layout of the complete chip in which the logic design is top-down and the physical design is bottom-up.
Our motivation was to use a custom layout to build the whole chip using minimal silicon. Following a hierarchical approach, we created the layout of various resources, such as adders, multipliers, etc. This approach was followed by hierarchical creation of the layout for the insertion and generation units, encoder, etc. Finally, once the complete chip layout was generated, parasitic extraction and power, area, and performance analysis were performed on the post-layout (including parasitics) design.
The watermarking datapath and the controller were implemented in the physical domain using the Cadence Virtuoso layout tool. The design involved the construction of three main modules: the memory, the watermarking module (datapath), and the controller unit. Each of the three modules was designed individually through modularization and later interfaced with each other. The layouts of the gates at the lowest level of the hierarchy were drawn by using the CMOS standard cell design approach. We designed a standard cell library containing basic gates (AND, OR, and NOT) and a 1-bit RAM cell.
The memory module contained two read/write memory structures: one for the original image to be watermarked (256 × 256 pixels) and another for the watermark (128 × 128 pixels). The word size was 8 bits for the image RAM and 2 bits for the watermark RAM. The basic building block of the memory module is a six-transistor (6T) static RAM cell available in the cell library. We chose static RAM over dynamic RAM because of its shorter read and write cycles. The memories were built as n × n arrays of static RAM cells and were addressed using row and column address decoders. Each decoder is implemented as an m-bit counter with additional AND logic to address
The main component of the watermarking module (datapath), the insertion unit, consists of two 8-bit adders, two 8-bit multipliers, and an 8-bit adder/subtractor, each built with standard structures. The carry input of the adder/subtractor and one input of the XOR gates were set high whenever the watermark pixel value was 2, so that a subtraction was carried out as required by the robust watermarking encoding function (Eqn. 2).
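The subtraction trick described above is the standard two's-complement adder/subtractor: when the control line is high, each bit of the second operand passes through an XOR gate (inverting it) and the carry-in is forced to 1, so the adder computes a + ~b + 1 = a − b. The bit-level model below assumes a ripple-carry structure, and the mapping of watermark value 2 to the control line follows the text; both function names are hypothetical.

```python
def add_sub8(a: int, b: int, subtract: bool) -> tuple:
    """8-bit ripple-carry adder/subtractor built from full adders.

    Returns (8-bit result, carry out). When subtracting, carry out = 1 means no borrow.
    """
    ctrl = 1 if subtract else 0
    carry = ctrl  # carry-in forced high for subtraction
    result = 0
    for i in range(8):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl                  # XOR gate inverts b when subtracting
        result |= (ai ^ bi ^ carry) << i            # full-adder sum
        carry = (ai & bi) | (carry & (ai ^ bi))     # full-adder carry
    return result, carry


def insertion_op(a: int, b: int, watermark_pixel: int) -> tuple:
    """Assert the subtract line only when the ternary watermark pixel equals 2."""
    return add_sub8(a, b, subtract=(watermark_pixel == 2))
```

For instance, 100 − 50 produces (50, 1), while 50 − 100 produces (206, 0): the result is the 8-bit two's-complement pattern of −50, and the cleared carry signals a borrow.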
Several multiplexers were used at appropriate places in the design to select one of the incoming lines; each was implemented with transmission gates. Three asynchronously resettable registers were designed to encode the five states of the controller depicted in Fig. 5(b) (three bits can encode up to 2^3 = 8 states). At any time, the user can reset the three registers to return the controller to its initial state, from which the watermarking function can be started afresh. Each of the above modules was implemented and tested separately and then connected to obtain the final chip. The gate count, power, and delay of each module are listed in Table V; the values in Table V were obtained from transistor-level simulations in HSPICE. Full-chip functionality was verified with NanoSim. These statistics show that the RAM consumes the most power. If the proposed chip were used as a module within a complete JPEG encoder, the memory module could be omitted from the watermarking datapath. The layouts of the datapath and the controller are shown in Fig. 7(a) and Fig. 7(b), respectively.
The complete layout and floorplan of the watermarking chip are given in Fig. 8(a), and the pin diagram showing the inputs and outputs is given in Fig. 8(b). The choice between invisible robust and invisible fragile watermarking is made with a robust/fragile line: when this line is high, the chip performs invisible robust watermarking, and when it is low, the chip performs invisible fragile watermarking.
During invisible fragile watermarking, the invisible robust watermarking part of the chip is disabled, and vice versa.
The invisible robust and invisible fragile insertion datapaths share the same output pins. The overall design statistics of the prototype chip are provided in Table VI.
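The mode selection and pin sharing described above amount to enabling one datapath with the robust/fragile line and routing its result onto a single set of 8-bit output pins. In this behavioral Python sketch, `robust_insert` and `fragile_insert` are hypothetical placeholders chosen only so that the two paths produce distinguishable outputs; they are not the paper's encoding functions.

```python
def robust_insert(pixel: int, mark: int) -> int:
    # Placeholder only: NOT the paper's robust encoding function (Eqn. 2).
    return min(pixel + mark, 255)


def fragile_insert(pixel: int, mark: int) -> int:
    # Placeholder only: NOT the paper's fragile encoding function.
    return (pixel & 0xFE) | (mark & 1)


def watermark_chip(pixel: int, mark: int, robust_fragile: int) -> int:
    """Behavioral model of mode selection with shared output pins.

    robust_fragile = 1 (high) selects invisible robust watermarking;
    robust_fragile = 0 (low) selects invisible fragile watermarking.
    The unselected datapath is disabled (modeled here as not evaluated).
    """
    if robust_fragile:
        out = robust_insert(pixel, mark)   # fragile datapath disabled
    else:
        out = fragile_insert(pixel, mark)  # robust datapath disabled
    return out & 0xFF  # both datapaths drive the same 8 output pins


print(watermark_chip(100, 3, 1))  # 103 (robust placeholder path)
print(watermark_chip(100, 3, 0))  # 101 (fragile placeholder path)
```

Because only one datapath is active at a time, a single select line and one shared output bus are sufficient, which keeps the pin count low.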
VIII. EXPERIMENTAL RESULTS
Each unit in the chip was tested individually with HSPICE, and the complete encoder was simulated with NanoSim. The model file was obtained from MOSIS. The simulation time depended on the module being tested; a typical value was 10,000 ns. Random input vectors were used for testing, and the functionality and delay of each module were verified from the simulation waveforms.
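Random-vector testing applies many randomly drawn inputs to a module and checks each output against a trusted reference. The Python sketch below illustrates the idea on a gate-level model of an 8-bit ripple-carry adder checked against ordinary integer addition. It is only an analogy to the HSPICE/NanoSim flow, not a substitute for transistor-level simulation, and the function names, seed, and vector count are illustrative choices.

```python
import random


def ripple_carry_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Gate-level model of a ripple-carry adder: returns (sum, carry_out)."""
    carry, result = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def golden_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Reference model using integer arithmetic."""
    total = a + b
    return total % (1 << width), total >> width


def random_vector_test(n_vectors: int = 10_000, seed: int = 1) -> int:
    """Apply random 8-bit input vectors; return the number of mismatches."""
    rng = random.Random(seed)  # seeded so that failures are reproducible
    mismatches = 0
    for _ in range(n_vectors):
        a, b = rng.randrange(256), rng.randrange(256)
        if ripple_carry_add(a, b) != golden_add(a, b):
            mismatches += 1
    return mismatches


print(random_vector_test())  # 0
```

Seeding the generator makes a failing vector reproducible, which matters when the mismatch must be traced back through waveforms.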
We carried out extensive simulations with various test images. The test images, of size 256 × 256, were taken from [9], [16], [17]. Selected examples are shown in Fig. 9. Visual inspection of these images shows the quality of the watermarking and confirms that the results match those of the software implementation. As a quantitative measure of the perceptibility of the watermark, we again calculated the PSNR and compared the PSNR of the images watermarked by the proposed chip with that of the images watermarked by the software algorithms. The PSNR values obtained with the hardware and the software implementations were approximately the same.
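For 8-bit grayscale images, PSNR = 10 log10(255^2 / MSE) dB, where MSE is the mean squared difference between corresponding pixels of the original and watermarked images. A minimal pure-Python sketch follows, assuming both images are equally sized lists of rows of 8-bit values; the function name `psnr` is illustrative.

```python
import math


def psnr(original, watermarked, max_value: int = 255) -> float:
    """PSNR in dB between two equally sized 8-bit grayscale images.

    Images are given as lists of rows of integer pixel values.
    Returns math.inf when the images are identical (MSE = 0).
    """
    if len(original) != len(watermarked):
        raise ValueError("images must have the same size")
    total_sq_err, count = 0, 0
    for row_o, row_w in zip(original, watermarked):
        if len(row_o) != len(row_w):
            raise ValueError("images must have the same size")
        for p, q in zip(row_o, row_w):
            total_sq_err += (p - q) ** 2
            count += 1
    mse = total_sq_err / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


# Every pixel changed by 1 -> MSE = 1 -> PSNR = 10 log10(65025)
img = [[100] * 256 for _ in range(256)]
marked = [[101] * 256 for _ in range(256)]
print(round(psnr(img, marked), 2))  # 48.13
```

Since the same formula is applied to the chip's output and to the software output, comparable PSNR values indicate that both produce watermarks of similar perceptibility.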
Table VII summarizes existing VLSI watermarking chips together with the prototype chip described in this paper, listing for each chip the technology, area, supply voltage, operating frequency, and power consumption. Our chip performs the functions of both the chip of Tsai and Lu [25] and that of Garimella et al. [26], which perform invisible robust and invisible fragile watermarking, respectively, yet its power consumption is much lower than that of [25]. Our chip can also operate at frequencies as high as 0.5 GHz, higher than any of the existing watermarking chips. Note, however, that a fair comparison is not possible, because the chips work in different domains and have different capabilities.
For example, DCT domain computations are more resource-intensive than spatial domain computations. The table is therefore provided not as a direct comparison but as a guide for readers to similar architectures and chips for watermarking.
IX. CONCLUSIONS
In this paper, we presented a watermarking encoder that can perform invisible robust watermarking, invisible fragile watermarking, and a combination of both in the spatial domain. To the best of our knowledge, this is the first watermarking architecture with both functionalities. The chip can be easily integrated into any existing JPEG encoder to watermark images at the source. The implementation of a low-power, high-performance version is currently in progress. Low-power VLSI features, such as multiple supply voltages, dynamic clocking, and clock gating, will be considered. High-performance architectural techniques, such as pipelining and parallelism, are under investigation.
A disadvantage of the implemented watermarking algorithms is that the processing must be performed pixel by pixel. In the future, we plan to investigate block-by-block processing. A low-power, high-performance watermarking decoder is also being implemented, and we plan to modify the architectures to handle color images.
Because DRM systems need both encryption and watermarking, we believe that combining the encryption and watermarking hardware with the data compression architecture will be necessary. Moreover, the on-chip encrypter can be used to store the watermark generator key in encrypted form, thus enhancing watermark security.
X. ACKNOWLEDGMENTS
