In this paper, a new cryptographic system is proposed that combines position permutation and value transformation for encryption and decryption. The system has three useful features: (1) high security, as evaluated by the fractal dimension measure; (2) the content of the encrypted image is sensitive to the initial key; and (3) the system can easily defend against exhaustive search attacks. For real-time applications in multimedia systems, a parameterized hardware design and its very large-scale integrated circuit (VLSI) generator software are developed. The VLSI generator is parameterized by system type, packet size, throughput, and security to create a suitable architecture for the application. All architectures produced by the VLSI generator have been strictly verified. In addition to passing the entire coding guideline check and achieving 100% code coverage, all configurations of the architecture are synthesized with a 0.18 µm cell library and verified for speed, area, and power consumption, and the essential scripts are delivered. Across all verified configurations, the throughput ranges between 1.59 and 2.25 Gbps, with a hardware cost between 0.54 and 3.92 mm². Compared with existing designs, the proposed design offers a wide range of performance and benefits most multimedia applications.
INTRODUCTION
Nowadays, owing to the rapid increase in bandwidth, it is increasingly common to transmit multimedia signals over the Internet. Meanwhile, illegal data access has become more serious, and data security [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] has become a critical and imperative issue. The real-time realization of cryptographic systems for multimedia applications deserves as much attention as algorithm development. Several data encryption/decryption algorithms and their real-time hardware designs have been proposed in the last decade. Based on the SAFER algorithm with a 128-bit key, Schubert et al. [5] proposed a reusable cryptographic VLSI core with a throughput of 251.8 Mbit/s. Mitsuyama et al. [6] implemented a high-performance VLSI with burst mode and 128-bit block ciphers for AES. Lin [7] designed and realized an IP core of the TDCEA encryption/decryption algorithm for a video surveillance system. Lewis et al. [8] proposed the architecture of an application-specific processor for cryptographic systems and its VLSI implementation. In this paper, from the viewpoints of ciphertext-only attack analysis and computational complexity, and based on the Shannon product theory, we propose a multimedia cryptographic system (MCS). MCS consists of four functions: (i) every N−1 bytes of input data are randomly expanded to N bytes; (ii) the expanded data are randomly swapped over multiple levels; (iii) the eight bit planes are randomly XORed or XNORed with two random operands; and (iv) two rounds of 2D 64-bit rotation are applied. All of these operations are controlled by a binary sequence generated by a chaos-based pseudorandom bit generator (PBG). The computational complexity analysis of a ciphertext-only attack, with key lengths of 271 and 283 for decryption at packet sizes of 16 and 32 bytes, shows that MCS is hard to attack within a short period of time.
Regarding real-time realization of the proposed cryptographic system, a parameterized hardware architecture is designed using pipelining and parallel processing, and its VLSI generation software is developed so that the MCS hardware can be created automatically for an application with the required performance. We have verified all the configurations in Verilog HDL created by the VLSI generator with 0.18 µm CMOS technology. Each configuration has been qualified for coding style and code coverage. The qualification results of all configurations show no errors and only a few minor warnings under the RMM coding guidelines [13], as well as nearly 100% code coverage. The verification results of high-level synthesis reveal that the throughput ranges between 1.59 and 2.25 Gbps, with a corresponding hardware cost between 0.54 and 3.92 mm². Compared with the existing designs [5] [6] [7] [8] [9], the proposed design performs better in terms of the data rate per area (DRPA) index. With the high security provided, the proposed high-performance VLSI design can be easily integrated into most high-quality multimedia systems that require real-time operation at reasonable hardware cost.
The rest of the paper is organized as follows. In Section 2, we describe the proposed cryptographic system in detail, together with the MATLAB simulation results. In Section 3, we present the proposed parameterized hardware design of MCS. In Section 4, we show the verification of the proposed design with different parameters and a comparison with some existing designs. Finally, we conclude the paper in Section 5.
THE PROPOSED CRYPTOGRAPHIC SYSTEM

Multimedia Cryptography System
Based on the Shannon product theory, a new MCS is proposed, as shown in Fig. 1. The input data undergo the following subsystems: (i) data expansion to form a 16-byte packet, (ii) swapping operation, (iii) bit-plane logical operation, and (iv) 2D bit rotation operation. MCS is described in detail in the following.
PBGs in MCS
Kocarev and Jakimoski constructed a class of chaos-based PBGs in [14]. We adopt the first PBG with p = 5, m = 2, M = 256, and k = 2 as the PBG in MCS.
The 
log log : log 2 
Data expansion in MCS
Every N−1 bytes of input data are extended to N bytes, and these N bytes are defined as a packet. For the first packet at the input of the swapping stage, the extended byte is the parameter Secret. For the following packets, the extended byte is randomly chosen from the previous packet according to the corresponding bits of the control sequence.
The swapping operation for a 16-byte packet g(n), 0 ≤ n ≤ 15, contains four levels. They are defined by the following sets of triples:
Level 1: {…, (1, 9, 5), (2, 10, 6), (3, 11, 7), (4, 12, 8), (5, 13, 9), (6, 14, 10), (7, 15, 11)}
Level 2: {…, (1, 5, 13), (2, 6, 14), (3, 7, 15), (8, 12, 16), (9, 13, 17), (10, 14, 18), (11, 15, 19)}
Level 3: {…, (1, 3, 21), (4, 6, 22), (5, 7, 23), (8, 10, 24), (9, 11, 25), (12, 14, 26), (13, 15, 27)}
Level 4: {…, (2, 3, 29), (4, 5, 30), (6, 7, 31), (8, 9, 32), (10, 11, 33), (12, 13, 34), (14, 15, 35)}
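The expansion and four-level swapping described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not confirm: each triple (a, b, c) is read as "swap g(a) and g(b) when control bit PR[c] is 1", the first triple of each level is taken to continue the visible pattern (for example (0, 8, 4) in level 1), the extended byte is appended at the end of the packet, and the names `build_levels`, `swap_packet`, and `expand` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def build_levels() -> List[List[Triple]]:
    """Generate the four swap levels as (a, b, c) triples.

    Assumption: (a, b, c) means bytes g(a) and g(b) are swapped when
    control bit PR[c] is 1, and control bits are numbered from 4 upward.
    """
    levels = []
    bit = 4  # assumed index of the first swap control bit
    for stride in (8, 4, 2, 1):
        level = []
        for n in range(16):
            if n & stride == 0:  # n is the lower partner of a pair
                level.append((n, n + stride, bit))
                bit += 1
        levels.append(level)
    return levels


def swap_packet(packet: Sequence[int], pr: Sequence[int]) -> List[int]:
    """Apply the four swap levels to a 16-byte packet under control bits pr."""
    g = list(packet)
    for level in build_levels():
        for a, b, c in level:
            if pr[c]:
                g[a], g[b] = g[b], g[a]
    return g


def expand(data: Sequence[int], secret: int,
           previous: Optional[Sequence[int]] = None, index: int = 0) -> List[int]:
    """Extend N-1 bytes to N bytes.

    The first packet uses Secret; later packets reuse a byte of the previous
    packet. How control bits map to `index`, and appending the byte at the
    end, are assumptions made for illustration.
    """
    extra = secret if previous is None else previous[index]
    return list(data) + [extra]
```

With all control bits cleared the packet passes through unchanged, and because every swap is its own inverse, decryption under this reading would apply the same levels in reverse order with the same control bits.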
Bit plane logical operation in MCS
Definition 2: The ith bit plane BP_i, 0 ≤ i ≤ 7, of g(n), 0 ≤ n ≤ 15, is defined as the set of the ith bits of all g(n), counted from the least significant bit.
The logical operation on the bit planes is defined as follows. Regarding the first eight bytes of the packet as the 8 × 8 binary matrix M_1 and the second eight bytes as M_2 after the logical operation, this function block in MCS performs the operations of
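Definition 2 can be illustrated with a short Python sketch that extracts a bit plane from a packet and views eight bytes as an 8 × 8 binary matrix. The function names are hypothetical, and the matrix layout (row n holds byte n, column i holds bit i) is an assumption, since the text does not fix the orientation.

```python
from typing import List, Sequence


def bit_plane(packet: Sequence[int], i: int) -> List[int]:
    """Return BP_i: the i-th bit (0 = least significant) of every byte."""
    if not 0 <= i <= 7:
        raise ValueError("bit plane index must be in 0..7")
    return [(byte >> i) & 1 for byte in packet]


def to_bit_matrix(block: Sequence[int]) -> List[List[int]]:
    """View 8 bytes as an 8 x 8 binary matrix (assumed layout:
    row n is byte n, column i is bit i counted from the LSB)."""
    if len(block) != 8:
        raise ValueError("expected exactly 8 bytes")
    return [[(byte >> i) & 1 for i in range(8)] for byte in block]
```

Under this layout, bit plane BP_i of the first eight bytes is simply column i of the matrix built from them.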
, and
Evaluation of MCS algorithm
We evaluate the proposed MCS algorithm in three respects: MATLAB simulation results on images, the initial key sensitivity of MCS, and the computational complexity of MCS. The parameters used in the evaluation are α1 = 2, β1 = 1, α2 = 3, β2 = 2, and Secret = 255 for N = 16, and α1 = 5, β1 = 1, α2 = 1, β2 = 2, α3 = 2, β3 = 2, α4 = 3, β4 = 4, and Secret = 255 for N = 32, together with the 255 bits of x(0). Figure 2 shows the MATLAB simulation results of image encryption and decryption.
MATLAB simulation result of image
However, since it is not easy to judge the encrypted results from the images alone, we have further verified the security of the encrypted results with the fractal dimension metric [15]. As listed in Table 1, the fractal dimensions of the encrypted images range between 2.9949412 and 2.9956699. They are very close to the optimal fractal dimension of 3; that is, the encrypted image data are random enough to provide high security. Moreover, the fractal dimensions (fds) of the encryption results under x(0) and x[cj] are listed in Table 2. They show that the encryption results of MCS are completely disordered.
For a chaotic system [16], it is well known that (i) it has sensitive dependence on initial conditions, (ii) its trajectories are dense, bounded, but non-periodic in the state space, and (iii) it has a noise-like spectrum. In particular, Kocarev and Jakimoski have proved that the adopted PBG is cryptographically secure.
To demonstrate the parameter sensitivity of MCS by MATLAB simulation, the root mean square difference (RMSD) is computed. Let f′_A and f′_B be the encryption results of the image f of size L × P pixels under x_A(0) and x_B(0), respectively. The RMSD is defined as
$$\mathrm{RMSD} = \sqrt{\frac{1}{L \times P} \sum_{i=0}^{L-1} \sum_{j=0}^{P-1} \left( f'_A(i,j) - f'_B(i,j) \right)^2}.$$
Tables 5 and 6 show the RMSD for decryption.
…tions are needed. We show these operations in more detail, and how many times they are performed in the encryption algorithm for the two packet sizes, in Table 7. This indicates that the proposed MCS can be a low-cost algorithm.
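The RMSD between two encryption results can be computed as in the following Python sketch. It assumes the usual root-mean-square definition taken over all L × P pixels, with images given as lists of rows; the function name `rmsd` is hypothetical and the exact normalization used in the original experiments is not visible here.

```python
import math
from typing import Sequence


def rmsd(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Root mean square difference between two equally sized images."""
    rows = len(a)
    cols = len(a[0]) if rows else 0
    if rows == 0 or cols == 0:
        raise ValueError("images must be non-empty")
    if len(b) != rows or any(len(ra) != cols or len(rb) != cols
                             for ra, rb in zip(a, b)):
        raise ValueError("images must have the same L x P size")
    total = sum((x - y) ** 2
                for ra, rb in zip(a, b)
                for x, y in zip(ra, rb))
    return math.sqrt(total / (rows * cols))
```

Identical images give an RMSD of 0, so a large value for two keys x_A(0) and x_B(0) that differ only slightly is what indicates key sensitivity.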
Regarding the analysis of a ciphertext-only attack, as shown in Table 8, the key lengths for decryption are 271 and 283 for packet sizes of 16 and 32 bytes, respectively, which demands a very large amount of computation. We can therefore see that MCS should not be easy to attack within a reasonable duration.
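To give a feel for the scale of an exhaustive key search, the following Python sketch estimates the time needed to try every key. It assumes that the key lengths 271 and 283 are counted in bits and that the attacker tests a fixed number of keys per second; the rate of 10^12 keys per second used in the note below is a hypothetical figure, not a measurement from this work.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def search_years(key_bits: int, trials_per_second: float) -> float:
    """Years needed to enumerate all 2**key_bits keys at a fixed trial rate.

    On average an attacker succeeds after about half of this time.
    """
    if key_bits < 0 or trials_per_second <= 0:
        raise ValueError("key_bits must be >= 0 and the rate positive")
    seconds = (2 ** key_bits) / trials_per_second
    return seconds / SECONDS_PER_YEAR
```

Calling `search_years(271, 1e12)` under these assumptions returns a value above 10^60 years, and moving from 271 to 283 bits multiplies the effort by 2^12 = 4096.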
Parameterized Multimedia Cryptographic System
As shown in Fig. 3, building on the original MCS algorithm above, the extended MCS algorithm can be configured not only for processing type and packet size but also for various throughput, hardware cost, and security levels. This extended algorithm is the parameterized multimedia cryptographic system (PMCS). The parameter Security is used to vary the security of PMCS. In the following section, all the hardware architectures corresponding to the configurations of PMCS are elaborated in detail.
HARDWARE DESIGN OF PMCS
Corresponding to the four configurations of the PMCS algorithm for encryption, architectures with low hardware cost, high throughput, high security, and both high throughput and high security are designed, as shown in Figs. 4 to 7. Depending on the requirements of the application, the architecture can be configured as a pipeline or a parallel pipeline. The pipeline kernel in each architecture has four stages: data extension and multi-level data swapping for position permutation, and one-level XOR/XNOR operation and 2D circulation (bit recirculation with a random direction and a random number of bits) for value transformation. To balance the pipeline, we further split the 2D circulation stage into two stages. Detailed designs of the building blocks in each configuration of the PMCS are shown in Fig. 8. The architecture of the PBG, shown in Fig. 9, generates the chaos-based pseudorandom bit sequence, from which a 259-bit control signal is produced as the key of PMCS to randomly decide the operations in each pipeline stage. Observing the formulation of the chaos-based pseudorandom bit sequence generation, we form the result of the multiplication by wiring (concatenation) rather than by complex modular and truncation operations, to minimize the hardware cost of the PBG.
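One way to read the remark about replacing modular and truncation operations by wiring is sketched below in Python. It assumes the modulus is a power of two (M = 256 suggests this, but the exact PBG datapath is not given here), in which case a modulo is just a selection of the low bits and a truncation is just a selection of the high bits of the product. The helper names are hypothetical.

```python
def low_bits(value: int, width: int) -> int:
    """Equivalent to value % 2**width: keep only the low `width` wires."""
    return value & ((1 << width) - 1)


def drop_low_bits(value: int, count: int) -> int:
    """Equivalent to value // 2**count: discard the low `count` wires."""
    return value >> count


def concat(high: int, low: int, low_width: int) -> int:
    """Concatenate two bit fields, as a wire bundle would in hardware."""
    return (high << low_width) | low
```

For any product of two 8-bit values, `low_bits(p, 8)` and `drop_low_bits(p, 8)` split the 16-bit result into two fields that `concat` joins again, so no arithmetic unit is needed for these steps in this reading.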
Regarding clocking, since generating the chaos-based pseudorandom bit sequence takes much longer than encryption, multiple clocks are adopted in the proposed design: the PBG runs on a slower clock obtained by dividing the original clock by a certain factor, where the dividing factor is determined by the consumption rate of the data processing stages. Regarding data issuing, each packet of data for encryption is composed of 15 or 31 bytes and one Secret byte. All bytes except the Secret are sampled serially and then concatenated with the Secret at the first stage, so that a 16- or 32-byte packet is issued for encryption. Thus, two clocks are also needed in this part of the circuitry: one for data sampling and the other as the pipeline clock. To keep issuing data continuously, the two clocks must be synchronized with each other. Namely, the period of 15 cycles for data sampling must be synchronized with the period of 16 or 32 cycles for sending out the 16 or 32 encrypted bytes. However, since synchronizing the two clocks is not easy, our design instead stalls sampling for one cycle every 16 clock cycles. Figure 10 shows the pipeline process of encryption for MCS. The latency is t_0 + t_1 + t_2 + t_3 + t_4 + t_5, and the sampling period for a packet is the largest of t_0, t_1, t_2, t_3, t_4, and t_5.
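The pipeline timing relations above can be checked with a small Python sketch. It assumes the standard pipeline model in which the latency is the sum of the stage times and a new packet can be issued once per slowest-stage time; the stage times used in the note are made-up numbers, and all function names are hypothetical.

```python
from typing import Sequence


def pipeline_latency(stage_times: Sequence[float]) -> float:
    """Time for one packet to traverse every stage: t_0 + ... + t_5."""
    return sum(stage_times)


def pipeline_period(stage_times: Sequence[float]) -> float:
    """Minimum interval between packets: the slowest stage time."""
    return max(stage_times)


def throughput_bps(packet_bytes: int, stage_times_ns: Sequence[float]) -> float:
    """Output bits per second when one packet leaves per pipeline period."""
    return packet_bytes * 8 / (pipeline_period(stage_times_ns) * 1e-9)


def payload_fraction(packet_bytes: int) -> float:
    """Share of each N-byte packet that is sampled input data (N-1 of N)."""
    return (packet_bytes - 1) / packet_bytes
```

For hypothetical stage times of 2, 3, 5, 3, 4, and 1 ns, the latency is 18 ns and the period is 5 ns, which illustrates why balancing the stages (for example by splitting the 2D circulation stage) raises throughput without changing the work done per packet.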
Timing Analysis
Regarding the timing of data sampling in the data expansion stage, as shown in Fig. 11, N−1 bytes are sampled for each packet. The Nth cycle is used to expand the packet to N bytes, where the extra byte can be regarded as part of the external key. Figure 12 shows the control pipeline. The processing stages in MCS are each controlled by the 259 bits of control signal generated by the PRBG module. N control signals are …
[Figure: byte labels g(0), g(8), g(1), g(9), …, g(7), g(15) with control-bit labels PR[4] to PR[27] and PR[33] to PR[35].]
VERIFICATION OF THE PMCS HARDWARE GENERATOR
We have implemented the parameterized VLSI generator of the proposed PMCS with different synthesis options. To facilitate use of the VLSI design, we provide a VLSI generator with a graphical user interface (GUI), as shown in Fig. 13, so that synthesizable RTL code, testbenches, and a synthesis script for the desired configuration of PMCS can be generated automatically to meet the requirements of applications.
For a fair verification, we verify all the designs with different configurations through a cell-based IC design flow in 0.18 µm CMOS technology. Table 9 shows the throughput and area cost of the proposed parameterized VLSI design for all configurations. The throughput ranges between 1.59 and 2.25 Gbps, with an area between 0.54 and 3.92 mm².
These results show that the performance is sufficient for most requirements of high-quality multimedia applications at reasonable hardware cost.
Figure 10: Pipeline of the encryption process for MCS.
Regarding qualification of the proposed VLSI, all configurations of the design must pass both the RMM coding guidelines and the code coverage check. There are no errors and only a few minor warnings under the coding guidelines. For code coverage, we adopted VN-cover. With the provided testbenches, all configurations of the proposed VLSI have been certified to have code coverage near 100%.
In the following, we compare the performance of the proposed design with other existing designs [5] [6] [7] [8] [9]. To eliminate the effect of different fabrication technologies, the normalized area (NArea) index defined in [17] is adopted as eqn (1), where the silicon area is normalized to 0.35 µm technology. In addition, as shown in eqn (2), we further use the DRPA index [17] to reflect the hardware cost efficiency of encryption and decryption.
$$\mathrm{DRPA} = \frac{\text{Data rate (Mbps)}}{\mathrm{NArea}\ (\mathrm{mm}^2)} \qquad (2)$$
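The two indices can be evaluated with the Python sketch below. DRPA follows the ratio of data rate in Mbps to normalized area in mm². The form of the area normalization is an assumption: the sketch scales area quadratically with the feature size relative to 0.35 µm, which is a common convention, but the exact expression of eqn (1) in [17] is not visible here. Function names are hypothetical.

```python
def normalized_area(area_mm2: float, technology_um: float,
                    reference_um: float = 0.35) -> float:
    """Scale a silicon area to the reference technology.

    Assumption: area scales with the square of the feature size.
    """
    if area_mm2 < 0 or technology_um <= 0 or reference_um <= 0:
        raise ValueError("area must be >= 0 and feature sizes positive")
    return area_mm2 / (technology_um / reference_um) ** 2


def drpa(data_rate_mbps: float, narea_mm2: float) -> float:
    """Data rate per area: Mbps divided by normalized area in mm^2."""
    if narea_mm2 <= 0:
        raise ValueError("normalized area must be positive")
    return data_rate_mbps / narea_mm2
```

Under this assumption, a design fabricated in a finer process than 0.35 µm has its area scaled up before the comparison, so that DRPA reflects architectural efficiency rather than the process used.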
Based on these two indices, Table 10 summarizes the comparison between the proposed designs and the existing ones [5] [6] [7] [8] [9]. It shows that the proposed design outperforms some of the existing designs by providing a higher data processing rate at a lower hardware cost. Although the comparison cannot cover all aspects, such as exact security, the results indicate that the proposed algorithm is efficient and that its hardware design is good enough for applications with real-time requirements.
CONCLUSIONS
In this paper, a new cryptographic system with high security is proposed. Moreover, for real-time applications in multimedia systems, a high-performance parameterized architecture and its VLSI generation software are developed. Four architectures corresponding to the parameterized cryptographic system are designed for low hardware cost, high throughput, high security, and both high throughput and high security. They consist of pipeline or parallel-pipeline architectures and are verified with 0.18 µm CMOS technology. The verification of all configurations shows that the throughput of the proposed designs ranges between 1.59 and 2.25 Gbps, with an area between 0.54 and 3.92 mm². Compared with the existing designs, the proposed designs perform better in terms of the DRPA evaluation index. The proposed high-performance VLSI design can therefore be easily integrated into most high-quality multimedia systems that require real-time operation at reasonable hardware cost.
