Keywords: system on a chip (SoC), fingerprint-based key management (FKM), hardware security coprocessor, wireless sensor network, reusable optimized logic cell (ROTL).
Introduction
Wireless sensor networks (WSNs) are widely deployed in applications such as ocean and wildlife monitoring, manufacturing machinery performance monitoring, building safety and earthquake monitoring, and many military applications, and the need for effective security mechanisms grows with them. Because sensor networks may interact with sensitive data or operate in hostile, unattended environments, it is imperative that these security concerns be addressed from the beginning of the system design. However, due to inherent resource and computing constraints, security in sensor networks poses different challenges than traditional network and computer security, and techniques used in traditional networks cannot be applied directly.
Nowadays, the majority of security coprocessors in SoCs for WSNs are based on the Advanced Encryption Standard (AES) [1], for example the CC2430 [2] produced by Chipcon and the JN5121 [3] produced by Jennic. As a part of the ZigBee protocol, AES is one of the candidate algorithms for wireless sensor network security.
However, on further inspection, we find that the baseline version of AES uses over 800 bytes of lookup tables, which is too large for our memory-constrained nodes. An optimized version of the algorithm (about 100 times faster) uses over 10 Kbytes of lookup tables. Similarly, we rejected the DES block cipher, which requires a 512-entry S-box table and a 256-entry table for various permutations. We use RC5 [4] because of its small code size and high efficiency. RC5 does not rely on multiplication and does not require large tables. However, RC5 does use 32-bit data-dependent rotations, which are costly on Atmel processors.
The earliest proposal for securing WSNs is SPINS [5], a set of security protocols for wireless sensor networks based on TinyOS that provides data confidentiality, message integrity and authentication. TinySec [6], the first implementation of a WSN security protocol, is a link-layer security architecture similar to SPINS. It provides data encryption and authentication functions with little overhead.
However, these protocols are based on symmetric keys but provide neither key management functions nor any program data protection. Moreover, most of them are realized in software on embedded systems rather than as a hardware coprocessor in an SoC. Compared with existing work, the hardware security coprocessor and program protection mechanism we propose therefore offer new features and components.
The major contributions in this paper consist of the following aspects:
A fingerprint-based key management (FKM) [7] module is realized in hardware using SoC technology.
A low-cost and efficient cryptographic coprocessor is described. It allows users to encrypt and decrypt data using the RC5 algorithm with a 128-bit key and initialization vector. It supports CBC, ECB and CTR modes. It reuses ROTL to reduce its additional hardware overhead.
A program protection mechanism and component based on an un-resemble coder algorithm is described.
Experiments based on PC simulation and FPGA validation are then presented. The results show that the proposed RC5-FKM cryptographic coprocessor is more efficient than embedded general-purpose processors such as the ATmega128 and than other AES coprocessors. Moreover, its additional hardware overhead is 9.6% less than that of common designs.
The remainder of this paper is organized as follows. Section 2 describes the background and related work on security coprocessors in SoCs and security protocols for WSNs. Section 3 presents the system architecture of our EasiSoC in detail. Section 4 presents the design and implementation of our SoC-based FKM module, hardware security coprocessor and program protection mechanism. The performance of our security coprocessor is evaluated in Section 5. Finally, we conclude in Section 6.
System Architecture
The system architecture of our SoC for WSN is based on our previous work, EasiSoC [8] [9], and is illustrated in Figure 1.
Figure 1 System architecture of EasiSoC
The EasiSoC architecture contains several functional components: a CPU core and Security Coprocessor, Program Protection, Communication, Sensor Control and Standard Interface components. The Security Coprocessor component has two individual modules, a security coprocessor and an FKM module. The Communication component includes the digital baseband [10] and the radio, which provide the wireless communication function. The Sensor Control component provides several kinds of digital interfaces, such as SPI and I2C, for accessing and controlling digital sensors. Moreover, there are several standard interfaces, such as analog interfaces, for other devices. All these components are controlled and accessed by the CPU core through the data bus and the interrupt bus. The system chip is designed to interface to external sensors and other devices to form a sensor node.
The functional components proposed for security in EasiSoC are the FKM module, the Security Coprocessor and the Program Protection component. The FKM module generates random key identifiers, extracts key elements from the fingerprint of program memory, and computes session keys for encryption from the two key elements of the two communicating nodes.
The Security Coprocessor includes an encryption and decryption module, which provides data confidentiality using the RC5 cryptographic algorithm with a 128-bit key. It supports CBC, ECB and CTR modes for encryption and decryption. Additionally, to reduce the time cost of transferring data between the baseband and the security coprocessor, the security coprocessor has DMA transfer trigger capability, so its input/output registers can be accessed directly by both the CPU and the digital baseband. This is handled automatically by the DMA triggers generated by the security coprocessor.
The Program Protection component has three individual modules: a Program Downloader module, a Program Protection module, and a ROM. The Program Downloader module downloads program data to program data memory reliably. The Program Protection module supports program data protection, using un-resemble coder algorithms to prevent program data from being read out. The ROM is the program data memory.
Design and Implementation

3.1 Fingerprint-based key management module
The fingerprint-based key management (FKM) scheme was proposed in our previous work [7]. We implement it in EasiSoC as hardware support. As shown in Figure 2, the FKM module contains a fingerprint extraction module, a sequential one-way hash operation, an arbiter and a random number generator. The ROM in it is the program data memory. First, the fingerprint extraction module reads a 64-bit binary code from program data memory via the flash bus, which is controlled by the arbiter. This binary code, selected as a key element, is stored in the registers of the fingerprint extraction module.
Second, in the program execution state, the CPU of each node reads its key identifier from the random number generator, and the two communicating nodes exchange identifiers with each other over unencrypted communication.
Next, the CPU of each node writes the identifier received from the other node to its fingerprint extraction module by bus operation. The fingerprint extraction module uses this key identifier to generate the corresponding address and reads a 64-bit binary code from program data memory as the second key element. After it has read the key element of the peer node, the fingerprint extraction module sends a hardware interrupt to the CPU and the arbiter so that software program execution can continue.
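The extraction step described above can be illustrated with a minimal software sketch. The text does not specify how a key identifier is turned into an address, so the linear mapping below is purely an assumption, as are the names `key_element_address` and `extract_key_element`; only the 64-bit element size and the 200-element fingerprint come from the design.

```python
KEY_ELEMENT_BYTES = 8     # one key element is a 64-bit binary code
NUM_KEY_ELEMENTS = 200    # number of key elements in the fingerprint


def key_element_address(identifier: int, base: int = 0) -> int:
    """Map a key identifier to a byte address in program data memory.

    Assumption: a simple linear mapping. The real address generation
    in the fingerprint extraction module is not given in the text.
    """
    return base + (identifier % NUM_KEY_ELEMENTS) * KEY_ELEMENT_BYTES


def extract_key_element(program_memory: bytes, identifier: int) -> bytes:
    """Read the 64-bit key element selected by a key identifier."""
    addr = key_element_address(identifier)
    element = program_memory[addr:addr + KEY_ELEMENT_BYTES]
    if len(element) != KEY_ELEMENT_BYTES:
        raise ValueError("program memory too small for this identifier")
    return element
```

Under this assumed mapping, two nodes that hold the same program image obtain the same key element for the same identifier, which is what allows both sides to derive a common session key.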
Then both nodes apply the sequential one-way hash operation, such as MD5, to the two selected key elements, which are read from the fingerprint extraction module, to compute a session key for encryption and decryption. Because the hash is one-way, attackers can hardly recover the key elements from the identifiers and the corresponding session key. For our design with 200 key elements in its fingerprint, each node can access a 200 × 200 square grid key space, where each element corresponds to a symmetric key.
For example, element (i, j) in this grid corresponds to the key K_{ij}, which is computed by the hash operation H on the i-th and j-th key elements in the fingerprint:
$K_{ij} = H(S_i, S_j)$ (1)
where the X-coordinate i and the Y-coordinate j are selected by the source and destination nodes of a secure link, respectively.
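As an illustration of Equation (1), the following sketch computes a session key in software. It assumes that H is MD5 applied to the concatenation S_i || S_j (the text names MD5 only as an example and does not state how the two elements are combined) and that both nodes hold the same fingerprint. The fingerprint values are placeholders, not real program data.

```python
import hashlib


def session_key(s_i: bytes, s_j: bytes) -> bytes:
    """K_ij = H(S_i, S_j), with H assumed to be MD5 over S_i || S_j."""
    return hashlib.md5(s_i + s_j).digest()


# Placeholder fingerprint: 200 distinct 64-bit key elements.
fingerprint = [n.to_bytes(8, "big") for n in range(200)]

# i is chosen by the source node, j by the destination node.
i, j = 17, 142

# Each node looks up both elements in its own copy of the fingerprint
# and hashes them in the same (source, destination) order.
key_at_source = session_key(fingerprint[i], fingerprint[j])
key_at_destination = session_key(fingerprint[i], fingerprint[j])
```

With MD5 the session key is 128 bits long, which matches the key length used by the RC5 coprocessor. The argument order matters in this sketch, so both nodes have to agree on which identifier belongs to the source.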
After the hash operation has computed the session key, it sends a hardware interrupt to the CPU; the CPU can then read the session key from the FKM module and use it in the hardware security coprocessor.
3.2 RC5-FKM hardware security coprocessor
The proposed RC5-FKM coprocessor contains a control module, a key expansion module, an encryption module, a decryption module and a DMA control module, as shown in Figure 3. A state machine controls the three computing states (key expansion, encryption and decryption) and can be operated by software commands. Users may select the operating mode, the computing state and the number of encryption rounds by bus operation. Moreover, the security coprocessor supports CBC, ECB and CTR modes for encryption and decryption, so it is more flexible than common security coprocessors with fixed setup parameters. The key RAM stores the security key, which is set by the CPU by bus operation: the CPU gets keys from the FKM module and then configures this RAM.
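The three operating modes can be sketched generically over 64-bit blocks as follows. This is a software illustration only, not the coprocessor's datapath: `toy_enc` and `toy_dec` are a hypothetical, insecure stand-in for the RC5 block operation, and the initialization vector and counter are assumed to be one block (64 bits) wide.

```python
from typing import Callable, List

MASK64 = (1 << 64) - 1
BlockFn = Callable[[int], int]   # maps one 64-bit block to another


def ecb_encrypt(enc: BlockFn, blocks: List[int]) -> List[int]:
    """ECB: every block is encrypted independently."""
    return [enc(b) for b in blocks]


def cbc_encrypt(enc: BlockFn, iv: int, blocks: List[int]) -> List[int]:
    """CBC: each plaintext block is XORed with the previous ciphertext."""
    out, prev = [], iv
    for b in blocks:
        prev = enc(b ^ prev)
        out.append(prev)
    return out


def cbc_decrypt(dec: BlockFn, iv: int, blocks: List[int]) -> List[int]:
    out, prev = [], iv
    for c in blocks:
        out.append(dec(c) ^ prev)
        prev = c
    return out


def ctr_crypt(enc: BlockFn, nonce: int, blocks: List[int]) -> List[int]:
    """CTR: encrypt a counter and XOR it in; the same call decrypts."""
    return [b ^ enc((nonce + n) & MASK64) for n, b in enumerate(blocks)]


# Hypothetical, insecure stand-in for the RC5 block cipher.
def toy_enc(x: int) -> int:
    return ((x ^ 0x0123456789ABCDEF) + 0x1111) & MASK64


def toy_dec(y: int) -> int:
    return ((y - 0x1111) & MASK64) ^ 0x0123456789ABCDEF
```

Note that CTR mode only ever uses the block encryption direction, whereas ECB and CBC decryption need the inverse block operation.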
The key expansion routine expands the user's secret key K to fill the expanded key array S. According to the RC5 algorithm, it consists of three simple algorithmic parts: converting the secret key from bytes to words, initializing the array S, and mixing in the secret key. Each of these parts uses the same calculating elements, such as adders and shift registers. Therefore, we implement a reusable optimized logic cell named ROTL, which contains a 32-bit data rotation shift register, three 32-bit adders and two 32-bit subtracters. Since at most three 32-bit adders and two 32-bit subtracters are used in any one calculating state, it is economical and efficient to place these calculating elements in ROTL. They are reused in the different encryption and decryption states so as to reduce hardware cell usage and energy consumption in these modules. Figure 4 shows the framework of ROTL. The MUX (multiplexer) is controlled by a status signal and selects the adders, the subtracters or the data rotation shift register to complete the computing task of each calculating state. Here, S is the expanded key array, the A and B registers store the high 32 bits and low 32 bits of the ciphertext (or of the deciphered text) in every round, r is the number of cryptographic rounds, and w is the word size in bits. Plaintext and ciphertext blocks are each 2w bits long. The nominal value of w in our coprocessor is 32 bits.
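The three parts of the key expansion can be sketched in software as follows, following the published RC5 algorithm for w = 32. This is a reference-style illustration, not the ROTL hardware; the function names are chosen here for illustration, and P32 and Q32 are the standard RC5 magic constants for 32-bit words.

```python
W = 32                                   # word size w in bits
MASK = (1 << W) - 1
P32, Q32 = 0xB7E15163, 0x9E3779B9        # RC5 constants for w = 32


def rotl(x: int, y: int) -> int:
    """Cyclic left rotation X<<<Y, with Y taken modulo w."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def expand_key(key: bytes, rounds: int) -> list:
    """Expand the secret key K into the array S of 2(r + 1) words."""
    u = W // 8                           # bytes per word
    c = max(1, (len(key) + u - 1) // u)  # number of key words

    # Part 1: convert the secret key from bytes to words (little-endian).
    L = [0] * c
    for n in range(len(key) - 1, -1, -1):
        L[n // u] = ((L[n // u] << 8) + key[n]) & MASK

    # Part 2: initialize the array S.
    t = 2 * (rounds + 1)
    S = [(P32 + n * Q32) & MASK for n in range(t)]

    # Part 3: mix in the secret key (adders plus rotations only).
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % t
        j = (j + 1) % c
    return S
```

For a 128-bit key and 5 rounds, as used in the functional verification, `expand_key(bytes(16), 5)` returns 12 words. Every step uses only 32-bit additions and left rotations, which is what makes sharing a single cell across states plausible.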
The cyclic rotation of word X left by Y bits is denoted X<<<Y, where Y is interpreted modulo w. The inverse operation, right rotation, is denoted X>>>Y. Since X<<<Y is a cyclic rotation with cycle length w, X>>>Y can be replaced by X<<<(w-Y). We therefore use the rotation shift register in ROTL to implement only the left cyclic rotation in both the encryption and decryption states, which reduces hardware cell usage and the power consumption that the extra hardware would bring.
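The identity X>>>Y = X<<<(w-Y) can be checked with a short sketch in which decryption is built from the left rotation only. The encryption and decryption rounds follow the published RC5 algorithm; the subkey array S below is a placeholder rather than the output of a real key expansion, and the function names are chosen for illustration.

```python
W = 32
MASK = (1 << W) - 1
ROUNDS = 5


def rotl(x: int, y: int) -> int:
    """Cyclic left rotation X<<<Y, with Y taken modulo w."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rotr_via_rotl(x: int, y: int) -> int:
    """Right rotation X>>>Y expressed as X<<<(w - Y)."""
    return rotl(x, (W - y % W) % W)


def rotr_reference(x: int, y: int) -> int:
    """Direct right rotation, used only to check the identity."""
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def encrypt_block(A: int, B: int, S: list, rounds: int) -> tuple:
    A = (A + S[0]) & MASK
    B = (B + S[1]) & MASK
    for i in range(1, rounds + 1):
        A = (rotl(A ^ B, B) + S[2 * i]) & MASK
        B = (rotl(B ^ A, A) + S[2 * i + 1]) & MASK
    return A, B


def decrypt_block(A: int, B: int, S: list, rounds: int) -> tuple:
    for i in range(rounds, 0, -1):
        B = rotr_via_rotl((B - S[2 * i + 1]) & MASK, A) ^ A
        A = rotr_via_rotl((A - S[2 * i]) & MASK, B) ^ B
    return (A - S[0]) & MASK, (B - S[1]) & MASK


# Placeholder subkeys (not a real RC5 key schedule).
S = [(0x9E3779B9 * (k + 1)) & MASK for k in range(2 * (ROUNDS + 1))]
```

Encryption needs adders and left rotations, while decryption needs subtracters and the same left rotation with the amount w - Y, so a single rotation unit can serve both directions.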
DMA control provides a direct access interface for the digital baseband. It is controlled by the CPU by bus operation. The user can write the command registers to switch the state of the DMA trigger, which switches the interface of the hardware coprocessor between the CPU and the digital baseband.
3.3 Program protection mechanism
Figure 6 shows the framework of the Program Protection component. In EasiSoC, the CPU core, program downloader and flash (program data memory) are integrated in a single chip, and users have to program the device via the program protection module, which provides a one-way download interface and a limited verification interface. This module is realized with a specially designed FKM mechanism and an un-resemble coder algorithm such as a Cyclic Redundancy Check (CRC) coder.
Figure 6 Framework of Program Protection Component
The Program Protection module receives the program data packages from the host through the serial port and then checks the data with the un-resemble coder algorithm. If an error occurs, the programming procedure is stopped and error information is fed back to the host, which either resends the faulty data packages or returns an error to the user. If instead the data is verified correctly, the program protection module programs it into the program data memory. In this way, the program data is guaranteed to be complete and correct, so that nodes work properly.
At the same time, the contents of program data memory can be verified through the verification interface. After a program download, the program protection module provides the user with enough information about the verification result for data checking. This information includes the CRC result of each package and the final CRC result of the whole program. An intruder cannot do anything with it without the whole code data. On the other hand, the program code data, even each data package, can be verified by comparing the verification feedback information with the CRC result computed from the raw code data.
In this way, only authorized users, who have the whole program data, can update and verify the program of a WSN node. Intruders have no chance to get any information about the node program through the serial port without authorization.
Moreover, once an adversary understands how to download programs into sensor nodes and control them, he may program Trojans into our sensor devices. Not only could he break the security design and launch attacks, he could also make our sensor nodes output the secret information stored in them through the serial ports. To resist this form of attack, the program protection module erases the flash every time before programming. No user can download any program until the flash has been erased, so no one can read out any important original information from our sensor nodes. The program data is therefore protected by our program protection module.
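The download flow of this subsection (per-package check, erase before programming, CRC-only verification feedback) can be modelled in a few lines. This is a behavioural sketch under several assumptions: the class and method names are hypothetical, CRC-32 from `zlib` stands in for the unspecified CRC coder, and the package format is reduced to a payload plus its CRC.

```python
import zlib


class ProgramProtection:
    """Toy model of the one-way download and verification interfaces."""

    def __init__(self, size: int):
        self.flash = bytearray(size)      # program data memory (model)
        self.erased = False
        self.offset = 0
        self.overall_crc = 0
        self.package_crcs = []

    def erase(self) -> None:
        """Erase the whole flash; required before any programming."""
        self.flash[:] = b"\xff" * len(self.flash)
        self.erased = True
        self.offset = 0
        self.overall_crc = 0
        self.package_crcs = []

    def download(self, payload: bytes, crc: int) -> bool:
        """Program one package; return False if its CRC does not match."""
        if not self.erased:
            raise PermissionError("flash must be erased before programming")
        if self.offset + len(payload) > len(self.flash):
            raise ValueError("package does not fit in program memory")
        if zlib.crc32(payload) != crc:
            return False                  # host should resend this package
        self.flash[self.offset:self.offset + len(payload)] = payload
        self.offset += len(payload)
        self.overall_crc = zlib.crc32(payload, self.overall_crc)
        self.package_crcs.append(crc)
        return True

    def verification_info(self) -> tuple:
        """Expose only CRC results, never the memory contents."""
        return list(self.package_crcs), self.overall_crc
```

A host that holds the complete program image can recompute the same per-package and overall CRC values and compare them with `verification_info()`, whereas the interface itself never returns program bytes.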
Verification and Performance Evaluation
In this work, we built a general system development and test platform based on a Spartan-3E FPGA produced by Xilinx to verify the function of our SoC design and to evaluate its time cost and logic cell usage. This platform can also be used to develop next-generation sensor devices. It consists of several components, including a sensor board, which can be stacked up to build an FPGA sensor node for system design and evaluation.
We have tested the design of all the proposed components in the architecture, such as the security coprocessor and the program protection module, on the FPGA platform. To further verify the design, we converted the FPGA design into an ASIC and implemented the first prototype chip in a 180 nm CMOS process. Figure 8 shows the chip, its ASIC design, and its evaluation board, which includes the EasiSoC chip, RF device, sensors, ADC, and the RS232 port connecting to a PC. The test bench is shown at the bottom right corner. Our chip under test is shown in the red box.
A. Program Protection Functional verification
The program data download software, named Flash Programmer V2.0, was developed with the Microsoft VC++ 6.0 software development kit. Figure 9 shows its user interface. It is the only way to program our chips under our downloading protocol; nobody can modify the node program without our download tool. Furthermore, the node provides only a one-way program download interface and a limited verification interface, so an attacker cannot capture any information from the download interface and can only get verification result information from the verification interface.
Figure 9 Interface of the program data download software
It can be seen on the red line that the flash must be erased before program downloading. Therefore, an attacker has no chance to get any information by downloading a small Trojan program to read ROM contents from a sensor node. This test verifies that the program protection module works properly.
B. Security Coprocessor Functional verification
To verify that our hardware security coprocessor functions correctly, we implemented the RC5-FKM encryption algorithm on a PC with the Microsoft VC++ 6.0 software development kit and compared its results with those calculated by our hardware security coprocessor. Figure 10 shows the RC5-FKM encryption results calculated by the PC. It runs in ECB mode. Each block is 64 bits and is encrypted with 5 rounds and a different 128-bit key. The initial input plaintext is 0x0, and the keys are set to simulate the keys built by FKM under the actual conditions of the security coprocessor in EasiSoC. Figure 11 shows the output of the hardware security coprocessor in EasiSoC, sent through the RS232 port to the PC. It runs in the same mode; the initial input plaintext and initial key of the security coprocessor are 0x0, and the other keys are built by FKM. The two sets of results are the same, which means that our hardware security coprocessor and FKM module function correctly.
C. Time cost evaluation
To evaluate the performance of the hardware security coprocessor, we estimate the time cost of running the RC5-FKM encryption algorithm on a general-purpose microcontroller, the ATmega128, by simulation in AVR Studio. We then compare the time cost of our security coprocessor with that of the ATmega128 and of an existing AES hardware coprocessor. All use 128-bit keys, as our security coprocessor does, and a clock frequency of 6 MHz.
For a quantitative analysis of the performance, one cycle of encryption and decryption is divided into three parts: key expansion time, encryption time and decryption time. The results are shown in Table 1.
Table 1 Comparison of encryption efficiency
It can be seen from Table 1 that one encryption and decryption calculation cycle on the ATmega128 takes 20015.5 us, whereas the hardware security coprocessor takes only 39.2 us. Although the AES coprocessor spends less time on key expansion than our RC5-FKM coprocessor, its total time cost is higher than ours. The computation time on the ATmega128 is thus about 510 times that of the hardware security coprocessor. Our hardware security coprocessor is therefore more efficient than both the existing AES coprocessor and general-purpose processors such as the ATmega128, and it is better suited to wireless sensor node platforms, which are limited in computation.
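The ratio quoted above follows directly from the two total times. The short calculation below reproduces it, together with the equivalent percentage and the clock cycle counts implied by the stated 6 MHz clock. The variable names are illustrative, and the cycle counts assume that both devices run at exactly 6 MHz.

```python
t_mcu_us = 20015.5      # ATmega128, one encryption and decryption cycle
t_coproc_us = 39.2      # RC5-FKM hardware security coprocessor
clock_mhz = 6.0         # clock frequency used for the comparison

speedup = t_mcu_us / t_coproc_us
fraction_percent = 100.0 * t_coproc_us / t_mcu_us

# Microseconds times MHz gives clock cycles.
cycles_mcu = t_mcu_us * clock_mhz
cycles_coproc = t_coproc_us * clock_mhz

print(f"speedup: {speedup:.1f}x")
print(f"coprocessor time as share of MCU time: {fraction_percent:.2f}%")
print(f"cycles: MCU {cycles_mcu:.0f}, coprocessor {cycles_coproc:.0f}")
```

The coprocessor time is therefore roughly 0.2% of the microcontroller time, which is the same result expressed as a percentage.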
D. Logic cells usage evaluation
To evaluate the benefit of ROTL in the hardware security coprocessor, we estimate the hardware resource utilization of a common design and of our optimized design, which is realized on the development platform based on the Spartan-3E FPGA. In the common design, no logic cells are reused, whereas in the optimized design, ROTL is reused in each calculation state to reduce the hardware overhead. The comparison of hardware cell usage is shown in Table 2.
Table 2 Comparison of hardware cells usage
Because our design reuses adders and registers already present in the circuit, the additional hardware overhead of the optimized design is 5863 logic cells, which is 9.6% less than that of the common design. Moreover, since we use ROTL to improve the combinational logic cells in the security coprocessor, the maximum frequency is higher than one half of the common design. Therefore, given the low-cost and high-performance requirements of wireless sensor network nodes, our hardware security coprocessor is the more appropriate choice.
Conclusion
In this paper, we present the design, implementation and simulation of an effective hardware security coprocessor and a program protection mechanism based on system on chip (SoC) technology for WSN. Compared with existing work, our design has three distinguishing features. (1) The first is a hardware security coprocessor that allows the user to encrypt and decrypt data using the RC5 algorithm with a 128-bit key and initialization vector. It reuses a reusable optimized logic cell (ROTL), which minimizes the additional hardware overhead. (2) The second is a program protection mechanism that prevents the program data from being read out by system intruders, so as to improve the security of the sensor device. (3) The third is that the secret keys of the cryptographic coprocessor are built by a fingerprint-based key management module. The design is mapped onto an FPGA and an ASIC design. Results show that the hardware overhead of our design is 9.6% less than that of previous designs and that the execution time of our design is only 0.2% of that of general-purpose processors and shorter than that of other AES coprocessors.
