By encoding a message using crypto algorithms, users can make information transmitted over communication systems almost impossible to read, even if such information is intercepted for malicious purposes. It is fairly easy to implement crypto algorithms in software, but such algorithms are typically too slow for real-time applications such as storage devices, embedded systems, network routers, etc. For this reason, it becomes necessary to implement crypto algorithms in hardware.
Because our crypto processor supports various private and public key crypto algorithms, it can be applied to VPNs (Virtual Private Networks) and secure web servers that use the IPSec (IP Security) and SSL (Secure Sockets Layer) protocols. The IPSec and SSL protocols require the RSA (Rivest, Shamir, and Adleman), AES (Advanced Encryption Standard), and triple-DES (Data Encryption Standard) crypto algorithms for key exchange and data encryption [12]. We can also use our crypto processor for an IMT-2000 RNC (Radio Network Controller) switch, wireless applications, and various other security applications because it supports KASUMI [7], ECC (Elliptic Curve Cryptography), and other crypto algorithms.
In our design, we first implemented the crypto processor on an FPGA. After verifying the correct operation of the FPGA implementation, we implemented some of the crypto algorithms in an ASIC. We did not implement all of the crypto algorithms in our ASIC chip because of the area constraints of the selected fabrication process.
Our crypto processor has a RISC processor and coprocessor blocks dedicated to crypto algorithms. The dedicated crypto block permits fast execution of encryption, decryption, and key scheduling operations for the AES, KASUMI, SEED [16], and triple-DES [13] private key crypto algorithms and for the ECC and RSA public key crypto algorithms. In addition, the 32-bit RISC processor block can execute other crypto algorithms, including hash algorithms (SHA-1, MD5, etc.), and controls the dedicated crypto block and I/O buffers.
Various crypto algorithms can be programmed and executed in our crypto processor without interrupting the host node's work. If crypto-related operations were executed on the host node's CPU without a dedicated crypto processor, they would consume the processing power of the host and hence slow down the entire system. We have designed and implemented this crypto processor with the design philosophy of making certain crypto algorithms as fast as possible while providing reasonably high performance for other crypto algorithms. This paper is organized as follows. In Section II, the architecture of the crypto processor is briefly described; this includes the dedicated crypto blocks for AES, KASUMI, SEED, triple-DES, ECC, and RSA, and the 32-bit RISC processor. In Section III, the FPGA and VLSI design methodology of the crypto processor is described. In Section IV, the implementation results and performance evaluation of the crypto processor design are reported. Section V presents an example application of the crypto processor as a means of providing real-time data security for a storage device. Finally, concluding remarks are presented in Section VI.
II. THE CRYPTO PROCESSOR ARCHITECTURE
A. The architecture of the crypto processor
The block diagram of our crypto processor is shown in Fig. 1. This single-chip crypto processor has PCI interface logic, a dual-port memory, a memory controller, a register file, a datapath controller, and dedicated crypto blocks for the AES, KASUMI, SEED, triple-DES, ECC, and RSA crypto algorithms.
The 32-bit RISC type crypto controller controls the dedicated crypto block and performs the interface operations with external devices such as memory and an I/O bus interface controller. It can also execute various crypto algorithms such as MD5 and SHA-1 (hash algorithms) and other application programs such as a user authentication program and an IC card interface program.
The dedicated crypto block provides fast execution of the encryption, decryption, and key scheduling operations for the AES, KASUMI, SEED, and triple-DES algorithms, and enables fast scalar multiplication and exponentiation operations for the ECC and RSA crypto algorithms. The PCI interface logic permits our crypto processor to be easily applied in practical environments (any system that has a PCI I/O bus).
B. The dedicated crypto block for the private key crypto algorithm
1) AES crypto block
In September of 1997, the National Institute of Standards and Technology (NIST) issued a request for candidates for a new Advanced Encryption Standard (AES) to replace DES. In October of 2000, NIST announced the cipher Rijndael, developed by Joan Daemen and Vincent Rijmen, as the AES algorithm. Rijndael is a block cipher that supports 128-, 192-, and 256-bit input/output blocks and keys. The sizes of data blocks and keys can be chosen independently. The number of rounds depends on both of these parameters and is given in [2].
In this paper, 128 bits are assumed for both the I/O block and the user key. Therefore, the cipher in all configurations presented operates in $N_r = 10$ rounds. SubBytes: In this transformation, each byte of the State is replaced by its multiplicative inverse in $GF(2^8)$, followed by an affine mapping over $GF(2)$. In the decryption process, the inverse S-box is used. The inverse S-box is constructed by first applying the inverse of the affine transformation and then computing the multiplicative inverse in $GF(2^8)$. ShiftRows: In this transformation, the rows of the State are shifted cyclically to the left with different offsets [2] (Fig. 2(b)).
Fig. 3 shows the block diagram of the AES crypto block. It consists of an AES encryption/decryption core, a key generator, a register file, and control logic. The data path for encryption is as follows: MUXA → AddRoundKey → RegA_128 → S_Box (SubBytes) → SR for Enc (ShiftRows) → MixColumn (skipped in the final round) → MUXEnc → MUXB → MUXA. The data path for decryption is as follows: MUXA → AddRoundKey → InvMixCol (skipped in the first round) → MUXDec → SR for Dec (InvShiftRows) → SI Box (InvSubBytes) → MUXB → MUXA. The AES crypto block is a one-round based architecture. Although other architectures which use pipelining, sub-pipelining, and loop unrolling for high performance are possible [17], we have selected the simplest architecture because the AES algorithm can achieve high performance with it. For the other private key crypto blocks (KASUMI, SEED, and triple-DES), we have used sub-pipelining and pipelining techniques.
We have implemented the SubBytes block (S-box) with a ROM instead of calculating the multiplicative inverse and affine transform, for simplicity and high performance. We have used a 1 Kbyte (256 × 8 bits × 16/8 × 2) ROM for the S-box and SI-box in the AES crypto block. It requires eight repetitions of the S-box and SI-box lookups for encryption and decryption, respectively. Key values are precomputed and saved in REG_FILE so that they can be used for encryption and decryption without recalculating the key expansion.
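The S-box contents that such a ROM would hold can be generated from the definition given above (multiplicative inverse in $GF(2^8)$ followed by the affine mapping). The following is a minimal Python sketch, not the authors' implementation; the function names are illustrative, and the ROM-size line simply re-evaluates the expression 256 × 8 bits × 16/8 × 2 quoted in the text.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    result, power, exponent = 1, a, 254
    while exponent:
        if exponent & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exponent >>= 1
    return result if a else 0


def affine(x: int) -> int:
    """AES affine mapping over GF(2): x ^ rotl(x,1..4) ^ 0x63."""
    def rotl(v: int, n: int) -> int:
        return ((v << n) | (v >> (8 - n))) & 0xFF
    return x ^ rotl(x, 1) ^ rotl(x, 2) ^ rotl(x, 3) ^ rotl(x, 4) ^ 0x63


# Forward table (S-box) and inverse table (SI-box), 256 one-byte entries each.
SBOX = [affine(gf_inv(x)) for x in range(256)]
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value

# ROM size from the text: 256 entries x 8 bits, 16/8 = 2 parallel copies,
# times 2 for S-box plus SI-box, expressed in bytes.
ROM_BYTES = (256 * 8 // 8) * (16 // 8) * 2
```

Building the tables once and storing them trades memory for speed, which is the same trade-off the ROM-based design makes in hardware.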
... $X$ and $\{0e\}X$ can be found in [17].
2) KASUMI crypto block
KASUMI is a block cipher algorithm which has a Feistel structure and is based on the MISTY1 [18] algorithm. KASUMI is the core cipher of the confidentiality algorithm (f8) and the integrity algorithm (f9) in a 3GPP system [7]. Each round $i$ uses a round key $RK_i$, which consists of the subkey triplet $(KL_i, KO_i, KI_i)$. For odd rounds, the $f_i$ function is defined as
$$f_i(I, RK_i) = FO(FL(I, KL_i), KO_i, KI_i),$$
and for even rounds as
$$f_i(I, RK_i) = FL(FO(I, KO_i, KI_i), KL_i).$$
The result of the KASUMI algorithm is the 64-bit string $(L_8 \,\|\, R_8)$ produced at the end of the eighth round [7].
For the hardware implementation of the KASUMI algorithm, we have considered two architectures, a low power version (Type 1) and a high performance version (Type 2), which are applicable to the ME (Mobile Equipment) and the RNC (Radio Network Controller) switch in a 3GPP system, respectively. Fig. 5(a) shows the block diagram of the KASUMI crypto block for low power consumption. It has a simple but efficient architecture: the logic for one round of KASUMI is reused 8 times to complete the KASUMI operation. This architecture has low hardware complexity and low power consumption at the cost of performance.
For odd rounds, the data in Reg A is processed by the FL function and then the FO function with proper key values. The results are fed back to Reg B after an exclusive-OR (XOR) with the data in Reg B . The data path for this operation is depicted with solid lines in Fig.5(a) . For even rounds, the data in Reg B is processed by the FO function and then the FL function. The exclusive-OR of this result and the contents of Reg A are saved in Reg A .
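The iterated round structure described above (FL then FO in odd rounds, FO then FL in even rounds, with the result XORed into the other half) can be sketched in Python as follows. This is a structural illustration only: `toy_fl` and `toy_fo` are hypothetical stand-ins and are not the real KASUMI FL and FO functions, and `EXAMPLE_ROUND_KEYS` is made-up data, so the output is not KASUMI ciphertext.

```python
MASK32 = 0xFFFFFFFF


def toy_fl(x: int, kl: int) -> int:
    """Hypothetical stand-in for FL (NOT the real KASUMI FL): XOR then rotate left by 1."""
    v = (x ^ kl) & MASK32
    return ((v << 1) | (v >> 31)) & MASK32


def toy_fo(x: int, ko: int, ki: int) -> int:
    """Hypothetical stand-in for FO (NOT the real KASUMI FO): add then XOR."""
    return ((x + ko) & MASK32) ^ ki


def round_function(i: int, x: int, round_key: tuple) -> int:
    """f_i: FL then FO for odd rounds, FO then FL for even rounds."""
    kl, ko, ki = round_key
    if i % 2 == 1:
        return toy_fo(toy_fl(x, kl), ko, ki)
    return toy_fl(toy_fo(x, ko, ki), kl)


def feistel_encrypt(block64: int, round_keys: list) -> int:
    """Eight rounds of R_i = L_{i-1}, L_i = R_{i-1} XOR f_i(L_{i-1}, RK_i)."""
    left, right = block64 >> 32, block64 & MASK32
    for i in range(1, 9):
        left, right = right ^ round_function(i, left, round_keys[i - 1]), left
    return (left << 32) | right


def feistel_decrypt(block64: int, round_keys: list) -> int:
    """Undo the rounds in reverse order: L_{i-1} = R_i, R_{i-1} = L_i XOR f_i(R_i, RK_i)."""
    left, right = block64 >> 32, block64 & MASK32
    for i in range(8, 0, -1):
        left, right = right, left ^ round_function(i, right, round_keys[i - 1])
    return (left << 32) | right


# Made-up round keys (KL_i, KO_i, KI_i), for illustration only.
EXAMPLE_ROUND_KEYS = [
    (0x01234567 * (i + 1) & MASK32, 0x89ABCDEF ^ i, 0x0F1E2D3C + i) for i in range(8)
]
```

Because only one round is computed per loop iteration, the sketch mirrors the Type 1 architecture, where a single round of logic is reused eight times.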
To make the KASUMI algorithm applicable to an RNC switch, which requires high performance cryptographic operations, we have designed another architecture (Type 2) as shown in Fig. 5(b). It has a four-stage pipeline for one round of the hardware logic (the sub-pipelining technique in [17]). This architecture achieves high performance with low hardware complexity. The four-stage pipeline means that we can execute encryption (or decryption) operations for up to four messages simultaneously. This architecture is suitable for an RNC switch, which needs to handle a large amount of message traffic. Since the major delay is due to the FO function block (which is composed of three FI blocks, with each FI block composed of two S9 boxes and two S7 boxes), we have divided the FO block into four parts by inserting three pipeline registers (Fig. 5(b)). Also, one pipeline register is inserted in each FI block as shown in Fig. 6.
The number of pipeline registers and the positions of those registers are determined by calculating the critical timing delay of the KASUMI hardware and by considering tradeoffs between performance and hardware complexity.
The four-stage pipeline structure of the KASUMI crypto block results in a high operating frequency and a high throughput. Ideally, it executes 4 times faster than the Type 1 architecture. In order to avoid FL block conflicts with other data, we have inserted another FL block, $FL_2$, as shown in Fig. 5(b). For key scheduling in the KASUMI crypto block, we have developed an efficient on-the-fly key scheduling hardware block for the four-stage pipeline architecture. The scheduling block has registers to store constant values and key values, a hardwired right rotation block, XOR gates, and registers for pipeline synchronization. The key scheduling block is also extensible to any number of pipeline stages with the addition of a few key storage and synchronization registers [5].
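The ideal factor of four can be illustrated with a simple steady-state throughput model. This is a sketch under stated assumptions, not a measurement: the clock frequencies are hypothetical, the clock is assumed to scale exactly with the number of sub-pipeline stages, and the pipeline is assumed to be kept full with independent blocks.

```python
def throughput_mbps(block_bits: int, rounds: int, stages: int, clock_mhz: float) -> float:
    """Steady-state throughput of an iterated round with `stages` sub-pipeline stages.

    With the pipeline full, `stages` independent blocks are in flight and each
    needs rounds * stages clock cycles to complete.
    """
    cycles_per_batch = rounds * stages
    blocks_per_batch = stages
    return block_bits * blocks_per_batch * clock_mhz / cycles_per_batch


# Hypothetical clocks: the four-stage design is assumed to run at 4x the
# clock of the unpipelined round (ideal critical-path division).
TYPE1_MBPS = throughput_mbps(block_bits=64, rounds=8, stages=1, clock_mhz=20.0)
TYPE2_MBPS = throughput_mbps(block_bits=64, rounds=8, stages=4, clock_mhz=80.0)
SPEEDUP = TYPE2_MBPS / TYPE1_MBPS
```

In practice the gain is smaller than this ideal figure, since register overhead and uneven stage delays keep the clock from scaling perfectly.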
In general, there are two methods to implement the S-box in hardware: the LUT (Look Up Table) ROM design method and the combinational logic design method [11]. In the S-box implementation of the KASUMI crypto block, we have selected the combinational logic method instead of the ROM-based method. By implementing the KASUMI S-box using combinational logic, we can obtain a short delay time.
Up to now, we have described the details of the private key crypto blocks for AES and KASUMI. Since the hardware design techniques presented in the previous sections are also applicable to other private key crypto blocks (such as SEED and triple-DES), we will now briefly describe the architecture of the SEED and triple-DES crypto blocks in the following sections.
3) SEED crypto block
The SEED algorithm [16] is a block cipher that operates on 128-bit blocks of data and uses a 128-bit key. It has a 16-round Feistel structure built from permutations, rotations, and basic modular arithmetic operations such as modulo-2 addition (exclusive OR) and modulo-$2^{32}$ addition. As with other Feistel ciphers, the SEED algorithm has an F function, which takes a 64-bit data value and 64-bit key values. The 64-bit input block of the round function is divided into two 32-bit blocks and processed in 4 phases: a mixing phase with two 32-bit subkey blocks, and 3 layers of G functions with additions for mixing the two 32-bit blocks. The G function is composed of two layers of 8 × 8 S-boxes and permutation logic to provide good resistance against DC (Differential Cryptanalysis) and LC (Linear Cryptanalysis) attacks. More information on the SEED algorithm can be found in [16].
From a hardware implementation viewpoint, the SEED crypto algorithm is not an efficient algorithm. The G function and the modulo-$2^{32}$ addition logic in the F function and in the key scheduling logic make SEED slow. Furthermore, from a security viewpoint, the last layer of the G function in the F function is a redundant component because it does not provide any security strength (to provide security strength, the last layer of the G function should have an XOR operation with key values like the previous two G functions).
To implement the SEED crypto block, we have instantiated one round and divided it into 5 pipeline stages. That is, we have implemented the SEED crypto block with the sub-pipelining technique [17]. We have inserted two 64-bit pipeline registers into the F function and two 64-bit pipeline registers into the key generation logic as shown in Fig. 8. The pipeline register locations were chosen after scrutinizing the critical path of the SEED crypto block. The complexity of the G function (which consists of four 8 × 8 S-boxes and some combinational logic) and the long delay of the modulo-$2^{32}$ addition result in the high latency of the SEED crypto block.
In the SEED crypto block, the key values are generated simultaneously with the encryption or decryption process. The reasons for using on-the-fly key scheduling instead of the pre-computation method are as follows. First, the SEED crypto algorithm has a key scheduling structure that is well suited to on-the-fly key scheduling. Second, if we applied the pre-computation method to the SEED crypto block, it would require a large amount of key storage capacity because of its 5-stage pipeline.
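As a rough illustration of the second point, the following estimate assumes that each of the 5 blocks in flight may use a different 128-bit key, and that each of the 16 rounds needs a 64-bit round key (two 32-bit subkeys). The symbols $S_{\text{pre}}$, $N_{\text{stages}}$, $N_{\text{rounds}}$, and $W_{\text{rk}}$ are introduced here for the estimate only and do not appear in the original design description.

```latex
\begin{align*}
S_{\text{pre}} &= N_{\text{stages}} \times N_{\text{rounds}} \times W_{\text{rk}} \\
               &= 5 \times 16 \times 64\ \text{bits} \\
               &= 5120\ \text{bits} = 640\ \text{bytes}
\end{align*}
```

Under the same assumptions, on-the-fly scheduling only needs to hold the current key state for each block in flight, which is why it suits a deeply sub-pipelined round.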
4) Triple-DES crypto block
DES (Data Encryption Standard) [13] is a block cipher which uses a 64-bit key and operates on 64-bit blocks of data. Because every 8th bit of the 64-bit key is used for parity checking, DES has an effective key length of 56 bits. In the DES algorithm, there are 16 rounds of identical operations consisting of non-linear substitutions and permutations. In each round, a 48-bit subkey is generated, and substitutions using the S-boxes, bitwise shifts, and XOR operations are performed.
The 56-bit key length is relatively small by today's standards. For increased security, the DES operation can be performed three consecutive times, which expands the effective key length to 112 bits. Using DES in this manner is referred to as triple-DES [15]. In this section, we only describe the DES crypto block because the expansion to triple-DES is trivial. In each round $i$ ($1 \le i \le 16$), the 32-bit right and the 32-bit left halves of the data are processed in the following manner:
$$L_i = R_{i-1}, \qquad R_i = L_{i-1} \oplus F(R_{i-1}, K_i).$$
As shown in Fig. 9, the F function of the DES algorithm is composed of an expansion permutation table (block E), modulo-2 addition with the $i$-th subkey ($K_i$) or round key, substitution with the S-boxes, and permutation with the P table (block P).
We have implemented the DES crypto block with a 4-stage pipeline as shown in Fig. 10. This architecture has the advantage that 4 data-key pairs can be processed simultaneously. Thus, it achieves high performance at the expense of increased hardware overhead. The reasons for selecting the pipelining technique instead of sub-pipelining or one-round repetition are as follows. First, because the DES crypto block should be used as triple-DES (which requires three consecutive DES operations) for security reasons, using the sub-pipelining technique for the DES architecture would result in many clock cycles to complete the triple-DES operation. Second, the triple-DES crypto block can be implemented with low hardware complexity because of the hardware simplicity of the DES crypto block.
C. The dedicated crypto block for the public key crypto algorithm
1) ECC crypto block
The ECC crypto system was proposed independently in 1985 by Victor Miller and Neal Koblitz. The ECC crypto system is based on the difficulty of solving the elliptic curve discrete logarithm problem. In general, ECC has advantages over RSA in that it offers higher security per key bit, higher speed, lower power consumption, and better storage efficiency. For these reasons, ECC is particularly beneficial in applications with bandwidth, processor capacity, power availability, or storage constraints, such as IC cards and mobile devices [4], [10].
The elliptic curve used in our crypto processor is defined by a Weierstrass equation over $GF(2^m)$. The hierarchy of ECC arithmetic operations is shown in Fig. 11. ECC application protocols such as ECDH and ECDSA are performed using scalar multiplication at the highest level of the ECC crypto system. The scalar multiplication is done by repeated curve operations, such as point doubling and point addition, using an appropriate algorithm such as the binary method [12].
The most basic operation in ECC is the field operation. The addition and subtraction operations at the field level are simple XOR (modulo-2 addition) operations, and the squaring operation is also trivial (a simple cyclic shift of the coefficient vector). For multiplication, each coefficient $c_k$ of the product $C = A \cdot B$ can be computed easily using simple cyclic shifts of the vector representations of $A$ and $B$ [9]. For the inversion operation, we have used a recursive inversion algorithm based on Fermat's theorem, which is shown in [14]. In this section, we only describe the core of the inversion algorithm.
By Fermat's theorem, the inverse of a nonzero element $A$ of $GF(2^m)$ is $A^{-1} = A^{2^m - 2}$. The algorithm evaluates powers of the form $A^{2^r - 1}$ recursively; when $r$ is odd, we require an additional squaring and multiplication operation. More detailed algorithms can be found in [14]. The calculation of point addition and doubling requires the inversion, multiplication, squaring, and addition operations over the field $GF(2^m)$, as shown in Fig. 12.
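A recursive Fermat-based inversion of this kind can be sketched in Python as follows. This is a minimal illustration, not the algorithm of [14] verbatim: it uses a polynomial-basis bit-vector representation for simplicity (in the hardware, squaring is a cyclic shift, which this sketch does not model), and the small field $GF(2^8)$ with reduction polynomial 0x11B in the usage note is chosen only so the result is easy to check. All function names are illustrative.

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Multiply in GF(2^m); `poly` is the reduction polynomial including the x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a >> m:          # degree reached m: reduce
            a ^= poly
        b >>= 1
    return result


def pow_2r_minus_1(a: int, r: int, m: int, poly: int) -> int:
    """Return a^(2^r - 1) using the recursion on r described in the text."""
    if r == 1:
        return a
    half = r // 2
    base = pow_2r_minus_1(a, half, m, poly)      # a^(2^half - 1)
    t = base
    for _ in range(half):                        # raise to the power 2^half
        t = gf2m_mul(t, t, m, poly)
    t = gf2m_mul(t, base, m, poly)               # a^(2^(2*half) - 1)
    if r % 2 == 1:
        # Odd r: one additional squaring and one additional multiplication.
        t = gf2m_mul(gf2m_mul(t, t, m, poly), a, m, poly)
    return t


def gf2m_inv(a: int, m: int, poly: int) -> int:
    """Inverse via Fermat: a^(2^m - 2) = (a^(2^(m-1) - 1))^2."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    t = pow_2r_minus_1(a, m - 1, m, poly)
    return gf2m_mul(t, t, m, poly)
```

For example, `gf2m_inv(0x53, 8, 0x11B)` evaluates the recursion for $r = 7$, which takes the odd branch twice. The number of multiplications grows only logarithmically with $m$, which is what makes this approach attractive when squaring is cheap.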
Since our ECC crypto block is implemented at the level of scalar multiplication, we can use our ECC crypto block directly for ECC applications such as ECDH and ECDSA. To perform scalar multiplication, we have used the binary method [4] .
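The binary method mentioned above can be sketched generically in Python. The group operations are passed in as functions so that the control flow is visible without a full curve implementation; the integers modulo 101 under addition used in the usage note are only a toy stand-in for elliptic curve points, and all names here are illustrative rather than taken from the design.

```python
from typing import Callable, TypeVar

Point = TypeVar("Point")


def scalar_multiply(
    k: int,
    point: Point,
    add: Callable[[Point, Point], Point],
    double: Callable[[Point], Point],
    identity: Point,
) -> Point:
    """Left-to-right binary method: one doubling per bit of k, one addition per 1 bit."""
    result = identity
    for bit in bin(k)[2:]:
        result = double(result)
        if bit == "1":
            result = add(result, point)
    return result


# Toy stand-in group (NOT an elliptic curve): integers modulo 101 under addition.
TOY_MODULUS = 101


def toy_add(p: int, q: int) -> int:
    return (p + q) % TOY_MODULUS


def toy_double(p: int) -> int:
    return (2 * p) % TOY_MODULUS
```

With the toy group, `scalar_multiply(45, 7, toy_add, toy_double, 0)` returns `(45 * 7) % 101`. In the ECC crypto block the same loop would drive the point doubling and point addition units.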
2) RSA crypto block
The RSA public key crypto system was invented in 1978 by Rivest et al. [19] , and it is now the most widely deployed public key crypto system. It is used for securing web traffic and e-mail in the SSL protocol. The security of the RSA cryptosystem depends on the difficulty of factoring a large integer, the published modulus value.
In this section, we describe the core algorithm and its implementation in our crypto processor architecture (the details of the RSA algorithm can be found in [12], [15]). The core arithmetic operation in RSA is exponentiation, which is accomplished by a series of modular multiplications. Therefore, fast modular multiplication is key to achieving fast execution of RSA cryptosystems.
Contrary to a classical modular multiplication algorithm, the Montgomery multiplication algorithm does not need the division operation [1]. This method is based on an ingenious representation of the residue class modulo $m$, and it replaces division by $m$ with division by a power of 2 (if we use $R = 2^n$). The single-precision Montgomery multiplication algorithm is shown in Fig. 13. Step 5 is a $k$ multiple which is equivalent to Step 2. Thus, the Montgomery multiplication can be computed without a complex division operation if we select $R = 2^n$.
Fig. 14 shows the multiple-precision Montgomery multiplication algorithm. Since we have implemented 1024-bit RSA in our crypto processor, the value of $n$ is 1024 and $x$, $y$, and $m$ are represented in multiple precision. This algorithm is well suited to hardware implementation because it is composed of simple operations such as an $n$-bit by one-bit multiplication ($x_i y$ and $u_i m$), a shift operation (division by 2), and addition. Fig. 15 shows the hardware block diagram for the multiple-precision Montgomery multiplication algorithm. It is composed of input registers (for $x$, $y$, $m$, and $m'$), a $u_i$ calculation block, a Montgomery multiplication core block, and control logic. The $x_i y$ and $u_i m$ multiplications are implemented equivalently with multiplexers (MUX1 and MUX2), and division by 2 is implemented with a shifter. We have used a CSA (Carry Save Adder) for the addition operations.
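The bit-serial structure described above (select $x_i y$, select $u_i m$, add, shift right by one) can be sketched in Python as follows. This is a minimal radix-2 illustration under the assumptions that $m$ is odd, $m < 2^n$, and $0 \le x, y < m$; it is not a transcription of Figs. 13 to 15, which are not reproduced here, and the function names and the small modulus in the usage note are illustrative.

```python
def mont_mul(x: int, y: int, m: int, n: int) -> int:
    """Return x * y * 2^(-n) mod m for odd m < 2^n and 0 <= x, y < m."""
    a = 0
    for i in range(n):
        x_i = (x >> i) & 1
        # For odd m, m' = -m^(-1) mod 2 = 1, so u_i is just the parity of a + x_i*y.
        u_i = (a + x_i * y) & 1
        # x_i*y and u_i*m are 0-or-operand selections (multiplexers in hardware);
        # the sum is always even, so the shift is an exact division by 2.
        a = (a + x_i * y + u_i * m) >> 1
    return a - m if a >= m else a


def mont_exp(x: int, e: int, m: int, n: int) -> int:
    """Compute x^e mod m by square-and-multiply on Montgomery representations."""
    r = 1 << n
    x_bar = (x * r) % m          # Montgomery form of x
    acc = r % m                  # Montgomery form of 1
    for bit in bin(e)[2:]:
        acc = mont_mul(acc, acc, m, n)
        if bit == "1":
            acc = mont_mul(acc, x_bar, m, n)
    return mont_mul(acc, 1, m, n)   # convert back from Montgomery form
```

For example, with the odd modulus 1000003 and $n = 20$, `mont_exp(123456, 65537, 1000003, 20)` agrees with Python's built-in `pow(123456, 65537, 1000003)`. The conversions into and out of Montgomery form are done once per exponentiation, so their cost is amortized over all the multiplications in the loop.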
D. The 32-bit RISC processor block
In our crypto processor, we have used the ARM7 processor, a 32-bit RISC type processor block with a three-stage pipeline [3]. It controls the operation of the dedicated crypto block during encryption, decryption, and key scheduling, and also performs the operations required to interface with external devices such as the input buffer, output buffer, memory, and IC card interface logic. Since the RISC processor block is fully programmable, it can execute various crypto algorithms, protocols, and application programs with a high degree of freedom. This programmability makes the crypto processor applicable to embedded systems which require standalone programmability.
The 32-bit RISC processor block has features (such as a barrel shifter, a Booth multiplier block, a register file, and a 16-bit and 32-bit data memory architecture) that enable it to achieve high performance and memory savings when executing crypto algorithms. The 32-bit barrel shifter shifts or rotates its input data by any amount and produces its output within a fixed time period. It has associated logic to allow values to be arithmetically shifted or rotated through the carry bit. The barrel shifter boosts the performance of crypto algorithms that require multiple-bit shift operations, as most symmetric key crypto algorithms do. The Booth multiplier block assists in the implementation of the multiply and multiply-and-add instructions. These instructions are useful for implementing hash algorithms.
III. FPGA AND VLSI DESIGN METHODOLOGY
Our crypto processor was modeled in VHDL (VHSIC Hardware Description Language) and then implemented as an ASIC chip after verification with an FPGA implementation. Modeling the processor in VHDL facilitates quick prototyping and modification of the target design while considering various possible trade-offs between implementations of the crypto algorithms with differing speed and area characteristics.
After verifying the functionality and performance of the crypto processor, we implemented it as an ASIC. The target process technology is 0.5 µm CMOS technology.
IV. PERFORMANCE EVALUATION OF THE CRYPTO PROCESSOR

A. FPGA Implementation
In previous sections, we have described the architecture and design methodology for our crypto processor. In this section, we present and analyze the implementation results of our crypto processor. Tables I and II show the architectural characteristics of the crypto blocks in our crypto processor. The AES crypto block was implemented with a one-round based architecture and its S-boxes were implemented with the FPGA's internal memory. We have selected the sub-pipelining technique for the KASUMI and SEED crypto blocks, and the pipelining technique for the triple-DES crypto block. The S-box of the KASUMI crypto block was implemented with combinational logic due to the constraints of the target FPGA's memory size. We have used only the pipelining technique for triple-DES because triple-DES requires too many rounds (16 × 3). If we had implemented triple-DES with a sub-pipelining technique for high performance, we would have had to wait much longer to get the result.
Table II shows the architectural characteristics of the 146-bit ECC and 1024-bit RSA crypto blocks. We can perform 2048-bit RSA operations by applying the Chinese Remainder Theorem to the 1024-bit RSA crypto block. Because the ECC crypto block is designed at the scalar multiplication level and the RSA crypto block at the exponentiation level, they provide the user with an easy interface to applications such as ECDH, ECDSA, and RSA encryption/decryption. The ECC crypto block has field operation units (addition, multiplication, squaring, and inversion) and curve operation units (point doubling and addition) as its internal blocks.
Tables III and IV show the features of our crypto blocks when implemented using an FPGA. Although the AES crypto block was implemented with a one-round based architecture, it achieves 390 Mbps, which is superior to the SEED and triple-DES crypto blocks, which were implemented with the sub-pipelining and pipelining techniques, respectively.
The logic size of the AES crypto block is rather high because its S-boxes and SI-boxes are implemented in the FPGA's internal memory (only two S-boxes and two SI-boxes are implemented). It is also possible to reduce the logic size if we sacrifice performance.
The KASUMI crypto block (Type 2, the high performance version) achieves the highest performance as shown in Table III. This is because KASUMI has a highly parallelizable encryption/decryption body and key scheduling block as described in the preceding section. When we compare KASUMI and SEED, which are implemented with a similar architecture (sub-pipelining), we can easily see that KASUMI is superior to SEED in operating frequency, hardware complexity, and performance. Since the confidentiality algorithm f8 and the integrity algorithm f9 in a 3GPP system are simple applications based on the KASUMI crypto algorithm [7], we can easily implement the f8/f9 algorithms with low overhead.
The SEED crypto block achieves only 358 Mbps in spite of its 5-stage sub-pipeline. This is due to its architectural inefficiency; the G function and the modulo-$2^{32}$ adder, which are the core of the SEED crypto algorithm, produce a long delay time. The SEED crypto block also has high hardware complexity when compared to other crypto blocks with a similar architecture, such as KASUMI and triple-DES. The triple-DES crypto block shows the worst performance among our crypto blocks. This is because triple-DES requires 48 round operations (three repetitions of DES) to complete its encryption/decryption operation.
Table IV shows the characteristics of the ECC and RSA crypto blocks implemented in the FPGA chip. The public key crypto blocks operate in the range of about 28 to 50 MHz. The execution time of the ECC crypto block is 7.28 ms, which corresponds to a throughput of about 20 Kbps. The execution time of the ECC crypto block is the time for computing the scalar multiplication $kP$, where $P$ is defined over $GF(2^{146})$ and $k$ is a random 146-bit value. The execution time of the RSA crypto block is 6.69 ms for encryption and 58.9 ms for decryption, which correspond to throughputs of 153 Kbps and 17 Kbps, respectively. The critical factor in the performance of the ECC crypto block is the efficiency of the multiplicative inversion algorithm and the scalar multiplication algorithm. The critical path in our RSA crypto block is due to the 1024-bit addition block. Since there are other algorithms that achieve better performance than those used in our crypto blocks [1], [4], we will devise more efficient algorithms when implementing the next version of our crypto processor.
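The public key throughput figures quoted above follow directly from the operand size and the execution time. The short Python check below assumes that throughput is defined as operand bits processed per operation (146 bits for the ECC scalar, 1024 bits for an RSA block); the function and constant names are illustrative.

```python
def throughput_kbps(operand_bits: int, time_ms: float) -> float:
    """Bits per millisecond, which is numerically equal to kilobits per second."""
    return operand_bits / time_ms


ECC_KBPS = throughput_kbps(146, 7.28)          # 146-bit scalar multiplication
RSA_ENC_KBPS = throughput_kbps(1024, 6.69)     # 1024-bit RSA encryption
RSA_DEC_KBPS = throughput_kbps(1024, 58.9)     # 1024-bit RSA decryption

print(f"ECC: {ECC_KBPS:.1f} Kbps, RSA enc: {RSA_ENC_KBPS:.1f} Kbps, "
      f"RSA dec: {RSA_DEC_KBPS:.1f} Kbps")
```

Rounded to the nearest integer, the three values are 20, 153, and 17 Kbps, matching the figures in the text.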
B. ASIC Implementation
After verification of our design with an FPGA implementation, we have laid out and fabricated the crypto processor using 0.5 µm CMOS technology. Fig. 16 shows a photograph of the crypto processor, and Table V summarizes the main features of the crypto processor. Note that a photograph of the layout is not presented, as the circuit was synthesized using a standard cell library (the layout picture is not very meaningful as it simply consists of a few rectangular boxes). Also, in the ASIC version of our crypto processor, we have implemented the S-boxes of the crypto blocks with combinational logic (and not with ROMs) because the target process (0.5 µm CMOS) does not support embedded ROMs.
V. A CRYPTO PROCESSOR APPLICATION: REAL-TIME DATA SECURITY FOR A STORAGE DEVICE
To evaluate the usability of the crypto processor, we have developed an RTDS (Real Time Data Security) system for storage devices. The RTDS system is composed of control and monitoring software with a GUI (Graphical User Interface) environment, a device driver, and an RTDS board. Fig. 17 shows the block diagram of the RTDS system, and Fig. 18 shows a photograph of the RTDS board with the crypto processor.
In order to make a section of the hard disk area secure, a user can configure a specific directory to be a secure directory by using the control and monitoring software. Data which is written (read) to (from) the secure hard disk area is automatically encrypted (decrypted) by the crypto processor in real-time.
1. A user process wants to write data into the secure area of a hard disk (a).
2. The CPU reads data from a certain area of the memory and sends it to the hard disk via the I/O bus (b).
3. The device driver, which is a part of the RTDS system, catches the hard disk write event and forwards the data to the crypto processor (c).
4. In the crypto processor, an encryption task is performed in real time (d).
5. The crypto processor, having completed its encryption task, sends the encrypted data to the hard disk (e).
6. The hard disk receives the encrypted data and completes the write procedure (f).
The RTDS board, shown in Fig. 18, is mainly composed of a PCI interface controller, an SRAM buffer, an FPGA chip, and an ASIC chip. The throughput of the crypto processor and of the PCI interface controller is high (267 to 568 Mbps and 1056 Mbps, respectively), whereas the hard disk used in our test environment is comparatively slow, with an average access time of 12 ms. Therefore, the RTDS system operates in real time.
VI. CONCLUSIONS AND FUTURE WORK
In this paper, we have presented the design and implementation of a crypto processor composed of a 32-bit RISC processor and coprocessor blocks dedicated to the AES, KASUMI, SEED, triple-DES, ECC, and RSA crypto algorithms. The dedicated block of the crypto processor accelerates private and public key crypto algorithms, and the programmability of the crypto controller enables fast execution of various security applications (such as SHA-1 and protocol processing). Some parts of the crypto processor were also implemented as an ASIC chip using 0.5 µm CMOS technology after verification with an FPGA implementation. Simulations, formal verification, and static timing analysis were used to fully verify the ASIC design before fabrication.
The crypto processor was evaluated by constructing an RTDS (Real-Time Data Security) system for storage devices.
This application board was used to thoroughly test and verify the functionality of the crypto processor. The crypto processor in the RTDS system performs data encryption and decryption in real time. The high performance and high flexibility of the crypto processor design make it applicable to various security applications such as storage devices, embedded systems, network routers, and security gateways for IPSec and SSL protocol processing.
For future work, we plan to develop additional high performance public key crypto blocks. Also, to enhance the security of our crypto processor, we will devise techniques for resisting side channel attacks in the private and public key crypto blocks.
