An FPGA implementation of the new Advanced Encryption Standard, Rijndael, was developed and experimentally tested using the Insight Development Kit board, based on the Xilinx Virtex II XC2V1000-4 device. The experimental clock frequency was 75 MHz, which translates to a throughput of 739 Mbit/s for Rijndael with a block size and a key size of 128 bits each. The circuit handles both encryption and decryption and fits in one FPGA, taking approximately 84 % of the area. Our work supplements and extends other research efforts [1] [2].
INTRODUCTION
In 1997, the National Institute of Standards and Technology (NIST) initiated an effort towards developing a new encryption standard, called the Advanced Encryption Standard (AES). The development of the new standard was organized in the form of a contest coordinated by NIST. In October 2000, Rijndael [3] was announced as the winner of the contest and the future Advanced Encryption Standard. Rijndael proved to be one of the fastest and most efficient algorithms. It is also easily implemented on a wide range of platforms and is extendable to other key and block lengths. This paper evaluates the Rijndael cipher implementation from the viewpoint of its hardware mapping into a high performance Xilinx FPGA. An FPGA implementation can be easily upgraded to incorporate any protocol changes without the need for the expensive and time consuming physical design, fabrication, and testing required in the case of ASICs. Our paper is organized as follows. A brief overview of the Rijndael cipher algorithm and its basic building blocks is given in Section 2. Section 3 outlines the design of the pipelined Rijndael implementation. Performance results and the test setup are given in Section 4. Finally, in Section 5, possible future work is described and concluding remarks are made.
Substitution is composed of sixteen identical S-boxes working in parallel. InvSubstitution is composed of the same number of inverse S-boxes. Each of these S-boxes can be implemented independently using a 256 x 8 bit lookup table.
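As an illustration only, the 256 x 8 bit lookup table can be sketched in software. The Python sketch below is not the Verilog design described in this paper; it builds the table from the standard AES definition (inversion in GF(2^8) followed by an affine transformation) and applies sixteen lookups to one 128 bit block. All function and variable names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    # The multiplicative group has order 255, so a^254 equals a^-1.
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8 bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Return the 256-entry S-box as a list of bytes (one 256 x 8 bit table)."""
    table = []
    for x in range(256):
        inv = gf_inv(x)
        # Affine transformation of the AES S-box.
        table.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                     ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return table


SBOX = build_sbox()

# The inverse table simply swaps the roles of index and value.
INV_SBOX = [0] * 256
for index, value in enumerate(SBOX):
    INV_SBOX[value] = index


def substitution(block):
    """Apply sixteen independent S-box lookups to a 16 byte block."""
    return bytes(SBOX[b] for b in block)


def inv_substitution(block):
    """Apply sixteen independent inverse S-box lookups to a 16 byte block."""
    return bytes(INV_SBOX[b] for b in block)
```

Because each byte is looked up independently, the sixteen lookups have no data dependencies between them, which is what allows them to work in parallel in hardware.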
ShiftRow and InvShiftRow change the order of bytes within a 16 byte (128 bit) word. Both transformations involve only changing the order of signals, and therefore they can be implemented using routing only, and do not require any logic resources, such as Configurable Logic Blocks (CLBs) or dedicated RAM.
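The routing-only nature of these two transformations can be made concrete with a short Python sketch, given here as an illustration rather than as part of the hardware design. It assumes the standard AES byte layout, in which byte index = row + 4 * column, and expresses each transformation as a fixed index permutation. The names SHIFT_ROW and INV_SHIFT_ROW are hypothetical.

```python
# Assumption: column-major state layout, byte index = row + 4 * column.
# Row r is rotated left by r positions (ShiftRow) or right by r (InvShiftRow).
# Each table entry gives the source index of the byte placed at that position.
SHIFT_ROW = [r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)]
INV_SHIFT_ROW = [r + 4 * ((c - r) % 4) for c in range(4) for r in range(4)]


def shift_row(block):
    """Reorder the 16 bytes of a block; no arithmetic is performed."""
    return bytes(block[src] for src in SHIFT_ROW)


def inv_shift_row(block):
    """Undo shift_row by applying the inverse permutation."""
    return bytes(block[src] for src in INV_SHIFT_ROW)
```

Since both tables are constants, the software permutation corresponds to a fixed rewiring of signals, which is consistent with the statement that no logic resources are needed.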
The MixColumn transformation as well as InvMixColumn can be expressed as a matrix multiplication in the Galois Field GF(2^8). The InvMixColumn transformation has a longer critical path compared to the MixColumn transformation, and therefore the entire decryption is more time consuming than encryption.
Figure 1: Architecture of the circuit.
KeyAddition is a bitwise XOR of two 128 bit words.
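The MixColumn, InvMixColumn, and KeyAddition transformations described above can be sketched in Python as follows. This is a software illustration under the standard AES definitions (reduction polynomial x^8 + x^4 + x^3 + x + 1 and the usual coefficient matrices), not the circuit used in this paper. All names are hypothetical.

```python
def xtime(a):
    """Multiply a byte by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# Standard AES coefficient matrices (each row is a rotation of the first).
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MIX = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def mul_column(matrix, column):
    """Multiply a 4x4 coefficient matrix by one 4 byte column in GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)  # addition in GF(2^8) is XOR
        out.append(acc)
    return bytes(out)


def mix_column(block):
    """Apply MixColumn to each of the four columns of a 16 byte block."""
    return b"".join(mul_column(MIX, block[i:i + 4]) for i in range(0, 16, 4))


def inv_mix_column(block):
    """Apply InvMixColumn to each of the four columns of a 16 byte block."""
    return b"".join(mul_column(INV_MIX, block[i:i + 4]) for i in range(0, 16, 4))


def key_addition(block, round_key):
    """Bitwise XOR of two 128 bit words; the operation is its own inverse."""
    return bytes(a ^ b for a, b in zip(block, round_key))
```

The inverse matrix uses the coefficients 9, 11, 13, and 14, each of which needs more chained doublings and XORs than the coefficients 2 and 3 of the forward matrix. This is one plausible reading of why InvMixColumn has the longer critical path.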
ARCHITECTURE OF THE ENCRYPTION/DECRYPTION
The organization of the hardware implementation of the circuit is shown in Figure 1. The organization includes the following units:
-EncryptionFSM and DecryptionFSM (Encryption and Decryption Finite State Machines), used to encipher and decipher input blocks of data.
-KeyScheduleFSM (Key Scheduling Finite State Machine), used to compute a set of internal cipher keys based on a single external key.
-RAMSubKeys (RAM memory of internal keys), used to store internal keys computed by the KeyScheduleFSM, or to load the initial key to the FPGA through the Key Entry Interface.
-InputFSM (Input Interface Finite State Machine), used to load blocks of input data and to store input blocks awaiting encryption/decryption.
-KeyEntryFSM (Key Entry Interface Finite State Machine), used to load the external key.
-OutputFSM (Output Interface Finite State Machine), used to temporarily store output from the encryption/decryption unit.
-MainFSM (Main Control Finite State Machine), used to generate control signals for all other units.
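The role of the key scheduling unit, which derives a set of internal keys from a single external key, can be illustrated in software. The Python sketch below follows the standard AES-128 key expansion and returns eleven 128 bit round keys in a list, which stands in for the RAM of internal keys. It is an assumption-laden illustration, not a description of the KeyScheduleFSM state machine itself, and all names are hypothetical.

```python
def xtime(a):
    """Multiply a byte by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """Build the AES S-box: GF(2^8) inversion followed by an affine map."""
    table = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 equals x^-1 in GF(2^8)
                inv = gf_mul(inv, x)
        s = inv
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        table.append(s ^ 0x63)
    return table


SBOX = build_sbox()


def expand_key(key):
    """Expand a 16 byte external key into eleven 16 byte round keys."""
    if len(key) != 16:
        raise ValueError("this sketch handles 128 bit keys only")
    words = [key[i:i + 4] for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]                  # rotate word by one byte
            temp = bytes(SBOX[b] for b in temp)         # S-box on each byte
            temp = bytes([temp[0] ^ rcon]) + temp[1:]   # add round constant
            rcon = xtime(rcon)
        words.append(bytes(a ^ b for a, b in zip(words[i - 4], temp)))
    # Group four 32 bit words into each 128 bit round key.
    return [b"".join(words[4 * r:4 * r + 4]) for r in range(11)]
```

A usage example: `expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))` returns a list whose first entry is the external key itself, followed by ten derived keys.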
The registers R1 through R3 allow the circuit to process data in parallel with an encryption or decryption.
Both the input and output channels are 16 bits wide. Therefore, several transfers are needed in order to read in the whole cipher or key (eight 16 bit words for 128 bits). It is important to store the initial key since it is used in the decryption process.
The implementation of the encryption and decryption Finite State Machines is shown in Figure 2 and Figure 3.
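The effect of the 16 bit wide channels on a 128 bit block can be sketched in Python. The word order on the bus (most significant word first, big-endian bytes within a word) is an assumption made for this illustration; the source does not specify it. The function names are hypothetical.

```python
CHANNEL_BITS = 16
BLOCK_BITS = 128
WORDS_PER_BLOCK = BLOCK_BITS // CHANNEL_BITS  # eight transfers per block


def to_bus_words(block):
    """Split a 16 byte block into eight 16 bit words (assumed big-endian)."""
    if len(block) * 8 != BLOCK_BITS:
        raise ValueError("expected a 128 bit block")
    return [int.from_bytes(block[i:i + 2], "big") for i in range(0, 16, 2)]


def from_bus_words(words):
    """Reassemble a 16 byte block from eight 16 bit words."""
    if len(words) != WORDS_PER_BLOCK:
        raise ValueError("expected eight 16 bit words")
    return b"".join(w.to_bytes(2, "big") for w in words)
```

The same splitting applies to a 128 bit key entering through the key entry interface.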

TEST SETUP
The Rijndael cipher was first described in VERILOG, and its description was verified using the VERILOG-XL simulator from Cadence Design Systems. Test vectors from the reference software implementations [4] were used for debugging and verification of the VERILOG code. The revised VERILOG code became the input to the Xilinx ISE Series 4.1i software [5], which performed the logic synthesis, mapping, placing, and routing. These tools generated reports describing the area and speed of the implementation, a netlist used for timing simulations, and a bitstream used to program the Virtex II XC2V1000-4 FPGA device.
... modules EncryptionFSM or DecryptionFSM for an encryption or decryption.
CONCLUSIONS
In this paper we have evaluated the Rijndael cipher from the point of view of its implementation in FPGA. The new architecture presented allows the implementation of the Rijndael cipher with high speed encryption and decryption. The experimental procedure demonstrated that a total encryption and decryption throughput of 739 Mbit/s can be achieved using a single FPGA device. Only up to 84 % of the resources of this single FPGA are required by all cryptographic modules. Future development will include integration of the modes of operation CBC, CFB, and OCB, which are considered secure for transmission of large volumes of data.
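The relation between the reported clock frequency and throughput can be checked with simple arithmetic. The Python sketch below assumes that throughput equals block size times clock frequency divided by the number of clock cycles per block. The cycle count is not stated in the text, so the value derived here is an inference from the rounded figures of 75 MHz and 739 Mbit/s, not a reported result. The function names are hypothetical.

```python
def throughput_mbit_s(block_bits, clock_mhz, cycles_per_block):
    """Throughput in Mbit/s for one block processed every cycles_per_block cycles."""
    return block_bits * clock_mhz / cycles_per_block


def implied_cycles_per_block(block_bits, clock_mhz, reported_mbit_s):
    """Cycles per block implied by a reported throughput (an inference only)."""
    return block_bits * clock_mhz / reported_mbit_s


# Figures taken from the text: 128 bit blocks, 75 MHz clock, 739 Mbit/s.
cycles = implied_cycles_per_block(128, 75, 739)
```

With these inputs the implied value is close to 13 clock cycles per 128 bit block; the small mismatch with the reported throughput would be consistent with the clock frequency having been rounded to 75 MHz.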
ACKNOWLEDGMENTS
This work was supported by G-Plus, Inc., which is a rapidly growing fabless semiconductor provider offering optimized RF solutions to enable broadband wireless networking and fixed wireless access system applications. 
