Abstract - This paper proposes an effective FPGA implementation for data storage encryption. Throughput, area and power consumption are the most important parameters for evaluating any FPGA design. Our proposed design achieves not only a high throughput but also a throughput/area ratio that compares favorably with current implementations. The proposed implementation uses a memory-based pipelined architecture. The design achieves a throughput of 5.25 Gb/s, which is higher than that of other FPGA implementations reported to date. Xilinx ISE 10.1 is used as the design tool, and the design is coded in Verilog HDL.
I. INTRODUCTION
Hardware and software are two approaches to implementing cryptographic algorithms. Software-based encryption is considered economically feasible, scalable and flexible, and it works across different hard disk platforms and types, which eases management [1]; however, it consumes more power, is relatively slow [1] and is not fully secure [2]. Hardware-based encryption is further classified into two groups: controller-based encryption [3] and internal hard disk encryption, also called discryption [1, 2]. The main drawback of controller-based encryption is that, because encryption takes place outside the storage device, the ciphertext is easily accessible to a man-in-the-middle. Discryption, on the other hand, has several advantages over controller-based and software-based encryption: it costs less to implement than external encryption modules, consumes less power than software encryption because it relies on dedicated, optimized hardware, and offers better security [2].
Hardware implementations can be realized using two major approaches: Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). FPGAs are considered highly effective in terms of time-to-market and overall cost compared with ASIC designs. Parallelization in block ciphers can be achieved by different methods, such as multiprocessing and pipelining. Pipelining yields higher throughput, but at the cost of increased area. Additionally, modern FPGAs contain embedded higher-level components, such as memory blocks, multipliers, multiplier-accumulators, and even microprocessor cores, which help achieve not only a high throughput but also a smaller area footprint [4]. Our work mainly focuses on achieving acceptable throughput for the encryption of secondary storage devices. Several methods can be employed; the one we adopt exploits the Digital Clock Manager (DCM), an FPGA feature that divides the input clock into two [5]. Our implementation uses two cascaded DCMs to realize the memory-based pipelined architecture. Furthermore, our design generates the keys on the fly and precomputes them, thus minimizing the overall latency of the design.
The rest of the paper is organized as follows. Section II describes the AES-XTS algorithm, Section III presents the proposed architecture, Section IV discusses the results, and Section V concludes the paper.
II. AES-XTS ALGORITHM
NIST has currently approved nine modes of operation. XTS is one of them; it was designed for block-oriented storage devices and approved recently [6]. XTS-AES provides confidentiality for data on block-oriented storage devices but does not provide authentication. This mode also works within the constraints of hard disks while retaining the security that the AES algorithm provides [7].
AES-XTS is a tweakable block cipher that encrypts data in units of 128 bits or multiples thereof, and it uses the Advanced Encryption Standard (AES) block cipher as a subroutine. XTS-AES addresses the threats of ciphertext manipulation and copy-and-paste attacks [7], while allowing parallelization and pipelining in cipher implementations. By April 2009, this mode was supported by 14 vendors [8]. Fig. 1 shows the block diagram of XTS-AES [9].
The XTS-AES encryption procedure for a single 128-bit block is modeled as

$$C \leftarrow \text{XTS-AES-blockEnc}(Key, P, i, j)$$

where
Key is the 256- or 512-bit XTS-AES key,
P is a block of 128 bits (i.e., the plaintext),
i is the value of the 128-bit tweak,
j is the sequential number of the 128-bit block inside the data unit, and
C is the resulting 128-bit ciphertext block.
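Following IEEE Std 1619, which specifies XTS-AES, the single-block operation can be expanded into the steps below. Key is split into two equal halves, Key_1 (used for the data) and Key_2 (used for the tweak), which appear to correspond to Key I and Key II in Section III; ⊗ denotes multiplication in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1, and α is the primitive element x. This restates the standard and is not a description of the paper's hardware mapping.

$$
\begin{aligned}
T  &\leftarrow \text{AES-enc}(Key_2, i) \otimes \alpha^{j} \\
PP &\leftarrow P \oplus T \\
CC &\leftarrow \text{AES-enc}(Key_1, PP) \\
C  &\leftarrow CC \oplus T
\end{aligned}
$$

The same T is XORed both before and after the inner AES call, and only the α^j factor changes from block to block within a data unit.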
A. AES Encryption process
The AES encryption process consists of the following steps, described in [10].
Key Expansion: In this step, the master key is expanded into a total of 11 sub-keys of 16 bytes each (for a 128-bit key). The first key is called the initial key and the remaining keys are called round keys.
AddRoundKey: This simple step performs an exclusive-OR operation between the state matrix and the sub-key of each round.
Byte Substitution: This is a nonlinear transformation that operates on each byte of the state independently using the S-box table.
ShiftRows: The first row is not shifted, while the second, third and fourth rows are cyclically shifted to the left by one, two and three bytes, respectively.
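MixColumns: This transformation operates on the state matrix column by column, each column being 32 bits (four bytes) that are multiplied by a fixed matrix over GF(2^8). It is omitted in the final round.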
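The steps above can be sketched in software as a minimal AES-128 model following FIPS-197. This is an illustrative Python sketch, not the paper's Verilog design: it is not constant-time, the S-box is computed from its definition (GF(2^8) inverse plus affine map) instead of being tabulated, and the function names (`expand_key`, `encrypt_block`, etc.) are hypothetical.

```python
# Minimal AES-128 sketch following FIPS-197. Illustrative only:
# pure Python, not constant-time, and not the paper's hardware design.


def xtime(b: int) -> int:
    """Multiply a byte by x (0x02) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox() -> list[int]:
    """S-box = affine transform of the multiplicative inverse in GF(2^8)."""
    sbox = []
    for x in range(256):
        # Brute-force inverse; 0 has no inverse and maps to 0 by convention.
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):  # XOR in the inverse rotated left by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key(key: bytes) -> list[list[int]]:
    """Key Expansion: 16-byte key -> 11 round keys of 16 bytes (column-major)."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= RCON[i // 4 - 1]
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def add_round_key(state: list[int], rk: list[int]) -> list[int]:
    return [s ^ k for s, k in zip(state, rk)]


def sub_bytes(state: list[int]) -> list[int]:
    return [SBOX[b] for b in state]


def shift_rows(state: list[int]) -> list[int]:
    # state[r + 4*c] is row r, column c; row r is rotated left by r positions.
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state: list[int]) -> list[int]:
    out = []
    for c in range(4):
        a0, a1, a2, a3 = state[4 * c:4 * c + 4]
        out += [
            gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
            a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
            a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
            gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
        ]
    return out


def encrypt_block(round_keys: list[list[int]], block: bytes) -> bytes:
    """Encrypt one 16-byte block with precomputed round keys (10 rounds)."""
    state = add_round_key(list(block), round_keys[0])
    for rnd in range(1, 10):
        state = add_round_key(mix_columns(shift_rows(sub_bytes(state))), round_keys[rnd])
    # The final round omits MixColumns.
    state = add_round_key(shift_rows(sub_bytes(state)), round_keys[10])
    return bytes(state)
```

Taking precomputed round keys as an argument, rather than expanding the key inside `encrypt_block`, mirrors the idea of computing the key schedule once per key load instead of once per block.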
III. AES-XTS MEMORY BASED PIPELINED ARCHITECTURE
One of the important requirements of discryption is that data must be encrypted in real time before being written to secondary storage [1]. This control is provided by a state machine. Both Key I and Key II are supplied in 16-bit portions and can be changed dynamically at any time. If the key remains unchanged for the data units of a particular disk sector, it is loaded only once. Encryption of the tweak value starts as soon as Key II is loaded.
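The key interface can be modeled in a few lines of Python. This is a hypothetical software sketch, not the paper's state machine: it assumes each key half is 128 bits (consistent with the 11 AES-128 sub-keys), that it arrives as eight 16-bit words with the first word most significant, and that the key register is reloaded only when its value changes. The names `words_to_key` and `KeyLoader` are illustrative.

```python
# Hypothetical model of the key interface: the 16-bit word width comes from
# the text; the names, the 128-bit key size and the word order are assumptions.

KEY_BITS = 128
WORD_BITS = 16


def words_to_key(words: list[int]) -> int:
    """Assemble a 128-bit key from 16-bit words, first word most significant."""
    if len(words) != KEY_BITS // WORD_BITS:
        raise ValueError("expected eight 16-bit words")
    key = 0
    for w in words:
        if not 0 <= w < (1 << WORD_BITS):
            raise ValueError("word out of 16-bit range")
        key = (key << WORD_BITS) | w  # shift earlier words up, append the new one
    return key


class KeyLoader:
    """Reloads the key register only when the supplied key differs from the stored one."""

    def __init__(self) -> None:
        self.current = None  # last loaded key as an int, or None before the first load
        self.loads = 0       # number of actual register loads performed

    def supply(self, words: list[int]) -> bool:
        """Offer a key; return True if it had to be loaded, False if unchanged."""
        key = words_to_key(words)
        if key == self.current:
            return False  # unchanged key: no reload and no new key schedule needed
        self.current = key
        self.loads += 1
        return True
```

For example, calling `supply` with the same eight words for every block of a sector results in `loads == 1`, matching the single load described above for an unchanged key.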
Since the tweak value must be encrypted with AES, the AES implementation follows a pipelined approach. The encrypted tweak value is then used for two purposes: it is XORed with the plaintext, and it is also multiplied by α^j (a modular multiplication in GF(2^128)). This saves one cycle, since the modular multiplication of the tweak value is computed during the exclusive-OR operation. Each data unit within a sector is assigned a tweak value during the encryption process, which is a fundamental requirement for disk encryption devices. Fig. 2 shows the diagram of our memory-based pipelined architecture.
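Multiplication by α is what turns the encrypted tweak into a per-block value. The Python sketch below follows the IEEE Std 1619 convention (byte 0 least significant, reduction by x^128 + x^7 + x^2 + x + 1) and shows that one multiplication by α is just a 1-bit shift across the 16 bytes plus a conditional XOR with 0x87, which is why it is cheap enough to overlap with the plaintext XOR. The function names are hypothetical, and the model says nothing about the paper's actual cycle timing.

```python
# Sketch of the XTS tweak update T_j = T_0 * alpha^j in GF(2^128),
# following the IEEE Std 1619 byte ordering (names are hypothetical).


def mul_alpha(t: bytes) -> bytes:
    """Multiply a 16-byte tweak by alpha (x) in GF(2^128).

    Byte 0 is least significant. The 128-bit value is shifted left by one
    bit; a carry out of bit 127 is reduced using x^128 = x^7 + x^2 + x + 1,
    i.e. 0x87 is XORed into byte 0.
    """
    if len(t) != 16:
        raise ValueError("tweak must be 16 bytes")
    out = bytearray(16)
    carry = 0
    for k in range(16):
        out[k] = ((t[k] << 1) & 0xFF) | carry  # shift byte, bring in carry from byte k-1
        carry = t[k] >> 7                      # top bit moves into byte k+1
    if carry:
        out[0] ^= 0x87
    return bytes(out)


def tweak_sequence(t0: bytes, n: int) -> list[bytes]:
    """Return [T_0, T_0*alpha, ..., T_0*alpha^(n-1)] for n blocks of a data unit.

    t0 is the encrypted tweak, i.e. AES-enc(Key_2, i).
    """
    tweaks = []
    t = t0
    for _ in range(n):
        tweaks.append(t)
        t = mul_alpha(t)  # next block's tweak; independent of the plaintext
    return tweaks
```

Because each T_{j+1} depends only on T_j and not on any plaintext, the next tweak can be prepared while the current block is being XORed, which is consistent with the one-cycle saving described above.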
Once the tweak value is encrypted, the successive plaintext blocks within the sector are encrypted in a pipelined fashion. Hence there is an initial latency made up of the tweak value encryption and the data loading cycles. Once the pipeline is full, successive blocks are encrypted one after another. For this implementation, we use registered inputs and outputs for data.

In this paper, we have proposed an efficient solution for the cryptographic protection of storage devices. The memory-based pipelined implementation achieves not only a considerable throughput but also good efficiency. Our design achieves a throughput of 5.25 Gb/s while consuming only 573 slices and 9 Block RAMs, outperforming previous works reported to date.
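As a worked check of the throughput/area claim, the reported figures give the following ratio, assuming area is counted in slices only (the 9 Block RAMs are not reflected in this number):

$$
\frac{\text{Throughput}}{\text{Area}} = \frac{5.25\ \text{Gb/s}}{573\ \text{slices}} = \frac{5250\ \text{Mb/s}}{573\ \text{slices}} \approx 9.16\ \text{Mb/s per slice}
$$

When comparing against other designs, the same slice-only convention should be used, since designs that trade Block RAMs for logic would otherwise appear artificially better or worse.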
