Modern cryptography seeks to guarantee the confidentiality of information and to prevent unauthorized parties from accessing it. These principles can be applied in portable devices that must protect the information they store and process. Such applications impose design trade-offs that are met by combining high-performance hardware with light-weight algorithms, in this case the PRESENT algorithm. PRESENT is a relatively new light-weight block-based encryption scheme that has not been broken to date, with features that make it appealing for implementation on reconfigurable architectures such as FPGA. This work presents the study, design, implementation and testing of this encryption algorithm.
Introduction
The need to transmit data securely poses different scenarios and requirements in every type of application. In each case, the two key elements that are integrated in the design and that determine reliability and robustness are hardware and software. In wireless applications with low data transfer rates (throughput), the hardware tends to be dedicated and low-cost. These applications, which are the focus of this investigation, require embedded hardware with low computational capacity (light-weight) [2] [3] [4], based either on an embedded microprocessor development platform [5, 6] or on programmable logic devices such as FPGA [7] [8] [9].
Block-based encryptors [10, 11] are one of the most widely used encryption schemes [12], and a very large number of algorithms with different features can be found. Most of them have been implemented on programmable logic devices [9, 11, 13, 14]. One of these algorithms is AES (Advanced Encryption Standard), a mandatory reference since it is the block-based encryption standard [5, 13, 15], alongside many others listed in numerous works that compare their characteristics and metrics [1, 11, 16]. Another mandatory reference is elliptic curve cryptography [17, 18], due to its background and the need to give the project a solid mathematical foundation. With this in mind, and knowing that such algorithms can be implemented with different techniques and devices, this research aims to implement a block-based encryptor that complies with the light-weight philosophy and is easy to realize in hardware. The PRESENT algorithm [4, 8, 9, 16, 19] is therefore chosen for hardware implementation, after which its main metrics are measured.
Procedures implemented in hardware provide fast solutions for applications where data traffic is higher and real-time encryption is required [14]. Programmable logic devices are an excellent alternative, since they are reconfigurable and their concurrency makes them well suited to massive and complex processes with good response times [4, 8, 9].
The PRESENT algorithm is chosen specifically because it has been standardized and accepted by the entities that regulate cryptography worldwide [20, 21], and because it is a relatively new algorithm that has not been broken so far. Furthermore, it offers a point of comparison with another block-based encryption algorithm from a previous implementation [5], which was published at an international conference.
The PRESENT algorithm is a symmetric encryption method of a thoroughly minimalist nature that uses substitution and permutation blocks built on only 4 bits. This structure allows it to be implemented in devices with little hardware, makes efficient use of resources and is easy to implement in processors with a narrow bus. Additionally, its passwords are relatively short compared to other algorithms of the same type, and it requires a low number of rounds. From the standpoint of design and implementation it presents an interesting challenge, since it requires an implementation that strikes a balance between the amount of resources used and the system's overall communication speed [1, 22, 23].
Present Algorithm
PRESENT is one of the most well-known light-weight block-based encryption algorithms, mainly because its design is easy to apply in both hardware and software [4, 8]. Its hardware implementation fits in some of the smallest FPGAs on the market while achieving a significantly high throughput [9]. Figure 1 describes the basic structure of the PRESENT algorithm, showing its blocks and how each of its 31 rounds is carried out [4]. The substitution layer is built on a 4-bit S-box [4, 8]. This substitution block is applied to the 16 nibbles that make up the 64 bits of information, which is the standard block size of the encryptor. When an encryption algorithm is designed, the highest possible entropy in the substituted data is sought. In this case the word size is 4 bits, in keeping with the minimalist nature of the algorithm, so it can be implemented on a single LUT (Look-Up Table) in FPGAs.
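As an illustration, the substitution layer can be sketched in Python as follows. The S-box values are those of the published PRESENT specification [4]; the function name `sbox_layer` and the representation of the 64-bit state as a Python integer are assumptions made for this sketch and do not describe the VHDL design reported here.

```python
# 4-bit S-box of PRESENT, indexed by the input nibble (values from the specification).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state):
    """Apply the S-box independently to each of the 16 nibbles of a 64-bit state."""
    out = 0
    for n in range(16):
        nibble = (state >> (4 * n)) & 0xF   # extract nibble n (n = 0 is the lowest)
        out |= SBOX[nibble] << (4 * n)      # write the substituted nibble back in place
    return out


print(hex(sbox_layer(0x0123456789ABCDEF)))  # 0xc56b90ad3ef84712
```

Because every nibble is processed independently, the 16 substitutions can run in parallel in hardware or be serialized to save area.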
Structure of the PRESENT algorithm

Bits permutation: pLayer
It is a mixing layer in which a bitwise permutation is performed on the 64-bit information block: bit i of the round state is moved to position P(i), in the order shown in table 2.
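The permutation can also be expressed in closed form: in the published PRESENT specification, P(i) = 16·i mod 63 for i < 63 and P(63) = 63. A minimal Python sketch under that definition follows; the names `P`, `p_layer` and `p_layer_inverse` are illustrative and are not taken from the VHDL description.

```python
# Bit i of the state moves to position P[i]; bit 63 stays in place.
P = [63 if i == 63 else (16 * i) % 63 for i in range(64)]


def p_layer(state):
    """Move bit i of the 64-bit state to position P[i]."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << P[i]
    return out


def p_layer_inverse(state):
    """Undo p_layer: the bit found at position P[i] returns to position i."""
    out = 0
    for i in range(64):
        out |= ((state >> P[i]) & 1) << i
    return out


print(P[:8])  # [0, 16, 32, 48, 1, 17, 33, 49]
```

In hardware this layer is pure wiring and consumes no logic resources, which is one reason the algorithm suits small devices.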
Password expansion function: addRoundKey
PRESENT supports passwords of 80 or 128 bits. This design and implementation considers only the 80-bit case, in which the password is stored in an 80-bit register K whose bits are numbered k79, k78, …, k0. In each round, only the 64 most significant bits of the register are mixed with the data as the sub-password Ki. The new register contents are then obtained through the password expansion function, which starts with a 61-bit left rotation of K.
The password expansion function applies the following operations to the register K for each newly generated sub-password Ki:
Bit rotation of the input password, 61 positions to the left: [k79 k78 … k1 k0] = [k18 k17 … k20 k19]
Substitution using the S-box for the nibble from k79 to k76 of the password: [k79 k78 k77 k76] = S[k79 k78 k77 k76]
Addition or mixing of bits k19 to k15 of the password with the round counter, through addition over the finite field GF(2), that is, the XOR operation: [k19 k18 k17 k16 k15] = [k19 k18 k17 k16 k15] ⊕ round_counter
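The three operations listed above can be sketched in Python for the 80-bit case. This is a minimal model that follows the published PRESENT specification, where the round counter runs from 1 to 31 and 32 sub-passwords are produced; the function name `round_keys_80` and the integer representation of the register are assumptions of the sketch.

```python
MASK80 = (1 << 80) - 1
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def round_keys_80(key):
    """Return the 32 sub-passwords (64 bits each) derived from an 80-bit password."""
    keys = []
    for counter in range(1, 33):
        keys.append(key >> 16)                                 # 64 most significant bits
        key = ((key << 61) | (key >> 19)) & MASK80             # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & (MASK80 >> 4))  # S-box on k79..k76
        key ^= counter << 15                                   # XOR counter into k19..k15
    return keys


keys = round_keys_80(0)
print(hex(keys[1]))  # 0xc000000000000000
```

The last register update is computed but never used; a hardware version can skip it.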
This function generates the information blocks that serve as sub-passwords, starting from the system password K. The first Nk words of this array contain the password used for encryption, since the user password is mapped to the array W, while the rest of the words are generated from these first Nk words [4, 8].
This function takes consecutive bytes from the sequence produced by the password expansion function and assigns them to each sub-password Ki, forming blocks of the same size as the state matrix. This means that it takes Nb*4 bytes for each round, where Nb is 16.
The generation of the password (password expansion) for the decryption process is performed in the same way as in the encryption process. The difference lies in the password selection function. In the decryption process, blocks from the password list are taken starting from the final values and ending with the initial one, which is the user's personal password. This means that the last sub-password Ki used to encrypt is the first one used to decrypt [4, 8]. Hence, the decryption process has to run through all password generation rounds in order to start from this last sub-password Ki, and then perform the reverse process until the original password is reached. The round counter must therefore count down, and it is mixed in each round with the previously indicated bits.
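The reverse traversal described above can be illustrated with an inverse register update. The sketch below is an assumption about how such a step could look, derived by undoing the three operations of the published 80-bit key schedule in reverse order; the names `key_update` and `key_update_inverse` are hypothetical and the actual VHDL blocks may be organized differently.

```python
MASK80 = (1 << 80) - 1
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]  # inverse table built from SBOX


def key_update(key, counter):
    """Forward step: rotate left 61, S-box on the top nibble, XOR the counter."""
    key = ((key << 61) | (key >> 19)) & MASK80
    key = (SBOX[key >> 76] << 76) | (key & (MASK80 >> 4))
    return key ^ (counter << 15)


def key_update_inverse(key, counter):
    """Backward step: undo the XOR, the S-box and the rotation, in that order."""
    key ^= counter << 15
    key = (INV_SBOX[key >> 76] << 76) | (key & (MASK80 >> 4))
    return ((key << 19) | (key >> 61)) & MASK80  # rotate right by 61


# Encryption direction: counter counts up from 1 to 31.
register = original = 0x0123456789ABCDEF0123
for counter in range(1, 32):
    register = key_update(register, counter)

# Decryption direction: counter counts down from 31 to 1.
for counter in range(31, 0, -1):
    register = key_update_inverse(register, counter)

print(register == original)  # True
```

An alternative is to store all sub-passwords during a forward pass and read them back in reverse, which trades memory for a simpler datapath.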
Implementation of the PRESENT algorithm
Block diagram: DataPath
Since every encryption algorithm (whether block-based or stream-based) must disclose all the information needed for its implementation, one possible implementation is shown (Figure 2) as a DataPath that realizes the algorithm. This architecture is only one of many possible hardware implementations from the light-weight perspective, and it does not guarantee optimal use of the programmable device's hardware resources [10]. Other implementations use the dedicated memory blocks of some FPGAs to minimize the number of gates used in the device, at the cost of more clock cycles per round [9].
Figure 2. 16-bit DataPath for PRESENT's SboxLayer and pLayer
Each time a round is performed, the password must be computed again, in this case as the sub-password Ki. Figure 3 shows functional blocks that could be implemented in either software or hardware, since PRESENT is designed to be implemented with either technique.
Figure 3. DataPath of the sub-password generator for the PRESENT algorithm
Performance evaluation of the present cryptographic algorithm over FPGA 561
Hardware platforms
The selected embedded hardware platform is a Xilinx FPGA, using the ISE development tool and low-cost FPGAs such as the Spartan 3AN and Spartan 3E, which are available in at least two different development systems used at the Universidad Distrital Francisco José de Caldas. Although no tests are planned on Altera devices, the design is described in VHDL (VHSIC Hardware Description Language) so that the resulting code is compatible with FPGAs from any manufacturer.
Evaluation metrics
After choosing the technology, development systems and language to be used, the parameters to be verified on each of the embedded platforms must be considered. In this case, the implementation results of the block-based encryption algorithm PRESENT on the different FPGA-type programmable logic devices were measured using the encryption and decryption functions with a password size of up to 80 bits. Performance was tested with established test vectors of 64-bit blocks, and the evaluation metrics selected were the throughput and the number of occupied slices (CLB-GE).
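Throughput for a block-based encryptor is commonly computed as the block size multiplied by the clock frequency and divided by the number of clock cycles needed per block. The sketch below shows that calculation together with a relative-increase helper. The function names are illustrative, and the clock frequency and cycle count in the usage line are hypothetical values, not measurements from this work.

```python
def throughput_bps(block_bits, cycles_per_block, clock_hz):
    """Throughput in bits per second for one block processed every cycles_per_block."""
    return block_bits * clock_hz / cycles_per_block


def percent_increase(new_value, reference_value):
    """Relative increase of new_value over reference_value, in percent."""
    return (new_value - reference_value) / reference_value * 100.0


# Hypothetical example: 64-bit block, 32 cycles per block, 100 MHz clock.
print(throughput_bps(64, 32, 100e6))  # 200000000.0
```

Reducing the number of cycles per block raises throughput but usually costs more slices, which is the trade-off the two selected metrics capture.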
For the encryption implementation on the FPGA-type embedded hardware platform, the Top-Down methodology is applied: each block is created first, and a state machine is then used to control the blocks. The end result is code that can be instantiated by a higher-level design as a functional block. Some of the most significant design diagrams are shown; they seek to increase throughput while reducing device usage in terms of slices, which are the basic resource of any field-programmable device.
Algorithm implementation
Each of the functional blocks was designed, tested and implemented for both the encryptor and the decryptor. At this point, each block was described at a low level, trying to use the fewest possible resources. Each block must be simulated and tested before they are joined, in order to follow the methodology and guarantee the system's proper functioning. Even if all blocks work, the system's function is compromised if its control unit fails. Figure 4 shows the state machine that serves as the system control unit, handling all signals and, most importantly, the counting and verification that confirm the system output is correct.
Figure 4. State machine of the encryption control unit
Edwar Gomez, Cesar Hernández and Fredy Martinez
The system remains idle until a start signal is issued. The next state loads both the password and the text to encrypt and starts the round counter at zero. Immediately after that, the system moves to an encryption state where the remaining 30 rounds, which mix the password and the information, are performed. This is done by placing multiplexers at the input so that the output of the previous round is fed back. The round counter must be fully synchronized because, if even one clock cycle is lost or the password is mixed with an erroneous count, the entire encryption process is corrupted. This is because the S-box layer and pLayer blocks spread an error in even a single bit over the entire information block when performing substitutions and permutations. Finally, a finish state generates a synchronized output signal that issues a load or data-out order to a possible transmission or storage block for the encrypted data.
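The control flow described above can be modeled at cycle level in Python as a rough behavioral sketch. It is an assumption-laden model, not the authors' VHDL: the state names, the class `PresentControl` and the one-round-per-clock organization are illustrative, and the round counter follows the numbering of the published specification (1 to 31), which may differ from the counter convention of the reported design.

```python
MASK80 = (1 << 80) - 1
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state):
    return sum(SBOX[(state >> (4 * n)) & 0xF] << (4 * n) for n in range(16))


def p_layer(state):
    out = 0
    for i in range(64):
        pos = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << pos
    return out


def key_update(key, counter):
    key = ((key << 61) | (key >> 19)) & MASK80
    key = (SBOX[key >> 76] << 76) | (key & (MASK80 >> 4))
    return key ^ (counter << 15)


class PresentControl:
    """Behavioral model: IDLE -> LOAD -> ENCRYPT (31 rounds) -> FINISH -> IDLE."""

    def __init__(self):
        self.state, self.done = "IDLE", False
        self.data = self.key = self.counter = 0

    def clock(self, start, key_in, text_in):
        if self.state == "IDLE":
            if start:                       # wait for the start signal
                self.done = False
                self.state = "LOAD"
        elif self.state == "LOAD":          # latch password and text, reset counter
            self.key, self.data, self.counter = key_in, text_in, 1
            self.state = "ENCRYPT"
        elif self.state == "ENCRYPT":       # one full round per clock, output fed back
            self.data = p_layer(sbox_layer(self.data ^ (self.key >> 16)))
            self.key = key_update(self.key, self.counter)
            if self.counter == 31:
                self.state = "FINISH"
            else:
                self.counter += 1
        else:                               # FINISH: final sub-password mix, flag output
            self.data ^= self.key >> 16
            self.done = True
            self.state = "IDLE"


def encrypt(key, plaintext):
    """Drive the model until it raises done; return (ciphertext, clock cycles)."""
    fsm, cycles = PresentControl(), 0
    while not fsm.done:
        fsm.clock(True, key, plaintext)
        cycles += 1
    return fsm.data, cycles


print(hex(encrypt(0, 0)[0]))  # 0x5579c1387b228445
```

The printed value matches the all-zero test vector of the PRESENT specification, which gives a quick check that the counter and the sub-password stay synchronized with the data path.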
Once the entire code is in place, the whole system is simulated with the given test vectors. The debugging and verification process was long and costly, and required redesigning each block and the control unit at each step.
Results
When the implementation is complete, the hardware use of each functional block of the description is measured and compared to the reference design created by the authors of the algorithm [4]. A significant improvement in hardware use is found. One of the most important contributions is the improvement in overall system throughput, which can be compared directly with table 4 [4]. This implies a performance increase of 1456%. The creators of the algorithm focused on its viability more than on its performance, hence the clear difference.
Conclusions
PRESENT is an ultra-light algorithm for block-based encryption. It is one of the most compact encryption methods and the smallest encryption scheme in comparison to the reference size. Due to these structural features, it has attracted great interest in research on computationally efficient applications with low energy consumption demands. This research in particular is centered on a performance study over hardware platforms, aimed at finding ideal operating conditions for high-performance applications. A total of four implementations were achieved over four different fixed-hardware microcontrollers, together with four specific architectures of the algorithm for different degrees of demand and an implementation over an FPGA, as reconfigurable hardware, that showed an operation speed 1000 times greater than the one previously reported.
It is worth noting that the preliminary survey for this project found no reported implementation of this algorithm on microcontrollers, which makes this work unique. As for the FPGA implementation, previous implementations are well known; ours reports a 26% reduction in the hardware used, thanks to an internal study of the algorithm and the final architecture. This implementation allows standard coding and preserves the possibility of using very high-level hardware description tools or automatic C code generators to shorten the development time of the algorithm, without abandoning the light-weight philosophy.
Hardware use was reduced by roughly 30% in comparison to previous implementations over FPGA. The hardware implementation, written in standard VHDL, achieved a reduction of 26.89% in resource use on a Xilinx Spartan 3AN FPGA. It is worth mentioning that the reference implementation uses standard VHDL and can be replicated on FPGAs from other manufacturers, where equivalent reductions are expected.
