Design & Characterization of SHA 3- 256 Bit IP Core  by James, Jeethu et al.
 Procedia Technology  24 ( 2016 )  918 – 924 
2212-0173 © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of ICETEST – 2015
doi: 10.1016/j.protcy.2016.05.184 
 International Conference on Emerging Trends in Engineering, Science and Technology  
(ICETEST - 2015) 
Design & Characterization of SHA 3- 256 bit IP core 
Jeethu James a,*, Karthika R b, Nandakumar R c
a Mar Athanasius College of Engineering and Technology, Kothamangalam, India
b,c NIELIT, Calicut, India
Abstract 
In the era of the internet and computer networking, the need for security has increased rapidly. Various cryptographic algorithms
are used for secure data transmission and reception through the network, of which hash functions play a key role in various
cryptographic protocols. The Keccak algorithm is the winner of the SHA-3 competition conducted by NIST. SHA-3 consists of different
variants, with 224-, 256-, 384- and 512-bit outputs. This paper discusses the design and implementation of a SHA-3 256-bit core. The core is designed
using Verilog HDL and prototyped on a Xilinx® Virtex®-6 FPGA.
 
Keywords: SHA-3; IP core; Hashing; Cryptography; Security engineering; Keccak.
1. Introduction 
Cryptography is the field concerned with linguistic and mathematical techniques for securing information,
particularly in communication and authentication applications such as digital signatures and MACs. Cryptographic hash
functions [2] have a major role in crypto suites, the cryptographic options supported by the client. Hash functions
are found in applications that include random number generation algorithms, verification of data integrity,
password protection and secure socket layers. Their utility is to provide a fixed-length message digest for long
messages. Keccak, the winner of the NIST hash function competition for SHA-3 [1], is an instance of this one-way
hash primitive. Hash sizes of 224, 256, 384 and 512 bits are available, of which this paper deals with the FPGA
implementation of the 256-bit variant. SHA-3 has an architecture completely dissimilar from that of SHA-2; hence it is
not meant to replace other secure hash algorithms. The algorithm claims a thick safety margin and provides security
in each round.

* Corresponding author. Tel.: 9995600611.
E-mail address: jeethuthekkanath@gmail.com

Keccak is built upon the sponge construction [3], unlike the Merkle-Damgård (MD) construction [4]
used in SHA-1, SHA-2 and MD5. From a security point of view, cryptographic hashes must generally obey a few properties, viz.:
• The computation of the hash of any message should be easy.
• It is computationally infeasible to retrieve the message from a given hash value.
• It is computationally infeasible to modify a message without changing its hash.
These properties can be summarized as pre-image resistance, second pre-image resistance and collision resistance,
which are essential to withstand the obvious cryptanalytic attacks. SHA-3 finds its application mainly in efficient message
authentication coding, while the sponge construction is also used in pseudo-random number generation (PRNG) and authenticated
encryption (AE). Along with other crypto suites, the hash algorithm can be used for password protection, digital
signatures and file or data identification, since hash functions enable fast look-up of data in hash tables.
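
The fixed digest length and the sensitivity of the digest to small message changes can be observed with a short software experiment. The sketch below is illustrative only and uses Python's standard hashlib.sha3_256, which follows the final FIPS 202 standard; its padding differs from the original Keccak padding used later in this paper, so its digests are not expected to match Table 2. The helper names digest_hex and differing_bits are assumptions made for this example.

```python
import hashlib


def digest_hex(message: bytes) -> str:
    """Return the SHA3-256 digest of a message as a hexadecimal string."""
    return hashlib.sha3_256(message).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


short_digest = digest_hex(b"abc")
long_digest = digest_hex(b"abc" * 10_000)   # a much longer message
changed_digest = digest_hex(b"abd")         # one character changed

# Both digests are 64 hex characters (256 bits), whatever the input length.
print(len(short_digest), len(long_digest))
# A one-character change alters the digest.
print(differing_bits(short_digest, changed_digest), "of 256 bits differ")
```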
2. SHA-3 (256-bit) algorithm 
  SHA-3 uses the sponge construction, which consists mainly of two phases, viz. absorption and
squeezing [5]. Based on a fixed-length permutation (or transformation) and a padding rule, it builds a
function that maps a variable-length input to a fixed-length output. The function takes a binary string of any
length as input and returns a binary string of any requested length n, a user-supplied value. It operates on a finite state
by iteratively applying the inner permutation to it, interleaved with the entry of input or the retrieval of output.
The sponge construction [6] is illustrated below in Fig. 1.
 
 
Fig. 1 Sponge construction 
 
The paper considers the state as a 5 × 5 × w array, where w ∈ {2, 4, 8, 16, 32, 64}, and the computation is done
as described below:
2.1. Absorption phase  
Initially the state is filled with zeros. The first r-bit message block is XOR-ed with the first r bits of the
state, and the array is updated with the new values [7]. This process continues until all the input blocks have been absorbed.
Between successive blocks, the state passes through the block permutation.
2.2. Squeezing phase 
  The output hash value is taken from the first r bits of the state, and further transformations are applied if the required
number of output bits has not yet been obtained. Overall, the input bit string is first padded using the 10*1 multi-rate padding rule and split into r-bit
blocks, while the state is divided into r bits (r: bitrate) and c bits (c: capacity). The blocks are then XOR-ed with the state of the array and passed to the
block permutation function f, where the 5-step permutations are performed. After all the input bits are processed, the absorption
phase is complete and the core switches to the squeezing phase, where the first r bits are truncated to form the output.
Fig. 2 shows the block diagram of the top-level module.
                                          
                                                                         Fig. 2 Block diagram 
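
The absorbing and squeezing phases described above can be sketched in software as follows. This is a structural illustration only, not the proposed core: the state is shrunk to 8 bytes (rate 4, capacity 4), and toy_permutation is a hypothetical placeholder standing in for the Keccak f-permutation, with no security value. All names in the sketch are assumptions made for the example.

```python
from typing import Callable


def toy_permutation(state: bytes) -> bytes:
    """Placeholder bijection on the state. NOT Keccak-f and NOT secure."""
    n = len(state)
    return bytes((state[(i + 1) % n] + i) % 256 for i in range(n))


def sponge(message: bytes, out_len: int,
           f: Callable[[bytes], bytes] = toy_permutation,
           rate: int = 4, capacity: int = 4) -> bytes:
    """Absorb a padded message block by block, then squeeze out_len bytes."""
    # Pad to a multiple of the rate: 0x01, zero bytes, last byte XOR 0x80.
    padded = bytearray(message) + b"\x01"
    padded += bytes(-len(padded) % rate)
    padded[-1] ^= 0x80

    state = bytes(rate + capacity)          # all-zero initial state

    # Absorbing phase: XOR each block into the first `rate` bytes, then permute.
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        mixed = bytes(s ^ b for s, b in zip(state[:rate], block)) + state[rate:]
        state = f(mixed)

    # Squeezing phase: output the first `rate` bytes, permuting between reads.
    out = bytearray()
    while len(out) < out_len:
        out += state[:rate]
        if len(out) < out_len:
            state = f(state)
    return bytes(out[:out_len])


print(sponge(b"hello", 10).hex())
```

Because the capacity bytes are never XOR-ed with input or copied to the output directly, they only influence the result through the permutation, which is the property the sponge construction relies on.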
2.3. 10*1 padding scheme 
If the message input is shorter than r bits, it is padded to make it equal to r bits. If the message input is longer
than r bits, it is divided into blocks of r bits and padding is added where required. The padding is applied by
appending 0x01 followed by zero bytes, and the result is then XOR-ed with the value 0x80 in its last byte, as follows:

P = M || 0x01 || 0x00 || … || 0x00
P = P ^ (0x00 || … || 0x00 || 0x80)
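
A byte-level sketch of this padding rule is given below. It assumes byte-aligned messages and a rate of r = 1088 bits (136 bytes), which is the usual Keccak parameter for a 256-bit output with a 1600-bit state; the paper does not state r explicitly, so this value and the function name pad10star1 are assumptions of the example.

```python
def pad10star1(message: bytes, rate_bytes: int = 136) -> bytes:
    """Pad a byte-aligned message to a multiple of rate_bytes.

    Appends 0x01, then zero bytes, then XORs 0x80 into the last byte.
    When only one byte of padding fits, that single byte becomes 0x81.
    """
    padded = bytearray(message)
    padded.append(0x01)                                # P = M || 0x01
    padded.extend(bytes(-len(padded) % rate_bytes))    # ... || 0x00 || ... || 0x00
    padded[-1] ^= 0x80                                 # P = P ^ (0x00 || ... || 0x80)
    return bytes(padded)


p = pad10star1(b"abc")
print(len(p), hex(p[3]), hex(p[-1]))
```

A message that already fills a whole block still receives padding, so a 136-byte message is extended to two blocks.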
2.4. f-permutation
The block permutation consists of 24 rounds of 5 step mappings, viz. θ, ρ, π, χ and ι, operating on the 1600 state bits. The
conventions used to describe the state [6] are as follows:

• Row: a set of 5 bits with constant y and z coordinates, i.e. a[*][y][z].
• Column: a set of 5 bits with constant x and z coordinates, i.e. a[x][*][z].
• Lane: a set of 64 bits with constant x and y coordinates, i.e. a[x][y][*].
• Slice: a set of 25 bits with constant z coordinate, i.e. a[*][*][z].
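
These conventions can be made concrete with a small indexing sketch. It assumes the flat bit ordering used for the state mapping in this section, in which bit 64(5y + x) + z of the state string corresponds to a[x][y][z]; the helper names are hypothetical and chosen only for this illustration.

```python
W = 64  # lane length in bits for the 1600-bit state


def bit_index(x: int, y: int, z: int) -> int:
    """Position of a[x][y][z] in the flat 1600-bit state string."""
    return W * (5 * y + x) + z


def row(y: int, z: int) -> list:
    """Indices of the 5 bits sharing y and z."""
    return [bit_index(x, y, z) for x in range(5)]


def column(x: int, z: int) -> list:
    """Indices of the 5 bits sharing x and z."""
    return [bit_index(x, y, z) for y in range(5)]


def lane(x: int, y: int) -> list:
    """Indices of the 64 bits sharing x and y."""
    return [bit_index(x, y, z) for z in range(W)]


def state_slice(z: int) -> list:
    """Indices of the 25 bits sharing z."""
    return [bit_index(x, y, z) for y in range(5) for x in range(5)]


print(len(row(0, 0)), len(column(0, 0)), len(lane(0, 0)), len(state_slice(0)))
```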
  
  The additions and multiplications between the terms are in GF(2). The step mappings are described on whole lanes
rather than on separate bits. In each round the steps are applied in the order θ, ρ, π, χ, ι, so θ takes place
at the beginning of each round. After twenty-four rounds, the output of the last step is simply truncated to give the
desired hash output. The padding is then inverted in order to retrieve the required hash
output.
 
The mapping between the bits of the state s and the array a is that bit 64(5y + x) + z of s is a[x][y][z] for any x ∈ Z5, y ∈ Z5, z ∈ Z64.
The x and y indices are taken modulo 5 and the z index modulo 64. The block permutations [5][8] can be explained as follows:

• θ (theta): a[x][y][z] = a[x][y][z] + Σ a[x−1][y'][z] + Σ a[x+1][y'][z−1], with both sums taken over y' = 0, …, 4.
• ρ (rho): a[x][y][z] = a[x][y][z − (t+1)(t+2)/2], with t satisfying 0 ≤ t < 24 and (0 1 : 2 3)^t (1 : 0) = (x : y), or
t = −1 if x = y = 0, where ( : ) denotes a 2×2 or 2×1 array written row by row.
• π (pi): a[x][y] = a[x'][y'], with (x : y) = (0 1 : 2 3) (x' : y').
• χ (chi): a[x] = a[x] + (a[x+1] + 1) a[x+2].
• ι (iota): a = a + RC[i_r], where RC[i_r] is the round constant of round i_r.
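
As a software cross-check of the five step mappings, a minimal reference-style sketch is given below. It is not the Verilog core described in this paper. It assumes a rate of 1088 bits (136 bytes), the 0x01 … 0x80 padding of Section 2.3, little-endian lane packing and round constants generated by the usual Keccak LFSR; the function names rol64, keccak_f1600 and keccak256 are illustrative. Under these assumptions it reproduces the digest listed in this paper for "The quick brown fox jumps over the lazy dog".

```python
MASK64 = (1 << 64) - 1


def rol64(value: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bit positions."""
    n %= 64
    return ((value << n) | (value >> (64 - n))) & MASK64


def keccak_f1600(lanes):
    """Apply 24 rounds of theta, rho, pi, chi, iota to lanes[x][y]."""
    lfsr = 1
    for _ in range(24):
        # theta: XOR each lane with the parities of two neighbouring columns.
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x - 1) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane by (t+1)(t+2)/2 and move it to a new (x, y).
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied along each row.
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant (generated by an LFSR) into lane (0, 0).
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(message: bytes) -> bytes:
    """Keccak with r = 1088, c = 512 and original (0x01 ... 0x80) padding."""
    rate = 136
    padded = bytearray(message) + b"\x01"
    padded += bytes(-len(padded) % rate)
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):          # absorbing phase
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = keccak_f1600(lanes)
    # Squeezing phase: 256 bits fit in the first four lanes of the rate part.
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


print(keccak256(b"The quick brown fox jumps over the lazy dog").hex())
```

The ρ and π steps are merged into one loop here, which is a common software shortcut; a hardware core would typically realise both as fixed wiring.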
3. SHA-3 proposed methodology 
3.1. Core architecture 
As illustrated in Fig. 3, the θ component takes the input lanes and XORs together the lanes of each
column [8]. The five column results are rotated left by one bit and XOR-ed with the values from the previous XOR operations. The data
from these XOR operations are fed to a final XOR stage with the input lanes of the θ component. The ρ component rotates
each lane to the left, with a different rotation amount for each lane. The number of rotations per lane is the remainder of the
division of a fixed value by the length of the lane. Lane a[0][0] is at the centre of the 2D array since the
x and y coordinates take their values in the sequence (3, 4, 0, 1, 2). The same mapping is used in the π component.
The χ component uses NOT, AND and XOR operations between the lanes. These operations are applied to each
row of lanes.

Fig. 3 Core architecture

3.2. Proposed core I/O diagram

Fig. 4 IP core diagram
Input ports are sampled by the core at the rising edge of the clock. All output ports are registered and are not
directly connected to any combinational logic inside the core. For ports wider than 1 bit, the bit index runs from the port
width minus one down to zero.
3.3. Pin description table
The pins used in the proposed IP core and their descriptions are given in Table 1 below:
Table 1. Pin table.
Port name     Width   Direction   Description
in            32      In          Input message
is_last       1       In          Current input is ready or not
in_ready      1       In          Core is ready to absorb the input
byte_num      2       In          Select the byte_num
clk           1       In          Clock
reset         1       In          To reset the hardware
out           256     Out         The hash output
out_ready     1       Out         The hash result is ready or not
buffer_full   1       Out         Buffer is full or not

The input of our design is a 32-bit block; in_ready, is_last, byte_num, clk and reset are the other input pins. out is the 256-bit
output hash; buffer_full and out_ready are the other output pins.
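
How a byte-oriented message might be presented to a 32-bit input port can be sketched as below. The pin table does not spell out the handshake, so this sketch rests on assumptions suggested only by the port names and widths: that is_last flags the final word, that the 2-bit byte_num gives the number of valid bytes (0 to 3) in that final word, and that bytes are packed most significant byte first. The function to_core_words is hypothetical and is not part of the core.

```python
from typing import Iterator, Tuple


def to_core_words(message: bytes) -> Iterator[Tuple[int, int, bool]]:
    """Yield (word, byte_num, is_last) tuples for a 32-bit input port.

    Assumed convention (not confirmed by the source): full words carry
    byte_num = 0 and is_last = False; the final word carries the remaining
    0..3 bytes, left-aligned and zero-filled, with is_last = True.
    """
    full_words = len(message) // 4
    for i in range(full_words):
        yield int.from_bytes(message[4 * i:4 * i + 4], "big"), 0, False
    tail = message[4 * full_words:]
    yield int.from_bytes(tail.ljust(4, b"\x00"), "big"), len(tail), True


for word, byte_num, is_last in to_core_words(b"abcdef"):
    print(f"in={word:08x} byte_num={byte_num} is_last={int(is_last)}")
```

Under this convention a message whose length is a multiple of four bytes ends with an extra word carrying byte_num = 0, which keeps byte_num within its 2-bit range.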
3.4. Function table
Table 2 lists some of the inputs used during the implementation and the corresponding outputs.
Table 2. Function table.
Input (ASCII)                                  Hexadecimal hash output
The quick brown fox jumps over the lazy dog    4d741b6f1eb29cb2a9b9911c82f56fa8d73b04959d3d9d222895df6c0b28aa15
All that glitters is not gold.                 2620d8dd1aaed9aad60e54df4bbad1b55aaf597cbcd912e9dee3e67c4d88bf32
Better to be safe than sorry.                  a72ecbc58193fe9020371d687e68bd5edae74c53fb1e9f147d97532c2bb0f950
A friend in need is a friend indeed.           457573c7b05a037806ab2a403a2fc1336a71d356dc2a799e25f199c43de7773
 
4. Simulation and implementation results
A key feature of the Keccak algorithm is its suitability for hardware implementation. In this paper the
simulation and prototyping of the Keccak 256-bit algorithm is done on the Xilinx® Virtex®-6 FPGA platform [9][10]. In
this illustration, "The quick brown fox jumps over the lazy dog" is the example input message. The Xilinx ISE
design tool is used to simulate the design and to check its functionality [11]. Once the functional
verification is done, the design is synthesized in Xilinx ISE.
The result is verified with the Xilinx simulator, ISim®, which is run after synthesis.
Fig. 5 shows the simulated result obtained using ISim®. ISim® provides a complete,
full-featured HDL simulator integrated within ISE. With this tight integration of ISim® within the design environment,
HDL simulation becomes an even more fundamental step within the design flow.
Fig. 6 shows the 256-bit Keccak hash value captured using the ChipScope™ ILA tool. Table 3 shows the on-chip
resource utilization of the design.
 
 
 
Fig. 5 Simulation using ISim 
 
 
 
                                                                                           Fig. 6 Hardware implementation result 
 
Table 3. Resource utilization summary.
Slice logic utilization               Used     Available   Utilization
Number of Slice Registers             7,273    301,440     2%
Number of Slice LUTs                  4,197    150,720     2%
Number used as Memory                 32       58,400      1%
Number of occupied Slices             3,557    37,680      9%
Number of LUT Flip Flop pairs used    3,217    7,307       44%
Number of bonded IOBs                 264      600         44%
Number of BUFG/BUFGCTRLs              2        32          6%
Number of BSCANs                      1        4           25%
Number of STARTUPs                    1        1           100%
Average fan-out of non-clock nets     3.49     -           -
 
5. Analysis
The Keccak algorithm was developed, simulated and implemented in Xilinx ISE using the Verilog hardware
description language, and the design was ported to the Virtex-6 (ML605) FPGA board. The performance of
the system, viz. throughput, was also calculated. Throughput is the total amount of data processed within a fixed amount of time.
It can be calculated from the maximum frequency, the block size and the number of clock cycles. The
equation for the throughput calculation is given below:

\[ \text{Throughput} = \frac{\text{Maximum frequency} \times \text{Block size}}{\text{Clock cycles}} \]
 
 The design utilizes 24 clock cycles for calculating the hash function of the input bit string. 
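
The throughput calculation can be worked through numerically, with throughput taken as maximum frequency times block size divided by clock cycles. The sketch below uses the reported maximum frequency of 224.339 MHz and 24 clock cycles per block; the block size of 1088 bits is an assumption, being the standard Keccak rate for a 256-bit output, and is not stated explicitly in the paper. With that value the calculation agrees with the reported figure of about 10.17 Gbps.

```python
def throughput_bps(f_max_hz: float, block_bits: int, clock_cycles: int) -> float:
    """Throughput in bits per second for one block processed per clock_cycles."""
    return f_max_hz * block_bits / clock_cycles


F_MAX_HZ = 224.339e6   # reported maximum frequency
BLOCK_BITS = 1088      # assumed block size (Keccak rate for 256-bit output)
CLOCK_CYCLES = 24      # one clock cycle per round, 24 rounds

tp = throughput_bps(F_MAX_HZ, BLOCK_BITS, CLOCK_CYCLES)
print(f"{tp / 1e9:.2f} Gbps")
```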
 
 
Table 4. Specification summary.
Parameter                    Value
Maximum frequency (fmax)     224.339 MHz
Maximum throughput (tp)      10.17 Gbps
Total memory usage           340 MB
Speed grade                  -1
 
6. Conclusions
The implementation and characterization of the SHA-3 256-bit hash function are described in this paper.
The algorithm was simulated and implemented successfully on a Xilinx® Virtex®-6 FPGA (XC6VLX240T-1FF1156) [12].
The results were validated using the ChipScope™ ILA tool. The performance
characteristics of the design were calculated and tabulated. The maximum frequency of the design, fmax,
is 224.339 MHz. The design uses one clock cycle for each round, and the throughput obtained is 10170 Mbps.
References 
[1] Keccak Hash Function, National Institute of Standards and Technology (NIST). Available on http://csrc.nist.gov/projects/crypto.html
[2] Secure Hash Standard, FIPS 180/180-1/180-2/180-3/180-4. Available on http://securityv.isu.edu/isl/fips180.html,
http://www.nymphomath.ch/crypto/moderne/fip180-1.html, http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf,
http://csrc.nist.gov/publications/fips/fips180-3/fips180-3_final.pdf, http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf
[3] Aisha Malikl, Arshad Aziz, Dur-e-Shahwar Kunde, Moiz Akhter, "Software Implementation of Standard Hash Algorithm (SHA-3) Keccak on
Intel Core-i5 and Cavium Networks Octeon Plus embedded platform", 2nd Mediterranean Conference on Embedded Computing MECD, June 30,
2013.
[4] Merkle-Damgård Construction. Available on https://en.wikipedia.org/wiki/Merkle-Damgard Construction
[5] Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche, "Keccak implementation overview", National Institute of Standards and
Technology, version 3.2, May 29, 2012. Available on http://keccak.noekeon.org/KECCAK implementation overview.pdf
[6] Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche, "Keccak reference", National Institute of Standards and Technology,
version 3.0, January 14, 2011. Available on http://keccak.noekeon.org/keccak reference.pdf
[7] Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche, "The Keccak SHA-3 submission", National Institute of Standards and
Technology, version 3.0, January 14, 2011. Available on http://keccak.noekeon.org/Keccak-submission-3.pdf
[8] A. Gholipour, S. Mirzakuchaki, "A Pseudorandom Number Generator with Keccak Hash Function", International Journal of Computer and
Electrical Engineering, Vol. 3, No. 6, December 2011.
[9] Deepthi Barbara Nickolas, A. Sivasankar, "Design of FPGA Based Encryption Algorithm using KECCAK Hashing Functions",
International Journal of Engineering Trends and Technology (IJETT), Volume 4, Issue 6, June 2013.
[10] George Provelengios, Nicolas Sklavos, Paris Kitsos, Christos Koulamas, "FPGA-Based Design Approaches of Keccak Hash Function", 15th
Euromicro Conference on Digital System Design, June 20, 2013.
[11] Stéphanie Kerckhof, François Durvaux, Nicolas Veyrat-Charvillon, Francesco Regazzoni, Guerric Meurice de Dormale, François-Xavier
Standaert, "Compact FPGA Implementations of the Five SHA-3 Finalists", International Conference, CARDIS, September 2011.
[12] Xilinx FPGA Datasheets. Available on http://www.xilinx.com/
