RunFein: a rapid prototyping framework for Feistel and SPN-based block ciphers by Khalid, Ayesha et al.
RunFein: a rapid prototyping framework for Feistel and SPN-based
block ciphers
Khalid, A., Hassan, M., Paul, G., & Chattopadhyay, A. (2016). RunFein: a rapid prototyping framework for Feistel
and SPN-based block ciphers. Journal of Cryptographic Engineering, 6(4), 299-323.
https://doi.org/10.1007/s13389-016-0116-7
Published in:
Journal of Cryptographic Engineering
Document Version:
Peer reviewed version
Publisher rights
© Springer-Verlag Berlin Heidelberg 2016.
This work is made available online in accordance with the publisher’s policies. Please refer to any applicable terms of use of the publisher.
RunFein: A rapid prototyping
framework for Feistel and
SPN based block ciphers
Ayesha Khalid ·
Muhammad Hassan ·
Goutam Paul ·
Anupam Chattopadhyay
Abstract Block ciphers are the most prominent
symmetric-key cryptography kernels, serving as funda-
mental building blocks to many other cryptographic
functions. This work presents RunFein, a tool for rapid
prototyping of a major class of block ciphers, namely
product ciphers (including Feistel network and Sub-
stitution Permutation Network (SPN) based block ci-
phers). RunFein accepts the algorithmic configuration
of an existing/ new block cipher from the user through a
GUI to generate a customized software implementation.
The user may choose from various microarchitectural
templates (unrolled, pipelined, subpipelined) to gener-
ate an HDL description of the cipher. Various modes
of operation and NIST test suite may also be included.
This high-level design approach eliminates the labori-
ous development efforts for VLSI realizations of block
ciphers. It enables a quick design exploration, conse-
quently enabling fast benchmarking in terms of critical
resource estimation of various versions / configurations
of a cipher that vary in terms of security, complexity,
and performance. Using RunFein, we have successfully implemented
some well-known product ciphers and benchmarked their performance
without significant degradation against the results published in
the literature.
A. Khalid, M. Hassan
Institute for Communication Technologies and Embedded
Systems (ICE),
RWTH Aachen University, Aachen 52074, Germany.
E-mail: {ayesha.khalid, hassanm}@ice.rwth-aachen.de
G. Paul (Corresponding author)
Cryptology and Security Research Unit (CSRU),
R. C. Bose Centre for Cryptology and Security,
Indian Statistical Institute, Kolkata 700 108, India
E-mail: goutam.paul@isical.ac.in
A. Chattopadhyay
School of Computer Engineering,
Nanyang Technological University (NTU), Singapore.
E-mail: anupam@ntu.edu.sg
Keywords Block cipher · Feistel network cipher ·
SPN cipher · Product cipher · High-level Synthesis ·
Rapid Prototyping · VLSI Implementation · Loop
unrolling · Bitslicing · Subpipelining
1 Introduction and Motivation
The world of cryptography is highly dynamic: new cryptographic
proposals and attacks are reported frequently. Competitions
inviting newer and better block ciphers, stream ciphers and hash
functions attract active participation from an ever-increasing
cryptographic community. A thorough evaluation of these proposals
on software and hardware platforms follows. RunFein aids the
cryptographer by enabling a high-level design approach for rapid
prototyping of a block cipher's algorithmic and microarchitectural
specifications as software and hardware implementations. This
section gives two major reasons that motivate the need for a rapid
prototyping framework for cryptographic functions, followed by the
methodology adopted by the RunFein tool and our scientific
contributions.
1.1 Cryptography is Dynamic
Below we discuss the major reasons fueling the ever-
changing nature of cryptography. For each case, we give
one example (out of many possible) to illustrate the
point.
1. Cryptanalysis : Successful cryptanalytic attempts render the
attacked ciphers unsafe for further use and open the door to
newer proposals. Countering cryptanalysis also often requires
a modification of the original proposal, e.g., RC4+ [11].
2. Better machines : The development of custom hardware aids
cryptanalytic attacks by making even brute-force attacks
feasible for proposals with small key sizes; DES can today be
broken in less than a day [10]. Moreover, architectural updates
in computing machines influence cryptographic schemes, e.g.,
BLAKE [14], a hash function, supports different word-size
versions to cater to both 32-bit and 64-bit machines.
This is an extended version of the conference paper [26] by
Khalid, Hassan, Chattopadhyay and Paul, presented at ICISS
2013. Sections 1 and 3 are based on [26] with major revision
and refinement. Sections 2, 4, 5, 6 and 7 are completely new
contributions in this work.
Table 1 Parameters and their respective units to evaluate the performance of a cipher on H/W and S/W platforms
Software Platforms Hardware Platforms
µ controllers GPPs GPUs ASICs ASIPs FPGAs
Security key size, IV size, block size, number of rounds, mathematical soundness, known attacks
Cost The buying cost of the cipher IP license
Performance
Throughput cycles/block blocks/second
Latency N/A cycles/block
Resources
Core Area N/A NAND GE, mm2 no. of LUTs
Energy Joules Joules, Joules/access (RAMs)
Data bytes N/A registers bytes
Program lines of code N/A bytes N/A
Device occupancy N/A Cores usage% N/A LUT, RAMs usage%
Flexibility N/A key sizes, block sizes, modes, encryption/decryption etc.
3. Newer applications : The imminent era of ubiquitous computing
has given rise to newer security applications, e.g., lightweight
cryptography for resource-constrained devices. Consequently,
lightweight cryptographic proposals aiming at a thrifty
area-power budget with reasonable security are frequently
proposed, e.g., PRESENT [12].
4. Design trade-off : Most of the block cipher propos-
als support multiple modes of operation and ver-
sions for variable sized key, block size, rounds etc.
These versions let the user choose a performance-
security trade-off, e.g., varying the number of rounds
in Salsa20 [15].
1.2 Evaluating a Cipher’s Quality is Hard
From a cryptanalytic point of view, attack resilience is the most
critical parameter for evaluating the soundness of a cipher. For
two ciphers with a comparable estimated security level, based on
their resistance to major cryptanalysis efforts over time, the
mutual evaluation should however be based on their performance.
The performance criteria depend on the application class;
lightweight cryptographic functions use less complex operations
than their conventional counterparts and hence require lower area
and power, but offer smaller key sizes and lower throughput.
These conflicting requirements of security and performance
suggest considering interesting trade-off design points [7,6].
Evaluating a cipher's quality is hard since there are multiple
versions/modes of operation, evaluation parameters and
implementation platforms to choose from. Table 1 gives a glimpse
of the parameters typically used to evaluate the suitability of
a cipher for a particular implementation platform. Developing a
custom computing architecture and mapping onto known processors
are termed here the hardware and software implementation
platforms, respectively. For software platforms, the throughput
of a cipher is specified in terms of cycles/byte (stream ciphers,
PRNGs), cycles/hash (hash functions) or cycles/block (block
ciphers). For hardware platforms, the basic parameters mentioned
in Table 1 are sometimes combined into hybrid metrics, e.g.,
Energy/bit, Throughput Per Area Ratio (TPAR), Power-Area-Time
(a triple product to quantify design compactness, throughput and
power consumption), etc. Moreover, we may have multiple
performance figures on the same computing platform according to
the software optimizations or hardware configurations chosen for
the cipher implementation.
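As a rough illustration of how such hybrid metrics are derived from the basic parameters, the following Python sketch computes throughput, TPAR and energy per bit. The function names and the numbers in the usage note are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def throughput_bps(block_bits, cycles_per_block, clock_hz):
    """Throughput in bits/second for a design that needs
    cycles_per_block clock cycles to process one block."""
    return block_bits * clock_hz / cycles_per_block


def tpar(throughput, area_ge):
    """Throughput Per Area Ratio: bits/second per gate equivalent."""
    return throughput / area_ge


def energy_per_bit(power_watts, throughput):
    """Energy per processed bit in Joules (power divided by throughput)."""
    return power_watts / throughput
```

For example, a hypothetical 64-bit block cipher core taking 32 cycles per block at 100 kHz would reach `throughput_bps(64, 32, 100e3)` bits per second; dividing that by an assumed area in gate equivalents gives its TPAR.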
1.3 The RunFein Methodology
Considering the dynamic nature of cryptography, the influx of new
ciphers requiring benchmarking against their existing
counterparts is frequent. Various cryptographic competitions,
including AES [38], NESSIE [39], CRYPTREC [40], eSTREAM [8],
SHA-3 [41] and CAESAR [42], received an ever-increasing number of
candidate proposals compared to their predecessors. Performance
is considered a critical criterion for filtering the finalists
out of these proposals. The call for AES [38] announced that the
computational efficiency of both hardware and software
implementations would be taken up as a decisive factor for the
selection of the winner. Similarly, the choice of Keccak as SHA-3
finalist was attributed by NIST to both its good software
performance and excellent hardware performance [43]. Given the
increasing importance of fair and fast hardware performance
benchmarking, a rapid prototyping tool specific to cryptographic
functions is imperative.
Quantifying the performance of cryptographic proposals as custom
VLSI implementations requires benchmarking against diverse
parameters like area, power, throughput, latency etc. The
human-driven process of writing and validating HDL for ciphers
is slow, error prone and requires expertise in both the algorithm
and hardware design domains to reach the options best suited to
an application requirement. The workload is further compounded
by the possibility of benchmarking various points on the
security-performance trade-off using various microarchitectures
that exploit multiple levels of parallelism and bitslicing.
RunFein aims to solve these problems through automation.
Various high-level synthesis (HLS) tools have been proposed both
academically and commercially to automate the VLSI design cycle.
Their architectural optimizations, however, remain generic. The
user does not have the freedom to choose among hardware
microarchitectures specific to the class of cryptographic
functions to rapidly explore performance-resource trade-offs.
Some of these tools have a steep learning curve as they require
learning a new language. Moreover, the quality of the generated
HDL depends on the coding style of the designer. Consequently
their results remain suboptimal compared to hand-optimized
cryptographic implementations. An HLS effort for SHA-3 candidates
revealed that the ranking of candidate algorithms in terms of
performance remains the same regardless of whether the HDL
implementations are developed manually or generated automatically
using a high-level synthesis tool [44]. However, these
implementations do not match the efficiency of manually written
RTL: the frequency and throughput are lower and the area is up to
30% more compared to manual implementations [44].
RunFein presents a language-independent interface; it accepts a
sophisticated high-level block cipher design through a GUI. The
user provides three sets of parameters to the toolflow. Firstly,
the algorithmic design configuration, built from a functionally
complete set of constructive elements sufficient to define any
block cipher. Secondly, the user chooses the microarchitectural
configuration of the cipher for HDL generation. It includes a
mode of operation and one of the various microarchitectures such
as unrolled, pipelined, subpipelined or bitsliced
implementations. Thirdly and optionally, the user may specify a
set of testvectors, if already known, for the verification of the
design. The software and hardware generation engines of the tool
generate an optimized software implementation and a synthesizable
HDL description. The design configuration given to the tool is
validated for completeness and correctness at various stages of
hardware/software generation. These rule checks detect functional
and system-level problems much earlier in the design cycle,
improving design reliability and shortening time to market. The
tool infers the necessary interfaces and structures to implement
optimized HDL along with verification environments and necessary
scripts. It provides seamless end-to-end verification from the
configuration to the RTL validation/verification environments.
1.4 Original Contribution
With a similar motivation as for RunFein, we earlier presented
RAPID-FeinSPN [26], which caters to rapid prototyping of block
ciphers but covers only a simple loop-folded hardware
implementation. RunFein is a step forward in the direction of
hardware optimizations, offering the user various
microarchitectural design alternatives. The noteworthy
contributions of this work are listed below.
1. We surveyed a diverse and wide range of block ciphers to
systematically build up a functionally complete set of
constructive elements/architectural structures to define the
configuration space of any block cipher.
2. RunFein supports a list of NIST standardized modes of operation
in the software/hardware implementation of a cipher. We
integrated the NIST test suite for evaluating the statistical
randomness of the encrypted data.
3. For hardware implementation, the user can pick among various
microarchitectures and configure them as desired. This fast
exploration of various design alternatives significantly
shortens the cipher design cycle.
4. The completeness of the configuration model and the
effectiveness of the RunFein tool are validated by implementing
some prominent block ciphers and benchmarking their performance
against their manual implementations.
The rest of the paper is organized as follows. Section 2
discusses the categorization of block ciphers as com-
putational kernels. Section 3 gives RunFein toolflow
and discusses the configuration space of the product
block ciphers. The salient features of the software gen-
eration engine of RunFein are discussed in Section 4.
Section 5 gives the hardware microarchitectures sup-
ported by the hardware generation engine. Section 6
explains the area, power and throughput results of two
prominent ciphers in various hardware configurations
along with a comparison with existing work. Section 7
concludes this paper and provides future roadmap.
2 Dwarfs of Cryptography
For rapid prototyping of a block cipher, RunFein em-
ploys a bottom-up design approach by piecing together
elementary operations to form a complete system. The
idea is similar in spirit to the 13 computational ker-
nel classes or so called Berkeley dwarfs capturing the
major functionality and data movement patterns across
an entire class of important applications [27]. A similar
idea is presented by the Intel Recognition-Mining-Synthesis
(RMS) view [28]. This concept of design
based on computational kernels has been exploited
for rapid prototyping in cryptographic applications,
e.g., fast hardware implementation of elliptic curve
arithmetic operations [31], parameterized cryptanalytic
toolflows [29,30], rapid prototyping frameworks for
cryptographic protocols [32,33]. Studying these basic kernels
across the algorithms of an application class helps in a generic
understanding as well as in an optimized implementation [34].
Classifying cryptography under the computational dwarfs [27]
makes it a subclass of the combinational logic dwarf, along with
other computing subclasses.
Next, we first justify why, out of all the symmetric key
cryptographic functions, the study of block ciphers is the most
significant, and then investigate the computational kernels of
block ciphers.
2.1 Workhorses of Symmetric Key Cryptography
On their own, block ciphers provide secrecy only for a single
block of data. However, under various modes of operation they
enable data transmission with the major services of Information
Security (InfoSec), including authenticity, integrity and
confidentiality. These modes transform block ciphers into other
cryptographic primitives, making them the workhorses of symmetric
key cryptography and consequently making their study imperative.
Apart from these operational modes, the basic deterministic
transform functions of block ciphers serve as elementary kernels
or building blocks for many symmetric key cryptographic
protocols. Fig. 1 highlights this constructive nature of block
ciphers being used as other cryptographic functions including
stream ciphers, hash functions, message authentication codes
(MAC) and cryptographically secure pseudo-random number
generators (CSPRNG). We mention a few examples of cryptographic
functions derived from block ciphers in this context.
[Figure: diagram relating Block Ciphers to Stream Ciphers, Hash
function, CSPRNG, MAC and AE; connection labels: OFB, CTR, CFB
modes; one-way function construction; CTR mode; CBC-MAC, OMAC,
PMAC modes; HMAC; OCB, GCM, EtM modes]
Fig. 1 Modes of operation for cryptographic functions
1. Stream Ciphers : Block ciphers are transformed into stream
ciphers under counter mode (CTR) and output feedback mode
(OFB) [16]. SOSEMANUK [17], an eSTREAM finalist stream cipher,
uses the block cipher SERPENT in its construction.
2. Hash functions : Hash functions may be derived from a block
cipher operating in schemes that make it a non-invertible
one-way compression function. WHIRLPOOL is based on an AES-like
block cipher operating under a Miyaguchi-Preneel hashing
construction scheme [19]. More examples borrowing block cipher
constructions include the two SHA-3 finalists BLAKE [14] and
Skein [18].
3. MACs : MACs may be derived from hash functions (in HMAC mode)
or from block ciphers (in OMAC, PMAC and CBC-MAC modes).
4. CSPRNG : A CSPRNG can be derived from a block cipher operating
in counter mode. Also, running a stream cipher on a counter
yields a CSPRNG, with its initial state kept secret.
5. Authenticated Encryption: Authenticated encryption is
generically constructed by combining a block cipher and a MAC
operating under a mode of operation, hence simultaneously
providing confidentiality, integrity and authenticity
assurances on the data. Various modes of authenticated
encryption have been standardized by ISO [20].
It is worth highlighting that though block ciphers may serve as
the building blocks of many cryptographic functions, these
functions may have other origins. Most of the popular stream
ciphers are constructed using LFSRs along with some non-linear
combining functions and an FSM. Similarly, many CSPRNGs originate
from number theory problems. Also worth mentioning is the fact
that cryptographic functions take inspiration from each other
too. SEAL, HC-128 and HC-256 are stream ciphers that make use of
the SHA family of hash functions for their key expansion phase,
and SHACAL is a block cipher based on SHA-1. Many stream ciphers
and CSPRNGs have common roots.
2.2 Ingredients of a Block Cipher
This section presents the classification and typical construction
elements of block ciphers. Since our goal is to define the
configuration space of block ciphers for high-level synthesis, we
strictly focus on their architectural/operational constructs.
Their complexity and cryptanalytic properties are therefore
skipped; the reader is referred to [45, Chapter 7].
A block cipher is a mapping of a plaintext data block of size S_B
(blocksize) to an equal-sized ciphertext block under the
parametrization of a key (of size S_K, keysize). This
deterministic mapping (encryption) should be invertible. The
inverse function (decryption) generates the original plaintext
given the ciphertext under the same key. Classical/historical
block ciphers include Caesar ciphers, affine ciphers,
substitution ciphers, polyalphabetic substitutions, etc. These
techniques have proven over time to be cryptanalytically
vulnerable and are not suitable for practical use today [45,
Chapter 7].
Product ciphers form the most popular class of block ciphers
(and lightweight block ciphers) used today. A product cipher
combines multiple data transformations so that the resulting
cipher is more secure than the individual transformations. These
transformations may include permutations (adding diffusion),
substitutions (adding confusion), translations (e.g., XOR),
linear transformations (e.g., rotation), arithmetic operations,
modular multiplication, transpositions etc. An iterated product
cipher involves sequential repetition of a set of transformations
called a round function. The round function iterates N_r
(roundcount) times during encryption/decryption. For the i-th
round a subkey_i (of size S_SK) is generated. Two major classes
of iterated product ciphers are defined as follows [45,
Chapter 7].
1. A substitution-permutation network (SPN) cipher is an iterated
product cipher composed of a number of stages, each involving
substitutions and permutations. To ensure invertibility, for
each of the generated subkeys the round function is a bijection
on the round input.
2. A Feistel cipher is an iterated product cipher in which each
round splits the data, passes one half to the round function
and swaps the two data halves. Hence Feistel ciphers operate on
alternating halves of the data block, while the other half
remains unchanged. The round function need not be invertible to
allow inversion/decryption of the Feistel cipher.
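The last property can be illustrated with a minimal Python sketch of a toy 32-bit Feistel network. The round function, block size and subkeys below are assumptions made purely for illustration (they do not correspond to any real cipher and offer no security); the round function is deliberately non-invertible, yet decryption still recovers the plaintext.

```python
MASK16 = 0xFFFF


def round_function(half, subkey):
    # Deliberately NOT invertible: squaring modulo 2^16 maps several
    # inputs to the same output (e.g. 1 and 0xFFFF both square to 1).
    return ((half * half) ^ subkey) & MASK16


def feistel_encrypt(block, subkeys):
    left, right = (block >> 16) & MASK16, block & MASK16
    for k in subkeys:
        # One half feeds the round function, then the halves swap.
        left, right = right, left ^ round_function(right, k)
    return (left << 16) | right


def feistel_decrypt(block, subkeys):
    left, right = (block >> 16) & MASK16, block & MASK16
    for k in reversed(subkeys):
        # Undo one round; only round_function itself is needed,
        # never its inverse.
        left, right = right ^ round_function(left, k), left
    return (left << 16) | right
```

Decryption reuses the same round function with the subkeys in reverse order, which is why the same hardware datapath can typically serve both directions of a Feistel cipher.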
2.3 Computational Building Blocks of Symmetric Key
Cryptography
This section attempts to unconventionally classify the
major functions of symmetric key cryptography (block
ciphers, stream cipher, hash functions) based on their
underlying common computational elements.
A small set of three operations, i.e., modular addition (A),
bit rotation (R) and bitwise XOR (X), makes a functionally
complete set of operations for building any cryptographic
function [24, Section 5], including block ciphers, stream
ciphers and hash functions.
The term AXR (later renamed to ARX) was coined
by Ralf-Philipp Weinmann [21] in 2009, however such
designs have been proposed much earlier. This combi-
nation of linear (X, R) and nonlinear (A) operations,
iterated over multiple rounds achieves strong resistance
against known cryptanalysis techniques [24].
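As a concrete instance of an ARX construction, the sketch below implements the quarter-round of Salsa20 in Python, following the published Salsa20 specification; it uses only 32-bit modular addition, rotation and XOR. The helper name `rotl32` is our own choice for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (R)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarterround(y0, y1, y2, y3):
    """Salsa20 quarter-round: each step is add (A), rotate (R), XOR (X)."""
    z1 = y1 ^ rotl32((y0 + y3) & MASK32, 7)
    z2 = y2 ^ rotl32((z1 + y0) & MASK32, 9)
    z3 = y3 ^ rotl32((z2 + z1) & MASK32, 13)
    z0 = y0 ^ rotl32((z3 + z2) & MASK32, 18)
    return z0, z1, z2, z3
```

The modular addition is the only nonlinear step; rotation and XOR are linear over GF(2), which is the linear/nonlinear mix described above.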
Fig. 2 Computational commonalities of symmetric key cryp-
tographic functions
Fig. 2, a subset diagram, captures the computa-
tional kernels of symmetric key cryptography. The bit-
wise shift operation is added to the ARX pool of op-
erations for construction of some new cryptographic
functions like the HC series of stream ciphers [5]. Similarly,
the addition of Boolean operations (AND, NOT) forms the
computational basis of hash functions including
MD5 [23], SHA-0,1 [3] and SHA-3 [22]. Interestingly,
SHA-3 (Keccak) originates from the concept of flexi-
ble sponge constructions for cryptographic functions,
however, classification based on the underlying opera-
tions brings SHA-0,1 and SHA-3 simply under the com-
mon axis. SHA-2 [3] requires bitwise shifting as well as
Boolean operations in addition to the ARX pool of op-
erations, indicated in Fig. 2.
Many stream ciphers are based on a linear/nonlinear feedback
shift register (N/LFSR) whose inputs are selected from the
previous state after linear/nonlinear functions are applied to
them. Examples among the eSTREAM finalists [8] include SOSEMANUK
and all three finalists in the hardware profile. FSRs have been
employed in the construction of block ciphers and hash functions
too; some examples are listed in Fig. 2.
Feistel ciphers may use substitutions and permutations, in
addition to the ARX operations, in their round functions. XOR is
generally used for key whitening, i.e., mixing the round values
with the subkey of that round. An addition operation might not
be explicitly used in the round operations; however, an up
counter or a down counter is always required for realizing the
encryption or decryption block, respectively. DES has a Feistel
structure but employs SBoxes and PBoxes for its round operation.
AES [2] does not have any PBoxes and rather uses Galois field
multiplication. It is noteworthy that this computational
categorization highlights only the commonalities as a trend in
cryptographic functions. This categorization is neither complete
nor by definition binding to a particular class of ciphers.
Consequently, exceptions exist, e.g., the TEA [4] family of
lightweight block ciphers (XTEA, XXTEA) are Feistel network
ciphers by structure and use shift operations in addition to
ARX. AURORA, a hash function candidate in the SHA-3 competition,
has a structure that combines an SPN and a generalized Feistel
structure [25].
Fig. 3 RunFein toolflow for software generation, LISA based hardware generation and NIST test suite
Classifying the cryptographic functions on the basis of their
primitive computational elements brings forward a surprisingly
simple angle of viewing them, beneficial to their implementation
on both hardware and software platforms. RunFein is developed
around the concepts of modularity and extensibility. It supports
constructive composition of cryptographic building blocks for
SPN/Feistel network based block ciphers, which are the most
popular structures for block ciphers today. Additionally, stream
ciphers based on block ciphers (e.g., Salsa20 [15]) can be
realized using RunFein. It can also model Lai-Massey structure
block ciphers and can be conveniently extended to support newer
structures/components if and when the need arises.
3 RunFein Toolflow
The toolflow of RunFein is graphically shown in Fig. 3.
The user populates the configuration space of a new
block cipher to get customized software and hard-
ware implementations. A sophisticated design capture
is made possible via a GUI to let the user conveniently
specify cipher design and implementation customiza-
tion. The configuration for a cipher could be added,
parameter by parameter, or could be saved and loaded
later. A list of known ciphers is available to instantly
load the configurations for easier manipulation. RunFein
validates this design capture for completeness and correctness
at various stages of the toolflow. It successfully abstracts
away the diversity of the design space by translating the
configuration to a generic block cipher template. The
configurations undergo a set of design rule checks before the
software and hardware implementations are generated.
3.1 Cipher Configuration Space
A key challenge addressed in this work is to identify a complete
set of algorithmic primitives and architectural sub-structures
that is generic enough to configure a range of block ciphers and
their implementations. After a survey of diverse ciphers, we
developed sub-structures and component lists to build primitive
libraries for software and hardware realizations. We propose a
so-called layered architecture where each layer specifies a data
transformation defined by an operation. To fully appreciate the
concept of layers of operations, we consider the data flow graph
of the cipher (and its key expansion) where data is moving from
top to bottom. The layers are then the horizontal divisions of
the data flow diagram.
Table 2 RunFein supported modes of operation for block ciphers
Mode of operation            Encryption                           ||  Decryption                            ||  Initialization Vector
Electronic codebook (ECB)    C_i = E_k(P_i)                       ✓   P_i = D_k(C_i)                        ✓   -
Cipher block chaining (CBC)  C_i = E_k(P_i ⊕ C_{i−1})             ✗   P_i = D_k(C_i) ⊕ C_{i−1}              ✓   C_0 = IV
Propagating CBC (PCBC)       C_i = E_k(P_i ⊕ P_{i−1} ⊕ C_{i−1})   ✗   P_i = D_k(C_i) ⊕ P_{i−1} ⊕ C_{i−1}    ✗   P_0 ⊕ C_0 = IV
Cipher Feedback (CFB)        C_i = E_k(C_{i−1}) ⊕ P_i             ✗   P_i = E_k(C_{i−1}) ⊕ C_i              ✓   C_0 = IV
Output Feedback (OFB)        C_i = P_i ⊕ O_i                      ✗   P_i = C_i ⊕ O_i                       ✗   O_i = E_k(O_{i−1}), O_0 = IV
Counter (CTR)                C_i = P_i ⊕ E_k(IV)                  ✓   P_i = C_i ⊕ E_k(IV)                   ✓   nonce + counter = IV
The configuration parameter set is categorized into algorithmic
parameters, modes of operation, microarchitectural parameters
and testvectors. (All parameterizable attributes that a user
must populate are highlighted in the following discussion.)
3.1.1 Algorithmic Parameters
The parameters to define the algorithmic construction
of block cipher are
– Basic Parameters: The input plaintext to a block cipher
(encryption) and its output ciphertext are of equal size,
blocksize S_B (all sizes are specified in bits). The size of the
Key is specified as S_K (for some operational modes an IV
(initialization vector) of size S_IV must also be specified).
The granularity of the cipher is specified as the wordsize (S_W)
of the cipher. Block ciphers iterate a deterministic combination
of operations known as a round. The rounds operate N_r
(roundcount) times during encryption/decryption.
– Round Layers: For most block ciphers, the data undergoes an
initial and/or final transformation, before and/or after the
rounds processing, respectively. Since these transformations may
differ from each other and from the central round
transformation, we name them round init and round final, while
the round transformation is referred to as round middle. These
three kinds of rounds are defined by a series of layers of
operations. Every layer comprises at least one of the following
operations, performed exclusively on the user-specified portions
of the layer input (this list could be conveniently extended to
accommodate newer operations).
1. Substitution or permutation boxes (SBox, PBox).
2. Galois field multiplication GF-mul with another polynomial
(the primitive polynomial must be specified).
3. Bitwise operations including rotation, shifting, addition,
XOR-ing, ARK (add round key), AddCounter (XORing with a
counter).
4. Operations specific to Feistel networks including split, swap.
5. Compound operations used in popular ciphers, e.g., shiftrows,
MixColumns used in AES [2].
6. No operation (nop).
– Kround Operation: For each round, a subkey (of size S_SK) is
generated through key expansion. Like rounds, key expansion
requires iteration of the kround transformation, N_r
(roundcount) times, to generate the subkeys. kround may also
have different definitions for kround init, kround middle and
kround final, each of which is defined by layers of operations
like the cipher rounds.
The layers of each round have a layernumber to specify their
order of execution within that round. The input to and output
from a layer may differ in size (bits) due to an
expanding/contracting layer operation; the sizes are specified
as S_lin and S_lout, respectively.
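To make the layered description concrete, the following Python sketch models a round as an ordered list of layers and runs a toy 16-bit SPN built from them. It is only an illustration under stated assumptions: the `Layer` class, the function names and the cipher itself are hypothetical and are not RunFein's internal representation. The 4-bit SBox values are those of PRESENT [12]; the rotation layer is a simplified stand-in for a real permutation layer.

```python
from dataclasses import dataclass
from typing import Callable, List

# 4-bit SBox of PRESENT, used here only as a convenient example table.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


@dataclass
class Layer:
    layernumber: int                  # order of execution within a round
    name: str
    op: Callable[[int, int], int]     # (state, subkey) -> new state


def ark(state, subkey):
    return state ^ subkey             # add round key


def sbox_layer(state, _subkey):
    out = 0
    for i in range(4):                # four 4-bit words in a 16-bit block
        out |= PRESENT_SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out


def rotate_layer(state, _subkey):
    return ((state << 4) | (state >> 12)) & 0xFFFF


def run_round(layers: List[Layer], state: int, subkey: int) -> int:
    # Layers execute in increasing layernumber, whatever the list order.
    for layer in sorted(layers, key=lambda l: l.layernumber):
        state = layer.op(state, subkey)
    return state


ROUND_MIDDLE = [Layer(0, "ARK", ark), Layer(1, "SBox", sbox_layer),
                Layer(2, "rotation", rotate_layer)]
ROUND_FINAL = [Layer(0, "ARK", ark)]


def encrypt(block, subkeys):
    *middle_keys, final_key = subkeys
    state = block
    for k in middle_keys:             # one round middle per subkey
        state = run_round(ROUND_MIDDLE, state, k)
    return run_round(ROUND_FINAL, state, final_key)
```

Describing the cipher as data (lists of layers) rather than as fixed code is what lets a generator emit different software or hardware realizations from one configuration.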
3.1.2 Modes of Operation
RunFein lets the user choose from a list of modes of operation that
add chaining dependencies between adjacent blocks of data during
encryption/decryption. Currently, any of the NIST standardized modes
of operation may be chosen for implementation [16], as listed in
Table 2. Here C_i represents the ciphertext for the i-th plaintext
block after the encryption function E_k, parameterized by the secret
key k, while P_i represents the plaintext after decryption. Due to
the chaining dependencies, multiple blocks of data cannot be
encrypted or decrypted in parallel for some modes of operation, as
indicated in Table 2. For all modes other than ECB, the user
specifies the IV and any additional parameters required.
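The chaining dependency can be illustrated with a minimal Python sketch of CBC, one of the NIST modes. The 64-bit toy_encrypt/toy_decrypt pair is a hypothetical, insecure stand-in for a real block cipher and is not part of RunFein; only the chaining structure C_i = E_k(P_i XOR C_{i-1}), with C_0 = IV, is the point here.

```python
MASK64 = (1 << 64) - 1

def toy_encrypt(block, key):
    """Insecure stand-in for E_k: XOR with the key, then rotate left by 13."""
    x = (block ^ key) & MASK64
    return ((x << 13) | (x >> 51)) & MASK64

def toy_decrypt(block, key):
    """Inverse of toy_encrypt: rotate right by 13, then XOR with the key."""
    x = ((block >> 13) | (block << 51)) & MASK64
    return x ^ key

def cbc_encrypt(blocks, key, iv):
    """Sequential by construction: block i needs ciphertext block i-1."""
    out, prev = [], iv
    for p in blocks:
        prev = toy_encrypt(p ^ prev, key)
        out.append(prev)
    return out

def cbc_decrypt(blocks, key, iv):
    """Each plaintext depends only on two ciphertext blocks, so the
    per-block work could in principle run in parallel."""
    prevs = [iv] + blocks[:-1]
    return [toy_decrypt(c, key) ^ prev for c, prev in zip(blocks, prevs)]
```

In this sketch, encryption has to walk the blocks in order, whereas every decrypted block only needs ciphertext that is already available. This is the kind of per-mode distinction that matters later for multi-block microarchitectures.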
3.2 Putting the Things Together
We take up two ciphers and work out their algorithmic configuration
according to RunFein's layered architecture definition methodology
discussed above: AES-128 [2], due to its widespread usage, and
PRESENT [12], due to its ultra-lightweight nature. Moreover, both of
these ciphers have been standardized by ISO. We define the
configuration space for these ciphers for encryption only. The
reader is referred to the documentation of these ciphers for a
detailed understanding of their functionality [12, 2].
Table 3 RunFein basic parameter configuration space
Parameter       PRESENT-80 [12]   AES-128 [2]
SB (bits)       64                128
SK (bits)       80                128
SSK (bits)      64                128
SW (bits)       4                 8
Nr (rounds)     32                10
round_init      -                 1 layer
round_middle    3 layers          4 layers
round_final     1 layer           3 layers
kround_init     -                 1 layer
kround_middle   3 layers          7 layers
kround_final    -                 -
3.2.1 PRESENT-80
Table 3 shows the basic parameter configuration space for PRESENT
with an 80-bit key. The configuration parameters for PRESENT-80 are
fed to the tool's GUI and are stored as an XML configuration file, a
snapshot of which is shown in Fig. 4. A separate token ALGORITHM
holds the basic parameters and the round and key round operational
layers. The basic parameters include the block, key and word sizes
and the number of rounds for encryption. The ROUND token holds the
information for the three types of rounds. A round_init is not
required, hence no layers are defined for it. The round_middle is
defined by the following 3 layers of operation:
– layer0 is the ARK, where the input and output of the layer are
equal in size. The data is XORed with the subkey, where
subkey = key[79 : 16].
– layer1 is the SBox. The user specifies a total of 16 SBoxes
(SB/SW) to be inserted, indicated by Word2Sub being '-1'. For the
SBox, 2^SW values are specified, each SW bits wide.
– layer2 is the PBox, with a total of SB arguments ∈ [0..SB].
The arguments for SBox and PBox can either be loaded from a text
file or entered by the user in the edit boxes. The round_final is
specified by one layer of ARK, the same as the first layer of
round_middle. Hence the ciphertext is taken out after the first ARK
layer in the last iteration of cipher encryption. Fig. 4 shows the
KROUND token that configures the key expansion information. For key
expansion in PRESENT-80, the kround_init and kround_final are not
required and hence are defined as having no layers. The
kround_middle requires the three layers of operations defined below.
– layer0 is the ROTATE operation, configured to carry out a left
rotation by 61.
– layer1 is the SBox. The user specifies one SBox inserted at word
number 19 of the key, the most significant nibble of the layer
input. The rest of the bits are passed on unaltered.
– layer2 is the AddCounter that XORs the selected bits of the layer
input (bits 19 down to 15) with a 5-bit counter (the round counter).
The round counter increments until it reaches Nr − 1, at which point
a valid ciphertext is available.
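The layer-by-layer description above can be sketched in Python as follows. This is an illustrative model, not RunFein's generated ANSI-C code: the function names are hypothetical, the SBox values and the bit permutation rule are taken from the PRESENT specification [12] rather than from this paper, and the loop structure (31 middle rounds followed by a final ARK) is an assumption consistent with Nr = 32 in Table 3.

```python
MASK80 = (1 << 80) - 1

# 4-bit SBox and 64-bit PBox of PRESENT (values from the cipher specification).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PBOX = [(16 * i) % 63 if i != 63 else 63 for i in range(64)]

def ark(d_state, k_state):
    """round layer0: XOR the data with subkey = key[79:16]."""
    return d_state ^ (k_state >> 16)

def sbox_layer(d_state):
    """round layer1: apply the SBox to each of the 16 nibbles (SB/SW)."""
    out = 0
    for w in range(16):
        out |= SBOX[(d_state >> (4 * w)) & 0xF] << (4 * w)
    return out

def pbox_layer(d_state):
    """round layer2: move bit i of the input to bit PBOX[i] of the output."""
    out = 0
    for i in range(64):
        out |= ((d_state >> i) & 1) << PBOX[i]
    return out

def kround_middle(counter, k_state):
    """kround layers: rotate left by 61, SBox on the top nibble, AddCounter."""
    k_state = ((k_state << 61) | (k_state >> 19)) & MASK80
    k_state = (SBOX[k_state >> 76] << 76) | (k_state & ((1 << 76) - 1))
    return k_state ^ (counter << 15)  # XOR 5-bit counter into bits 19..15

def present80_encrypt(plaintext, key, nr=32):
    d_state, k_state = plaintext, key
    for counter in range(1, nr):          # Nr - 1 middle rounds
        d_state = pbox_layer(sbox_layer(ark(d_state, k_state)))
        k_state = kround_middle(counter, k_state)
    return ark(d_state, k_state)          # round_final: ARK only
```

A call such as present80_encrypt(0, 0) can be compared against the all-zero test vector published with the cipher specification to check that a configuration of this shape is captured correctly.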
3.2.2 AES-128 [2]
For AES-128, the corresponding parameters for RunFein are specified
as given in Table 3. The round_init requires one operation layer,
i.e., ARK. The round_middle is defined by 4 layers.
– layer0 is the SBox. The user specifies 16 SBoxes to be inserted,
along with an SBox definition of 256 bytes.
– layer1 is a Shift-rows operation. It is a compound operation that
treats the layer input as a 2-D matrix and rearranges the words of
each row with fixed offsets.
– layer2 is a GF-Mix, a compound operation assuming 2-D arranged
data. The user specifies a 4x4 matrix of column coefficients for
GF(2^8) multiplication.
– layer3 is the ARK, which XORs the key with the data.
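As an illustration of the GF-Mix layer, the following Python sketch multiplies in GF(2^8) by shifting and XORing, and applies a 4x4 coefficient matrix to one state column. The function names are hypothetical; the reduction polynomial 0x11B and the coefficient matrix are the standard AES values, assumed here as the user-supplied arguments.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES reduction polynomial

# MixColumns coefficients (the 4x4 matrix a user would supply for AES).
MIX_COEFFS = [[2, 3, 1, 1],
              [1, 2, 3, 1],
              [1, 1, 2, 3],
              [3, 1, 1, 2]]

def gf_mul(a, b, poly=AES_POLY):
    """Multiply two bytes in GF(2^8) using only shifts and XORs."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # degree-8 overflow: reduce by the polynomial
            a ^= poly
        b >>= 1
    return result

def gf_mix_column(column, coeffs=MIX_COEFFS):
    """Multiply one 4-byte column by the coefficient matrix over GF(2^8)."""
    out = []
    for row in coeffs:
        acc = 0
        for c, byte in zip(row, column):
            acc ^= gf_mul(c, byte)
        out.append(acc)
    return out
```

For example, gf_mul(0x57, 0x83) evaluates to 0xC1, which matches the worked multiplication example in the AES standard.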
Using this layered architecture, a cipher may have multiple valid
definitions. The Shift-rows operation in layer1 could instead have
been defined using several layers, each rotating one row of the
state matrix, as defined by the AES specification. We define it as a
standard compound operation since it is a common operation used in
ciphers other than AES, e.g., LED.
Fig. 4 The configuration file snapshot for PRESENT-80 generated by RunFein
The round_final is defined by 3 layers, the same as layer0, layer1
and layer3 of round_middle. For each round, a subkey is generated
through a kround. The kround_init is a nop layer since the first
subkey is the input key itself. The kround_final is not required and
hence not defined. The kround_middle requires 7 layers of operations
for its definition, as shown in Fig. 5.
– layer0 is a ROTATE layer performing a left rotation by 8. It takes
the least significant 32-bit word of the key. This layer also
expands 128 bits of input to 160 bits of output by concatenating the
unaltered input bits with the rotated word output.
– layer1 is the SBox; 4 SBoxes are inserted on the 4 least
significant bytes of the layer1 input.
– layer2 is an XOR with counter-dependent constants (RCON). The
constants are specified by the user using a text file.
– layer3-layer6 are XOR operations, performing selective XORing of
layer inputs as per the AES specification.
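The seven kround_middle layers can be sketched in Python as one AES-128 key expansion step. This is an illustrative model with hypothetical names, not RunFein output. To keep the block short, the SBox is computed from its algebraic definition (inverse in GF(2^8) followed by the affine map) instead of being loaded from a 256-byte table, and the RCON value is passed in by the caller.

```python
def gf_mul(a, b, poly=0x11B):
    """Shift-and-XOR multiplication in GF(2^8) with the AES polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def build_sbox():
    """AES SBox: multiplicative inverse (0 maps to 0), then the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for r in range(1, 5):
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def kround_middle(k_state, rcon):
    """One key expansion step on a 128-bit key held as a Python int."""
    w = [(k_state >> (96 - 32 * i)) & 0xFFFFFFFF for i in range(4)]
    # layer0: rotate the least significant word left by 8 (128 -> 160 bits,
    # modeled here by keeping the rotated word next to the unaltered key).
    t = ((w[3] << 8) | (w[3] >> 24)) & 0xFFFFFFFF
    # layer1: SBox on the 4 bytes of the rotated word.
    t = sum(SBOX[(t >> (8 * i)) & 0xFF] << (8 * i) for i in range(4))
    # layer2: XOR with the round constant in the most significant byte.
    t ^= rcon << 24
    # layer3..layer6: chained XORs produce the four new key words,
    # contracting the intermediate value back to 128 bits.
    n0 = w[0] ^ t
    n1 = w[1] ^ n0
    n2 = w[2] ^ n1
    n3 = w[3] ^ n2
    return (n0 << 96) | (n1 << 64) | (n2 << 32) | n3
```

Applied to the example cipher key of the AES standard with RCON = 0x01, this step reproduces the first expanded round key listed there.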
Fig. 12 shows a GUI snapshot of RunFein with AES-
128 configuration.
3.3 Cipher Model Creation and Validation
The RunFein framework provides a sophisticated configuration capture
via a GUI (some snapshots of the RunFein GUI are presented in the
appendix). It provides convenient default values in the GUI wherever
necessary; the configuration file with default values for the
microarchitecture and test vectors for PRESENT-80 is shown in Fig. 4
(further discussion follows in the following sections). Besides the
parameters specified by the user through the GUI, some parameters
are inferred by the tool. A counter is required to keep track of the
iterations of the cipher. It counts up or down during encryption or
decryption of a block of data, respectively. Its size is taken as
ceil(log2(Nr)) bits. Besides the counter, there are two variables,
d_state and k_state, that contain the updated data state and key
state, respectively (in a hardware implementation these are D
flip-flop registers instead).
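As a small worked example of the inferred counter size, the Python sketch below evaluates ceil(log2(Nr)) with integer arithmetic. The helper name is hypothetical.

```python
def counter_width(nr):
    """Bits needed for the round counter: ceil(log2(nr)) for nr >= 1,
    computed without floating point."""
    return (nr - 1).bit_length()

# Round counts from Table 3.
present_bits = counter_width(32)   # PRESENT-80: 5 bits
aes_bits = counter_width(10)       # AES-128: 4 bits
```

The 5-bit result for PRESENT-80 agrees with the 5-bit round counter used by the AddCounter layer of its key expansion.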
Before a valid cipher model is created, the configuration parameters
given by the user undergo a list of defined rule checks. The user is
prompted in case of a violation, and cipher implementation does not
proceed unless a valid configuration is specified. (Some additional
rules related to hardware microarchitectures are discussed in
Section 5.3.6.)
– Blocksize of any cipher by definition equals the sizes
of plaintext/ ciphertext.
SB = SP = SC
SB = 2m, where m ≥ 1.
– Size rules for key and the subkeys generated.
SB = SSK
SK = 2m, where m ≥ 1.
– Size rules for wordsize of cipher.
SW = 2m, where m ≥ 1.
SW ≤ SK and SW ≤ SB.
SB = n·SW and SSK = k·SW, where n, k ≥ 1.
– For modes of operations (defined in Section 3.1.2),
IV size rule
SB = SIV
– The number of subkeys generated should be equal
to the number of ARK operations, hence each ARK
consumes one key.
– The SBox values are ∈ [0..2^SW].
– PBox, rotation/shifting, XOR operations have ar-
guments ∈ [0..SB ].
– The polynomial coefficients for GF-mul are not ∅.
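A subset of these rule checks can be sketched as a small Python validator. The dictionary keys and the function name are hypothetical and do not reflect RunFein's XML schema. The upper bounds on SBox and PBox arguments are treated as exclusive here, which is an assumption, and the size rules written with "2m" are deliberately left out.

```python
def check_algorithmic_config(cfg):
    """Return a list of violated rules; an empty list means the checks pass."""
    errors = []
    sb, sk, ssk, sw = cfg["SB"], cfg["SK"], cfg["SSK"], cfg["SW"]

    if sb != ssk:
        errors.append("subkey size SSK must equal block size SB")
    if sw > sk or sw > sb:
        errors.append("word size SW must not exceed SK or SB")
    if sb % sw != 0 or ssk % sw != 0:
        errors.append("SB and SSK must be multiples of SW")

    # IV size rule applies only when a chained mode supplies an IV.
    siv = cfg.get("SIV")
    if siv is not None and siv != sb:
        errors.append("IV size SIV must equal block size SB")

    if any(not 0 <= v < 2 ** sw for v in cfg.get("sbox", [])):
        errors.append("SBox value out of range for SW-bit words")
    if any(not 0 <= p < sb for p in cfg.get("pbox", [])):
        errors.append("PBox argument out of range for the block size")
    if "gf_poly" in cfg and not cfg["gf_poly"]:
        errors.append("GF-mul polynomial coefficients must not be empty")
    return errors

# PRESENT-80 basic parameters from Table 3 (SBox/PBox contents omitted).
present80 = {"SB": 64, "SK": 80, "SSK": 64, "SW": 4}
violations = check_algorithmic_config(present80)
```

A tool following this pattern would prompt the user with the returned messages and refuse to build the cipher model while the list is non-empty.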
The configuration file is parsed by RunFein and the cipher model is
created; for PRESENT-80 and AES-128 it is shown in Fig. 5. The
cipher model comprises a controller and a datapath. The controller
is simply the inferred counter (not shown in Fig. 5); the datapath
of the cipher is constructed from the operational layers of round
and kround. A multiplexer is also inferred at the input to the
d_state and k_state registers, controlled by the round count. For
PRESENT-80, the last round, or round_final, comprises the ARK layer
only, and hence the ciphertext is extracted after layer0. For
AES-128, layer0 of kround expands the key and layer3 contracts it
back to 128 bits.
4 Software Generation Engine
The software generation engine takes either the user-specified
configuration of a new cipher or, alternatively, loads the design
configuration of a known cipher (Fig. 3). The user also specifies
data for the plaintext, key and IV (through text files or edit
boxes) using the GUI. RunFein compiles the cipher model to generate
a high-performance, fixed-point ANSI-C description. The code is
complemented by a simulation environment with user-controllable
switches for verification, throughput profiling, data dumping etc.
The generated code is not specifically optimized for a particular
General Purpose Processor (GPP); however, it has a regular structure
and good readability.
All the configuration parameters of the cipher (as specified in the
XML file listing in Fig. 4) are #define-d in a header file. This
includes all basic configuration, test vectors and the
microarchitecture, though the software implementation only caters
for the default values of a simple iterative loop folded
implementation. Data types of registers, layers and all interfaces
are typedef-ed in accordance with their respective specified
granularity. Supplementary functions are kept in a separate file
that is included in the main file during simulation. These functions
include datatype conversion functions (e.g., conversion of
hexadecimal to binary arrays and vice versa), data dumping and
verbose simulations. For each operational layer of round and kround,
a separate function is defined with the interface and functionality
specified by the user. Layers may operate on operands with different
granularity, i.e., PBox operates on bits, SBox operates on SW-bit
words etc. The generated functions include relevant calls to
granularity conversion functions in addition to the functionality of
the layer operation.
The main body of code, containing the controller and the datapath of
the cipher model, is a separate file that #includes all
supplementary and header files. To elaborate on the code simulation
environment, we refer to the simplified pseudocode for encryption of
one block of data given in Algorithm 1. The plaintext and key are
assigned to the local variables d_state and k_state, respectively
(lines 1, 2). k_state is updated first by the Kround_init function
(line 4). Using the updated key, d_state is updated using the
round_init function (line 5). The controller part of the cipher
comprises the counter variable, keeping track of the round under
execution. The loop starting in line 6 iterates roundcount − 1 times
and keeps updating the data and key registers. The final round
generates the last k_state, which is used by round_final to generate
the ciphertext, as given in lines 9 and 10, respectively. RunFein
generated code for AES is presented in the appendix of [26].
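The control flow of Algorithm 1 can be sketched as a generic Python driver in which each round and kround variant is a callable. The dictionary-based interface and all names are hypothetical; undefined variants default to a nop, mirroring how unneeded rounds are left without layers. Setting the counter to Nr for the final round is an assumption.

```python
def nop_round(counter, k_state, d_state):
    """Default round: pass the data state through unchanged."""
    return d_state

def nop_kround(counter, k_state):
    """Default key round: pass the key state through unchanged."""
    return k_state

def encrypt_block(plaintext, key, nr, rounds=None, krounds=None):
    """Encrypt one block following the structure of Algorithm 1."""
    rounds = rounds or {}
    krounds = krounds or {}
    d_state, k_state = plaintext, key
    counter = 0
    k_state = krounds.get("init", nop_kround)(counter, k_state)
    d_state = rounds.get("init", nop_round)(counter, k_state, d_state)
    for counter in range(1, nr):               # roundcount - 1 iterations
        k_state = krounds.get("middle", nop_kround)(counter, k_state)
        d_state = rounds.get("middle", nop_round)(counter, k_state, d_state)
    counter = nr                               # final round (assumed index)
    k_state = krounds.get("final", nop_kround)(counter, k_state)
    return rounds.get("final", nop_round)(counter, k_state, d_state)

# Toy usage: every round XORs the data with the current key state, and the
# middle key round adds the counter to the key state.
toy_rounds = {name: (lambda c, k, d: d ^ k) for name in ("init", "middle", "final")}
toy_krounds = {"middle": lambda c, k: (k + c) & 0xFF}
toy_ciphertext = encrypt_block(0x00, 0x10, nr=3, rounds=toy_rounds, krounds=toy_krounds)
```

Keeping the driver independent of the layer functions is one way to let the same controller serve any cipher expressible in the layered configuration.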
Input: plaintext, key, configuration
Output: ciphertext
1 d_state = plaintext;
2 k_state = key;
3 counter = 0;
4 k_state = Kround_init(counter, k_state);
5 d_state = round_init(counter, k_state, d_state);
6 for counter = 1 till counter ≤ Nr − 1 step 1 do
7   k_state = Kround_middle(counter, k_state);
8   d_state = round_middle(counter, k_state, d_state);
  end
9 k_state = Kround_final(counter, k_state);
10 ciphertext = round_final(counter, k_state, d_state);
Algorithm 1: RunFein Encryption Pseudocode
Fig. 5 Layers for the loop folded implementation (with on the fly key expansion) of PRESENT-80 (left) and AES-128 (right)
The software generation engine of RunFein generates a
single-threaded, untimed, sequential C model of the block cipher
with necessary libraries and scripts. Some of its additional
features are highlighted below.
– NIST Test Suite: RunFein has the NIST test suite [1] integrated
with it to characterize the statistical qualities of PRNGs. It
serves as a first step in determining the suitability of a PRNG used
for cryptographic purposes. Fig. 13 gives a GUI snapshot of RunFein
for the selection and parameterization of the various statistical
tests available for execution as per the user's wishes. (RunFein
caters only for block ciphers; however, they behave like stream
ciphers and CSPRNGs under certain modes of operation.)
– Verification: For the verification of the generated
model according to the user specified testvectors, a
verification environment is generated. For new pro-
posals, without defined testvectors, the verification
switches may be turned off by the user.
– Performance Profiling: The user may enable a performance profiling
environment in the generated software implementation to evaluate the
encryption speed (in seconds, cycles/byte) of the cipher design.
Bulk data can also be encrypted from random plaintext to monitor
data randomness. A reasonably efficient generated implementation may
be further optimized manually for a specific platform.
5 Hardware Generation Engine
In addition to the algorithmic configuration, the hardware
generation engine requires the user to specify the
microarchitectural configuration of the cipher model in order to
generate a complete working model of the block cipher in
synthesizable HDL, along with a testbench and necessary scripts.
First, the viability of the chosen microarchitecture configuration
is evaluated by RunFein through a list of rule checks. After design
validation, RunFein generates the design implementation in an ADL
and relies on Synopsys Processor Designer [36] for generation of
synthesizable HDL code, as shown in Fig. 3. This design can be
profiled to obtain critical parameters like the maximum clock
frequency of the design, chip area and power consumption.
5.1 HDL Toolflow
RunFein employs Synopsys Processor Designer as an efficient
high-level synthesis framework [36]. The design from which the HDL
is generated is described in a high-level language called Language
for Instruction-Set Architectures (LISA) [35]. The language offers
rich programming primitives to capture an implementation of a design
with full programmability to an Application Specific IC (ASIC). The
hardware implementation flow using LISA is shown in Fig. 6. Besides
generating a complete set of software development tools (compiler,
simulator, assembler, linker), synthesizable HDL code (both VHDL and
Verilog) for the design can be generated automatically from the LISA
processor description. The language allows full control over minute
design decisions and preserves the overall structural organization
neatly in the generated hardware description.
Fig. 6 Implementation flow with LISA
The RunFein generated LISA description is converted to a
synthesizable, hierarchical block cipher HDL and testbench with
necessary scripts that can be further used to carry out
– Simulations for design verification, including gate-level
simulation (post-synthesis), using verification tools.
– Logic synthesis of the design for profiling critical parameters
like the maximum clock frequency and chip area.
– Post-synthesis power consumption estimation using
back-annotation.
5.2 Operational Layers Inference
The hardware generation engine tries to optimize hardware reuse
between the middle and final rounds of the algorithm by gauging the
commonalities between the two. Since the round_final of PRESENT-80
is a single ARK operation, the final ciphertext is taken out after
the first layer of round_middle. For AES-128, the middle round and
the last round differ in only one layer, i.e., GF-mul. A bypass mux
is automatically inserted and enabled at the final round, as shown
in Fig. 5. SBox sharing for bitsliced configurations is also
performed (discussed in subsequent sections). If the ARK operation
of a cipher is in round_init (e.g., in PRESENT-80), the subkey is
taken from the k_state register; if it is in round_final (e.g., in
AES-128), it is taken from the output of the last layer of kround.
A simple mapping of the layers onto operations is carried out. (The
extensible RunFein framework enables and encourages multiple
customized LISA definitions of operations.)
– SBoxes are implemented as read-only lookup tables (LUT).
– Diffusion operations like rotation, shifting and PBoxes are
implemented by rewiring the inputs and add no overhead to the
combinational delay of the circuit.
– GF-mul is implemented by shifting and XORing operations in
accordance with the primitive polynomial of the specified finite
field.
– Supported popular compound operations (e.g., MixColumns) have
cascaded implementations of their constituent operations.
5.3 Supported Microarchitectures
Through RunFein the user can quickly explore various
microarchitecture design options residing at different points of the
performance-area trade-off. The user always specifies the
algorithmic configuration of the cipher design according to the
simple loop folded architecture. In addition, the user must specify
the microarchitecture that RunFein should automatically implement.
By tweaking the microarchitecture configuration, the user may opt
for parallel implementations (loop subpipelining/unrolling), which
duplicate hardware to boost throughput, or bitsliced designs, which
economize area/power at the expense of lower throughput by employing
resource sharing. We discuss these microarchitectures individually;
they are depicted in Fig. 7.
5.3.1 Loop Folded
A typical loop folded block cipher implementation performing one
round per clock cycle (Nr cycles per block) is shown in Fig. 5 and
Fig. 7 a). It is the default hardware implementation
microarchitecture of RunFein and serves as a middle point for the
area-throughput trade-off between parallel implementations and
bitsliced implementations. The controller comprises a round counter
register, incrementing every cycle (Fig. 8 a). The selection of
plaintext or folded data for the d_state register is controlled by
this register. A valid ciphertext is generated when the counter
register hits Nr.
5.3.2 Loop Unrolled
The loop unrolled configuration replicates round (and kround)
resources u times to execute multiple rounds in one clock cycle,
where u is the unrolling factor. Consequently, the critical path of
the circuit increases, decreasing the maximum operational frequency;
the area also increases. The counter increments by u per cycle since
the design requires Nr/u cycles for encryption of a complete block
(Nr/u not being a fraction), as shown in Fig. 8 b). A higher
throughput is expected since the propagation delay and the register
setup time are incurred only once in the combinational delay for u
rounds. This gain in throughput is hard to quantify without
experimentation, hence synthesis profiling is required (a twice
unrolled hardware configuration is shown in Fig. 7 b). Two critical
design points relevant to loop unrolling are
– A fully unrolled architecture with u = Nr encrypts/decrypts a
block of data in a single cycle (Fig. 7 c). The RunFein hardware
generation engine optimizes the hardware for the round_final if it
is different from the round_middle. The round_middle hardware is
replicated (u − 1) times, followed by the hardware for round_final
instantiated once.
– A loop unrolling with pipelining architecture can be chosen by the
user to automatically insert pipeline registers between unrolled
rounds. Consequently, the critical path of the design does not
increase due to unrolling, and the design handles multiple blocks of
data simultaneously. The configuration in Fig. 7 d) processes two
blocks of data in a total of Nr cycles, boosting throughput by u. A
supplementary counter, s_cnt, keeps track of the unroll factor; when
it is reached, it generates the load signal for the counter to
increment directly by u (Fig. 8 c). Hence, in subsequent cycles, u
valid ciphertexts are generated when the counter equals the
roundcount.
Fig. 7 Various parallel microarchitecture implementations supported by RunFein a) Loop folded b) Unrolled by 2 c) Fully unrolled d) Unrolled by 2 with pipeline e) Subpipelined once f) Subpipelined once and unrolled by 2 with pipeline
Fig. 8 Controller for various microarchitecture implementations supported by RunFein a) Loop folded b) Unrolled by u c) Unrolled by u with pipeline d) Subpipelined by s e) Subpipelined by s and unrolled by u with pipeline f) Bitsliced with Sb
5.3.3 Subpipelining
Using RunFein, the user may choose to insert a subpipeline between
any two layers in a round to reduce the critical path of the design.
To ensure data consistency, for s subpipelines inserted in a cipher
round, an equal number of subpipelines should be specified by the
user to be inserted in kround as well. To do so, the user must
enable the subpipelining option to be able to insert various
operations along with a subpipeline register, as shown in the GUI
snapshot (Fig. 14). Each inserted subpipeline increments the number
of blocks being processed, i.e., s subpipelines make the cipher
design handle (s + 1) data blocks simultaneously (for s = 1, see
Fig. 7 e). A supplementary register, s_cnt, keeps track of the
subpipeline (Fig. 8 d). If the user wishes to insert a subpipeline
within a layer, that layer must first be redefined as two layers,
split at the cut-set point.
5.3.4 Hybrid Microarchitectures
Using RunFein, the user may opt for hybrid parallel
microarchitecture configurations supporting both subpipelining and
unrolling. Fig. 7 f) shows a hybrid microarchitecture with
subpipelining (s = 1) and unrolling with pipeline by a factor of
u = 2. It is a multiple-block configuration, handling 4 data blocks
simultaneously. Consequently, the controller needs a supplementary
register, s_cnt, to keep track of the total iteration count
(Fig. 8 e).
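The number of blocks processed simultaneously by the parallel microarchitectures can be summarized in a short Python sketch. The product rule (s + 1) × u for pipelined unrolling is an assumption inferred from the data points quoted in the text (2 blocks for s = 1, 2 blocks for u = 2 with pipeline, 4 blocks for the hybrid); the function name is hypothetical.

```python
def blocks_in_flight(s=0, u=1, pipelined_unroll=False):
    """Data blocks handled simultaneously.

    s: number of subpipeline registers inserted in a round
    u: unroll factor
    pipelined_unroll: True if pipeline registers separate unrolled rounds
    """
    per_round = s + 1                      # s subpipelines -> s + 1 blocks
    return per_round * (u if pipelined_unroll else 1)

loop_folded = blocks_in_flight()                                  # Fig. 7 a)
unrolled_pipelined = blocks_in_flight(u=2, pipelined_unroll=True) # Fig. 7 d)
subpipelined = blocks_in_flight(s=1)                              # Fig. 7 e)
hybrid = blocks_in_flight(s=1, u=2, pipelined_unroll=True)        # Fig. 7 f)
```

Unrolling without pipeline registers leaves the count at one block, since the extra rounds are chained combinationally within a cycle.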
5.3.5 Bitslicing
Through bitslicing, RunFein tiles the parallel loop folded
architecture to work on Sb bits at a time (Sb < SB). Consequently,
the design has lower area and lower throughput, a technique
especially interesting for lightweight block ciphers. In most SPN
ciphers, SBoxes account for a significant portion of the area, e.g.,
more than 30% of the PRESENT-80 loop folded implementation area is
contributed by its 17 SBoxes [12]. Hence, Sb is generally taken as
SW or a multiple of it.
The krounds and rounds are sliced to perform the task of one cycle
in SB/Sb cycles. The controller of the bitsliced architecture
changes so that the counter increments once after s_cnt hits SB/Sb.
The encryption of one block requires (SB/Sb) × Nr cycles, as shown
in Fig. 8 f. The d_state and k_state are shift registers (with
parallel load/store possible), with a shift granularity of Sb. Hence
the operations of each layer in a round are performed on Sb bits and
the result is stored in the d_state shift register. Similar to the
bitslicing of SBoxes, operations like XOR and addition (with carry
bit) can be bitsliced. However, for some operations, e.g., PBoxes
and rotation, slicing requires large extra selection logic. Since
these bit manipulation operations have no logic overhead when
performed in parallel configurations, it is wiser not to bitslice
them.
Fig. 9 Bitsliced implementation of PRESENT-80
RunFein takes the bitslice factor (Sb) of a cipher and, after
evaluating the validity of the design, generates a bitsliced
implementation. Fig. 9 shows a bitsliced PRESENT-80 implementation
with Sb = 4, requiring Sb/SW (= 1) SBox per round, shared between
the kround and round calculations. A similar design has been
presented for the smallest area footprint of PRESENT-80 in [13].
Since key expansion is generally inexpensive in terms of resources,
bitslicing is not applied to krounds. Hence the key is loaded into
the k_state shift register in SK/Sb cycles, but a subkey is
calculated in a single cycle. For the round calculation, 4 bits are
XORed with one key nibble and passed through the SBox in each cycle.
As the PBox is not bitsliced, the round calculation requires SB/Sb
cycles plus one for the PBox calculation. Since the key expansion
requires only one SBox, the round and kround share one. Through
RunFein, bitsliced and optimized designs of ciphers with compact HDL
implementations are generated. Values of Sb higher than SW can be
explored as intermediate design points between the parallel
implementation and the smallest one with Sb = SW.
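The cycle counts implied by this description can be checked with a short Python sketch. It assumes, as stated above for PRESENT-80, that a bitsliced round takes SB/Sb cycles plus one cycle for the unsliced PBox, and that throughput equals block size times clock frequency divided by the cycles per block. The function names are hypothetical.

```python
def cycles_per_round(sb_block, sb_slice):
    """Cycles for one round: 1 if not sliced, else SB/Sb plus one PBox cycle."""
    if sb_slice == sb_block:
        return 1
    return sb_block // sb_slice + 1

def throughput_kbps(sb_block, nr, sb_slice, f_khz):
    """Encryption throughput in Kbps at a clock of f_khz kHz."""
    cycles_per_block = nr * cycles_per_round(sb_block, sb_slice)
    return sb_block * f_khz / cycles_per_block

# PRESENT-80 (SB = 64, Nr = 32) at 100 KHz for the supported slice widths.
present_rows = {
    sb: (cycles_per_round(64, sb), round(throughput_kbps(64, 32, sb, 100), 2))
    for sb in (64, 32, 16, 8, 4)
}
```

Under these assumptions the computed cycles per round (1, 3, 5, 9, 17) and throughput values agree with the figures reported for PRESENT-80 in Table 4.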
5.3.6 Microarchitecture Validation Checks
When the user desires LISA based HDL generation, the cipher
configuration and selected microarchitecture undergo the following
checks.
– The round count should be a multiple of the unroll factor u, i.e.,
Nr = k·u, where k ≥ 1.
– The number of subpipelines s inserted in round should be the same
as the number inserted in kround. The user may specify dummy
subpipelines at the end of the round or kround to balance the
latency.
– The bitslice width Sb should be a multiple of SW and a factor of
SB. Hence for PRESENT-80 the user gets the options Sb = 4, 8, 16,
32.
– Any microarchitecture handling multiple data blocks cannot be used
with a mode of operation whose encryption or decryption is
non-parallel, as indicated in Table 2. For example, in OFB mode, the
microarchitecture for encryption and decryption should not be
subpipelined.
– Bitslicing cannot be combined with any other microarchitecture to
generate a hybrid configuration.
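These checks can be sketched as a Python validator. The parameter names and the function interface are hypothetical and do not reflect RunFein's configuration format; whether a mode is parallelizable is passed in as a flag because Table 2 is the authority on that.

```python
def valid_unroll_factors(nr):
    """Unroll factors u for which Nr = k * u with integer k >= 1."""
    return [u for u in range(1, nr + 1) if nr % u == 0]

def valid_bitslice_widths(sb, sw):
    """Widths Sb < SB that are multiples of SW and factors of SB."""
    return [w for w in range(sw, sb, sw) if sb % w == 0]

def check_microarchitecture(nr, sb, sw, u=1, pipelined_unroll=False,
                            s_round=0, s_kround=0, sb_slice=None,
                            mode_is_parallel=True):
    """Return a list of violated microarchitecture rules."""
    errors = []
    if nr % u != 0:
        errors.append("round count must be a multiple of the unroll factor")
    if s_round != s_kround:
        errors.append("round and kround need equal subpipeline counts")
    if sb_slice is not None:
        if sb_slice not in valid_bitslice_widths(sb, sw):
            errors.append("bitslice width must be a multiple of SW dividing SB")
        if u != 1 or s_round != 0:
            errors.append("bitslicing cannot be combined with other options")
    multi_block = s_round > 0 or (pipelined_unroll and u > 1)
    if multi_block and not mode_is_parallel:
        errors.append("multi-block design needs a parallelizable mode")
    return errors

present_widths = valid_bitslice_widths(64, 4)
present_unrolls = valid_unroll_factors(32)
```

For PRESENT-80, the width helper yields the options 4, 8, 16 and 32 named in the text, and the unroll helper yields the factors explored later in the unrolling experiments.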
5.3.7 RunFein Limitations
We list here some microarchitectural limitations of
RunFein.
– Both software and hardware implementations generated by RunFein
follow the on-the-fly key expansion methodology. The alternative,
subkey precomputation, requires a large memory for storing SSK × Nr
bits of data. Additionally, the delay of subkey computation has to
be incurred whenever a new key is used. RunFein does not precompute
subkeys; however, converting the generated code to a
precomputed-keys approach requires only trivial tweaking.
– Ciphers requiring an unequal number of iterations for rounds and
krounds cannot be implemented using RunFein. Though this is uncommon
for most of today's ciphers, the exceptions are the AES-192/256
configurations.
– For ciphers having MixColumns as a diffusion operation, bitslicing
requires large multiplexing logic whose overhead exceeds the
potential saving achieved by bitslicing [47]. Currently, RunFein
does not support a bitsliced microarchitecture for ciphers with a
MixColumns operation (e.g., AES). For ciphers with PBoxes, a
parallel execution of the PBox operation is performed instead of a
bitsliced implementation, as discussed in the previous section (for
PRESENT-80).
– Currently, RunFein does not support a unified microarchitecture
performing both encryption and decryption.
6 Experimental Results and Analysis
Using RunFein we implemented the software real-
izations of PRESENT (80, 128), AES (128), KLEIN
(64, 80, 96) and LED (64, 128). The software efficiency
in terms of lines of code and execution time has already
been discussed in [26]. The randomness test using NIST
test suite was also successfully conducted by generating
long streams of encrypted data in CBC, PCBC, OFB
and CFB modes of operation.
6.1 Hardware Implementation and Benchmarking
We implemented various hardware microarchitectures for PRESENT-80
and AES-128. The generated high-level design description models in
LISA were tested with the software tools generated by Synopsys
Processor Designer (version 2013.06-SP3), including Compiler,
Assembler, Linker, Profiler and Debugger. The generated
synthesizable Verilog HDL implementations were tested for
correctness using Mentor Graphics ModelSim (version 10.2c). All
designs were synthesized with Synopsys Design Compiler (version
G-2012.06) for area, power and maximum frequency profiling. Design
synthesis was carried out using the Faraday standard cell libraries
in topographical mode with area optimization in mind. The area
figures were converted to equivalent NAND gates (GE). We used three
technology nodes for synthesis, namely, UMC L180E High Speed FSG
Generic II Process 0.18µm CMOS, UMC L90 Standard Performance Low-K
(Regular VT) Process 90nm CMOS and UMC SP/RVT Low-K process 65nm
CMOS. The foundry typical values (of 1.8 Volt for the core voltage
and 25°C for the temperature) were used. The power consumption is
estimated by Synopsys PrimeTime (version 2009.12) based on
gate-level netlist switching activity by back annotation.
6.2 Microarchitectures for PRESENT-80
For lightweight block ciphers, low operating frequen-
cies are more relevant due to their stringent power con-
straints, hence 100 KHz clock frequency is considered;
results at 10 MHz are also reported. At 100 KHz, our
RunFein generated PRESENT-80 encryption only loop
folded implementation has a throughput of 200 Kbps
and occupies 1649 GE for 65nm CMOS technology li-
brary as indicated by the first row of Table 4. The power
and area results for the same loop folded implementa-
tion, synthesized at 10 MHz are indicated in the first
row of Table 5.
For comparison with reported manually optimized implementations, we
take the results for the loop folded PRESENT-80 encryption estimates
in [13] with three different CMOS technology libraries, as indicated
by the first row (Sb = 64) of Table 6. This implementation on 180nm
reportedly consumes 1650 and 1706 gates at 100 KHz and 10 MHz,
respectively. Our implementation, on a comparable technology
library, consumes 1750 gates at both 100 KHz and 10 MHz operating
frequencies, i.e., 100 and 46 gates more, respectively [13]. This
area gap is too small to be considered an overhead and can possibly
be attributed to differences in the vendor libraries, synthesis
optimization settings or synthesis tool versions.
Table 6 PRESENT-80 bitsliced encryption @ 100 KHz
        Area (GE) RunFein          Area (GE) [13]
(Sb)    65nm    90nm    180nm      180nm   250nm   350nm
64      1649    1519    1751       1650    1594    1525
32      1462    1379    1602       -       -       -
16      1264    1203    1403       -       -       -
8       1182    1121    1313       -       -       -
4       1107    1081    1265       1075    1169    1000
6.2.1 Bitslicing
For bitslicing, we generated implementations with the various
possible bitslice widths, i.e., Sb = 4, 8, 16, 32. The resulting
reduction in area, power and throughput is seen as a trend on the
65nm CMOS technology library at an operating frequency of 100 KHz in
Table 4 and 10 MHz in Table 5. Fig. 10 and Fig. 11 graphically show
the trade-off design points for area and power saving, respectively,
against the loss in throughput for various Sb widths.
Table 4 PRESENT-80 encryption bitsliced implementation results for 65 nm CMOS tech. library @ 100 KHz
Bitsliced   Cycles  SBoxes  Area (GE)                           Power (uW)                Throughput
width (Sb)  /round  used    Combinational  Sequential  Total    Static  Dynamic  Total    (Kbps)
64          1       16+1    896.25         752.50      1648.75  10.28   424.54   434.82   200.00
32          3       8       693.25         768.75      1462.00  9.84    121.85   131.69   66.67
16          5       4       488.25         775.50      1263.75  9.12    68.23    77.34    40.00
8           9       2       396.50         785.50      1182.00  8.98    33.46    42.44    22.22
4           17      1       128.75         978.50      1107.25  8.60    32.54    41.14    11.76
Table 5 PRESENT-80 encryption bitsliced implementation results for 65 nm CMOS tech. library @ 10 MHz
Bitsliced   Cycles  SBoxes  Area (GE)                           Power (uW)                Throughput
width (Sb)  /round  used    Combinational  Sequential  Total    Static  Dynamic  Total    (Mbps)
64          1       16+1    891.00         752.50      1643.50  10.28   454.57   464.84   20.00
32          3       8       694.25         768.75      1463.00  9.84    150.09   159.92   6.67
16          5       4       486.00         777.00      1263.00  9.12    95.30    104.42   4.00
8           9       2       396.50         785.50      1182.00  8.98    54.92    63.90    2.22
4           17      1       128.75         978.50      1107.25  8.60    55.51    64.11    1.18
Fig. 10 PRESENT-80 bitsliced encryption area throughput
trade-off @ 100 KHz
Fig. 11 PRESENT-80 bitsliced encryption area power
trade-off @ 100 KHz
For comparison we take up the smallest reported
area for a hand-crafted PRESENT-80 implementation,
requiring 1000 gates [13]. The area footprints of that
implementation for Sb = 4 on various technology libraries
are reported in Table 6. For the same operating
frequency (and consequently the same throughput), our area
estimate when synthesized on the 90nm technology library
comes as close as 1081 GE. Implementation results
for PRESENT-80 with higher bitslice widths have
not yet been reported. RunFein accelerates exploration
of these intermediate design points by enabling
prototyping of bitsliced architectural customizations. Some
novel results for the resources-performance trade-off
are presented in Fig. 10 and Fig. 11.
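The throughput column of Table 4 follows from the block size, the clock and the number of cycles spent per round. The following is a minimal sketch of that arithmetic, assuming 32 round iterations per 64-bit block (as in Table 7) and taking the cycles/round values directly from Table 4. The function and constant names are illustrative only and are not part of RunFein.

```python
# Illustrative sketch (not RunFein code): recompute the throughput column of
# Table 4 from its cycles/round column.
BLOCK_BITS = 64   # PRESENT block size in bits
ROUNDS = 32       # round iterations per block, as assumed from Table 7 (u = 1)

# Cycles per round for each bitslice width Sb, copied from Table 4.
CYCLES_PER_ROUND = {64: 1, 32: 3, 16: 5, 8: 9, 4: 17}


def throughput_kbps(sb, freq_khz):
    """Throughput in Kbps for bitslice width sb at a clock of freq_khz (kHz)."""
    cycles_per_block = ROUNDS * CYCLES_PER_ROUND[sb]
    return BLOCK_BITS * freq_khz / cycles_per_block


if __name__ == "__main__":
    for sb in sorted(CYCLES_PER_ROUND, reverse=True):
        print(f"Sb={sb:2d}: {throughput_kbps(sb, 100):7.2f} Kbps")
```

Calling the same function with a clock of 10000 kHz gives the Table 5 figures, scaled by a factor of 100.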
6.2.2 Unrolling without Pipelining
Using RunFein we employ various unroll factors for the
32 rounds of PRESENT-80 encryption design. Table 7
gives the area, power and throughput estimates when
the design is unrolled by various factors. A fully
unrolled design achieves the highest throughput per area
ratio, but it also consumes the most area and power.
Table 7 PRESENT-80 encryption unrolled implementations,
65nm @ 100 KHz (Tp is Throughput)
Unroll      Cycles    Area       Power    Tp
factor, u   / block   (GE)       (uW)     (Mbps)
1           32        1648.75    434.82   0.20
2           16        2279.75    450.00   0.40
4           8         3396.25    494.08   0.80
8           4         5859.75    522.12   1.60
16          2         10712.25   645.61   3.20
32          1         19817.75   754.00   6.40
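The claim about throughput per area can be checked directly against Table 7. The sketch below is only an illustration of that check: it assumes that u rounds are evaluated per clock cycle, so that a block takes 32/u cycles, and uses the total area figures from Table 7. All names are hypothetical and not part of RunFein.

```python
# Illustrative sketch (not RunFein code): throughput and throughput per area
# for the unrolled PRESENT-80 designs, using the area figures of Table 7.
BLOCK_BITS = 64
ROUNDS = 32
FREQ_KHZ = 100

# Unroll factor u -> total area in GE, copied from Table 7.
AREA_GE = {1: 1648.75, 2: 2279.75, 4: 3396.25,
           8: 5859.75, 16: 10712.25, 32: 19817.75}


def throughput_mbps(u):
    """Throughput in Mbps when u rounds are evaluated per clock cycle."""
    cycles_per_block = ROUNDS // u
    return BLOCK_BITS * FREQ_KHZ / cycles_per_block / 1000.0


def tpa_bps_per_ge(u):
    """Throughput per area in bps/GE."""
    return throughput_mbps(u) * 1e6 / AREA_GE[u]


# Unroll factor with the highest throughput per area.
best = max(AREA_GE, key=tpa_bps_per_ge)

if __name__ == "__main__":
    for u in AREA_GE:
        print(f"u={u:2d}: {throughput_mbps(u):.2f} Mbps, "
              f"{tpa_bps_per_ge(u):6.1f} bps/GE")
    print("highest throughput per area at u =", best)
```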
6.2.3 Subpipelining
Through subpipelining we generated some novel high
throughput realizations of the PRESENT-80 cipher that
have not been reported to date. For a loop folded
implementation, the maximum operating frequency is
profiled to be 3.71 GHz, as indicated by Table 8. We
subpipeline it twice to achieve higher throughput.
– First Subpipeline: The critical path for the loop
folded implementation (Fig. 5 left) runs from the
k state register, through the 3 round layers and the
multiplexer, to the d state register. Since the PBox
adds no combinational delay (it is pure rewiring), it is
prudent to break the critical path by inserting a
subpipeline between layer0 and layer1 of the cipher,
shown by the single dotted line in Fig. 5.
A corresponding subpipeline between layer1 and
layer2 of the k round is also inserted. Consequently,
the operating frequency of the subpipelined circuit
increases, raising the throughput to 8.1 Gbps.
– Second Subpipeline: The critical path now lies between
the subpipeline register and the d state register
in round. For a further increase in the operating
frequency we break this critical path between
layer1 and layer2 of round by a second
subpipeline (with a corresponding subpipeline between
layer0 and layer1 of k round), as shown by the
double dotted lines in Fig. 5. The resulting
operating frequency, however, decreases. This is
attributed to the supporting control hardware inserted
to handle the 2 subpipelines: a 2-bit supplementary
counter (s counter), counting up to the number
of subpipelines, is inserted in addition to the 5-bit
counter for rounds. The critical path now lies in
the controller, i.e., between s counter and counter,
prohibiting further speedup by pipelining.
Table 8 PRESENT-80 subpipelined encryption results for
65nm CMOS (Tp is Throughput)
s    Max Freq.   Area (GE)                       Tp
     (GHz)       comb.      seq.       total     (Gbps)
0    3.71        2502.75    856.00     3358.75   7.42
1    4.05        2320.75    1803.50    4124.25   8.1
2    4           2818.75    2657.00    5475.75   8.0
6.3 Microarchitectures for AES-128
For comparison of the RunFein generated realization of
AES-128 with a hand-crafted realization of similar
architecture, we took up the RTL implementation of a
loop folded AES-128 encryption core available at
Opencores [37]. Since RunFein does not register the I/Os
of the cipher implementation, we removed the registers
for plaintext and ciphertext from the Opencores RTL to
enable equitable comparisons. Both RTL realizations
were synthesized using the 65nm technology library,
with the same synthesis tool versions and settings,
at 10 MHz and 100 MHz operating frequencies.
The area footprints obtained are comparable, as shown
in Table 10.
The area overhead of around 5% for the Opencores
RTL is attributed to several differences compared
to the RunFein design. Firstly, instead of a multiplexer
for bypassing the GF-mul stage in the AES round,
it inserts a separate layer of 128-bit XORs to obtain the
ciphertext after the last round. Secondly, it maintains a
32-bit register to retain the RCON value from a LUT, whereas
RunFein has no register for that. The consequent sequential
area overhead can be seen in Table 10. Thirdly, it does
not reuse the 32-bit XORs for the calculation of keywords in
layer3 to layer6 of the key rounds. Consequently, 5 XORs
(32 bits each) are used for the least significant keyword, 4
XORs for the word next to it, and so on. RunFein uses
only 5 XORs in total for that; consequently, the
combinational area overhead of the Opencores design is higher.
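The XOR counts in the third point can be illustrated with a small sketch of the AES-128 key expansion step. It is an illustration under stated assumptions, not the RunFein or Opencores code: the RCON addition is counted as one 32-bit XOR, g stands for the already computed SubWord(RotWord(w3)) value, and the example words are those of the FIPS-197 [2] key expansion example. The class and function names are hypothetical.

```python
# Illustrative sketch (not RunFein or Opencores code): count the 32-bit XORs
# needed to derive the four words of the next AES-128 round key, with and
# without reuse of intermediate results.

class XorCounter:
    """Performs 32-bit XORs and counts how many were instantiated."""

    def __init__(self):
        self.count = 0

    def xor(self, a, b):
        self.count += 1
        return (a ^ b) & 0xFFFFFFFF


def next_key_chained(w, g, rcon, x):
    """Each new word reuses the previous new word: 5 XORs in total."""
    t = x.xor(g, rcon)
    w4 = x.xor(w[0], t)
    w5 = x.xor(w[1], w4)
    w6 = x.xor(w[2], w5)
    w7 = x.xor(w[3], w6)
    return [w4, w5, w6, w7]


def next_key_flat(w, g, rcon, x):
    """Each new word is recomputed from scratch: 2 + 3 + 4 + 5 XORs."""
    out = []
    for i in range(4):
        acc = x.xor(g, rcon)
        for j in range(i + 1):
            acc = x.xor(acc, w[j])
        out.append(acc)
    return out


# First round key words and SubWord(RotWord(w3)) from the FIPS-197 example.
W = [0x2B7E1516, 0x28AED2A6, 0xABF71588, 0x09CF4F3C]
G = 0x8A84EB01
RCON = 0x01000000

chained_counter, flat_counter = XorCounter(), XorCounter()
chained = next_key_chained(W, G, RCON, chained_counter)
flat = next_key_flat(W, G, RCON, flat_counter)

if __name__ == "__main__":
    print([f"{v:08x}" for v in chained])
    print("XORs with reuse:", chained_counter.count)
    print("XORs without reuse:", flat_counter.count)
```

Both variants compute the same key words; only the number of instantiated XOR operators differs.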
Table 10 AES-128 encryption results for 65nm CMOS
Source           Op. freq.   Area (GE)
                 (MHz)       Comb.    Seq.    Total
Opencores [37]   10          14540    1389    15929
                 100         14600    1389    15989
RunFein          10          13825    1300    15125
                 100         13867    1300    15167
6.3.1 Unrolling without Pipelining
The loop based AES-128 implementation may be
unrolled by a factor of 2, 5 or 10 for a potential increase in
throughput. Table 9 gives the increase in area and the
resulting throughput when the design is unrolled and
profiled for the maximum achievable frequency. Interestingly,
the highest throughput/area efficiency reported in Table 9
is achieved with unroll factor 2. For higher unroll
factors, the gain in throughput is diminished by the
large number of SBoxes and the wide bus based selection
circuitry.
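The throughput column of Table 9 can be recomputed from the profiled maximum frequencies. The sketch below assumes 10 round iterations per 128-bit block and u rounds evaluated per clock cycle. The names are illustrative only and not part of RunFein.

```python
# Illustrative sketch (not RunFein code): recompute the throughput column of
# Table 9 from the maximum frequency of each unrolled AES-128 design.
BLOCK_BITS = 128
ROUNDS = 10

# Unroll factor u -> maximum frequency in GHz, copied from Table 9.
MAX_FREQ_GHZ = {1: 1.65, 2: 0.90, 5: 0.30, 10: 0.12}


def throughput_gbps(u):
    """Throughput in Gbps when u rounds are evaluated per clock cycle."""
    cycles_per_block = ROUNDS // u
    return BLOCK_BITS * MAX_FREQ_GHZ[u] / cycles_per_block


# Unroll factor with the highest throughput.
fastest = max(MAX_FREQ_GHZ, key=throughput_gbps)

if __name__ == "__main__":
    for u in MAX_FREQ_GHZ:
        print(f"u={u:2d}: {throughput_gbps(u):.2f} Gbps")
    print("highest throughput at u =", fastest)
```

Because the maximum frequency drops faster than the cycle count beyond u = 2, the throughput peaks at that unroll factor and falls for u = 5 and u = 10.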
Table 9 AES-128 unrolled encryption implementation results for 65nm CMOS tech. library
Unroll      no. of   Max. Freq  Area (GE)                              Throughput  Throughput/Area
factor (u)  SBoxes   (GHz)      Combinational  Sequential  Total       (Gbps)      (Mbps/GE)
1           16+4     1.65       54666.00       1461.50     56127.50    21.12       14.45
2           32+8     0.90       120293.50      1449.75     121743.25   23.04       15.89
5           80+20    0.30       169780.25      1406.75     171187.00   19.2        13.65
10          160+40   0.12       704315.25      1474.25     705789.50   15.36       10.42
6.3.2 Subpipelining
For a loop folded implementation of AES-128 generated
by RunFein, the maximum operating frequency is profiled to
be 1.65 GHz, as indicated by Table 11. The critical
path runs from the d state register, through the 4 round
layers and the multiplexer, back to the d state register.
To break this critical path we direct RunFein to place
a subpipeline between layer0 and layer1 of the cipher
round and a corresponding subpipeline between layer1
and layer2 of the k round, as
shown by the single dotted line in Fig. 5. The RTL
for the pipelined architecture is profiled to operate at
a frequency as high as 2.25 GHz, with a throughput of
28.8 Gbps. The critical path now lies between the
d state register and the pipeline register, i.e., in the SBox
layer. The critical path can be broken further
by partitioning the SBox tables into 2 or
more levels (instead of one 256-entry SBox we
use 8 SBoxes with 32 entries each) and inserting pipeline
registers in between. Similarly, the Galois field inversion of the
S-box using sub-fields of 4 and 2 bits can be used for lower
area footprints. The multiple layers of operations
required for sub-field inversion can be
subpipelined for achieving higher performance [46].
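The partitioning of a 256-entry table into 8 tables of 32 entries can be sketched as a two-level lookup. This is only an illustration of the idea: the table used here is a hypothetical stand-in permutation, not the AES S-box, and the choice of the 5 low bits as the table index and the 3 high bits as the selector is an assumption made for the example.

```python
# Illustrative sketch: split one 256-entry lookup table into 8 tables of
# 32 entries each. FLAT_TABLE is a hypothetical stand-in permutation and is
# NOT the AES S-box.
FLAT_TABLE = [(7 * i + 3) % 256 for i in range(256)]

# Level 1: eight 32-entry sub-tables, each indexed by the 5 low input bits.
SUB_TABLES = [FLAT_TABLE[32 * k: 32 * (k + 1)] for k in range(8)]


def lookup_partitioned(byte):
    """Two-level lookup equivalent to FLAT_TABLE[byte]."""
    low = byte & 0x1F    # 5 low bits index every small table
    high = byte >> 5     # 3 high bits drive an 8-to-1 selection
    # Stage 1: read all eight small tables in parallel. In hardware, a
    # pipeline register could be placed after this stage.
    candidates = [table[low] for table in SUB_TABLES]
    # Stage 2: select one of the eight candidates.
    return candidates[high]


if __name__ == "__main__":
    ok = all(lookup_partitioned(b) == FLAT_TABLE[b] for b in range(256))
    print("partitioned lookup matches flat table:", ok)
```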
Table 11 AES-128 subpipelined encryption results for 65nm
CMOS
s    max freq.   Area (GE)                     Throughput
     (GHz)       comb.    seq.       total     (Gbps)
0    1.65        54666    1461.50    56127.50  21.12
1    2.25        49896    3464.75    53360.75  28.8
7 Conclusion and Future Work
We present RunFein, an extensible framework for the
rapid prototyping of block ciphers into customizable
hardware and software implementations. It offers a
sophisticated design capture of the algorithmic and
structural specifications of a cipher by the user through a
GUI. The algorithmic design requires specification of
layers of atomic operations for key expansion and round
transformations. The hardware implementation is aided
by a commercial high-level synthesis framework. The
architectural specification of a loop folded configuration
of the cipher is automatically transformed by RunFein
according to the microarchitecture configuration
specified by the user (loop unrolling, bitslicing,
subpipelining). The viability of the design is thoroughly
validated before rapid prototyping. We took up some
notable block ciphers with various architectural
specifications for implementation using RunFein.
Equitable comparisons for area, throughput and power were
carried out. Our results rival the best available
hand-written IP cores. Additionally, some novel optimization
results for PRESENT-80 (bitslicing) have also
been reported.
RunFein’s high-level design approach eliminates the
laborious development effort for VLSI realization and
verification of block ciphers. It aids the cryptographic
community by enabling speedy benchmarking against critical
resources like area, throughput, power and latency, and
allows design exploration of various microarchitectural
design alternatives. We see RunFein as a
first instance of a tool framework suite for high-level
realization of domain-specific cryptographic functions
(block ciphers). Extensions to other cryptographic
functions would follow. We are enthusiastic to extend this
work in various directions.
– A similar rapid prototyping tool for stream ciphers,
called RunStream, is in the pipeline.
– Inclusion of cryptanalytic tools for block ciphers
is intended.
– Automatic generation of parallel software
for GPU-accelerated machines is on the
roadmap.
– We plan to take up a unified hardware
microarchitecture supporting both encryption and
decryption of ciphers.
References
1. A. Rukhin, J. Soto, J. Nechvatal, M. Smid and E. Barker.
A Statistical Test Suite for the Validation of Random Num-
ber Generators and Pseudo Random Number Generators
for Cryptographic Applications, NIST Special Publication
800-22 Available at csrc.nist.gov/groups/ST/toolkit/
rng/documents/SP800-22b.pdf.
2. Advanced encryption standard. Federal Information Pro-
cessing Standard, FIPS-197 (2001): 12.
3. Secure Hash Standard (SHS) In FIPS PUB 180-4, In-
formation Technology Laboratory National Institute of
Standards and Technology Gaithersburg, March 2012
Available at http://csrc.nist.gov/publications/fips/
fips180-4/fips-180-4.pdf.
4. D. J. Wheeler, R. M. Needham. TEA, a tiny encryption al-
gorithm. In Fast Software Encryption, Springer Berlin Hei-
delberg, pp. 363-366, January 1995
5. H. Wu. The stream cipher HC-128. Available at http://
www.ecrypt.eu.org/stream/p3ciphers/hc/hc128_p3.pdf.
6. L. Batina, J. Lano, N. Mentens, S. B. Ors, B. Preneel and
I. Verbauwhede. Energy, performance, area versus security
trade-offs for stream ciphers. In The State of the Art of
Stream Ciphers, ECRYPT Workshop Record, pp. 302-310, 2004
7. A. Chattopadhyay and G. Paul. ”Exploring security-
performance trade-offs during hardware accelerator de-
sign of stream cipher RC4.” In VLSI and System-on-Chip
(VLSI-SoC), 2012 IEEE/IFIP 20th International Confer-
ence on. IEEE, 2012.
8. eSTREAM: the ECRYPT Stream Cipher Project. Avail-
able at http://www.ecrypt.eu.org/stream.
9. F. Chabaud and A. Joux. ”Differential collisions in
SHA-0.” In Advances in Cryptology - CRYPTO’98, Springer Berlin
Heidelberg, pp. 56-71, 1998
10. Break DES in less than a single day In Press
release demonstrated at a 2009 workshop Available
at http://www.sciengines.com/company/news-a-events/
74-des-in-1-day.html.
11. S. Maitra and G. Paul. ”Analysis of RC4 and proposal of
additional layers for better security margin.” In Progress in
Cryptology-INDOCRYPT, pp. 27-39. Springer Berlin Hei-
delberg, 2008.
12. A. Bogdanov, L. R. Knudsen, G. Leander, C. Paar, A.
Poschmann, M. J. B. Robshaw, Y. Seurin and C. Vikkelsoe.
PRESENT: An Ultra-Lightweight Block Cipher. In
Proceedings of CHES 2007.
13. Rolfes, C., Poschmann, A., Leander, G., and Paar, C.
Ultra-lightweight implementations for smart devices - security
for 1000 gate equivalents. In Smart Card Research and
Advanced Applications, Springer Berlin Heidelberg,
pp. 89–103, 2008.
14. J.Aumasson, L. Henzen, W. Meier and R. Phan. SHA-3
proposal BLAKE ver 1.3, 2010. Available at https://www.
131002.net/blake.
15. D. J. Bernstein. The Salsa20 family of stream ciphers.
In New Stream Cipher Designs: The eSTREAM Finalists,
Springer-Verlag, 2008, pp. 84–97.
16. M. Dworkin. Recommendation for block cipher modes of
operation. Methods and techniques. In NIST Special Pub-
lication 800-38A, 2001
17. C. Berbain, O. Billet, A. Canteaut, N. Courtois, H.
Gilbert, L. Goubin, A. Gouget, L. Granboulan, C. Lau-
radoux, M. Minier, T. Pornin and H. Sibert Sosemanuk, a
fast software-oriented stream cipher. In New Stream Cipher
Designs: The eSTREAM Finalists, Springer-Verlag, 2008,
pp. 98–118.
18. N. Ferguson, S. Lucks, B. Schneier, D. Whiting, M. Bel-
lare, T. Kohno, J. Callas and J. Walker. The Skein Hash
Function Family, Version 1.3. http://www.skein-hash.
info/sites/default/files/skein1.3.pdf, October 2010.
19. Barreto, P. and V. Rijmen,. The Whirlpool hashing func-
tion. In First open NESSIE Workshop, Leuven, Belgium,
Vol. 13, pp. 14–33, 2000
20. Authenticated encryption-security techniques In
ISO/IEC 19772:2009. Retrieved March 12, 2013.
21. R.-P. Weinmann. AXR - Crypto Made from Modular Ad-
ditions, XORs. In Dagstuhl Seminar 09031, January 2009.
Available at http://www.dagstuhl.de/Materials/Files/
09/09031/09031.WeinmannRalfPhilipp.Slides.pdf.
22. G. Bertoni, J. Daemen, M. Peeters and G. Van Assche.
Keccak sponge function family main document. Submission
to NIST, round 3, 2011.
23. R. Rivest. The MD5 Message-Digest Algorithm. In RFC
1321 by MIT Laboratory for Computer Science and RSA
Data Security, April 1992 Available at http://www.faqs.
org/rfcs/rfc1321.html.
24. D. Khovratovich and I. Nikolić. Rotational cryptanalysis
of ARX. In Fast Software Encryption 2010, LNCS vol. 6147,
Springer, pages 333–346.
25. T. Iwata, K. Shibutani, T. Shirai, S. Moriai and T. Ak-
ishita. AURORA: A Cryptographic Hash Algorithm Fam-
ily. Submission to NIST, 2008.
26. A. Khalid, M. Hassan, A. Chattopadhyay and G. Paul.
RAPID-FeinSPN: A Rapid Prototyping Framework for
Feistel and SPN-Based Block Ciphers. In Information Sys-
tems Security (pp. 169-190). Springer Berlin Heidelberg.
2013
27. K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P.
Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker,
J. Shalf, S.W. Williams and K. A. Yelick. The land-
scape of parallel computing research: A view from berke-
ley. UCB/EECS-2006-183, EECS Department, University
of California, Berkeley
28. P. Dubey. Teraflops for the masses: Killer apps of tomor-
row. In Workshop on Edge Computing Using New Com-
modity Architectures (UNC), 2006, volume 23.
29. G. Leurent. ARX Toolkit. Available at http://www.di.
ens.fr/~leurent/arxtools.html.
30. N. Mouha, V. Velichkov, C. De Cannière and B. Preneel.
S-function Toolkit. Available at http://www.ecrypt.
eu.org/tools/s-function-toolkit.
31. M. Ernst, S. Klupsch, O. Hauck and S. A. Huss.
Rapid Prototyping for Hardware Accelerated Elliptic Curve
Public-Key Cryptosystems. In Proceedings of the 12th In-
ternational Workshop on Rapid System Prototyping (RSP
’01), 2001.
32. Akinyele, Joseph A., et al. Charm: A framework for
rapidly prototyping cryptosystems. In Journal of Crypto-
graphic Engineering, pp. 1-18, 2013
33. Lacy, John B., Donald P. Mitchell, and William M. Schell.
CryptoLib: Cryptography in software. In Proc. of Fourth
USENIX Security Workshop, pp. 1-18, 1993.
34. K. Shahzad, A. Khalid, Z. E. Rákossy, G. Paul and
A. Chattopadhyay. CoARX: a coprocessor for ARX-based
cryptographic algorithms. In Proceedings of the
50th Annual Design Automation Conference (DAC ’13),
doi:10.1145/2463209.2488898, 2013.
35. A. Chattopadhyay, H. Meyr and R. Leupers. LISA: A
Uniform ADL for Embedded Processor Modelling, Im-
plementation and Software Toolsuite Generation. In P.
Mishra, N. Dutt (editors) Processor Description Lan-
guages, Morgan Kaufmann, pp. 95–130, 2008.
36. Synopsys Processor Designer. Available at
http://www.synopsys.com/Systems/BlockDesign/
processorDev/Pages/default.aspx.
37. Simple AES (Rijndael) IP Core http://opencores.org/
project,aes_core.
38. Announcing development of a federal information
processing standard for advanced encryption stan-
dard. National Institute of Standards and Technology,
Docket No. 960924272-6272-01, RIN 0693-ZA13, January
2, 1997. http://csrc.nist.gov/archive/aes/pre-round1/
aes_9701.txt
39. NESSIE: New European Schemes for Signatures, In-
tegrity, and Encryption IST-1999-12324, January, 2000.
https://www.cosic.esat.kuleuven.be/nessie/
40. CRYPTREC: Cryptography Research and Evaluation
Committees. Japanese Government Cryptographer Com-
petition. March 7, 2012. http://competitions.cr.yp.to/
cryptrec.html
41. SHA-3 Cryptographic Hash Algorithm Competition.
NIST competition for Secure Hash Algorithm, 2007. http:
//csrc.nist.gov/groups/ST/hash/sha-3/index.html
42. CAESAR: Competition for Authenticated Encryption:
Security, Applicability, and Robustness A portfolio of au-
thenticated ciphers, 2013. http://competitions.cr.yp.
to/caesar.html
43. Third Round Report of the SHA-3 Cryptographic Hash
Algorithm Competition. National Institute of Standards
and Technology, NISTIR 7896, November 2012. Available
at http://nvlpubs.nist.gov/nistpubs/ir/2012/NIST.IR.
7896.pdf
44. Gaj, Kris, Jens-Peter Kaps, Venkata Amirineni, Marcin
Rogawski, Ekawat Homsirikamol, and Benjamin Y. Brew-
ster. ”ATHENa-automated tool for hardware evaluatioN:
toward fair and comprehensive benchmarking of crypto-
graphic hardware using FPGAs.” In Field Programmable
Logic and Applications (FPL), 2010 International Confer-
ence on, pp. 414-421. IEEE, 2010.
45. Menezes, Alfred J., Paul C. Van Oorschot, and Scott
A. Vanstone. ”Handbook of applied cryptography.” CRC
press, 1996.
46. Satoh, Akashi, et al. ”A compact Rijndael hardware
architecture with S-box optimization.” Advances in
Cryptology - ASIACRYPT 2001. Springer Berlin Heidelberg, 2001.
239-254.
47. Moradi, Amir, et al. ”Pushing the limits: A very
compact and a threshold implementation of AES.” Advances
in Cryptology - EUROCRYPT 2011. Springer Berlin
Heidelberg, 2011. 69-88.
Appendix
We present here some GUI snapshots of various tabs
of the RunFein tool. CRYKET (CRYptographic Kernels
Toolkit) caters to rapid prototyping of various
cryptographic functions, while RunFein is an instance of it
dealing with block ciphers.
Fig. 12 Round layers operational specification for AES-128 in RunFein
Fig. 13 NIST Test Suite parameter selection tab in RunFein
Fig. 14 Microarchitectural specification for subpipelined implementation in RunFein (+ pipe specifies pipeline)
