Compact crypto implementations for embedded security by Verbauwhede, Ingrid
Ingrid Verbauwhede, K.U.Leuven - COSIC 1
IEEE Benelux - BCRYPT workshop – 1 June 24, 2010
Compact crypto implementations for 
embedded security 
or
crypto + embedded systems = security?
Ingrid Verbauwhede
KULeuven – COSIC
E-mail: ingrid.verbauwhede@esat.kuleuven.be
Slide Acknowledgements: 
Current & former Ph.D. students
Outline
• Merge: Embedded systems & Security
• Definition: secure embedded systems
• Illustrate with examples
• Challenges
• Conclusions
[Logos: Benelux Embedded Systems chapter + IUAP – Belgian Fundamental Research on Cryptology and Information Security]
Embedded security: definition (1)
Old Model (simplified view):
-Attack on channel between communicating parties
-Encryption and cryptographic operations in black boxes
-Protection by strong mathematical algorithms and protocols
Embedded security: definition (2)
New Model (also simplified view):
-Attack channel and endpoints
-Encryption and cryptographic operations in gray boxes
-Protection by strong mathematical algorithms and protocols
-Protection by secure implementation
Need secure implementations, not only algorithms
Embedded Security: definition (3)
NEED BOTH
• Efficient, lightweight implementations
– Within power, area, timing budgets
– Public key: 2048-bit RSA, 200-bit ECC on an 8-bit µC and 100 µW
– Public key on a passive RFID tag 
• Trustworthy implementation
– Resistant to attacks
– Active attacks: probing, power glitches, JTAG scan chain
– Passive attacks: side channel attacks
Illustrate with examples
• Example 1: Secret Key: KATAN, KTANTAN
• Example 2: NIST SHA3 – how not to do it
• Example 3: Public key for RFID
Secret key: KATAN, KTANTAN
Christophe De Cannière, Orr Dunkelman and 
Miroslav Knežević
CHES 2009
[slide courtesy: M. Knežević]
KATAN/KTANTAN Design Goal
• Minimum logic (i.e. gates) to implement a secret key algorithm
Alternatives:
• Stream ciphers
– To ensure security, the internal state must be twice the size of the key.
– No good methodology on how to design these.
• Use a standardized block cipher:  AES
– The smallest implementation consumes 3.1 Kgates.
– Designed for HW and SW implementations
• Other block ciphers?
– HIGHT, mCrypton, DESL, PRESENT,…
– Can we do better/different?
Design Goals
• Secure block cipher
– Address Differential/Linear cryptanalysis, Related-Key/Slide 
attacks, Related-Key differentials, Algebraic attacks.
• Efficient block cipher
– Small foot-print, Low power consumption, Reasonable 
performance (+ possible speed-ups).
• Application driven
– Does an RFID tag always need to support key agility?
– Some low-end devices have one key throughout their life 
cycle.
– Some of them encrypt very little data.
– Tune algorithm to the application!
KATAN/KTANTAN Block Ciphers
• Block ciphers based on Trivium
(its 2 register version – Bivium).
• Block size: 32/48/64 bits. 
• Key size: 80 bits.
• Share the same number of rounds – 254.
• KATAN and KTANTAN are the same up to the key 
schedule.
• In KTANTAN, the key is fixed and cannot be changed!
Block Cipher – HW perspective
[Figure: block cipher in hardware: memory (block size and key size), datapath + control, and “redundant” logic]
Design Rationale – Memory Issues (1)
• The more compact the cipher, the larger the fraction of its area that is dedicated to storing the intermediate values and key bits.
Design Rationale – A Story of a Single Bit
• Assume we have a parallel load of the key and the plaintext.
• A flip-flop alone is not enough: MUXes are needed to choose between the initial value and the shifted value.
• 2to1 MUX + FF = Scan FF:  Beneficial both for area and power.
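[Figure: a D flip-flop preceded by a 2-to-1 MUX (inputs A_init and A[i-1], select start, output A[i], clock): 7.25 ~ 13.75 GE
≡ a scan flip-flop with the same signals (D, TD, SEL): 6.25 ~ 11.75 GE
A third cost, 5 ~ 7.75 GE, is also shown on the slide]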
 (64 + 80 + 8) × 6.25 = 950 GE 
Design Rationale – Control Part
• How to control such a simple construction?
– IR stands for Irregular update Rule.
– We basically need only a counter. Can it be simpler than that?
– Let the LFSR that is in charge of IR play the role of a counter.
KATAN32 – Control Part
[Figure: 8-bit register (bits 7 … 0), a 1-bit T stage, and the ready output]
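A minimal sketch of the "LFSR as counter" idea: an 8-bit LFSR with a primitive feedback polynomial visits 255 distinct states, so reaching one particular state marks the end of 254 rounds. The polynomial below (x^8 + x^4 + x^3 + x^2 + 1) and the seed are assumptions chosen for illustration, not necessarily the ones used in KATAN.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, primitive; assumed example polynomial

def lfsr_step(state):
    """Advance an 8-bit Galois LFSR by one clock."""
    state <<= 1
    if state & 0x100:        # a 1 was shifted out of the 8-bit register
        state ^= POLY        # fold it back in through the feedback taps
    return state

def run_rounds(seed, rounds):
    """Clock the LFSR `rounds` times; return the states visited and the final state."""
    states, state = [], seed
    for _ in range(rounds):
        states.append(state)
        state = lfsr_step(state)
    return states, state

states, final_state = run_rounds(seed=0x01, rounds=254)

# Every visited state is distinct and differs from the final one, so a single
# comparison against `final_state` is enough to raise a "ready" signal.
print(len(set(states)), final_state not in states)
```

Compared with a binary counter, the LFSR needs no carry chain, only a few XOR gates on the feedback path.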
KATAN32 – Round Function
[Figure: registers L1 and L2 with their bit positions, the key bits (K79, K78, …), the irregular-update bit IR, and the 8-bit control register with its 1-bit T stage]
Implementation Results
• All designs are synthesized with Synopsys Design Vision 
version Y-2006.06, using UMC 0.13µm Low-Leakage CMOS 
library.
* Throughput is estimated for a clock frequency of 100 kHz.
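As a rough guide to how such throughput estimates are obtained, the sketch below divides the block size by the time needed to process one block. It assumes one round per clock cycle (254 cycles per block) and ignores load and unload cycles, so it is an approximation rather than the reported result.

```python
def throughput_bps(block_bits, cycles_per_block, clock_hz):
    """Bits processed per second when one block takes `cycles_per_block` clocks."""
    return block_bits * clock_hz / cycles_per_block

ROUNDS = 254          # number of rounds shared by all variants (from the slides)
CLOCK_HZ = 100_000    # 100 kHz, the frequency used for the estimate

for block_bits in (32, 48, 64):
    rate = throughput_bps(block_bits, ROUNDS, CLOCK_HZ)
    print(f"{block_bits}-bit block: {rate / 1000:.1f} kbit/s")
```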
SHA3 – competition:
how not to do it
“Flexibility” Requirements
• Wide range of platforms
• Wide range of message digests
[of course, also security requirements]
SHA-3: “cost” requirements
• Power consumption?
• Energy to hash one message?
SHA3 results
• NIST asks for a Swiss Army knife
• But often you need a specialized knife
• Certainly for embedded applications
[Images: bread knife, surgeon’s knife]
High-Throughput Implementations
http://ehash.iaik.tugraz.at/wiki/SHA-3_Hardware_Implementations
[Figure: high-throughput SHA-3 hardware implementations, scaled to 0.18µm CMOS technology. Series: SHA-3, Fully Autonomous, Tillich et al. Benchmarking; SHA-3, Core Functionality, Various Authors; SHA-3, Fully Autonomous, Various Authors]
SHA 3 gate counts
[slide courtesy: Miroslav Knežević]
SHA3 conclusion
• SHA3 Hash functions are HUGE compared to:
• Secret key algorithms
– AES: from 3000 gates and up
– KATAN: around 1000 gates
• Public key algorithms
– ECC: around 10.000 gates
• Throughput similar to
– High speed secret key implementations
NEED: domain specific hash functions
Public Key: ECC for RFID
[slide courtesy: 
Yong Ki Lee, 
Lejla Batina]
Challenge 1: security problems
[Figure: security problems (scalability, replay attack, anti-cloning, privacy, DoS, …) and public-key crypto protocols: EC-RAC, Schnorr, Okamoto]
Challenge 1: Security problems
• Current RFID standards:
– No security
– Or simple self-destruct password (8 to 32 bits)
• Security challenges RFID:
– Anti-cloning (make it difficult to ‘copy’ RFID)
– Replay attack (query the tag and reuse that info)
– ‘Tracking’ attacks => privacy problems
– Scalability: security for large sets of tags 
– Backward/forward un-traceability
Needs Public key 
Challenge 2: design constraints
[Figure: design constraints (performance, area, power, side-channel attacks) and public-key crypto candidates: Rabin, ECC/HECC, NTRU]
Challenge 2: constraints
Passive RFID tag:
• Area: less than 20.000 gates 
• Low Power: total budget varies from 50 to 100 
microWatt
• Budget for crypto: less than 15 microWatt!
• Clock frequency: an integer fraction of 13.56 MHz
• Execution time target: one point multiplication 
less than 250 msec.   
Design Steps
• Step 1: protocols
• Step 2: algorithms
• Step 3: arithmetic
• Step 4: processor
• Step 5: circuits
[Figure: “Security dreams” and “Physical reality”, with “??” between them]
Step 1: Solutions with Asymmetric-key 
Algorithms
• Conventional public-key authentication
– Schnorr protocol, Okamoto Protocol
– Vulnerable to tracking attacks
• GPS scheme
– A variant of the Schnorr protocol
– Secure transfer of a tag’s ID is not solved
• Rabin Encryption
– Requires a large key size and transmission
– A compact architecture: WiSec’09 (Feldhofer, Oren)
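A General EC Authentication Protocol (Schnorr protocol)
[Protocol diagram not recovered. The surviving derivation, with $R = r_1 P$ the tag's commitment, $r_2$ the reader's challenge and $y = r_2 x + r_1$ the tag's response:]
$r_2^{-1}(yP - R) = r_2^{-1}\big((r_2 x + r_1)P - r_1 P\big) = r_2^{-1} r_2\, xP = xP$
• A tag’s public key can be derived using exchanged messages
=> tracking attack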
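The tracking attack can be tried out on a toy curve. The sketch below assumes the usual Schnorr message flow (commitment R = r1·P, challenge r2, response y = r2·x + r1) and uses a textbook-sized curve, y² = x³ + 2x + 2 over F_17 with base point (5, 1) of order 19, which is far too small for real use. All function names are hypothetical. It needs Python 3.8 or later for `pow(a, -1, m)`.

```python
import secrets

P_MOD, A = 17, 2          # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
G, N = (5, 1), 19         # base point and its prime order

def add(p, q):
    """Add two points; None stands for the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def mul(k, p):
    """Double-and-add scalar multiplication k*p."""
    acc, k = None, k % N
    while k:
        if k & 1:
            acc = add(acc, p)
        p = add(p, p)
        k >>= 1
    return acc

def neg(p):
    return None if p is None else (p[0], -p[1] % P_MOD)

def schnorr_session(x):
    """One protocol run; returns what an eavesdropper sees: (R, r2, y)."""
    r1 = secrets.randbelow(N - 1) + 1      # tag's nonce
    r2 = secrets.randbelow(N - 1) + 1      # reader's challenge
    return mul(r1, G), r2, (r2 * x + r1) % N

def recover_public_key(R, r2, y):
    """r2^-1 (yP - R) = xP, computed from the exchanged messages only."""
    return mul(pow(r2, -1, N), add(mul(y, G), neg(R)))

x = 7                                       # tag's secret key (example value)
sessions = [schnorr_session(x) for _ in range(3)]
recovered = {recover_public_key(*s) for s in sessions}
print("every session yields the same public key:", recovered == {mul(x, G)})
```

Because every session leads back to the same point xP, an eavesdropper can link sessions of one tag even though the nonces differ.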
Observation for RFID Protocols?
• Minimize the computation load on tags
– We need to transfer computation load to the reader/server as much as possible
• We cannot just transfer the ID of a tag
– A tag’s ID is what we need to keep secret to avoid tracking
• The protocol is a “many to one” protocol
– A tag’s public key (xP) does not need to be publicly known
– It can be securely stored and used in the server
Step 2: EC based Security Processor
– Operations we need (e.g. EC-RAC)
• Modular Operation
– Modular Multiplication: r_s1∙x_1 (mod n)
– Modular Addition: r_t1 + r_s1∙x_1 (mod n)
=> Perform on an 8-bit specialized micro-controller
• EC Point multiplication
– r_t1∙P, (r_t1 + r_s1∙x_1)∙Y
=> Perform on a 163-bit Elliptic Curve co-processor
8 bit versus 163 bit? The modular operations are less frequent and not time-critical, so they are multiplexed on the 8-bit micro-controller.
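To illustrate what multiplexing a 163-bit modular multiplication on an 8-bit datapath means, the sketch below builds the product from 8×8-bit partial products, one per step. It is not the micro-controller's actual routine: the modulus is a placeholder rather than a real group order, the operands are arbitrary, and the final reduction uses Python's big integers for brevity.

```python
def to_limbs(value, n_limbs):
    """Split an integer into little-endian bytes."""
    return [(value >> (8 * i)) & 0xFF for i in range(n_limbs)]

def from_limbs(limbs):
    return sum(limb << (8 * i) for i, limb in enumerate(limbs))

def mul_8bit(a_limbs, b_limbs):
    """Schoolbook product using only 8x8-bit multiplies and byte-wide carries."""
    out = [0] * (len(a_limbs) + len(b_limbs))
    steps = 0
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            t = out[i + j] + a * b + carry     # always fits in 16 bits
            out[i + j] = t & 0xFF
            carry = t >> 8
            steps += 1
        out[i + len(b_limbs)] = carry
    return out, steps

N_LIMBS = 21                       # 21 bytes hold a 163-bit value
n = (1 << 163) - 1                 # placeholder modulus, NOT a real group order
a, b = pow(3, 100) % n, pow(5, 70) % n

product, steps = mul_8bit(to_limbs(a, N_LIMBS), to_limbs(b, N_LIMBS))
result = from_limbs(product) % n   # reduction done with big integers for brevity
print(steps, result == (a * b) % n)
```

The 441 byte-wide steps are affordable because this operation occurs only a few times per protocol run, unlike the field multiplications inside a point multiplication.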
Overall architecture
Step 3: EC Point Multiplication
• k∙P: Scalar Multiplication, Montgomery Algorithm: ≈600 GE (Control)
• P+Q, 2∙P: Point Addition, Point Doubling, Lopez-Dahab Algorithm: ≈1.2k GE (Control) + 6×163 registers
• a+b, a∙b, a²: Modular Arithmetic Operation (Addition, Multiplication, Squaring), Sakiyama Modular ALU: ≈900 GE (Control) + 3×163 registers
Total Area = 2.7k GE (Control) + 9×163 registers ≈ 80% !!
* GE: Gate Equivalent (a 2-input NAND)
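The top layer, scalar multiplication with the Montgomery algorithm, can be sketched generically as a ladder that performs one addition and one doubling per scalar bit. To stay short, the example below runs the ladder over a stand-in group (integers modulo M under addition) instead of an elliptic curve, and it does not include the Lopez-Dahab point formulas; names and parameters are hypothetical.

```python
def montgomery_ladder(k, point, add, dbl, zero):
    """Compute k*point with one add and one double per scalar bit."""
    r0, r1 = zero, point             # invariant: r1 == r0 + point
    for bit in bin(k)[2:]:           # scan the scalar from its most significant bit
        if bit == "1":
            r0, r1 = add(r0, r1), dbl(r1)
        else:
            r0, r1 = dbl(r0), add(r0, r1)
    return r0

# Stand-in group: integers modulo M under addition, so k*P is simply (k * P) % M.
M = 1_000_003
add = lambda a, b: (a + b) % M
dbl = lambda a: (2 * a) % M

result = montgomery_ladder(0b1011, 12345, add, dbl, zero=0)
print(result == (0b1011 * 12345) % M)
```

Swapping `add` and `dbl` for real point addition and doubling gives k∙P on a curve; the loop structure stays the same.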
Step 4: Optimization Approach
– Reduce Registers: 9→5 (4 registers reduction)
• Common Z-coordinate system : 1 register ↓
• Redesign Modular ALU : 1 register ↓
• Register reuse : 2 registers ↓
– ‘Point Add/Dbl algorithm’ and ‘Modular ALU’
– Reduce Multiplexer Complexity
• A special Circular Shift Register File
• Extra 30% reduction in the register file
– Side Channel Resistant
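The effect of going from 9 to 5 registers can be estimated with a few lines of arithmetic. The control-logic total (600 + 1200 + 900 GE) comes from the Step 3 slide; the cost of 6.25 GE per register bit is an assumption borrowed from the scan flip-flop figure earlier in the talk and may differ for this design.

```python
FIELD_BITS = 163
CONTROL_GE = 600 + 1200 + 900     # control logic of the three layers (Step 3)
GE_PER_BIT = 6.25                 # assumed cost of one register bit

def register_bits(n_registers, width=FIELD_BITS):
    return n_registers * width

def register_share(n_registers):
    """Fraction of the total area spent on registers, under the assumptions above."""
    reg_ge = register_bits(n_registers) * GE_PER_BIT
    return reg_ge / (reg_ge + CONTROL_GE)

before, after = register_bits(9), register_bits(5)
print(f"flip-flops: {before} -> {after} (saves {before - after})")
print(f"register share of area: {register_share(9):.0%} -> {register_share(5):.0%}")
```

Under these assumptions the registers account for roughly three quarters of the area before the optimization, which is in line with the "≈ 80%" noted on the Step 3 slide.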
Modular ALU (MALU)
Operation : Cycles
Multiplication: 163/d
Squaring: 1
Addition: 1
Share XOR array
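The 163/d cycle count corresponds to a digit-serial multiplier that consumes d bits of one operand per clock. The sketch below models this in software. The reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 (the NIST 163-bit binary field) is an assumption, since the slides do not name it, and the cycle count is rounded up to a whole number of digits.

```python
M_BITS = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # assumed field polynomial

def reduce(v):
    """Reduce a binary polynomial modulo POLY."""
    while v.bit_length() > M_BITS:
        v ^= POLY << (v.bit_length() - 1 - M_BITS)
    return v

def digit_serial_mul(a, b, d):
    """Multiply a*b in GF(2^163), consuming d bits of b per modelled cycle."""
    acc, cycles = 0, 0
    n_digits = -(-M_BITS // d)                 # ceil(163 / d)
    for i in reversed(range(n_digits)):        # most significant digit first
        digit = (b >> (i * d)) & ((1 << d) - 1)
        acc = reduce(acc << d)                 # multiply the accumulator by x^d
        for j in range(d):
            if (digit >> j) & 1:
                acc ^= a << j                  # addition in GF(2^m) is XOR
        acc = reduce(acc)
        cycles += 1
    return acc, cycles

a = (1 << 162) | 0x1234567
b = (1 << 100) | 0xABCDEF
product, cycles = digit_serial_mul(a, b, d=4)
print(cycles, product == digit_serial_mul(a, b, d=1)[0])
```

A larger d shortens the multiplication but widens the XOR array, which is the area versus speed trade-off behind choosing d = 4.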
Circular Shift Register
Cost: need more cycles to get data in correct register …
Overall cost: less than 2% compared to point multiplication
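The extra cycles can be pictured with a very loose software model: if the registers form a ring and the ALU reads only a fixed position, an operand has to be rotated into place before it can be used. The register names, the ring size of five and the single read port are assumptions for illustration; the slides do not detail the actual register file.

```python
from collections import deque

def shifts_to_port(regfile, name):
    """Rotate the ring until `name` sits at position 0; return the shift count."""
    if name not in regfile:
        raise ValueError(f"unknown register {name!r}")
    shifts = 0
    while regfile[0] != name:
        regfile.rotate(-1)       # one circular shift of the whole register file
        shifts += 1
    return shifts

regs = deque(["X1", "Z1", "X2", "Z2", "T"])   # hypothetical register names
n_shifts = shifts_to_port(regs, "X2")
print(n_shifts, list(regs))
```

Each shift costs a cycle, but removes the wide multiplexers that random access to every 163-bit register would need.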
Register File Management: shift example
[Figure: register file contents during the sequence Add, Square, Swap, Shift, Shift, Shift, Multiply]
Estimated numbers
• Results: ECC co-processor that can compute:
– ECC point multiplications (163 by 4)
– Scalar modular operations (8 bit processor with redundancy)
• Schnorr (secure ID transfer, but no tracking protection): one PM
• More advanced protocols: up to four PM on tag
• Technology: 0.13 micron CMOS low power version
• Size: d=4, 14.500 gates
• Time: 60.000 cycles for one PM
• Clock at 616 kHz: 97 msec for one PM at 22 microWatt
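The timing figures can be cross-checked with a short calculation. The divisor 22 is an assumption: it is the integer that turns the 13.56 MHz carrier into roughly 616 kHz. The energy value is simply power times time and is not stated on the slides.

```python
CARRIER_HZ = 13.56e6          # RFID carrier frequency
DIVISOR = 22                  # assumed: 13.56 MHz / 22 is about 616 kHz
CYCLES_PER_PM = 60_000        # cycles for one point multiplication
POWER_W = 22e-6               # 22 microWatt
TARGET_S = 0.250              # execution-time target: 250 msec

clock_hz = CARRIER_HZ / DIVISOR
time_s = CYCLES_PER_PM / clock_hz
energy_uj = POWER_W * time_s * 1e6

print(f"clock: {clock_hz / 1e3:.0f} kHz")
print(f"one PM: {time_s * 1e3:.1f} ms (target met: {time_s < TARGET_S})")
print(f"energy per PM: {energy_uj:.2f} microjoule")
```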
[Pie chart: area of the ECC processor, split among register file, MALU and μC; slices of 51%, 20% and 29%]
Conclusion
Shows ONE path:
• Protocol design: randomized access
• Public key: ECC, many design options
• Architecture: 8-bit micro & 163-bit EC processor
• Specialized register file
• Full custom layout
[Figure: “Security dreams” and “Physical reality”, with “??” between them]
Future work:
• Do we have all the required properties covered?
• Can privacy issues be optional, at least for some applications?
• Lightweight crypto?
• How to store keys securely?
• Use physical properties: PUF-based security?
• What about side-channel security?
[Figure: “Security dreams” and “Physical reality”, with “??” between them]
