Tree Parity Machine Rekeying Architectures by Volkmer, Markus & Wallner, Sebastian
arXiv:cs/0502062v1 [cs.CR] 14 Feb 2005
Tree Parity Machine Rekeying Architectures
Markus Volkmer and Sebastian Wallner
Hamburg University of Science and Technology
Department of Computer Engineering VI
D-21073 Hamburg, Germany
{markus.volkmer,wallner}@tuhh.de
Abstract
The necessity to secure the communication between hardware compo-
nents in embedded systems becomes increasingly important with regard
to the secrecy of data and particularly its commercial use. We suggest
a low-cost (i.e. small logic-area) solution for flexible security levels and
short key lifetimes. The basis is an approach for symmetric key exchange
using the synchronisation of Tree Parity Machines. Fast successive key
generation enables a key exchange within a few milliseconds, given realistic
communication channels with a limited bandwidth. For demonstration, we
evaluate characteristics of a standard-cell ASIC design realisation as an
IP core in 0.18 µm technology.
1 Introduction
For embedded systems like handheld devices, smartcards, mobiles or other wire-
less communication devices security concepts need to be developed, in order
to keep privacy and still (commercially) exploit the merits of such devices in
widespread and everyday use [2]. This is, for example, of particular interest for
the smartcard- or RFID-industry, where the secrecy of data is directly linked
to the commercial prosperity of a product. Also, the economic importance of
securing information technology applications in the automotive area is becoming
evident, from the protection of firmware, access control and anti-theft
protection up to scenarios like the hacking of vital vehicle functions such as
an antilock braking system (see e.g. [17]).
Yet, the often relatively small size and severe power consumption constraints
of these devices limit the available size for additional cryptographic hardware
components [4, 27, 28]. This holds in particular for sensor networks, RFID-
systems and near field communication devices. Secure hardware is thus es-
pecially demanded for ubiquitous and pervasive computing, and the need and
research efforts manifest in first conferences on security in pervasive computing
[26].
Hardware-cryptosystems are often based on hard-coded secret keys as the
basic secret. It is good common practice to obey the often cited Kerckhoffs
Principle [10] (‘no security through obscurity’) and not base the security of
a crypto-system on the secrecy of the device or algorithm it employs. The
security of a system is thus only as strong as the secrecy of the (fixed) keys.
But some of the most effective attacks on a crypto-system involve no ciphertext
analysis but instead find flaws in the key-management. Furthermore, insecure
bus communication as reported in [7] (regarding the video game console market),
allows attacks still above the chip level by sniffing internal buses. In embedded
system environments, functions are being realized (at least partly) in hardware
and often lack online system access. The changing of a fixed key, as any other
security update, is difficult or even impossible – i.e. too expensive.
The exchange of a common secret key over a public channel has been dominated
by methods based on number theory since the invention of the Diffie-Hellman
key exchange protocol in 1976 [5]. Computational security is based on the
difficulty of the discrete logarithm problem in El Gamal [6], which is considered
as difficult as the factorisation of a product of long prime numbers as
in RSA [20]. Such asymmetric algorithms need to perform a lot of computationally
intensive arithmetic on typically limited embedded microcontrollers. In
a particular GSM mobile phone, for example, two algorithms are combined to
meet performance requirements: an asymmetrical algorithm with a 1024 bit key
for key exchange and a symmetrical algorithm using only 128 bit for the key
and voice encryption [24]. This also demonstrates the often necessary tradeoff
between the level of security and the available resources.
The state of the art, regarding applications in embedded systems, is
represented by Elliptic Curve Cryptography and its generalisation to
Hyper-Elliptic Curves (see e.g. [18]). Without reducing the security, these
representations allow smaller operands to be used. Yet, more complex
expressions need to be calculated. After all, a (frequent) key exchange is often
of prohibitive cost, especially in the frequently changing topology of pervasive
or ad-hoc networks.
In this paper we present a small hardware solution for secure data exchange
with flexible security levels and short key lifetimes. It is based on a fast
successive key generation and exchange process. We use a hardware-friendly
algorithm for secure symmetric key exchange by synchronisation of so-called Tree
Parity Machines [9]. We define architectures, using this key exchange concept,
that allow fast successive key generation and exchange. The key exchange takes
only milliseconds on realistic channels and can be performed in parallel (or
multiplexed) to encryption and the encrypted communication process.
Additionally, we provide the architectures with a flexible rekeying
functionality to enable full exploitation of the achievable exchange rates. This
particularly increases the cost of a successful immediate (online) attack, as
opposed to a subsequent (offline) analysis of recorded information. Our focus is
on secure data exchange between hardware components in embedded systems like
RAM, FLASH-type ROM, (co-)processors and on bus communication in general.
Environments in which security is only of moderate concern are also considered.
In the following, we introduce the neural network structure and a learning
algorithm (Section 2), pointing out properties that are advantageous for a
hardware realisation. The synchronisation effect leading to the key exchange
property is explained. Implications of algorithmic security for the realisation
of our architecture are described in Section 3. Section 4 comprises the
architectural design of the proposed hardware component with its rekeying
functionality. Here, we also refer to design decisions prepared in the previous
section. In Section 5, we present results from an FPGA and an ASIC
implementation on silicon area, possible clock and key exchange rates
(throughput). We conclude the paper in Section 6 with a short summary and an
outlook on possible further extensions, also referring to current research
activities.
2 Tree Parity Machines for Key Exchange
In [9], Kinzel et al. proposed a symmetric key exchange method based on the
fast synchronisation of two identically structured Tree Parity Machines (TPMs).
The particular tree structure has non-overlapping binary inputs, discrete weights
and a single binary output as depicted in Fig. 1a. Studying interacting neural
networks in general (cf. [14, 11, 3]), the authors focused on the phenomenon of
fast synchronisation by mutually learning TPMs and its potential for a
cryptographic approach. Their exchange protocol is realized implicitly by a
mutual adaptation process between the two parties A and B, not involving large
numbers and methods from number theory.
Figure 1: (a) The tree parity machine (TPM) generates a single output – the
parity of the outputs of the hidden units. (b) For mutual learning, outputs on
commonly given inputs are exchanged between the two parties A and B.
2.1 Structure of a Tree Parity Machine
In the following, we describe the implemented parallel-weights version using
Hebbian learning (cf. [9]). Weights are identical in both TPMs after
synchronisation. The anti-parallel-weights version, using anti-Hebbian learning
and leading to inverted weights at the other party, is omitted for brevity. The
notation A/B denotes equivalent operations for the parties A and B. A single A
or B denotes an operation which is specific to one of the parties.
The TPM (see Fig. 1a) consists of K hidden units (1 ≤ k ≤ K) in a single
hidden-layer with non-overlapping inputs (the tree structure) and a single unit
in the output-layer.
Each hidden unit receives N different inputs (1 ≤ j ≤ N), leading to an
input field of size K · N. The vector components are random variables with
zero mean and unit variance. The output $O^{A/B}(t) \in \{-1, 1\}$, given
bounded weights $w_{kj}^{A/B}(t) \in [-L, L] \subseteq \mathbb{Z}$ (from input
unit j to hidden unit k) and common random inputs $x_{kj}(t) \in \{-1, 1\}$, is
calculated by a parity function of the signs of summations:
$$
O^{A/B}(t) = \prod_{k=1}^{K} y_k^{A/B}(t) = \prod_{k=1}^{K} \sigma\left(\alpha_k^{A/B}(t)\right) = \prod_{k=1}^{K} \sigma\left( \sum_{j=1}^{N} w_{kj}^{A/B}(t)\, x_{kj}(t) \right). \tag{1}
$$
The common random inputs can also be kept secret between the parties, yielding
authentication (see Section 2.2). σ is a party-specific modified sign function
that defines an agreement between the two parties on an opposite sign in case
of a sum $\alpha_k^{A/B}(t) \in \mathbb{Z}$ of zero:
$$
\sigma\left(\alpha_k^{A/B}(t)\right) := \begin{cases} 1, & \alpha_k^{A/B}(t) > 0 \;\lor\; \alpha_k^{A}(t) = 0 \\ -1, & \alpha_k^{A/B}(t) < 0 \;\lor\; \alpha_k^{B}(t) = 0. \end{cases} \tag{2}
$$
From the communicated output, the outputs of the hidden units cannot be
uniquely determined. For either output value (+1 or −1) there are $2^{K-1}$
consistent combinations of hidden-unit outputs, so the ambiguity grows with
the number of hidden units K.
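The output computation of Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware design of this paper; the function names `sigma` and `tpm_output`, the list-of-lists data layout and the small parameter values are assumptions made for the example.

```python
import random

def sigma(alpha, party):
    """Modified sign function of Eq. (2): a zero sum maps to +1 for A, -1 for B."""
    if alpha > 0:
        return 1
    if alpha < 0:
        return -1
    return 1 if party == "A" else -1

def tpm_output(weights, inputs, party):
    """Return (O, hidden) for one TPM following Eq. (1).

    weights, inputs: K lists of N integers each; inputs are in {-1, +1}.
    """
    hidden = []
    for w_k, x_k in zip(weights, inputs):
        alpha_k = sum(w * x for w, x in zip(w_k, x_k))  # local field of unit k
        hidden.append(sigma(alpha_k, party))
    output = 1
    for y_k in hidden:  # parity of the hidden-unit signs
        output *= y_k
    return output, hidden

# Illustrative (hypothetical) parameters, chosen small for readability
K, N, L = 3, 4, 4
rng = random.Random(1)
w = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
O, y = tpm_output(w, x, "A")
print(O, y)
```

Only the single value `O` would be sent over the public channel; the list `y` stays local and is needed later by the learning rule.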
2.2 Key Exchange by Mutual Learning and Synchronisation
The so-called bit package variant was chosen for implementation (cf. [9]).
Because it reduces the number of (physical) output exchanges by an order of
magnitude, it is advantageous for practical communication channels with a
certain protocol overhead.
Parties A and B start with an individual randomly generated initial weight
vector $w_{kj}^{A/B}(t_0)$ – their secret. After a set of b > 1 presented
inputs, where b denotes the size of the bit package, the corresponding b TPM
outputs (bits) $O^{A/B}(t)$ are exchanged over the public channel in one
package (see Fig. 1b). The b sequences of hidden states
$y_k^{A/B}(t) \in \{-1, 1\}$ are stored for the subsequent learning process.
A Hebbian learning rule is applied to adapt the weights, using the b outputs
and b sequences of hidden states. The weights are changed only if the parties'
outputs agree. Furthermore, only the weights of those hidden units are changed
whose output agrees with this common output:
$$
w_{kj}^{A/B}(t) := \begin{cases} w_{kj}^{A/B}(t-1) + O^{A/B}(t)\, x_{kj}(t), & O^{A}(t) = O^{B}(t) \;\land\; O^{A/B}(t)\, y_k^{A/B}(t) > 0 \\ w_{kj}^{A/B}(t-1), & \text{otherwise.} \end{cases} \tag{3}
$$
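Updated weights are bound to stay in the maximum range
$[-L, L] \subseteq \mathbb{Z}$ by reflection onto the boundary values
$$
w_{kj}^{A/B}(t) := \begin{cases} \operatorname{sign}\left(w_{kj}^{A/B}(t)\right) L, & \left|w_{kj}^{A/B}(t)\right| > L \\ w_{kj}^{A/B}(t), & \text{otherwise.} \end{cases} \tag{4}
$$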
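A minimal Python sketch of the learning step in Eqs. (3) and (4) follows. The function name `hebbian_update` and the in-place list update are assumptions for illustration; the hardware realisation described later uses a dedicated weight adjustment unit instead.

```python
def hebbian_update(weights, inputs, hidden, own_output, other_output, L):
    """Apply Eqs. (3) and (4) in place to one party's weights and return them.

    weights, inputs: K lists of N integers; hidden: the K hidden-unit outputs
    that were stored when own_output was computed.
    """
    if own_output != other_output:
        return weights  # outputs disagree: no weight is changed
    for k, y_k in enumerate(hidden):
        if own_output * y_k <= 0:
            continue  # hidden unit k disagrees with the common output
        for j, x_kj in enumerate(inputs[k]):
            w = weights[k][j] + own_output * x_kj   # Eq. (3)
            weights[k][j] = max(-L, min(L, w))      # Eq. (4): clip to [-L, L]
    return weights

# Example: unit 0 agrees with the output and is updated, unit 1 is skipped
w = [[4, -4], [0, 1]]
hebbian_update(w, [[1, 1], [1, -1]], [1, -1], 1, 1, 4)
print(w)
```

In the example, the first weight would leave the range (4 + 1 = 5) and is therefore held at the boundary value L = 4, which is the bounding operation of Eq. (4).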
In iterating the above procedure, each component of the weight vectors
performs a random walk with reflecting boundaries. This implies a trajectory in
a weight space of $(2L+1)^{KN}$ points. Two corresponding components in
$w_{kj}^{A}(t)$ and $w_{kj}^{B}(t)$ receive the same random component of the
common input vector $x_{kj}(t)$. Whenever a bounding operation (Eq. (4)) takes
effect, the distance between the two components shrinks, so that it is
successively reduced to zero. Synchrony is achieved when both parties
have learned to produce each other's outputs. They remain synchronised (see
learning rule Eq. (3)) and continue to produce the same outputs on every
commonly given input. This effect in particular leads to common weight vectors
in both TPMs in each of the following iterations. These weights have never
been communicated between the two parties and can be used as a common
time-dependent key for encryption and decryption respectively. Such secret key
agreement based on interaction over a public insecure channel is also discussed
under information theoretic aspects by Maurer [13], especially with regard to
unconditional security. Furthermore, synchrony is achieved only for common
inputs. Thus, keeping the common inputs secret between A and B can be used to
obtain an authenticated key exchange. There are $2^{KN} - 1$ possible inputs in
each iteration, yielding as many possible initialisations for a pseudo-random
number generator. Shamir et al. referred to such a synchronisation over multiple
rounds as a gradual type of Diffie-Hellman key exchange [12], in contrast to
the single round of Diffie-Hellman, in which several bits are transmitted.
Obviously, a test for synchrony cannot practically be defined by checking
whether weights in both nets have become identical. One therefore tests for
successive equal outputs in a sufficiently large number of iterations
$t_{min}$, such that equal outputs by chance are excluded:
$$
\forall t \in [t', \dots, t' + t_{min}] : O^{A}(t) = O^{B}(t). \tag{5}
$$
The number of outputs (bits) required to achieve synchronisation is lower than
the size of the key [16]. Synchronisation time is finite for discrete weights.
It is almost independent of N and scales with ln N for very large N.
Furthermore, it is proportional to $L^2$ [16]. Our experiments confirmed that
the synchronisation time is distributed with a peak around 400 iterations for
the parameters given in [9].
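The synchronisation test of Eq. (5) amounts to counting successive agreeing outputs. A small Python sketch, with the hypothetical class name `SyncDetector` and, as a simplifying assumption, a run of exactly `t_min` agreements as the threshold:

```python
class SyncDetector:
    """Counts successive agreeing outputs, in the spirit of Eq. (5)."""

    def __init__(self, t_min):
        self.t_min = t_min
        self.run_length = 0

    def update(self, output_a, output_b):
        """Feed one pair of outputs; return True once synchrony is assumed."""
        if output_a == output_b:
            self.run_length += 1
        else:
            self.run_length = 0  # any disagreement restarts the count
        return self.run_length >= self.t_min

# Example with an illustrative threshold of 3 successive equal outputs
detector = SyncDetector(3)
pairs = [(1, 1), (-1, -1), (1, -1), (1, 1), (-1, -1), (1, 1)]
results = [detector.update(a, b) for a, b in pairs]
print(results)
```

A realistic `t_min` has to be large enough that a run of equal outputs by chance is practically excluded; the value 3 is used here only to keep the example short.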
3 Security and Rekeying Functionality
The symmetric key-exchange protocol can generate long keys by fast calculations
and building the secure channel is of linear complexity. It scales with the size
K · N of the TPM structure [9], which defines the size K ·N · L of the key.
In order to still allow comparisons with the literature we chose L = 4 for our
implementation. The time to synchronise roughly doubles in comparison to
L = 3, while for the attacker the same time increases by orders of magnitude
(see [21]).
The security of the key exchange manifests in algorithm-specific properties
and can be fully exploited by appropriate hardware design. Apart from other
general (algorithmic) aspects of the security of the exchange method as
described in [9, 12], tracking the weights is hard in comparison to
synchronisation – practically even harder when implemented in hardware.
The key exchange protocol has been attacked by several eavesdropping ap-
proaches, which always require full knowledge of the TPM structure in use. We
will describe their basic properties in order to clarify the security aspect. Due to
the nature of the key exchange and its attacks, only probabilistic definitions of a
‘successful attack’ can be provided. One can distinguish between two classes of
attacks. The first class comprises attacks that can be defeated by appropriately
increasing the parameter L. The consequence is that the learning time of an
attacker is significantly longer than the synchronisation time. The
synchronisation time increases only proportionally to $L^2$, while the
probability of a successful attack decreases exponentially with L [16]. Among
these attacks, which try to synchronise faster than the two parties [8], is
the Naive Attack, which uses a single TPM or an ensemble of several identically
structured TPMs. The Genetic Attack even
comprises a population of thousands of TPMs, whose internal representations
are optimised by a genetic algorithm [12]. A successful attack is defined here as
synchronising faster than the parties A and B and could be realized for K = 2
in 50% of all cases. But already for L = 3, this attack has been shown to be
less effective than the Flipping Attack. The complexity of such attacks
(especially in hardware), with hundreds or thousands of TPMs plus an additional
(genetic) algorithm, is obviously high. The Flipping Attack defines a successful
attack as having 98% overlap with the weights of one party when parties A and B
are already synchronous. For L > 2, an ensemble of 10000 Flipping Attackers was
found less effective than a single attacker, which rules out its practical use [8].
All of the previously sketched attacks can be made arbitrarily costly and thus
practically defeated by increasing L, which significantly decreases the probability
of a successful attack. The approach thus remains computationally secure for
sufficiently large L [21, 25].
The only attack which does not seem to be affected by an increase of L
(but still by an increase of K) is the so-called Majority Flipping Attack. It
uses a hundred coordinated and communicating TPMs [25]. Yet, the given definition
of a successful attack is problematic: When A and B have fully synchronised,
the attacker has 98% average overlap (i.e. a fraction) with the weights in 50%
of all cases. For 99% average overlap, the probability reduces to 25%. This
indicates that the difficulty lies in achieving the last few percent. The authors
chose this definition because of the strong fluctuations they observed in the
success probability. But the definition of overlap is an average overlap over all
hidden units. Thus an attacker does not know which of the K · N components
of the weights (the key) are correct in a real attack scenario. In currently used
symmetric encryption algorithms, a single flipped key bit already leads
to a complete failure in decryption. Thus practically, one still has to perform a
subsequent brute-force attack, which would then only be successful in 50% of all
cases. Keep in mind that A and B already have one key (while the attacker has
98% with a probability of 0.5) and start encryption and data transmission. The
attacker needs to perform his brute-force attack plus the attack on the encrypted
data in parallel. Furthermore, the rekeying principle and the achievable short
key lifetimes (cf. Section 3.1) aim at an online usage of the exchanged keys for
secure transmission.
All formulated attacks can hardly be performed online. Only an offline
attack on the previously recorded exchanged information seems realistic. Last
but not least, note that all of the existing attacks are based on knowing the
common inputs and thus refer to a non-authenticated key exchange, in which
man-in-the-middle attacks are possible as well.
3.1 Feasible Immediate Rekeying
We propose to minimise the key lifetime as much as possible by employing
immediate rekeying, which exploits the speed of the key exchange and features
of our hardware component. Such a rekeying process is normally avoided
due to the computational cost of a new key exchange, and strategies are developed
to increase the key lifetime without affecting the security (see e.g. [1]). Yet,
using the TPM principle allows for efficient rekeying in the kHz-range (see also
Section 5).
Next to several other propositions (cf. [9]) concerning the en-/decryption,
one particular proposition (cf. [11]) is to take each (common) weight vector
after synchronisation for en-/decryption. On the one hand, a new potential
key is present in each step, which can then be used block-wise. On the other
hand, an opponent then also has the chance to synchronize using the ongoing
communication (cf. Section 2.2) and get a key.
We suggest to permanently generate (i.e. synchronise) new keys in parallel or
multiplexed to the encryption-transmission-decryption of data, using the most
recent key exchanged. In this case, each key is only used to encrypt a small
subset of the plaintext. As soon as a new key has been exchanged, it is used
for encryption. This allows short key lifetimes, achieving a
certain security level by many smaller keys instead of one large key.
Consequently, in our hardware design, we allow an external unit to demand
a key exchange service. Our TPMRAs will continuously synchronise new keys
as long as data needs to be exchanged. Once a crypt-unit uses the first key,
synchronisation is triggered again to always provide a next key. In this way,
the related hardware resources are consistently used and keys are exchanged
at a maximum rate subject to hardware constraints, average synchronisation
time and available channel bandwidth. Furthermore, this allows services like
periodic or even adaptive rekeying to be implemented. Security is thus increased
in our hardware implementation through feasible immediate rekeying, the mere
speed of key exchange and the achievable short key lifetimes.
4 Tree Parity Machine Rekeying Architectures
It is important to note that, with respect to a hardware implementation, only
signs and bounded integers are processed within the algorithm. The outer
product in Eq. (1) can be realized without multiplication, as it is merely the
parity of K signs. The product within the sum only changes the sign of the
weight, since each input is ±1. Thus, the most complex
structure to be implemented is an adder. The complexity of such a unit is thus
even less than the complexity of a linear filter, which requires a full
multiply-accumulate structure. Yet, the inherent parallelism can be exploited
here as well. The branches in Eq. (2) are only based on a test for the sign or
a test for equality to zero, both easily done in hardware.
Furthermore, only sign operations and additions are present in the learning
rule (Eq. (3)), which is thus well suited for a hardware implementation. The
bit package exchange can either be realized serially or via a parallel bus,
depending on the user's requirements and the intended application. The number
of registers needed for storage increases in the bit package variant, finally
imposing a tradeoff between area and speed.
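The multiplication-free summation can be illustrated in software as follows. This is a sketch only: the function name `local_field` and the encoding of an input as a single bit (1 for +1, 0 for −1) are assumptions of the example, not a description of the actual adder structure.

```python
def local_field(weights_k, input_bits_k):
    """Sum of w_kj * x_kj for one hidden unit using only add and subtract.

    input_bits_k holds one bit per input: 1 encodes x = +1, 0 encodes x = -1
    (this bit encoding is an assumption of the sketch).
    """
    acc = 0
    for w, bit in zip(weights_k, input_bits_k):
        acc = acc + w if bit else acc - w  # the input only selects the sign
    return acc

# Example: weights (3, -2, 4) with inputs (+1, -1, -1)
print(local_field([3, -2, 4], [1, 0, 0]))
```

Because each weight is bounded by L, the accumulator width is bounded as well, which keeps the adder small.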
Equal (pseudo-)random inputs are realized by equally initialised Linear Feed-
back Shift Registers (LFSR) or a Cyclic Redundancy Code (CRC). Different
(secret) initial weights can either be fixed (device-specific), or they can be pro-
vided by an additional application-specific device or by a thermal noise device.
The synchronisation criterion (Eq. (5)) basically comprises a counter.
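How two equally initialised shift registers yield equal inputs can be sketched with a generic LFSR. The 16-bit width, the tap positions (16, 14, 13, 11, a commonly tabulated maximal-length choice) and the class name `Lfsr16` are assumptions for illustration; the generator actually used in the TPM unit described below is a CRC generator whose polynomial is not given here.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (illustrative choice)."""

    def __init__(self, seed):
        if seed & 0xFFFF == 0:
            raise ValueError("seed must be non-zero")  # all-zero state is stuck
        self.state = seed & 0xFFFF

    def next_input(self):
        """Shift once and return the next TPM input as -1 or +1."""
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1  # XOR of the tapped bits
        self.state = (s >> 1) | (bit << 15)
        return 1 if bit else -1

# Both parties use the same (public or secret) seed and obtain equal inputs
party_a = Lfsr16(0xACE1)
party_b = Lfsr16(0xACE1)
seq_a = [party_a.next_input() for _ in range(64)]
seq_b = [party_b.next_input() for _ in range(64)]
print(seq_a == seq_b)
```

Keeping the seed secret between the parties corresponds to the authentication option mentioned in Section 2.2.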
The proposed Tree Parity Machine Rekeying Architectures (TPMRAs) are
functionally separated into two main structures. One structure essentially com-
prises the Key Handshake and Bit Package Control. The other structure con-
tains the TPM Unit and its control state machine.
4.1 Key Handshake and Bit Package Control
As described in Section 2.2, we implemented the bit package generalisation of the
protocol (cf. [9]). The overall structure of the TPMRAs is shown in Fig. 2. It
consists of three functional blocks: a Key Handshake and Bit Package Control,
the TPM unit and a Watchdog timer.
The Watchdog timer supervises the number of interactions needed for a key
exchange between two parties (Eq. (5)). If there is no synchronisation within a
specific time (remember that the synchronisation time varies from run to run),
a signal (sync_error) indicates a synchronisation error. It is programmable for
variable average synchronisation times subject to the chosen TPM structure.
Figure 2: Basic diagram of the Tree Parity Machine Rekeying Architectures.
Blocks: Key Handshake & Bit Package Control, Tree Parity Machine, Watchdog.
Signals: req_key, key_cha, key_com, Key, sync_error, Bit-package, BP_req, BP_ack.
The Key Handshake and Bit Package Control handles the key transmission
with an encryption unit and the bit package exchange process with the other
party. It accomplishes the bit packaging by partitioning the parity bits from the
TPM unit in tighter bit slices. Due to different computation cycles between two
key exchange parties, the rekeying procedure employs a key request (req_key),
a key changed (key_cha) and a key commit (key_com) handshake protocol (see
Fig. 2). A key is handed over via the internal bus (Key) to an encryption unit
when the synchronisation process is finished. For our application domain in
embedded system environments, we choose a fixed bit package length of 32 bit
for physically parallel exchange and synchronisation over a 32 bit wide bus (Bit
Package). The bit package exchange process uses a simple request/acknowledge
handshake protocol (BP_req, BP_ack).
4.2 Tree Parity Machine Unit
The TPM unit comprises the logic for the TPM structure, such as the logic for
calculating the parity bits as explained in Section 2. It consists of the TPM
control, a Cyclic Redundancy Code (CRC) generator, a Parity Computation
unit and a Weight Adjustment unit. A register bank holds the data for the
hidden unit and the weights of the network as shown in Fig. 3.
The TPM control is realized as a simple finite state machine (FSM) which
executes the initialisation of the TPM and the learning process with the bit
package from the other party. The Parity Computation unit calculates the
summation and the parity bit (Eqs. (1) and (2)). The Weight Adjustment unit
accomplishes the learning rule (Eqs. (3) and (4)).
The CRC random generator generates the pseudo random bits for the inputs
of the TPM. It is initialised by a vector which is equal for both parties. For the
purpose of authentication, the initial value would have to be kept secret.
Figure 3: Internal structure of the Tree Parity Machine Unit.
Blocks: Tree Parity Machine Control (FSM), CRC Random Generator, Parity
Computation, Weight Adjustment, Registers (Weights, Hidden Units).
Signals: Key, Parity Bit, Bit Package.
5 Implementation and Performance
We designed and simulated parameterisable, serial and semi-parallel TPM
Rekeying Architectures, using VHDL to implement an FPGA and an ASIC realisation.
For both architectures we set the integer range L to 4, as explained in
Section 3. In the serial architecture, the synaptic summation is performed by
Time Division Multiple Access (TDMA) of an L-bit adder, while the semi-parallel
form uses TDMA of six L-bit adders in parallel. The details of the
TPMRA implementations (key length K · N · L, serial or semi/fully-parallel
realisation) must be chosen with respect to the target environment, including
the used parameters, the timing, the available channel capacity and, of course,
the available chip area.
We realized a two party prototype system based on two XCV300E-8 (Virtex
E) FPGAs from Xilinx in order to investigate and demonstrate the functionality
of our architectures. Standard cell ASIC prototype realisations were built to
verify the suitability of the TPMRAs for typical embedded system components.
We chose K = 3 and varied N up to 49 for a resulting key size of 3 · 49 · 4 = 588
bit. This choice of N already allows a considerable key length and still keeps
the average synchronisation time low (cf. [9]). The underlying process was
a 0.18 µm six-layer CMOS process with 1.8 V supply voltage based on the UMC
library. The design was synthesised using the Synopsys Design Compiler and
was mapped using Cadence Silicon Ensemble.
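The key size arithmetic can be checked with a few lines of Python. Only N = 49 is stated in the text; the smaller values of N below are assumptions, chosen because with K = 3 and L = 4 they reproduce the key lengths of 84 to 588 bit that appear on the axes of Fig. 4. The function name `key_length_bits` is likewise illustrative.

```python
K, L = 3, 4  # values stated in the text

def key_length_bits(k, n, l):
    """Key size K * N * L in bits, as given in Section 3."""
    return k * n * l

# N = 49 is from the text; 7, 13, 19 and 25 are inferred, not stated
for n in (7, 13, 19, 25, 49):
    print(n, key_length_bits(K, n, L))
```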
The area (Fig. 4a) of the TPMRA realisations scales approximately linearly with
the key length (and stays around one square-millimetre) due to the linear
complexity of the adders. The serial TPM realisation consumes less area (i.e.
fewer hardware resources). Note that most of the area is consumed by the bit
packaging, because of the necessary storage of the inputs for the learning
(cf. Section 2.2).
Obviously, the achievable clock speed (Fig. 4b) in the serial variant is
significantly higher than in the semi-parallel version. This is due to the
necessity of a longer clock tree for the additional registers to store partial
results.
Additionally, we established the throughput (i.e. keys per second) subject
to the average synchronisation time of 400 iterations for different key lengths
in Fig. 4c. We assumed the maximally achievable clock frequency for each key
length, which can be provided by a Digital Phase-Locked Loop (DPLL) regardless
of the system's clock frequency. Furthermore, we assumed the average
synchronisation time of 400 iterations for all key lengths, although it is
really always less than the size of the key (a worst-case scenario). This data
refers to an idealised infinite channel bandwidth, neglecting the transmission
delay. For key lengths smaller than approximately 180 bit, the serial TPMRA has
a higher throughput (in the range of 2.5 · 10^4 to more than 4 · 10^4 keys per
second) due to the higher clock frequency (Fig. 4b). Beyond this point, the
semi-parallel version achieves a higher throughput, exploiting the parallel
computation.

Figure 4: Post-synthesis results for chip-area (logic) (a) and achievable
clock-frequency (b) vs. key length. Average key exchange rate (avg.
synchronisation time of 400 iterations) vs. key length is plotted in (c). A
practically finite channel capacity is neglected here. Plot (d) is log-scaled
and shows average key exchange rates for a 588 bit key and a selection of
typical channels with their capacities. All data refers to a UMC 0.18 micron
six-layer standard cell process.

(a) Area [mm^2] vs. key length [bit]
Key length [bit]        84      156     228     300     588
Serial TPMRA            0.43    0.54    0.67    0.81    1.45
Semi-parallel TPMRA     0.49    0.65    0.82    0.98    1.66

(b) Speed [MHz] vs. key length [bit]
Key length [bit]        84      156     228     300     588
Serial TPMRA            515     515     510     478     454
Semi-parallel TPMRA     242     200     196     200     182

(c) Average key exchange rate [Hz] vs. key length [bit] (idealised infinite
channel bandwidth)
Key length [bit]        84      156     228     300     588
Serial TPMRA            43269   27844   21937   16992   9326
Semi-parallel TPMRA     33006   25387   24094   23641   17839

(d) Average key exchange rate [Hz] vs. the channel bandwidths [kbps] of I2C,
CAN and PCI (burst mode)
Channel bandwidth [kbps]  400 (I2C)   1000 (CAN)   1056000 (PCI)
Serial TPMRA              871         1064         9281
Semi-parallel TPMRA       912         1125         17636
Figure 4d shows the same information, but for three real communication
channels and their bandwidths, given a key length of 588 bit. The chosen
log-scale reveals the small difference in throughput (up to around
1000 keys per second) for an I2C and a CAN bus. Only for buses of higher
bandwidth such as the PCI bus do the two architectures show a significantly
different throughput (reaching the kHz-range). In the case of a 32 bit PCI
bus in burst mode, the theoretical maximum throughput (as in Fig. 4c) can be
achieved. We also considered other bus systems (e.g. packet based systems like
WLAN). The results are similar, due to their small bandwidth in comparison to
the PCI bus. Obviously, the bottleneck is the underlying communication bus,
as is also typical in other domains (processor-bus bottleneck). Given a
high-speed communication channel, the proposed key exchange and rekeying in the
kHz-range allows us to use rather weak encryption algorithms (cf. Section 3), as
the security may rely on fast rekeying. Of course, any other more sophisticated
encryption algorithm like AES or 3-DES can also be used.
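The influence of the channel can be made plausible with a simple latency model. This is a hedged sketch, not the channel model used for Fig. 4d, which is not spelled out in the text: it assumes that computation and transmission do not overlap, that 400 iterations are sent as whole 32 bit packages in one direction, and that protocol overhead is ignored. The function name `key_rate_hz` is hypothetical, and the ideal rates of 9326 Hz (serial) and 17839 Hz (semi-parallel) are the 588 bit values of Fig. 4c.

```python
import math

def key_rate_hz(ideal_rate_hz, bandwidth_bps, iterations=400, package_bits=32):
    """Estimated keys per second over a channel of finite bandwidth.

    Assumptions of this sketch: computation and transmission do not overlap,
    and each key costs ceil(iterations / package_bits) packages on the channel.
    """
    packages = math.ceil(iterations / package_bits)
    transmit_s = packages * package_bits / bandwidth_bps
    compute_s = 1.0 / ideal_rate_hz
    return 1.0 / (compute_s + transmit_s)

# 588 bit key over a 400 kbps channel (the I2C bandwidth used in Fig. 4d)
serial_i2c = key_rate_hz(9326, 400_000)
semi_parallel_i2c = key_rate_hz(17839, 400_000)
print(round(serial_i2c), round(semi_parallel_i2c))
```

Under these assumptions the model gives roughly 872 and 912 keys per second, close to the I2C values in Fig. 4d, and it approaches the ideal rate as the bandwidth grows. Other channels with protocol overhead, such as CAN, would need additional terms.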
The achievable average key-exchange rates of the TPMRAs in the kHz-range
allow the security to be increased through a feasible frequent key exchange.
Short key lifetimes can be realized efficiently. Also, any successful online
attack must at least achieve the same performance, requiring significant
hardware expenses. This does not appear to be feasible. Using different keys
for encryption and transmission of different blocks of data increases the
difficulty of an attack on the encrypted data.
Due to the small area in the range of one square-millimetre, we regard the
field of application principally as an IP-core in embedded system environments.
A particular focus can be smartcards or transponder-based applications such as
RFID-systems and devices in ad-hoc networks [19], in which a small area for
cryptographic components is mandatory.
6 Summary and Outlook
We presented a solution for secure communication in embedded system envi-
ronments via Tree Parity Machine Rekeying Architectures. Our investigations
confirm the results as presented in [9] and stress the advantages of a hardware
implementation. The silicon area lies within a square-millimetre and allows
keys of practical size to be exchanged within about a millisecond. The proposed
exchange in parallel to encryption-transmission-decryption also allows for
efficient rekeying schemes and short key lifetimes.
Next to algorithmic extensions to further increase the security [21, 22, 15, 23],
architectural improvements or variants include a fully serial realisation with
TDMA usage of a single TPM unit. This further decreases the area consumption
but at the cost of an increase in necessary cycles for one output bit. A stream
cipher variant, using output bits directly via the Blum-Blum-Shub bit generator,
was already suggested in [9], and its implementation in hardware is particularly
suited for streaming applications. The relatively small size of the TPMRAs
allows an implementation in embedded systems with only small overhead. They
are especially suited for devices with limited resources, even more so in moderate
security scenarios. Consequently, the integration of our architectures into such
a system and its practical evaluation are subject to future work.
References
[1] M. Abdalla and M. Bellare, Increasing the lifetime of a key: A comparative
analysis of the security of rekeying techniques, Advances in Cryptology –
Asiacrypt 2000 Proceedings (T. Okamoto, ed.), LNCS, vol. 1976, Springer
Verlag, 2000.
[2] Ross Anderson, Protecting embedded systems – the next ten years, Proc.
of the 3rd International Workshop on Cryptographic Hardware and Em-
bedded Systems, CHES 2001 (Paris, France), LNCS, vol. 2162, Springer
Verlag, May 14-16, 2001, pp. 1–2.
[3] S. Bornholdt and H.G. Schuster (eds.), Handbook of graphs and networks,
ch. Theory of interacting neural networks, Wiley VCH, 2003.
[4] Ernst Bovelander, Smart card security, State of the Art in Applied Cryp-
tography – Course on Computer Security and Industrial Cryptography (Re-
vised lectures) (B. Preneel and V. Rijmen, eds.), LNCS, vol. 1528, Springer
Verlag, Jun 1997.
[5] W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Trans-
actions on Information Theory IT-22 (1976), no. 6, 644–654.
[6] T. El Gamal, A public key cryptosystem and a signature scheme based
on discrete logarithms, IEEE Transactions on Information Theory IT-31
(1985), no. 4, 469–472.
[7] A. Huang, Keeping secrets in hardware: the Microsoft X-BOX™ case study,
Proc. of the Workshop on Cryptographic Hardware and Embedded Sys-
tems, CHES 2002, LNCS, vol. 2523, Springer Verlag, August 13 - 15 2003,
pp. 213–227.
[8] I. Kanter and W. Kinzel, Neural cryptography, Proc. of the 9th Interna-
tional Conference on Neural Information Processing, ICONIP 2002 (Singa-
pore), Nov. 18-22 2002.
[9] I. Kanter, W. Kinzel, and E. Kanter, Secure exchange of information by
synchronization of neural networks, Europhysics Letters 57 (2002), no. 1,
141–147.
[10] A. Kerckhoffs, La cryptographie militaire, Journal des sciences militaires
IX (1883), 5–38, (pp. 161-191, Feb. 1883).
[11] W. Kinzel and I. Kanter, Interacting neural networks and cryptography,
Advances in Solid State Physics (B. Kramer, ed.), vol. 42, Springer Verlag,
2002.
[12] Alexander Klimov, Anton Mityagin, and Adi Shamir, Analysis of neural
cryptography, Advances in Cryptology – ASIACRYPT 2002, LNCS, vol.
2501, Springer Verlag, 2002, pp. 288–298.
[13] Ueli Maurer, Secret key agreement by public discussion, IEEE Transactions
on Information Theory 39 (1993), no. 3, 733–742.
[14] R. Metzler, W. Kinzel, and I. Kanter, Interacting neural networks, Phys.
Rev. E 62 (2000), no. 2, 2555–2565.
[15] R. Mislovaty, E. Klein, I. Kanter, and W. Kinzel, Public channel cryptog-
raphy by synchronization of neural networks and chaotic maps, Phys. Rev.
Lett. 91 (2003), no. 118701.
[16] R. Mislovaty, Y. Perchenok, I. Kanter, and W. Kinzel, Secure key-exchange
protocol with an absence of injective functions, Phys. Rev. E 66 (2002),
no. 066102.
[17] C. Paar, Eingebettete Sicherheit im Automobil. ESCAR – Embedded IT-
Security in Cars, Cologne, Germany, 18.-19. Nov. 2003.
[18] J. Pelzl, T. Wollinger, and C. Paar, Low cost security: Explicit formulae
for genus-4 hyperelliptic curves, Tenth Annual Workshop on Selected Areas
in Cryptography, SAC 2003, Springer Verlag, 2003.
[19] W. Rankl and W. Effing, Smart card handbook, Wiley & Sons Ltd., 2000.
[20] Ron Rivest, Adi Shamir, and Len Adleman, A method for obtaining digital
signatures and public key cryptosystems, Communications of the ACM 21
(1978), no. 2, 120–126.
[21] M. Rosen-Zvi, E. Klein, I. Kanter, and W. Kinzel, Mutual learning in a
tree parity machine and its application to cryptography, Phys. Rev. E. 66
(2002), no. 066135.
[22] A. Ruttor and W. Kinzel, Repulsive feedback mechanisms in neural cryp-
tography, 2003.
[23] A. Ruttor, W. Kinzel, L. Shacham, and I. Kanter, Neural cryptography with
feedback, Phys. Rev. E 69 (2004).
[24] Rohde & Schwarz, TopSec GSM, data sheet pd0757.6904.21, Aug. 2001.
[25] L. N. Shacham, E. Klein, R. Mislovaty, I. Kanter, and W. Kinzel, Co-
operating attackers in neural cryptography, Preprint www.arxiv.org/cond-
mat/0312068, dec 2003.
[26] Frank Stajano, Security in pervasive computing, Proc. of the 1st Interna-
tional Conference on Security in Pervasive Computing, SPC 2003 (D. Hut-
ter, ed.), LNCS, vol. 2802, Springer Verlag, 2004, p. 1.
[27] Klaus Vedder and Franz Weikmann, Smart cards – requirements, proper-
ties, and applications, State of the Art in Applied Cryptography – Course
on Computer Security and Industrial Cryptography (Revised lectures),
LNCS, vol. 1528, Springer Verlag, June 1997, pp. 307–331.
[28] T. Wollinger, J. Guajardo, and C. Paar, Cryptography in embedded systems:
An overview, Proc. of the EmbeddedWorld 2003 Exhibition and Conference
(Nürnberg, Germany), Design & Elektronik, Nürnberg, Feb. 18-20 2003,
pp. 735–744.
