Parity-Based Concurrent Error Detection Schemes for the ChaCha Stream
  Cipher by Rieger, Viola & Zeh, Alexander
Parity-Based Concurrent Error Detection Schemes
for the ChaCha Stream Cipher
Viola Rieger and Alexander Zeh
Research and Development Center
Infineon Technologies AG, Munich, Germany
{viola.rieger, alexander.zeh}@infineon.com
Abstract—We propose two parity-based concurrent er-
ror detection schemes for the Quarterround of the ChaCha
stream cipher to protect from transient and permanent
faults. They offer a trade-off between implementation over-
head and error coverage. The second approach can detect
any odd-weight error on the in-/output and intermediate
signals of a Quarterround, while the first one requires less
logic.
I. INTRODUCTION
The ChaCha stream cipher was introduced by Bernstein in 2008 [1] as a
successor of the Salsa cipher family [2]. Both algorithms are based on
pseudorandom functions using ADD, ROTATE and XOR (ARX) operations.
Authenticated Encryption with Associated Data (AEAD), which provides
confidentiality, authenticity and integrity of data, is gaining
significance, and its realization in hardware for embedded systems
seems fruitful. With the authenticator Poly1305, Bernstein proposed a
software-optimized Message Authentication Code (MAC, [3]) that is well
suited to be used in combination with the ChaCha algorithm. Concurrent
Error Detection (CED, [4], [5]) was developed to detect faults within
functional or logical building blocks of embedded circuits, such as
ALUs, adders or individual gates. CED enables an efficient, testable
and robust design. Parity-based CED was applied to
substitution-permutation networks in [6] and, e.g., to AES by Bertoni
et al. [7] and Di Natale et al. [8]. To our knowledge, no CED scheme
has been proposed for ChaCha so far.
This paper is structured as follows: we recall preliminaries on parity
codes and the parity-bit prediction for basic operations on binary
vectors in Section II. Section III describes the basic operations of
the ChaCha-Poly1305 AEAD scheme and defines the intermediate signals in
a Quarterround of ChaCha. The transformation of a Quarterround into a
code-disjoint circuit based on a single parity-bit prediction is
described in Section IV: Thm. 4 gives the expression for the overall
parity bit and Thm. 10 proves the error coverage. Our second scheme,
the group-based parity prediction, is described in Section V, and
Thm. 13 states its error coverage.
II. PRELIMINARIES
Let F_q denote the finite field of order q. For two integers a, b with
b > a, we denote by [a, b] the integer set {i ∈ Z : a ≤ i ≤ b} and let
[b] be the shorthand notation for [1, b]. Similarly, [n, k, d]_q
denotes the parameters of a q-ary linear code of length n, dimension k,
and minimum Hamming distance d. A generator matrix of a linear
[n, k, d]_q code C over F_q is a k × n matrix whose rows form a basis
of C. The binary [n, n−1, 2]_2 parity code is defined as the code with
generator matrix G = (I | 1_{n−1}), where I is the (n−1) × (n−1)
identity matrix and 1_{n−1} = (1 1 ··· 1)^T is the all-one column
vector of length n−1. It is well known that any error e ∈ F_2^n of odd
Hamming weight added to a codeword c of an [n, n−1, 2]_2 parity code C
results in a vector (c + e) ∉ C, i.e., the error is detectable. In the
following we use the calculation of the one-bit redundancy of a parity
code. The bitwise XOR of two binary vectors a, b ∈ F_2^n is denoted by
a ⊕ b, and a ⊞ b denotes the addition of a and b modulo 2^n, where both
vectors are interpreted as n-bit integers.
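The detection property of the parity code can be illustrated with a few lines of Python. This is only a sketch: the helper names `parity`, `encode` and `is_codeword` are ours and do not appear in the paper, and binary vectors are represented as Python integers.

```python
def parity(x: int) -> int:
    """Parity bit p(x): XOR of all bits of the integer x."""
    return bin(x).count("1") & 1


def encode(u: int):
    """Systematic parity encoding with G = (I | 1): append p(u) to u."""
    return u, parity(u)


def is_codeword(word: int, check: int) -> bool:
    """A received pair is a codeword iff its overall parity is even."""
    return parity(word) ^ check == 0


word, check = encode(0b1011001)
odd_error = 0b0000100    # Hamming weight 1
even_error = 0b0001100   # Hamming weight 2

detected_odd = not is_codeword(word ^ odd_error, check)
detected_even = not is_codeword(word ^ even_error, check)
```

The odd-weight error flips the overall parity and is flagged, while the even-weight error leaves it unchanged and passes unnoticed, which is why the coverage statements in this paper are restricted to odd-weight errors.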
A parity-bit of x = (x_0 x_1 ··· x_{n−1}) ∈ F_2^n is defined as the
Boolean function p : F_2^n → F_2, where
\[ p(x) \overset{\mathrm{def}}{=} \bigoplus_{i=0}^{n-1} x_i. \]
Let a, b ∈ F_2^n, then we have:
\[ p(a \oplus b) = p(a) \oplus p(b). \tag{1} \]
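The parity-bit of the sum of two binary vectors is:
\[ p(a \boxplus b) = p(a) \oplus p(b) \oplus p(cv(a, b)), \tag{2} \]
where c = cv(a, b) ∈ F_2^n is the so-called carry vector associated
with a and b and obtained during the calculation of their sum. Entry
c_i is the carry into bit position i, so that the i-th bit of the sum
is a_i ⊕ b_i ⊕ c_i. The entries of c are given by
\[ c_i = \begin{cases} 0, & \text{for } i = 0, \\ a_{i-1} b_{i-1} \lor (a_{i-1} \oplus b_{i-1})\, c_{i-1}, & \forall i \in [1, n-1]. \end{cases} \tag{3} \]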
arXiv:1904.06073v1 [cs.IT] 12 Apr 2019
The addition a ⊞ b of two vectors a, b requires more logic gates than
the XOR a ⊕ b and is therefore more error-prone. Hence, a variety of
self-checking adders were developed, e.g., the parity-checked carry
look-ahead adder introduced by Nicolaidis in [9].
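The parity relations (1) and (2) can be checked numerically. The following Python sketch assumes n = 32 and takes the carry vector to hold, at position i, the carry into bit i of the addition modulo 2^n; the function names are ours.

```python
import random

N = 32
MASK = (1 << N) - 1


def parity(x: int) -> int:
    """Parity bit p(x): XOR of all bits of x."""
    return bin(x).count("1") & 1


def add(a: int, b: int) -> int:
    """Addition modulo 2^N of two N-bit words."""
    return (a + b) & MASK


def carry_vector(a: int, b: int) -> int:
    """Carry vector cv(a, b): bit i holds the carry into position i (bit 0 is 0)."""
    c, carry = 0, 0
    for i in range(N):
        c |= carry << i
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | ((ai ^ bi) & carry)
    return c


def check_identities(trials: int = 1000, seed: int = 0) -> bool:
    """Check (1), (2) and the shortcut cv(a, b) = a XOR b XOR (a + b)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.getrandbits(N), rng.getrandbits(N)
        cv = carry_vector(a, b)
        if parity(a ^ b) != parity(a) ^ parity(b):              # identity (1)
            return False
        if parity(add(a, b)) != parity(a) ^ parity(b) ^ parity(cv):  # identity (2)
            return False
        if cv != a ^ b ^ add(a, b):
            return False
    return True
```

The last check shows that the carry vector can also be obtained without a loop as a ⊕ b ⊕ (a + b mod 2^n), which is convenient in software models of the checker.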
III. STREAM CIPHER CHACHA
The ChaCha algorithm transforms a 512-bit state matrix
V ∈ F_{2^32}^{4×4} into a unique and irreversible 512-bit output block.
Encryption and decryption are performed by calculating the XOR of the
keystream and the input data. ChaCha operates on 32-bit words and makes
use of a 256-bit key K = (key0 key1 ··· key7) and a 64-bit nonce
N⋆ = (nonce0 nonce1) (in several specifications 96 bits are reserved
for the nonce and 32 bits for the counter). The constant
CST = (cst0 cst1 cst2 cst3), where cst0 = 0x61707865,
cst1 = 0x3320646E, cst2 = 0x79622D32 and cst3 = 0x6B206574, is
predefined. The counter CTR = (ctr0 ctr1) corresponds to the message
block index i. In order to process the i-th message block, the initial
state matrix M is transformed in a series of N rounds, where according
to [1] N is suggested to be set to 8, 12 or 20 (see Algorithm 1).

Algorithm 1: ChaCha(N, K, CTR, N⋆)
Input : Rounds N ∈ {8, 12, 20},
        Key K = (key0 key1 ··· key7) ∈ F_2^256,
        Counter CTR = (ctr0 ctr1) ∈ F_2^64,
        Nonce N⋆ = (nonce0 nonce1) ∈ F_2^64.
Output: Updated matrix V ∈ F_{2^32}^{4×4}.
  (cst0 cst1 cst2 cst3) ← (0x61707865 0x3320646E 0x79622D32 0x6B206574);
  M ← [ cst0  cst1  cst2    cst3
        key0  key1  key2    key3
        key4  key5  key6    key7
        ctr0  ctr1  nonce0  nonce1 ];   // Init.
  V ← M;
  for i ← 0 to N/2 − 1 do
      QR(v0,0, v1,0, v2,0, v3,0);
      QR(v0,1, v1,1, v2,1, v3,1);
      QR(v0,2, v1,2, v2,2, v3,2);
      QR(v0,3, v1,3, v2,3, v3,3);
      QR(v0,0, v1,1, v2,2, v3,3);
      QR(v0,1, v1,2, v2,3, v3,0);
      QR(v0,2, v1,3, v2,0, v3,1);
      QR(v0,3, v1,0, v2,1, v3,2);
  V ← V ⊞ M;   // Entry-wise 32-bit sum
  return V;
Each iteration of the loop in Algorithm 1 performs two rounds:
even-numbered rounds affect the columns of the matrix V, while
odd-numbered rounds modify the diagonals of V. Both transformations
apply the nonlinear Quarterround function shown in Algorithm 2. Each
Quarterround QR(a, b, c, d) is based on four additions modulo 2^32,
four XORs and four rotations, which operate on the 32-bit input words
a, b, c and d. A Quarterround updates each input word twice, allowing
each input word to affect the other words.

Algorithm 2: QR(a, b, c, d)
Input : a, b, c, d ∈ F_2^32.
Output: a, b, c, d ∈ F_2^32 (Updated Values).
  a ← a ⊞ b;           // a0 ← a ⊞ b
  d ← (d ⊕ a) ≪ 16;    // d0 ← d ⊕ a0, d1 ← d0 ≪ 16
  c ← c ⊞ d;           // c0 ← c ⊞ d1
  b ← (b ⊕ c) ≪ 12;    // b0 ← b ⊕ c0, b1 ← b0 ≪ 12
  a ← a ⊞ b;           // a′ ← a0 ⊞ b1
  d ← (d ⊕ a) ≪ 8;     // d2 ← d1 ⊕ a′, d′ ← d2 ≪ 8
  c ← c ⊞ d;           // c′ ← c0 ⊞ d′
  b ← (b ⊕ c) ≪ 7;     // b2 ← b1 ⊕ c′, b′ ← b2 ≪ 7
  return
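Algorithms 1 and 2 can be sketched in Python as follows. This is a minimal software model under our own assumptions: the state matrix is stored row by row in a list of 16 words, the names `quarterround`, `chacha_block` and `QR_INDICES` are ours, and the key, counter and nonce are passed as lists of 32-bit words in the layout of the matrix M (two counter words, two nonce words).

```python
MASK = 0xFFFFFFFF
CST = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)

# Index quadruples into the row-major 4x4 state: four columns, then four diagonals.
QR_INDICES = (
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
)


def rotl(x, r):
    """Rotate the 32-bit word x to the left by r positions."""
    return ((x << r) | (x >> (32 - r))) & MASK


def quarterround(a, b, c, d):
    """Algorithm 2: four additions mod 2^32, four XORs, four rotations."""
    a = (a + b) & MASK
    d = rotl(d ^ a, 16)
    c = (c + d) & MASK
    b = rotl(b ^ c, 12)
    a = (a + b) & MASK
    d = rotl(d ^ a, 8)
    c = (c + d) & MASK
    b = rotl(b ^ c, 7)
    return a, b, c, d


def chacha_block(rounds, key, ctr, nonce):
    """Algorithm 1 with 8 key words, 2 counter words and 2 nonce words."""
    m = list(CST) + list(key) + list(ctr) + list(nonce)
    v = list(m)
    for _ in range(rounds // 2):
        for i, j, k, l in QR_INDICES:
            v[i], v[j], v[k], v[l] = quarterround(v[i], v[j], v[k], v[l])
    # Final entry-wise 32-bit sum of the transformed state and the initial state.
    return [(x + y) & MASK for x, y in zip(v, m)]
```

As a sanity check, `quarterround(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)` reproduces the Quarterround test vector given in RFC 8439, Section 2.1.1, namely `(0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB)`.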
IV. PARITY-BASED CODE-DISJOINT CIRCUIT
In this section we describe a parity-based code-disjoint circuit [10]
for the Quarterround (Algorithm 2), which is the essential part of
ChaCha [1]. We investigate its resistance against transient and
permanent faults that can affect the input signals a, b, c, d ∈ F_2^32
as well as the intermediate signals a0, b0, c0, d0, b1, b2, d1, d2 ∈
F_2^32 given in the comments of Algorithm 2.
For the following analysis we consider the data path of a Quarterround
as illustrated on the left side of Fig. 1. The right side of Fig. 1
shows our group-based parity prediction, which is the subject of
Section V. The inputs of the data path are a, b, c, d ∈ F_2^32 and the
outputs are the vectors a′, b′, c′, d′ ∈ F_2^32. The following four
intermediate signals in F_2 are defined as
\[ \alpha \overset{\mathrm{def}}{=} p(cv(a, b)), \quad \beta \overset{\mathrm{def}}{=} p(cv(c, d_1)), \quad \gamma \overset{\mathrm{def}}{=} p(cv(a_0, b_1)), \quad \delta \overset{\mathrm{def}}{=} p(cv(c_0, d')), \tag{4} \]
where cv(a, b) denotes the carry vector of a and b given in (3). The
four intermediate bits α, β, γ and δ defined in (4) will, in addition
to a, b, c, d ∈ F_2^32, be used to transform a Quarterround into a
code-disjoint circuit. Furthermore, they are used for our group-based
parity prediction (GBPP), described in Section V.
A parity-based code-disjoint circuit [10] extends the
classical parity prediction by additionally encoding the
inputs of a given circuit into codewords of the parity
code. Hence, we first develop a parity prediction for one
Quarterround.
Definition 1 (Parity Prediction). Let f be a function with input
x ∈ F_2^m and output y ∈ F_2^n. A parity prediction pp_f of f is a
function such that
pp_f(x) = p(f(x)) = p(y), ∀x ∈ F_2^m.
The design of the parity prediction pp_f for a given function f can be
optimized in terms of, e.g., gate count and/or error coverage.
Now, we develop a parity prediction for Algorithm 2, where
m = n = 128, x = (a b c d) and y = (a′ b′ c′ d′) according to Def. 1.
To this end, we calculate one parity bit for each of the four
components of the output vector (a′ b′ c′ d′).
Lemma 2 (Parity Prediction of a′). Consider the output a′ ∈ F_2^32 of
a Quarterround given in Algorithm 2. Its parity bit is
p(a′) = p(b) ⊕ p(c) ⊕ p(d) ⊕ β ⊕ γ.
Proof: With (2) for the output signal a′ = a0 ⊞ b1, and since a
rotation does not change the parity, i.e., p(b1) = p(b0) = p(b) ⊕ p(c0),
we have
p(a′) = p(a0) ⊕ p(b1) ⊕ γ
      = p(a0) ⊕ p(b) ⊕ p(c0) ⊕ γ,
and with p(c0) = p(c) ⊕ p(d1) ⊕ β = p(c) ⊕ p(d0) ⊕ β, we obtain
p(a′) = p(a0) ⊕ p(b) ⊕ p(c) ⊕ p(d0) ⊕ β ⊕ γ. (5)
Inserting p(d0) = p(a0) ⊕ p(d) in (5) leads to
p(a′) = p(a0) ⊕ p(b) ⊕ p(c) ⊕ p(a0) ⊕ p(d) ⊕ β ⊕ γ
      = p(b) ⊕ p(c) ⊕ p(d) ⊕ β ⊕ γ.
Lemma 3 (Parity Prediction of b′, c′, d′). Consider the outputs
b′, c′, d′ ∈ F_2^32 of a Quarterround given in Algorithm 2. Their
parity bits are
p(b′) = p(a) ⊕ p(b) ⊕ p(c) ⊕ α ⊕ β ⊕ γ ⊕ δ,
p(c′) = p(b) ⊕ p(d) ⊕ γ ⊕ δ,
p(d′) = p(a) ⊕ p(c) ⊕ α ⊕ β ⊕ γ.
Proof: Similar to the proof of Lemma 2.
Theorem 4 (Parity Prediction of a Quarterround). Let
(a′ b′ c′ d′) ∈ F_2^128 be the output of a Quarterround given in
Algorithm 2. Its parity bit is
pp_QR(a, b, c, d) = p(b) ⊕ p(c) ⊕ β.
Proof: Due to space limitations, we omit the proof.
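One way to see the statement of Thm. 4 (a sketch of our own, not the authors' omitted proof) is to add the four parity bits of Lemmas 2 and 3 and to cancel every term that occurs an even number of times:

```latex
\begin{align*}
pp_{QR}(a,b,c,d)
  &= p(a') \oplus p(b') \oplus p(c') \oplus p(d') \\
  &= \bigl[p(b) \oplus p(c) \oplus p(d) \oplus \beta \oplus \gamma\bigr]
     \oplus \bigl[p(a) \oplus p(b) \oplus p(c) \oplus \alpha \oplus \beta \oplus \gamma \oplus \delta\bigr] \\
  &\quad \oplus \bigl[p(b) \oplus p(d) \oplus \gamma \oplus \delta\bigr]
     \oplus \bigl[p(a) \oplus p(c) \oplus \alpha \oplus \beta \oplus \gamma\bigr] \\
  &= p(b) \oplus p(c) \oplus \beta .
\end{align*}
```

Here p(a), p(d), α and δ each occur twice and γ occurs four times, so they vanish, while p(b), p(c) and β each occur three times and therefore remain.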
Note that the direct calculation of the parity bit of a Quarterround
as given by Thm. 4 can be realized with 64 XOR gates on the input
side, i.e., 31 XOR gates each for the calculation of p(b) and p(c),
and two XOR gates to calculate p(b) ⊕ p(c) ⊕ β. Another 127 XOR gates
are needed to calculate the parity of the output (a′ b′ c′ d′), and one
XOR gate to compare the predicted and the calculated parity. This
results in 192 two-input XOR gates in total.
The following corollary states the circuit for transforming a
Quarterround into a code-disjoint circuit as proposed in [10]. In
addition to the parity prediction, it allows the detection of an
odd-weight error affecting the input (a b c d).
Corollary 5 (Single Output Code-Disjoint Circuit). Let the input and
output as well as the parity prediction be as in Thm. 4. Then
p((a b c d)) ⊕ pp_QR(a, b, c, d) = p(a) ⊕ p(d) ⊕ β.
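The parity predictions of Lemmas 2 and 3, Thm. 4 and Corollary 5 can be checked on random inputs with a small software model. The sketch below is ours: the function names are hypothetical, and the carry parities α, β, γ, δ of (4) are computed with the shortcut cv(x, y) = x ⊕ y ⊕ (x + y mod 2^32).

```python
import random

MASK = 0xFFFFFFFF


def parity(x):
    return bin(x).count("1") & 1


def rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def add(x, y):
    return (x + y) & MASK


def carry_parity(x, y):
    """p(cv(x, y)), using cv(x, y) = x XOR y XOR (x + y mod 2^32)."""
    return parity(x ^ y ^ add(x, y))


def quarterround_with_carries(a, b, c, d):
    """Quarterround that also returns the carry parities alpha..delta of (4)."""
    alpha, a0 = carry_parity(a, b), add(a, b)
    d1 = rotl(d ^ a0, 16)
    beta, c0 = carry_parity(c, d1), add(c, d1)
    b1 = rotl(b ^ c0, 12)
    gamma, a_out = carry_parity(a0, b1), add(a0, b1)
    d_out = rotl(d1 ^ a_out, 8)
    delta, c_out = carry_parity(c0, d_out), add(c0, d_out)
    b_out = rotl(b1 ^ c_out, 7)
    return (a_out, b_out, c_out, d_out), (alpha, beta, gamma, delta)


def predictions_hold(a, b, c, d):
    (a_out, b_out, c_out, d_out), (al, be, ga, de) = quarterround_with_carries(a, b, c, d)
    pa, pb, pc, pd = parity(a), parity(b), parity(c), parity(d)
    lemma2 = parity(a_out) == pb ^ pc ^ pd ^ be ^ ga
    lemma3 = (parity(b_out) == pa ^ pb ^ pc ^ al ^ be ^ ga ^ de
              and parity(c_out) == pb ^ pd ^ ga ^ de
              and parity(d_out) == pa ^ pc ^ al ^ be ^ ga)
    pp_qr = pb ^ pc ^ be
    thm4 = parity(a_out) ^ parity(b_out) ^ parity(c_out) ^ parity(d_out) == pp_qr
    cor5 = (pa ^ pb ^ pc ^ pd) ^ pp_qr == pa ^ pd ^ be
    return lemma2 and lemma3 and thm4 and cor5


rng = random.Random(0)
all_hold = all(
    predictions_hold(*[rng.getrandbits(32) for _ in range(4)])
    for _ in range(1000)
)
```

Such a randomized check does not replace the proofs, but it is a cheap guard against index or sign slips when the predictor is transcribed into a hardware description.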
To obtain the error coverage, we consider errors e ∈ F_2^32 \ {0} on
the intermediate signals a0, b0, b1, b2, c0, d0, d1, d2 of Algorithm 2.
To this end, we define the following two notations for affected
vectors.
Notation 6 (Erroneous Vector). Let a ∈ F_2^32 and e ∈ F_2^32 \ {0}. An
erroneous vector ã is defined as ã = a ⊕ e.
Notation 7 (Potentially Error-Affected Vector). Let c ∈ F_2^32 and
e_c ∈ F_2^32. A vector that can be affected by an error is denoted as
c̄ = c ⊕ e_c.
Using Notations 6 and 7 and for c = p(a ⊞ b), the parity bit of the
modular addition as in (2) with erroneous input ã is
c̄ = p(ã ⊞ b) = p(ã) ⊕ p(b) ⊕ p(cv(ã, b)), (6)
where cv(ã, b) = cv(a, b) ⊕ e_c. The weight of e_c ∈ F_2^32 can be
different from the weight of e (it can even be zero).
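The remark on the weight of the carry error can be made concrete with three small cases. In this sketch (helper names are ours) the error e flips only the least significant bit of a, and the resulting difference of the carry vectors has weight 0, 1 or 31 depending on the operands:

```python
MASK = 0xFFFFFFFF


def weight(x):
    """Hamming weight of x."""
    return bin(x).count("1")


def carry_vector(a, b):
    """cv(a, b) for 32-bit words, via cv(a, b) = a XOR b XOR (a + b mod 2^32)."""
    return a ^ b ^ ((a + b) & MASK)


def carry_error(a, b, e):
    """Difference of the carry vectors when a is replaced by a XOR e."""
    return carry_vector(a ^ e, b) ^ carry_vector(a, b)


e = 1
w_zero = weight(carry_error(0, 0, e))           # no carry is created
w_one = weight(carry_error(1, 1, e))            # the carry into bit 1 disappears
w_many = weight(carry_error(0, 0xFFFFFFFF, e))  # a carry ripples through bits 1..31
```

A single-bit input error can therefore leave the carry vector untouched or disturb almost all of its entries, which is why the parity of the carry vector has to be tracked separately from the parity of the operands.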
Lemma 8 (Detectable Errors Affecting b0). Assume an error e ∈ F_2^32
with odd Hamming weight is added to the intermediate signal b0 in the
data path of a Quarterround (Algorithm 2). Then, the parity prediction
as in Thm. 4 will detect it.
Proof: The initially corrupted vector is b̃0 = b0 ⊕ e; it affects b1
and the output signal b′. Possibly corrupted intermediate signals are
γ, δ, b2, and the output signals a′, b′, c′, d′. The parity bit of b′
calculated by Algorithm 2 is as follows:
p(b′) = p(b̃0) ⊕ p(c′), (7)
and for d′, we have
p(d′) = p(a′) ⊕ p(d1). (8)
Clearly, for the calculated parity we have:
p((a′ b′ c′ d′)) = p(a′) ⊕ p(b′) ⊕ p(c′) ⊕ p(d′) (9)
and inserting (7) and (8) into (9) gives:
p(a′) ⊕ p(b′) ⊕ p(c′) ⊕ p(d′)
  = p(a′) ⊕ p(b̃0) ⊕ p(c′) ⊕ p(c′) ⊕ p(a′) ⊕ p(d1)
  = p(b̃0) ⊕ p(d1). (10)
The predicted parity pp_QR(a, b, c, d) = p(b) ⊕ p(c) ⊕ β as given in
Thm. 4 is not affected by e and equals p(b0) ⊕ p(d1). Hence, the
difference between pp_QR(a, b, c, d) and (10) is p(e), which is nonzero
if e has odd Hamming weight.
Algorithm 3: GBPP(p(a), p(b), p(c), p(d), α, β, γ, δ)
Input : p(a), p(b), p(c), p(d), α, β, γ, δ ∈ F_2.
Output: p(a), p(b), p(c), p(d) ∈ F_2 (Updated Values).
  p(a) ← p(a) ⊕ p(b) ⊕ α;
  p(d) ← p(d) ⊕ p(a);
  p(c) ← p(c) ⊕ p(d) ⊕ β;
  p(b) ← p(b) ⊕ p(c);
  p(a) ← p(a) ⊕ p(b) ⊕ γ;
  p(d) ← p(d) ⊕ p(a);
  p(c) ← p(c) ⊕ p(d) ⊕ δ;
  p(b) ← p(b) ⊕ p(c);
  return
Lemma 9 (Detectable Errors Affecting c0). Assume an error e ∈ F_2^32
with odd Hamming weight is added to the intermediate signal c0 in the
data path of a Quarterround (Algorithm 2). Then, the parity prediction
as in Thm. 4 will detect it.
Proof: Similar to the proof of Lemma 8.
Theorem 10 (Error Coverage of Code-Disjoint Circuit for Algorithm 2).
The single output code-disjoint circuit as stated in Corollary 5 for
Algorithm 2 detects every odd-weight error e ∈ F_2^32 that affects
E1) the input signals a, b, c, d,
E2) the intermediate signals b0, c0, b1, b2, d2, and
E3) the output signals b′, d′.
Proof: The statement E1 follows from the properties of a code-disjoint
circuit as proven in [10]. The coverage of odd-weight errors on b0
(resp. c0) was proven in Lemma 8 (resp. Lemma 9). From this, the
coverage for b1 and b2 follows. The coverage of odd-weight errors that
affect d2 and the output signal d′ is shown similarly: the error that
propagates via the addition to the output signal c′ is copied to b′
and therefore cancels out in the sum p(b′) ⊕ p(c′), but the parity of
d′ is affected and the error is therefore detected. Clearly, an
odd-weight error affecting b′ (as stated in E3) is covered, because no
other output signals are affected.
Note that errors in the intermediate signals a0, d0, d1 and in the
output signals a′, c′ are not detected.
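The coverage of the single parity prediction on intermediate and output signals can be reproduced with a fault-injection model. The following Python sketch rests on our own assumptions: a fault XORs an odd-weight word e onto exactly one named signal right after it is computed, β is taken from the operands actually seen by the second adder, and the input errors of E1 are not modelled. All names are ours.

```python
import random

MASK = 0xFFFFFFFF
COVERED = ("b0", "c0", "b1", "b2", "d2", "b'", "d'")
UNCOVERED = ("a0", "d0", "d1", "a'", "c'")


def parity(x):
    return bin(x).count("1") & 1


def rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def add(x, y):
    return (x + y) & MASK


def qr_faulty(a, b, c, d, site=None, e=0):
    """Quarterround in which the signal named `site` is XORed with e."""
    def hit(name, value):
        return value ^ e if name == site else value
    a0 = hit("a0", add(a, b))
    d0 = hit("d0", d ^ a0)
    d1 = hit("d1", rotl(d0, 16))
    beta = parity(c ^ d1 ^ add(c, d1))   # p(cv(c, d1)) of the actual operands
    c0 = hit("c0", add(c, d1))
    b0 = hit("b0", b ^ c0)
    b1 = hit("b1", rotl(b0, 12))
    a_out = hit("a'", add(a0, b1))
    d2 = hit("d2", d1 ^ a_out)
    d_out = hit("d'", rotl(d2, 8))
    c_out = hit("c'", add(c0, d_out))
    b2 = hit("b2", b1 ^ c_out)
    b_out = hit("b'", rotl(b2, 7))
    return (a_out, b_out, c_out, d_out), beta


def detected(a, b, c, d, site=None, e=0):
    """Compare the predicted parity of Thm. 4 with the observed output parity."""
    outs, beta = qr_faulty(a, b, c, d, site, e)
    predicted = parity(b) ^ parity(c) ^ beta
    observed = 0
    for w in outs:
        observed ^= parity(w)
    return predicted != observed


def random_odd_error(rng):
    e = rng.getrandbits(32)
    return e if parity(e) else e ^ 1   # flip bit 0 to force odd weight


def coverage(trials=200, seed=1):
    """Number of detected faults per injection site."""
    rng = random.Random(seed)
    hits = {site: 0 for site in COVERED + UNCOVERED}
    for _ in range(trials):
        a, b, c, d = (rng.getrandbits(32) for _ in range(4))
        e = random_odd_error(rng)
        for site in hits:
            hits[site] += detected(a, b, c, d, site, e)
    return hits
```

Under this fault model, `coverage()` reports every injected fault as detected for the sites listed in E2 and E3, and none for a0, d0, d1, a′ and c′, in line with the closing remark of the theorem.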
V. OUR GROUP-BASED PARITY PREDICTION
To further improve the error coverage for hardware implementations of
the ChaCha algorithm, we apply the method of parity prediction to the
processed 32-bit words. Our approach calculates a parity bit for each
of the four 32-bit components a, b, c, d of the input vector of a
Quarterround (Algorithm 2) of ChaCha. Fig. 1 illustrates our
group-based parity prediction (Algorithm 3). Algorithm 3 outputs
updated values of p(a), p(b), p(c), p(d) and processes the intermediate
signals α, β, γ, δ ∈ F_2 as defined in (4). Clearly, Algorithm 3 is the
direct translation of Algorithm 2 using (2) and (1) and the fact that
the parity bit is not changed by a bitwise rotation (marked by ≪ in
Algorithm 2).

[Figure 1: left, the data path QR of a Quarterround with inputs
a, b, c, d, intermediate signals a0, b0, b1, b2, c0, d0, d1, d2,
rotations ≪ 16, ≪ 12, ≪ 8, ≪ 7 and outputs a′, b′, c′, d′; right, the
parity path GBPP with inputs p(a), p(b), p(c), p(d), signals p(a0),
p(b0), p(c0), p(d0), α, β, γ, δ and outputs p(a′), p(b′), p(c′), p(d′).]
Figure 1. Our group-based parity prediction for one Quarterround
of ChaCha.
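Algorithm 3 translates almost line by line into Python. The sketch below (function names are ours) pairs it with a software model of the Quarterround that also returns the carry parities α, β, γ, δ, and checks that the four predicted parity bits match the parities of the actual outputs.

```python
import random

MASK = 0xFFFFFFFF


def parity(x):
    return bin(x).count("1") & 1


def rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def add(x, y):
    return (x + y) & MASK


def carry_parity(x, y):
    """p(cv(x, y)), using cv(x, y) = x XOR y XOR (x + y mod 2^32)."""
    return parity(x ^ y ^ add(x, y))


def quarterround_with_carries(a, b, c, d):
    alpha, a = carry_parity(a, b), add(a, b)
    d = rotl(d ^ a, 16)
    beta, c = carry_parity(c, d), add(c, d)
    b = rotl(b ^ c, 12)
    gamma, a = carry_parity(a, b), add(a, b)
    d = rotl(d ^ a, 8)
    delta, c = carry_parity(c, d), add(c, d)
    b = rotl(b ^ c, 7)
    return (a, b, c, d), (alpha, beta, gamma, delta)


def gbpp(pa, pb, pc, pd, alpha, beta, gamma, delta):
    """Algorithm 3: the Quarterround on parity bits; rotations drop out."""
    pa ^= pb ^ alpha
    pd ^= pa
    pc ^= pd ^ beta
    pb ^= pc
    pa ^= pb ^ gamma
    pd ^= pa
    pc ^= pd ^ delta
    pb ^= pc
    return pa, pb, pc, pd


def matches_data_path(a, b, c, d):
    outs, carries = quarterround_with_carries(a, b, c, d)
    predicted = gbpp(parity(a), parity(b), parity(c), parity(d), *carries)
    return predicted == tuple(parity(w) for w in outs)


rng = random.Random(0)
all_match = all(
    matches_data_path(*[rng.getrandbits(32) for _ in range(4)])
    for _ in range(1000)
)
```

Each of the eight statements of `gbpp` mirrors one line of Algorithm 2, with every addition replaced by an XOR plus the corresponding carry parity.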
We prove the error coverage of our approach (Algorithm 3) and give an
estimate of the area usage in terms of required gates.
In the following, we analyze the error coverage of the proposed parity
prediction scheme (Algorithm 3) for the intermediate signals a0, b0, c0
and d0 (see Lemmas 11 and 12). The error coverage of the remaining
intermediate signals, i.e., b1, b2, d1, d2, is then summarized in
Thm. 13.
Lemma 11 (Detectable Errors Affecting a0). Assume an error e ∈ F_2^32
of odd Hamming weight is added to a0 in the data path of a Quarterround
(Algorithm 2). Then, at least one output of our group-based parity
prediction in Algorithm 3 will detect it.
Proof: The error e on a0 can affect the intermediate signals β, γ, δ,
b0, b1, c0, d0, d1, d2, b2, as well as all output vectors a′, b′, c′
and d′. The initially corrupted vector is ã0 = a0 ⊕ e. In the
following, b0, d1, β and γ denote the values actually produced in the
faulty data path, i.e., they are potentially affected by e. The parity
bit calculated from the output vector d′ of the data path
(Algorithm 2), denoted by p_d(d′), can be expressed as follows:
p_d(d′) = p(a′) ⊕ p(d1)
        = p(ã0) ⊕ p(b0) ⊕ p(γ) ⊕ p(d1), (11)
and with b0 = b ⊕ c0 = b ⊕ (c ⊞ d1), i.e.,
p(b0) = p(b) ⊕ p(c) ⊕ p(d1) ⊕ p(β), we obtain from (11)
p_d(d′) = p(ã0) ⊕ p(b) ⊕ p(c) ⊕ p(d1) ⊕ p(β) ⊕ p(γ) ⊕ p(d1)
        = p(ã0) ⊕ p(b) ⊕ p(c) ⊕ p(β) ⊕ p(γ). (12)
The fourth output bit of our group-based parity prediction can be
similarly expressed. In addition, it is possibly affected by the error
via β and γ, i.e.:
p(d′) = p(a0) ⊕ p(b) ⊕ p(c) ⊕ p(β) ⊕ p(γ). (13)
Therefore the comparison of the calculated parity p_d(d′) of d′ from
the data path as given in (12) and our parity prediction given in (13)
is
p_d(d′) ⊕ p(d′) = p(ã0) ⊕ p(a0) = p(e)
and will be nonzero if e has odd Hamming weight.
Lemma 12 (Detectable Errors Affecting b0, c0, d0). Assume an error
e ∈ F_2^32 of odd Hamming weight is added to b0 or c0 or d0 in the data
path of a Quarterround (Algorithm 2). Then, at least one output of our
group-based parity prediction in Algorithm 3 will detect it.
Proof: Similar to the proof of Lemma 11.
Theorem 13 (Odd-Weight Error on All Intermediate Signals of a
Quarterround). Our group-based parity prediction according to
Algorithm 3 for the Quarterround (Algorithm 2) detects every
odd-weight error e ∈ F_2^32 that affects
E1) the intermediate signals a0, b0, c0, d0, b1, b2, d1, d2,
E2) the intermediate signals α, β, γ, δ, and
E3) the output signals a′, b′, c′, d′.
Proof: Due to space limitations, we omit the proof.
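Since the proof is omitted, the statement can at least be exercised with a fault-injection model. The Python sketch below rests on our own assumptions: a fault XORs an odd-weight word onto exactly one 32-bit signal right after it is computed, or flips one of the bits α, β, γ, δ before it enters the predictor, and the carry parities are taken from the operands actually seen by each adder. All names are ours.

```python
import random

MASK = 0xFFFFFFFF
WORD_SITES = ("a0", "b0", "c0", "d0", "b1", "b2", "d1", "d2", "a'", "b'", "c'", "d'")
BIT_SITES = ("alpha", "beta", "gamma", "delta")


def parity(x):
    return bin(x).count("1") & 1


def rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def add(x, y):
    return (x + y) & MASK


def carry_parity(x, y):
    return parity(x ^ y ^ add(x, y))


def gbpp(pa, pb, pc, pd, alpha, beta, gamma, delta):
    """Algorithm 3 on parity bits."""
    pa ^= pb ^ alpha
    pd ^= pa
    pc ^= pd ^ beta
    pb ^= pc
    pa ^= pb ^ gamma
    pd ^= pa
    pc ^= pd ^ delta
    pb ^= pc
    return pa, pb, pc, pd


def detected(a, b, c, d, site=None, e=0):
    """True if at least one of the four parity comparisons mismatches."""
    def hit(name, value):
        return value ^ e if name == site else value
    alpha, a0 = hit("alpha", carry_parity(a, b)), hit("a0", add(a, b))
    d0 = hit("d0", d ^ a0)
    d1 = hit("d1", rotl(d0, 16))
    beta, c0 = hit("beta", carry_parity(c, d1)), hit("c0", add(c, d1))
    b0 = hit("b0", b ^ c0)
    b1 = hit("b1", rotl(b0, 12))
    gamma, a_out = hit("gamma", carry_parity(a0, b1)), hit("a'", add(a0, b1))
    d2 = hit("d2", d1 ^ a_out)
    d_out = hit("d'", rotl(d2, 8))
    delta, c_out = hit("delta", carry_parity(c0, d_out)), hit("c'", add(c0, d_out))
    b2 = hit("b2", b1 ^ c_out)
    b_out = hit("b'", rotl(b2, 7))
    observed = tuple(parity(w) for w in (a_out, b_out, c_out, d_out))
    predicted = gbpp(parity(a), parity(b), parity(c), parity(d),
                     alpha, beta, gamma, delta)
    return observed != predicted


def coverage(trials=200, seed=2):
    """Number of detected faults per injection site."""
    rng = random.Random(seed)
    hits = {site: 0 for site in WORD_SITES + BIT_SITES}
    for _ in range(trials):
        a, b, c, d = (rng.getrandbits(32) for _ in range(4))
        e = rng.getrandbits(32)
        e = e if parity(e) else e ^ 1          # force odd weight
        for site in WORD_SITES:
            hits[site] += detected(a, b, c, d, site, e)
        for site in BIT_SITES:
            hits[site] += detected(a, b, c, d, site, 1)
    return hits
```

Under this fault model every injected fault is flagged at all sixteen sites, whereas a fault-free run never raises a mismatch. Errors inside the adders themselves are outside the model.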
The GBPP requires 265 gates overall: 124 XOR gates for the calculation
of the parities of the input words a, b, c, d (31 per word), another
124 XOR gates for the parities of the outputs a′, b′, c′, d′, 12 XOR
gates for the additions in F_2 of Algorithm 3, and 4 XOR gates in
combination with a 4-input OR gate to merge the results of the four
parity bit comparisons. With the usage of fault-secure adders as
proposed in [5], it is possible to detect any odd-weight error on
input, output and intermediate signals.
REFERENCES
[1] D. J. Bernstein, “ChaCha, a variant of Salsa20,” in Workshop
Record of SASC 2008: The State of the Art of Stream Ciphers,
2008.
[2] D. J. Bernstein, “The Salsa20 Family of Stream Ciphers,” in New
Stream Cipher Designs, ser. Lecture Notes in Comput. Sci. Springer,
2008, no. 4986, pp. 84–97.
[3] D. J. Bernstein, The Poly1305-AES Message-Authentication Code.
Springer, 2005, pp. 32–49.
[4] P. Liden, P. Dahlgren, R. Johansson, and J. Karlsson, “On
Latching Probability of Particle Induced Transients in Com-
binational Networks,” in Proc. of IEEE 24th Intern. Symp. on
Fault-Tolerant Comput., Jun. 1994, pp. 340–349.
[5] M. Goessel, V. Ocheretny, E. Sogomonyan, and D. Marienfeld,
New Methods of Concurrent Checking, 1st ed. Springer, 2008.
[6] R. Karri, G. Kuznetsov, and M. Goessel, “Parity-Based Con-
current Error Detection in Symmetric Block Ciphers,” in IEEE
Intern. Test Conf. (TC), 2003, pp. 919–926.
[7] G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri,
“Error Analysis and Detection Procedures for a Hardware
Implementation of the Advanced Encryption Standard,” IEEE
Trans. Comput., vol. 52, no. 4, pp. 492–505, Apr. 2003.
[8] G. D. Natale, M. L. Flottes, and B. Rouzeyre, “A Novel Parity
Bit Scheme for SBox in AES Circuits,” in IEEE Design and
Diagnostics of Electr. Circuits and Syst., Apr. 2007, pp. 1–5.
[9] M. Nicolaidis, “Carry Checking/Parity Prediction Adders and
ALUs,” IEEE Trans. on Very Large Scale Integr. (VLSI) Syst.,
vol. 11, no. 1, pp. 121–128, Feb. 2003.
[10] H. Hartje, E. S. Sogomonyan, and M. Gossel, “Code-Disjoint
Circuits for Parity Codes,” in Asian Test Symposium (ATS ’97),
Nov. 1997, pp. 100–105.
