Abstract-We propose two parity-based concurrent error detection schemes for the Quarterround of the ChaCha stream cipher to protect against transient and permanent faults. They offer a trade-off between implementation overhead and error coverage. The second approach can detect any odd-weight error on the input, output and intermediate signals of a Quarterround, while the first one requires less logic.
I. INTRODUCTION
The ChaCha stream cipher was introduced by Bernstein in 2008 [1] as a successor of the Salsa family [3]. Recent trends show the dissemination of lightweight ciphers in the automotive area, where connectivity of Electronic Control Units (ECUs) will be achieved, e.g., via high-speed automotive Ethernet [5]. Simultaneously, the fast advancement of autonomous driving leads to further emerging requirements on automotive functional safety [9]. The co-existence of functional safety and automotive cybersecurity necessitates considerations on how functional safety requirements on cryptographic functions can be covered with low software complexity and high throughput.

With the authenticator Poly1305, Bernstein proposed a software-optimized Message Authentication Code (MAC) [2] that is well suited for use in combination with the ChaCha algorithm. RFC 7905 [6] specifies ChaCha-Poly1305 as a cipher for the (Datagram) Transport Layer Security ((D)TLS) protocol [11], and its realization in hardware for embedded systems appears promising.

Concurrent Error Detection (CED) [12, 7] was developed to detect faults within functional or logical building blocks of embedded circuits, such as ALUs, adders or individual gates. CED enables an efficient, testable and robust design. Parity-based CED was applied to substitution-permutation networks in [10] and to AES by Bertoni [4]. To our knowledge, no CED scheme has been proposed for ChaCha so far.

This paper is structured as follows: in Section II, we recall preliminaries on parity codes and parity-bit prediction for basic operations on binary vectors, and we introduce the Quarterround of the ChaCha algorithm. The transformation into a code-disjoint circuit based on a single parity-bit prediction is described in Section III. Thm. 4 gives the expression for the overall parity bit and the error coverage is proven in Thm. 10. Our second, group-based parity prediction is described in Section IV and Thm. 13 proves its error coverage.
II. PRELIMINARIES AND STREAM CIPHER CHACHA
Let $\mathbb{F}_q$ denote the finite field of order $q$. For two integers $a, b$ with $b > a$, we denote by $[a, b]$ the integer set $\{i \in \mathbb{Z} : a \le i \le b\}$ and let $[b]$ be the shorthand notation for $[1, b]$. Similarly, $[n, k, d]_q$ denotes the parameters of a $q$-ary linear code of length $n$, dimension $k$, and minimum Hamming distance $d$. A generator matrix of a linear $[n, k, d]_q$ code $\mathcal{C}$ over $\mathbb{F}_q$ is a $k \times n$ matrix whose rows form a basis of $\mathcal{C}$. The bitwise XOR of two binary vectors $a, b \in \mathbb{F}_2^n$ is denoted by $a \oplus b$, and $a \boxplus b$ is the addition of $a$ and $b$ modulo $2^n$.
The parity bit of the sum of two binary vectors is:

$$p(a \boxplus b) = p(a) \oplus p(b) \oplus p(c),$$

where $c = \mathrm{cv}(a, b) \in \mathbb{F}_2^n$ is the so-called carry vector associated with $a$ and $b$, obtained during the calculation of their sum. The entries of $c$ are given by
The addition $a \boxplus b$ of two vectors $a, b$ requires more logic gates than the XOR $a \oplus b$ and is therefore more error-prone. Hence, a variety of self-checking adders have been developed, e.g., the parity-checked carry look-ahead adder introduced by Nicolaidis in [13].
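The relation between the parity of a modular sum and the carry vector can be checked numerically. The following is a minimal sketch, assuming that bit $i$ of the carry vector holds the carry into position $i$ (carry-in 0, final carry-out discarded) and that $n = 32$. The function names `parity`, `carry_vector` and `sum_parity_matches` are illustrative and not taken from the paper.

```python
import random

N = 32
MASK = (1 << N) - 1


def parity(x: int) -> int:
    """Parity bit p(x): XOR of all bits of x."""
    return bin(x).count("1") & 1


def carry_vector(a: int, b: int, n: int = N) -> int:
    """Ripple-carry vector of a + b mod 2^n.

    Assumed convention: bit i holds the carry INTO position i. The carry-in
    at position 0 is 0 and the carry out of position n-1 is discarded.
    """
    c, carry = 0, 0
    for i in range(n):
        c |= carry << i
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | ((ai ^ bi) & carry)
    return c


def sum_parity_matches(a: int, b: int) -> bool:
    """Check p(a + b mod 2^n) == p(a) xor p(b) xor p(cv(a, b))."""
    s = (a + b) & MASK
    return parity(s) == parity(a) ^ parity(b) ^ parity(carry_vector(a, b))


# Randomized check with a fixed seed for reproducibility.
rng = random.Random(0)
assert all(
    sum_parity_matches(rng.getrandbits(N), rng.getrandbits(N))
    for _ in range(1000)
)
```

The identity holds because each sum bit is the XOR of the two operand bits and the incoming carry, so XORing all sum bits yields the XOR of the three parities.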
The ChaCha stream cipher [1] uses bitwise addition (exclusive OR) $\oplus$, 32-bit addition modulo $2^{32}$ $\boxplus$ and constant-distance rotation of an internal state to the left by $n$ bits ($\lll n$). The ChaCha algorithm transforms a 512-bit state matrix $V \in \mathbb{F}$
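For reference, the three operations combine into the Quarterround as follows. This is a minimal sketch of the standard ChaCha Quarterround with the rotation distances 16, 12, 8 and 7, checked against the quarter-round test vector of RFC 7539, Section 2.1.1. The helper names `rotl32` and `quarterround` are illustrative, and the sketch does not claim to match the exact notation of Algorithm 1.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word to the left by n bits (0 < n < 32)."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarterround(a: int, b: int, c: int, d: int):
    """Standard ChaCha Quarterround on four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 7539, Section 2.1.1.
result = quarterround(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
assert result == (0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB)
```

Each Quarterround therefore contains four modular additions, four XORs and four rotations. Rotations permute bits and leave the parity of a word unchanged.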
III. PARITY-BASED CODE-DISJOINT CIRCUIT
In this section, we describe a parity-based code-disjoint circuit [8] for the Quarterround (Algorithm 1), which is the essential part of ChaCha [1]. We investigate its resistance against transient and permanent faults that can affect the input signals $a, b, c, d \in \mathbb{F}_2^{32}$
where cv(a, b) denotes the carry vector of a and b given in ( , be used to transform a Quarterround into a code-disjoint circuit. Further on, they are used for our group-based parity approach (described in Section IV). A parity-based code-disjoint circuit [8] extends the classical parity prediction by additionally encoding the inputs of a given circuit into codewords of the parity code. Hence, we first develop a parity prediction for one Quarterround. 
The design of the parity prediction pp f for a given function f can be optimized in terms of, e.g., gate count and/or error coverage. Now, we develop a parity prediction for Algorithm 1, where m = n = 128, x = (a b c d) and 
and with p(c
, we obtain
Inserting
.
2 of a Quarterround given in Algorithm 1. Their parity bits are
Proof: Similar to the proof of Lemma 2.
Based on the previous two lemmas, we can now state the expression for the parity bit of the output vector of a Quarterround of the stream cipher ChaCha.
Theorem 4 (Parity Prediction of a Quarterround).
be the output of a Quarterround given in Algorithm 1. Its parity bit is
Proof: Clearly, we have:
Inserting the results from Lemma 2 and Lemma 3 for
Note that the direct calculation of the parity bit of a Quarterround as given by Thm. 4 can be realized with 64 gates to determine the parities of the inputs, i.e., 31 XOR gates each for the calculation of $p(b)$ and $p(c)$, and two XOR gates to combine $p(b)$ and $p(c)$ with the remaining term of the prediction. Another 127 gates are needed to calculate the parity of the output $(a' \; b' \; c' \; d')$ and one XOR gate to compare the parities. This results in 192 two-input XOR gates. The following corollary states the circuit for transforming a Quarterround into a code-disjoint circuit as proposed in [8]. In addition, it allows the detection of an odd-weight error affecting the input $(a \; b \; c \; d)$.
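The parity bookkeeping behind a single overall parity bit can be reproduced numerically. The following is a minimal sketch, assuming the standard Quarterround and three rules: XOR adds parities, rotation preserves parity, and a modular addition adds the parity of its carry vector. Applying these rules by hand to the standard Quarterround gives the overall output parity as the XOR of $p(b)$, $p(c)$ and the parity of the carry vector of the second addition. This expression is a derivation under these assumptions, which the code only checks numerically, and it is not quoted from Thm. 4. All function names are illustrative.

```python
import random

MASK32 = 0xFFFFFFFF


def parity(x: int) -> int:
    """Parity bit p(x) of an integer interpreted as a binary vector."""
    return bin(x).count("1") & 1


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word to the left by n bits (parity is unchanged)."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def add32(x: int, y: int):
    """Return (x + y mod 2^32, carry vector).

    The carry into each bit is recovered as sum ^ x ^ y; a hardware adder
    would expose these signals directly.
    """
    s = (x + y) & MASK32
    return s, s ^ x ^ y


def quarterround_with_carries(a, b, c, d):
    """Standard Quarterround that also returns the four carry vectors."""
    a, cv1 = add32(a, b)
    d = rotl32(d ^ a, 16)
    c, cv2 = add32(c, d)
    b = rotl32(b ^ c, 12)
    a, cv3 = add32(a, b)
    d = rotl32(d ^ a, 8)
    c, cv4 = add32(c, d)
    b = rotl32(b ^ c, 7)
    return (a, b, c, d), (cv1, cv2, cv3, cv4)


def output_parity(words) -> int:
    """Parity of the concatenated 128-bit output."""
    p = 0
    for w in words:
        p ^= parity(w)
    return p


def predicted_parity(b: int, c: int, cv2: int) -> int:
    """Derived prediction: p(b) xor p(c) xor p(carry vector of 2nd addition)."""
    return parity(b) ^ parity(c) ^ parity(cv2)


rng = random.Random(1)
for _ in range(1000):
    a, b, c, d = (rng.getrandbits(32) for _ in range(4))
    out, cvs = quarterround_with_carries(a, b, c, d)
    assert output_parity(out) == predicted_parity(b, c, cvs[1])

# Gate-count arithmetic from the text: input side, output side, comparison.
assert 31 + 31 + 2 == 64 and 64 + 127 + 1 == 192
```

Under this reading, only two input parities and one carry parity enter the prediction, which is consistent with the two combining XOR gates counted above.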
Corollary 5 (Single Output Code-Disjoint Circuit). Let the input and output as well as the parity prediction be as in Thm. 4. Then p ((a b c d) 
To obtain the error coverage, we consider errors e 2 F 
2 of Algorithm 1. Therefore, we define the following two notations of affected vectors.
Notation 6 (Erroneous Vector $\tilde{a}$). Let $a \in \mathbb{F}_2^{32}$ and $e \in \mathbb{F}_2^{32} \setminus \{0\}$. An erroneous vector $\tilde{a}$ is defined as $\tilde{a} = a \oplus e$.

Using Notation 6 and 7 and for c = p(a b), the parity bit of the modulo addition as in (2)
where $\mathrm{cv}(\tilde{a}, b) = \mathrm{cv}(a, b) \oplus e_c$. The weight of $e_c \in \mathbb{F}_2^{32}$ can differ from the weight of $e$ (it can even be zero).
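A few small cases illustrate how the weight of the carry error can differ from the weight of the injected error. The following is a minimal sketch, assuming the carry vector holds the carry into each bit position, so that it can be recovered as the XOR of the sum and both operands. The function names `carry_error` and `weight` are illustrative.

```python
MASK32 = 0xFFFFFFFF


def carry_vector(a: int, b: int) -> int:
    """Carries into each bit of a + b mod 2^32, recovered as sum ^ a ^ b."""
    return ((a + b) & MASK32) ^ a ^ b


def carry_error(a: int, b: int, e: int) -> int:
    """Carry error e_c with cv(a xor e, b) = cv(a, b) xor e_c."""
    return carry_vector(a ^ e, b) ^ carry_vector(a, b)


def weight(x: int) -> int:
    """Hamming weight of x."""
    return bin(x).count("1")


e = 0b1  # single-bit error on the least significant bit of a

# The error does not touch any carry: e_c has weight zero.
assert weight(carry_error(0, 0, e)) == 0

# The error creates exactly one new carry: e_c has weight one.
assert weight(carry_error(0, 0b1, e)) == 1

# The error starts a carry chain through three positions: weight three.
assert weight(carry_error(0, 0b0111, e)) == 3
```

A single-bit error on an adder input can thus leave the carries untouched or disturb several of them, which is why the carry error has to be tracked separately from $e$.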
and for $d'$, we have
Clearly, we have for the calculated parity:
and inserting (7) and (8) in (9) gives:
The parity prediction $pp_{QR}(a,b,c,d)$ as given in Thm. 4 is not affected by $e$. Hence, the difference between $pp_{QR}(a,b,c,d)$ and (10) is $p(e)$ and is therefore nonzero if $e$ has odd Hamming weight.
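The effect of such an error on the parity comparison can be simulated. The following is a minimal sketch under several assumptions: the standard Quarterround is used, the error is XORed onto the word $b$ directly after its first update (an assumed fault site), and the prediction is the XOR of $p(b)$, $p(c)$ and the parity of the carry vector of the second addition, which is derived from the standard Quarterround and not quoted from Thm. 4. The function names are illustrative.

```python
import random

MASK32 = 0xFFFFFFFF


def parity(x: int) -> int:
    """Parity bit p(x)."""
    return bin(x).count("1") & 1


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word to the left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def add32(x: int, y: int):
    """Return (x + y mod 2^32, carry vector recovered as sum ^ x ^ y)."""
    s = (x + y) & MASK32
    return s, s ^ x ^ y


def quarterround_faulty(a, b, c, d, e=0):
    """Quarterround with error e XORed onto b after its first update.

    The fault site is an assumption of this sketch. Returns the outputs and
    the carry vector of the second addition.
    """
    a, _ = add32(a, b)
    d = rotl32(d ^ a, 16)
    c, cv2 = add32(c, d)
    b = rotl32(b ^ c, 12) ^ e  # inject the error here
    a, _ = add32(a, b)
    d = rotl32(d ^ a, 8)
    c, _ = add32(c, d)
    b = rotl32(b ^ c, 7)
    return (a, b, c, d), cv2


def parity_mismatch(a, b, c, d, e=0) -> int:
    """Calculated output parity XOR predicted parity (1 means detected)."""
    out, cv2 = quarterround_faulty(a, b, c, d, e)
    predicted = parity(b) ^ parity(c) ^ parity(cv2)
    calculated = 0
    for w in out:
        calculated ^= parity(w)
    return predicted ^ calculated


rng = random.Random(2)
for _ in range(1000):
    a, b, c, d, e = (rng.getrandbits(32) for _ in range(5))
    # The mismatch equals p(e): odd-weight errors are flagged, even ones not.
    assert parity_mismatch(a, b, c, d, e) == parity(e)
```

In this sketch the mismatch equals $p(e)$ for every input, because the carries of the later additions cancel in the overall output parity.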
Lemma 9 (Detectable Errors Affecting $c'$). Assume an error $e \in \mathbb{F}_2^{32}$ with odd Hamming weight is added to $c'$ in the data path of a Quarterround (Algorithm 1). Then, at least one output of our group-based parity prediction will detect it.
Proof: Similar to the proof of Lemma 8.

Proof: Clearly, statement E1 follows from the properties of a code-disjoint circuit as proven in [8]. The coverage of odd-bit errors on $b'$ (resp. $c'$) was proven in Lemma 8 (resp. Lemma 9). From this, the coverage for $b_1$ (resp. $b_2$) follows. The coverage of odd-bit errors that affect $d$
IV. OUR GROUP-BASED PARITY PREDICTION
To further improve the error coverage for hardware implementations of the ChaCha algorithm, we apply the method of parity prediction to the processed 32-bit words. Our approach calculates a parity bit for each of the four 32-bit components a, b, c, d of the input vector of a Quarterround (Algorithm 1) of ChaCha. Fig. 1 illustrates our group-based parity prediction. The algorithm illustrated in Fig. 1 
and with
, we obtain from (11),
The fourth output bit of our group-based parity prediction can be similarly expressed. In addition, it is possibly affected by the error via and , i.e.:
Therefore the comparison of the calculated parity
from the data path as given in (12) and our parity prediction given in (13) is
and will be nonzero if $e$ has odd Hamming weight.

Proof: Due to space limitations, we omit the proof.

Our group-based parity prediction requires 265 gates overall: 124 XOR gates for the calculation of the parities of the input words $a, b, c, d$, another 124 XOR gates for the parities of the outputs $a', b', c', d'$, 12 additions, and 4 XOR gates in combination with a 4-input OR gate to merge the results of the four parity-bit comparisons. With the use of fault-secure adders as proposed in [7], it is possible to detect any odd-weight error on input, output and intermediate signals.
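The group-based idea can be sketched in software by carrying one predicted parity bit per 32-bit word through the Quarterround. The following is a minimal sketch, assuming the standard Quarterround, carry parities taken from the adders of the data path, and an error injected on the word $a$ after the first addition (an assumed fault site). It is not a reproduction of Fig. 1, and the function names are illustrative. The twelve one-bit XOR updates of the prediction would match the "12 additions" of the gate count if these are read as additions in $\mathbb{F}_2$, which is an assumption.

```python
MASK32 = 0xFFFFFFFF


def parity(x: int) -> int:
    """Parity bit p(x)."""
    return bin(x).count("1") & 1


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word to the left by n bits (parity is unchanged)."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def add32(x: int, y: int):
    """Return (x + y mod 2^32, parity of the carry vector)."""
    s = (x + y) & MASK32
    return s, parity(s ^ x ^ y)


def group_parity_check(a, b, c, d, fault=0):
    """Quarterround with one predicted parity bit per word.

    `fault` is XORed onto a after the first addition (assumed fault site).
    Returns the outputs and an error flag (OR of four comparisons).
    """
    pa, pb, pc, pd = parity(a), parity(b), parity(c), parity(d)

    a, v = add32(a, b)
    pa ^= pb ^ v          # addition: two one-bit XORs
    a ^= fault            # inject the error into the data path only
    d = rotl32(d ^ a, 16)
    pd ^= pa              # XOR and rotation: one one-bit XOR
    c, v = add32(c, d)
    pc ^= pd ^ v
    b = rotl32(b ^ c, 12)
    pb ^= pc
    a, v = add32(a, b)
    pa ^= pb ^ v
    d = rotl32(d ^ a, 8)
    pd ^= pa
    c, v = add32(c, d)
    pc ^= pd ^ v
    b = rotl32(b ^ c, 7)
    pb ^= pc

    # Compare calculated and predicted parity of each output word.
    flags = [parity(w) ^ p for w, p in zip((a, b, c, d), (pa, pb, pc, pd))]
    return (a, b, c, d), any(flags)


inputs = (0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)

out, error = group_parity_check(*inputs)
assert out == (0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB) and not error

_, error = group_parity_check(*inputs, fault=0b111)  # odd weight
assert error

# Gate-count arithmetic from the text, including the 4-input OR gate.
assert 124 + 124 + 12 + 4 + 1 == 265
```

In this sketch, an odd-weight error on the intermediate value of $a$ raises the flags of the words $b$ and $d$ while the flags of $a$ and $c$ stay zero. The XOR of all four comparisons is therefore zero, so a single overall parity bit would not expose this particular fault, whereas the per-word comparison does.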
