We propose an easy-to-implement hard-decision majority-logic decoding algorithm for Reed-Muller codes $RM(r, m)$ with $m \geq 3$, $m/2 \geq r \geq 1$. The presented algorithm outperforms the best known majority-logic decoding algorithms and offers highly parallel decoding. The result is of special importance for safety- and time-critical applications in embedded systems. A simple combinational circuit can perform the proposed decoding. In particular, we show how our decoder for the three-error-correcting code $RM(2, 5)$ of dimension 16 and length 32 can be realized on hardware level.
I. INTRODUCTION
EMBEDDED systems are becoming ubiquitous and an integral part of our everyday life. Addressing functional safety is a major challenge with increasing complexity. Typical examples of safety-critical embedded systems include vehicle safety or driver assistance systems with accident prevention. However, functional safety is becoming more prevalent not just in the automotive sector, but also in industrial markets such as aviation, solar energy, and the medical sector (e.g., [14]). Memory devices increasingly provide built-in error correction in order to restore corrupted data [15] and also to maximize the number of writes in flash memory [16]. With information and communication technology components becoming ever smaller and more complex, the probability of hardware-immanent errors rises.
Decoders taking advantage of the cyclic structure of shortened Reed-Muller codes accommodate the increasing demand for less space consumption, at the cost of the decoding duration [12]. On the other hand, several recursive algorithms were developed allowing decoding with only $O(\min(r, m - r) \cdot 2^m)$ operations [10], [13]. Though the number of operations could be reduced, all these operations need to be executed one after another. Therefore, these algorithms require much parallel time. Parallel time is defined as the time the algorithm takes if all its modules are parallelized to the maximum possible amount. Thus, cyclic as well as recursive decoders are not designed for correcting errors in parallel. However, for all safety-critical applications, where real-time control is ranked first, decoding multiple positions in parallel saves precious time. Decoders based on majority-logic can accomplish this task. Furthermore, in embedded systems, very simple hard-decision algorithms are mostly preferable to soft-decision algorithms [10], [13]. Therefore, hard-decision decoders for Reed-Muller codes using decision by majority are an attractive option for forward error correction in real time on hardware level.

A majority-logic decoding algorithm was first proposed by Reed [1]. Reed's algorithm consists of $r + 1$ decoding steps in which majority voting is performed. Chen [3], [4] significantly improved Reed's decoding algorithm by reducing the number of decoding steps. In particular, if Reed-Muller codes $RM(r, m)$ with $m \geq 3$, $m/2 \geq r \geq 1$ are employed, Chen's algorithm consists of only two decoding steps. In this case, up to $O(2^{3m-2r})$ functions are called concurrently and Chen's algorithm can be executed in constant parallel time provided majority voting also takes constant parallel time.

Manuscript received February 6, 2013; revised July 20, 2013. The editor coordinating the review of this paper and approving it for publication was T.-K. Truong.
The work of M. Huber was supported by the Deutsche Forschungsgemeinschaft (DFG) via a Heisenberg grant (Hu954/4) and a Heinz Maier-Leibnitz Prize grant (Hu954/5). The work of P. Hauck
The authors in [17] investigated how far the number of majority votes in Chen's algorithm can be reduced while focusing on information bits. They established upper and lower bounds for the complexity. But an explicit instruction how to construct a decoder is only provided for a few codes. Furthermore, their decoding process depends on the encoding procedure.
In the present paper, we propose a new hard-decision decoding algorithm for all Reed-Muller codes $RM(r, m)$ with $m \geq 3$, $m/2 \geq r \geq 1$. Our decoder is easy to design for software and hardware applications. The algorithm decodes all bits, i.e., information and redundancy, without considering the encoding process. Compared to state-of-the-art majority-logic decoders, our algorithm is less complex. In contrast to recursive decoders [10], [13], our decoder enables massively parallel decoding in constant parallel time.
The paper is organized as follows. Section II introduces the notation and preliminaries on Reed-Muller codes. In Section III, we revisit Chen's decoding algorithm and analyze its complexity. In Section IV, we present in detail our new decoding algorithm including proof of correctness, pseudocode, estimation of complexity and an example for $RM(2, 5)$. Our algorithm is compared to Chen's algorithm in terms of complexity in Section V. The paper concludes in Section VI with further advantages of our algorithm in comparison to other classes of decoders.

0090-6778/13$31.00 © 2013 IEEE
II. NOTATION AND PRELIMINARIES
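The binary Reed-Muller code $RM(r, m)$ is an $[n, k, \delta]$ code with $n := 2^m$, $k := \sum_{i=0}^{r} \binom{m}{i}$, $\delta := 2^{m-r}$, which guarantees correcting up to $\delta/2 - 1$ errors. We number the vectors in $\mathbb{Z}_2^m$ in arbitrary order starting from zero. Every position $i \in \{0, 1, \dots, n - 1\}$ in a binary word of length $n$ is identified by $v_i \in \mathbb{Z}_2^m$. Then, we characterize a set of vectors $S \subseteq \mathbb{Z}_2^m$ by its incidence vector

$$\chi_S := (x_0, x_1, \dots, x_{n-1}) \in \mathbb{Z}_2^n, \qquad x_i := \begin{cases} 1 & \text{if } v_i \in S, \\ 0 & \text{otherwise.} \end{cases}$$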
Given $r$, $m$ and the specific ordering, the Reed-Muller code $RM(r, m)$ is generated by all incidence vectors that characterize $d$-flats with $d = m - r$ [5]. Therefore, we denote by $RM(r, m)$ not one code but a family of equivalent codes depending on the chosen ordering in $\mathbb{Z}_2^m$. For the rest of the paper, let $m/2 \geq r \geq 1$, $m \geq 3$. Furthermore, we generally assume a codeword $c := (c_0, c_1, \dots, c_{n-1}) \in RM(r, m)$ was sent through a noisy channel and $z := (z_0, z_1, \dots, z_{n-1}) = (c_0, c_1, \dots, c_{n-1}) + (e_0, e_1, \dots, e_{n-1}) = c + e \in \mathbb{Z}_2^n$ was received, where at most $\delta/2 - 1$ errors occurred. For any vectors $v, w \in \mathbb{Z}_2^n$, let $v \cdot w \in \mathbb{Z}_2$ denote the scalar product (over $\mathbb{Z}_2$) of the two vectors $v$ and $w$. Let $S \subseteq \mathbb{Z}_2^m$ be arbitrary. The scalar product $z \cdot \chi_S$ is called the check-sum of $S$. Since

$$z \cdot \chi_S = \sum_{i \,:\, v_i \in S} z_i,$$

it is not necessary to consider all $n$ entries of $z$. To reduce the complexity of computing the check-sum of $S$, we only take into account the $|S|$ entries $z_i$ with $v_i \in S$.

In the following, we will say that $S$ possesses $t$ errors if and only if

$$|\{\, i \mid v_i \in S,\ e_i = 1 \,\}| = t.$$

In particular, we call $S$ odd or even if $S$ possesses an odd or even number of errors, respectively. Note that $S$ is odd if and only if $e \cdot \chi_S = 1$.
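The notation above can be made concrete with a small Python sketch. The function names `rm_parameters`, `check_sum` and `is_odd` are chosen here purely for illustration, and a set $S$ is assumed to be given as a list of position indices $i$ (standing for the vectors $v_i$).

```python
from math import comb

def rm_parameters(r, m):
    """Return (n, k, delta, t) for RM(r, m); t = delta/2 - 1 correctable errors."""
    n = 2 ** m
    k = sum(comb(m, i) for i in range(r + 1))
    delta = 2 ** (m - r)
    return n, k, delta, delta // 2 - 1

def check_sum(z, S):
    """Check-sum z . chi_S over Z_2, reading only the |S| entries indexed by S."""
    return sum(z[i] for i in S) % 2

def is_odd(e, S):
    """S is odd iff e . chi_S = 1, i.e. S possesses an odd number of errors."""
    return check_sum(e, S) == 1

print(rm_parameters(2, 5))  # prints (32, 16, 8, 3)
```

For $RM(2, 5)$ this reproduces the length 32, dimension 16 and three-error-correcting capability quoted in the abstract.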
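The majority function $\mu : \{0, 1\}^s \to \{0, 1\}$ is defined as follows:

$$\mu(x_1, \dots, x_s) := \begin{cases} 1 & \text{if } \sum_{i=1}^{s} x_i \geq \lfloor s/2 \rfloor + 1, \\ 0 & \text{otherwise,} \end{cases}$$

where $\lfloor x \rfloor$ represents the largest integer not greater than $x$.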
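A minimal Python sketch of the majority function follows. It assumes a strict majority (ties yield 0), which agrees with the threshold $\lfloor s/2 \rfloor + 1$ of the majority gate in Section V-B and with the values in the worked example of Section IV-D; the name `majority` is illustrative.

```python
def majority(bits):
    """Return 1 iff at least floor(s/2) + 1 of the s inputs equal 1; ties yield 0."""
    s = len(bits)
    return int(sum(bits) >= s // 2 + 1)

print(majority([0, 1, 1, 1, 1, 1, 1, 1]))  # prints 1 (seven of eight inputs are 1)
print(majority([1, 1, 0, 0, 0, 0, 0, 1]))  # prints 0 (three of eight)
print(majority([0, 1, 1, 0, 0, 1]))        # prints 0 (tie: three of six)
```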
III. CHEN'S TWO-STEP MAJORITY-LOGIC DECODING OF REED-MULLER CODES - REVISITED
Chen's decoding algorithm [3], [4] corrects all $n$ positions in two majority-logic steps. It operates on flats of dimension $r + 1$ or less and performs majority voting.
A. The Idea
Chen's algorithm takes advantage of the following proposition.
Proposition 1: Let $S \subset \mathbb{Z}_2^m$ be arbitrary. Suppose there exist $S_1, \dots, S_N \subseteq \mathbb{Z}_2^m$ with $N \geq \delta - 2$ which intersect pairwise in $S$, i.e., $S_i \cap S_j = S$ for all $i, j = 1, \dots, N$, $i \neq j$. Then $S$ is odd if and only if more than $N/2$ sets $S_i$ are odd.
Proof: Suppose $S$ possesses $t$ errors. Beyond these $t$ errors, up to $\delta/2 - 1 - t$ further errors occurred while transmitting the codeword. Therefore, at least $N - (\delta/2 - 1 - t)$ sets $S_i$ must possess the same number of errors as $S$, namely $t$ errors.
Hence, if $t$ is odd, more than $N/2$ sets $S_i$ are odd. On the other hand, if $t$ is even, at least $N/2$ sets $S_i$ are even and therefore at most $N/2$ sets $S_i$ are odd.
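The two counting steps of the proof can be spelled out as follows. This is a worked restatement using only $N \geq \delta - 2$ and the fact that each error outside $S$ lies in at most one set $S_i$, because the sets $S_i \setminus S$ are pairwise disjoint; the final inequality in each line is equivalent to the condition on $N$ stated after it.

```latex
\begin{align*}
t \text{ odd } (t \geq 1):\quad
  & N - \left(\tfrac{\delta}{2} - 1 - t\right) \;\geq\; N - \tfrac{\delta}{2} + 2 \;>\; \tfrac{N}{2}
  && \text{(holds iff } N > \delta - 4\text{)}, \\
t \text{ even } (t \geq 0):\quad
  & N - \left(\tfrac{\delta}{2} - 1 - t\right) \;\geq\; N - \tfrac{\delta}{2} + 1 \;\geq\; \tfrac{N}{2}
  && \text{(holds iff } N \geq \delta - 2\text{)}.
\end{align*}
```

Both conditions are satisfied under the assumption $N \geq \delta - 2$ of Proposition 1.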
According to Proposition 1, it can be deduced whether a set $S$ in $\mathbb{Z}_2^m$ is odd or even, once we have this information about $\delta - 2$ arbitrary supersets of $S$, intersecting pairwise in $S$. For some sets, namely $d$-flats with $d \geq r + 1$, this information can be easily gained. Let us consider a $d$-flat $V$ with $d \geq r + 1$. Then, its incidence vector, $\chi_V$, is a codeword of $RM(m - r - 1, m)$, the dual code of $RM(r, m)$ [5]. Thus,

$$z \cdot \chi_V = (c + e) \cdot \chi_V = c \cdot \chi_V + e \cdot \chi_V = e \cdot \chi_V. \qquad (1)$$
Hence, V is odd if and only if the check-sum of V equals one.
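The following Python sketch illustrates this property for $RM(2, 5)$. It is a sketch under stated assumptions: position $i$ is identified with the bit vector of the integer $i$, and `encode` evaluates a polynomial of degree at most 2 in five variables, which yields a codeword of $RM(2, 5)$ but is not necessarily the encoding induced by the generator matrix of Section IV-D. The helper names `encode`, `flat` and `check_sum` are hypothetical.

```python
from itertools import combinations

M, R = 5, 2
# Monomials of degree <= R in M variables: the constant 1, x_i, and x_i * x_j.
MONOMIALS = [()] + [(i,) for i in range(M)] + list(combinations(range(M), 2))

def encode(msg):
    """Evaluate the polynomial with coefficients msg at every point v (bits of v)."""
    return [sum(a for a, mono in zip(msg, MONOMIALS)
                if all((v >> i) & 1 for i in mono)) % 2
            for v in range(1 << M)]

def flat(offset, basis):
    """Positions of the flat offset + span(basis); basis must be linearly independent."""
    points = [offset]
    for b in basis:
        points += [p ^ b for p in points]
    return points

def check_sum(z, S):
    return sum(z[i] for i in S) % 2

c = encode([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])
V = flat(5, [1, 6, 24])      # a 3-flat, i.e. dimension r + 1
z = c[:]
z[4] ^= 1                    # a single error inside V (4 = 5 XOR 1)
print(check_sum(c, V), check_sum(z, V))  # prints "0 1"
```

The check-sum of the error-free codeword over the $(r + 1)$-flat vanishes, so the check-sum of the received word reveals the parity of the errors inside the flat.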
Reed [1] proposed an algorithm comprising r + 1 steps in which Proposition 1 is applied. Taking into account the check-sums of certain (r + 1)-flats, the algorithm computes in the first step whether certain r-flats are even or odd using majority-logic. In each step ρ = 1, 2, . . . , r+1, it is iteratively decided whether the (r + 1 − ρ)-flats are odd or even. In the final step, the algorithm yields the number of errors in 0-flats where every 0-flat corresponds to a single position.
Analyzing Reed's algorithm, Chen noticed that several steps can be omitted. In the case of $m \geq 3$, $m/2 \geq r \geq 1$, Chen showed that for every position $i = 0, 1, \dots, n - 1$, there exist $\delta - 2$ $r$-flats intersecting pairwise in $\{v_i\}$. In addition, each $r$-flat is the pairwise intersection of $\delta - 2$ $(r + 1)$-flats [3], [4]. This observation is the basis for a two-step majority-logic algorithm to decode all $n$ positions. The first step is identical to the one in Reed's algorithm, whereas the second step deduces the number of errors in 0-flats directly from the results for $r$-flats.
B. The Algorithm
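Chen's algorithm operates on a set of flats of dimension 0, $r$ and $r + 1$, say $F$, which meets the following conditions.
a) $\{v_i\} \in F$ for all $i = 0, 1, \dots, n - 1$.
b) For every 0-flat $\{v\} \in F$, there exist $r$-flats $V_0, V_1, \dots, V_{\delta-3} \in F$ which intersect pairwise in $\{v\}$.
c) For every $r$-flat $V \in F$, there exist $(r + 1)$-flats $W_0, W_1, \dots, W_{\delta-3} \in F$ which intersect pairwise in $V$.
We call a set of flats admissible if it satisfies these three conditions. Furthermore, we say $W_i$ is used for decoding of $V$ and $V_i$ is used for decoding of $\{v\}$, $i = 0, 1, \dots, \delta - 3$.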
By proving the existence of an admissible set in [3] , [4] , Chen indicates a strategy how to decode all positions in two steps using majority-logic. 
The corresponding algorithm consists of four function levels.
Input: the received word $z \in \mathbb{Z}_2^n$
Require: at most $\delta/2 - 1$ errors occurred
Output: the actual transmitted codeword from $RM(r, m)$

The symbol "+" represents an addition in $\mathbb{Z}_2$. If not more than $\delta/2 - 1$ errors occurred, $\eta$ equals the error pattern $e$ such that the actual transmitted codeword $c$ is returned. The term two-step decoding refers to the two steps in lines 2 and 3 testing for majority.
C. The Complexity
At each of the four function levels, a specific function is called multiple times. All function calls at the same function level can be carried out simultaneously. In Table I, we specify for each function level how often the corresponding function is called (simultaneously) and how many inputs the function gets. In total, $O(n\delta^2)$ functions are called in Chen's algorithm.
IV. IMPROVED DECODING ALGORITHM
Our new decoding algorithm consists of two majority-logic steps. In contrast to Chen, we test for majority fewer times and compute fewer check-sums. More precisely, we substitute Step b) in Chen's decoding procedure (Proposition 2) by a more efficient method, while we maintain Step a). There are two main reasons why our new algorithm is less complex than Chen's decoding procedure. First, instead of considering arbitrary flats for decoding, we use every $r$-flat for all its $2^r$ positions. Second, we never consider $(r + 1)$-flats. Instead, we developed a new approach where we focus solely on $r$-flats.
A. The Theoretical Approach
We start constructing a set of r-flats, F , having the characteristics specified in the following proposition.
Proposition 3 (Proposition 2.3 in [17]): There exist $\delta \cdot (\delta - 2)$ $r$-flats in $\mathbb{Z}_2^m$ such that the intersection of any two of them has at most size 1 and every $v \in \mathbb{Z}_2^m$ is contained in exactly $\delta - 2$ of these $r$-flats.
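In the proof of this proposition, the authors of [17] verify the existence by demonstrating how to construct such a set of $r$-flats. At the very beginning, $\delta - 2$ $r$-dimensional subspaces $U_0, U_1, \dots, U_{\delta-3}$ of $\mathbb{Z}_2^m$ which pairwise intersect trivially are constructed. For every $l = 0, 1, \dots, \delta - 3$, let $w_{l,0} + U_l, w_{l,1} + U_l, \dots, w_{l,\delta-1} + U_l$ denote the $\delta$ distinct cosets of $U_l$.
We can state two facts. First, for every vector $v \in \mathbb{Z}_2^m$ and for every subspace $U_l$, $l = 0, 1, \dots, \delta - 3$, there exists exactly one $i \in \{0, 1, \dots, \delta - 1\}$ with $v \in w_{l,i} + U_l$.
Second, every two $r$-flats have at most one vector in common because the intersection of the underlying subspaces is trivial. Thus, the set of $r$-flats

$$F := \{\, w_{l,i} + U_l \mid l = 0, 1, \dots, \delta - 3,\ i = 0, 1, \dots, \delta - 1 \,\},$$

comprising $\delta \cdot (\delta - 2)$ $r$-flats, meets the conditions stated in Proposition 3. The algorithm we will propose operates on this set of $r$-flats. Before we present our algorithm, we will explain its mathematical background in Theorem 5 using the following notations.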
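Definition 4: For $l = 0, 1, \dots, \delta - 3$ and $i, j = 0, 1, \dots, \delta - 1$, $i \neq j$, let $\varsigma_{l,i} := z \cdot \chi_{w_{l,i} + U_l}$ denote the check-sum of the $r$-flat $w_{l,i} + U_l$, let $\mu_l := \mu(\varsigma_{l,0}, \varsigma_{l,1}, \dots, \varsigma_{l,\delta-1})$, and let $W_{l,i,j}$ denote the $(r + 1)$-flat containing both $w_{l,i} + U_l$ and $w_{l,j} + U_l$.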
Theorem 5: a) An error occurred at position $j \in \{0, 1, \dots, n - 1\}$, i.e., $e_j \neq 0$, if and only if at least $\delta/2$ flats from $F$ containing $v_j$ are odd. b) A flat $w_{l,i} + U_l \in F$ is odd if and only if $\mu_l \neq \varsigma_{l,i}$.

Before we prove Theorem 5, we state some general properties of flats.
Proof: a) Obvious. b) Clearly, $(w_{l,i} + U_l), (w_{l,j} + U_l) \subset W_{l,i,j}$ and with a)
c) Follows from a) and b).
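Proof of Theorem 5: Assertion a) directly follows from Proposition 1. Proceeding to part b), let $i \in \{0, 1, \dots, \delta - 1\}$ and $l \in \{0, 1, \dots, \delta - 3\}$ be arbitrary. We will prove that the following statements are equivalent.
i) The flat $w_{l,i} + U_l$ is odd.
ii) $|\{\, 0 \leq j \leq \delta - 1,\ j \neq i \mid e \cdot \chi_{W_{l,i,j}} = 1 \,\}| \geq \delta/2$.
iii) $\mu_l \neq \varsigma_{l,i}$.

i) ⇔ ii) By Lemma 6, the $\delta - 1$ $(r + 1)$-flats $W_{l,i,j}$, $j \neq i$, intersect pairwise in $w_{l,i} + U_l$. Thus, by Proposition 1, the flat $w_{l,i} + U_l$ is odd if and only if at least $\delta/2$ of these $(r + 1)$-flats $W_{l,i,j}$ are odd, resulting in the formula stated in ii).

ii) ⇔ iii) Similarly to equation (1), we have

$$e \cdot \chi_{W_{l,i,j}} = z \cdot \chi_{W_{l,i,j}} = \varsigma_{l,i} + \varsigma_{l,j} \qquad (2)$$

for every $j = 0, 1, \dots, \delta - 1$, $j \neq i$, where the second equality follows from Lemma 6.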
We show now that $|\{0 \leq s \leq \delta - 1 \mid \varsigma_{l,s} = 1\}| \neq \delta/2$. Suppose $|\{0 \leq s \leq \delta - 1 \mid \varsigma_{l,s} = 1\}| = \delta/2$. It follows from (2) that for every $s = 0, 1, \dots, \delta - 1$ with $\varsigma_{l,s} = 1$

$$|\{\, 0 \leq j \leq \delta - 1,\ j \neq s \mid e \cdot \chi_{W_{l,s,j}} = 1 \,\}| = \delta/2.$$

Applying the already proved equivalence i) ⇔ ii), we conclude that $w_{l,s} + U_l$ is odd for every $s = 0, 1, \dots, \delta - 1$ with $\varsigma_{l,s} = 1$. Thus, $\delta/2$ $r$-flats are odd. Since these $r$-flats are pairwise disjoint by Lemma 6 a), we have at least $\delta/2$ errors, a contradiction. Hence, $|\{0 \leq s \leq \delta - 1 \mid \varsigma_{l,s} = 1\}| \neq \delta/2$.
Let us assume $\mu_l \neq \varsigma_{l,i}$. Then, by the definition of $\mu_l$ and what we have shown before, there exist at least $\delta/2 + 1$ scalars, say $\varsigma_{l,j_0}, \dots, \varsigma_{l,j_{\delta/2}}$, being unequal to $\varsigma_{l,i}$. According to equation (2), we have $e \cdot \chi_{W_{l,i,j_s}} = 1$ for all $s = 0, 1, \dots, \delta/2$.
On the other hand, assuming $\mu_l = \varsigma_{l,i}$, there are at most $\delta/2 - 1$ scalars $\varsigma_{l,j}$ differing from $\varsigma_{l,i}$. By equation (2), fewer than $\delta/2$ of the $e \cdot \chi_{W_{l,i,j}}$, $j = 0, 1, \dots, \delta - 1$, $j \neq i$, are 1.
B. The Algorithm
Our new algorithm is strongly based on Theorem 5. Tracing back in which r-flats every position is contained enables us to design the decoding procedure.
Therefore, we define mappings $\varphi_l$, $l = 0, 1, \dots, \delta - 3$, from $\{0, 1, \dots, n - 1\}$ to $\{0, 1, \dots, \delta - 1\}$ ensuring that $v_i \in w_{l,\varphi_l(i)} + U_l$ and therefore $v_i + U_l = w_{l,\varphi_l(i)} + U_l$ for all $i = 0, 1, \dots, n - 1$, $l = 0, 1, \dots, \delta - 3$. Once the decoder has been constructed, this mapping between positions and $r$-flats is no longer needed.
First, the scalar $\varsigma_{l,i}$ is computed for every $r$-flat $w_{l,i} + U_l \in F$. Second, after evaluating the majority function at $(\varsigma_{l,0}, \varsigma_{l,1}, \dots, \varsigma_{l,\delta-1})$ for each $l = 0, 1, \dots, \delta - 3$, the value $\mu_l$ is added to the scalars $\varsigma_{l,0}, \dots, \varsigma_{l,\delta-1}$, where the symbol "+" represents an addition in $\mathbb{Z}_2$. This guarantees with reference to Theorem 5 that each $\varsigma_{l,i}$ equals one if and only if $w_{l,i} + U_l$ is odd. Finally, the value one is assigned to $\eta_j$ if and only if the majority of the scalars $\varsigma_{0,\varphi_0(j)}, \varsigma_{1,\varphi_1(j)}, \dots, \varsigma_{\delta-3,\varphi_{\delta-3}(j)}$ assumes one. Provided not more than $\delta/2 - 1$ errors occurred, $\eta$ equals the error pattern $e$ and $c = z + \eta$.
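The steps just described can be sketched in Python for $RM(2, 5)$. This is an illustrative sketch, not the construction from [17]: the six subspaces are obtained here from the assumed family $U_\alpha = \{(x, \alpha x)\}$ with $x \in \mathbb{Z}_2^2$ embedded in $GF(8)$, which pairwise intersect trivially; positions are identified with the bit vectors of their indices; and `encode` evaluates a polynomial of degree at most 2 rather than using the generator matrix of Section IV-D. All function names are hypothetical.

```python
from itertools import combinations

M, R = 5, 2                        # RM(2, 5)
N, DELTA = 1 << M, 1 << (M - R)    # n = 32, delta = 8

def gf8_mul(a, b):
    """Product in GF(8) = GF(2)[x]/(x^3 + x + 1); elements are 3-bit integers."""
    p = 0
    for k in range(3):
        if (b >> k) & 1:
            p ^= a << k
    for k in (4, 3):               # reduce the x^4 and x^3 terms
        if (p >> k) & 1:
            p ^= 0b1011 << (k - 3)
    return p

def majority(bits):
    """1 iff strictly more than half of the inputs are 1."""
    return int(2 * sum(bits) > len(bits))

def build_flats():
    """flats[l][i]: positions of the i-th coset of U_l; phi[l][j]: coset index of position j."""
    flats, phi = [], []
    for alpha in range(DELTA - 2):
        # Assumed construction: x in the low R bits, alpha * x (in GF(8)) in the high bits.
        U = [x | (gf8_mul(alpha, x) << R) for x in range(1 << R)]
        cosets, index = [], [None] * N
        for v in range(N):
            if index[v] is None:
                coset = [v ^ u for u in U]
                for p in coset:
                    index[p] = len(cosets)
                cosets.append(coset)
        flats.append(cosets)
        phi.append(index)
    return flats, phi

def decode(z, flats, phi):
    sigma = []
    for cosets in flats:
        s = [sum(z[p] for p in coset) % 2 for coset in cosets]  # check-sums
        mu = majority(s)                                        # first majority step
        sigma.append([b ^ mu for b in s])                       # 1 iff the flat is odd
    eta = [majority([sigma[l][phi[l][j]] for l in range(len(flats))])
           for j in range(N)]                                   # second majority step
    return [zj ^ ej for zj, ej in zip(z, eta)]

MONOMIALS = [()] + [(i,) for i in range(M)] + list(combinations(range(M), 2))

def encode(msg):
    """Evaluate the degree <= 2 polynomial with coefficients msg at every point."""
    return [sum(a for a, mono in zip(msg, MONOMIALS)
                if all((v >> i) & 1 for i in mono)) % 2 for v in range(N)]

flats, phi = build_flats()
c = encode([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])
z = c[:]
for pos in (0, 1, 31):             # three errors, the maximum for delta = 8
    z[pos] ^= 1
print(decode(z, flats, phi) == c)  # prints True
```

Every list comprehension inside `decode` corresponds to one function level whose evaluations are independent of each other, which is what allows a fully parallel hardware realization.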
C. The Complexity
Our algorithm consists of five function levels. Analogous to Chen's algorithm, a specific function is called multiple times at each function level and all function calls at the same function level can be carried out simultaneously (see Table II). Because $m \geq 2r$ and therefore $\delta^2 \geq n$, overall, $O(\delta^2)$ functions are called in our algorithm.
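The inequality $\delta^2 \geq n$ used here follows directly from the definitions of $n$ and $\delta$ in Section II, as the following short derivation shows.

```latex
\delta^2 \;=\; 2^{2(m-r)} \;=\; 2^{m} \cdot 2^{m-2r} \;=\; n \cdot 2^{m-2r} \;\geq\; n
\qquad \text{since } m \geq 2r .
```

For $RM(2, 5)$, for instance, $\delta^2 = 64 \geq 32 = n$.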
D. An Example for RM (2, 5) with Electronic Schematic
For every $i = 0, 1, \dots, 31$, let the vector $v_i := (v_{i,0}, v_{i,1}, \dots, v_{i,4}) \in \mathbb{Z}_2^5$ be the binary representation of $i$ such that $i = \sum_{j=0}^{4} v_{i,j} 2^j$.
For reasons of clarity, we primarily write $i$ instead of $v_i$.
The Reed-Muller code $RM(2, 5)$ is an $[n = 32, k = 16, \delta = 8]$-code correcting three errors. A generator matrix $G$ is given by

[$16 \times 32$ generator matrix $G$]
First, the decoder itself needs to be created. As presented in Section IV-A, we construct six two-dimensional subspaces $U_0, U_1, \dots, U_5$. The mappings $\varphi_l$, $l = 0, 1, \dots, 5$, are specified in Table III, ensuring $v_i + U_l = w_{l,\varphi_l(i)} + U_l$ for all $i = 0, 1, \dots, 31$.
After constructing the underlying geometrical structure of our decoder, we consider the following example.
Let $m = (1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0)$ be the message word. Then,

$c = m \cdot G$ = 1 1 1 1 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 1 1 0 1 0

is the corresponding codeword from $RM(2, 5)$. Suppose $c$ was sent through a noisy channel and

$z$ = 0 0 1 1 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 1 1 0 1 1

was received with errors at positions 0, 1 and 31.
The decoding can be performed as stated in Section IV-B.

Input: $z$
1: $(\varsigma_{0,0}, \varsigma_{0,1}, \dots, \varsigma_{0,7}) = (0, 1, 1, 1, 1, 1, 1, 1)$,
   $(\varsigma_{1,0}, \varsigma_{1,1}, \dots, \varsigma_{1,7}) = (0, 0, 1, 0, 1, 1, 1, 1)$,
   $(\varsigma_{2,0}, \varsigma_{2,1}, \dots, \varsigma_{2,7}) = (0, 1, 0, 1, 1, 0, 1, 1)$,
   $(\varsigma_{3,0}, \varsigma_{3,1}, \dots, \varsigma_{3,7}) = (0, 1, 0, 1, 1, 0, 1, 1)$,
   $(\varsigma_{4,0}, \varsigma_{4,1}, \dots, \varsigma_{4,7}) = (0, 0, 1, 1, 1, 1, 1, 0)$,
   $(\varsigma_{5,0}, \varsigma_{5,1}, \dots, \varsigma_{5,7}) = (1, 1, 0, 0, 0, 0, 0, 1)$,
2: $\mu_0 := \mu(\varsigma_{0,0}, \varsigma_{0,1}, \dots, \varsigma_{0,7}) = \mu(0, 1, 1, 1, 1, 1, 1, 1) = 1$,
   $\mu_1 := \mu(\varsigma_{1,0}, \varsigma_{1,1}, \dots, \varsigma_{1,7}) = \mu(0, 0, 1, 0, 1, 1, 1, 1) = 1$,
   $\mu_2 := \mu(\varsigma_{2,0}, \varsigma_{2,1}, \dots, \varsigma_{2,7}) = \mu(0, 1, 0, 1, 1, 0, 1, 1) = 1$,
   $\mu_3 := \mu(\varsigma_{3,0}, \varsigma_{3,1}, \dots, \varsigma_{3,7}) = \mu(0, 1, 0, 1, 1, 0, 1, 1) = 1$,
   $\mu_4 := \mu(\varsigma_{4,0}, \varsigma_{4,1}, \dots, \varsigma_{4,7}) = \mu(0, 0, 1, 1, 1, 1, 1, 0) = 1$,
   $\mu_5 := \mu(\varsigma_{5,0}, \varsigma_{5,1}, \dots, \varsigma_{5,7}) = \mu(1, 1, 0, 0, 0, 0, 0, 1) = 0$,
3: $(\varsigma_{0,0}, \varsigma_{0,1}, \dots, \varsigma_{0,7}) = (\varsigma_{0,0}, \varsigma_{0,1}, \dots, \varsigma_{0,7}) + (\mu_0, \mu_0, \mu_0, \mu_0, \mu_0, \mu_0, \mu_0, \mu_0) = (1, 0, 0, 0, 0, 0, 0, 0)$,
   $(\varsigma_{1,0}, \varsigma_{1,1}, \dots, \varsigma_{1,7}) = (1, 1, 0, 1, 0, 0, 0, 0)$,
   $(\varsigma_{2,0}, \varsigma_{2,1}, \dots, \varsigma_{2,7}) = (1, 0, 1, 0, 0, 1, 0, 0)$,
   $(\varsigma_{3,0}, \varsigma_{3,1}, \dots, \varsigma_{3,7}) = (1, 0, 1, 0, 0, 1, 0, 0)$,
   $(\varsigma_{4,0}, \varsigma_{4,1}, \dots, \varsigma_{4,7}) = (1, 1, 0, 0, 0, 0, 0, 1)$,
   $(\varsigma_{5,0}, \varsigma_{5,1}, \dots, \varsigma_{5,7}) = (1, 1, 0, 0, 0, 0, 0, 1)$,
4: $\eta_0 := \mu(\varsigma_{0,\varphi_0(0)}, \varsigma_{1,\varphi_1(0)}, \dots, \varsigma_{5,\varphi_5(0)}) = 1$,
   $\eta_1 := \mu(1, 1, 1, 1, 1, 1) = 1$,
   $\eta_2 := \mu(0, 1, 1, 0, 0, 1) = 0$,
   $\eta_3 := \mu(0, 1, 1, 0, 1, 0) = 0$,
   ...
   $\eta_{30} := \mu(1, 0, 0, 0, 0, 0) = 0$,
   $\eta_{31} := \mu(1, 1, 1, 1, 1, 1) = 1$,

Fig. 1. The proposed decoder for $RM(2, 5)$ with input $z$ and output $c$ provided not more than three errors occurred.

Fig. 2. The parity-majority module $P\mu$ corresponding to any $l \in \{0, 1, \dots, 5\}$ with input $z_{\psi_l(0)}, z_{\psi_l(1)}, \dots, z_{\psi_l(31)}$ and output $(\varsigma_{l,0}, \varsigma_{l,1}, \dots, \varsigma_{l,7}) \in \mathbb{Z}_2^8$. In the first layer, even parity generators compute the check-sums and return $\varsigma_{l,0}, \varsigma_{l,1}, \dots, \varsigma_{l,7}$ from top to bottom. The majority gate in the second layer returns $\mu_l$. Using XOR gates, $\mu_l$ and $\varsigma_{l,0}, \varsigma_{l,1}, \dots, \varsigma_{l,7}$ are combined in the third layer.

For reasons of clarity and comprehensibility, we structure the decoder (see Fig. 1) such that six identical modules, one for every $l = 0, 1, \dots, 5$, execute line 1, line 2 and line 3 of the proposed algorithm (cf. Section IV-B). A schematic of such a parity-majority module, denoted by $P\mu$, is presented in Fig. 2. The blocks labeled with $\omega_0, \omega_1, \dots, \omega_5$ and $\omega_0^{-1}, \omega_1^{-1}, \dots, \omega_5^{-1}$ do not contain any logic gate. They just represent fixed wirings permuting the 32 inputs. The corresponding permutations $\psi_0, \psi_1, \dots, \psi_5$ are specified in Table IV. More precisely, within the block $\omega_l$, the input signals, indexed from 0 to 31, are rearranged in the order $\psi_l(0), \psi_l(1), \dots, \psi_l(31)$ such that the $i$-th signal comes on position $j$ where $\psi_l(j) = i$. Thus, the 32-bit input of the $l$-th module $P\mu$ is just $z_{\psi_l(0)}, z_{\psi_l(1)}, \dots, z_{\psi_l(31)}$. The module $P\mu$ processes these signals and returns the eight output signals $(\varsigma_{l,0}, \varsigma_{l,1}, \dots, \varsigma_{l,7})$. Recalling that $\varsigma_{l,i}$, $l = 0, 1, \dots, 5$, $i = 0, 1, \dots, 7$, states whether $w_{l,i} + U_l$ is odd or even, every signal $\varsigma_{l,i}$ needs to be conveyed to those four different majority gates corresponding to the four vectors contained in $w_{l,i} + U_l$. Therefore, within block $\omega_l^{-1}$, the 32 signals are reordered such that the signal on position $i$, $i = 0, 1, \dots, 31$, is transferred to position $\psi_l(i)$. Applying this second permutation, it is ensured that the $i$-th signal yields information for determining the $i$-th entry of the codeword, $c_i$.
V. COMPARISON OF COMPLEXITY
In this section, we compare our algorithm with Chen's algorithm in terms of number of function calls as well as in terms of depth and size of circuits realizing the algorithms. Clearly, the number of function calls is correlated with time complexity, whereas depth and size of a circuit provide information about parallel time and space consumption, respectively.
A. Number of Function Calls
An overview of the executed functions with respect to the number of inputs and how often each is called in Chen's and the proposed algorithm is provided in Table V . Apparently, decoding with our method instead of Chen's algorithm reduces the number of check-sums to be computed by an order of n and the number of majority votes to be decided by an order of δ. The parameterized data of Table V is illustrated by way of example in Table VI .
B. Size and Depth of Combinational Circuits
We want to investigate the size and depth of combinational circuits realizing Chen's and the proposed decoding algorithm. Therefore, we need to consider concrete implementations of the functions, majority vote and check-sum.
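In the following, we assume majority voting is performed in constant time by a single majority gate, a specific linear threshold gate. Linear threshold gates compute for a given threshold $T \in \mathbb{R}$ and for given weights $w_1, \dots, w_s \in \mathbb{R}$ the Boolean function $\vartheta : \{0, 1\}^s \to \{0, 1\}$ with

$$\vartheta(x_1, \dots, x_s) := \begin{cases} 1 & \text{if } \sum_{i=1}^{s} w_i x_i \geq T, \\ 0 & \text{otherwise} \end{cases}$$

(cf., e.g., [9, ch. 1, sect. 1.1]). Thus, a majority gate with $s$ inputs is a linear threshold gate where each weight equals one and the threshold equals $\lfloor s/2 \rfloor + 1$.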
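As a software model of these gates, a short Python sketch follows. It assumes the convention that a linear threshold gate outputs one when the weighted sum reaches the threshold (comparison with "≥"), which is the reading under which the threshold $\lfloor s/2 \rfloor + 1$ yields a strict majority; the names `threshold_gate` and `majority_gate` are illustrative.

```python
def threshold_gate(weights, threshold, x):
    """Linear threshold gate: 1 iff the weighted sum of the inputs reaches the threshold."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

def majority_gate(x):
    """Majority gate with s inputs: all weights one, threshold floor(s/2) + 1."""
    s = len(x)
    return threshold_gate([1] * s, s // 2 + 1, x)

print(majority_gate([0, 1, 1, 1, 1, 1, 1, 1]))  # prints 1
print(majority_gate([1, 1, 0, 0, 0, 0, 0, 1]))  # prints 0
```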
An even parity generator is a combinational circuit which computes the even parity bit from the input bits. The even parity bit is set to one if and only if the number of input bits which take on the value one is odd. Every check-sum $z \cdot \chi_S$, $S := \{v_{i_1}, v_{i_2}, \dots, v_{i_{|S|}}\} \subseteq \mathbb{Z}_2^m$, can be calculated by an even parity generator taking $z_{i_1}, z_{i_2}, \dots, z_{i_{|S|}}$ as input.
Even parity generators of depth $\lceil \log_2(N) \rceil$ can be simply built out of $N - 1$ XOR gates. It is not surprising that even parity generators with $N$ inputs and of constant depth require more than a polynomial (in $N$) number of unbounded fan-in AND, OR and NOT gates [6]. But by using linear threshold gates, constant depth and polynomial size can be achieved. Minnick showed in 1961 that a $(2N)$-bit even parity generator of depth 2 can be constructed with $N + 1$ linear threshold gates [2]. Furthermore, at most $2\sqrt{2N - 2} + 4$ linear threshold gates are required for a $(2N)$-bit even parity generator of depth 3 [8]. In fact, the parity function with $N$ inputs can be realized by a threshold circuit of any given depth $d \geq 2$ and size $O(dN^{1/(d-1)})$ [7].
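To make the depth-2 bound tangible, the sketch below shows one well-known way to compute the parity of $2N$ bits with $N + 1$ linear threshold gates. It is offered as an illustration under the assumption that the output gate may read the primary inputs directly; it is not claimed to be the exact circuit of [2], and the function names are hypothetical.

```python
from itertools import product

def threshold_gate(weights, threshold, x):
    """Linear threshold gate: 1 iff the weighted sum of the inputs reaches the threshold."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

def parity_depth2(x):
    """Even parity bit of 2N inputs using N + 1 threshold gates in two layers."""
    n_inputs = len(x)
    half = n_inputs // 2
    # Layer 1: gate k fires iff at least 2k inputs are one (k = 1, ..., N).
    layer1 = [threshold_gate([1] * n_inputs, 2 * k, x) for k in range(1, half + 1)]
    # Layer 2: inputs weighted +1, layer-1 outputs weighted -2, threshold 1.
    # With s ones among the inputs, the weighted sum is s - 2 * floor(s / 2) = s mod 2.
    return threshold_gate([1] * n_inputs + [-2] * half, 1, list(x) + layer1)

print(all(parity_depth2(x) == sum(x) % 2 for x in product([0, 1], repeat=8)))  # prints True
```

For the eight-input case above, four first-layer gates and one output gate suffice, compared with seven two-input XOR gates arranged in a tree of depth three.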
Recalling the particular function levels of our new algorithm, it can be implemented in a circuit of any given depth $d \geq 6$ and size $s_{\mathrm{New}}(d) = O(\delta^2 \cdot d \cdot (n/\delta)^{1/(d-5)})$. In this case, the circuit consists of two layers of XOR gates, two layers of majority gates and $d - 4$ layers of linear threshold gates. On the other hand, Chen's algorithm can be realized by a circuit of depth $d \geq 5$ and size $s_{\mathrm{Chen}}(d) = O(n \cdot \delta^2 \cdot d \cdot (n/\delta)^{1/(d-4)})$ with one layer of XOR gates, two layers of majority gates and $d - 3$ layers of linear threshold gates. Note that for all $d \geq 6$,

$$\frac{s_{\mathrm{Chen}}(d)}{s_{\mathrm{New}}(d)} = O\!\left(\delta \cdot (n/\delta)^{1 - 1/((d-4)(d-5))}\right),$$

where $\min_{d \geq 6} \delta \cdot (n/\delta)^{1 - 1/((d-4)(d-5))} = \delta \cdot (n/\delta)^{1/2} = (\delta n)^{1/2}$.
Hence, using our new instead of Chen's algorithm, the size of the decoder with a fixed depth can be reduced by at least an order of $(\delta n)^{1/2}$. Furthermore, compared to our depth-efficient decoder, the number of gates in a size-efficient circuit realizing Chen's algorithm is still higher by an order of at least $\delta$: $s_{\mathrm{Chen}}(d)/s_{\mathrm{New}}(6) = O(\delta \cdot (n/\delta)^{1/(d-4)})$.
VI. CONCLUSION
In the present paper, we proposed a new hard-decision majority-logic decoding algorithm for Reed-Muller codes $RM(r, m)$ with $m \geq 3$, $m/2 \geq r \geq 1$. We showed how to design the decoder by explaining how to construct its underlying geometrical structure. Therefore, our algorithm is easy to implement in both software and hardware. In embedded systems, the proposed decoder can be realized by a simple non-clocked combinational circuit without any registers or flip-flops. Regarding the number of operations, recursive decoders [10], [13] usually outperform those based on majority-logic [1], [3], [4] and the proposed one. However, if decoding is to be performed as fast as possible, parallel processing of the functions is appropriate. Clearly, this cannot be sufficiently achieved by recursive algorithms. Their decoding hierarchy is too deeply nested to allow fast parallel decoding. Therefore, if algorithms are evaluated on the basis of the required parallel time, majority-logic decoding is preferable to recursive decoding.
We aimed to construct an algorithm which decodes in constant parallel time but is less complex than the best known majority-logic decoders. In fact, Chen's [3] , [4] as well as the presented algorithm offers decoding with a constant level of nesting. Nevertheless, using the new method instead of Chen's algorithm, the number of function calls and space consumption can be reduced by at least an order of n and δ, respectively. Thus, the proposed decoder is a good candidate when massively parallel decoding of all bits in real-time or near real-time is desired.
