Reliable computer systems employ error control codes (ECCs) to protect information from errors. For example, memories are frequently protected using single error correction-double error detection (SEC-DED) codes. ECCs are traditionally designed to minimize the number of redundant bits, as these bits are added to each word of the whole memory. Nevertheless, using an ECC introduces encoding and decoding latencies, silicon area usage and power consumption. In other computer units, these parameters should be optimized, while redundancy is less important. For example, protecting registers against errors remains a major concern for deep sub-micron systems due to technology scaling. In this case, an important requirement for register protection is to keep encoding and decoding latencies as short as possible. Ultrafast error control codes achieve very low delays, independently of the word length, at the cost of increased redundancy. This paper summarizes previous works on Ultrafast codes (SEC and SEC-DED), and proposes new codes combining double error detection and adjacent error correction. We have implemented, synthesized and compared different Ultrafast codes with other state-of-the-art fast codes. The results show the validity of the approach, achieving low latencies and a good balance with silicon area and power consumption.
I. INTRODUCTION
As technology scaling advances, the information stored in key elements of a computer system, such as registers and memories, may be perturbed by different physical mechanisms [1]-[3]. Traditionally, error control codes (ECCs) have been extensively employed in computers as a very efficient method to protect information against errors [4]. The design of ECCs is in continuous evolution, adapting their coverage to new design needs and error conditions. However, when using ECCs to increase computer reliability, the protected circuits also increase their delay (due to data encoding and decoding processes), silicon area and power consumption (both caused by the additional interconnection lines and/or storage needed, as well as by the encoder and decoder circuits).
Hence, the challenge when designing ECCs is to reduce this overhead. According to the requirements, the design
of new ECCs tries to provide a good balance between error coverage, redundancy, and efficiency of their encoding and decoding circuits in terms of area, delay and power consumption.
For example, the main objective of an ECC designed to protect a memory is to reduce, for a given coverage, the redundancy, that is, the number of redundant bits added by the ECC [5], [6]. As these bits are added to each word stored in memory, their minimization is a key criterion in ECC design.
Nevertheless, redundancy is not so important for all computer components. For instance, the register file is an integral element of any microprocessor architecture. Although its overall area is small, it is one of the most frequently accessed components. Corrupted data in registers can quickly spread to other elements of the system, due to their high access rate. If protected using an ECC, the temporal overhead of the register file should be reduced as much as possible, as this element is in the critical path of the processor pipeline. An excessive delay introduced by encoding and decoding operations might cause the register file to become a bottleneck, requiring a longer clock cycle and resulting in a reduction of the working frequency. In the same way, the clock cycle may also be affected when adding ECCs to other registers, like inter-stage latches in pipelined processors. Current technology scaling processes enable manufacturing high-speed (faster) and high-density (smaller) processors, but this makes registers much more sensitive to errors. The problem becomes more important as the supply voltage level used to operate the registers decreases [7]. In this case, the memory cell critical charge, and thus the energy needed to provoke a single-event upset (SEU) in storage, is reduced [8]. As shown by different experiments, in addition to traditional single-cell upsets (SCUs), this energy reduction can provoke multiple-cell upsets (MCUs), i.e. simultaneous errors in more than one cell induced by a single particle hit [2], [9], [10]. Usually, they occur in the same word and, very likely, in adjacent bits [11].
As deduced from the previous paragraphs, protecting registers against multiple adjacent errors is of utmost importance [6]. This challenge increases as the soft error rate does, as predicted in [12]. Hence, the mechanisms for register protection should be as fast as possible [13]. Different approaches are summarized in [14] and [15]. Other remarkable alternatives are the interleaving of simple codes [4] (employed in this paper for comparison) and the use of hardened memory cells [16]. Although some authors conclude that ECCs are a less interesting option, modern processors for servers use ECCs in registers for fault tolerance (e.g. Intel® Xeon® Processor E7 Family [17]).
This paper focuses on low-delay, multiple adjacent error correction codes. Recently introduced, Ultrafast error control codes are a family of ECCs with very low encoding/decoding latencies. Moreover, the logic depth of their circuits does not depend on the data word length. Ultrafast codes with single error correction (SEC), single error correction-double error detection (SEC-DED) and single error correction-double adjacent error correction (SEC-DAEC) capabilities were introduced in [14] and [15]. In this work, the error coverage is further hardened, combining double error detection with multiple adjacent error correction (xAEC), obtaining new Ultrafast SEC-xAEC-DED codes. These codes have been obtained by applying the Flexible Unequal Error Control (FUEC) methodology, developed and described in [18]. The delays of the new codes mainly depend on the error coverage, although a small dependency on the data word length may appear due to implementation details, as explained later.
Ultrafast codes have been implemented and synthesized in order to validate the error coverage and to measure the delay, area and power overheads. They have also been compared to other state-of-the-art fast codes, obtaining interesting results.
Therefore, the main novelty presented in this paper is the combination of double error detection with multiple adjacent error correction, achieving new Ultrafast SEC-xAEC-DED codes. Furthermore, we have complemented the comparisons performed in [14] and [15]. In this work, we have implemented different designs in the VHDL hardware description language, and we have employed CMOS standard cell synthesis to evaluate and compare them.
Although Ultrafast codes have been presented as a possible solution for register protection, it is remarkable that their usefulness is not restricted to this case. If the required redundancy is affordable, Ultrafast codes offer fast encoding and decoding processes with a moderate area and power overhead. For instance, they can also be useful to protect high-speed memories or caches. This paper is organized as follows. Section II introduces previous works and basic concepts about error correction codes. The methodology used to build the proposed codes is described in Section III. Section IV includes an evaluation of the codes and a comparison with previous proposals. Finally, Section V presents conclusions and ideas for future work.
II. RELATED WORK
A. BASICS ON ERROR CONTROL CODING
An (n, k) binary ECC encodes a k-bit input word into an n-bit output word [19]. The input word u = (u_0, u_1, ..., u_{k−1}) is a k-bit vector that represents the original data. The code word b = (b_0, b_1, ..., b_{n−1}) is an n-bit vector, where the (n − k) added bits are called parity, code or redundant bits. b is transmitted through an unreliable channel that delivers the received word r = (r_0, r_1, ..., r_{n−1}). The error vector e = (e_0, e_1, ..., e_{n−1}) models the error induced by the channel. If no error has occurred in the i-th bit, then e_i = 0; otherwise, e_i = 1. Therefore, r can be interpreted as r = b ⊕ e. Fig. 1 depicts the encoding, channel crossing and decoding processes.
The parity check matrix H, of size (n − k) × n, defines a linear code [4]. For the encoding process, b must meet the requirement H · b^T = 0. For syndrome decoding, the syndrome is defined as s^T = H · r^T, and it exclusively depends on e:

s^T = H · r^T = H · (b ⊕ e)^T = H · b^T ⊕ H · e^T = H · e^T    (1)
There must be a different syndrome s for each correctable error vector e. Syndrome decoding is performed through a lookup table that relates each s to the decoded error vector ê. If s = 0, we can assume that ê = 0 and r is correct. Otherwise, an error has occurred. The decoded code word b̂ is calculated as b̂ = r ⊕ ê. From b̂, it is easy to obtain û by discarding the parity bits. If the fault hypothesis used to design the ECC is consistent with the behavior of the channel, û and u will be equal with a very high probability.
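As an illustration of this decoding scheme, the following sketch implements generic syndrome decoding over GF(2); the matrix layout and helper names are our own assumptions, not taken from any specific code in this paper.

```python
# Minimal sketch of syndrome decoding (illustrative; H and the set of
# correctable error vectors are assumptions, not a code from this paper).
def syndrome(H, word):
    """s^T = H · word^T over GF(2); H is a list of rows (tuples of 0/1)."""
    return tuple(sum(h & w for h, w in zip(row, word)) % 2 for row in H)

def build_lookup(H, correctable):
    """Relate each syndrome s to its decoded error vector ê."""
    table = {}
    for e in correctable:
        s = syndrome(H, e)
        assert s not in table, "two correctable errors share a syndrome"
        table[s] = e
    return table

def decode(H, r, table):
    s = syndrome(H, r)
    if not any(s):
        return r                                  # s = 0: assume ê = 0
    e_hat = table.get(s)
    if e_hat is None:
        raise ValueError("error detected but not correctable (NRE)")
    return tuple(ri ^ ei for ri, ei in zip(r, e_hat))   # b̂ = r ⊕ ê
```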
For a binary word, the term Hamming weight w denotes the number of ones in that word. As explained later, the Hamming weights of the rows and columns of the parity check matrix determine the properties of a code.
Hamming Single Error Correction (SEC) codes [5] can correct an erroneous bit with simple and fast encoding and decoding operations, and the lowest redundancy. An example of implementation for these codes can be found in [14] .
Extended Hamming codes are able to correct single errors and detect double errors. They need an additional parity bit (calculated as the even parity for the whole encoded word) to achieve the double error detection. Additional explanation, with an example of implementation, can be found in [15] .
Hsiao SEC-DED codes [20] are an optimized version of Extended Hamming codes. They are optimal minimum odd-weight-column codes, i.e. all columns of the parity check matrix have an odd number of ones, allowing the DED coverage. The detection logic is simplified, achieving lower delay, silicon area and power consumption than conventional Hamming SEC-DED codes. We will compare them to Ultrafast codes in Section IV.
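The odd-weight-column argument can be checked in two lines; the column values below are hypothetical and serve only to illustrate the parity reasoning.

```python
# A double error produces the XOR of two odd-weight columns, which always
# has even weight, so it can never collide with a single error syndrome
# (odd weight). The two columns below are hypothetical examples.
c1, c2 = 0b0010101, 0b1110000             # two odd-weight (w = 3) columns
assert bin(c1 ^ c2).count("1") % 2 == 0   # even weight -> detected as double
```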
Increasing the error coverage frequently makes ECCs more complex and slower. For example, correction of multiple errors in adjacent bits is achieved by the codes presented in [6] (up to three) and in [21] (up to four). Multiple random error correction is achieved by well-known BCH codes [4] . All these codes are designed to reduce the redundancy, but at the cost of more complex and slower decoders.
B. FASTER ERROR CONTROL CODES
Hamming or Hsiao codes have been employed to protect registers and memories against single and double errors. However, as the working frequency of VLSI systems increases, reducing the delays introduced by the encoding and decoding circuits becomes of paramount importance.
Different approaches can be found in the literature. The terms ''fast'', ''high speed'', ''low delay'', etc. are frequently associated with different ECCs. Anyway, what is considered ''fast'' depends on each application.
For instance, a method to reduce the timing impact of an ECC for multilevel flash memories is described in [22] . In multilevel flash memories, each memory cell stores more than one bit. Thus, from a digital viewpoint, each cell stores a multibit symbol, and a faulty cell may result in several erroneous bits (belonging to the same symbol). The proposal is compared to other classic approaches, and the results show a reduction in the temporal overhead. The strategy is to increase the redundancy to reduce both the Hamming weight of the parity check matrix and the weight of the heaviest row. Nevertheless, non-binary symbol codes are commonly more complex than binary codes.
Fast decoding using binary ECCs is proposed in [23], but only for a subset of critical bits. That is, the original data include ''standard'' bits and a small number of ''important'' bits. The proposed ECCs correct single errors and can decode the ''important'' bits faster. The strategy, in this case, is based on correcting the ''important'' bits using only a subset of the parity bits. Recently, these codes have been improved to correct adjacent errors [24].
Fast decoding for the whole word is the objective of Orthogonal Latin Square (OLS) codes [25]. They are one-step majority logic decodable codes. ''One-step'' means that the decoding is performed in a combinational circuit (without iterative steps). This circuit implements a voter of several check bits to correct, if necessary, the received bits. The majority voter constitutes a simple and fast corrector. These codes correct random errors, at the cost of a high redundancy and additional overhead. In addition, these codes only exist for a few word lengths.
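A minimal sketch of one-step majority logic correction, assuming a generic set of recomputed check equations per bit (the actual OLS check structure is more elaborate):

```python
# One-step majority voting: each data bit is covered by several independent
# parity checks; if a majority of them are violated, the bit is flipped.
# The check structure is hypothetical, for illustration only.
def majority_correct(received_bit, violated_checks):
    """violated_checks: list of 0/1 flags, one per check covering this bit."""
    flip = sum(violated_checks) > len(violated_checks) // 2
    return received_bit ^ int(flip)     # purely combinational, no iterations
```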
Correction of adjacent errors in a simple and efficient way is achieved by the codes presented in [26]. They combine multiple error detection with vertical parity in a two-dimensional layout. Due to their fast decoding, these codes will be compared to our proposal in Section IV.
Finally, Low Delay (LD) codes, proposed in [27] and [28], can be applied to the protection of CPU registers. They share objectives with our proposal. Due to their special interest for our work, they are explained in detail in Section II.C.
A problem of all the aforementioned codes is that the logic depth of the encoder and decoder circuits scales with the word length. Hence, their latency will grow with longer words. As explained later, one of the main advantages of our proposal is that the logic depth introduced by the encoder and decoder circuits does not depend on the word length.
C. LOW DELAY CODES
Low Delay (LD) SEC and SEC-DED codes [27] reduce the time required to correct 1-bit errors when only the correction of data bits is needed. These codes take advantage of the fact that correcting parity bits is of minor interest in registers, as the information stored is not rewritten in the same register once it has been read, and the input data come from other processor elements.
The idea behind LD codes is minimizing the number of 1s in each column and in each row. By reducing the number of 1s in the columns, the decoding logic (i.e. the implementation of the lookup table) can be simplified if all the columns for data bits have the same Hamming weight w (more details can be found in [27] ). As the columns for parity bits have w = 1, the columns for data bits will have w = 2 if only single error correction is required. If additional double error detection is desired, the columns for data bits will have w = 3.
On the other hand, the delay when computing the parity bits in the encoder and the syndrome bits in the decoder can be decreased by reducing the number of 1s in the rows.
Summarizing, the main characteristics of LD codes are:
• Encoder and decoder circuits are simpler than equivalent Hamming codes, presenting equal or lower logic depths. This reduces the delay.
• LD codes slightly increase the redundancy over traditional SEC and SEC-DED codes.
• LD codes only correct errors in data bits.
• The logic depth of the encoder and decoder circuits scales with the word length, as stated above.
Other Low Delay codes, with single and double adjacent error correction properties, were presented in [28] . These codes take advantage of the simplification of logic functions when implementing the lookup table, among other techniques.
D. PREVIOUS WORKS ON ULTRAFAST CODES
Ultrafast SEC codes were firstly introduced in [14]. We designed them assuming that redundancy is not the main concern, and focusing on reducing the temporal overhead. Although these codes present a higher redundancy than equivalent Hamming codes, the delay introduced is very short. The requirements to build their parity check matrices are:
1) Each column must be different and nonzero. It allows the correction of single errors.
2) Each column assigned to code bits must have w = 1. It allows easy encoding operations.
3) Each column assigned to data bits must have w = 2. If the correction of parity bits is not required, it allows simpler error location in the decoder circuit [27].
4) Each row must have w = 3. It allows decoder circuits whose logic depth does not scale with the word length.
Matrices, circuits and more information can be found in [14] . A similar approach was independently presented in [29] . However, these approaches do not allow the double error detection. In order to ease this coverage, Ultrafast codes were reformulated in [15] .
The new formulation allowed the design of Ultrafast SEC-DED codes [15]. Their parity check matrices are generated using these requirements:
1) Each column must be different and nonzero.
2) Each column assigned to code bits must have w = 1.
3) Each column assigned to data bits must have w = 3.
4) Each row must have w = 4.
Conditions 2 and 3 determine the new coverage: all the columns of the parity check matrix have odd weight. Therefore, all syndromes for single errors have odd weight, whereas all syndromes for double errors have even weight. This allows the detection of 2-bit errors. Condition 4, as stated above, allows decoder circuits whose logic depth does not scale with the word length.
Examples, with their parity check matrices, circuits and detailed explanation and information can be found in [15] .
As adjacent errors are becoming more and more frequent, we also presented Ultrafast SEC-DAEC codes in [15]. They can correct single and double adjacent errors. To add this coverage, a new requirement must be considered:
5) All correctable errors must have different syndromes.
As all columns have odd Hamming weight, single errors produce an odd-weight syndrome and double adjacent errors generate an even-weight syndrome. If all these syndromes are different, double adjacent errors can be corrected.
An example, with its parity check matrix, is detailed in [15] . The complete process to design the decoder circuit, where the simplification of logic functions is essential to achieve the 4-gate delay, is also described. As explained in Section III, this methodology is employed again to get low decoding latencies.
III. ULTRAFAST ERROR CONTROL CODES
A. DELAY INDEPENDENT OF THE WORD LENGTH
When redundancy is not the main concern, and reducing the temporal overhead is the main objective in ECCs construction (e.g. register protection), designers can employ different techniques, mainly based on minimizing the number of ones in rows and columns of the parity check matrix. With these premises in mind, Ultrafast codes presented in Section II.D were designed for fast encoding and decoding operations. Although these codes present a higher redundancy than equivalent codes, the delay introduced is very short.
Let us consider Ultrafast codes as formulated in [15] . These codes are well suited for high-speed operation because: i) encoders are 2-gate-delay circuits (assuming 2-input XOR gates); ii) decoders are 4-gate-delay circuits; and iii) these delays do not depend on the word length. As stated in Section II.A, the decoding process has two steps: syndrome computation, and error(s) location and correction. The delay introduced mainly depends on:
• The Hamming weight w of the heaviest row of the parity check matrix H, for the first part.
• The complexity of the code, for the second part. It is related to the weights of the columns of H, mainly a consequence of its error coverage.
The weights of the rows in H scale with the word length k for most codes. Hence, they have length-dependent delays. Conversely, Ultrafast codes keep the weights of the rows constant, independently of k. This is achieved by adding parity bits, which is their main overhead. In fact, for these codes, n − k = k; that is, the number of parity bits is the same as the number of data bits (100% redundancy).
As Ultrafast codes have n − k = k, then n = 2k. Therefore, they have a k × 2k parity check matrix. By definition, the k columns for data bits must have w = 3, and the k columns for parity bits must have w = 1. For a given value of k, the parity check matrix will thus contain 3k + k = 4k ones in total (data plus parity columns). Hence, a balanced matrix will have 4k / k = 4 ones in each row, independently of the value of k. Therefore, the encoding, as well as the first part of the decoding logic, always has the same logic depth, independently of the word length. In a hardware implementation, minimal variations may appear in their delays due to wiring, parasitic capacitances and inductances, and other implementation details. This applies to the previous Ultrafast codes published in [15] and to the new Ultrafast SEC-xAEC-DED codes presented in this work.
In addition, when only single error correction is required, we can get the lowest decoding latency if the decoder only corrects errors in data bits [27] . In this case, the second part of the decoding logic only depends on the weights of the columns for data bits of H. As w = 3 by definition, this part of the decoding process is also independent of the word length.
But when multiple error correction is required, different syndromes may indicate an error in the same bit. Even more, these syndromes may have different Hamming weights and, therefore, the technique proposed in [27] cannot be employed. In these cases, the simplification of logic equations helps to reduce the complexity of the decoder. In fact, the second part of the decoding logic, i.e. the implementation of the lookup table, is a truth table of a group of logic functions where the inputs are the syndrome bits, and the outputs are the bits of the estimated error vector. This technique was employed in [15] to obtain the Ultrafast SEC-DAEC decoder with 4-gate delay, and it is employed in this work as well.
For simplification, it is important to get as many ''don't care'' terms as possible. Due to their high redundancy, Ultrafast codes commonly have a large number of free syndromes that can be treated as ''don't care'' terms. In addition, Ultrafast codes have a low Hamming weight in their parity check matrices (2n for all Ultrafast codes) and a low average weight in their columns (2 ones per column). All these characteristics allow better simplifications, compared to other ECCs, where this ratio is frequently larger and dependent on k. This methodology is detailed later, but the results show that the decoding delay mainly depends on the error coverage.
New Ultrafast SEC-xAEC-DED codes are presented next. As an example, codes for 8-bit data words have been designed, but longer word lengths can be easily achieved, as discussed in Section III.C. Some comparisons are shown in Section IV.
B. ULTRAFAST SEC-xAEC-DED CODES
The main novelty presented in this work is the enhancement of Ultrafast codes to correct multiple adjacent errors, and to detect double (non-adjacent) errors. It is done maintaining the redundancy and low delays (proportional to the error coverage), with reasonable increments in area and power overheads.
The requirements for these Ultrafast codes are:
1) Each column must be different and nonzero.
2) Each column assigned to code bits must have w = 1.
3) Each column assigned to data bits must have w = 3.
4) Each row must have w = 4.
5) All correctable errors must have different syndromes.
6) All detectable errors must have a syndrome that is different from all syndromes reserved for correction.
The explanation for requirements 1 to 4 is the same given in Section II.D. Searching for a matrix that meets requirements 5 and 6 may be very complex. We have used the Flexible Unequal Error Control (FUEC) methodology, presented in [18]. Although a detailed explanation of the methodology is out of the scope of this paper, it is briefly summarized in the following.
After determining the values of n and k for the code to be designed, the error patterns to be corrected and detected must be selected. Then, a parity check matrix H that satisfies (2) and (3) is searched for, where E+ represents the set of error vectors to be corrected, and E∆ the set of error vectors to be detected:

H · e_i^T ≠ H · e_j^T,  ∀ e_i, e_j ∈ E+, e_i ≠ e_j    (2)
H · e_i^T ≠ H · e_j^T,  ∀ e_i ∈ E+, ∀ e_j ∈ E∆    (3)
That is, each correctable error must have a different syndrome (2), and each detectable error must have a syndrome that is different from all the syndromes generated by correctable errors (3).
To find the matrix, a recursive backtracking algorithm is used. It checks partial matrices and adds a new column only if the previous matrix satisfies the requirements. There are 2^(n−k) − 1 combinations for each column. A large number of these combinations are discarded, as the algorithm is configured to employ only columns with Hamming weight 1 or 3, to meet requirements 2 and 3 for Ultrafast codes. In addition, we have included requirement 4 to discard partial matrices that have rows with Hamming weight w > 4.
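The following is a condensed, illustrative sketch of such a backtracking search, restricted to the Ultrafast column and row weights; it is a simplified reading of the FUEC procedure, not the authors' exact algorithm from [18].

```python
# Simplified FUEC-style backtracking search for an Ultrafast parity check
# matrix (hypothetical helper names; not the exact algorithm of [18]).
# Columns are stored as r-bit integers; parity columns are placed first.
def weight(x):
    return bin(x).count("1")

def find_matrix(k, r, corr_patterns):
    """corr_patterns(n): iterable of bit-position tuples to be corrected."""
    identity = [1 << i for i in range(r)]                   # parity cols, w = 1
    cand = [c for c in range(1, 1 << r) if weight(c) == 3]  # data cols, w = 3

    def ok(cols):
        # Requirement 4: no row heavier than w = 4.
        if any(sum((c >> row) & 1 for c in cols) > 4 for row in range(r)):
            return False
        # Requirement 5: correctable patterns give distinct, nonzero syndromes.
        seen = set()
        for pat in corr_patterns(len(cols)):
            s = 0
            for i in pat:
                s ^= cols[i]
            if s == 0 or s in seen:
                return False
            seen.add(s)
        return True

    def extend(cols):
        if len(cols) == r + k:
            return cols
        for c in cand:
            if c not in cols and ok(cols + [c]):
                res = extend(cols + [c])
                if res:
                    return res
        return None                                         # backtrack

    return extend(identity)

# Example: single error patterns only, under the SEC-xAEC-DED weight rules.
H_cols = find_matrix(8, 8, lambda n: [(i,) for i in range(n)])
```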
As an example, using the FUEC methodology we have found a (16, 8) Ultrafast code which is SEC-5AEC-DED. That is, it can correct single errors and up to five-bit adjacent errors, and it can detect double non-adjacent errors. Let H8 be the 8 × 16 parity check matrix of this code. It can be represented as:
H8 = [ I | A ] =

10000000 10100010
01000000 01000101
00100000 10101000
00010000 01010100
00001000 10001010
00000100 01010001
00000010 00101010
00000001 00010101    (4)
where I is the identity matrix (columns for parity bits), and A is the second half of the matrix, with the columns for data bits.
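The Ultrafast requirements and the claimed SEC-5AEC coverage of matrix (4) can be checked mechanically. A sketch of such a self-check follows; the bit and column ordering is our transcription of (4).

```python
# Sanity check of H8 (matrix (4)): columns 0..7 are parity, 8..15 are data.
H8_rows = [
    "1000000010100010",
    "0100000001000101",
    "0010000010101000",
    "0001000001010100",
    "0000100010001010",
    "0000010001010001",
    "0000001000101010",
    "0000000100010101",
]
cols = [int("".join(r[j] for r in H8_rows), 2) for j in range(16)]

assert len(set(cols)) == 16 and 0 not in cols          # req. 1: distinct, nonzero
assert all(bin(c).count("1") == 1 for c in cols[:8])   # req. 2: parity cols, w = 1
assert all(bin(c).count("1") == 3 for c in cols[8:])   # req. 3: data cols, w = 3
assert all(r.count("1") == 4 for r in H8_rows)         # req. 4: rows, w = 4

def syn(bits):
    """Syndrome of an error pattern: XOR of the affected columns."""
    s = 0
    for i in bits:
        s ^= cols[i]
    return s

# Req. 5 (claimed coverage): single and 2- to 5-bit adjacent errors must all
# produce different syndromes.
patterns = [tuple(range(i, i + L)) for L in range(1, 6) for i in range(17 - L)]
syndromes = [syn(p) for p in patterns]
assert len(syndromes) == len(set(syndromes))
```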
It is noticeable that the same parity check matrix can be used for lower error coverage. How to design a SEC-DAEC-DED decoder (or 3AEC/4AEC versions) using the same matrix is explained in Section III.D. Results are also evaluated in Section IV. As stated later, lower error coverage results in higher simplifications and, hence, in faster decoders.
The encoding and syndrome equations can be obtained from the parity check matrix. The encoding equations for this code are:

b_0 = u_0 ⊕ u_2 ⊕ u_6
b_1 = u_1 ⊕ u_5 ⊕ u_7
b_2 = u_0 ⊕ u_2 ⊕ u_4
b_3 = u_1 ⊕ u_3 ⊕ u_5
b_4 = u_0 ⊕ u_4 ⊕ u_6
b_5 = u_1 ⊕ u_3 ⊕ u_7
b_6 = u_2 ⊕ u_4 ⊕ u_6
b_7 = u_3 ⊕ u_5 ⊕ u_7
As can be observed, the parity check matrix relates each parity bit (b_0 ... b_7) with the data bits (u_0 ... u_7) required for its calculation. Each parity bit is calculated by XORing three data bits. Hence, the encoding implementation can be a circuit whose logic depth is two (assuming the use of 2-input XOR gates). As stated above, it does not depend on the word length.
The expressions for syndrome calculation, where r_0 ... r_7 are the received parity bits and r_8 ... r_15 the received data bits, are:

s_0 = r_0 ⊕ r_8 ⊕ r_10 ⊕ r_14
s_1 = r_1 ⊕ r_9 ⊕ r_13 ⊕ r_15
s_2 = r_2 ⊕ r_8 ⊕ r_10 ⊕ r_12
s_3 = r_3 ⊕ r_9 ⊕ r_11 ⊕ r_13
s_4 = r_4 ⊕ r_8 ⊕ r_12 ⊕ r_14
s_5 = r_5 ⊕ r_9 ⊕ r_11 ⊕ r_15
s_6 = r_6 ⊕ r_10 ⊕ r_12 ⊕ r_14
s_7 = r_7 ⊕ r_11 ⊕ r_13 ⊕ r_15
Again, the syndrome calculation can be implemented in a circuit whose logic depth is two, independently of the word length. Following the process shown in Fig. 1, the next step is the implementation of the lookup table, i.e. the computation of the ê vector as a function of the syndrome. If the design of the parity check matrix is correct, each correctable error will have a different syndrome. This step is described, for example, in [15]. The non-recoverable error (NRE) signal, which indicates that an error has been detected but cannot be corrected, is calculated in the same way. As the full lookup table is too long, only some sample lines are included here:

syndrome 00000000 → no error (ê = 0, NRE = 0)
syndrome 00000010 → single error in r_1 (ê_1 = 1, NRE = 0)
syndrome 01000101 → single error in r_10 (ê_10 = 1, NRE = 0)
syndrome 00000011 → adjacent error in r_0 and r_1 (ê_0 = ê_1 = 1, NRE = 0)
syndrome 11111111 → adjacent error in r_6 to r_9 (ê_6 = ê_7 = ê_8 = ê_9 = 1, NRE = 0)
syndrome 00000101 → double non-adjacent error (ê undetermined, NRE = 1)
In this truth table, we can observe 17 logic functions (16 bits of the estimated error vector, plus the NRE signal). The inputs of these functions are the eight syndrome bits. Different cases can be described from the table:
• The no-error situation generates the zero syndrome.
• Single errors produce a syndrome that must coincide with the assigned column in the parity check matrix (e.g. 00000010 for r_1 or 01000101 for r_10).
• Multiple errors included in the correction coverage generate a syndrome that is the bitwise XOR of the columns assigned to the affected bits (e.g. 00000011 for r_0 and r_1, or 11111111 for r_6, r_7, r_8 and r_9).
All the above conditions keep the NRE signal inactive, as all of them represent correctable situations. Two additional situations can be found in the truth table for this code:
• Double non-adjacent errors must activate the NRE signal. The syndrome is also the bitwise XOR of the affected columns, but different errors may generate the same syndrome (e.g., the 00000101 syndrome is produced by an error in r_0 and r_2, but also by an error in r_12 and r_14). The ê vector cannot be determined, as it is not possible to know the positions of the erroneous bits.
• Once all the above situations have been assigned to the corresponding syndromes, several syndromes may remain unassigned (e.g. 11110111). They represent multiple errors not considered in the code coverage. Nevertheless, as they are very uncommon situations (according to the fault assumptions for which the code has been designed), they need not be considered for NRE activation. Instead, they can be employed for the logic simplification of the ê_i and NRE signals, as detailed in Section III.D.
Finally, the output of the decoder is the result of XORing the received bits with the corresponding estimated error vector: b̂_i = r_i ⊕ ê_i. û is easily obtained by extracting the corresponding bits from b̂.
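To make the lookup-table construction concrete, the following sketch enumerates it programmatically, reusing `cols` and `syn` from the previous listing; the bit ordering and the NRE encoding are our assumptions.

```python
# Build the syndrome lookup table of the (16, 8) SEC-5AEC-DED code:
# syndrome -> (ê, NRE). Reuses cols/syn from the previous sketch.
from itertools import combinations

lookup = {0: ([0] * 16, 0)}                       # zero syndrome: no error

# Correctable patterns: single errors and 2- to 5-bit adjacent errors.
for L in range(1, 6):
    for i in range(17 - L):
        pat = tuple(range(i, i + L))
        e_hat = [1 if j in pat else 0 for j in range(16)]
        s = syn(pat)
        assert s not in lookup                    # requirement 5
        lookup[s] = (e_hat, 0)

# Detectable patterns: double non-adjacent errors activate NRE; ê stays
# undetermined because several pairs may share the same syndrome.
for i, j in combinations(range(16), 2):
    if j - i > 1:
        lookup.setdefault(syn((i, j)), (None, 1)) # req. 6: no clash with above

# The remaining syndromes are free: "don't care" terms for simplification.
print(256 - len(lookup), "free syndromes")
```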
C. CODES FOR LONGER DATA WORDS
Applying the FUEC methodology, matrices for codes with different word lengths and error coverages can be found. Nevertheless, as the number of parity bits grows, the search becomes computationally heavier and requires a long computation time. However, Ultrafast codes for long data words can be easily obtained by combining matrices for shorter data words. As the common word lengths in computers are powers of two (8, 16, 32, 64, ...), we can use Ultrafast (16, 8) matrices to generate (32, 16), (64, 32), etc. codes. For example, considering the H8 matrix shown in (4), the 16 × 32 parity check matrix of an Ultrafast (32, 16) code, with the same redundancy and error coverage, can be generated as:

H16 = [ I 0 A 0 ]
      [ 0 I 0 A ]    (5)

where 0 denotes the 8 × 8 zero matrix.
This new code has the same complexity, as the construction is equivalent to having two 16-bit sub-words (bits 0..7 and 16..23, and bits 8..15 and 24..31), each one covered by an independent (16, 8) code. It also maintains the error coverage:
• Single and multiple adjacent errors on each half word are corrected as in the (16, 8) code.
• Adjacent errors affecting both half words become two shorter adjacent errors for the (16, 8) codes.
• Double non-adjacent errors, where each bit belongs to a different half, become single errors for each code.
• Double non-adjacent errors inside a half are detected by the corresponding (16, 8) code.
Even more, we can modify H16 by means of column permutations to achieve better coverage. Alternating the columns of both H8-based sub-matrices, the new H16' matrix represents an Ultrafast (32, 16) SEC-10AEC-DED code:

H16' = [ c_0^H8 0      c_1^H8 0      ... c_15^H8 0       ]
       [ 0      c_0^H8 0      c_1^H8 ... 0       c_15^H8 ]    (6)

where c_i^H8 represents the i-th column of the H8 matrix. As can be observed, a 10-bit adjacent error is treated as two 5-bit adjacent errors, which can be corrected by each original code.
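A sketch of both constructions, reusing the `cols` list transcribed from (4); each 16-bit column of the new matrices is represented as an (upper, lower) pair of 8-bit halves.

```python
# Build H16 (5) and H16' (6) from the 16 columns of H8 (`cols`, as ints).
h8 = cols

# (5): two independent (16, 8) codes -> parity bits 0..15, data bits 16..31.
H16 = ([(c, 0) for c in h8[:8]] + [(0, c) for c in h8[:8]] +
       [(c, 0) for c in h8[8:]] + [(0, c) for c in h8[8:]])

# (6): alternate the columns of both sub-matrices. Adjacent codeword bits
# now belong to different sub-codes, so a 10-bit adjacent error splits into
# two 5-bit adjacent errors, each correctable by one (16, 8) code.
H16p = []
for c in h8:
    H16p += [(c, 0), (0, c)]

# Check: any 10 consecutive positions hit at most 5 columns per sub-code.
for start in range(len(H16p) - 9):
    span = H16p[start:start + 10]
    assert sum(1 for up, lo in span if up) <= 5
    assert sum(1 for up, lo in span if lo) <= 5
```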
Expressions (5) and (6) can be generalized for other word lengths (32, 64, ... bits). For example, for 32-bit data words, the block composition gives

H32 = [ I 0 0 0 A 0 0 0 ]
      [ 0 I 0 0 0 A 0 0 ]
      [ 0 0 I 0 0 0 A 0 ]
      [ 0 0 0 I 0 0 0 A ]    (7)

and the corresponding H32' matrix (8) is obtained by alternating the columns of the four H8 sub-matrices, in the same way as in (6).
D. DECODER DESIGN AND LOGIC SIMPLIFICATION
As stated above, the objective of Ultrafast codes is to achieve encoding and decoding circuits as fast as possible. The ''real'' performance of a circuit will depend on several factors, such as the implementation technology, the logic depth of the signals or the complexity of the logic equations. A discussion of this matter can be found in [14] and [15].
As described in Section II.A, the encoding operations and the syndrome calculation, the first part of the decoding, can be easily obtained from the parity check matrix. In most ECCs, the complexity of these operations depends on the word length. Conversely, the expressions for Ultrafast codes have low and constant complexity, independently of the word length, as stated in Section III.A.
Anyway, the most complex part of the decoder is the implementation of the lookup table, especially when several syndromes may indicate an error in the same bit (i.e. when the decoder allows multiple error correction). It is necessary to simplify the logic equations required to obtain the ê_i and NRE signals. Simplifying the expressions can reduce the delay, area and power consumption of the corrector circuit.
Non-simplified equations can be obtained from the lookup table as sum of minterms or product of maxterms. Ultrafast codes take great advantage of their redundancy to get a high number of free syndromes, which become ''don't care'' terms for simplification. They can be obtained when the value of the error vector, or the NRE signal, is not important:
• Several different errors can produce the syndromes assigned to double errors. As this situation is indicated by the NRE signal and the erroneous bits cannot be determined, these syndromes can be used to simplify the ê_i signals.
• If the fault hypothesis considered for the design of the ECC is correct, the ''undefined'' syndromes only appear under very uncommon situations. Hence, the probability of a wrong detection/correction due to a bad estimation of the error vector is negligible. Therefore, the value of the error vector or the NRE signal is not significant, and these syndromes can be employed to simplify them.
Additional simplification, if required, can be obtained by reducing the error coverage of a code. For example, the SEC-5AEC-DED code presented in (4) allows a great, maybe excessive, error coverage. If, according to a given fault hypothesis, adjacent errors may affect two adjacent bits at most, we can design a SEC-DAEC-DED decoder using the same parity check matrix. In this case, the lookup table will consider as ''undefined'' the syndromes assigned to 3- to 5-bit adjacent errors, improving the simplification. In the same way, 3AEC/4AEC decoders can be designed.
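The effect of this coverage reduction on the number of ''don't care'' terms can be estimated with a few lines (again reusing `syn` from the earlier sketch; the count is only an indicator of simplification potential, since double-error syndromes are don't cares for the ê_i functions but not for NRE):

```python
# Count assigned vs. free syndromes for decoders of decreasing coverage.
def assigned_syndromes(max_adjacent):
    assigned = {0}
    for L in range(1, max_adjacent + 1):      # correctable adjacent patterns
        for i in range(17 - L):
            assigned.add(syn(tuple(range(i, i + L))))
    for i in range(16):                       # double non-adjacent (NRE = 1)
        for j in range(i + 2, 16):
            assigned.add(syn((i, j)))
    return assigned

for x in (2, 3, 4, 5):                        # SEC-DAEC ... SEC-5AEC decoders
    print(f"{x}AEC: {256 - len(assigned_syndromes(x))} free syndromes")
```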
Continuing with the above example, ê_10 will be the sum of 15 minterms (of 8 variables) if no simplification is applied and the maximum error coverage (SEC-5AEC-DED) is considered. Different matrices may meet the Ultrafast requirements enumerated in Section III.B, and they may result in distinct minimizations and decoding delays. Further research is required to determine whether it is possible to find matrices with better minimizations than matrix (4).
An additional question that may help to design simpler decoders is the necessity (or not) of correcting the parity bits. Of course, errors covered by a code must lead to a correct decoding, independently of the kind (data or parity) of the bits affected. However, what is the result of a correct decoding? Sometimes, only û is required (that is, only data bits; for example, when protecting processor registers, as commented before). Other times it is interesting to obtain b̂ (i.e. data and parity bits; for example, when a scrubbing mechanism is employed in DRAMs). Obviously, correcting the parity bits will require additional silicon area and power consumption, and it may influence the delay of the decoder.
Ultrafast codes could correct parity bits, as all the ê_i signals can be calculated from the lookup table. Nevertheless, as our objective is to maximize the simplification, only data bits are corrected in our designs.
All the techniques presented in this section have been applied in the design of the Ultrafast circuits. In the next section, they will be evaluated and compared to other state-of-the-art codes. For a fair comparison, these techniques have been applied to all codes, when appropriate. For example, all decoders have been implemented to correct data bits only.
IV. EVALUATION AND COMPARISON
A. PREVIOUS CONSIDERATIONS
In previous works, Ultrafast SEC [14] and Ultrafast SEC-DED [15] codes have been compared to other codes in terms of number of parity bits, logic depth and number of logic gates. The number of parity bits measures the redundancy. The logic depth of the encoder and decoder circuits is an estimator of the propagation delay introduced by those circuits. The number of logic gates influences the silicon area occupied and the power consumption of the circuits. These measurements allow an approximate estimation of the efficiency of the codes, but they are less accurate than a real synthesis for a given technology.
So, in this paper, the encoder and decoder circuits for all ECCs have been implemented in the VHDL hardware description language. Then, using CADENCE software [30], we have carried out a logic synthesis for 45-nm technology using the NanGate FreePDK45 Open Cell Library [31], [32]. Standard cells are based on SCMOS design rules. Supply voltage and temperature conditions are 1.1 V and 25 °C, respectively. Logic synthesis allows obtaining a better estimation of the overhead induced by the different ECCs. Although the main objective is diminishing the propagation delay of the circuits, information about silicon area and power consumption is also compared.
As stated above, the simplification techniques employed to improve the performance of the circuits have been applied equally to all codes, when appropriate, for a fair comparison. Synthesis data consider all encoding and decoding steps, including the lookup table implementation. Prior to the synthesis, all compared codes have been simulated, injecting all possible correctable and detectable errors, and we have verified that the planned coverages have been achieved.
B. COMPARISON OF CODES FOR 8-BIT DATA WORDS
In this section, several codes for 8-bit data words are compared: a Hsiao code [20] , a SEC-DED Low Delay code [27] and a SEC-DED Ultrafast code [15] . In addition, the new SEC-xAEC-DED Ultrafast code described in Section III.B is included in two different versions: one with maximum coverage (i.e. SEC-5AEC-DED), and a simplified version with a SEC-DAEC-DED decoder (as described in Section III.D, this technique allows flexibility to adjust the adjacent error coverage to the design requirements).
All the compared codes share the ability to detect double errors. Although their correction coverage is different, our aim is to compare our proposal with simple and fast codes. Fig. 2 compares the propagation delays of their circuits. Three different groups of measurements have been obtained: encoder delays, decoder delays (considering only the correction logic), and decoder delays for the detection logic. The correction logic is in series with the actual circuit logic. For example, when protecting processor registers, the correction logic is in the datapath. Hence, it increases the circuit delay directly [27]. Thus, the delay of the correction logic is an important parameter. The error detection delay is larger in most cases, but it is only used to signal an unrecoverable error. Therefore, it merely must be shorter than the clock cycle.
Regarding the encoder delays, Ultrafast codes achieve the best scores, whereas the Low Delay code obtains the worst result. The Hsiao code outperforms the Low Delay code due to a better balance in the Hamming weights of the rows of the parity check matrix. Anyway, the delays of both codes scale with the word length, whereas Ultrafast codes maintain their delay almost unchanged for longer data words.
Considering the propagation delay of the correction logic of the decoder circuit, the Ultrafast SEC-DED and SEC-DAEC-DED codes obtain the best values. The Ultrafast SEC-DAEC-DED decoder is considerably (over 30%) faster than the Hsiao and Low Delay codes, while increasing the error correction capabilities. The greatest delay is found in the Ultrafast SEC-5AEC-DED code. This is an expected result, as these codes are designed for 8-bit data words, and 5-bit adjacent error correction is a very high coverage. Nevertheless, the delay is in the same order of magnitude, so it would probably be affordable if such error coverage were required.
Although less critical, as stated above, the propagation delay of the detection signal is also included in the figure. Again, the best score is achieved by the Ultrafast SEC-DED code, and all other values are affordable, as explained before.
Fig. 3 shows the silicon area occupied by the encoder and decoder circuits. Although Ultrafast codes present higher redundancy, the area employed by their encoder circuits is slightly smaller than that of the Hsiao or Low Delay codes. Regarding the decoder circuits, the codes with the same error coverage (Hsiao, Low Delay and Ultrafast SEC-DED) occupy a similar area, with small variations. As the error coverage is improved, the decoder size increases, as expected.
Fig. 4 presents the power consumption of the circuits. The trends are the same as observed for the silicon area: all the encoder circuits have very similar values; there are small differences in the decoders with the same error coverage; and the power consumption grows as the error coverage increases.
To conclude this comparison, it is noticeable that the Ultrafast SEC-DED code gets the best scores in almost all categories. When it does not have the best value, the difference is small. The Ultrafast SEC-DAEC-DED code also achieves very good results. If the redundancy is affordable and the error coverage meets the fault hypothesis, these codes can be a good choice. If additional error coverage is required, Ultrafast codes can correct longer adjacent errors. As an example, the SEC-5AEC-DED code compared here shows the upper bounds for the different values. Depending on the application and design requirements (error coverage, speed, etc.), this code (or its 3AEC/4AEC versions) may be an interesting choice.
C. CODES FOR LONGER DATA WORDS
The previous results show the good performance of Ultrafast codes when k = 8. Nevertheless, from the delay viewpoint, their main advantage becomes more evident as k grows. In this section, we study the evolution of propagation delay, silicon area and power consumption, for different codes and typical values of k in registers (8, 16, 32, and 64 bits). In these comparisons, we have analyzed different high-speed codes:
• SEC-DAEC-DED and SEC-5AEC-DED Ultrafast: the (16, 8) compared previously, and (32, 16), (64, 32) and (128, 64) codes obtained as explained in Section III.C.
• SEC-DED Low Delay [27]: in addition to the (13, 8) code used in previous comparisons, we have also included here the (22, 16), (39, 32) and (73, 64) codes.
• SEC-DAEC Low Delay: the codes published in [28] . It is remarkable that these codes do not detect double errors. Anyway, their low delay decoders make them an interesting option for comparison.
• DEC Orthogonal Latin Square [25] : the (20, 8) code described in [33] , the (32, 16) code shown in [34] , the (55, 32) code presented in [35] , and the (96, 64) code defined in [36] .
• SEC-4AEC Two-Dimensional [26]: (20, 8), (32, 16), (56, 32) and (104, 64) codes. They achieve very fast decoding due to a simple but efficient layout.
In addition, we have also obtained synthesis data for (13, 8), (22, 16), (39, 32) and (72, 64) Hsiao codes. They are not included in the figures, as their results are very similar to those of the SEC-DED Low Delay codes.
Fig. 5 plots the delay introduced by the correction logic of the decoder circuits. As expected, the decoding delay of all non-Ultrafast codes rises as k does. This is mainly due to the increasing number of bits to be considered during syndrome computation, and the difficulty of obtaining good simplifications. The worst results for higher values of k are observed for the SEC-DAEC Low Delay codes. The delay of the Ultrafast SEC-5AEC-DED decoder also grows slightly, due to its logic complexity. Anyway, the increment has a significantly smoother slope, with a negligible difference between the 32- and 64-bit data word versions. Although this code has the worst delay when k = 8, the k = 64 version is only slower than the Two-Dimensional and the Ultrafast SEC-DAEC-DED codes, while having higher error coverage than both of them.
Notice the low latencies achieved by the Two-Dimensional codes. The 8-bit version is the fastest code, at the cost of a higher redundancy than Ultrafast codes. Faster results are also obtained by the 16-bit code. Anyway, these codes do not detect double errors, and their latency rises as k does, whereas it does not in Ultrafast codes.
Fig. 6 shows the silicon area employed by the encoder and decoder circuits of the different codes. In this case, the value compared is the sum of the areas of encoders and decoders. In general, codes with more error coverage get higher values, and the area occupied grows proportionally with k. The lowest values correspond to the SEC-4AEC Two-Dimensional codes. Obviously, as they do not implement double error detection, their decoders are very simple. On the other hand, the highest values are obtained for the SEC-5AEC-DED Ultrafast codes. The SEC-DAEC-DED Ultrafast codes reach affordable values, lower than the DEC OLS codes, which have occasionally been employed for register file protection [37].
Fig. 7 presents the evolution of the power consumption for the different data word lengths. Again, the sum of the power consumed by both encoder and decoder circuits is considered. The trends are similar to those observed for the silicon area in Fig. 6: the lowest consumption is achieved by the SEC-4AEC Two-Dimensional codes (remember that they lack detection circuitry), and the highest values belong to the Ultrafast SEC-5AEC-DED codes. The Ultrafast SEC-DAEC-DED codes consume less power than the OLS codes.
To conclude this comparison, we must consider the different error coverages that each code can offer. OLS codes are the only compared codes that can correct all double errors. LD SEC-DED codes correct single errors and detect double errors. LD SEC-DAEC codes correct single and double adjacent errors. Finally, the SEC-4AEC Two-Dimensional codes correct single and up to 4-bit burst errors.
In the case of Ultrafast codes, if we consider the code construction proposed in (6) and (8), the delays will remain the same (or almost the same) while the error coverage grows: due to the alternated columns in the parity check matrices, long adjacent errors are split into shorter adjacent errors that are handled in parallel by the constituent (16, 8) sub-codes, so the decoders keep the complexity of the k = 8 case. As stated above, depending on the error coverage requirements and the constraints on delay, area or power consumption, designers may use different Ultrafast codes. 3AEC/4AEC versions can also be employed to flexibly adjust the desired error coverage.
D. CASE EXAMPLE: A REGISTER FILE
The previous comparisons show the validity of the codes proposed in this paper: Ultrafast codes achieve the lowest delays with moderate area usage and power consumption, probably affordable by several applications. Nevertheless, the data shown only consider the overhead generated by the encoder and decoder circuits. Is the additional memory required to store the redundant bits a problem? As commented previously, it could be a problem for high-capacity memories, but it is less important for small memory structures.
To validate this statement, we have implemented and synthesized a MIPS-like register file. It has 32 registers to store 32-bit data words. It has two read ports and one write port, allowing two reads and one write in the same clock cycle. We have compared a Triple Modular Redundancy (TMR) version and four ECC-based schemes. The first scheme employs the (64, 32) Ultrafast SEC-8AEC-DED code obtained using matrix (4), simplifying the decoder to obtain the SEC-DAEC-DED version, and combined with the construction proposed in (8). The second scheme interleaves four (13, 8) Hsiao SEC-DED codes, allowing the correction of four adjacent errors. The third code employed is the (40, 32) SEC-4AEC code proposed in [21]; and the last scheme is the (56, 32) SEC-4AEC Two-Dimensional code [26]. These last codes correct up to four adjacent errors, but they do not detect all double errors.
The TMR design allows fast write operations, as it has to store the same information three times in parallel. It also has fast read operations, as it only requires 3-bit voters working in parallel. The error coverage is high, as it corrects all possible errors if they occur only in one of the copies of a register. The main drawback of this design is the high redundancy required (200%), as each data bit needs two additional copies.
The Ultrafast-protected scheme has 100% redundancy. The Ultrafast SEC-DAEC-DED code is, to the best of our knowledge, the fastest ECC with double error detection (at least, it is the fastest ECC among those compared in Sections IV.B and IV.C). Using matrix (8), this code becomes SEC-8AEC-DED with the same delays, and a good error coverage.
The Hsiao-interleaved design has 62.5% redundancy. Interleaving simple codes is frequently employed for multiple adjacent error correction. Interleaving a SEC-DED code with distance four, the result is a SEC-4AEC-DED coverage.
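The interleaving argument can be illustrated in a few lines (a hypothetical bit-to-sub-word mapping; the actual wiring may differ):

```python
# Distance-4 interleaving: bit i of the 32-bit word belongs to sub-word
# i % 4, so a burst of up to 4 adjacent bits hits each sub-word at most
# once, and four SEC-DED codes see four independent single errors.
burst = {10, 11, 12, 13}                 # hypothetical 4-bit adjacent error
for w in range(4):
    assert sum(1 for b in burst if b % 4 == w) <= 1   # one error per sub-word
```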
The (40, 32) SEC-4AEC code proposed in [21] has 25% redundancy, the lowest value of all the proposals. The (56, 32) Two-Dimensional code [26] has 75% redundancy.
Table 1 shows the results obtained. The fastest solution is TMR (30% faster than the Ultrafast design), but at the cost of a high power consumption (39% higher) and silicon area usage (44% higher). All the other proposals are slower.
The best area and power records belong to the (40, 32) SEC-4AEC scheme, mainly due to its low redundancy. The Ultrafast solution employs 47% more area, but only 18% more power. In return, the Ultrafast solution is 103% faster and has higher error coverage.
Designers may decide which of the different solutions fits the design constraints and requirements best. Our proposal offers fast decoding and flexible error coverage with moderate area and power overheads.
E. SUMMARY OF METHODS EMPLOYED TO REDUCE THE DELAY
The analysis performed in this section allows concluding that Ultrafast codes achieve the lowest delay with a high error coverage. Table 2 summarizes the main methods employed by the different codes to reduce the delay, together with the component of the ECC that each method improves.
V. CONCLUSION
This paper presents Ultrafast codes, a family of error control codes especially designed for very fast encoding and decoding operations. These codes are particularly well suited for applications where the key requirement is a very low latency in the encoder and decoder circuits. This is the case, for example, of the protection of the information stored in the registers of a microprocessor. As the number of registers is small, the use of ECCs with high redundancy, like Ultrafast codes, may be affordable. In return, Ultrafast codes offer high-speed encoder and decoder circuits, and interesting error coverages. These codes can also be useful to protect high-speed memories or caches.
Firstly, we have summarized the Ultrafast SEC, SEC-DED and SEC-DAEC codes presented in previous works. Then, new Ultrafast SEC-xAEC-DED codes have been introduced, describing the design methodology and the implementation details. Several examples of Ultrafast codes have been implemented and simulated in order to validate the error coverage, and they have been synthesized using a standard cell library to evaluate the overhead induced by the encoder and decoder circuits. Ultrafast codes have been compared to several fast existing alternatives. The results confirm that Ultrafast codes achieve very low propagation delays, while adding very reasonable increments in silicon area and power consumption. Using the method proposed in Section III.C to construct codes for longer data words, delays do not depend on the word length from a logic viewpoint, although a small dependency may appear due to implementation details. This is a distinguishing characteristic of Ultrafast codes versus other error control codes, as they take great advantage of the simplification of logic functions and the low Hamming weights of the rows and columns of their parity check matrices. The speed-ups obtained for longer data words are very interesting (up to 160% for 64-bit data words).
There is still a lot of ongoing work. The simplifications depend on the parity check matrix employed. A deeper study is required to determine how to find the matrix with the best simplifications. Additional research on Ultrafast codes for long data words, and applications to different memory structures, are part of the ongoing and future work.
