Abstract: This paper proposes a methodology, implemented in a tool, to automatically generate the main classes of Error Control Codes (ECCs) widely applied in computer memory systems to increase reliability and data integrity. New code construction techniques extending the features of previous Single Error Correcting (SEC) - Double Error Detecting (DED) - Single Byte Error Detecting (SBD) codes have been integrated in the tool. The proposed techniques construct systematic odd-weight-column SEC-DED-SBD codes with Odd-bit-per-byte Error Correcting (OBC) capabilities to enhance reliability in high speed memory systems organized as multiple-bit-per-chip or card. The proposed tool chooses the best suited error control code for the characteristics of the application and the design constraints, and returns the VHDL description of the encoding/decoding circuits. The tool has been successfully applied to the design of a 64 data bit ECC contained in an ASIC designed for a multiprocessor system.
I. Introduction
To meet the increasing requirements of system reliability and data integrity, Error Control Codes (ECCs) have been widely exploited in the design of computer memory subsystems [1]. The error correcting capabilities of a code represent an effective means of increasing fault tolerance in computer applications: the memory subsystem fails only when the errors exceed the error correcting capabilities of the code. On the other hand, the error detecting capabilities of a code aim at avoiding data loss, thus the ECC should have the capability of detecting the most likely errors that are uncorrectable. The memory organization and the distribution of memory failure modes represent the major factors in defining the best class of codes for a given application. Other factors, such as the number of redundant bits and the area and speed required by the encoding/decoding circuits, have to be considered when choosing a code for a computer memory subsystem.
This work aims at providing a methodology, implemented in a modular and efficient tool, to automatically design ECCs and the corresponding encoding/decoding logic from the code specifications. The proposed tool, called GECO (GEnerator of COdes), integrates the main classes of codes suitable for high performance computer applications, as identified in the literature. However, its modularity allows an easy introduction of new classes of codes. In particular, the tool automatically generates the parity check matrix, the VHDL description of the encoder/decoder and the related performance figures.
The tool can be used in parallel to any VHDL-based commercial design flow, since the generated VHDL models can be considered as a set of models in the VHDL-based hierarchical design description at system level. The methodology allows the evaluation of architectural trade-offs according to predefined optimization criteria such as area, clock frequency and power consumption. To increase the reliability level of a computer memory system with respect to systems employing conventional SEC-DED-SBD codes, the authors proposed new code construction techniques [12] providing systematic odd-weight-column SEC-DED-SBD codes in which the class of correctable errors also includes any odd weight error pattern in a single byte, at the cost of added redundancy. A few additional check bits (at most four) are required by the proposed codes with respect to SEC-DED-SBD codes in order to extend the protection to the correction of at least 50% of the possible multiple errors per byte. However, such an overhead is reasonable with respect to the redundancy of Single Byte Error Correcting - Double Byte Error Detecting (SBC-DBD) codes: the proposed codes require almost half of the redundant bits required by SBC-DBD codes for b > 8.
This paper extends the works presented in [13] and in [14], and also shows that the proposed codes are suitable for high performance VLSI implementations in computer applications, by using high speed encoding/decoding circuits and parallel data processing.
The paper is organized as follows. Section II proposes an overview of the methodology and the derived architecture of the software advisor, while the main strategies applied to implement the code classes considered in GECO are examined in Section III. The same section briefly describes the new code construction techniques (a formal description of these techniques can be found in [12]), while Section IV shows the main advantages offered by these codes and how they can be implemented as VLSI circuits. Finally, some application results are given in Section V.
II. Automatic insertion of ECCs into VLSIs
The aim of the proposed tool GECO is to provide system designers with an easy-to-use software advisor that reduces the design time during the development of ECCs and the corresponding logic. The user-friendly interactive interface allows the user to choose among the code classes and code parameters, mainly the number of data bits (k), the number of check bits (r) and the byte length (b). GECO has been developed by using the object oriented methodology and the C++ language. Figure 1 schematically represents the GECO architecture, which is composed of the following main modules: the User Interface, the Controller, the Generator, the Separator, the VHDL Translator and the Internal Functions Managers, such as the Galois field Manager.
A. The Controller
According to the object oriented approach, the Controller is a virtual class from which one derived class per class of codes has been defined. Each Controller class contains the knowledge on the parameter relations and the possible code construction techniques. When the User Interface receives a request from the user to change the value of a code parameter, the Interface forwards the request to the Controller. First the Controller verifies that the request is in the allowed parameter range, then the relations among the code parameters are recomputed. When the parameter values satisfy the user, the Controller calls the Generator to activate the corresponding optimal construction algorithm.
B. The Generator
The Generator module consists of a set of functions implementing the code construction algorithms for every code class and parameter configuration. Up to now, the five classes of codes described in Section III are included in the Generator. Some algorithms use specialized objects to implement different algebras such as the Galois field algebra.
The power representation is useful for multiplication, since the product of two elements $\alpha^i$ and $\alpha^j$, where $\alpha$ is a primitive element of GF($2^m$), can be performed by adding their exponents and considering that $\alpha^{2^m - 1} = 1$. Conversely, the polynomial representation and its corresponding m-tuple representation are useful for addition, since the sum of two elements can be obtained by adding the corresponding components of their m-tuple representations, using modulo-2 addition.
The duality in the representation has been implemented in the GF($2^m$) Manager: the generic nonzero element of GF($2^m$ [...] Any function included in GECO can interact with the GF($2^m$) Manager, asking for elements of GF($2^m$) and for functions representing the possible operations among the elements.
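The dual representation described above can be illustrated with a minimal Python sketch. It is not the GECO implementation (which is written in C++); the class name `GF2m`, its methods and the choice of the primitive polynomial $x^4 + x + 1$ are assumptions made only for this example. Elements are stored as m-tuples packed into integers, and exponent/logarithm tables provide the power view.

```python
class GF2m:
    """Toy GF(2^m) offering both the power and the m-tuple representation."""

    def __init__(self, m, primitive_poly):
        # primitive_poly is a bit mask, e.g. 0b10011 for x^4 + x + 1 (assumed).
        self.m = m
        self.order = (1 << m) - 1      # alpha^(2^m - 1) = 1
        self.exp = [0] * self.order    # exp[i] = m-tuple of alpha^i
        self.log = {}                  # log[m-tuple] = exponent i
        x = 1
        for i in range(self.order):
            self.exp[i] = x
            self.log[x] = i
            x <<= 1                    # multiply by alpha
            if x >> m:                 # degree reached m: reduce modulo the polynomial
                x ^= primitive_poly

    def add(self, a, b):
        # m-tuple view: component-wise modulo-2 addition.
        return a ^ b

    def mul(self, a, b):
        # Power view: add the exponents modulo 2^m - 1.
        if a == 0 or b == 0:
            return 0
        return self.exp[(self.log[a] + self.log[b]) % self.order]


gf16 = GF2m(4, 0b10011)
# alpha^7 * alpha^10 = alpha^17 = alpha^2, because alpha^15 = 1.
product = gf16.mul(gf16.exp[7], gf16.exp[10])
```

Keeping both tables makes multiplication a table lookup plus one modular addition, while addition remains a single bitwise XOR on the m-tuples.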
D. The Separator
The Separator algorithm transforms a non-systematic code into a systematic or at least a separable code. This operation is always possible [3], [11]; thus all code construction schemes generating non-systematic codes call the Separator as their last operation to derive the final parity check matrix in systematic or, at least, separable form. In a separable code the information bits appear unchanged in the codeword, but they need not occupy the leftmost $(n - r)$ positions of the codeword, as they do in systematic codes. In either case the information bits are kept separate from the check bits, so that the encoding/decoding and the data processing can be performed in parallel and all information bits read out of the memory appear unchanged.
As a basic definition, a binary linear block code C can be described as the null space of the binary vector space generated by the row vectors of an $(r \times n)$ matrix called the parity check matrix and denoted $H_{(r \times n)}$, with n being the length of the codeword composed of k data bits and r check bits. An n-bit row vector X is a codeword in C if and only if $H X^T = 0$, where $X^T$ denotes the transpose of X. When the H matrix is expressed as $H = [B \; I_r]$, with B an $(r \times k)$ matrix and $I_r$ the $(r \times r)$ identity matrix, the code is called systematic. Given r, the Separator algorithm searches for a set of r columns of $H_{(r \times n)}$ forming a submatrix $A_{(r \times r)}$ that is non-singular, thus invertible. Once the submatrix A is obtained, the inverse matrix $A^{-1}$ is computed and finally the product $A^{-1} H$ is performed to obtain the parity check matrix in separable/systematic form.
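The Separator steps described above (find r columns forming a non-singular A, invert it, compute $A^{-1} H$) can be sketched in Python as follows. This is an exhaustive-search illustration under the assumption that H is given as a list of 0/1 rows; the function names `gf2_inverse`, `gf2_matmul` and `separate` are hypothetical and are not taken from GECO.

```python
from itertools import combinations


def gf2_inverse(a):
    """Invert a square 0/1 matrix over GF(2); return None if it is singular."""
    r = len(a)
    # Augment with the identity and run Gauss-Jordan elimination.
    aug = [row[:] + [int(i == j) for j in range(r)] for i, row in enumerate(a)]
    for col in range(r):
        pivot = next((i for i in range(col, r) if aug[i][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(r):
            if i != col and aug[i][col]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[col])]
    return [row[r:] for row in aug]


def gf2_matmul(a, b):
    """Matrix product over GF(2)."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)]
            for row in a]


def separate(h):
    """Return (A^-1 H, chosen column indices), or None if rank(H) < r."""
    r, n = len(h), len(h[0])
    for cols in combinations(range(n), r):
        a = [[row[c] for c in cols] for row in h]
        a_inv = gf2_inverse(a)
        if a_inv is not None:
            return gf2_matmul(a_inv, h), cols
    return None


# Non-systematic (7, 4) Hamming matrix: column j is the binary form of j + 1.
H_example = [[0, 0, 0, 1, 1, 1, 1],
             [0, 1, 1, 0, 0, 1, 1],
             [1, 0, 1, 0, 1, 0, 1]]
H_sep, check_cols = separate(H_example)
```

In the result, the columns listed in `check_cols` form an identity matrix, so they can serve as check bit positions while the remaining columns carry the data bits unchanged.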
The algorithm can require a large amount of computation time for high values of n or r: in fact, for an $H_{(r \times n)}$ matrix it can require, in the worst case, the computation of the rank of $\binom{n}{r}$ matrices of size $(r \times r)$. Thus, for some classes of codes, heuristic methods have been identified and implemented, as in the case of the SEC-DED-SBD codes proposed by Reddy in [3]. The method derives the columns of the A submatrix by selecting the b columns of a generic byte of the parity check matrix H and the first column of each of the other bytes of H, until r columns have been identified. Then the method checks whether the A matrix is invertible; otherwise the procedure is repeated with another group of b columns, and so on.
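The column selection heuristic described above can be sketched as a generator of candidate column sets. This is only an illustration under two stated assumptions: bytes occupy consecutive groups of b columns, and the other bytes are visited in index order (the text does not fix the order). The function name `heuristic_column_sets` is hypothetical.

```python
def heuristic_column_sets(n, r, b):
    """Yield candidate sets of r column indices, one per choice of full byte."""
    num_bytes = n // b
    for chosen in range(num_bytes):
        # All b columns of the chosen byte.
        cols = list(range(chosen * b, (chosen + 1) * b))
        for other in range(num_bytes):
            if len(cols) == r:
                break
            if other != chosen:
                cols.append(other * b)   # first column of another byte
        if len(cols) == r:
            yield cols


# Example: n = 16 columns, b = 4 bits per byte, r = 6 check bits.
candidates = list(heuristic_column_sets(16, 6, 4))
```

Each candidate would then be tested for invertibility over GF(2), and the first non-singular one would be taken as A. In this sketch at most n/b candidates are examined, instead of every possible choice of r columns.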
E. The VHDL Translator
Finally, the VHDL Translator reads the code characteristics and the parity check matrix and generates the VHDL hierarchical description of the basic blocks implementing the code: check bit generator, syndrome generator, syndrome decoder and error corrector. Two architecture bodies are defined for each of these entities, to have both structural and behavioral descriptions. A deeper decomposition of these entities into lower level entities results in a hierarchy of design entities to be inserted in a system level VHDL description as ECC building blocks.
III. Implementation of code construction techniques
The main features of the five classes of codes considered in GECO are examined in this section, to outline their application advantages and the implementation strategies adopted. In general, the overall complexity of the parity check circuits required by the given codes can be roughly estimated by examining the structure of the parity check matrix H. Basically, the global number of 1s in H determines the complexity of the hardware required: a lower number of 1s requires a less complex circuit [2]. In particular, in systematic codes the total number $t_i$ of 1s in the i-th row of H is related to the number of logic levels necessary to generate the corresponding check bit ($C_i$) or syndrome bit ($S_i$), as described in [16].
Assuming the use of v-input modulo-2 adders, the numbers of logic levels required to generate $C_i$ and $S_i$ are respectively given by $l_{C_i} = \lceil \log_v (t_i - 1) \rceil$ and $l_{S_i} = \lceil \log_v t_i \rceil$, where $\lceil x \rceil$ indicates the smallest integer greater than or equal to x. Therefore, to obtain the fastest generation of check and syndrome bits, all $t_i$ for $i = 1, 2, \ldots, r$ should be minimum and equal, or as close as possible, to the average number given by the total number of 1s in H divided by the number of rows (r). A class of codes satisfying such criteria is called a minimum-equal-weight code [2]. Finally, the n-modularized codes, whose encoding/decoding logic can be organized as n identical modules [10], have been preferred for their implementation advantages.
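As a worked illustration of the two formulas above, the following Python sketch computes $l_{C_i}$ and $l_{S_i}$ for each row of a parity check matrix. The helper names `ceil_log` and `logic_levels` are assumptions introduced for this example, and integer arithmetic is used instead of floating-point logarithms to avoid rounding problems. Rows are assumed to contain at least two 1s.

```python
def ceil_log(x, v):
    """Smallest integer L such that v**L >= x (exact integer arithmetic)."""
    levels, reach = 0, 1
    while reach < x:
        reach *= v
        levels += 1
    return levels


def logic_levels(h, v):
    """Per-row (l_Ci, l_Si) for parity check matrix h and v-input XOR modules."""
    result = []
    for row in h:
        t = sum(row)                       # t_i: number of 1s in the row
        result.append((ceil_log(t - 1, v), ceil_log(t, v)))
    return result


# A row with t_i = 5 and 2-input adders: 4 data inputs need 2 levels for the
# check bit, while the 5 inputs of the syndrome bit need 3 levels.
levels = logic_levels([[1, 1, 1, 1, 1, 0, 0, 0]], 2)
```

The example shows why a row weight just above a power of v is costly: one extra 1 in the row adds a whole logic level to the syndrome path.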
A. SEC-DED CODES
The Hsiao codes [16] have been implemented, since they satisfy the criteria of minimum-equal-weight codes, the H matrix is systematic and the code is an odd-weight-column code. A code C is said to be an odd-weight-column code [16] if there exists a parity check matrix for C composed entirely of column vectors of odd weight, where the weight of a vector is the number of its nonzero bits. The Hsiao codes represent the optimal minimum odd-weight-column SEC-DED codes.
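The odd-weight-column property can be illustrated with a simplified Python sketch that assembles a systematic $H = [B \; I_r]$ from distinct odd-weight columns, lightest weights first. It is not the Hsiao construction itself: columns are taken in plain lexicographic order, without the row balancing needed in general to reach a minimum-equal-weight matrix. The function names are hypothetical.

```python
from itertools import combinations


def odd_weight_columns(r, k):
    """Return k distinct odd-weight r-bit data columns, lightest weights first."""
    cols = []
    for weight in range(3, r + 1, 2):      # weight 1 is reserved for check bits
        for ones in combinations(range(r), weight):
            if len(cols) == k:
                return cols
            cols.append([int(i in ones) for i in range(r)])
    if len(cols) < k:
        raise ValueError("r is too small for k data bits")
    return cols


def build_h(r, k):
    """Assemble H = [B I_r] as a list of r rows."""
    data = odd_weight_columns(r, k)
    identity = [[int(i == j) for i in range(r)] for j in range(r)]
    columns = data + identity
    return [[col[i] for col in columns] for i in range(r)]


h84 = build_h(4, 4)                        # an (8, 4) SEC-DED code
column_list = [tuple(col) for col in zip(*h84)]
```

Because every column has odd weight, the sum of any two columns has even weight and therefore cannot equal any single column: double errors are never mistaken for single errors. For these small parameters the rows happen to have equal weight, but that is not guaranteed by this simplified selection.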
B. SEC-DED-SBD CODES
SEC-DED-SBD codes are useful to maintain data integrity in byte-organized memories, where the probability of byte errors is high. Both Reddy codes [3] and Chen codes [5] can be generated by GECO. Since these codes are non-systematic, the Separator is called to convert them into a separable form. For a given set of values (k, r, b), the construction technique of $H_{(r \times n)}$ for Reddy codes is based on the composition of as many submatrices $H_i$ of size $(r \times b)$ as the number of bytes in the codeword. Each submatrix $H_i$ is composed of two submatrices following two techniques, for b even or odd respectively, as illustrated in [3]. Chen codes can be constructed following several techniques, as described in [5]. In particular, the first two techniques can be applied to a predefined SEC-DED-SBD code to obtain a new code with the same properties, but with (r + 1) redundant bits and (b + 1) bits per byte respectively.
C. SEC-DED-SBD-OBC CODES
The new class of SEC-DED-SBD-OBC codes described in [12] has been included in GECO, since it extends the protection provided by SEC-DED-SBD codes by adding few redundant bits and offers implementation advantages, as shown in Section IV. As in the Reddy codes, the construction technique of $H_{(r \times n)}$ is based on the composition of as many submatrices $H_i$ of size $(r \times b)$ as the number of bytes in the codeword. Depending on the values of (r, b), three techniques have been defined to obtain the $H_i$, as in [12].
In particular, the first technique ($C_1$) requires $r = 2b$, the second technique ($C_2$) requires $r > 2b$ and the third technique ($C_3$) requires $b + 2 \le r < 2b$. In $C_1$, the matrix $B_{(r \times k)}$ is defined as described in [12].
E. DEC-TED CODES
For a memory sub-system with large capacity or with a high rate of hard and soft errors, the use of a t Error Correcting - d Error Detecting (tEC-dED) code can be effective. However, such a code requires high redundancy, thus in practical applications Double Error Correcting - Triple Error Detecting (DEC-TED) codes are preferred. A class of DEC-TED codes can be constructed according to the theory of BCH codes [15] which, for a given set of parameter values (k, r and $d_{min}$), can be obtained as cyclic codes from a table containing the corresponding generator polynomial. Then the generator matrix G of the code can be derived in a straightforward manner, and the parity check matrix can be computed from it through the relation $G H^T = 0$. To simplify the computation of the parity check matrix, it is convenient to transform the G matrix into the systematic form $G_{(k \times n)} = [I_k \; P_{(k \times r)}]$. In this case, it is possible to prove that the H matrix is in systematic form and is given by $H_{(r \times n)} = [P^T_{(k \times r)} \; I_r]$.
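The relation between the systematic generator matrix and the systematic parity check matrix can be checked numerically with a short Python sketch. The matrix P used below is that of a (7, 4) Hamming code, chosen only because it is small; it is not a DEC-TED code, and the function names are assumptions made for this example.

```python
def systematic_pair(p):
    """Given P (k x r), return G = [I_k P] and H = [P^T I_r] over GF(2)."""
    k, r = len(p), len(p[0])
    g = [[int(i == j) for j in range(k)] + p[i] for i in range(k)]
    h = [[p[i][j] for i in range(k)] + [int(j == c) for c in range(r)]
         for j in range(r)]
    return g, h


def gf2_product_transpose(g, h):
    """Compute G H^T over GF(2)."""
    return [[sum(a & b for a, b in zip(g_row, h_row)) % 2 for h_row in h]
            for g_row in g]


P = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
G, H = systematic_pair(P)
zero = gf2_product_transpose(G, H)     # every entry is P[i][j] + P[i][j] = 0
```

Each entry of $G H^T$ is the sum of the same element of P taken twice, once through $I_k$ and once through $I_r$, which is why the product vanishes modulo 2 for any P.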
IV. VLSI implementation of the proposed codes
Coding for high performance computer systems requires design techniques aiming at not only high reliability, but also high speed encoding/decoding and correction circuits and parallel data manipulation to maintain high throughput. In this section, the main features of the proposed codes are examined to outline their advantages from the VLSI implementation point of view.
The implementation advantages offered by the proposed codes mainly relate to the fact that the codes are systematic odd-weight-column SEC-DED codes with additional byte error detection and partial correction capabilities and a modular structure. Because the codes are systematic, the information bits are separated from the check bits; therefore the encoding/decoding and the data processing can be performed in parallel.
The proposed codes do not satisfy the criteria of minimum-equal-weight codes, unlike the Hsiao codes [16]. However, whenever it is necessary to control a number of data bits lower than the maximum allowed for a given number of check bits, shortened codes [2] can be simply derived from the proposed H matrix by discarding some selected sub-matrices $B_i$ with the purpose of keeping all $t_i$ minimum and equal, thus satisfying the criteria of minimum-equal-weight codes.
Another class of codes suitable for VLSI implementation is the class of modularized codes, whose encoder/decoder can be partitioned into two or more identical modules. In n-modularized codes [10], the parity check matrix can be divided into n parts, called modules, composed of the same row vectors, but placed in different positions within the module. The n modules have the property that the same logic block can be applied to each of them, resulting in great flexibility and simplicity during the VLSI implementation. In particular, the codes defined by $C_1$ are 2-modularized codes, as follows from the structure of their H matrix.
The proposed high-speed parallel encoding-decoding logic consists of four main blocks: check bit generator, syndrome generator, syndrome decoder and error corrector. The check bit and syndrome generator blocks are constituted by trees of Exclusive-OR gates. The number of inputs of the Exclusive-OR tree for the generation of the i-th check bit corresponds to the number of 1s in the corresponding row of H minus 1, while the number of inputs of the Exclusive-OR tree for the generation of the i-th syndrome bit corresponds to the number of 1s in the respective row of H.
The syndrome decoder is constituted by two main blocks. The first block, SYNDEC, decodes the syndromes to generate the correction patterns for the single-bit and odd-bit-per-byte errors. The second block, SYNCNT, decodes the syndromes and counts the number of asserted syndrome bits. Due to the 2-modularized structure of the H matrix, the SYNDEC block can be described by instantiating the same logic block twice. Figure 2 shows an example of the SYNDEC block for the $H_M$ matrix. The area required is 496 equivalent gates (2-input NAND gates), while the number of gates necessary to obtain each Bit Error Pointer is 7.75 equivalent gates. The propagation delay to obtain the Bit Error Pointers from the syndromes is four gate levels.
The gate count of the proposed code represents approximately a 6% decrease compared to the conventional decoding logic of a (64, 56) minimum odd-weight-column SEC-DED code able to correct just single errors. The increase required by the proposed code, in terms of propagation delay, corresponds to only one gate level.
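The data flow through the four blocks (check bit generator, syndrome generator, syndrome decoder, error corrector) can be mimicked in software with the following Python sketch. It uses a small (8, 4) odd-weight-column SEC-DED matrix chosen for this example, not the proposed SEC-DED-SBD-OBC codes, so only single-bit correction is shown; odd-bit-per-byte correction is not modeled. All names are hypothetical.

```python
# Systematic H = [B I_4] with odd-weight columns (assumed example matrix).
H = [[1, 1, 1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 1, 0, 0],
     [1, 0, 1, 1, 0, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 0, 1]]
R, N = len(H), len(H[0])
K = N - R


def check_bits(data):
    """Check bit generator: C_i is the XOR of the data bits selected by row i."""
    return [sum(H[i][j] & data[j] for j in range(K)) % 2 for i in range(R)]


def syndrome(word):
    """Syndrome generator: S_i is the XOR of all word bits selected by row i."""
    return [sum(H[i][j] & word[j] for j in range(N)) % 2 for i in range(R)]


def correct(word):
    """Syndrome decoder and error corrector for single-bit errors."""
    s = syndrome(word)
    asserted = sum(s)                  # role of the syndrome bit counter
    if asserted == 0:
        return word, "no error"
    if asserted % 2 == 1:
        for j in range(N):             # bit error pointer: column equal to s
            if [H[i][j] for i in range(R)] == s:
                fixed = word[:]
                fixed[j] ^= 1
                return fixed, "corrected"
    return word, "uncorrectable"


data = [1, 0, 1, 1]
codeword = data + check_bits(data)
```

In this example each check bit is the XOR of three data bits, one fewer than the four 1s in its row of H, which matches the input counts given above for the check bit and syndrome trees.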
Finally, the SYNCNT block receives the syndromes as inputs and recognizes the number of asserted syndrome bits. An example of the logic structure of the SYNCNT block having four syndromes as inputs is shown in Figure 3. Five mutually exclusive output lines constitute the block output: when all syndromes are equal to zero, only the output ZERO is active; when just one syndrome is equal to one, only the output ONE is active; and so on.
Several classes of codes suitable for high performance computer applications have been examined, and new coding schemes have been proposed to extend the protection provided by previous SEC-DED-SBD codes. The new techniques construct systematic odd-weight-column SEC-DED-SBD codes in which the class of correctable errors includes any odd weight error pattern within a single byte. The design of codes is supported by an automatic tool which generates, for a given set of code parameters, the parity check matrix and the VHDL description of the logic blocks implementing the code.
This tool has been used to design a 64 data bit error control code inserted in an ASIC developed by Bull Information Systems in the R&D Labs of Pregnana (Italy) for a shared memory multiprocessor system [17] based on the PowerPC architecture. The ASIC implements the data crossbar architecture among the main memory, four data channels to processors and the I/O channel, and it has been completely described using VHDL. The main logic blocks of the ASIC are the data path for data multiplexing, the control logic realized as a set of Finite State Machines, the ECC logic with 64 data bits and 8 redundant bits, and the testability logic to support the standard JTAG IEEE 1149.1 and the ATPG. The simulation was executed at different abstraction levels: system, chip and internal block levels. The simulation patterns and the expected outputs for the logical verification of the device were completely written in VHDL; a set of patterns to verify the ECC logic was automatically generated and checked using a program written in the C language. The ASIC was manufactured using the 0.7 μm gate array technology supplied by LSI Logic Corporation and its main characteristics are reported in Figure 4. The ASIC successfully operated at full speed (75 MHz) at the first run.
