Abstract-This paper proposes a general and systematic code design method to efficiently combine constrained codes with parity-check (PC) codes for optical recording. The proposed constrained PC code includes two component codes: the normal constrained (NC) code and the parity-related constrained (PRC) code. They are designed based on the same finite state machine (FSM). The rates of the designed codes are only a few tenths of a percent below the theoretical maximum. The PC constraint is defined by the generator matrix (or generator polynomial) of a linear binary PC code, which can detect any type of dominant error events or error event combinations of the system. Error propagation due to parity bits is avoided, since both component codes are protected by PCs. Two approaches are proposed to design the code in the non-return-to-zero-inverse (NRZI) format and the non-return-to-zero (NRZ) format, respectively. Designing the codes in NRZ format may reduce the number of parity bits required for error detection and simplify post-processing for error correction. Examples of several newly designed codes are illustrated. Simulation results with the blu-ray disc (BD) systems show that the new d = 1 constrained 4-bit PC code significantly outperforms the rate 2/3 code without parity, at both nominal density and high density.
I. INTRODUCTION

MODULATION codes [1], [2], also known as constrained codes, are used in recording systems to translate an arbitrary sequence of user data into a sequence with special properties required by the systems. They are usually characterized by the so-called (d,k) constraints, or runlength constraints [2]. Binary sequences satisfying the (d,k) constraints have at least d and at most k, k > d, '0's between successive '1's. These constraints mitigate the problems of intersymbol interference and inaccurate timing. For optical recording, modulation codes also need to have the dc-free property [3], [4], i.e. they should have almost no content at very low frequencies. The dc-free constraint avoids interference between data and servo signals, and also permits filtering of low-frequency disc noise.
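As a rough illustration of the (d,k) constraint defined above, the following Python sketch checks the runs of '0's between successive '1's in a finite binary sequence. The function name `satisfies_dk` is our own, and ignoring the leading and trailing zero runs (which depend on the neighbouring codewords) is an assumption of this sketch.

```python
def satisfies_dk(bits, d, k):
    """Check that every run of 0s between successive 1s has length in [d, k].

    Leading and trailing zero runs are not checked here (assumption of this
    sketch), since their validity depends on the adjacent codewords.
    """
    ones = [i for i, b in enumerate(bits) if b == 1]
    for left, right in zip(ones, ones[1:]):
        run = right - left - 1  # number of 0s strictly between two 1s
        if run < d or run > k:
            return False
    return True


ok_d1 = satisfies_dk([0, 1, 0, 1, 0, 0, 1], d=1, k=7)   # runs of 1 and 2 zeros
bad_d1 = satisfies_dk([0, 1, 1, 0, 0], d=1, k=7)        # adjacent 1s violate d = 1
bad_k = satisfies_dk([1, 0, 0, 0, 1], d=1, k=2)         # run of 3 zeros violates k = 2
```
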
Development of 'efficient and powerful channel codes' is key to ensuring good reception performance under aggressive recording conditions. For optical recording, the d constraint has been reduced from d = 2 [3] , [4] used in the compact disc (CD) and digital versatile disc (DVD) to d = 1 [5] , [6] in the blu-ray disc (BD) or high-definition digital versatile disc (HD-DVD). In [7] , Immink et al. have introduced a new family of finite-state modulation codes with d = 1 or d = 2 constraints, whose rates are higher than those of the standard codes, and are very close to the Shannon capacity.
To further improve the system performance at high recording densities, the combination of Reed-Solomon (RS) outer codes and parity-check (PC) inner codes in conjunction with post-processing [8], [9], [10] has in recent years found wide acceptance in magnetic recording systems, since the performance-complexity trade-off offered by these codes is very attractive and affordable. This approach also shows high potential for optical recording systems [11], [12]. The PC code is an inner error correction code (ECC), which can detect the specific dominant error events (i.e. the most likely error events that can occur) at the output of the channel detector, using only a few parity bits. For error correction, the matched-filtering type post-processor that combines syndrome and soft-decision decoding [10], [12] is widely used due to its simplicity. Cyclic redundancy check (CRC) codes [13] are simple and efficient error detecting PC codes, and have been used in [9], [10], [12]. In [14], a specific event error detection code is proposed, which can detect any error event from an arbitrary list of error events.
When the PC constraints are imposed on the data, the modulation constraints should be satisfied simultaneously. This will result in additional code rate loss. The minimum overhead is one user bit per parity bit. Equivalently, $1/R_c$ channel bits are needed per parity bit, where $R_c$ is the rate of the constrained code [11]. For example, for the rate 2/3 d = 1 codes [5], [6] used in BD and HD-DVD, the minimum feasible overhead is 1.5 channel bits per parity bit. Let there be p parity bits per codeword of length n. Then, the capacity of constrained PC codes is given by

$$C_{pc} = C - \frac{p}{n}, \tag{1}$$

where C denotes the capacity of the constrained code without parity.
There have been several attempts in recent years to efficiently combine constrained codes with PC codes. For example, in the scheme described by Perry et al. [15] , a constrained data sequence is parsed into shorter blocks of equal length, and a parity data block is inserted between each pair of these blocks. The data and parity blocks are connected such that the modulation constraints are not violated. The major disadvantage of this scheme is that it can only correct specific mixed-type errors.
Gopalaswamy and Bergmans [16] proposed concatenated coding to construct modulation codes with error detection capability. In this scheme, the PC information is first calculated for each constrained data block. This information is then encoded by a standard constrained encoder and appended to the end of the corresponding data block. In this way, the proposed scheme achieves high coding efficiency. For the rate 2/3 d = 1 code, a parity bit requires 1.5 channel bits. However, in this scheme, the channel bit-stream corresponding to the parity bits is not protected by PCs. Therefore, errors occurring in this portion may cause further errors during decoding. This results in error propagation.
0090-6778/08$25.00 © 2008 IEEE
A combi-code scheme proposed by Coene et al. [11] achieves high efficiency similar to [16], and avoids the parity-bit related error propagation. In this scheme, the constrained PC code consists of two sliding block codes, which are designed to detect single-bit transition shift errors (i.e. a '1' in the (d, k) constrained channel bit-stream is shifted a single place to the left or the right). Because the two constituent codes are based on the same FSM, no additional channel bits are needed for stitching the two codes together. By using this scheme, efficient PC codes with d = 2 constraint, which achieve 2 channel bits per parity bit, have been designed. However, efficient combi-codes with d = 1 constraint are not available. Furthermore, this scheme is not general enough to detect any arbitrary error event or error event combinations. The efficiency of this approach can be further improved by using the method proposed in [7].
In the write path of a data storage system, a precoder, i.e. a modulo-2 integration operation, converts the binary outputs of the constrained encoder into a corresponding modulated signal, which is then stored on the storage medium. The constrained encoded bits before and after the precoder are referred to as a non-return-to-zero-inverse (NRZI) sequence, and a non-return-to-zero (NRZ) sequence, respectively. Most of the prior art schemes design codes in the NRZI format [15] , [16] , [11] , [17] .
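The precoder described above can be sketched in a few lines of Python. This is only an illustration: the function names `precode` and `inverse_precode`, and the convention that the initial NRZ bit defaults to 0, are assumptions of this sketch rather than part of the paper.

```python
def precode(nrzi, initial=0):
    """NRZI -> NRZ by modulo-2 integration: y[k] = y[k-1] XOR x[k]."""
    out, prev = [], initial
    for x in nrzi:
        prev ^= x          # a '1' in NRZI toggles the NRZ level
        out.append(prev)
    return out


def inverse_precode(nrz, initial=0):
    """NRZ -> NRZI by modulo-2 differentiation: x[k] = y[k] XOR y[k-1]."""
    out, prev = [], initial
    for y in nrz:
        out.append(y ^ prev)
        prev = y
    return out


nrzi_word = [0, 1, 0, 0, 1, 0, 1, 0]
nrz_word = precode(nrzi_word)             # [0, 1, 1, 1, 0, 0, 1, 1]
recovered = inverse_precode(nrz_word)     # equals nrzi_word again
```

Each '1' in the NRZI sequence marks a transition in the NRZ sequence, so the d constraint on the NRZI sequence translates into a minimum distance between transitions of the stored signal.
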
In this paper, a novel code design technique is proposed that overcomes all the drawbacks of the prior art schemes. The designed constrained PC codes can detect any type of dominant error events or error event combinations in optical recording systems, without introducing the parity-bit related error propagation. The rates of the designed codes are only a few tenths of a percent below the capacity. Two approaches are proposed to design constrained PC codes either in NRZI format or in NRZ format. Designing the codes in NRZ format is found to be preferable for the PC code and post-processing based detection approach. In addition, although this paper focuses on designing codes for optical recording, the proposed code design technique is general, and can encompass other recording channels, such as the magnetic recording channels. This technique can also be generalized to combine constrained codes with other types of ECCs (e.g. the RS codes). However, this is beyond the scope of this paper.
This paper is organized as follows. Section II presents the general principle of the new code design approach. Detailed methods for designing codes in NRZI and NRZ format are presented in Section III and Section IV, respectively. In Section V, examples of several newly designed efficient codes are illustrated. Their performances are presented in Section VI. The paper is concluded in Section VII.
II. GENERAL PRINCIPLE OF THE NEW CODE DESIGN APPROACH
The general principle of the new code design is as follows. A segment of user data, which is typically the binary output of an RS-ECC encoder, is partitioned into several data words. All the data words except the last one are encoded by any suitable finite state constrained encoder, such as that proposed in [7]. The resulting codewords are referred to as "normal constrained (NC) codewords". The last data word is encoded by a parity-related constrained encoder, and the resulting codeword is referred to as the "parity-related constrained (PRC) codeword". In particular, the PRC encoder maps the last data word into a specific codeword chosen from a candidate codeword set, so that a certain PC constraint is realized over the combined codeword, which is a concatenation of the sequence of NC codewords and the PRC codeword. This PC constraint corresponds to a predetermined generator matrix, which can be defined to detect any type of error events in the system. The corresponding details can be found in [13], [14]. For ease in imposing the modulation constraints, the generator matrix needs to be designed to generate a systematic PC code. This code design principle is based on the following proposition.
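Proposition 1: Consider the encoder for a [l, n] systematic linear binary PC code C, which transforms an n-bit information word into an l-bit codeword, with p = l − n being the number of parity bits. Let $u_1$ and $u_2$, respectively, denote row vectors with $n_1$ bits and $n_2 = n - n_1$ bits consisting of a sequence of NC codewords and a PRC codeword, with $0 < n_1 < n$. Let $p_1$ and $p_2$ denote the parity bits of $[u_1 \mid 0_{n_2}]$ and $[0_{n_1} \mid u_2]$, respectively, where $0_j$ is the all-zero row vector of length j. If $p_1 = p_2$, then $[u_1 \mid u_2 \mid 0_p]$ is a codeword of C.

Proof: Let G = [I P] be a generator matrix that describes the encoder of C, where I is the n × n identity matrix, and P is an n × p matrix. The parity bits of $[u_1 \mid 0_{n_2}]$ and $[0_{n_1} \mid u_2]$ are $p_1 = [u_1 \mid 0_{n_2}]P$ and $p_2 = [0_{n_1} \mid u_2]P$, respectively. Thus,

$$[u_1 \mid u_2]\,G = \big[\,u_1 \mid u_2 \mid [u_1 \mid u_2]P\,\big] = [\,u_1 \mid u_2 \mid p_1 + p_2\,] = [\,u_1 \mid u_2 \mid 0_p\,], \tag{2}$$

where the addition is modulo 2.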
The structure of our constrained PC code, thus, includes two component codes: the NC code and the PRC code. Both codes serve as information words of the PC code C. The NC codewords are first constructed and connected. The parity bits of the sequence of NC codewords (with $n_2$ trailing zeros appended) are then computed. After that, a specific PRC codeword, which produces the same parity bits when $n_1$ leading zeros are appended, is selected from a candidate codeword set and concatenated directly with the NC codewords, thus forming the combined constrained PC codeword. The combined codeword is transmitted over the channel without appending its parity bits $[\underbrace{0, \cdots, 0}_{p}]$, since the latter are fixed and known by the receiver. At the detector output, by checking the parity bits reconstructed from the received constrained PC codeword according to (2), which are equal to the syndrome of the received codeword (with p zero bits appended), we can detect errors in the received codeword that are within the error detection capability of the corresponding PC code C. Note that, in principle, we could always choose $p_1 + p_2 = a$, where a is an arbitrary p-bit row vector, and generate a codeword of C in terms of $[u_1 \mid u_2 \mid a]$. At the detector output, errors can be detected by checking the syndrome of the received codeword with the parity bits a appended. Choosing a = 0 makes the syndrome equal to the parity bits reconstructed from the received codeword, and simplifies decoding.
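The selection principle described above can be illustrated with a toy example. The Python sketch below uses a hypothetical parity matrix P with $n_1 = 5$, $n_2 = 3$ and p = 2 (these values and the helper name `parity` are assumptions chosen for illustration only), and it ignores the modulation constraints entirely. It computes the parity bits of the NC part padded with trailing zeros, picks the candidate second parts whose zero-padded parity bits match, and confirms that the combined word then has all-zero parity.

```python
from itertools import product


def parity(info_bits, P):
    """Parity bits u*P over GF(2) for a systematic code with G = [I | P]."""
    out = [0] * len(P[0])
    for bit, row in zip(info_bits, P):
        if bit:
            out = [o ^ r for o, r in zip(out, row)]
    return out


# Hypothetical toy parameters (not from the paper): n1 = 5, n2 = 3, p = 2.
n1, n2 = 5, 3
P = [[1, 1], [0, 1], [1, 0], [1, 1], [0, 1],   # rows acting on u1
     [1, 0], [0, 1], [1, 1]]                   # rows acting on u2

u1 = [1, 0, 1, 0, 0]
p1 = parity(u1 + [0] * n2, P)                  # parity of [u1 | 0]

# Candidate second parts whose parity, with n1 leading zeros, equals p1.
candidates = [list(u2) for u2 in product((0, 1), repeat=n2)
              if parity([0] * n1 + list(u2), P) == p1]

# For every such candidate, the combined word has all-zero parity bits.
combined_parities = [parity(u1 + u2, P) for u2 in candidates]
```

In the actual code design, the candidates would additionally be restricted to valid constrained codewords taken from the PRC code table.
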
Both the NC code and the PRC code are finite-state constrained codes designed based on the same FSM. This enables the two component codes to be connected in any order without violating the modulation constraints, and also facilitates simpler hardware implementation of the encoder/decoder. In principle, any efficient FSM can be used in conjunction with the proposed code design approach. Here, we choose to use the FSM proposed in [7] , since capacity approaching codes can thereby be obtained. Furthermore, since the PRC code is also protected by PCs, error propagation due to the PRC code is avoided. In addition, by applying the Guided Scrambling (GS) scheme [2] , [7] to the NC code, whose codewords occupy the major portion of the constrained PC code, as shown in [7] , satisfactory dc-free performance can be achieved.
The rate of our constrained PC code is given by

$$R = \frac{m}{n} = \frac{n_1 R_1 + n_2 R_2}{n_1 + n_2}, \tag{3}$$

where m and n are the lengths of the segment of user data and the combined constrained PC codeword, respectively, $R_1$ and $R_2$ are the rates of the NC code and the PRC code, respectively, and $n_1$ and $n_2$ are the lengths of the NC and PRC portions of the codeword, as in Proposition 1. The choice of n depends on the specific recording system and is a compromise between the code rate loss due to PC and the error correction capability of the post-processor [12]. The optimum codeword length has been found to be around 100 channel bits per parity bit, for d = 1 coded optical recording channels. The corresponding details are given in Section VI.
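To make the rate trade-off concrete, the following Python sketch computes the overall rate when K NC codewords are followed by one PRC codeword. The component rates 9/13 and 12/19 are those of the single-parity example in Section V; the value K = 6, chosen so that n is close to 100 channel bits for one parity bit, and the function name `combined_rate` are assumptions of this sketch.

```python
from fractions import Fraction


def combined_rate(K, nc=(9, 13), prc=(12, 19)):
    """Return (m, n, m/n) for K NC codewords followed by one PRC codeword.

    nc and prc are (user bits, channel bits) per codeword of each component.
    """
    m1, n1 = nc
    m2, n2 = prc
    m = K * m1 + m2      # user bits per combined codeword
    n = K * n1 + n2      # channel bits per combined codeword
    return m, n, Fraction(m, n)


m, n, rate = combined_rate(6)          # hypothetical K = 6
loss = Fraction(9, 13) - rate          # rate loss relative to the NC code alone
```

With these numbers the combined codeword has 97 channel bits carrying 66 user bits, and the rate loss relative to the rate 9/13 NC code shrinks as K grows, at the price of fewer parity bits per channel bit.
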
III. CODE DESIGN IN NRZI FORMAT
A. Encoder Description
In this section, the codes are designed in NRZI format. Fig. 1 illustrates the encoder. A segment of user data is partitioned into K + 1 data words. The leading K data words are encoded into NC codewords by the NC encoder, and the (K + 1)th data word is encoded into the second component codeword by the PRC encoder. During encoding, the next-state information (obtained from a code table) is passed from each codeword to the next. It indicates the next state from which to select a codeword for encoding the next data word. The encoder also includes a PC unit, which calculates the parity bits of the sequence of the leading K NC codewords (appended with $n_2$ trailing zeros). The parity bits are then passed to the PRC encoder, and used to guide the encoding of the (K + 1)th data word into the PRC codeword. Concatenating the NC codewords and the PRC codeword results in the combined constrained PC codeword in NRZI format. The NRZI codewords are then converted into NRZ format by a precoder, which is not shown in the figure, before they are transmitted over the channel.
B. Design of the Component Codes
To design the NC code, we use the FSM proposed in [7] , since rates of the resulting constrained codes are very close to the capacity. To achieve high encoding efficiency, one data word is mapped into one codeword only in each state of the FSM.
To design the PRC code, we propose a novel approach to design sets of codewords with distinct parity bits, based on the same FSM of the NC code. These parity bits correspond to a predetermined generator matrix. We first propose a set of criteria that guides the design of the PRC code. With a given PC constraint, these criteria indicate how to choose the number of encoder states and assign valid codewords to these states to maximize the code rate.
To design a PRC code with $m_2$ user data bits and p parity bits, the number of codewords leaving a state set should be at least $2^{m_2+p}$ times the number of states within the state set. Based on the FSM of d = 1 codes proposed in [7], we can obtain the following criteria:

$$r\,|X_{00}| + r_1\,|X_{01}| \ge r_1\, 2^{m_2+p}, \tag{4}$$

$$r\,\big(|X_{00}| + |X_{10}|\big) + r_1\,\big(|X_{01}| + |X_{11}|\big) \ge r\, 2^{m_2+p}, \tag{5}$$

where $X_{ab}$ denotes the set of codewords starting with an 'a' and ending with a 'b', and $|X_{ab}|$ denotes the size of $X_{ab}$, where a, b ∈ {0, 1}. The encoder has r states, which are divided into two state subsets of a first and second type. The encoder has $r_1$ states of the first type and $r_2 = r - r_1$ states of the second type. All codewords in states of the first type must start with a '0', while codewords in states of the second type start with either a '0' or a '1'.
Furthermore, for each set of codewords with the same parity bits, the number of codewords leaving a state set should be at least $2^{m_2}$ times the number of states within the state set. The criteria that guide the design of each set of codewords are therefore given by

$$r\,|\tilde{X}_{00}| + r_1\,|\tilde{X}_{01}| \ge r_1\, 2^{m_2}, \tag{6}$$

$$r\,\big(|\tilde{X}_{00}| + |\tilde{X}_{10}|\big) + r_1\,\big(|\tilde{X}_{01}| + |\tilde{X}_{11}|\big) \ge r\, 2^{m_2}, \tag{7}$$

where $\tilde{X}_{ab}$ denotes the set of codewords with the same parity bits that start with an 'a' and end with a 'b'. In each set of the codewords with the same parity bits, each codeword has an assigned next state. A codeword that ends with '0' (i.e. codewords in $\tilde{X}_{00}$ and $\tilde{X}_{10}$) can be assigned up to r different next states in both the first and second state sets, and therefore can be used to map to r different user data words. A codeword that ends with '1' (i.e. codewords in $\tilde{X}_{01}$ and $\tilde{X}_{11}$) can only be assigned up to $r_1$ next states in the first state set, and therefore can be used to map to $r_1$ different user data words. The particular mapping of the codeword to the data word is a matter of design choice, and is not critical to the operation of the system. However, to ensure unique decodability, the sets of codewords that belong to different states must be disjoint.
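Similar criteria for designing PRC codes with d = 2 constraint can be derived. They are given by

$$r\,|X_{0000}| + (r_1 + r_2)\,|X_{0010}| + r_1\,|X_{0001}| \ge r_1\, 2^{m_2+p}, \tag{8}$$

$$r \sum_{ab \in \{00,\,01\}} |X_{ab00}| + (r_1 + r_2) \sum_{ab \in \{00,\,01\}} |X_{ab10}| + r_1 \sum_{ab \in \{00,\,01\}} |X_{ab01}| \ge (r_1 + r_2)\, 2^{m_2+p}, \tag{9}$$

$$r \sum_{ab \in \{00,\,01,\,10\}} |X_{ab00}| + (r_1 + r_2) \sum_{ab \in \{00,\,01,\,10\}} |X_{ab10}| + r_1 \sum_{ab \in \{00,\,01,\,10\}} |X_{ab01}| \ge r\, 2^{m_2+p}, \tag{10}$$

where $X_{abcd}$ denotes the set of codewords that start with 'ab' and end with 'cd', where a, b, c, d ∈ {0, 1}. For d = 2 codes, the encoder has r states, which are further classified into three sets of states. The first set has $r_1$ states and it includes codewords that start with '00'. The second set has $r_2$ states and it includes codewords that start with either '01' or '00'. The third set has $r_3 = r - r_1 - r_2$ states and it includes codewords that start with '10', '01' or '00'.
The criteria that guide the design of each set of codewords that has the same parity bits are expressed as

$$r\,|\tilde{X}_{0000}| + (r_1 + r_2)\,|\tilde{X}_{0010}| + r_1\,|\tilde{X}_{0001}| \ge r_1\, 2^{m_2}, \tag{11}$$

$$r \sum_{ab \in \{00,\,01\}} |\tilde{X}_{ab00}| + (r_1 + r_2) \sum_{ab \in \{00,\,01\}} |\tilde{X}_{ab10}| + r_1 \sum_{ab \in \{00,\,01\}} |\tilde{X}_{ab01}| \ge (r_1 + r_2)\, 2^{m_2}, \tag{12}$$

$$r \sum_{ab \in \{00,\,01,\,10\}} |\tilde{X}_{ab00}| + (r_1 + r_2) \sum_{ab \in \{00,\,01,\,10\}} |\tilde{X}_{ab10}| + r_1 \sum_{ab \in \{00,\,01,\,10\}} |\tilde{X}_{ab01}| \ge r\, 2^{m_2}, \tag{13}$$

where $\tilde{X}_{abcd}$ denotes the set of codewords with the same parity bits. For each set of codewords with the same parity bits, a codeword that ends with '00' (i.e. codewords in $\tilde{X}_{0000}$, $\tilde{X}_{1000}$ and $\tilde{X}_{0100}$) can be assigned up to r different following states, and therefore can be used to map to r different user data words. A codeword that ends with '10' (i.e. codewords in $\tilde{X}_{0010}$, $\tilde{X}_{1010}$ and $\tilde{X}_{0110}$) can only be assigned up to $r_1$ following states in the first state set and $r_2$ states in the second state set, and therefore can be used to map to $r_1 + r_2$ different user data words. A codeword that ends with '01' (i.e. codewords in $\tilde{X}_{0001}$, $\tilde{X}_{1001}$ and $\tilde{X}_{0101}$) can only be assigned up to $r_1$ different following states in the first state set, and therefore can be used to map to $r_1$ different user data words. In addition, different states cannot contain the same codeword.
Note that the above inequalities are equivalent to the approximate eigenvector equation [1], and they are necessary conditions for code construction. Following these criteria, and by using either computer search or analytical approaches proposed in [18], we can determine the optimum number of encoder states to maximize the rate of the PRC code. The corresponding code rate is given by

$$R_2 = \frac{m_2}{n_2}. \tag{14}$$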
The main steps for the design of the PRC code are as follows.
(1) For a PRC code with $m_2$ user data bits and p parity bits, use the criteria described above (i.e. (4) to (7) for d = 1 codes, and (8) to (13) for d = 2 codes) to determine the codeword length $n_2$ and the optimum number of encoder states. Note that at this step, the maximum runlength constraint k is temporarily relaxed (e.g. larger than k = 7 for d = 1 codes, and larger than k = 10 for d = 2 codes).
(2) Enumerate all the valid d constrained codewords of length $n_2$. Based on the given generator matrix, compute the parity bits of each codeword (with $n_1$ leading zeros appended) and distribute the codewords into a group of codeword sets according to their parity bits. A total of $2^p$ codeword sets are obtained.
(3) For each set of codewords with the same parity bits, allocate the codewords to various encoder states by following the FSM of the NC code [7]. This results in a set of $2^p$ sub-tables.
In each of the sub-tables, the principles for distributing the codewords to the encoder states are as follows. For d = 1 codes, the encoder states include two types of state subsets. The codewords in states of the first type must start with a '0', while codewords in states of the second type start with either a '0' or a '1'. In the sub-tables, every codeword has an assigned next-state, which specifies the state from which to select the codeword for encoding the next data word. A codeword that ends with a '0' can be directed to any of the encoder states, while a codeword that ends with a '1' can only be directed to the states of the first type. This prevents a codeword ending with '1' from being followed by a codeword from a state of the second type, which may start with '1'. Similarly, for d = 2 codes, the encoder states are divided into three sets of states. The first set includes codewords that start with '00', the second set includes codewords that start with either '01' or '00', and the third set includes codewords that start with '10', '01' or '00'. For each set of codewords with the same parity bits, a codeword that ends with '00' can be directed to any of the encoder states. A codeword that ends with '10' can only be directed to states in the first and second state sets. A codeword that ends with '01' can be directed to states in the first state set only.
(4) Concatenate the $2^p$ sub-tables to form the code table of the PRC code, in which there is a set of $2^p$ codewords potentially mapped to one user data word.
During encoding, as illustrated in Fig. 1, the parity bits associated with the sequence of NC codewords are first calculated. The PRC codeword having the same parity bits is selected from the codeword set and assigned to the user data word.
Example 1: The following example illustrates the whole design process of the PRC code. Consider the design of a PRC component code with $m_2 = 4$, for a d = 1 constrained single-bit even PC code (i.e. p = 1). The NC component code is assumed to be the rate 9/13 (1,18) code with 5 encoder states (i.e. r = 5, $r_1 = 3$, $r_2 = 2$) as proposed in [7].
First, by using the criteria of (4) to (7), we determine a minimum codeword length of n 2 = 8 for the PRC code. Next, generate all the valid d = 1 codewords of length n 2 = 8 and allocate them to various encoder states according to the principles described above. An example of the code table for the resulting rate 4/8 PRC code is illustrated by Table I . As can be seen, the code table includes two sub-tables, which contain codewords with even and odd parity, respectively. In each sub-table, the first column shows the input data word. The second to the sixth columns show the codewords mapped to the data word and its associated next-state, for encoder states 1 to 5, respectively. Note that in the sub-tables, each codeword can be mapped to multiple data words, with the corresponding next-states being different. Note also that different encoder states do not have the same codeword.
Furthermore, in each of the encoder states, there is a set of two codewords mapped to one data word. This ensures that during encoding, a suitable PRC codeword from the codeword set can always be chosen, to be concatenated with the sequence of NC codewords and to realize an even PC constraint over the combined codeword. This can be seen from the following example. Let us assume that in Fig. 1, the PRC encoder's current input data word (i.e. the (K + 1)th data word) is '1111'. Based on Table I, all possible cases that might arise during encoding are listed in Table II. Therefore, the even PC constraint can be achieved on the combined codeword, and the NC codewords and the PRC codeword can be connected without violating the d = 1 constraint.
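The first step of Example 1 can be checked by brute force. The Python sketch below enumerates all d = 1 codewords of a given length (with k relaxed), splits them by single-bit NRZI parity and by first and last bit, and tests the counting conditions for each parity set: codewords ending in '0' may take any of the r next states, while codewords ending in '1' may take only the $r_1$ next states of the first type. The inequalities are coded from our reading of the rules in Section III-B, and all function names are our own, so this should be read as a sketch under those assumptions rather than the authors' search procedure.

```python
def d1_codewords(n):
    """All length-n binary tuples with no two adjacent 1s (d = 1, k relaxed)."""
    words = [()]
    for _ in range(n):
        words = [w + (b,) for w in words for b in (0, 1)
                 if not (b == 1 and w and w[-1] == 1)]
    return words


def counts_by_parity(n):
    """counts[parity][(first_bit, last_bit)] with parity = weight mod 2."""
    counts = {0: {}, 1: {}}
    for w in d1_codewords(n):
        key = (w[0], w[-1])
        par = sum(w) % 2
        counts[par][key] = counts[par].get(key, 0) + 1
    return counts


def meets_criteria(x, r, r1, words_needed):
    """Counting conditions for one parity set (necessary, not sufficient)."""
    g = lambda a, b: x.get((a, b), 0)
    first_type = r * g(0, 0) + r1 * g(0, 1) >= r1 * words_needed
    all_states = (r * (g(0, 0) + g(1, 0))
                  + r1 * (g(0, 1) + g(1, 1))) >= r * words_needed
    return first_type and all_states


def min_prc_length(m2, r, r1, max_n=32):
    """Smallest n2 for which both parity sets pass the counting conditions."""
    for n in range(2, max_n + 1):
        c = counts_by_parity(n)
        if all(meets_criteria(c[par], r, r1, 2 ** m2) for par in (0, 1)):
            return n
    return None


n2 = min_prc_length(m2=4, r=5, r1=3)
```

With $m_2 = 4$, r = 5 and $r_1 = 3$, this search returns 8, in agreement with the minimum codeword length stated in Example 1; for length 7 the even-parity set falls just short of the first condition.
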
C. Decoder Description
Based on the same FSM, the operation of the PRC decoder is generally the same as that of the NC decoder [7], but with the code tables being different. Both decoders are sliding-block decoders with a look-ahead of one codeword. However, unlike the NC decoder, which is based on only one code table, the PRC decoder is based on two code tables. One is the NC code table, which is used to determine the state of the next codeword, while the other is the PRC code table, which is used to decode the current codeword using the obtained state information of the next codeword.
Example 2: We use the rate 4/8 code of Example 1 to illustrate the decoding process of the PRC code. Assume a received PRC codeword is '00001010'. According to Table I, the codeword '00001010' is assigned to the data words '0000', '0001', '0010', '0011', and '0100', together with the next-states 1 to 5, respectively. Therefore, we have to look at the next received codeword (i.e. the first NC codeword of the next constrained PC code) to obtain the encoder state that the PRC codeword is directed to. Assume the next NC codeword is '0001010001010', and it is found to belong to State 2, according to the code table of the NC code [7]. This means that the next-state associated with the PRC codeword '00001010' is State 2. Therefore, it is determined that the PRC codeword represents the data word '0001'. In the same manner, other NC and PRC codewords can be decoded sequentially.
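The look-ahead decoding step of Example 2 amounts to two table lookups, as the following Python sketch shows. The dictionaries hold only the entries quoted in Example 2; they are small fragments standing in for the full PRC and NC code tables, and their layout and the function name `decode_prc` are assumptions of this sketch.

```python
# Fragment of the PRC code table: (codeword, next_state) -> data word.
# Only the entries quoted in Example 2 are included.
PRC_TABLE = {
    ("00001010", state): data
    for state, data in zip(range(1, 6),
                           ["0000", "0001", "0010", "0011", "0100"])
}

# Fragment of the NC code table: codeword -> encoder state it belongs to.
NC_STATE_OF = {"0001010001010": 2}


def decode_prc(codeword, next_codeword):
    """Decode a PRC codeword using one codeword of look-ahead."""
    next_state = NC_STATE_OF[next_codeword]   # state of the following codeword
    return PRC_TABLE[(codeword, next_state)]  # resolves the ambiguous codeword


data_word = decode_prc("00001010", "0001010001010")
```
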
IV. CODE DESIGN IN NRZ FORMAT
In this section, we present an approach to design constrained PC codes in NRZ format. For PC codes and post-processing based detection approaches, it is preferable to encode the data in NRZ format due to the following reasons. In the NRZI case, error detection and post-processing have to be done at the output of the 'NRZ to NRZI inverse precoder'. The process of inverse precoding will cause error propagation and thus increase the length of error events. For example, a single bit error in NRZ format will be converted into a transition shift error of 2 bits in NRZI format. As a result, the number of parity bits required for detecting errors may increase. Furthermore, carrying out post-processing at the detector output is more straightforward than doing it at the inverse precoder output.
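The error propagation caused by inverse precoding is easy to reproduce. In the Python sketch below, a single NRZ bit is flipped and both the clean and the corrupted NRZ sequences are passed through the inverse precoder; the function name, the example sequence and the error position are arbitrary choices made for illustration.

```python
def inverse_precode(nrz, initial=0):
    """NRZ -> NRZI: x[k] = y[k] XOR y[k-1]."""
    out, prev = [], initial
    for y in nrz:
        out.append(y ^ prev)
        prev = y
    return out


sent_nrz = [0, 1, 1, 1, 0, 0, 1, 1]
received_nrz = sent_nrz.copy()
received_nrz[3] ^= 1                      # single-bit error in NRZ format

# Positions where the two NRZI sequences differ.
nrzi_error = [a ^ b for a, b in zip(inverse_precode(sent_nrz),
                                    inverse_precode(received_nrz))]
```

The resulting error pattern in NRZI format is `[0, 0, 0, 1, 1, 0, 0, 0]`: one NRZ bit error has become two adjacent NRZI bit errors, i.e. a shifted transition.
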
The conventional approach for detection and correction of errors in NRZ format is to use a concatenation of a modulation encoder with a precoder, followed by a PC encoder [8] , [9] , [10] . However, this approach will considerably weaken the modulation constraint of the encoded channel data stream. In [17] , Cideciyan et al. proposed the cascade of a modulation encoder with a PC encoder followed by a precoder. In this approach, before precoding, the user data is first encoded into a constrained PC code in NRZI format, which can detect and correct NRZ errors. This is done by translating the PC matrix at the output of the precoder into that at the input of the precoder, under the condition that the PC code at the output of the precoder must contain the all-one codeword.
We now present a new approach to design the constrained PC code in NRZ format, without PC matrix transformation and without the specific requirement on the PC code. In our approach, the code table of the NC code remains the same as that of the NC code in NRZI format. However, the code table for the PRC code is designed in a different way. The details are as follows.
(1) For a PRC code with m 2 user data bits and p parity bits, determine the codeword length n 2 and the optimum number of encoder states. The criteria that guide the design are similar to those in the NRZI case. The only difference is that the parity bits of each codeword are computed in the NRZ format, rather than in the NRZI format, based on an assumed initial NRZ bit. To do this, '0' and '1' are used to denote NRZ bits '−1' and '+1', respectively.
(2) Enumerate all the valid d constrained codewords of length n 2 in NRZI format. Compute the parity bits of the codewords in NRZ format with an assumed initial NRZ bit.
(3) Distribute each set of NRZI codewords with the same NRZ parity bits obtained from Step (2) into different encoder states, and form a set of $2^p$ sub-tables. The principles for distributing the codewords in each sub-table are the same as those in the NRZI case.
(4) Concatenate the $2^p$ sub-tables together to form the code table for encoding/decoding of the PRC code in NRZ format. For the two different initial NRZ bits (i.e. '+1' and '−1'), we use the same code table to simplify encoding/decoding. However, the order of codeword sets with the same parity bits may need to be adjusted according to the initial NRZ bit.
To do encoding, as shown in Fig. 2 , the NC codewords are first constructed and connected as in the NRZI case. The resulting codewords are then converted into NRZ format by a precoder, and the associated parity bits are computed. Based on these parity bits as well as the last bit of the NRZ sequence, the PRC codeword in NRZI format that has the same NRZ parity bits is selected from the codeword set. The PRC codeword needs to be converted into NRZ format before concatenating with the NRZ format NC codewords. During decoding, the detected NRZ data sequence is first converted into NRZI format through an inverse precoder, and the resulting NRZI sequence is then decoded based on the code tables of the NC code and the PRC code, along the same lines as those described in Section III-C.
Example 3: As an example, we show by Table III the code table of a rate 4/8 PRC code designed in NRZ format. Similar to Table I, Table III also includes two sub-tables, which contain sets of codewords in NRZI format. However, the parity bit of each codeword is computed in the NRZ format, instead of in the NRZI format as in Table I. It can be verified that with an assumed initial NRZ bit of '−1', all the codewords in the first sub-table have an even parity, while all the codewords in the second sub-table have an odd parity. It can be further verified that with an initial NRZ bit of '+1', the code table remains the same. Therefore, based on the NRZ parity bits of the sequence of NC codewords and the last NRZ bit of the NRZ sequence, a suitable PRC codeword in the NRZI format with the same NRZ parity bit can always be selected from the code table. By converting the PRC codeword into NRZ format and concatenating it with the sequence of NRZ format NC codewords, an even PC constraint can be realized over the combined codeword in NRZ format. Finally, we remark that the rate 4/8 PRC codes described in Examples 1 to 3 are only for illustration purposes. Several more efficient newly designed codes are shown in the next section.
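The dependence of the NRZ parity on the assumed initial NRZ bit can be examined directly. The Python sketch below computes the single-bit NRZ parity of an NRZI codeword, with '0' and '1' standing for the NRZ levels '−1' and '+1', and tests whether changing the initial bit changes the parity of any d = 1 codeword of a given length. The helper names are our own, and the single-bit parity is taken as the modulo-2 sum of the NRZ bits of the codeword, which is an assumption of this sketch.

```python
def d1_codewords(n):
    """All length-n binary tuples with no two adjacent 1s (d = 1, k relaxed)."""
    words = [()]
    for _ in range(n):
        words = [w + (b,) for w in words for b in (0, 1)
                 if not (b == 1 and w and w[-1] == 1)]
    return words


def nrz_parity(nrzi_word, initial_nrz_bit):
    """Modulo-2 sum of the NRZ bits obtained by precoding nrzi_word."""
    parity, level = 0, initial_nrz_bit
    for x in nrzi_word:
        level ^= x        # precoder: a '1' toggles the NRZ level
        parity ^= level
    return parity


def parity_depends_on_initial_bit(n):
    """True if some length-n codeword changes parity with the initial bit."""
    return any(nrz_parity(w, 0) != nrz_parity(w, 1) for w in d1_codewords(n))


depends_len8 = parity_depends_on_initial_bit(8)
depends_len19 = parity_depends_on_initial_bit(19)
```

Complementing the initial NRZ bit complements every NRZ bit of the codeword, so the parity is unchanged for an even codeword length such as 8 and inverted for an odd length such as 19. This is consistent with the remark that the rate 4/8 code table is the same for both initial bits.
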
V. EXAMPLES OF NEWLY DESIGNED EFFICIENT CODES
In this section, we present several efficient constrained PC codes, designed in NRZ format, using the above code design method.
First of all, a new (1,18) constrained single-bit even PC code is designed. The rate 9/13 (1,18) code with 5 states (i.e. r = 5, $r_1 = 3$, $r_2 = 2$) FSM proposed in [7] is used as the NC code. Table IV shows the distribution of codewords in r = 5 encoder states, for the rate 12/19 PRC code. Through enumeration, we find that among the total 10946 valid d = 1 codewords of length 19, there are 5490 codewords having even parity with an assumed initial NRZ bit of '−1' (or odd parity with an initial NRZ bit of '+1'). Among these codewords, we further find $|\tilde{X}_{00}| = 2135$, $|\tilde{X}_{01}| = |\tilde{X}_{10}| = 1275$ and $|\tilde{X}_{11}| = 805$. We also find that there are 5456 codewords having odd parity with an assumed initial NRZ bit of '−1' (or even parity with an initial NRZ bit of '+1'), among which we have $|\tilde{X}_{00}| = 2046$, $|\tilde{X}_{01}| = |\tilde{X}_{10}| = 1309$ and $|\tilde{X}_{11}| = 792$. Each sub-table in Table IV illustrates the distribution of codewords with the same parity bit among the r = 5 states.
We take Table IV (i) as an example, which contains all the codewords having even parity with an assumed initial NRZ bit of '−1'. Observe that the set $\tilde{X}_{00}$ has 605 codewords allocated in State 1, 599 codewords in State 2, and 603 codewords in State 3. The total number of assigned codewords is 605 + 599 + 603 = 1807, which is smaller than the set size 2135. Similarly, for each of the other codeword sets, the total number of assigned codewords is smaller than the size of the set. On the other hand, in each state, the codewords are distributed according to the restrictions that a codeword ending with a '0' can be assigned to up to r = 5 different user data words, while a codeword that ends with a '1' can only be assigned to up to $r_1 = 3$ different user data words. Therefore, for State 1, the total number of available codeword and next-state combinations is $605 \times 5 + 358 \times 3 = 4099$, which is sufficient to map $2^{12} = 4096$ user data words. Similarly, it can be verified that from any of the r = 5 encoder states, there are at least 4096 such combinations that can be assigned to the user data words. This means that 12-bit user data words can be encoded. In the same manner, codewords having odd parity with an assumed initial NRZ bit of '−1' are distributed as shown in Table IV (ii), which also shows that 12-bit user data words can be supported. Hence, following Table IV, a rate 12/19 (1,18) PRC code can be constructed. We remark that the distribution of codewords given above may not be unique.
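The codeword counts quoted for the rate 12/19 PRC code can be reproduced by enumeration. The Python sketch below lists all d = 1 codewords of length 19 (with k relaxed), computes their NRZ parity for an initial NRZ bit of '−1' (represented as 0), and tallies them by parity and by first and last NRZI bit. The helper names are our own, and taking the parity as the modulo-2 sum of the NRZ bits is an assumption of this sketch.

```python
from collections import Counter


def d1_codewords(n):
    """All length-n binary tuples with no two adjacent 1s (d = 1, k relaxed)."""
    words = [()]
    for _ in range(n):
        words = [w + (b,) for w in words for b in (0, 1)
                 if not (b == 1 and w and w[-1] == 1)]
    return words


def nrz_parity(nrzi_word, initial_nrz_bit=0):
    """Modulo-2 sum of the NRZ bits obtained by precoding nrzi_word."""
    parity, level = 0, initial_nrz_bit
    for x in nrzi_word:
        level ^= x
        parity ^= level
    return parity


words = d1_codewords(19)
counts = Counter((nrz_parity(w), w[0], w[-1]) for w in words)  # (parity, first, last)
even_total = sum(v for (par, _, _), v in counts.items() if par == 0)
odd_total = len(words) - even_total

# State 1 of Table IV (i): 605 codewords ending in '0' and 358 ending in '1'.
r, r1, m2 = 5, 3, 12
state1_assignments = 605 * r + 358 * r1
state1_ok = state1_assignments >= 2 ** m2
```

Running the sketch gives 10946 codewords in total, 5490 with even and 5456 with odd parity, and the per-set sizes 2135, 1275, 1275, 805 (even) and 2046, 1309, 1309, 792 (odd), matching the figures quoted above.
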
As a second example, using the same rate 9/13 code as the NC code, we design new constrained 2-bit and 4-bit PC codes. [...] (1,18) code, respectively. With respect to the rate 2/3 d = 1 codes, these PRC codes achieve 1.5 channel bits per parity bit.
As a third example, we consider d = 2 codes. The NC code is the rate 6/11 (2,15), 9-state (i.e. $r = 9$, $r_1 = 4$, $r_2 = 2$, $r_3 = 3$) code proposed in [7], whose rate is 2.27% higher than that of the rate 8/15 (2,10) EFM-like codes [4], [2] used in DVD systems. With this NC code, we have designed a new constrained single-bit even PC code and a new constrained 4-bit PC code defined by $g(x) = 1 + x + x^4$. The PRC codes are a 9-state rate 10/20 (2,15) code and a 9-state rate 8/22 (2,15) code, respectively. With respect to the EFM-like codes, these PRC codes achieve 1.25 and 1.75 channel bits per parity bit, respectively.
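The figures quoted for the d = 2 codes can be checked with exact fractions. The sketch below assumes that "channel bits per parity bit" means the extra channel bits a PRC codeword needs, compared with encoding the same user bits at the reference rate, divided by the number of parity bits. This reading is an inference, but it reproduces the 1.25 and 1.75 values as well as the 2.27% rate advantage. The function name is hypothetical.

```python
from fractions import Fraction

def overhead_per_parity_bit(m: int, n: int, base_rate: Fraction, parity_bits: int) -> Fraction:
    """Extra channel bits per parity bit of a rate m/n PRC code.

    Assumed definition: (n - m / base_rate) / parity_bits, i.e. the PRC
    codeword length minus the length the reference code would need.
    """
    return (n - Fraction(m) / base_rate) / parity_bits

EFM_LIKE = Fraction(8, 15)                                # reference d = 2 code
single_bit = overhead_per_parity_bit(10, 20, EFM_LIKE, 1)  # rate 10/20 PRC code
four_bit = overhead_per_parity_bit(8, 22, EFM_LIKE, 4)     # rate 8/22 PRC code
rate_gain = Fraction(6, 11) / EFM_LIKE - 1                 # NC code vs EFM-like

print(float(single_bit), float(four_bit), f"{float(rate_gain):.2%}")
```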
The above examples of newly designed efficient codes are summarized in Table V. The codeword length, n, is chosen such that the number of channel bits per parity bit is around 100 (see Section VI for details). As can be seen, the new codes achieve minimum parity overhead, and the efficiency of most of the new codes is only a few tenths of a percent below capacity$^{3}$. It should be noted that, for the above new codes, none of the component codes has an input symbol size of 8 bits. As a result, error propagation due to the mismatch of symbol sizes between the constrained code and the conventional byte-oriented RS-ECC [5] may arise. However, this error propagation can be avoided by using the 'modified concatenation' scheme [19]. Alternatively, a non-byte-oriented RS-ECC can be used to eliminate this error propagation. For example, when the rate 9/13 code without parity, the rate 135/198 2-bit PC code and the rate 279/409 4-bit PC code in Table V are used in conjunction with a 9-bit/symbol RS-ECC, error propagation is avoided since the size of the input symbols of all the codes (or component codes) is 9 bits. Finally, we remark that it is possible to impose a stricter k constraint and the repeated minimum transition runlength (RMTR) constraint [5], [6] on the designed codes, by increasing the number of states of the FSM, and/or by applying the GS scheme.
VI. PERFORMANCE EVALUATION
In this section, the performance of the newly designed efficient constrained PC codes with the d = 1 constraint is evaluated using BD systems. In particular, the various constrained codes are compared using the symbol error rate (SER) at the output of the constrained decoder as the performance criterion. In the simulations, we assume an RS-ECC with 9 bits/symbol, since in this case there is no error propagation due to a mismatch of symbol sizes between the ECC and the constrained codes, whose input symbol size is 9 bits (i.e. the rate 9/13 code, the rate 135/198 2-bit PC code and the rate 279/409 4-bit PC code).
In our study, it is assumed that the optical read-out is linear and a generalized Braat-Hopkins model [20] is used to describe the channel. The Fourier transform of the channel symbol response is given by
where $\Omega$ is the frequency normalized by the channel bit rate, and R is the rate of the d = 1 constrained PC code. The quantity $\Omega_u = f_c T_u$, which is the optical cut-off frequency $f_c$ normalized by the user bit rate $1/T_u$, is a measure of the recording density. For an optical recording system using a laser diode with wavelength $\lambda$ and a lens with numerical aperture NA, the normalized cut-off frequency is given by $\Omega_u = \frac{2\,\mathrm{NA}}{\lambda} L_u$, where $L_u$ is the spatial length of one user bit. For the BD systems using the rate 2/3 17PP code [5], with $\lambda$ = 405 nm, NA = 0.85 and $L_u$ = 112.5 nm, we get $\Omega_u \approx 0.5$. In this paper, cut-off frequencies $\Omega_u = 0.5$ and $\Omega_u = 0.375$ are considered. These choices represent recording systems with nominal density and high density, respectively, according to current standards [5]. The variance $\sigma^2$ of the additive white Gaussian channel noise is determined by the user signal-to-noise ratio (SNR), defined as $\mathrm{SNR}_u$ (dB) = $10 \log_{10}$ [...] being the noise power in the user bandwidth, and $h_k^u$ is the channel symbol response for R = 1 [20]. When studying the performance over different user densities, the reference signal power in the user SNR needs to be independent of density. For this, $h_k^u$ is evaluated for a particular user density, e.g. $\Omega_u = 0.33$, which is independent of the densities at which the channel and receiver are tested. The above definitions of SNR and channel response help to fairly reflect the impact of code rate in the system performance evaluation.
In this study, a Viterbi detector that is matched to a 7-tap optimized partial response (PR) target is used as the detector [12]. The dominant error events at the Viterbi detector output turn out to be ±{2, 0, −2}, ±{2}, ±{2, 0, −2, 0, 2}, ±{2, 0, −2, 0, 2, 0, −2}, and ±{2, 0, 0, −2}. At the output of the Viterbi detector, a matched-filtering type post-processor is used, which can correct both single and double error events that occur within each detected codeword [21]. The post-processor is essentially a soft-decision decoder of the PC code, and it is widely accepted in practice due to its simplicity [10], [12]. In particular, the post-processor is designed to correct a specific number of the dominant error events at the output of the channel detector, by exploiting the syndromes of the received constrained PC codewords, and by computing the Euclidean distance of the candidate error events [10], [12].
We remark that the code design technique proposed above is general, and it is suitable for other types of decoders of the PC code as well.
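As a quick numerical check of the density figures, the normalized cut-off frequency $\Omega_u = (2\,\mathrm{NA}/\lambda) L_u$ can be evaluated for the BD parameters quoted above. The sketch below is only an illustration; the function names are hypothetical, and the user bit length at $\Omega_u = 0.375$ is obtained by inverting the same formula rather than taken from the text.

```python
def normalized_cutoff(na: float, wavelength_nm: float, user_bit_len_nm: float) -> float:
    """Optical cut-off frequency normalized by the user bit rate."""
    return 2.0 * na / wavelength_nm * user_bit_len_nm

def user_bit_length_nm(omega_u: float, na: float, wavelength_nm: float) -> float:
    """Inverse relation: user bit length that yields a given omega_u."""
    return omega_u * wavelength_nm / (2.0 * na)

NA, WAVELENGTH_NM = 0.85, 405.0          # BD optics quoted in the text
omega_bd = normalized_cutoff(NA, WAVELENGTH_NM, 112.5)
len_high_density = user_bit_length_nm(0.375, NA, WAVELENGTH_NM)

print(f"omega_u at L_u = 112.5 nm: {omega_bd:.3f}")
print(f"L_u at omega_u = 0.375: {len_high_density:.1f} nm")
```

The first value comes out slightly below 0.5, which is consistent with the approximation $\Omega_u \approx 0.5$ used for nominal density.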
The suitability and efficiency of different PC codes for a given channel and detector are determined by the error event distribution at the detector output. The new single-bit even PC code can only detect error events with an odd number of errors, and therefore cannot detect the error events ±{2, 0, −2}, ±{2, 0, −2, 0, 2, 0, −2} and ±{2, 0, 0, −2}. The new 2-bit PC codes, defined by the generator polynomial $g(x) = 1 + x + x^2$, can detect most of the dominant error events, except ±{2, 0, −2, 0, 2} and ±{2, 0, 0, −2}. The new 4-bit PC codes, with $g(x) = 1 + x + x^4$, however, can detect all the dominant error events. For real-life channels, the dominant error events may differ from those illustrated above. However, following the code design method described in the above sections, we can easily define a generator matrix that detects all the required error events [14], and design the constrained PC code accordingly.
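The detection claims above can be illustrated with a small GF(2) computation. The sketch assumes a simplified model that this section does not state explicitly: each error event is mapped to a binary polynomial with a 1 wherever the event is nonzero, and the event goes undetected exactly when that polynomial is divisible by $g(x)$ (the single-bit even parity check corresponds to $g(x) = 1 + x$). Since every $g(x)$ considered has a nonzero constant term, the position of the event in the codeword does not affect divisibility. All names are illustrative.

```python
def poly_mod(e: int, g: int) -> int:
    """Remainder of e(x) / g(x) over GF(2); bit i is the coefficient of x^i."""
    dg = g.bit_length() - 1
    while e and e.bit_length() - 1 >= dg:
        e ^= g << (e.bit_length() - 1 - dg)
    return e

def event_to_poly(event) -> int:
    """Binary error polynomial of an event such as (2, 0, -2) (sign ignored)."""
    return sum(1 << i for i, v in enumerate(event) if v != 0)

# Dominant error events listed in the text (one sign of each +/- pair).
EVENTS = [(2, 0, -2), (2,), (2, 0, -2, 0, 2), (2, 0, -2, 0, 2, 0, -2), (2, 0, 0, -2)]
GENERATORS = {"1+x": 0b11, "1+x+x^2": 0b111, "1+x+x^4": 0b10011}

# Assumed rule: an event is missed when its polynomial leaves a zero remainder.
missed = {
    name: [ev for ev in EVENTS if poly_mod(event_to_poly(ev), g) == 0]
    for name, g in GENERATORS.items()
}
for name, events in missed.items():
    print(f"g(x) = {name}: undetected events {events}")
```

Under this model the undetected sets match the ones listed in the text for the single-bit and 2-bit codes, and the 4-bit code misses none of the five events.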
For a given PC code, the choice of its codeword length (n) is a trade-off between the code rate loss and the error correction power. In our study, for each code, SERs are compared for different codeword lengths and SNRs. Simulation results show that the minimum SER is obtained with around 100 channel bits per parity bit, over a wide range of SNRs. Therefore, we choose a codeword length of around 100 channel bits per parity bit for the designed constrained PC codes.
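To make the trade-off concrete, the sketch below tabulates the overall rate and the number of channel bits per parity bit for the d = 1 codes named in this paper. It only restates figures already quoted (rates 2/3, 9/13, 135/198 and 279/409); the dictionary layout and names are illustrative.

```python
from fractions import Fraction

# (user bits, channel bits, parity bits) for the codes quoted in the text
CODES = {
    "rate 2/3, no parity": (2, 3, 0),
    "rate 9/13, no parity": (9, 13, 0),
    "rate 135/198, 2-bit PC": (135, 198, 2),
    "rate 279/409, 4-bit PC": (279, 409, 4),
}

summary = {}
for name, (m, n, p) in CODES.items():
    rate = Fraction(m, n)
    bits_per_parity = Fraction(n, p) if p else None   # None: no parity bits
    summary[name] = (rate, bits_per_parity)
    shown = float(bits_per_parity) if bits_per_parity else "n/a"
    print(f"{name}: rate = {float(rate):.4f}, channel bits per parity bit = {shown}")
```

Both PC codes sit close to 100 channel bits per parity bit, and their overall rates remain above 2/3 while falling slightly below 9/13.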
Figs. 3 and 4 illustrate the SER performance of the system with the rate 2/3 code, the rate 9/13 code, and the new constrained PC codes, at nominal density and high density, respectively. Data storage systems typically require an error rate of $10^{-12}$ or less after ECC. For BD, an ECC failure rate of $10^{-16}$ corresponds to an SER of around $4 \times 10^{-3}$ [5], [22]. Therefore, the performance of the various codes is compared at SER = $10^{-4}$, which keeps an additional margin relative to the allowable SER for the various codes and SNRs. From Fig. 3, we observe that compared to the system with the rate 2/3 code and without parity (Curve 1), the rate 9/13 code without parity (Curve 2) gives a gain of 0.4 dB at SER = $10^{-4}$, due to its higher code rate. Compared to the rate 9/13 code without parity, the new single-bit PC code (Curve 3) gives no significant gain, since it cannot detect the error event ±{2, 0, −2}. The new 2-bit PC code (Curve 4), however, achieves a gain of 1.2 dB over the rate 9/13 code, since it can detect the error event ±{2, 0, −2}. Using the new 4-bit PC code (Curve 5), a further gain of around 0.5 dB is obtained over the 2-bit PC code, because it can detect all the dominant error events. Overall, the new constrained 4-bit PC code gains 2 dB over the system with the rate 2/3 code and without parity. At $\Omega_u = 0.375$, observe from Fig. 4 that the performance gains of the PC codes are more modest than at nominal density. According to Fig. 4, the new 4-bit PC code achieves an overall performance gain of 1.5 dB at high density. The smaller gain arises because at high density there are many non-dominant error events, which are long events with small probabilities. To detect these types of error events, more parity bits are needed. They are also difficult to correct, since mis-correction of these long events will cause many more errors. Using appropriate coding techniques (e.g. RMTR codes) may eliminate the underlying data patterns that support these events and improve the performance of PC codes and post-processing. This is beyond the scope of the paper.
VII. CONCLUSIONS
In this paper, a general and systematic code design technique has been proposed for constructing capacity-approaching constrained PC codes, which can detect any type of dominant error events or error event combinations in optical recording systems. The PC constraint corresponds to linear systematic binary PC codes. The modulation constraint can be any practical d constraint (i.e. d = 1 and d = 2). Furthermore, error propagation due to parity bits is avoided, since errors are corrected equally well over the entire constrained PC codeword. Approaches have been proposed to design the code in NRZI format and in NRZ format. Designing the codes in NRZ format is found to be preferable. Using the proposed method, various new codes for different optical recording systems can be designed. Application of this technique to other recording systems is straightforward. Examples of several newly designed efficient codes have been illustrated, and their SER performance has been evaluated with the BD systems. Simulation results show that the new d = 1 constrained 4-bit PC code can detect all the dominant error events. Compared to the rate 2/3 code without parity, it achieves a performance gain of 2 dB at nominal density, and 1.5 dB at high density, at SER = $10^{-4}$.
