Abstract-We report on 20 years of development of codes for optical disk recording systems. A description of the state-of-the-art and feasible options for future extensions and improvements are given.
(1,7)PP RLL, dc-free [6] lf-components of the generated sequences can be suppressed. In the remaining part of this article, we will study the construction and performance of alternative EFM-like codes.
We start in the next section, with the definition of RLL sequences, and the computation of a number of basic properties of such sequences.
II. PROPERTIES OF RLL AND DCRLL SEQUENCES
A $(d,k)$ RLL sequence is a string of ones and zeros with at least $d$ and at most $k$ zeros between consecutive ones. Channel codes are needed to translate arbitrary data into $(d,k)$ sequences. In general, a $(d,k)$ sequence is not employed in optical or magnetic recording without a simple coding step. A $(d,k)$ sequence is converted to an RLL channel sequence in the following way. Let the channel signals be represented by a bipolar sequence $\{y_i\}$, $y_i \in \{-1, 1\}$. The channel signals represent the positive or negative magnetization of the recording medium, or pits or lands when dealing with optical recording. The logical ones in the $(d,k)$ sequence indicate the positions of a transition, $1 \to -1$ or $-1 \to 1$, of the corresponding RLL sequence. The sequence would be converted to the RLL channel sequence
The mapping of the waveform by this coding step is known as precoding. It can readily be verified that the minimum and maximum distance between consecutive transitions of the RLL sequence derived from a $(d,k)$ sequence is $d+1$ and $k+1$ symbols, respectively; or, in other words, the RLL sequence has the virtue that at least $d+1$ and at most $k+1$ consecutive like symbols (runs) occur.
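The precoding step can be illustrated with a short sketch. The input sequence below is a hypothetical example (it is not the example of the original text), the function names are illustrative, and d and k denote the minimum and maximum number of zeros between consecutive ones.

```python
def precode(dk_bits, start_level=1):
    """Map a 0/1 sequence to a bipolar (+1/-1) sequence.

    Every logical one toggles the level, so ones mark transitions.
    The starting level is an arbitrary assumption.
    """
    level = start_level
    out = []
    for bit in dk_bits:
        if bit == 1:
            level = -level
        out.append(level)
    return out


def run_lengths(symbols):
    """Return the lengths of the runs of equal consecutive symbols."""
    if not symbols:
        return []
    runs = []
    count = 1
    for prev, cur in zip(symbols, symbols[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


# Hypothetical sequence with at least two zeros between ones (d = 2).
dk_sequence = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
channel = precode(dk_sequence)
print(channel)               # bipolar channel sequence
print(run_lengths(channel))  # interior runs have length >= d + 1 = 3
```

Only the interior runs are complete; the first and last runs are truncated by the boundaries of the short example.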
An encoder translates arbitrary user (or source) information into, in this particular instance, a sequence that satisfies given $(d,k)$ constraints. On the average, $m$ source symbols are translated into $n$ channel symbols. What is the maximum value of the rate $R = m/n$ that can be attained for some specified values of the minimum and maximum runlength $d$ and $k$? Using the basic techniques developed by Shannon presented, for example, in [3, Ch. 2], it is fairly straightforward to compute the maximum value of $R$, called capacity, of encoders that generate sequences with a constraint on the minimum and maximum runlength. The capacity of $(d,k)$ RLL sequences is [3, Ch. 4]
$$C(d,k) = \log_2 \lambda \qquad (1)$$
where $\lambda$ is the largest real root of the characteristic equation
$$z^{k+2} - z^{k+1} - z^{k-d+1} + 1 = 0. \qquad (2)$$
RLL sequences used in optical disk recording systems should satisfy the additional requirement that the low-frequency components are sufficiently small. Sequences with such a property are usually called dc-free sequences. The running digital sum of a sequence, in short, RDS, plays a significant role in the analysis and synthesis of codes whose spectrum vanishes at the low-frequency end. Let $\{x_i\}$, $x_i \in \{-1, 1\}$, be a binary sequence. The (running) digital sum $z_i$ is defined as
$$z_i = \sum_{j=-\infty}^{i} x_j = z_{i-1} + x_i. \qquad (3)$$
Fig. 1 portrays the various signals. Chien [8] studied bipolar sequences $\{x_i\}$ that assume a finite number of sum values; that is, at any instant $i$, the RDS of such a sequence meets the condition $N_1 \le z_i \le N_2$, where $N_1$ and $N_2$ are two (finite) constants, $N_2 > N_1$. Sequences that have a bound on the number of assumed sum values are termed $z$-constrained or RDS-constrained sequences. The total number of sum values a sequence assumes, denoted by
$$N = N_2 - N_1 + 1, \qquad (4)$$
is often called the digital sum variation (DSV). Pierobon [9] showed that the power density function of an encoded sequence vanishes at zero frequency if, and only if, the encoder is a finite RDS encoder. The capacity of "pure" dc-free sequences, i.e., sequences that assume a maximum of $N$ sum values, was derived by Chien [8]:
$$C(N) = \log_2\left(2\cos\frac{\pi}{N+1}\right). \qquad (5)$$
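As a numerical illustration, the two capacities can be evaluated with a few lines of code. This is a minimal sketch under stated assumptions: it uses the standard characteristic equation $z^{k+2} - z^{k+1} - z^{k-d+1} + 1 = 0$ in the equivalent form $\sum_{j=d+1}^{k+1} z^{-j} = 1$, finite $k > d$, and Chien's closed form $\log_2(2\cos(\pi/(N+1)))$ for $N$ sum values. The function names are illustrative.

```python
import math


def rll_capacity(d, k, iterations=200):
    """Capacity log2(lambda) of (d,k) sequences for finite k > d >= 0.

    lambda is found by bisection on sum_{j=d+1}^{k+1} z^(-j) = 1,
    which has exactly one root in the interval (1, 2).
    """
    def f(z):
        return sum(z ** (-j) for j in range(d + 1, k + 2)) - 1.0

    lo, hi = 1.0, 2.0  # f(lo) > 0 and f(hi) < 0, f is decreasing
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.log2(0.5 * (lo + hi))


def chien_capacity(n_sum):
    """Capacity of pure dc-free sequences that assume n_sum sum values."""
    return math.log2(2.0 * math.cos(math.pi / (n_sum + 1)))


print(round(rll_capacity(2, 10), 4))  # runlength constraints of EFM
print(round(chien_capacity(9), 4))    # nine sum values, no RLL constraint
```

For d = 2 and k = 10 the capacity evaluates to about 0.5418, so a rate 8/17 code operates at roughly 87% of capacity.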
The capacity of dc-free RLL sequences has been computed by Norris and Bloomberg [10]. The capacity of an RLL sequence with a bounded RDS is characterized by three parameters, $d$, $k$, and $N$, and it will be denoted by $C(d,k,N)$. We will not follow the derivation given by Norris and Bloomberg, but will offer an alternative approach in the next section that will make it possible to efficiently compute the power density function of DCRLL sequences as well.
A. Capacity and Spectral Properties of DCRLL Sequences
Kerpez [11] presented a description of the combined and constraint in terms of a variable length graph and its adjacency matrix that requires a relatively small number, , of states. Let the DCRLL message be denoted by . The RLL message is assumed to be composed of runlengths of lengths taken from the set of allowed runlengths . As the sequence has limited DSV, we have where . The above constraint can be described in terms of the runlengths. The sequence is composed of a cascade of runlengths whose symbols have alternate polarity. Transitions of the polarity of the sequence , i.e., instants where , occur therefore at . We simply find where the sequence , . In other words, the DSV constraint is equivalent to for all Thus, a sequence satisfies the constraint if and only if the sequence of runlengths satisfies, for all (6) and (7) The general form of the adjacency matrix for the constraint, derived from (6) and (7), has a regular structure. The matrix has size and is constant on the anti-diagonals. If the value of is nontrivial, i.e., , the lower right of the diagonal of the matrix is zero, whereas in the case of , the lower right corner is filled. Using the above adjacency matrix it is now straightforward to compute the capacity and the spectrum of the maxentropic sequence for large values of and . A maxentropic sequence is assumed to be generated by a Markov source whose transition matrix is chosen such that the entropy of that source equals capacity. Spectral and other properties of maxentropic sequences are assumed to be a sound predictions of sequences generated by efficient encoders. Table II shows the results of computations for various values of DSV and runlength parameter The following examples may serve to illustrate the theory. Fig. 2 shows the power spectral density function of maxentropic sequences with , and the maximum runlength as a parameter. Apparently, the influence of the maximum runlength parameter is drastic. 
Most noticeable is the fact that the curves become more peaked with decreasing maximum runlength parameter . Fig. 3 shows the spectrum for the parameters , and , and with a logarithmic frequency-axis and a vertical (dB) axis, where a decibel is defined by . The choice of the log axes clearly shows the parabolic relationship between power and frequency in the low-frequency range. The low-frequency power increases with 6 dB per octave (or 20 dB per decade) frequency increase. We need a sound yardstick for measuring the low-frequency properties of DCRLL sequences. The spectral width is usually quantified by a parameter called cutoff frequency [3, Ch. 9] . Braun [12] defined the cutoff frequency of DCRLL sequences, denoted by , by (8) where denotes the spectral density at zero frequency of the maxentropic constrained sequence. For and , the parameters used in Fig. 3 , we find dB). Braun also studied the relationship between redundancy and cutoff frequency. He defined the extra rate loss, , as the difference between the capacities of the pure RLL channel and the DCRLL channel, or (9) The parameter quantifies the extra rate loss that results from the additional constraint on the RDS . Braun showed that maxentropic sequences have the property that there is, in good approximation, a linear relationship between cutoff frequency and the extra rate loss . The relationship found is given by (10) The constant of proportionality between cutoff frequency and extra rate loss is independent of the and constraints. The constant was derived for pure RDS constrained sequences and is valid if in addition and constraints are imposed. Computer simulations revealed that the above relationship appears to be accurate to within 5% for . In the next section, we will discuss various design methods for constructing DCRLL codes that have emerged in the literature.
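The capacity of the combined runlength and RDS constraint can also be estimated numerically. The sketch below does not use the compact runlength graph of Kerpez [11]; it is a naive symbol-level state machine, assumed here for illustration only, whose states track the RDS, the current polarity, and the current run length. The growth rate of the number of admissible sequences is estimated by power iteration over two symbols at a time, because the RDS alternates between even and odd values. The function name and the iteration count are assumptions.

```python
import math


def dcrll_capacity(d, k, n_sum, iters=2000):
    """Estimate the capacity of bipolar sequences with runs of
    d+1 .. k+1 like symbols and an RDS confined to n_sum values."""
    states = [(z, s, r)
              for z in range(n_sum)
              for s in (1, -1)
              for r in range(1, k + 2)]
    index = {st: i for i, st in enumerate(states)}
    successors = []
    for z, s, r in states:
        nxt = []
        # Extend the current run if it stays within k+1 and the RDS range.
        if r < k + 1 and 0 <= z + s < n_sum:
            nxt.append(index[(z + s, s, r + 1)])
        # Start a new run of opposite polarity if the current run is long enough.
        if r >= d + 1 and 0 <= z - s < n_sum:
            nxt.append(index[(z - s, -s, 1)])
        successors.append(nxt)

    def step(vec):
        return [sum(vec[j] for j in nxt) for nxt in successors]

    v = [1.0] * len(states)
    growth_two_steps = 0.0
    for _ in range(iters):
        w = step(step(v))
        total = sum(w)
        if total == 0.0:
            return 0.0  # no admissible infinite sequence exists
        growth_two_steps = total / sum(v)
        v = [x / total for x in w]
    return 0.5 * math.log2(growth_two_steps)


c_dcrll = dcrll_capacity(2, 10, 9)
print(round(c_dcrll, 4))
```

Subtracting this value from the capacity of the pure RLL channel with the same runlength parameters gives the extra rate loss discussed above.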
III. EFM
The main parameters of EFM are $d = 2$, $k = 10$, and rate $R = 8/17$. Detailed information on code tables and so on can be found in the patent granted to Immink and Ogawa [1]. The 8-bit source data are translated into a 14-bit $(2,10)$-constrained word. The 14-bit words are concatenated with 3-bit words, called merging words. The 3-bit merging words are selected by the encoder such that the minimum and maximum runlength are guaranteed. There are instances, however, where the merging word is not uniquely governed by the minimum and maximum runlength requirements. This freedom of choice is used for minimizing the power density at the low-frequency end, as will be explained with Fig. 4. Fig. 4 shows an example of the merging process. Eight user bits are translated into 14 channel bits using a look-up table. The 14-bit words are merged by means of three merging bits in such a way that the runlength conditions continue to be satisfied. For the case shown in Fig. 4, the condition that there should be at least two zeros between ones requires a zero at the first merging bit position. There are, thus, three alternatives for the merging bits: 000, 010, and 001. The encoder chooses the alternative that gives the lowest absolute value of the RDS at the end of a new codeword, i.e., 000 in this case. In the experimental phase of the CD [13], it was learned that the suppression of low-frequency components, when only two merging bits are used, is not sufficiently effective. Thus, the number of merging bits was increased to three, so providing a greater degree of freedom to set or omit transitions in the merging bits. With three merging bits, a transition can be set or omitted freely in 65% of the block concatenations. The more effective low-frequency control is achieved at the expense of 1/17 of the information rate. In principle, better suppression of the low-frequency components can be obtained, without offending the agreed standard for the CD system, by applying improved merging strategies.
One such strategy is to look more than one symbol ahead, because minimization of the low-frequency content in the short term does not always contribute to longer-term minimization. Improvements of 6-10 dB have been reported [14]. The look-ahead strategy is not used in present equipment. The power spectral density (PSD) function of classic EFM has been obtained by computer simulation. Results are plotted in Fig. 5.
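The merging rule described in this section can be sketched as follows. The two 14-bit words are hypothetical and are not taken from the EFM code table, the set of four candidate merging words is an assumption (three merging bits with at most a single one), and the function names are illustrative. The sketch looks only one codeword ahead.

```python
MERGE_CANDIDATES = ("000", "100", "010", "001")


def satisfies_dk(bits, d=2, k=10):
    """Check that ones are separated by at least d zeros and that
    no run of zeros is longer than k."""
    zeros = 0
    seen_one = False
    for b in bits:
        if b == "1":
            if seen_one and zeros < d:
                return False
            seen_one = True
            zeros = 0
        else:
            zeros += 1
            if zeros > k:
                return False
    return True


def rds_after(bits, rds, level):
    """Precode the bits (a one toggles the level) and update the RDS."""
    for b in bits:
        if b == "1":
            level = -level
        rds += level
    return rds, level


def choose_merging(prev_word, next_word, rds, level):
    """Return (merging bits, new RDS, new level) with the smallest |RDS|
    at the end of the next codeword, or None if no candidate is valid."""
    best = None
    for m in MERGE_CANDIDATES:
        if not satisfies_dk(prev_word + m + next_word):
            continue
        new_rds, new_level = rds_after(m + next_word, rds, level)
        if best is None or abs(new_rds) < abs(best[1]):
            best = (m, new_rds, new_level)
    return best


# Hypothetical words: the first ends in "10", so the first merging bit must be 0.
prev_word = "01001000100010"
next_word = "00100100000000"
print(choose_merging(prev_word, next_word, rds=0, level=1))
```

For these hypothetical words, the candidates 000, 010, and 001 satisfy the runlength constraints, and 001 gives the smallest absolute RDS when the RDS starts at zero.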
IV. EFMPLUS
The CD and its extensions CD-ROM and CD-V, introduced in the early 1980s, have become a very successful medium for the distribution and storage of audio, MPEG-1 video, and other digital information. Its storage capacity, 680 MByte, is insufficient for graphics-intensive computer applications and high-quality digital video programs. An extension of the CD family, the digital versatile disk (DVD), is a new optical recording medium with a storage capacity seven times higher than the conventional CD. Most of the storage capacity increase is due to improved quality of the light source (red instead of infrared light) and the objective lens. The storage capacity of the DVD is also increased by a complete redesign of the logical format of the disk including a more powerful Reed-Solomon product code (RS-PC) and recording code (EFMPlus). The details of the construction of the rate 8/16, (2,10) EFMPlus code, a sliding-block code with suppressed lf-content, will be discussed in the next section.
A. Design Outline
Under EFM rules, the data bits are translated eight at a time into 14 channel bits, with runlength parameters $d = 2$ and $k = 10$. In this section, we will detail a code with the same runlength constraints as EFM, called EFMPlus, having a 6% higher rate than classic EFM. (The name EFMPlus is slightly confusing, as the acronym EFM stands for eight-to-fourteen modulation. In EFMPlus, there is no such mapping, but the dk constraints are the same as in classic EFM.) EFMPlus has been adopted in the industry standard of the DVD as the channel modulation scheme. The code size is the number of source words (not necessarily a power of two) that can be accommodated by the encoder. The additional source words (those in excess of 256) that are made possible in this fashion are employed as alternatives for dc-control (see the next section for more details). The complexity of a sliding-block encoder and decoder is essentially governed by the maximum value (weight) of an element of the approximate eigenvector. Table III shows the maximum value (weight) as a function of the code size. In addition, we listed a parameter that denotes the relative redundancy. Note that the maximum code size that can be accommodated for the given code parameters is 406. Table III shows that for the smaller code sizes the maximum weight is two, so that a one-round split is sufficient to construct the encoder. We also notice in Table III that the maximum weight grows very rapidly with increasing code size. After many trials and considering the diminishing returns, a code size of 351 was chosen. After an initial merging of the states, we obtain a three-state FSM. After a single-state split, this three-state FSM can be transformed into a four-state encoder. Each of the four states of the EFMPlus encoder is characterized by the type of words that enter, or leave, the given state. The states and word sets are characterized as follows.
• Words entering State 1 end with $a$ trailing zeros, $a \in \{0, 1\}$.
• Words entering State 2 and 3 end with $a$ trailing zeros, $a \in \{2, \ldots, 5\}$.
• Words entering State 4 end with $a$ trailing zeros, $a \in \{6, \ldots, 9\}$.
The words leaving the states are chosen in such a way that the concatenation of words entering a state and those leaving that state obey the channel constraints. For example, words leaving State 1 start with a runlength of at least two and at most nine zeros. In an analogous manner, we conclude that words leaving State 4 start with at most one zero.
Obviously, the sets of words leaving State 1 or 4 have no words in common. Words emerging from State 2 and 3 comply with the above runlength constraints, but they also comply with other conditions. Words leaving State 2 have been selected such that the first (msb) bit, , and the thirteenth bit, , are both equal to zero. Words leaving State 3 have . With a computer, it can easily be verified that from each of the states, at least 351 words are leaving. An encoder is constructed by assigning a source word to each of the 351 edges that leave each state. The encoder requires accommodation for only 256 source words. The excess, 95, words have been used for suppressing the low-frequency power (see the next section), the dc-control. More details can be found in [15] .
The encoder defined above can freely accommodate 351 source words. In order to make it possible to use a unique 26-bit sync word, seven candidate words were barred, leaving a code size of 344. As we only need accommodation for 256 source words, the 88 surplus words can be exploited for minimizing the power at low frequencies. The suppression of low-frequency components, or dc-control, is done by controlling the RDS. The 88 surplus words are used as alternative channel representations of 88 of the source words. The full encoder is described by two tables, called the main and the substitute table. The DVD standard requires an extra rule for dc-control. If the encoder is in State 1, it may use the codeword that the same source word would have in State 4 as an alternative for dc-control, provided the runlength constraints are not violated. Similarly, if the encoder is in State 4, it may use the corresponding State 1 codeword as an alternative. In other words, codewords pertaining to States 1 and 4, both in the main and substitute tables, may be used as alternatives for one another for dc-control, provided the runlength constraints are strictly obeyed. This state swapping is allowed because decoding can still be accomplished unambiguously. The state swapping offers an extra 2-3 dB reduction of the lf power.
V. ALTERNATIVES TO EFM SCHEMES
It is of some interest to consider the possibility of redesigning the EFM code and its variants for various code rates, and to compare their spectral performance.
The EFM code was designed in 1980 before efficient design algorithms, such as ACH and so on, were developed. A second handicap of the EFM design is that at the time of its conception, every gate used for decoding was one too many. 2 Let us now for academic interest ignore for the time being the complexity issue, and start from scratch. Essentially, EFMPlus is a redesign of EFM with a rate 8/16 instead of 8/17. Decoding of EFMPlus requires 1000 instead of the 52 gates of EFM.
An obvious alternative to the rate 8/16 EFMPlus code would have been EFM with two instead of three merging bits. The dc-content of the alternative code, EFM16a (the name we shall use for the code with two instead of three merging bits), can easily be assessed by computer simulation, and the results are shown in Table IV. The dc-content can be reduced significantly by a reassignment of the various words that takes into account the following observation. Observe, for example, that in EFM16a, the 16-bit words 0001000100001000 and 1001000100001000 are alternative channel representations. It can easily be verified that the disparity of both words (after precoding, of course) is zero. This, in fact, means that the encoder has no real option to increase or decrease the RDS with the transmission of those codewords. Obviously, it would be much better if we could redesign the code in such a way that as many source words as possible would have channel representations of zero disparity. Nonzero-disparity codewords of opposite signs should be paired, whereas zero-disparity codewords may remain single. It is a straightforward exercise, using Gu and Fuja's method [16], to design a block code according to the above design heuristics. We may construct a block-decodable code with a source size of 260 instead of 257 words. A typical result (there are many possibilities), called EFM16b, offers 8 dB (see Table IV) more reduction at the low-frequency end than does EFM16a. This is a significant improvement, in particular, as the only disadvantage is the extra gate count required for decoding. Note that the EFM16b code requires a full decoding array of 16 bits instead of 14 bits as in EFM16a. An advantage with respect to EFMPlus, which requires a sliding-block decoder of length two, is the absence of error propagation. On the other side of the balance, we have a 3-dB extra reduction of EFMPlus's lf-content (see Table IV).
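The zero-disparity observation for the two 16-bit words quoted above is easy to check. The sketch below assumes that a one toggles the polarity (precoding) and that the disparity is the sum of the resulting bipolar symbols; the function name and the starting polarity are illustrative assumptions.

```python
def disparity(word, start_level=1):
    """Sum of the bipolar symbols obtained by precoding a 0/1 string."""
    level = start_level
    total = 0
    for b in word:
        if b == "1":
            level = -level  # a one marks a transition
        total += level
    return total


for word in ("0001000100001000", "1001000100001000"):
    print(word, disparity(word))
```

Both words yield a disparity of zero for either starting polarity, so choosing between them cannot steer the RDS at the end of the codeword.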
As $C(2,9) > 8/15$, it is possible, at least in theory, to construct a rate 8/15, (2,9) code. The EFM15 code [17] is an example of a rate 8/15, (2,14) DCRLL code. An alternative rate 8/15 construction [18] is possible that requires, in contrast with classic EFM, only one merging bit. We could, in principle, employ the same 14-bit word assignment as in classic EFM. For an example of such a construction, the reader is referred to the U.S. Patent granted to Tanaka et al. [19]. EFM15, which is very similar to EFMPlus, has a codeword length of 15 bits. The code, a typical sliding-block code, has four encoder states, and was constructed using the ACH algorithm after a single split. The number of words that can be accommodated depends on the state, and is at most 270. This leaves at most 14 spare words that can be used for dc-control. Pairing of the alternative representations has been accomplished in such a way that the words that form a pair differ in a single position, i.e., have unity Hamming distance. This has the advantage that the decoding operation is simplified and that the alternative representations have an odd or even number of ones, which, as was observed experimentally, has a beneficial effect upon the quality of the dc-control. Further details of EFM15, such as coding tables and so on, can be found in the U.S. Patent description [17].
VI. PERFORMANCE OF EFM-LIKE CODING SCHEMES
The spectral performance of the various members of the EFM family has been assessed by computer simulation. The outcomes of the simulations have been collected in Table IV. The lf-suppression, as presented in Table IV , is measured at , where the channel bit frequency Hz. If we wish to compare coding schemes of a different rate, it is standard practice to compare the lf-suppression at, say, 0.0001 times the user bit frequency . As the frequency Hz is assumed to be in the range of frequencies, where the spectrum has a parabolic shape, the lf-suppression at can be found by multiplying the lf-suppression measured at by . For example, if , we have to subtract dB from the numbers shown in Table IV to obtain the lf-suppression at , where Hz. A comparison of the properties of sequences generated under the rules of EFM and EFMPlus with those of maxentropic sequences is shown in Fig. 6 . As we can observe, theory predicts there is some room for improvement. For codes of the same rate as EFM and EFMPlus, we could, in theory, construct codes that generate sequences with a factor of three smaller sum variance or, alternatively, a 10-dB extra lf-content suppression.
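The conversion between the two reference frequencies can be sketched numerically. The sketch assumes, as stated above, that both frequencies lie in the parabolic part of the spectrum, so that the power density scales with the square of the frequency, and that the user bit frequency equals the code rate times the channel bit frequency. The function name is illustrative.

```python
import math


def lf_shift_db(rate):
    """Change in dB of the power density when the reference frequency is
    scaled by the code rate, assuming a parabolic (f squared) spectrum."""
    return 20.0 * math.log10(rate)


for name, rate in (("rate 1/2", 1 / 2), ("rate 8/17", 8 / 17)):
    print(name, round(lf_shift_db(rate), 2), "dB")
```

For a rate 1/2 code the shift is about -6 dB, which corresponds to the subtraction mentioned in the text; for the rate 8/17 of classic EFM it is about -6.5 dB.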
If, on the other hand, we stipulate that the sum variance and, thus, the lf-content of EFMPlus is adequate, we may conclude from Fig. 6 that a rate 0.53 ( ) is possible with the sum variance of EFM. The performance of the rate 8/15, code, EFM15 [17] , listed in Table IV , is a far cry from the theoretical bound. Braun et al. [20] and Immink [21] presented coding schemes using long block codes with enumerative coding that are very close to the predicted maxentropic performance. The typical codeword length in their constructions is about 1000 bits, and the hardware required for encoding and decoding is about 5 kB. Other EFM-like codes have been presented by Roth [22] .
VII. OTHER EXAMPLES OF DCRLL CODES
The design of any constrained code can, at least in principle, be systematically accomplished by the design techniques that have been developed over the years. Unfortunately, the design of a DCRLL code with a rate close to the Shannon capacity of the constrained channel, is severely hampered by the large number of states of the finite-state machine (FSM), which models the channel constraints at hand. The large number of states of the underlying FSM, can, at least in principle, be handled by buying a sufficiently large computer, but the insight required is too easily lost. The design of DCRLL codes is therefore (still) the province of a plurality of ad hoc methods, for example, [23] - [26] . Basically, there are two systematic design approaches that emerged in the literature.
The first method uses the ACH algorithm to design an RLL code. In the final stage of the ACH algorithm, we end with a graph with the property that from any state of the graph, there are at least $2^m$ outgoing edges ($m$ is assumed to be the source word length). There are (hopefully) states with a larger number of outgoing edges. These surplus edges are used as alternative codewords for dc-control. The rate 8/16, (2,10) EFMPlus code, discussed in Section IV, is an example of a DCRLL code used in practice (DVD) that was designed according to these guidelines.
In the second method, dc-control is effectuated by multiplexing the source data or the encoded data with dc-control bits. A given, state-of-the-art RLL code, for example, the rate 2/3, (1,7) code, is used to generate RLL sequences. The sequences generated under the coding rules of said code are multiplexed with channel bits for minimizing the low-frequency components, the dc-control. The user data or, alternatively, the encoded data are partitioned into segments of bits. Between two consecutive -bit segments , dc-control bits are inserted, and the dc-control bits, in turn, are chosen to minimize the low-frequency components. In the experimental phase, we have the freedom to select the parameters and such that the required dc-suppression is reached. There is, in other words, no need to redesign the constituent RLL code.
The success of the design method depends on various factors, such as how much lf-suppression is required. In most of the practical cases that we encountered, an extra redundancy for the dc-control of 2-3% was sufficient to yield the required dc-suppression. In that case, codes using multiplexing methods offer excellent performance and flexibility. In the next section, we will present a description of the second design method, where dc-control bits are multiplexed with the user or channel data.
A. Dc-Control on Data Level versus Coded Level
Assume the and constraints are given and that an efficient code has been found in the literature or, alternatively, constructed using the various methods offered in the literature.
A straightforward method for extending a standard code with dc-suppression is to add (stuff) redundant bits, which can be chosen to reduce the power at low frequencies. Essentially, there are two approaches with which a encoder can be extended with multiplexed dc-control: multiplexing can be done at two levels, namely, at the source data level or at the channel data level. Between segments of source data or between segments of encoded data , dc-control bits are inserted. In both multiplex formats, the dc-control bits are chosen to minimize the low-frequency components of the channel sequence generated. This can be accomplished by tallying the RDS at the end of each candidate segment. The encoder transmits that candidate segment whose RDS is closest to zero. At the receiver site, the added dc-control bits, either on the data or channel level, can easily be skipped by the decoder. The two multiplex approaches of dc-control have various distinct features. The dc-control bits can be freely chosen if they are multiplexed at source data level. Then, the encoder has possible sequences to be tried. If, on the other hand, the dc-control bits are multiplexed with the sequence, the new multiplexed sequence so generated has to obey the constraints in force, and as a result, the number of candidate sequences to be tried is less than . For the dc-control to be effective under all worst-case circumstances, it should guarantee that an (almost) entire segment of modulated data bits can be inverted or not inverted. We can easily verify that if the dc-control bits are multiplexed with the sequence, that in order to guarantee said worst-case performance, we require at least dc-control bits. Then, the maximum runlength at the segment boundaries will increase from to . A similar method has been proposed by Odaka [27] , Coppersmith and Kitchens [28] , and Patel [29] . 
When, on the other hand, the dc-control bits are multiplexed on the source level, the matter of worst-case performance is much more involved. The encoded segments are both a function of the source data and the encoder state at the start of the segment. It is therefore not recommended to use an industry-standard code. A possible solution, using the parity preserving word assignment, will be discussed in the next section.
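The selection rule shared by both multiplex formats (try every setting of the dc-control bits and transmit the candidate whose RDS is closest to zero) can be sketched as follows. The `encode` argument stands for the constituent encoder; the identity mapping used in the demonstration is a placeholder assumption and not an RLL code, and all names are illustrative.

```python
from itertools import product


def run_rds(channel_bits, rds, level):
    """Precode the channel bits (a one toggles the level) and update the RDS."""
    for b in channel_bits:
        if b:
            level = -level
        rds += level
    return rds, level


def pick_dc_control(segment, p, encode, rds=0, level=1):
    """Try all 2**p settings of the p dc-control bits placed in front of a
    data segment and keep the one whose end-of-segment RDS is closest to zero.

    Returns (control bits, new RDS, new level, channel bits).
    """
    best = None
    for ctrl in product((0, 1), repeat=p):
        channel = encode(list(ctrl) + list(segment))
        new_rds, new_level = run_rds(channel, rds, level)
        if best is None or abs(new_rds) < abs(best[1]):
            best = (ctrl, new_rds, new_level, channel)
    return best


def identity(bits):
    """Placeholder encoder: passes the multiplexed bits through unchanged."""
    return bits


segment = [0, 0, 0, 1, 0, 0, 0, 0]
ctrl, new_rds, new_level, _ = pick_dc_control(segment, 1, identity, rds=3, level=1)
print(ctrl, new_rds, new_level)
```

With a real encoder, the encoder state at the start of the segment would also have to be carried along, which is the complication noted above for source-level multiplexing.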
B. Codes with Parity Preserving Word Assignment
In order to make it possible to efficiently control the dc-content in the source data level mode, we have made the assignment between source words and codewords in such a way that the parity of the source word and that of its assigned codeword are the same. The parity, $P$, of an $n$-bit word $(x_1, \ldots, x_n)$ (either a source word or a codeword) is defined by
$$P = \sum_{i=1}^{n} x_i \bmod 2.$$
In other words, if the source word has an even (or odd) number of ones, then its channel representation also has an even (or odd) number of ones. A code with a parity preserving assignment has the virtue that, when it is used in conjunction with dc-control bits at the data level, setting an even (or odd) number of ones at the data level will result in an even (or odd) number of ones at the code level. This leads, as we will demonstrate shortly, to an efficient dc-control. An example of a variable length rate 2/3 code that complies with the parity preserving property is shown in Table V. It can easily be verified that indeed the assignment is parity preserving.
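Checking whether an assignment is parity preserving is mechanical. The toy table below is hypothetical and is not Table V; it only illustrates the check, and the function names are illustrative.

```python
def parity(word):
    """Parity of a 0/1 string: the number of ones modulo 2."""
    return word.count("1") % 2


def is_parity_preserving(table):
    """True if every source word has the same parity as its codeword."""
    return all(parity(src) == parity(code) for src, code in table.items())


# Hypothetical rate 2/3 assignment, used for illustration only.
toy_table = {"11": "101", "10": "001", "01": "010"}
print(is_parity_preserving(toy_table))

# Changing one codeword to a word of the other parity breaks the property.
broken_table = dict(toy_table)
broken_table["10"] = "101"
print(is_parity_preserving(broken_table))
```

Because each one in the channel sequence causes a transition after precoding, flipping a single data bit changes the parity of the codeword and thereby inverts the polarity of everything that follows, which is what makes a single data-level control bit effective.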
A parity preserving assignment of a rate 2/3, (1,8) code, first presented by Kahlman and Immink [30], is based on the look-ahead rate 2/3, (1,7) code described by Jacoby and Kost [31]. Tables VII-IX show the encoding of the new parity preserving code. The full coding table of the code consists of a main table and two substitute tables instead of a single substitute table. It can easily be verified that the assignment is indeed parity preserving. The code was found by trial and error, as no approach is (yet) available for systematically constructing codes with a parity preserving word assignment. The systematic design of RLL codes with a parity preserving word assignment is a challenging task. The above examples show that it is indeed possible and that such codes offer a better performance than do their counterparts. Block codes are, by virtue of their simplicity, good candidates, but the complexity issue will hamper their design. Variable length synchronous codes seem to be promising candidates. It is not (yet) known how we can design parity preserving codes with the ACH algorithm.
The difference between the quality of the alternative dc-control methods has been assessed by Wang et al. [32] . The power density measured at a relatively low-channel frequency was used as a quality criterion. Computer programs have been written for simulating the two coding schemes, where the dc-control bits are multiplexed at source or at channel level, respectively. The code for the channel-level multiplex is the standard, rate 2/3, (1,7) code, whereas the source-level multiplex is the parity preserving, rate 2/3, (1,8) code described in the previous section. The authors observed that the parity preserving code performs 2 dB better than does the standard rate 2/3, (1,7) code used with channel-level multiplex in the range of dc-control redundancy of 1%-4%.
VIII. CONCLUSION
We have given a survey of channel codes for optical disk recording systems. It has been shown that state-of-the-art codes are very close to the bound set by the tenets of information theory.
