Constructions are presented of finite-state encoders for certain $(d, k)$-RLL constraints with DC control. In particular, an example is provided for a rate 8:16 encoder for the (2, 10)-RLL constraint that requires no look-ahead in decoding, thus performing favorably compared to the EFMPlus code used in the DVD standard.
Introduction
In optical and magnetic recording systems, the bit stream that is written into the device must satisfy certain constraints. The most common family of such constraints appears to be that of the $(d, k)$-runlength-limited (RLL) constraints, where the run of 0's between consecutive 1's in the bit stream must have length at least $d$ and no more than $k$ for prescribed parameters $d$ and $k$. Given a binary sequence $z_1 z_2 z_3 \ldots$ that satisfies a given $(d, k)$-RLL constraint, the respective NRZI sequence over the bipolar alphabet $\{+1, -1\}$ is given by $w_1 w_2 w_3 \ldots$, where $w_h = \prod_{i=1}^{h} (-1)^{z_i}$. In addition to satisfying RLL constraints, recording applications typically also require the suppression of the DC component of the recorded NRZI signal; namely, the spectrum (i.e., power spectral density) of the ensemble of NRZI sequences should take small values in the neighborhood of zero frequency [4, Ch. 2], [8], [9], [10]. Another parameter that measures the DC suppression is the running sum variation (in short, RSV), which is defined as the expected value of $(1/\ell) \sum_{h=1}^{\ell} w_h^2$ when $\ell \to \infty$. Clearly, a smaller RSV indicates a better suppression of the DC component.
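The two definitions above, the runlength constraint and the NRZI transform, can be illustrated with a minimal Python sketch. The function names `satisfies_rll` and `nrzi` are hypothetical, and the check covers only the runs between consecutive 1's, exactly as the constraint is worded above (leading and trailing runs of a finite string are not examined).

```python
def satisfies_rll(bits, d, k):
    """Check that every run of 0's between consecutive 1's has length in [d, k].

    `bits` is a string of '0'/'1' characters. Leading and trailing runs
    are ignored here (an assumption of this sketch).
    """
    ones = [i for i, b in enumerate(bits) if b == "1"]
    return all(d <= j - i - 1 <= k for i, j in zip(ones, ones[1:]))


def nrzi(bits):
    """Map z_1 z_2 ... to w_1 w_2 ..., where w_h is the product of (-1)^{z_i} for i <= h."""
    polarity = 1
    out = []
    for b in bits:
        if b == "1":          # a 1 flips the polarity, a 0 keeps it
            polarity = -polarity
        out.append(polarity)
    return out
```

For instance, `nrzi("0100")` yields `[1, -1, -1, -1]`: the single 1 in the second position reverses the polarity of everything that follows.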
In order to satisfy a given constraint, a user-provided unconstrained data stream should be encoded; namely, it needs to undergo a uniquely-decodable (or lossless) mapping into a constrained sequence. Encoders for constrained data usually take the form of a finite-state machine. A rate $p:q$ finite-state encoder accepts an input block of $p$ bits and generates a $q$-bit codeword depending on the input block and the current state of the encoder. The sequences obtained by concatenating the generated $q$-bit codewords satisfy the constraint.
Each encoder must have a decoder that recovers the input stream from the constrained sequence generated by the encoder. The class of sliding-block decoders has the advantage of limiting error propagation. An $(m, a)$-sliding-block decoder reconstructs an input $p$-bit block that corresponds to a given received $q$-bit codeword on the basis of the local context of that codeword in the received sequence: the codeword itself, as well as $m$ preceding codewords and $a$ upcoming codewords. The parameter $m$ is called the encoder memory and $a$ is referred to as the anticipation. Thus, a single error at the input to a sliding-block decoder can only affect the decoding of at most $m + a + 1$ consecutive codewords.
In addition to having sliding-block decoders, it is desirable that the code have the highest rate possible. The rate of any encoder for a given constraint is bounded from above by the Shannon capacity of that constraint. [...] The (2, 10)-RLL encoder has zero anticipation. DC control is achieved by letting as many input blocks as possible be encoded into two possible codewords; the parity of the number of 1's is different in these two codewords and, so, the respective NRZI sequences end with a different polarity, thus reversing the polarity of subsequent codewords. Furthermore, both codewords lead to the same state, thus allowing local replacement of a codeword by its alternate without affecting subsequent codewords. The spectral properties of the new (2, 10)-RLL encoder are compared in Section 2 with two existing encoders for the same constraint at the same rate; one of these encoders is the EFMPlus code, which is part of the DVD standard. Design considerations are summarized in Section 4. One architectural feature of the design approach here is having a single compact encoder table that contains the codeword tables of all states in an overlapping manner: recognizing that sets of codewords that correspond to different states can intersect, the overlapping encoding table uses the very same entry in the table for each intersecting codeword, regardless of the state from which this codeword is generated. Furthermore, the specific ordering of the codewords in the table provides an easy mechanism of DC control.
(2,10)-RLL encoder with DC control
The Shannon capacity of the (2, 10)-RLL constraint is approximately 0.5418 [4, p. 91]. We describe here a four-state encoder for this constraint at rate 8:16.
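The capacity figure can be reproduced numerically. The sketch below is not taken from the paper; it relies on the standard fact that the capacity of a $(d, k)$-RLL constraint equals $\log_2 \lambda$, where $\lambda$ is the unique root in $(1, 2)$ of $\sum_{j=d+1}^{k+1} \lambda^{-j} = 1$ (each phrase $0^{j-1}1$ has length $j$). The function name `rll_capacity` is hypothetical.

```python
import math


def rll_capacity(d, k, tol=1e-12):
    """Shannon capacity (bits per symbol) of the (d, k)-RLL constraint, for k > d.

    Solves sum_{j=d+1}^{k+1} lam^(-j) = 1 for lam in (1, 2) by bisection;
    the left-hand side is strictly decreasing in lam.
    """
    def excess(lam):
        return sum(lam ** -j for j in range(d + 1, k + 2)) - 1.0

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:   # sum still too large: the root lies to the right
            lo = mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2)
```

Under this formula, `rll_capacity(2, 10)` agrees with the quoted value 0.5418 to the precision given, so a rate of 8/16 = 0.5 stays below capacity.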
Our encoder consists of a table of 556 codewords, each 16 bits long. Table 4 lists the codewords in hexadecimal form. At each encoding step, the encoder can be in one of the following states: S0, S1, S2-5, or S6-8. Each state is associated with a range of runlengths, which is reflected in the state name. E.g., state S6-8 is associated with the runlengths 6, 7, and 8. Encoding is carried out as follows: given an input byte $b$, a ten-bit address is formed by prefixing $b$ with two bits. This two-bit prefix depends on how the value of $b$ (as an integer) compares with two thresholds, T1 and T2. These thresholds, in turn, depend on the current state of the encoder. The table of threshold values and the table of [...] For example, if the current state is S6-8 and the input byte is 049 (decimal), then the address will be 256 + 049 = 305.
The output codeword is the entry in Table 4 at the computed address and the next encoder state is determined by the last runlength of the generated codeword.
In the example, the output codeword is 0000001001000010 (0242 in hexadecimal notation) and the next encoder state is S1.
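The address arithmetic and the next-state rule in this example can be sketched as follows. This is an illustration only: the helper names are hypothetical, the contents of Table 4 are not reproduced, and the mapping from trailing runlength to state simply follows the state names S0, S1, S2-5, and S6-8.

```python
def form_address(prefix, byte):
    """Prefix an 8-bit input byte with a two-bit prefix to get a ten-bit address."""
    assert 0 <= prefix < 4 and 0 <= byte < 256
    return (prefix << 8) | byte


def next_state(codeword):
    """Derive the next encoder state from the last runlength of a codeword string."""
    run = len(codeword) - len(codeword.rstrip("0"))   # number of trailing 0's
    if run == 0:
        return "S0"
    if run == 1:
        return "S1"
    if 2 <= run <= 5:
        return "S2-5"
    if 6 <= run <= 8:
        return "S6-8"
    raise ValueError("trailing runlength outside the ranges of the four states")


address = form_address(1, 49)          # prefix 01 contributes 256
codeword = format(0x0242, "016b")      # the codeword quoted in the example
```

Here `address` evaluates to 305 and `next_state(codeword)` to `"S1"`, matching the example in the text.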
There are cases where more than one prefix is possible, resulting in two different codeword candidates. To allow DC control, the encoder table is designed so that, to the largest extent possible, those codeword candidates will have different parity (odd/even) of the number of 1's, thereby inducing different parity of sign changes (see [7] for using a similar idea in an unconstrained setting). Furthermore, both codeword candidates lead the encoder to the same state and, therefore, replacement of a codeword with its alternate can be done within an output stream without affecting preceding or following codewords. The decoder can recover the input byte regardless of the specific codeword candidate that was chosen.
For example, if the current state is S1 and the input is 70 (decimal), then the output codeword can be either 0000100000010001 or 0100000010010001. Both codewords lead to state S0.
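The two properties claimed for this pair of candidates (opposite parity, same terminal runlength) can be checked directly. The helper names below are hypothetical; the codewords are the two quoted above.

```python
def ones_parity(codeword):
    """Parity (0 = even, 1 = odd) of the number of 1's in a codeword string."""
    return codeword.count("1") % 2


def trailing_run(codeword):
    """Length of the final run of 0's, which determines the next encoder state."""
    return len(codeword) - len(codeword.rstrip("0"))


def final_polarity(codeword, start=1):
    """NRZI polarity after the codeword, given the polarity just before it."""
    return start * (-1) ** codeword.count("1")


cand_a = "0000100000010001"   # three 1's: odd parity
cand_b = "0100000010010001"   # four 1's: even parity
```

Both candidates end in a 1 (trailing runlength 0, hence state S0), while `final_polarity(cand_a)` and `final_polarity(cand_b)` have opposite signs, so choosing one or the other flips the polarity of everything that follows.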
A schematic diagram of the encoder is shown in Figure 2. Assuming independent and uniformly distributed input, the expected percentage of input bytes within an input sequence for which two codeword candidates exist is 49.7% (this number is computed by first finding the stationary probability of being at each encoder state; see the discussion towards the end of Section 4).
Decoding is carried out as follows. First, an address is found of a [...]

The spectra of the encoder presented here and the modified EFM code were obtained by simulation and are shown in Figure 3. In both encoders, the decision between a codeword and its alternate is done on-line by looking ahead at two upcoming input bytes and minimizing the absolute value of the sum of the bipolar values in the NRZI sequence that corresponds to the output binary sequence. The axes in the figure are scaled to match the respective figures in [5]: the length of each input bit is assumed to be two time units, and the NRZI sequence is normalized so that its amplitude is $1/\sqrt{2}$. By comparing the two curves in Figure 3, we conclude that in the low-frequency range, the spectrum of the encoder presented here is 5 dB lower than that of the modified EFM code. The improvement of the new encoder over the modified EFM code can also be seen through the RSV values: simulation results show that the RSV of the new encoder (while looking ahead at two input bytes) is 24.3, whereas the RSV of the modified EFM code (with the same encoding look-ahead) is 36.5.
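One way to read the on-line selection rule is sketched below in Python. This is an assumption-laden illustration, not the simulation code behind Figure 3: the function names are hypothetical, each encoding step is modeled as a tuple of one or two candidate codeword strings that are already known, and the selection criterion is taken to be the smallest achievable absolute NRZI sum at the end of the look-ahead window.

```python
from itertools import product


def nrzi_sum(bits, polarity):
    """Return (sum of bipolar NRZI values over `bits`, polarity after `bits`)."""
    total = 0
    for b in bits:
        if b == "1":
            polarity = -polarity
        total += polarity
    return total, polarity


def choose_codewords(steps, lookahead=2):
    """Pick one candidate per step, keeping the running NRZI sum close to zero.

    `steps` is a list of tuples of candidate codeword strings. For each step,
    every choice within the next `lookahead` steps is tried, and the candidate
    that admits the smallest absolute sum at the end of the window is kept.
    """
    running, polarity = 0, 1
    chosen = []
    for i, candidates in enumerate(steps):
        window = steps[i + 1 : i + 1 + lookahead]
        best = None
        for cand in candidates:
            delta, pol_after = nrzi_sum(cand, polarity)
            best_end = None
            for combo in product(*window):      # one empty combo if window is empty
                s, p = running + delta, pol_after
                for cw in combo:
                    ds, p = nrzi_sum(cw, p)
                    s += ds
                if best_end is None or abs(s) < best_end:
                    best_end = abs(s)
            if best is None or best_end < best[0]:
                best = (best_end, cand, delta, pol_after)
        _, cand, delta, pol_after = best
        chosen.append(cand)
        running += delta
        polarity = pol_after
    return chosen, running
```

Because alternates lead to the same encoder state, the candidate sets of later steps do not depend on earlier choices, which is what makes it reasonable to model `steps` as a fixed list.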
The second encoder presented in [5] is the EFMPlus code, which is part of the DVD standard. The encoding table of this encoder consists of 1376 codewords, each 16 bits long. Unlike the modified EFM code or the new encoder presented here, the EFMPlus code is (0, 1)-sliding-block decodable, as the decoding of an input byte requires the knowledge of two bits in the upcoming received codeword in addition to knowing the current codeword. On the other hand, when comparing the spectrum of our encoder with the respective curve of the EFMPlus code (Figure 4 in [5]), the two curves look very much alike. In particular, both spectra take the value of approximately $-15$ dB for the normalized frequency value $f = 10^{-3}$.

Figure 3: Spectra of the new (2, 10)-RLL encoder and the modified EFM code with two merging bits. To maintain DC control, each encoder looks ahead at two upcoming input bytes.
(2,12)-RLL encoder with DC control
The Shannon capacity of the (2, 12)-RLL constraint is approximately 0.5471 [4, p. 91].
We next describe an eight-state encoder for this constraint at rate 8:15. Table 5 lists the encoder table, which consists of 551 codewords, each 15 bits long. At each encoding step, the encoder can be in one of the following states: S0, S1, S2a, S3a, S4a, S5-6a, S2-6b, or S7-8. The specific state depends on the last runlength of the previous codeword and the l.s.b. of the previous input byte, as described in Table 2 [...] Table 3. The output codeword is the entry in Table 5 [...] Table 2. As in the previous encoder, DC control is attained by allowing certain input bytes to have two different codeword candidates with different parity of the number of 1's (that lead to the same state). Assuming that the input is independent and uniformly distributed, the expected percentage of such input bytes within an input sequence is 12.2%.
The resulting encoder is (0, 1)-sliding-block decodable, as demonstrated by the following decoding scheme. First, an address is found of a table entry that contains the received codeword. In case the codeword ends with a runlength of 0, 1, 7, or 8, then that codeword appears only once in the table. In this case, the input byte is obtained by truncating the two m.s.b.'s of the address where the codeword is located. In case the codeword ends with a runlength of 2, 3, 4, 5, or 6, then it appears twice in the encoder table in two adjacent locations. By truncating the two m.s.b.'s of the address, the input byte is determined except for its l.s.b.: the latter bit decodes to 0 if the next received codeword is located in the encoder table at an address smaller than 292 (this address boundary is marked in Table 5); otherwise, the l.s.b. is 1.
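The decoding scheme just described can be sketched in Python as follows. The sketch assumes that the table lookup has already been done, so the decoder receives the address of the current codeword and the address of the next one; the function names and the sample codewords and addresses in the usage note are hypothetical and are not entries of Table 5.

```python
BOUNDARY = 292  # address boundary separating the "a" and "b" parts of the table


def trailing_run(codeword):
    """Length of the final run of 0's in a codeword string."""
    return len(codeword) - len(codeword.rstrip("0"))


def decode_byte(codeword, address, next_address):
    """Recover the input byte from the current codeword and the next address.

    `address` is any table address holding `codeword`; `next_address` is any
    table address holding the next received codeword.
    """
    byte = address & 0xFF                 # truncate the two m.s.b.'s
    if 2 <= trailing_run(codeword) <= 6:  # codeword stored twice: l.s.b. is open
        lsb = 0 if next_address < BOUNDARY else 1
        byte = (byte & 0xFE) | lsb
    return byte
```

With made-up values, `decode_byte("000100100010000", 300, 100)` gives 44 and `decode_byte("000100100010000", 300, 400)` gives 45, whereas a codeword ending in a 1 at address 305 decodes to 49 regardless of what follows. Only the l.s.b. ever depends on the next codeword, which is the source of the anticipation of 1.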
We can redirect into state S5-6 all the codewords that lead to states S2a, S3a, S4a, and S7-8, and delete those four states. By doing this, we obtain a four-state encoder for the (2, 12)-RLL constraint; yet, the expected percentage of input bytes that have two codeword candidates reduces to 7.6%.
The spectra of the eight-state and four-state encoders for the (2, 12)-RLL constraint were obtained by simulation and are shown in Figure 4, where the encoding look-ahead and the scaling of the axes are the same as in Figure 3. Compared with the encoder in Section 2, the reduction of the DC component here can be observed at significantly smaller frequencies: this is a result of the fact that the percentage of input bytes that have two codeword candidates is much smaller here. The effect of that percentage is very much apparent when comparing the performance of the four-state (2, 12)-RLL encoder with the eight-state encoder in the low-frequency range: the spectrum of the latter encoder is approximately 6 dB lower.
Code design
In this section, we outline the principles that guided the design of the coding scheme presented in Section 3. The design does involve some heuristics, but can still be applied to obtain similar coding schemes for certain other $(d, k)$-RLL constraints; in particular, similar principles also guided the design of the encoder in Section 2.
State merging and state splitting
Let $G$ denote the graph presentation of the (2, 12) [...] approximate eigenvector must contain a component which is greater than 1 [11, Section 3.1.4]. Combining this with Proposition 3.34 in [11], it follows that any sliding-block decoder of any finite-state encoder for the (2, 12)-RLL constraint at rate 8:15 must have anticipation $a \ge 1$. The encoder we construct is (0, 1)-sliding-block decodable; furthermore, the l.s.b. of the current input byte is the only bit whose decoding may require the knowledge of the next received codeword, in addition to the current codeword. (We remark that in the case of the (2, 10)-RLL constraint, there is an approximate eigenvector which is a 0–1 vector. Indeed, for this constraint we presented a (0, 0)-sliding-block decoder in Section 2; see also [3].)
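The notion of an approximate eigenvector can be made concrete with a short Python sketch. It assumes the usual graph presentation of a $(d, k)$-RLL constraint (states $0, \ldots, k$ counting the 0's since the last 1) and uses the standard iterative procedure, commonly attributed to Franaszek, for finding the largest integer vector $x$ with entries at most `bound` that satisfies $A^q x \ge 2^p x$ componentwise. All function names are hypothetical, and the weights it returns are not quoted from the paper.

```python
def rll_adjacency(d, k):
    """Adjacency matrix of the (d, k)-RLL graph on states 0..k."""
    n = k + 1
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        if i < k:
            A[i][i + 1] = 1   # emit a 0: one more 0 since the last 1
        if i >= d:
            A[i][0] = 1       # emit a 1: allowed after at least d 0's
    return A


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, q):
    """q-th power of A by repeated multiplication (q is small here)."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(q):
        R = mat_mul(R, A)
    return R


def approximate_eigenvector(A, p, q, bound):
    """Largest integer x with 0 <= x_i <= bound and A^q x >= 2^p x."""
    Aq = mat_pow(A, q)
    n = len(A)
    x = [bound] * n
    while True:
        # Shrink each entry to what the outgoing q-step paths can support.
        y = [min(x[i], sum(Aq[i][j] * x[j] for j in range(n)) >> p)
             for i in range(n)]
        if y == x:
            return x
        x = y
```

A call such as `approximate_eigenvector(rll_adjacency(2, 12), p=8, q=15, bound=2)` returns one weight per state of $G$; these are the weights that the merging and splitting steps below refer to.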
States 9 through 12 have zero weight and therefore can be removed from $G^{15}$ with all their incoming and outgoing edges. States 7 and 8 have the same weight and, in addition, all words that can be generated by paths beginning at state 8 in $G^{15}$ can also be generated starting at state 7. Therefore, state 7 can be merged into state 8 by redirecting edges incoming to state 7 so that they terminate in state 8, thereby allowing the deletion of state 7 (see [11], [12]). Similarly, states 2 through 5 can be merged into state 6; yet, to obtain the encoder of Section 3 we merge only state 5 into state 6. That is, avoiding state merging in certain cases is the mechanism by which we get more codewords that have alternates and, thus, more DC control. (The same applies to the encoder in Section 2: we could merge states 2 through 7 into state 8 to obtain a (2, 10)-RLL encoder with three states; however, to increase the DC control, we only merged states 2 through 4 into state 5 and states 6 and 7 into state 8, resulting in four states altogether.)
After merging and deleting states, we obtain a graph $H$ with the following seven states: S0, S1, S2, S3, S4, S5-6, and S7-8. States S0 through S4 correspond to states 0 [...]

Next we split states in $H$ with weight greater than 1. When a state $u$ is split, two or more descendant states are formed. The incoming edges to $u$ are duplicated into each of the descendant states, whereas the outgoing edges from $u$ are partitioned among the descendant states. In the graph $H$, there are four states that need to be split, namely, states S2, S3, S4, and S5-6. It can be verified that each of these states can be split into two states, resulting in descendant states each having at least $2^8$ outgoing edges. In fact, after splitting, most states will have more than $2^8$ outgoing edges, which will allow having alternate codewords and hence DC control.
We point out that the EFMPlus code in [5] for the (2, 10)-RLL constraint can be obtained in a similar manner by one state splitting. So, state splitting can lead to a gain in DC control. On the other hand, our strategy to obtain DC control is limiting state merging. We did invoke state splitting in the (2, 12)-RLL case not as a means to gain DC control, but simply because it was required to obtain a (2, 12)-RLL encoder even if DC were to be ignored. The application of state splitting is the reason for having anticipation greater than 0 in this case. On the other hand, we did not have to (and therefore we did not) apply state splitting to obtain the (2, 10)-RLL encoder in Section 2.
Setting up the table
Straightforward splitting of the four states with weight 2 in $H$ may result in 11 encoder states. To reduce the number of encoder states and to obtain a compact codeword table, we define a certain order on the outgoing edges (or rather, their labels) from each state in $H$ and construct the encoding table based on that order. Figure 5 presents the structure of the encoding table. The rectangle to the right represents a table of 551 codewords, divided into runlength intervals: each runlength interval contains the codewords whose first runlength falls within a given interval of values; e.g., all codewords that start with a runlength between 2 and 4 belong to one runlength interval. Codewords ending with a runlength in the range 2–6 are written twice in the table, in two consecutive places; indeed, those codewords correspond to edges that were duplicated due to the splitting of states S2, S3, S4, and S5-6.
The two-sided vertical arrows to the left mark the locations of codewords that can be generated from each state in $H$. Figure 5 also shows a splitting of states S2, S3, S4, and S5-6, which is marked by the dashed line: in each one of those states, the outgoing edges are partitioned so that edges labeled by codewords that are located at addresses smaller than 292 belong to a descendant state that inherits the name of the parent state with a suffix "a". The rest of the edges belong to the other descendant state, which carries the suffix "b". The number 292 was chosen so that state S5-6a will have at least $2^8$ outgoing edges. Note that a duplicated codeword in the table corresponds to an incoming edge to a state that was split.
Observe that all codewords that can be generated from a given state form a contiguous segment of the table. In fact, it follows from a result by Franaszek [2] that this can always be done for any $(d, k)$-RLL constraint. Segments of the table that correspond to different states can overlap, thus resulting in a compact encoding table. Also, states S2b, S3b, S4b, and S5-6b are equivalent in that the sets of codeword sequences that can be generated from each one of those states are the same. Therefore, we can combine those states into one state, which we call S2-6b.
The current table already implies a coding scheme as follows. Encoding is carried out by adding a two-bit prefix to the input byte. This two-bit prefix is chosen so that the resulting address falls within the address range of the (contiguous) segment of the table that corresponds to the current state, as determined by Figure 5; in fact, in many cases more than one prefix is possible. Finding the right prefix can be translated in a straightforward manner into threshold comparison, leading to [...]

Figure 5: Location of codewords generated from each state in $H$.
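The prefix selection described in this paragraph amounts to an interval test, as the following Python sketch shows. The function name and the segment boundaries used in the usage note are hypothetical; the actual address ranges are those of Figure 5, which is not reproduced here.

```python
def candidate_prefixes(byte, seg_start, seg_end):
    """Two-bit prefixes that place `byte` inside the current state's segment.

    The segment is the half-open address range [seg_start, seg_end) of the
    table that holds the codewords available from the current state.
    """
    return [p for p in range(4) if seg_start <= (p << 8) + byte < seg_end]
```

With a made-up segment of [256, 551), `candidate_prefixes(49, 256, 551)` returns `[1]`, while `candidate_prefixes(20, 256, 551)` returns `[1, 2]`, so the byte 20 would have two codeword candidates. Since the candidate addresses for a byte are spaced $2^8$ apart, comparing the byte against the segment end points modulo $2^8$ gives the same answer, which is presumably where the thresholds come from.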
Recall that codewords that appear twice in the table occupy consecutive addresses. By putting the first codeword in each such pair at an even address, the most significant seven bits of the current input byte are determined by the current received codeword.
As a final design stage, we reorder the codewords in our table so that the structure of Figure 5 is maintained, while satisfying additional conditions to allow DC control. More specifically, let $a$ and $a + 2^8$ be two addresses in the table, both belonging to the same table segment corresponding to some state in $H$. Then, we require that, to the largest extent possible, the following two conditions hold:
(C1) The terminal states of the codewords at addresses $a$ and $a + 2^8$ should be the same; this allows replacing a codeword with its alternate without affecting subsequent encoded codewords.
(C2) The codewords at addresses $a$ and $a + 2^8$ should have different parities (of the number of 1's).
We use the following heuristic procedure to reorder the table so that these two conditions are satisfied. Denote by $[x, y)$ a contiguous table portion that starts at address $x$ and ends at address $y - 1$ (inclusive). Starting with $s = 000$, we look for the largest value such that, for every $i \ge 0$, each of the [...]

Rows and columns are indexed by the encoder states, according to the following order: S0, S1, S2a, S3a, S4a, S5-6a, S2-6b, and S7-8 (the entries in $A_E$ do not take into account alternate codewords that are used for DC control). The peculiarity of having equal rows in $A_E$ for all encoder states is a consequence of the fact that the terminal states are the same for codewords that are located in the table at addresses which are at distance 2 
