Abstract: Block Cyclic Redundancy Check (CRC) codes represent a popular and powerful class of error detection techniques used almost exclusively in modern data communication systems. Though efficient, CRCs can detect errors only after an entire block of data has been received and processed. In this work, we propose a new "continuous" error detection (CED) scheme using arithmetic coding that provides a novel tradeoff between the amount of added redundancy and the amount of time needed to detect an error once it occurs. We demonstrate how the new error detection framework improves the overall performance of transmission systems, and show how sizeable performance gains can be attained. We focus on several important scenarios: (i) Automatic Repeat ReQuest (ARQ) based transmission; (ii) Forward Error Correction (FEC) frameworks based on (serially) concatenated coding systems involving an inner error-correction code and an outer error-detection code; and (iii) Reduced State Sequence Estimation (RSSE) for channels with memory. We demonstrate that the proposed CED framework improves the throughput of ARQ systems by up to 15% and reduces the computational/storage complexity of FEC and RSSE by a factor of two compared to the state-of-the-art systems.
I. Introduction
Conventional communication systems have used block Cyclic Redundancy Checks (CRC) for error detection. Since CRCs operate on blocks of data, they can detect errors only after an entire block of data has been processed. An error detection scheme that is "continuous" can detect errors faster and so can enhance the performance of communication systems, both in terms of throughput and in terms of complexity reduction in systems based on sequence estimation. In this work, we describe a new class of "continuous" error detection techniques, and show its utility in a variety of common communication scenarios such as (i) Automatic Repeat ReQuest (ARQ) based data transmission, (ii) (serially) concatenated coding systems deploying an inner error-correction code and an outer error-detection code, and (iii) Reduced State Sequence Estimation for channels with memory.
Our proposed approach is based, ironically, on a popular source (entropy) coding technique, namely the arithmetic coder [1]. During our research, we became aware that the idea of continuous error detection based on arithmetic coding had been tackled earlier by Boyd et al. in [2], albeit with little system performance analysis or exposition of its utility in communication systems. In this work, we not only undertake a more rigorous analysis of this paradigm, quantifying the underlying tradeoffs involved in the process, but also establish the impressive gains in system performance attainable through sophisticated integration of this novel paradigm into popular, powerful transmission scenarios such as those listed above.
The basic idea is simple: add to the list of data symbols to be arithmetically encoded an extra "forbidden" symbol that is never actually transmitted, but for which a controlled amount of probability space is reserved nonetheless. (Note that this increases the redundancy of the coded symbol stream.) By increasing the amount of coding space that the forbidden symbol occupies, it is possible to make statistical guarantees about where the errors may have occurred. That is, it is possible to isolate the location of the error in a statistical sense, to any desired confidence level, to the previous $m$ bits, where $m$ depends on the amount of invested excess redundancy and the desired confidence level.
We also observe that the error detection performance is independent of whether the errors were random or bursty, but depends only on the amount of redundancy introduced. Thus CED can be used for communication over AWGN channels, or fading channels, or in conjunction with communication systems with error propagation at the receiver, like the decision feedback equalizer with CED as an outer error detecting code.
Continuous error detection can be of great use in several communication systems, for example in sequence estimation based receivers, where there is state space explosion. In this work we present CED as a new paradigm to be incorporated into communication systems, one that can improve the throughput in some systems and reduce the complexity in others. CED can be approximated in other ways too, without using arithmetic coding. For example, a high rate convolutional code that introduces redundancy in a periodic fashion in a bitstream can also detect errors based on illegal state transitions. Another example would be a cyclic redundancy check code used periodically. Such schemes would be more amenable to analysis as they are linear systems, unlike CED based on arithmetic coding, which is difficult to analyze. In these examples of traditional error control codes, however, error detection can only occur at instants corresponding to parity check symbols¹ and in this sense is not really continuous. In the proposed arithmetic coding based CED approach, encoded bits are both data and parity bits and thus allow error detection to occur at virtually any instant. Even at their best, convolutional or cyclic codes cannot provide the continuously tunable redundancy that comes with arithmetic coding, where the redundancy can be any real number. So the focus of this paper is to present performance improvements in different communication systems that use CED, over conventional CRC based systems. We will use arithmetic coding as a vehicle to demonstrate the power of continuous error detection, though the CED framework is general enough to encompass other schemes such as the one based on convolutional coding as discussed above.
In an ARQ system, the utility of CED comes from the ability to make statistical statements relating the time of error occurrence to the time of its detection, resulting in potential savings in the number of bits that need to be retransmitted when an error is detected. In this setting, we optimize the tradeoff between added redundancy and error-detection time to attain significant throughput gains of up to 15% compared to a fully-optimized conventional ARQ system.
In a serially concatenated coding system using inner convolutional codes, CED can likewise be useful to eliminate invalid trellis paths, leading to potential performance gains. The ability to dynamically prune inadmissible subpaths in a "list" Viterbi algorithm can be exploited to increase the likelihood of retaining the correct path in the final list. Our approach reduces the number of paths which have to be processed and stored in the list Viterbi algorithm by a factor of two compared to the number of paths needed with block CRC detection for the same overall performance.
In reduced state sequence estimation, where there is a limit on the number of paths that can be maintained, continuous error detection allows checking the validity of paths on the fly in the reduced state trellis and pruning the incorrect candidates earlier, thus increasing the likelihood of the correct path being present at the end. As in the concatenated coding scheme mentioned above, CED reduces the complexity/storage requirements by a factor of two compared to state-of-the-art systems in this application as well.
The rest of the paper is organized as follows. Arithmetic coding is introduced in Section II and details of how it can be used for continuous error detection are discussed. Section III presents an application of CED for ARQ transmission where it provides significant throughput gains over conventional CRC based schemes. In concatenated coding systems, using CED as an outer error detection code can significantly reduce the complexity of the system. This is discussed in Section IV. In Section V we show how CED can reduce the complexity of reduced state sequence estimation by a factor of two over a conventional CRC based scheme. We conclude in Section VI.
II. Arithmetic Coding and its application for Continuous Error Detection
Arithmetic coding [1], with a suitable source model, has been widely accepted as the optimum method for data compression. Here data strings are mapped onto code strings that represent the probabilities of the corresponding data strings. To realize this mapping, some model has to be assumed for the source, which could be changed adaptively in practical situations. Based on this model for symbol probabilities, the coding algorithm progresses in a symbol-wise recursive fashion. On each recursion, it partitions an interval of the number line $0 < a \le b < 1$ and retains one of the partitions as the new interval. Thus the data string gets mapped onto a fractional number between zero and one.
This can be illustrated better with an example. Consider a source alphabet with three symbols $a$, $b$, and $c$ with $p(a) = 0.011$, $p(b) = 0.011$ and $p(c) = 0.010$, where the probabilities are all in binary representation (Figure 1). Let us consider encoding a sequence $abc\ldots$. After encoding the first symbol $a$, the transmit sequence would lie between 0 and 0.375 (0.011 in binary). At this point the encoder can release the first encoded bit, "0". The second symbol confines the transmit sequence to the range $((0.375)(0.375),\ (0.375)(0.375) + (0.375)(0.375))$, i.e., $(0.1406, 0.2812)$. Each symbol encoded reduces the interval in which the transmit sequence would lie. As the number of symbols increases, we get a fractional number between zero and one that represents the probability of occurrence of that particular sequence of symbols. The probabilities of symbols are allowed to vary in the process of encoding with the single restriction that they can only depend on past data, to ensure that the decoder can also track them. It is actually the model adaptation which contributes the most to the complexity of practical arithmetic coders and makes the term "arithmetic coding" synonymous with "computationally expensive." However, if the probabilities of symbols are fixed through the encoding process, then the computational burden is quite insignificant; normally only two registers are required for a data string, with a few real operations per symbol [3]. This complexity is for the updates of the size and the beginning pointer of the current interval, which require two multiplications. It has been shown in [4] that each multiplication can be closely approximated by a shift and an add, thereby greatly reducing the complexity of each update to two shifts and adds instead of two multiplications.
These results are used in this paper to justify the claims about complexity/storage requirements of our proposed systems, since they need no modification of probability models based on the previous data during encoding.
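The interval narrowing in the three-symbol example can be traced with a short sketch. It is only an illustration under stated assumptions: the symbol order a, b, c on the unit interval is inferred from the numbers quoted in the example, exact rational arithmetic replaces the register-based updates a practical coder would use, and the function names are hypothetical.

```python
from fractions import Fraction

# Symbol probabilities from the example: 0.011, 0.011, 0.010 in binary.
PROBS = {"a": Fraction(3, 8), "b": Fraction(3, 8), "c": Fraction(2, 8)}


def cumulative_ranges(probs):
    """Map each symbol to its [low, high) slice of the unit interval."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probs.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges


def narrow(sequence, probs):
    """Return the (low, high) interval reached after each encoded symbol."""
    ranges = cumulative_ranges(probs)
    low, width = Fraction(0), Fraction(1)
    history = []
    for symbol in sequence:
        sym_low, sym_high = ranges[symbol]
        # Two multiplications per symbol: new start pointer and new size.
        low, width = low + width * sym_low, width * (sym_high - sym_low)
        history.append((low, low + width))
    return history


if __name__ == "__main__":
    for symbol, (low, high) in zip("abc", narrow("abc", PROBS)):
        print(symbol, float(low), float(high))
```

After the second symbol the interval is (9/64, 18/64), which matches the range (0.1406, 0.2812) quoted in the example.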
A. Incorporating error detection into arithmetic coding
The arithmetic decoder is able to perfectly reconstruct the transmitted data by reversing the encoding operations. However, an error in an arithmetically coded stream often causes a loss of synchronization, and all subsequently decoded symbols become invalid. It is this loss of synchronization that we wish to exploit to give us error detection capabilities. The idea (first proposed in [2]) is to introduce a forbidden symbol that is never encoded by the arithmetic coder, but is nonetheless assigned a nonzero probability; then, if an error occurs, this forbidden symbol is eventually decoded with very high probability. This is because loss of synchronization corrupts the subsequent decoded stream so that it can lie anywhere in the interval $(0, 1]$, which means that the forbidden symbol, which has a finite probability assigned to it, can be decoded with some nonzero probability. Investigation of the conditions under which the detection of errors is guaranteed is not undertaken in this paper. What is exploited in this work is the fact that if the forbidden symbol is decoded, an error has certainly occurred. The amount of time needed to decode the forbidden symbol after the occurrence of an error is inversely related to the amount of added redundancy due to the introduction of the forbidden symbol.
To understand this better, consider the case where there are only two symbols (see Figure 2): a "useful" symbol $a$, and the forbidden symbol $x$. Suppose that $x$ is assigned probability $\epsilon$ ($0 < \epsilon < 1$) and $a$ is assigned probability $(1-\epsilon)$.
Each time a data symbol $a$ is encoded, the subinterval corresponding to $a$ (the current uncertainty interval) will be partitioned as follows: $(1-\epsilon)$ of the subinterval is allocated to $a$, and $\epsilon$ of the subinterval is allocated to $x$.
[Figure 2: Subdivision of the coding interval when encoding the symbol $a$ and adding a symbol $x$ which is never encoded. The probability of $x$ is $\epsilon$ and the probability of $a$ is $(1-\epsilon)$. The subintervals representing data strings that contain an $x$ are denoted as $n(x)$.]
Because the forbidden symbol is never encoded, the subintervals corresponding to $x$ are never partitioned. Then, after encoding $n$ symbols, the length of the interval containing valid codewords is decreased by the factor $(1-\epsilon)^n$. If an error is introduced in the encoded stream, the decoded codeword will change its position on the interval. This change in position is well modeled as being uniformly distributed on the interval. When an error occurs, the bitstream is no longer restricted to the subintervals that belong to the data symbols, but will now have probability $\epsilon$ of falling in the subinterval corresponding to $x$ (i.e., decoding the symbol $x$), and probability $(1-\epsilon)$ of falling in the valid subinterval corresponding to $a$.² The detection delay $Y$, i.e., the time from the occurrence of an error to the decoding of the forbidden symbol, is therefore geometrically distributed:

$$P_Y(k) = \epsilon\,(1-\epsilon)^{k-1}, \quad k = 1, 2, \ldots \tag{1}$$

From this, we see that

$$P[Y > n] = (1-\epsilon)^n. \tag{2}$$

Letting $\delta = P[Y > n]$, taking the logarithm of both sides of (2) and solving for $n$, we get

$$n = \frac{\log_2(\delta)}{\log_2(1-\epsilon)}. \tag{3}$$

Thus, upon detection of an error, the probability that the error occurred in the previous $n$ bits is $(1-\delta)$, and $\delta$ can be made as small as desired. Yet there is still always a small probability that, given that an error occurred, the error pattern will map the original codeword to another valid codeword, so that the error is never detected. We, however, found this event to have insignificant probability in our experiments.

² Simulations using various values of $\epsilon$ confirm that the geometric distribution is a valid model.
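The forbidden-symbol mechanism can be made concrete with a toy codec. The sketch below is an assumption-laden illustration, not the coder used in the experiments: it uses exact rational arithmetic instead of finite-precision registers, it tells the decoder how many symbols to expect, and all names (`build_ranges`, `encode`, `decode`) as well as the two-symbol alphabet and the forbidden-symbol probability of 1/10 are hypothetical choices.

```python
from fractions import Fraction
from math import ceil


def build_ranges(probs, eps):
    """Shrink data-symbol slices by (1 - eps); [1 - eps, 1) stays forbidden."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probs.items():
        width = p * (1 - eps)
        ranges[symbol] = (low, low + width)
        low += width
    return ranges


def encode(symbols, ranges):
    """Return bits whose dyadic interval lies inside the final coding interval."""
    low, width = Fraction(0), Fraction(1)
    for symbol in symbols:
        sym_low, sym_high = ranges[symbol]
        low, width = low + width * sym_low, width * (sym_high - sym_low)
    k = 0
    while True:
        k += 1
        m = ceil(low * 2**k)  # smallest m with m / 2^k >= low
        if Fraction(m + 1, 2**k) <= low + width:
            return [(m >> (k - 1 - i)) & 1 for i in range(k)]


def decode(bits, ranges, count):
    """Decode up to `count` symbols; return (symbols, forbidden_seen)."""
    value = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
    low, width = Fraction(0), Fraction(1)
    decoded = []
    for _ in range(count):
        position = (value - low) / width
        for symbol, (sym_low, sym_high) in ranges.items():
            if sym_low <= position < sym_high:
                decoded.append(symbol)
                low, width = low + width * sym_low, width * (sym_high - sym_low)
                break
        else:
            return decoded, True  # value fell in the forbidden slice: error
    return decoded, False


if __name__ == "__main__":
    probs = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
    ranges = build_ranges(probs, Fraction(1, 10))
    message = list("abbabaabab" * 4)
    bits = encode(message, ranges)
    assert decode(bits, ranges, len(message)) == (message, False)
    detected = 0
    for i in range(len(bits)):
        corrupted = bits.copy()
        corrupted[i] ^= 1  # single bit error at position i
        _, flagged = decode(corrupted, ranges, len(message))
        detected += flagged
    print(f"{detected} of {len(bits)} single-bit errors detected")
```

An error-free stream never triggers the forbidden symbol, so a flag is never a false alarm. Errors near the end of the stream may go undetected simply because too few symbols remain to be decoded after them.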
At this point, some observations regarding burst error detection properties of the continuous error detection scheme have to be mentioned. This is particularly of interest while considering error detection times for fading channels which cause bursty errors in the transmitted data, or when CED is used as an outer error detection code with an inner decoding scheme that could have error propagation, like the decision feedback equalizer. Analysis of detection times for burst errors is not very straightforward as the effects of loss of synchronization depend on the particular symbol sequence and the particular burst of errors, and a general characterization would be difficult to make. However, it has been observed from simulations that the position of detection of error is independent of the burst length. Figure 3 shows a comparison of the detection position histograms for three different burst lengths. A frame of 2000 bits was encoded using a redundancy $\epsilon = 0.01$. In the encoded bit stream, a burst of errors was introduced starting from bit position 100 and the positions where the error was detected were observed. The histograms presented are for errors detected within bits 100 to 300. We can see that they are practically the same irrespective of whether it was a single error, or a burst of length 20, or one of length 50.
This can be intuitively understood as follows. Once an error happens in a stream encoded using arithmetic coding, the received codeword can lie anywhere in the $[0, 1)$ interval with equal probability [5]. The only way in which an error burst could affect this is if it could cause a resynchronization of the erroneous codeword into another valid codeword. This is a very rare event, and the probability of its happening within a finite frame size is quite small. Hence the position of detection of the error depends essentially only on where the first error happened. This helps us detect burst errors of different lengths with the same chances of success or failure as detecting a single error. It is also worth mentioning that the popular block-based Huffman (prefix) coding for data compression can be viewed as a special case of arithmetic coding. Thus, our results may also be extended to include Huffman coding, though the continuous nature of error detection will, in general, be lost. (The extension is obvious: if a Huffman decoder receives a prefix that does not correspond to any codeword from the alphabet, an error has surely occurred during the transmission.)
B. Redundancy versus error detection time
Having established the error-detecting capabilities of introducing a forbidden symbol upon encoding, the next task to consider is the price (in terms of extra bits) that is paid for introducing this forbidden symbol. To determine the coding inefficiency resulting from the presence of the forbidden symbol, we note that typically $-\log_2(w)$ bits are needed to represent a subinterval of width $w$ on the unit interval $[0, 1)$. By introducing the forbidden symbol $x$ that occupies a subinterval of width $\epsilon$, the rest of the data symbols will be confined to a subinterval of width $1-\epsilon$. As a result, the amount of redundancy $R_x$ due to adding the forbidden symbol $x$ is

$$R_x = -\log_2(1-\epsilon) \ \text{bits/symbol encoded}. \tag{4}$$

From (4) and (3) we get

$$n = -\log_2(\delta) / R_x. \tag{5}$$

Also, we see that the amount of redundancy added for a given "confidence level" (captured by $\delta$) is inversely related to the amount of time it takes to detect an error. This is useful for ARQ where, depending upon channel conditions, there are conditions in which fast detection is more important than the amount of redundancy added (noisy channel conditions) and there are conditions in which the opposite is true (calm channel conditions). From (5), we have direct continuous control over the amount of redundancy added versus the expected amount of time it takes to detect an error.
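The tradeoff between redundancy and detection time can be tabulated numerically. The following sketch evaluates (3) and (4), writing `eps` for the probability reserved for the forbidden symbol and `delta` for the probability that an error goes undetected for more than $n$ bits; the function names and the sample values are illustrative assumptions.

```python
import math


def redundancy_bits_per_symbol(eps):
    """Equation (4): R_x = -log2(1 - eps)."""
    return -math.log2(1.0 - eps)


def detection_window(eps, delta):
    """Equation (3): the n for which P[Y > n] = delta."""
    return math.log2(delta) / math.log2(1.0 - eps)


if __name__ == "__main__":
    for eps in (0.001, 0.01, 0.05):
        r_x = redundancy_bits_per_symbol(eps)
        n = detection_window(eps, delta=0.01)
        print(f"eps={eps}: R_x={r_x:.4f} bits/symbol, n={n:.1f}")
```

For `eps = 0.01` the redundancy is about 0.0145 bits per symbol, and a 99% confidence window (`delta = 0.01`) spans roughly 458 bits; the product of the two always equals $-\log_2(\delta)$, as in (5).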
III. ARQ with continuous error detection
In this section we briefly describe the application of CED for ARQ transmission. A more elaborate description can be found in [6]. To incorporate the new error detection scheme into ARQ, a few minor modifications need to be made to the conventional ARQ framework. The most critical change is that error detection is done continuously; as soon as the forbidden symbol is decoded at the receiver, a retransmission of $n$ bits is requested, where $n$ corresponds to a confidence level of $(1-\delta)$ as in (3). Since the position of the error has to be conveyed to the transmitter, an overhead of at most $R_n = \log_2(N) + 32$ bits is necessary each time a retransmission occurs; $N$ represents the packet size and the 32 bits of overhead typically provide for sequencing information, flags, etc. [7]. In our scheme, the packet size $N$ is only significant in contributing to the amount of overhead ($\log_2(N)$ bits plus other overhead bits) and may thus be chosen independently of the channel conditions. This is not the case, however, for conventional ARQ, where $N$ must be carefully chosen in accordance with the bit error probability of the channel (we will assume a Binary Symmetric Channel, BSC) in order to optimize throughput. To maximize throughput in our case, $n$ (i.e., the number of bits to retransmit) must be optimally matched to the channel conditions.
A. Throughput analysis
For simplicity, in the following analysis we assume a noiseless return channel and an infinite receiver buffer. We also disregard the probability of undetected error and the probability that the error is not corrected after the first retransmission. Throughput is defined as the average amount of information (in bits) transmitted per channel bit. To find the throughput, we first need to find the average number of bits that are successfully received for each channel bit transmitted, and call that $T_c$. We then need to find the fraction of useful data contained within each bit after adding parity, and call that $\eta$. The throughput $T$ is then clearly:

$$T = \eta\, T_c. \tag{6}$$

Let $Y$ be the random variable representing the time (in channel bits) from when an error occurs to when it is detected, and let $Z$ be the random variable representing the time (in channel bits) from when a retransmission (or start of transmission) is requested to when an error occurs.
Then the probability distribution of $Y$, $P_Y(k)$, is defined in (1), and the probability distribution of $Z$, $P_Z(k)$, will be geometric with parameter $p_b$ (the bit error probability of the BSC).
Now, define $X = Y + Z$ to represent the number of channel bits (not including overhead) received in between the requests for two successive retransmissions. Because the $n$ bits prior to the bit that produced the forbidden symbol need to be retransmitted, the number of bits that are successfully received will then be $(X-n)^+$, where $X^+$ represents the positive part of $X$ (i.e., $X^+ = X$ if $X > 0$ and $X^+ = 0$ if $X \le 0$). Then $T_c$ can be written as

$$T_c = \frac{E[(X-n)^+]}{E[X + R_n]}$$

and can be evaluated in closed form from the distributions of $Y$ and $Z$.
Equations (8) and (9) can then be combined to arrive at the throughput using (6). Simulations confirm that the throughput equation (6) models the actual throughput very well, as will be seen in the next section.
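The ratio $E[(X-n)^+]/E[X+R_n]$ can also be estimated by simulation, which gives a quick sanity check on any closed-form expression. The sketch below is a Monte Carlo estimate under explicit assumptions: the detection delay $Y$ is taken as geometric on $\{1, 2, \ldots\}$ with parameter `eps` (the forbidden-symbol probability), $Z$ as geometric with parameter `p_b`, both measured in channel bits, and `delta` is the miss probability that fixes $n$ through (3). The useful-data fraction is not modeled, and the function names are hypothetical.

```python
import math
import random


def geometric(rng, p):
    """Sample k in {1, 2, ...} with P[k] = p * (1 - p)**(k - 1)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) + 1


def simulate_tc(eps, p_b, delta, packet_bits, trials=100_000, seed=0):
    """Monte Carlo estimate of T_c = E[(X - n)^+] / E[X + R_n]."""
    rng = random.Random(seed)
    n = math.log(delta) / math.log(1.0 - eps)  # equation (3)
    r_n = math.log2(packet_bits) + 32          # overhead per retransmission
    good = total = 0.0
    for _ in range(trials):
        x = geometric(rng, p_b) + geometric(rng, eps)  # X = Z + Y
        good += max(x - n, 0.0)
        total += x + r_n
    return good / total


if __name__ == "__main__":
    for p_b in (1e-4, 1e-3):
        t_c = simulate_tc(eps=0.01, p_b=p_b, delta=0.01, packet_bits=10_000)
        print(f"p_b={p_b}: T_c is approximately {t_c:.3f}")
```

Sweeping `eps` for a fixed `p_b` in such a simulation is one way to explore how the retransmission length should be matched to the channel conditions.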
B. Results for ARQ with continuous error detection
Simulations were run using a BSC model at various bit-error probabilities with multisymbol data alphabets. Several 10 kbit packets were sent at each bit-error probability, and the resulting throughput was calculated. As a measure of performance, we compared our method of ARQ to conventional methods of ARQ. To ensure a fair comparison of our method with conventional ARQ methods, the optimal packet size³ was used at each of the bit-error probabilities tested using conventional ARQ. Accordingly, the optimal $\epsilon$ was also found, and used for each of the bit-error probabilities tested using our new method of ARQ. The resulting throughputs are shown in Figure 4. From the figure, we see that the new method of ARQ outperforms conventional ARQ methods at all bit-error probabilities.⁴ Furthermore, our new method of ARQ is well-suited for time-varying channels, because $\epsilon$ can be continuously adapted as a function of the channel conditions.
IV. CED for concatenated coding systems
Concatenated coding schemes for forward error control are becoming extremely popular due to their high performance and the existence of efficient decoding algorithms. In this work we consider serially concatenated coding schemes with an inner convolutional coder and an outer error detection coder and show how CED can be applied to significantly reduce the complexity of such systems.
A convolutional encoder can be represented in the shift register form as shown in Figure 5. Since the memory is finite, the equivalent finite Markov chain and trellis representation are easily constructed (not shown). Suppose that the output of a convolutional encoder is transmitted over a memoryless channel. The receiver has to perform the maximum likelihood estimation of the transmitted data sequence $\{u_k\}$ using the received vector sequence $\{z_k\}$, which is the output of the memoryless channel with the encoder output as its input. A point of note is that although the Viterbi algorithm (VA) in its original form produces a best estimate of the transmitted codeword, this estimate may be incorrect. By using an additional error detecting channel code (typically CRC), it is possible to tell with a very high confidence level whether the survivor path is error free, at the price of an insignificant increase in the overall channel rate. By making appropriate modifications, the VA can be changed to output an ordered list of the "N-most likely" transmitted codewords (for a list-N VA), i.e., instead of finding the best path we require a list of the N best paths, resulting in considerable improvements over the original VA. An example of such a modification is called the "parallel" list VA [9]. In the parallel list VA, at every step of the algorithm, the list of N best paths coming into each state is maintained rather than a single path as in the conventional VA.
Those N survivor paths are checked (in the order determined by their accumulated metrics) at the end of the data block using an error detection code until the first valid path is found. The performance of this decoding procedure depends on N and can be shown to have a worst case asymptotic gain of 3 dB over the original (i.e., N = 1) VA [9]. Note that the complexity of the decoder is directly proportional to N and that its performance also improves with N; N is chosen to trade off these two parameters. The parallel list VA is illustrated in Figure 6 for N = 2. A more efficient "serial" implementation is also possible which stores all decisions and associated metrics along the trellis and allows for the request of additional path estimates only when needed. To summarize, the list VA algorithms allow a trade-off of the complexity of the decoder for its performance. It is also true that most of the performance gain can be expected by maintaining a list of 3 to 5 paths instead of one [9]. Further increases in the size of the list correspond to rather insignificant gains. Our proposed scheme uses the same idea of producing a list of best path estimates as in [9]. However, the outer block error detection code (CRC) is replaced by a continuous error detection code. We assume that the total number of paths in the list N is fixed to address computational complexity/delay constraints, and we consider a parallel version (which is well suited for hardware implementations) of the list VA⁵ as shown in Figure 6. The motivation is that if channel errors occur at the beginning of the block, there is a good chance that this will render all the paths in the list-N VA (for small N) incorrect, resulting in an undesirable "no correct path" situation. The proposed continuous error detection scheme allows some incorrect paths to be pruned before the end of the block is reached, hence increasing the probability that the correct path will survive. We illustrate our proposed scheme in Figure 7.
Note the presence of a feedback path from the arithmetic decoder to the list VA decoder. The system operates as follows (see Figure 6).
At each state of the trellis, N best paths are chosen from among the incoming 2N paths (we assume a binary alphabet as shown in Figure 6). The decision about the survivor paths is made based on both the accumulated metric and the validity of the path. In this case, if the arithmetic decoder signals an error event for a given path, this path is pruned and replaced by a "valid" path having the next best accumulated metric.
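The survivor selection rule just described can be sketched for a single trellis state. This is a minimal illustration with hypothetical names: each candidate carries a `valid` flag that is assumed to be supplied by the arithmetic decoder running along that path, and a smaller metric is assumed to mean a more likely path.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    metric: float          # accumulated path metric (lower is better here)
    bits: Tuple[int, ...]  # decoded bits along the path
    valid: bool            # False once the CED decoder has flagged the path


def select_survivors(incoming: List[Candidate], list_size: int) -> List[Candidate]:
    """Keep the `list_size` best-metric candidates among the valid ones."""
    valid = [c for c in incoming if c.valid]
    return sorted(valid, key=lambda c: c.metric)[:list_size]


if __name__ == "__main__":
    # N = 2 survivors chosen from 2N = 4 incoming paths at one state.
    incoming = [
        Candidate(1.0, (0, 1, 1), False),  # best metric, but flagged invalid
        Candidate(1.4, (0, 1, 0), True),
        Candidate(2.2, (1, 1, 0), True),
        Candidate(3.1, (1, 0, 0), True),
    ]
    for survivor in select_survivors(incoming, list_size=2):
        print(survivor)
```

In the usage example the path with the best metric is pruned because it has been flagged, and the valid path with the next best metric takes its place in the list.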
A. Results for list VA with continuous error detection
To assess the performance of the proposed modified list VA decoder, we compare it to the list VA scheme of [9] in terms of the complexity/storage requirements for target performance and SNR gains. In all of our experiments we set $\epsilon = 10^{-2}$, which resulted in approximately a 1.5% increase in the data rate; this is accounted for by appropriately reducing the SNR in the comparisons. We also chose to keep the same packet structure as used with CRCs for simulation purposes.
The results are shown in Figures 8 and 9. Note that the number of paths in the list VA has to be about two times higher than the number of paths in the proposed system for the same target performance.
The storage requirements are significantly reduced in the proposed scheme since only a single register is needed to store the state of the arithmetic coder/decoder for each path, while the number of maintained paths is significantly lower. The additional computational complexity of the arithmetic codec does not depend on the size of the data and can be addressed by efficient implementations, where each multiplication operation can be replaced by a shift and an add, as mentioned in Section II.
V. Application of Continuous Error Detection for Joint Equalization and Coding for ISI Channels
Combating intersymbol interference (ISI) is one of the main challenges in digital communication over band-limited channels. Because of the band-limitedness of the channel, past and future symbols interfere with the present symbol. The channel itself causes amplitude and phase distortions. Several equalization techniques can be used to cancel out the ISI. In Decision Feedback Equalization (DFE), past decisions are used to cancel out the interference due to the past symbols from the current received sample. However, an error in declaring a symbol results in error propagation, which might make it difficult to combine DFE with error correction coding. Techniques like transmitter precoding to reduce error propagation cannot be applied to cases where the channel is not known to the transmitter.
A more powerful technique would be to do Maximum Likelihood Sequence Estimation (MLSE), using the Viterbi algorithm. In Section IV we discussed the scenario where communication was over a memoryless channel. Under those circumstances, the number of states in the trellis for MLSE is small and so performance gains can be obtained by actually maintaining more paths than would be present in a normal trellis. This fact is utilized by the list Viterbi algorithm, on which the scheme presented in Section IV was based. On the other hand, the complexity of MLSE grows exponentially with channel memory and therefore might be impractical for channels with long memory. Reduced state MLSE schemes seek to reduce this complexity by choosing not to grow the full trellis based on some set of rules.
Past work in this direction ([11], [12], [13], [14]) has mostly been based on a priori rules for controlling the path generation mechanism. Thus the set of paths that is generated is essentially predetermined. In [15], the path generation is based on estimated values of channel noise samples. While in [11], [12], [13], [14] uncoded ML criteria are used for choosing the correct path from a list of candidate paths, in [15] an error detection code is used to see which of these paths belong to the code. In this work, we propose a superior alternative to CRC-based error detection [15] and advocate the use of continuous error detection in this framework.
In [15] a novel scheme was presented that combined decision feedback equalization with reduced state sequence estimation (RSSE), where estimated values for channel noise were used to decide whether or not to branch out to a new path. At the end of the frame, a CRC is used to select the correct path. Our scheme integrates continuous error detection into that framework and shows how, by providing an effective way of continuously pruning erroneous paths, one can reduce the complexity of the sequence estimation scheme by a factor of two. This shall be discussed at greater length in the sequel.
A. Joint Equalization and Coding for ISI Channels: CRC based approach
In this section, we present some details of the algorithm discussed in [15] that are relevant to our proposed scheme.
A.1 System Model
The source data stream is taken $k$ symbols at a time and encoded using a cyclic code to $n$ symbols. This ensures that all error bursts of length $n-k$ or less can be detected. Let $s_1, s_2, \ldots$ be the transmitted sequence, $n_1, n_2, \ldots$ be the channel noise samples and $h_0, h_1, \ldots, h_M$ be the channel impulse response. The channel impulse response is assumed to be known, or reliably estimated, at the receiver. Without loss of generality, $h_0 = 1$. Transmit symbols are drawn from a modulation alphabet of size $|S| = q$. We can write the receive sequence at the output of the channel as:

$$y_t = \sum_{i=0}^{M} s_{t-i} h_i + n_t. \tag{10}$$

The conventional DFE (see Figure 12) would estimate the current received symbol as follows:

$$z_t = s_t + \sum_{i=1}^{M} (s_{t-i} - \hat{s}_{t-i}) h_i + n_t, \tag{11}$$

which can be written as

$$z_t = y_t - \sum_{i=1}^{M} \hat{s}_{t-i} h_i. \tag{12}$$

Here, $z_t$ is the signal at the slicer input, and $\hat{s}_t$ denotes the estimated symbol. The output of the DFE, i.e., $\hat{s}_1, \hat{s}_2, \ldots$, will be referred to as the standard path. Channel noise at each instant can be estimated as

$$\hat{n}_t = \begin{cases} z_t - \hat{s}_t, & \text{if } |z_t| < (q-1) \\ 0, & \text{otherwise.} \end{cases}$$
A new path is opened in the trellis whenever this estimated value of the channel noise exceeds a certain threshold. Paths that have already diverged from the standard path also diverge based on this threshold. The threshold thus provides a trade-off between performance and complexity. A high threshold would mean that we have very few paths in the trellis, so the performance would be the same as that of the single zero-forcing DFE. A lower threshold, on the other hand, would mean increased complexity of the MLSE trellis, as we would have to keep more paths.
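The effect of the threshold on the number of paths can be illustrated with a small sketch. The function name `branch_candidates` and the sample noise estimates are hypothetical, and the sketch only marks the instants at which a new path would be opened; it does not build the trellis.

```python
def branch_candidates(n_hat, threshold):
    """Return the time instants whose estimated noise magnitude exceeds the threshold."""
    return [t for t, n in enumerate(n_hat) if abs(n) > threshold]


# Hypothetical noise estimates for one short frame.
n_hat = [0.1, -0.45, 0.3, 0.0, 0.38]
for threshold in (0.25, 0.4, 0.5):
    print(threshold, branch_candidates(n_hat, threshold))
```

Raising the threshold shrinks the candidate list until only the standard path remains, which corresponds to the plain DFE.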
The modulation alphabet considered is Pulse Amplitude Modulation (PAM) with equiprobable symbols taking values in $A = \{\pm 1, \pm 3, \ldots, \pm(q-1)\}$, where $q = |A|$ is a positive even integer.
Errors can happen for two different reasons: channel noise causing wrong decisions on symbols, and error propagation due to a wrong decision. A channel error is said to have occurred at position $t$ if $|n_t| \geq 1$ and $|s_t + n_t| \leq q - 1$; if $\hat{s}_t \neq s_t$, we say that a decision error has occurred. Typically, a channel error is followed by several decision errors.
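The two error types can be told apart in a simulation, where the transmitted symbols and the noise are known. The sketch below is a hedged illustration: the function names are hypothetical, and the non-strict inequalities in the channel-error test are an assumption about the definition above.

```python
def is_channel_error(s_t, n_t, q):
    """Channel error test: |n_t| >= 1 and |s_t + n_t| <= q - 1 (assumed non-strict)."""
    return abs(n_t) >= 1 and abs(s_t + n_t) <= q - 1


def decision_error_positions(s, s_hat):
    """Positions t at which the DFE decision differs from the transmitted symbol."""
    return [t for t, (a, b) in enumerate(zip(s, s_hat)) if a != b]


# Hypothetical 4-PAM samples: noise of 1.2 on an inner symbol crosses a decision
# boundary, while the same noise pushing an outer symbol outward does not count.
print(is_channel_error(1, 1.2, 4), is_channel_error(3, 1.2, 4))
print(decision_error_positions([1, 3, -1], [1, 1, -1]))
```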
A path diverging from the standard path is called a branch, and a path that diverges from a branch is called a sprout. Going beyond a depth of two in the DFE tree would give only small performance gains at the expense of a lot of complexity [15]. So this algorithm does not look for paths beyond depth two in the DFE tree.
A.2 Path Generation Algorithm
The rules for branching used in [15] are summarized below. Figure 13 presents
If a path goes through $b$ branches and $s$ sprouts, the path is said to have $b + s$ breakpoints. For a DFE tree of depth two, there can be a maximum of 2B breakpoints. A path with $b + s$ breakpoints is the correct path only if that many channel errors occur during the transmission. So a path with one additional breakpoint is less likely to be the correct path than one without it.
A.3 Path Selection Algorithm
Following the rules mentioned above, paths are created, and at the end of the frame the correct path is selected as follows:
1. The paths in the DFE tree are processed in a fixed order to check if they belong to the cyclic code, and the first one that happens to be in the code is selected as the estimated output sequence.
2. If a path belonging to the code is not found among the paths processed, the standard path is declared as the estimated sequence.
The standard path is along abcdefg; paths with one breakpoint are along abhcdefg, abcdi, and abcdefj; and the paths with two breakpoints are along abhefg, abhcdi, and abhcdefj.
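The two selection rules can be sketched as a scan over candidate paths with a membership test for a systematic cyclic code. This is a minimal sketch under stated assumptions, not the procedure of [15]: paths are represented as bit lists rather than symbol sequences, the generator $x^8 + x^2 + x + 1$ is an arbitrary choice made only for illustration, and all function names are hypothetical.

```python
def crc_remainder(bits, poly_bits):
    """Remainder of bits(x) divided by the generator polynomial over GF(2)."""
    reg = list(bits)
    deg = len(poly_bits) - 1
    for i in range(len(reg) - deg):
        if reg[i]:
            for j, p in enumerate(poly_bits):
                reg[i + j] ^= p
    return reg[-deg:]


def encode(data_bits, poly_bits):
    """Systematic encoding: append the remainder of data(x) * x^deg."""
    deg = len(poly_bits) - 1
    return list(data_bits) + crc_remainder(list(data_bits) + [0] * deg, poly_bits)


def in_code(bits, poly_bits):
    """A sequence belongs to the cyclic code if its remainder is all zeros."""
    return not any(crc_remainder(bits, poly_bits))


def select_path(paths, standard_path, poly_bits):
    """Rule 1: first path (in the given fixed order) that is in the code.
    Rule 2: fall back to the standard path if none is."""
    for path in paths:
        if in_code(path, poly_bits):
            return path
    return standard_path


# Assumed generator x^8 + x^2 + x + 1, highest-degree coefficient first.
G = [1, 0, 0, 0, 0, 0, 1, 1, 1]
```

A typical call passes the candidate list in the fixed processing order, for example `select_path(candidates, standard, G)`.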
Since our goal is to isolate the gains that can be obtained by using CED instead of CRC, we maintain the same rules for path generation and branching as in [15]. CED provides us with an effective tool for pruning the paths, and our final path selection rules follow the same principles outlined above.
B. Application of CED for Reduced State Sequence Estimation in this framework
In this section, we discuss how continuous error detection can bring about a reduction by a factor of two in the number of branches and sprouts required to achieve performance similar to that of [15], at the expense of some complexity in terms of the arithmetic codec. We also show how CED fits well into the framework of the algorithm presented in Section V-A, where the constraints are on the number of branches and sprouts that we are allowed to use, and hence it would be desirable to assign them only to correct paths.
Consider a simple case where there was only one channel error in the frame. Then, what is the probability that we do not have a branch that captures the channel error by the end of the frame? This could happen under the following scenario. Let branch $B_i$, which was created at time $t_i$, have an estimated noise of magnitude $|n_i|$ at its position of branching. Suppose $B_k$ corresponds to the correct branch, i.e., the one that captured the channel error. If it happens that $|n_j| > |n_k|$ for $j = 1, 2, \ldots, B$, $j \neq k$, and at some time $t_m$ the estimated noise amplitude $|n_m| > |n_k|$, then the valid branch will have to be discarded. Without a way to continuously prune erroneous branches, several additional branches would have to be allocated to allow for the erroneous branches so that the correct branch is not replaced by a spurious one.
On the other hand, if we use continuous error detection, we can flag some of the paths that are wrong. From (2) we can see that, given that there was an error at bit position $m$ on a path, the probability that it is not detected by bit position $n$ is $(1-\epsilon)^{n-m}$. As $n$ increases, this probability goes down and the chances of detecting the error increase. Whenever a new branch has to be assigned, we check all the existing paths against the arithmetic decoder to see if any are wrong. If all paths along a branch are wrong, it is removed and that spot is freed for the next deserving candidate. The performance gains are primarily due to the fact that we are able to remove invalid branches in several cases before we run out of branch allocations and end up replacing the correct one, as described in the scenario above. The same also holds for sprouts. Since there will not be many paths to be checked, the complexity is not very high. This issue will be addressed in Section V-C. The reasoning also holds for more than one channel error, as CED is insensitive to burst lengths (see Figure 3).
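To get a feel for how quickly the miss probability decays, the geometric expression above can be evaluated directly. The sketch below is illustrative only: `eps` stands for the redundancy parameter of the arithmetic coder (the symbol name is an assumption), and the helper names and numerical values are hypothetical.

```python
import math


def miss_probability(eps, m, n):
    """Probability that an error at bit m is still undetected at bit n: (1 - eps)^(n - m)."""
    return (1.0 - eps) ** (n - m)


def delay_for_target(eps, target_miss):
    """Smallest delay d (in bits) such that (1 - eps)^d <= target_miss."""
    return math.ceil(math.log(target_miss) / math.log(1.0 - eps))


# Hypothetical redundancy value, chosen only for illustration.
eps = 0.1
print(miss_probability(eps, 0, 20))
print(delay_for_target(eps, 0.01))
```

The second helper shows the trade-off discussed in the text: a larger redundancy shortens the delay needed to flag an erroneous path with a given confidence.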
B.1 System Description
The block diagram for the CED-based system is presented in Figure 14. The DFE outputs estimates of the symbols at each instant, and based on the path generation mechanism described in Section V-A, the reduced state Viterbi trellis creates new paths. Whenever all resources for branches and sprouts are used up and a valid candidate comes up, all the paths are checked against the arithmetic decoder and the flagged paths are removed. If all paths through a branch or sprout are removed, the resource is reallocated to the next deserving candidate.
Consider the example presented in Figure 15, which shows a particular case where the CRC-based scheme would fail but the CED-based scheme does not. Here $B = 2$ and $S = 1$.
Suppose that the channel error had occurred at $t_2$. There were two branches that exceeded the threshold until time $t_3$, at instants $t_1$ and $t_2$, $t_1 < t_2$. Let the absolute values of the noise amplitudes be $|n_1| = 0.33$, $|n_2| = 0.3$, and $|n_3| = 0.38$ at instants $t_1$, $t_2$, and $t_3$, respectively. Since $|n_3| > \min(|n_1|, |n_2|)$, in the CRC-based case we would have to replace the branch corresponding to $\min(|n_1|, |n_2|)$, which is $b_2$, and which happens to be the correct branch. On the other hand, in the CED-based scheme, if the CED is able to detect that the paths along $b_1$ and $s_1$ are wrong, then $b_1$ is removed and $b_2$ can still be retained.
This scenario happens quite often when few branches are allowed, and the advantage of CED is that it requires about half the resources in terms of $B$ to achieve the same performance as the CRC-based scheme.
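The example above can be replayed with a small allocation routine. This is a simplified sketch: it tracks branches only (the sprout is ignored), the function name `allocate` and the branch labels are hypothetical, and the set of branches flagged by the CED is supplied by hand rather than produced by an arithmetic decoder.

```python
def allocate(branches, candidate, capacity, flagged=()):
    """Try to place a new branch candidate.

    branches: dict mapping branch name to |noise| at its branching instant.
    candidate: (name, |noise|) of the new branch.
    flagged: branches whose paths were all found wrong by the CED; they are
    removed first, freeing their slots.
    """
    kept = {k: v for k, v in branches.items() if k not in flagged}
    name, noise = candidate
    if len(kept) < capacity:
        kept[name] = noise
        return kept
    # No free slot: replace the branch with the smallest noise amplitude
    # if the candidate exceeds it.
    weakest = min(kept, key=kept.get)
    if noise > kept[weakest]:
        del kept[weakest]
        kept[name] = noise
    return kept


branches = {"b1": 0.33, "b2": 0.30}  # b2 is the correct branch
crc_based = allocate(branches, ("b3", 0.38), capacity=2)
ced_based = allocate(branches, ("b3", 0.38), capacity=2, flagged={"b1"})
print(sorted(crc_based), sorted(ced_based))
```

Without pruning, the correct branch is displaced by the new candidate; with the erroneous branch flagged, the correct branch survives.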
How well can the CED-based scheme perform compared to the CRC-based scheme? An exact analysis of the performance of the CED-based scheme seems complicated, as most of the expressions for paths removed and paths left are in probabilistic terms, making the overall error probability expressions unwieldy. However, we can find a lower bound for the performance. The lower bound discussed in [15] consists of the case where the standard path is in error and it goes undetected by the path selection scheme. This can happen under two different scenarios: there are one or two channel errors and the CRC does not detect the error, and the case where there are more than two channel errors. Since we do not keep paths with more than two breakpoints, our scheme will not be able to capture the cases where there are more than two channel errors. This results in negligible loss of performance, but there are significant gains in complexity [15]. Since all paths with more than two breakpoints are being discarded, a significant part of the errors arise because more than two breakpoints would have been required to capture them. Our modification using continuous error detection only helps in reducing the errors due to error bursts that go undetected by the outer CRC code. Therefore, we are also essentially lower bounded by the same curve as the CRC-based scheme. The improvement is in terms of fewer branches (sprouts) required to achieve the same performance. This can be put more precisely through some analysis.
B.2 Lower Bound on Performance
In a purely CRC-based scheme, let $A_j$ denote the event that there are $j$ channel errors, that the standard path is in error, and that this was not detected by the CRC at the end of the frame. Let the corresponding event in the CED-based scheme be $A'_j$, i.e., the event $A'_j$ happens when the CED-based scheme is not able to detect the error in the standard path for some burst lengths at some times, and the CRC also fails to detect the error in the standard path. We need to consider the events for $j = 1$ and $j = 2$ only, because whenever there are more than two channel errors, we cannot capture them, as discussed earlier. As in [15], the basic expression for the probability of error is
\[ P_e = \sum_{j=1}^{n} P(k = j) P_{e|j}. \tag{14} \]
Here, $k$ denotes the number of channel errors that occur during the transmission, and $P(k = j)$ is the probability that there are $j$ such errors. We also have $P_{e|0} = 0$ and $P_{e|j} = 1$ for $j > 2$. So the expression for the probability of error becomes
\[ P_e = \sum_{j=1}^{2} P(k = j) P_{e|j} + \sum_{j=3}^{n} P(k = j). \tag{15} \]
From the events A
A channel error at $t_0$ causes a bit error at $2t_0$ or at $2t_0 + 1$, since we map two bits onto one channel symbol. Let us suppose that the bit error was at $2t_0$. Let $Y$ be a random variable that represents the number of bits taken to detect an error, counted from the position where the error occurred, for the arithmetic coder with redundancy $\epsilon$. We have from (2) that the probability that the error is not detected by $n$ is
\[ P[Y > 2(n - t_0)] = (1 - \epsilon)^{2(n - t_0)}. \tag{17} \]
Consider the probability that the error is not detected by the CED-based scheme for $j = 1$. Averaging (17) over all $t_0$ gives this probability.
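The averaging step can be checked numerically. The sketch below is a reconstruction under an explicit assumption, namely that the error position $t_0$ is uniform on $1, \ldots, n$; it evaluates the average of (17) both by direct summation and through the corresponding geometric-series closed form. The function names and the symbol `eps` for the coder redundancy are hypothetical, and the expression is not claimed to match the original equation term for term.

```python
def avg_miss_single_error(eps, n):
    """Average of (1 - eps)^(2 (n - t0)) over t0 = 1, ..., n (t0 assumed uniform)."""
    r = (1.0 - eps) ** 2
    return sum(r ** (n - t0) for t0 in range(1, n + 1)) / n


def avg_miss_closed_form(eps, n):
    """Same average via the geometric series: (1 - r^n) / (n (1 - r)), r = (1 - eps)^2."""
    r = (1.0 - eps) ** 2
    return (1.0 - r ** n) / (n * (1.0 - r))


# Hypothetical redundancy and frame length, for illustration only.
print(avg_miss_single_error(0.05, 100), avg_miss_closed_form(0.05, 100))
```

For long frames the average is dominated by errors that occur near the end of the frame, since those leave the decoder few bits in which to detect them.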
The probability of the event $A'_1$ is the probability that the arithmetic coder is not able to detect the error and the CRC is not able to detect it either. Since we have observed that the error detection properties of CED are independent of the burst length, $P(A'_1)$ equals $P(A_1)$ multiplied by the probability that the CED fails to detect a single channel error. When $j = 2$, say the errors happened at times $t_1$ and $t_2$. Then the detection position would depend on $Y = \min(t_1, t_2)$, where $t_1$ and $t_2$ are uniform in $[1, n]$. If we denote $P(Y = k)$ by $p_k$, then the probability of not detecting the error when there are two channel errors can be expressed in terms of the $p_k$.
A comparison of these terms can be seen in Figure 16. Here $j$ is the number of channel errors, and the figure shows the contributions of the various $A_j$'s to the lower bound for the CRC-based and CED-based schemes. As mentioned earlier, the term due to more than two channel errors is the predominant term; therefore the lower bound in our case is essentially the same as in the CRC-based scheme. This is to be expected, as CED only detects the errors in existing paths, while the path generation algorithm in both schemes keeps only paths with two or fewer breakpoints, and hence introducing CED does not reduce errors due to throwing out those paths. But this still gives good performance without having to add a lot more complexity to keep paths with more than two breakpoints for minor gains in performance.
We carried out two different kinds of comparisons. If we use an 8-bit CRC and 16 bits of redundancy for the arithmetic coder, then we would effectively be using a code that uses 8 bits more per frame than the CRC-based scheme, and would be of rate 0.954. With this code, we required only four branches and one sprout to achieve the same performance as the CRC-based scheme, and if we were to use the same number of branches as the CRC-based scheme, there was a modest performance gain of 0.5 dB. These results are shown in Figure 17.
The error rate shown is the block error rate, i.e., the probability that a frame was decoded incorrectly. Performance curves for $B = 10$ and $S = 4$ are shown for the CRC-based and CED-based schemes, as well as for $B = 4$ and $S = 1$ for the CED-based scheme, which matches the CRC-based scheme. The lower and upper bounds for the CRC-based scheme are also shown for comparison. The performance gains seem to increase as the SNR increases. One reason for this could be the following. At low SNRs, the predominant factor in the block error rate would be the cases where two breakpoints were not sufficient to capture all the channel errors. Since we also use only two breakpoints, the errors in the CED-based scheme are also mainly due to this term. But as the SNR goes up, quite a few of the errors in the CRC-based scheme would be because the correct branch was replaced by some erroneous branch with a higher noise amplitude. The CED-based scheme is able to remove most of those errors by detecting those paths to be in error on the fly. Thus it achieves larger gains over the CRC-based algorithm at higher SNRs than at lower SNRs.
Since we use some redundancy for the arithmetic coder and some for the CRC at the end of the frame for path selection, a comparison was also done with the CRC-based scheme assuming the same power per frame. In our case, the excess rate due to redundancy for the arithmetic coder was offset by lower energy per symbol, so that the overall energy per frame would be the same as in the CRC-based scheme. Simulations were carried out with ten branches and four sprouts as in [15] to see the gains, and also with four branches and one sprout to see the reduction in the resources required. These are shown in Figure 18. Here we should also mention that an effective SNR-based comparison is a pessimistic metric as far as the performance of the CED-based scheme is concerned, for the following reason. The purposes served by the redundancy bits in a CRC-based scheme and by the redundancy bits used up by the arithmetic coder are totally different. The CRC bits can cause errors if they fail to detect a burst of length more than $n - k$. A purely CRC-based scheme cannot detect whether a path is correct or wrong until the end of the frame. So, even if we were to use a 24-bit CRC instead of a 16-bit CRC, there would be performance gains because more burst errors can be detected, but it would still be impossible to remove wrong branches on the fly, and so there would be errors because the correct branch was replaced by some wrong branch within the frame. On the other hand, if we were to distribute the 24 bits as 16 bits of redundancy for the arithmetic coder and 8 bits for the CRC, then many of the spurious branches would be removed by the CED-based scheme, and so many of the errors due to the correct branch being replaced would not happen.
In addition, the strength of the CRC required would not have to be the same as in the purely CRC-based scheme to achieve the same performance, as many of the burst errors would have been detected by the CED-based scheme and the erroneous paths removed. Hence, the redundancy bits serve qualitatively different purposes in the two schemes, and the results should be read with this in mind. We observe that even with a pessimistic comparison, the CED-based scheme achieves the performance of a purely CRC-based scheme that uses ten branches and four sprouts with about six branches and one sprout, which is still a complexity reduction by a factor of two (see Figure 18).
C.1 Complexity versus Performance Gains
An important issue to be discussed is the trade-off between complexity and performance gains. The implementation of the algorithm described in Section V-A requires that all the paths in the trellis be saved, so that it is not necessary to search through the trellis and locate all paths when we do error detection using the arithmetic decoder. Since many paths are flagged by the CED-based scheme, the memory requirement for this is not very large. Typically, we need to store about 20 paths at worst and 10 paths on average. This would have to be compared against the computational complexity required for a path search in a block-based scheme at the end of a frame, where the correct path might be found only after several paths have been invalidated by the path selection procedure, as in Section V-A. On the other hand, at the end of the frame in the CED-based scheme, the correct path would be one of the first few in the list, as most of the wrong paths would have been removed by then. Memory requirements for the CRC-based scheme would include keeping track of the nodes where branches (sprouts) diverge from and remerge with the standard path (parent branch). It has already been mentioned in Section II that the complexity incurred by having to perform the arithmetic encoding would be two shifts and adds per symbol and a register to store the state of the arithmetic encoder.
Another question to be considered is how frequently we have to check the paths against the arithmetic decoder. It has been observed that, on average, every path visits the decoder once every three or four instants. Since we know the maximum number of paths, we can include that many decoders in the hardware along with the DFE structures, and we can reduce the complexity in terms of the allocations for branches and sprouts to be maintained.
VI. Conclusion
In this work, we have considered a new error detection scheme that performs well in different communication scenarios such as ARQ-based communication systems, concatenated coding systems, and reduced state sequence estimation for channels with memory. It has the novelty of lending itself to being physically combined with the source entropy encoder in a single device (i.e., the arithmetic coder). Significant gains of the proposed continuous error detection scheme over conventional block codes are demonstrated for both ARQ and forward error correction frameworks. These gains result from the ability of the new method to detect errors earlier than the end of the block and thus to be more responsive than block CRC-based schemes. In the reduced state sequence estimation framework, the new method gives us a means to allocate scarce resources in a better way by letting us know, on the fly, which of the paths are wrong, thereby enabling reallocation of resources to more deserving candidates. We mention here that CED can be put to good use to improve the throughput performance of transport protocols like TCP over heterogeneous networks, where early detection of an error can potentially allow more retransmissions, thereby increasing the probability of successful reception over a fading channel. This is currently being verified. The scenarios presented here are in no way exhaustive, and CED could find applications in other frameworks as well, such as multi-user detection, where there is a state space explosion. The goal of this work is to present the benefits that communication systems can derive from using continuous error detection for complexity reduction and/or throughput enhancement.
