Abstract-The decision feedback equalizer (DFE) is an efficient scheme to suppress intersymbol interference (ISI) in various communication and magnetic recording systems. However, most cost-effective DFE implementations suffer from error propagation, which degrades their bit error rate (BER) performance. This paper proposes a soft-threshold-based multilayer DFE (STM-DFE) technique to reduce the BER. When applied to a practical Lorentzian channel and to channels of different eigenvalue spread, the STM algorithm even outperforms the ideal DFE (IDFE), in which symbols are always fed back correctly and no error propagation occurs. Simulation results show that the proposed scheme efficiently reduces the burst error length (BEL) as well as the BER. Additionally, the hardware overhead to implement the STM-DFE algorithm is negligible compared with the conventional DFE. By using the concepts of Shanbhag and Parhi and of Yang, Wu, and Lai, we also propose the pipelined STM-DFE (PSTM-DFE) architecture for high-speed communication/storage applications.
In general, the DFE has better performance than the LE. Hence, it is widely employed in practical designs. One of the major drawbacks of the DFE is error propagation, which is caused by slicing errors. Much research has focused on analyzing error propagation [20]-[22]. Some approaches suggest modifying the decision device to reduce error propagation [6], [23]. The right-hand side of Fig. 1 shows the generalized DFE architecture. Fig. 2(a) shows the conventional DFE scheme, in which the decision device is just a slicer, and Fig. 2(b)-(d) shows decision devices that aim to reduce error propagation in DFE-based designs. Fig. 2(b) shows the erasure scheme [23]. In the erasure scheme, if the magnitude of the detected signal is smaller than a given threshold value, 0 is fed back; otherwise, it works as a slicer. In [5] and [6], the authors proposed a threshold technique (TT) approach to detect the error events. The idea is that past decision errors that reside in the feedback register tend to cause an offset at the slicer. We call this scheme the threshold technique decision feedback equalizer (TT-DFE). The TT-DFE makes use of this offset and employs an offset-threshold slicer to detect the presence of error propagation. When the offset events are detected, they may be corrected in the next symbol. Fig. 2(c) illustrates the TT-DFE scheme. Additionally, a fast simulation procedure is derived in [6] to estimate the BER. One of the disadvantages of the TT-DFE is that its design parameters are difficult to determine and depend on the channel model. Performance degrades when the design parameters are not optimal, which limits the application of the TT-DFE to channels that are stationary and known in advance.
1053-587X/$20.00 © 2005 IEEE
Inspired by the TT-DFE [6], we propose the soft-threshold-based multilayer decision feedback equalizer (STM-DFE) to enhance the DFE by modifying the decision device. The key idea is to define a reliable region and an unreliable region. When the received signal is in the unreliable region, the STM-DFE does not make the decision instantly. Instead, it forwards the log-likelihood ratio to the next symbol. The process can continue until the output of the equalizers becomes reliable or the last stage is reached. Then, the decisions on the deferred transmitted data are made at the same time. Fig. 2(d) illustrates the block diagram of the STM engine. In this paper, the threshold value of the reliable/unreliable regions is analyzed in closed form. The STM algorithm can adjust the threshold value dynamically, which makes full use of the information from the output of the equalizers rather than relying on a simple fixed threshold device. The simulation results show that the BER and BEL of the proposed STM algorithm are better than those of the TT-DFE [6] and the conventional DFE when applied to a Lorentzian channel [7], as well as to channels of different eigenvalue spread [8]. Compared with the ideal decision feedback equalizer (IDFE), the STM algorithm also has better performance.
In VLSI implementations, the least mean square (LMS) algorithm is a well-known technique to update the weight coefficients of the adaptive DFE (ADFE) in a cost-efficient way. We apply the LMS algorithm to update the weight coefficients of the STM-DFE. However, fine-grain pipelining of the ADFE is a difficult problem for high-speed applications due to the decision feedback loop (DFL). Similarly, it is also difficult to pipeline the STM-DFE VLSI architecture. The relaxed ADFE techniques [10] and the predicted parallel branch slicer scheme (PPBS-ADFE) [13] address this speed limitation. In this work, we also suggest applying the relaxed ADFE [10], [13] to derive the pipelined STM-DFE (PSTM-DFE) architecture. Thus, the speed performance can be enhanced.
The rest of the paper is organized as follows. Section II describes the channel model and reviews the recent work of [6]. In Section III, we present the two-layer STM algorithm and extend it to a higher order layer. The numerical analysis and simulation results are discussed in Section IV. The simplified architecture of the STM-DFE is given in Section V, where the pipelined architecture is also proposed for high-speed applications. Finally, we conclude our work in Section VI.
II. REVIEW OF THE TT-DFE APPROACHES [6]
In this section, we review the recent work of the TT-DFE [6], which is a simple way to reduce the error propagation of a DFE. The communication system adopted in this paper is shown in Fig. 1, in which the notations are defined as follows:
• is the transmitted data. In general, are assumed to be independent with zero-mean.
• is the detection of .
• is the equivalent discrete-time channel impulse response, which is linear and bandlimited and shown on the left-hand side of Fig. 1 .
• is Additive White Gaussian Noise (AWGN).
• is the channel output. It is considered to be corrupted by at the front end of the receivers.
• is the output of the equalizers and can be expressed as (1)
where
• denotes the th tap weight of the feedback filter (FBF).
• denotes the th tap weight of the feedforward filter (FFF).
• is the number of taps in the FBF.
• is the number of taps in the FFF.
In most DFE implementations, the receiver must determine the value of the data instantly and then feed back the tentative decision. When the decision is wrong, the error propagates through the feedback tap delay line and pushes the decisions on successive symbols toward further errors. The idea of the TT-DFE is as follows. When a decision error arises at time , it causes a large offset at the next time instant ( ). By detecting the large offset, the error event at time can be detected and recovered. We reformulate the TT-DFE algorithm as follows [6]
where , and are design parameters. The TT-DFE is well suited to low-cost VLSI designs: the hardware overhead is just two threshold detectors. The drawback of this scheme is that the design parameters are difficult to determine and vary from case to case, especially when is not around .
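The error propagation mechanism described above can be illustrated with a minimal sketch. It assumes 2-PAM symbols in {-1, +1}, a causal channel with unit main tap so that the FFF can be omitted, and feedback taps equal to the channel postcursors; the names `slicer`, `run_dfe`, and `ideal_feedback` are hypothetical and do not come from the paper. Setting `ideal_feedback=True` feeds back the transmitted symbols, which mimics the IDFE.

```python
import random


def slicer(y):
    """Hard decision for 2-PAM symbols in {-1, +1} (assumed alphabet)."""
    return 1 if y >= 0 else -1


def run_dfe(symbols, channel, noise_std, ideal_feedback=False, seed=0):
    """Decision feedback detection over a causal ISI channel.

    channel[0] is the main tap (assumed to be 1.0); channel[1:] are the
    postcursor taps, which the feedback filter cancels using past decisions.
    Returns the list of hard decisions.
    """
    rng = random.Random(seed)
    n_fb = len(channel) - 1
    fed_back = [0] * n_fb  # symbols in the feedback delay line, newest first
    sent = [0] * n_fb      # transmitted symbols seen by the channel, newest first
    decisions = []
    for x in symbols:
        # Channel output: main tap, postcursor ISI, and Gaussian noise.
        r = channel[0] * x + sum(h * s for h, s in zip(channel[1:], sent))
        r += rng.gauss(0.0, noise_std)
        # The feedback filter subtracts the ISI predicted from fed-back symbols.
        y = r - sum(h * d for h, d in zip(channel[1:], fed_back))
        d = slicer(y)
        decisions.append(d)
        sent = ([x] + sent)[:n_fb]
        # A wrong d stays in the delay line and offsets the following outputs.
        fed_back = ([x if ideal_feedback else d] + fed_back)[:n_fb]
    return decisions
```

Running the same noisy symbol sequence once with `ideal_feedback=False` and once with `ideal_feedback=True` and counting the mismatches against `symbols` gives a rough view of how many errors are due to propagation alone.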
III. PROPOSED STM ALGORITHM
To derive the STM-algorithm, three assumptions are made in this paper.
A1) The transmitted data sequence is independent with zero-mean.
A2) The noise is AWGN with zero-mean and variance . This assumption is commonly used in many mathematical derivations of common systems [24]-[28].
A3) For the convenience of our discussions, the detections of the transmitted data prior to are assumed to be correct when we want to make the decisions of these transmitted data from time to time .
A. Two-Layer STM Algorithm
In this section, we first derive the two-layer STM algorithm. Then, we extend it to a higher order layer. For clarity of presentation, we assume that the data sequence is . It is trivial to extend the data to higher order constellations. Let [applying Assumption A2)] at time ; the log-likelihood ratio (LLR) of and can be expressed as LLR (4) Decision errors may occur under hard decision when is close to 0, i.e., LLR . As a result, the unreliable region should be placed around zero. Let be transmitted; then we have , which is depicted in Fig. 3(a) and (b). For the detection scheme of a slicer, when is detected as 1; otherwise, it is set to 0. As we can see, when is close to 0, the decision is very unreliable. Fig. 3(b) shows the detection scheme of the two-layer STM algorithm. is denoted as a threshold value between the reliable and unreliable regions. When is said to be in the unreliable region, we do not make the decision but forward the LLR to the next symbol. We denote the operation of the STM algorithm in Stage 1 at time and in Stage 2 at time , as shown in Fig. 3. For the two-layer STM algorithm, the last stage number is set to 2. Therefore, we must detect two symbols simultaneously in Stage 2, and then, the stage changes into Stage 1. First, we explain the effect of as follows.
• is Larger: When , an error arises from a slicer. If we want to detect more errors, must be set at a larger value. As the threshold value is extended to be as shown in Fig. 3, both regions and ' are the incremental unreliable regions. At time , a slicer has the correct decision when is in region ; on the contrary, the decision by a slicer is wrong when is in region '. Since , region is smaller than region . Thus, if we enlarge to detect more errors, more of the slicer's correct decisions, i.e., region , are also sent to Stage 2 and detected again. Unfortunately, the distance between states in Stage 2 is smaller than that in Stage 1. That is, the detections tend to be erroneous when they are made in Stage 2.
• is Smaller: For a smaller , fewer correct decisions by a slicer are sent to Stage 2, but fewer errors will be detected. Therefore, must be determined appropriately.
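The Stage 1 rule discussed above can be sketched as follows. The sketch assumes 2-PAM symbols in {-1, +1} and an equalizer output modeled as the symbol plus zero-mean Gaussian noise, for which the LLR reduces to 2y divided by the noise variance. The names `llr` and `stage1_decide` are hypothetical, and treating a value exactly at the threshold as reliable is an arbitrary choice made here.

```python
def llr(y, noise_var):
    """LLR ln[p(y | +1) / p(y | -1)] for y = x + Gaussian noise, x in {-1, +1}."""
    return 2.0 * y / noise_var


def stage1_decide(y, threshold):
    """Slice y when it is reliable; return None to defer the decision.

    The unreliable region is assumed to be |y| < threshold.
    """
    if abs(y) < threshold:
        return None  # unreliable: the LLR is forwarded to the next stage
    return 1 if y > 0 else -1
```

A larger `threshold` defers more outputs, which catches more slicer errors but also sends more correct decisions to Stage 2, which is the tradeoff described in the two bullets above.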
1) MAP Detection in Stage 2:
The STM algorithm has the same decision scheme as the slicers in Stage 1 when is reliable. Otherwise, the process of the STM algorithm changes into Stage 2. Next, we show how to detect the transmitted data and minimize the error probability in Stage 2. Then, we apply the results to figure out the optimal threshold value . In Stage 2, we define
where is independent of since the value of is not determined at time . In (5b), we apply the assumptions A2) and A3). We need to detect and after and are figured out. From (5b), when is and , the value of states are , and , respectively. We illustrate the states in Fig. 3(b). The conditional probability of and on and at time can be expressed as in (6).
Applying the property that and are independent conditional on and in (6a), (6a) can be reformulated as (6b), which is proportional to (7) Equation (7) is used to determine the boundaries of decision regions to minimize the probability of decision error in Stage 2.
2) Boundaries of Decision Regions in Stage 2:
We will focus on because we can just change the sign of for Case . The boundaries between regions I-IV are labeled as , and in Stage 2, as shown in Fig. 3 . When is in regions I-IV, is detected as , and , respectively. We derive the boundary value in the Appendix. To compute the probability of decision errors in Stage 2, three cases of boundary problems must be noted and described as below. : This case is the opposite of Case 2), as illustrated by Fig. 4(b) . When , and . As shown in the above discussion, the STM-algorithm utilizes the information from , which is not used in [6] :
• The information of is used at time , which can make decisions more correct in Stage 2.
• The boundary of decision regions in Stage 2 is dynamically adjusted to be optimal based on rather than fixed at a given constant value as in [6].
B. Optimal Threshold Value for Minimizing BER
As in the previous discussion, the threshold value of the reliable/unreliable region is significant in the reduction of errors. In the two-layer STM algorithm, is chosen to maximize the number of the reduced decision errors by using the STM engine at times and . In the discussion below, we assume that is in the unreliable region at time and that is transmitted. By symmetry, can follow the same way.
1) Analysis of Decision Error by Using Slicers: At time and , the expectation of decision errors by using a slicer is the sum of the following two cases.
Case 1) The slicer has the correct decision at time . Therefore, only one error may happen at time , and the expectation of decision errors is (8) where noise density function is the normal distribution with zero mean and variance by assumption A2), i.e., .
Case 2) In this case, the slicer makes the wrong decision at time . Then, the error will propagate and cause an offset in the next symbol. Applying Assumption A3), the sum of the expectation of decision errors at time and is (9) Therefore, the expectation of total decision errors made by a slicer at time and is the summation of (8) and (9) and is denoted as (10)
2) Analysis of Decision Error by Using the STM Engine: Next, we will figure out the expectation of decision errors by the STM engine. In addition, we assume , and is in the unreliable region.
Case 1) When is , the STM engine makes one or two bit errors if it determines to be and , or , respectively. The expectation of decision errors can be formulated as (11)
Case 2) When is transmitted at time , the STM algorithm makes one or two bit errors if is chosen as (State I) and (State IV), or (State III), respectively. The expectation of decision errors is (12) The expectation of total decision errors by the STM algorithm is (13)
3) Solution of the Optimal Threshold Value: From (8) and (13), the STM engine reduces the decision errors by , which is a function of . When satisfies (14) the STM engine achieves the maximum reduction of errors. For , the optimal threshold value must satisfy (15)
For , must satisfy (16) where . The threshold of the reliable/unreliable region is a function of and . For VLSI implementations, must be updated adaptively along with and . Fortunately, can be approximated as . We show this with numerical results in Sections IV-A and IV-C.
C. Higher Order Layer of the STM Algorithm
In the TT-DFE, the error correction scheme is based on two successive symbols, and it is difficult to extend this scheme to more than two successive symbols. The proposed STM algorithm, however, is easy to extend to higher layers. For the derivation of the -layer STM algorithm, we follow the derivation of the two-layer STM algorithm. In Stage , we define (17) The detection at Stage can be formulated as (18) Equation (18) can be computed recursively, and only needs to be calculated in Stage . The other parts are log-likelihood terms that are calculated in Stages and . Moreover, in implementing an equalizer, the BER performance as well as the hardware cost must be taken into account. One way to attain better BER performance is to extend the STM-DFE to a higher order layer. We depict the simple concept in Fig. 5(a). However, the number of states grows exponentially with the number of stages. Extending to the -layer STM-DFE, the state number of the trellis diagram will expand up to . This places high demand on the hardware. We can limit our search range to the higher probability regions to reduce the state number. Take the three-layer STM for example; when is around line [see Fig. 5(b)], the probability of states 1 and 2 is higher than that of states 3 and 4. In this case, we set to be or . Furthermore, in Stage 3, we just begin from states 1 and 2 in Stage 2 [solid line in Fig. 5(b)]. The other two cases can follow in the same way. Therefore, the number of states always remains at 4 and does not grow exponentially with the number of stages.
In the STM algorithm, we can find the optimal threshold value off-line and approximate it with a polynomial function so that the threshold value can be computed at run time. It is very complicated to figure out the threshold values in all stages simultaneously. Our scheme is to find the threshold values and then use the value to figure out the next . We take the three-layer case as an example. We first use (18) to find the boundaries of the decision regions in Stage 3. Next, using the method in Section III-B, we apply these boundaries and the threshold value in Stage 1 to figure out the threshold values of Stage 2. In Stage 2, there are three thresholds around lines , and , respectively. Due to the symmetry, the thresholds near lines and are equal. Listing these equations is lengthy and tedious, so we do not list them but show the results by simulations. To compute the threshold values of the -layer STM algorithm, the steps are described as follows.
Step 1) Initially, is set to 1.
Step Step 4) Go to Step 2 when .
D. Stage Transition of the STM Algorithm
Initially, the STM algorithm operates in Stage 1 and remains there until the output of the equalizers becomes unreliable, at which point the stage changes into Stage 2. This process can be continued. That is, if the received signal is unreliable in Stage , the stage changes into Stage ; otherwise, the stage goes into Stage 1. In addition, we can limit the last stage number to a constant . We define the -layer STM algorithm as the one whose maximum stage number is . Note that when a symbol reaches the end of the feedback tap delay line, it has no effect on the next output of the equalizers; therefore, the maximum stage number must be equal to or smaller than the tap number of the FBF. Fig. 6 shows the stage transitions of the four-layer STM algorithm. When the output of the equalizers is in the unreliable region in Stage , we do not make the decision but forward the log-likelihood ratio to Stage . We continue this process until the decision is reliable or the stage is 4. Then, we must detect the transmitted data, and the stage changes into Stage 1.
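The stage transition rule above can be written as a small state machine. This is only a sketch of the control flow: the reliability test itself is abstracted into a boolean flag, and the names `next_stage`, `trace_stages`, and `max_stage` are hypothetical.

```python
def next_stage(stage, reliable, max_stage):
    """Return (next_stage, decide_now) for one symbol interval.

    A decision is forced when the output is reliable or when the last
    stage has been reached; in both cases the engine returns to Stage 1.
    """
    if reliable or stage == max_stage:
        return 1, True
    return stage + 1, False


def trace_stages(reliable_flags, max_stage):
    """List (stage, decide_now) for each symbol interval, starting in Stage 1."""
    stage = 1
    trace = []
    for reliable in reliable_flags:
        following, decide = next_stage(stage, reliable, max_stage)
        trace.append((stage, decide))
        stage = following
    return trace
```

For a four-layer engine, four consecutive unreliable outputs walk through Stages 1 to 4, and the deferred symbols are then detected together in Stage 4 before the engine returns to Stage 1.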
IV. NUMERICAL ANALYSIS AND SIMULATION RESULTS
In this section, we show numerically how the threshold value relates to the first tap weight of the STM-DFE. To assess the effect of the STM-DFE, we compare it by simulation with the conventional DFE, TT-DFE, and IDFE in terms of BER and BEL.
A. Determination of Threshold Value
Fig. 7 shows that all thresholds are larger under low SNR. This is expected: when the noise is larger, the log-likelihood ratio of is close to 1 over a larger region. That is, there is a larger unreliable region at low SNR. When , the threshold value is close to within a large range of high SNR. For case , the threshold value is close to one above . Furthermore, when the output of the equalizers is in the unreliable range between state 2 and state 3, we may make 2-bit errors if the decision to be in state 2 or 3 is wrong, whereas we make a 1-bit error in the other cases. On the other hand, the distance between states 2 and 3 becomes longer as becomes larger. This is the reason why the threshold value is larger when .
B. Comparison of BER
The channel model used is a Lorentzian channel at user density [7], which is commonly used in magnetic recording. All results are averaged over 20 runs, with 100 million bits used in each run. The equalizers are designed for SNR dB. In the case of the IDFE, we assume the data fed back into the FBF are always correct in order to measure the SNR degradation of the DFE due to error propagation. As shown in Fig. 8, the proposed STM-DFE has better performance than the DFE and TT-DFE. Moreover, when the SNR is larger than 12.5 dB, the STM-DFE performs better than the IDFE. Although the IDFE does not suffer from error propagation, it has no ability to detect and correct errors, which the STM-DFE can do, especially at higher SNR. In the next scenario, the channel model is [8]. We use this to show the effects of the eigenvalue spectrum of the signal on the STM algorithm. The numbers of taps in the FFF and FBF are 7 and 5, respectively. Again, the number of errors corrected by the STM algorithm is larger at high SNR. The proposed algorithm achieves the best SNR performance in both cases, as shown in Fig. 9. For the case of the smaller eigenvalue spectrum, the STM algorithm outperforms the IDFE at both low and high SNRs. For the case of , the STM algorithm achieves better results as the SNR becomes larger and outperforms the IDFE at SNRs above 10 dB.
C. SNR Degradation Due to the Approximation of Threshold Value
The configuration of this scenario is the same as that for the Lorentzian channel. As illustrated in Fig. 10, the theoretical analysis of the optimal threshold is consistent with the simulation results in all cases. To figure out the threshold value, we must solve (15) and (16). However, in a VLSI implementation, it is very complicated to solve these equations. The threshold is a function of and . In general, we can approximate the threshold value by a polynomial function under the least squares criterion, so that multipliers and adders are enough to produce the result. As the order of the polynomial becomes higher, the approximation improves, but the hardware cost increases. As discussed in Section III-A, we can instead approximate the threshold value by , which is the limiting value in the boundary problems. The hardware cost to calculate the threshold value is then just one adder and one squarer. As shown in Fig. 10, the curve is flat near the optimal value, and is close to the optimum. Therefore, the approximation is very good in each case.
D. Comparison of BEL
The simulation setup is the same as that for Fig. 8. The IDFE has almost no burst errors in our simulations. Fig. 11 shows that the proposed STM-DFE always achieves lower error probability than the DFE and TT-DFE at every burst error length. In other words, the proposed algorithm performs better in reducing error propagation. Moreover, the STM-DFE has the lowest probability of a single error, even when the BER of the STM-DFE is larger than that of the IDFE. This is because the STM algorithm can efficiently correct errors that fall in the unreliable region in Stage 1. In addition, the three-layer STM-DFE has better performance than the two-layer STM-DFE for both BER and BEL.
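One way to collect burst error statistics such as those discussed above is sketched below. The paper does not spell out its burst definition in this section, so the sketch assumes that a burst is a maximal run of consecutive symbol errors; the name `burst_lengths` is hypothetical.

```python
from collections import Counter


def burst_lengths(sent, detected):
    """Histogram {burst length: count} of runs of consecutive symbol errors."""
    hist = Counter()
    run = 0
    for s, d in zip(sent, detected):
        if s != d:
            run += 1        # extend the current burst
        elif run:
            hist[run] += 1  # a correct symbol closes the burst
            run = 0
    if run:
        hist[run] += 1      # burst that reaches the end of the sequence
    return hist
```

Dividing each count by the number of detected symbols gives an empirical probability per burst length, which is the kind of quantity one would plot against BEL.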
V. STM-DFE VLSI ARCHITECTURE
A. Algorithm Simplified for Low-Cost Implementation
The simplified two-layer STM-DFE architecture is discussed in this section. We show the prototype of the two-layer STM engine in Fig. 12(a) , which is the modified decision device. Two operation modes of the STM engine are described as follows:
1) Slicing Mode (SM): In this mode, the STM engine is in Stage 1. The output of the equalizers is tested by the threshold device. If it is determined to be in the reliable region, the STM engine works as a slicer and outputs the decision. Otherwise, the decision is delayed, the STM engine outputs a log-likelihood ratio, and the operation mode changes into the delay decision mode (DDM).
2) Delay Decision Mode (DDM): In this mode, the STM engine detects two transmitted data and changes into the SM.
Next, we implement the STM algorithm in a simple way. As mentioned above, a slicer is the only required module in the SM. For clarity of presentation, we assume that the transmitted data is 2-PAM. Based on the STM algorithm, are chosen to maximize (7). Because the log function is strictly increasing, can equivalently be chosen to maximize (19) As a result, finding the maximum value of (19) is equivalent to minimizing (20) Therefore, the variance of the error is not involved in determining . To figure out the solution, and are substituted into (20). Next, the smaller of each are compared to obtain the smallest. Based on (20), we require six square operations, 14 adders, and three comparators
Case 2) For (22) Moreover, the term can be removed from (21) and (22), and is added into (21), which is given by Stage 1. Consequently, when for case or for case should be selected as 1. Otherwise, is set to . To determine , the two smaller ones of each of the cases above are compared with each other.
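The two-symbol detection in the DDM can be illustrated by a brute-force sketch that evaluates a squared-distance metric for all four candidate pairs. This is the unsimplified search, not the reduced form of (21) and (22). It assumes 2-PAM symbols in {-1, +1}, and it assumes that the second equalizer output contains the residual ISI `b1 * x1` because no decision on the deferred symbol has been fed back; `detect_two_symbols`, `y1`, `y2`, and `b1` are hypothetical names.

```python
from itertools import product


def detect_two_symbols(y1, y2, b1):
    """Jointly detect (x1, x2) in {-1, +1}^2 by minimum squared distance.

    y1: equalizer output for the deferred symbol (model: x1 + noise).
    y2: next equalizer output with nothing fed back for x1
        (model: x2 + b1 * x1 + noise).
    b1: first feedback tap weight (assumed residual ISI coefficient).
    """
    def metric(pair):
        x1, x2 = pair
        return (y1 - x1) ** 2 + (y2 - x2 - b1 * x1) ** 2

    return min(product((-1, 1), repeat=2), key=metric)
```

For example, with `y1 = 0.1`, `y2 = 0.4`, and `b1 = 0.5`, a plain slicer would decide +1 for the first symbol, whereas the joint search returns `(-1, 1)` because the second output is far better explained by a first symbol of -1. The metric does not depend on the noise variance, in line with the remark following (20).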
B. VLSI Architecture of the STM-DFE
As shown in the discussion in Section V-A, we need only two square operations and five adders. The STM-DFE controller can be implemented by a simple two-state finite state machine (FSM). We show the prototype of the two-layer STM engine architecture in Fig. 12 . In [14] and [15] , the square operations are designed to be smaller in area and higher in computing speed than one multiplier. In summary, the major hardware overhead is only three square operations, as well as six additions and one switch. Moreover, the STM-DFE will increase the latency in DFL.
, and denote the latency of square, adders, one-bit full adder, and switch operations, respectively. As shown in Fig. 12, the additional latency of the two-layer STM-DFE is . Generally, the area/latency of adders and square operators depend on the hardware architecture. In this paper, we implement square operators and adders with array multipliers and ripple adders, respectively. For a wordlength and are and , respectively. Therefore, is about twice . To estimate the overhead due to the increased latency of the STM engine, we apply the STM algorithm to the Lorentzian channel. The tap number is set as 7 to achieve BER at SNR dB. As a result, the latencies of the STM engine and conventional DFE are and , respectively. Thus, the latency of the STM engine is about twice that of the conventional DFE. As a result, the one-stage pipelined STM can operate at the same clock rate as the conventional DFE (the one-stage pipelined STM will be discussed in the next section). This requires an overhead of two adders and two 2-to-1 multiplexers, as shown in Fig. 14. Table I summarizes the hardware complexity.
C. Pipelined STM-DFE (PSTM-DFE) Architecture
In practical equalizer designs for high-speed applications, the computing speed is one of the main issues. Delayed LMS (DLMS) [10] is a very low-cost scheme to shorten the critical path and speed up the computation. However, the performance, such as SNR or convergence rate, will be degraded severely by employing this method. In our approach, we employ the look-ahead scheme of [10] and [13] to pipeline our STM-DFE equalizer. Fig. 13 shows the two-stage look-ahead. We have two extra delays to pipeline the feedback loop. To pipeline the STM-DFE, two modifications are made as follows:
1) Move the slicer to the front of the multiplexers: If we substitute slicers with STM engines, the overhead will increase dramatically. Hence, we first move the slicer to the front of the multiplexers. Next, the slicer is substituted by the STM engine, as in Fig. 14. The overhead of the STM-DFE is the same as that of the conventional DFE.
2) Look-ahead of tap-weight: The signal fed back into the first tap of the FBF is unknown in the delay decision mode; therefore, it will be useless for the look-ahead of . For the two-stage look-ahead, we need to expand and and not and .
The overall system diagram of the STM-DFE is depicted in Fig. 14. The critical loop of the STM-DFE is marked with dashed lines. Table II summarizes the total overhead of the PSTM-DFE and the conventional DFE.
Fig. 13. Two-stage look-ahead architecture of the pipelined DFE [13]. WUF and WUB denote weight updating blocks for FFF and FBF, respectively.
VI. CONCLUSION
This work proposes a novel and effective STM-DFE algorithm to enhance the DFE performance. At the algorithm level, the threshold of the reliable/unreliable region is analyzed in closed form. At the architecture level, the STM-DFE is very suitable for VLSI design, and its hardware cost is very similar to that of the simple DFE. In summary, the proposed STM algorithm can outperform the conventional DFE with low hardware overhead. The PSTM-DFE is also suitable for high-speed communications and storage applications.
APPENDIX DERIVATION OF BOUNDARY VALUES
We first derive the value of the boundary . Based on (7), must satisfy (23) Applying Assumption A2), (23) can be reformulated as (24) By simplifying (24), we have (25) Therefore, the boundary is equal to . Boundaries and can be derived in the same way.
