This paper presents a low-complexity, partially folded architecture for a transposed-form FIR filter and a cubic B-spline interpolator for ATSC terrestrial broadcasting systems. By using multiplexers, the proposed FIR filter and interpolator achieve a high clock frequency and low hardware complexity. A binary representation of the coefficients was used to design the high-order FIR filter. In addition, a fixed-point range detection method was used to compensate for the truncation error of the FIR filter outputs. The proposed partially folded architecture was designed and implemented in 90-nm CMOS technology with a 1.1-V supply voltage. The implementation results show that the proposed FIR filter and interpolator have 12% and 16% less hardware complexity than the conventional folded FIR architecture and the continuous delay control interpolator architecture, respectively. The filter and the interpolator operate at clock frequencies of 200 MHz and 385 MHz, respectively.
Introduction
To manage limited frequency resources and offer higher-quality service, many countries have replaced existing analog TVs with digital TVs (DTVs). For this transition, the Advanced Television Systems Committee (ATSC) standards [1] have been proposed, and DTV receivers compliant with the ATSC broadcasting system are required. Such receivers are used in many products, such as High-Definition TV (HDTV), set-top boxes, VCR/DVD players, and PDAs. In particular, the DTV tuner is an important block in DTV receivers and a major power consumer. Therefore, low-cost DTV tuner Integrated Circuits (ICs) are required to select the desired channel and down-convert it to the intermediate frequency. The digital FIR filter and interpolator are the most fundamental Digital Signal Processing (DSP) components of a CMOS DTV tuner.
The FIR filter has the advantages of stability and ease of implementation, but a large number of filter taps can lead to excessive hardware complexity. Therefore, folding techniques [2], [3] have been proposed to reduce hardware complexity. A significant advantage of the folded FIR filter architecture is its lower hardware complexity compared with the corresponding unfolded architecture. The sampling frequency of the FIR filter output is generally increased by an interpolator, which calculates new samples at arbitrary time instants between existing discrete-time samples. A polynomial-based interpolation filter can be implemented efficiently with the Farrow architecture [4], and interpolation methods using polynomials have been investigated in [5], [6]. Among these, spline interpolation is very useful for smoothing noisy data [7], because it offers a good tradeoff between simplicity and efficiency in controlling the degree of smoothing.
In this paper, we present a low-complexity FIR filter and interpolator for an ATSC DTV tuner. Novel FIR filter and interpolator architectures are proposed with the aim of reducing hardware complexity and improving interpolation efficiency. The design has a lower gate count and a shorter critical path delay than conventional architectures.
Section 2 gives an overview of the ATSC 8-VSB system. Section 3 describes the design of the FIR filter and interpolator, and Sect. 4 presents the proposed architectures. The results and performance comparisons are given in Sect. 5. Finally, conclusions are presented in Sect. 6.
ATSC 8-VSB System

ATSC 8-VSB Transmitter
8-VSB (Vestigial Side Band) is the modulation scheme used for digital television broadcasting in the USA. It supports a data rate of 19.39 Mbps in a 6-MHz channel. The ATSC 8-VSB transmitter is shown in Fig. 1(a). It receives the MPEG-2 Transport Stream (TS), which consists of 188-byte packets, at a data rate of 19.39 Mbps. After the 187 bytes of each packet, excluding the sync byte, are randomized for energy dispersal, they are encoded with Reed-Solomon (RS) codes. The RS-encoded data are interleaved and then convolutionally encoded. The convolutionally encoded bits are mapped onto eight-level symbols to generate the transmission symbols, which are filtered in the VSB modulator and transmitted as an 8-VSB signal with a 6-MHz bandwidth.
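The 19.39-Mbps figure can be checked from the 8-VSB framing. The sketch below assumes the ATSC A/53 parameters, which are not stated in this paper: a symbol rate of 4.5/286 × 684 ≈ 10.76 Msymbols/s, 832-symbol data segments containing 4 segment-sync symbols, 313-segment fields containing 1 field-sync segment, RS(207, 187) coding, and a rate-2/3 trellis code that carries 2 information bits per 8-level symbol.

```python
# Sketch: derive the ATSC 8-VSB payload rate from the symbol rate and framing overheads.
# All parameters are assumed from ATSC A/53; none are taken from this paper.

SYMBOL_RATE = 4.5e6 / 286 * 684   # about 10.762 Msymbols/s


def payload_rate_bps() -> float:
    rate = SYMBOL_RATE * 2        # rate-2/3 trellis code: 2 information bits per 3-bit (8-level) symbol
    rate *= 828 / 832             # 4 of the 832 symbols in each segment are segment sync
    rate *= 312 / 313             # 1 of every 313 segments is a field-sync segment
    rate *= 187 / 207             # RS(207, 187) parity overhead
    rate *= 188 / 187             # the MPEG-2 sync byte is not transmitted, so 187 bytes carry a 188-byte packet
    return rate


if __name__ == "__main__":
    print(f"{payload_rate_bps() / 1e6:.2f} Mbps")   # prints 19.39 Mbps
```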
ATSC 8-VSB Receiver
The 8-VSB receiver is composed of a phase tracker, an equalizer, an Intermediate Frequency (IF) filter, and a synchronous detector. It also includes convolutional and RS decoders for error correction. The 8-VSB receiver is shown in Fig. 1(b). After the tuner selects the desired channel and the IF filter passes the intermediate-frequency band, the tuner searches for the carrier frequency. After the synchronous detector acquires the synchronization signal and clock, the equalizer eliminates multipath interference. Next, the phase tracker compensates for any residual phase error remaining after the equalizer.
Copyright © 2011 The Institute of Electronics, Information and Communication Engineers
Specification of ATSC 8-VSB Tuner
According to the ATSC DTV standards, DTV information, including video and audio, must first be compressed into MPEG-2 format. The digital data are then modulated with 8-VSB and up-converted to the desired Radio Frequency (RF) channel for transmission. The system specification for the 8-VSB tuner is shown in Table 1. The channel bandwidth is 6 MHz, which is the bandwidth occupied when the zero-IF signal is up-converted to a low-IF signal.
In practical communication systems, the desired channel is usually accompanied by multiple interferers. Because of the unavoidable nonlinear characteristics of the devices used to implement the TV tuner, these interferers generate distortion components in the desired channel. Moreover, the power level of the interfering channels can be 40 dB higher than that of the desired channel. In addition to the linearity requirement, high selectivity is necessary to attenuate the interfering channels sufficiently at the tuner output so that the analog-to-digital converter (ADC) in the demodulator is not saturated. In the worst case, the tuner has to provide over 50 dB of attenuation for the adjacent channel. Another important specification listed in Table 1 is the image rejection ratio. Because the final intermediate frequency is 44 MHz in this application, the image channel is 88 MHz away from the desired channel. Therefore, the required image rejection ratio is usually over 60 dB, and this requirement is an important factor in the design of the DTV tuner.
Table 1 System specification for 8-VSB tuner.
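The 88-MHz image separation follows from the image-frequency relation. Assuming high-side local oscillator (LO) injection (the low-side case is symmetric), the LO frequency $f_{LO}$, the desired RF frequency $f_{RF}$, and the intermediate frequency $f_{IF}$ satisfy

$$f_{LO} = f_{RF} + f_{IF}, \qquad f_{image} = f_{LO} + f_{IF} = f_{RF} + 2f_{IF},$$

$$f_{image} - f_{RF} = 2 \times 44\ \text{MHz} = 88\ \text{MHz},$$

so the image channel lies two intermediate frequencies away from the desired channel, and both mix down to the same 44-MHz IF unless the image is suppressed before mixing.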
FIR Filter and Interpolator Design for DTV Tuner
As shown in Fig. 2 , the DTV tuner consists of a digital FIR filter, interpolator, up-converter, and analog circuits. In this section, we present the methods for designing the digital FIR filter and interpolator.
Modeling of FIR Filter System
A digital FIR filter is a system in which each output sample is the sum of a finite number of weighted input samples. The effect of a Linear Time-Invariant (LTI) system on the magnitude and phase of a complex exponential input is determined by its frequency response $H(e^{j\omega})$. If the input is $x[n] = Ae^{j\phi}e^{j\omega n}$, then, using the polar form of $H(e^{j\omega})$, the output is
$$y[n] = H(e^{j\omega})\,Ae^{j\phi}e^{j\omega n} = |H(e^{j\omega})|\,A\,e^{j(\omega n + \phi + \angle H(e^{j\omega}))}.$$
The magnitude $|H(e^{j\omega})|$ is referred to as the gain of the system. To design the FIR filter, the filter type and the coefficients of the transfer function must be determined.
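The eigenfunction property above can be checked numerically. The following sketch uses a hypothetical 3-tap filter rather than the designed 138-tap filter. It computes $H(e^{j\omega})$ directly from the impulse response and confirms that, once the start-up transient has passed, the response to a complex exponential equals the input scaled by $|H(e^{j\omega})|$ and rotated by $\angle H(e^{j\omega})$.

```python
import cmath


def freq_response(h, w):
    """H(e^{jw}) = sum_k h[k] e^{-jwk} for an FIR impulse response h."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))


def fir_output(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] x[n-k], with x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    h = [0.25, 0.5, 0.25]                      # hypothetical 3-tap low-pass filter
    A, phi, w = 2.0, 0.3, 0.4 * cmath.pi
    x = [A * cmath.exp(1j * phi) * cmath.exp(1j * w * n) for n in range(32)]
    y = fir_output(h, x)
    H = freq_response(h, w)
    n = 20                                     # past the start-up transient (n >= len(h) - 1)
    expected = abs(H) * A * cmath.exp(1j * (w * n + phi + cmath.phase(H)))
    print(abs(y[n] - expected) < 1e-12)        # True
```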
Next, the input and output bit widths of the FIR filter must be chosen by analyzing the quantization of the coefficients; that is, a fixed-point simulation of the FIR filter is required. For the fixed-point simulation, coefficients that meet the FIR filter specification in Table 2 are first extracted with the MATLAB FDA (Filter Design and Analysis) tool. Because the filter output is defined to be 12 bits wide, the coefficients extracted with the FDA tool must be converted to suitable values that set the filter output to 12 bits. Figure 3 shows the frequency response obtained with these converted coefficients, which were selected to give the lowest quantization error. Table 3 shows the coefficient lookup table of the proposed FIR filter, in which the index $k$ and the coefficient $h(k)$ denote the coefficient number and the coefficient generated with the MATLAB FDA tool, respectively. $h_{binary}(k)$ is the binary coefficient converted from the decimal coefficient $h(k)$, and $h_{fixed\text{-}point}(k)$ is the 16-bit coefficient obtained by fixed-point simulation that meets the digital FIR filter specification in Table 2. To obtain $h_{fixed\text{-}point}(k)$, $h_{binary}(k)$ was used as the parameter of the fixed-point simulator.
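Coefficient quantization for the fixed-point simulation can be sketched as follows. The 16-bit format is assumed here to be two's-complement Q1.15 (one sign bit and 15 fractional bits), and the coefficient values are hypothetical; neither is taken from Table 3. The helper reports the worst-case rounding error, which for round-to-nearest is at most half an LSB, i.e., $2^{-16}$.

```python
FRAC_BITS = 15                               # assumed Q1.15 format: 1 sign bit + 15 fractional bits
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1     # 16-bit two's-complement range


def to_fixed(h: float) -> int:
    """Round a real coefficient to the nearest Q1.15 integer, saturating at the range limits."""
    q = round(h * (1 << FRAC_BITS))
    return max(Q_MIN, min(Q_MAX, q))


def to_float(q: int) -> float:
    """Interpret a Q1.15 integer as a real value."""
    return q / (1 << FRAC_BITS)


def max_quantization_error(coeffs):
    """Largest |h - Q(h)| over the coefficient set."""
    return max(abs(h - to_float(to_fixed(h))) for h in coeffs)


if __name__ == "__main__":
    coeffs = [0.0123, -0.0457, 0.3141, 0.5, -0.0457, 0.0123]   # hypothetical values
    print([to_fixed(h) for h in coeffs])
    print(max_quantization_error(coeffs) <= 2 ** -(FRAC_BITS + 1))   # True: at most half an LSB
```

Saturation only matters for coefficients at or beyond ±1; for a normalized low-pass design the error budget is dominated by rounding.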
Low Complexity FIR Filter Design
For the FIR filter in the DTV tuner, low complexity is the primary goal, in order to minimize hardware size and power consumption. Therefore, several area-reduction strategies have been applied: multiplierless filter design, Canonical Signed Digit (CSD) representation of the coefficients, and folding techniques.
A multiplierless FIR filter can be implemented with shift and add operations. For example, multiplying B by A = 0.11011 (binary) can be expressed as
$$A \cdot B = (B \gg 1) + (B \gg 2) + (B \gg 4) + (B \gg 5),$$
where $\gg$ is the right bit-shift operator. Once the filter coefficients are obtained, their nonzero bits can be identified, so each multiplication can be carried out efficiently with only the shift and add operations required by the nonzero bits of that coefficient.
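The shift-and-add multiplication above can be sketched as follows for an integer operand B and an unsigned binary-fraction coefficient given as a bit string (an illustrative interface, not the hardware description). Each '1' in the i-th fractional position contributes $B \gg i$; as in hardware, the right shifts truncate low-order bits.

```python
def shift_add_multiply(b: int, frac_bits: str) -> int:
    """Multiply integer b by the binary fraction 0.<frac_bits> using only shifts and adds.

    A '1' at fractional position i (1-based) contributes b >> i. Python's >> on
    ints is a floor (arithmetic) shift, so low-order bits are truncated.
    """
    acc = 0
    for i, bit in enumerate(frac_bits, start=1):
        if bit == "1":
            acc += b >> i
    return acc


if __name__ == "__main__":
    # A = 0.11011b = 1/2 + 1/4 + 1/16 + 1/32 = 27/32
    print(shift_add_multiply(64, "11011"))   # 54, equal to 64 * 27 / 32
```

Four nonzero bits need four shifted terms and three adders, which is why minimizing or sharing nonzero-bit patterns directly reduces the adder count.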
It is well known that Common Subexpression Elimination (CSE) methods based on CSD coefficients can reduce the number of adders required for the multipliers of FIR filters, since the CSD representation minimizes the number of nonzero bits. However, CSD-based methods are not appropriate for high-order FIR filter design, and, as shown in Table 2, the required number of FIR filter taps is 138. In [8], a CSE algorithm using binary representation of the coefficients is presented for the implementation of higher-order FIR filters; filters using this algorithm need fewer adders than those using CSD-based CSE. This CSE method, also called Binary Sub-expression Elimination (BSE), eliminates redundant binary Common Sub-expressions (CSs) that occur within a coefficient. In Fig. 4, a binary number is formed from three terms: [101], [1101], and [100001]. Assuming that $x_1$ is the input signal, these CSs can be expressed as
If more than one CS occurs in a pair of coefficients, the CSs can be grouped together to eliminate redundant arithmetic. As shown in Fig. 4, the same CS occurs twice, and these terms are composed of common portions labeled D, E, and F, respectively.
Table 4 Number of grouped common sub-expressions of FIR filter (138 taps).
In Table 4, the occurrence frequency of a CS is defined as the number of times the same CS is reused across the filter coefficients. The number of grouped CSs is smaller for the binary representation of the coefficients than for the CSD representation. Therefore, the binary representation is more favorable for reducing the number of adders in the filter.
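The occurrence frequency of a CS can be illustrated with a simplified counting step. The sketch below is not the BSE algorithm of [8]; it considers only two-nonzero-bit sub-expressions. Every pair of '1' bits in a non-negative coefficient (signs are assumed to be handled separately) is a candidate CS identified by its bit spacing, so a spacing of 2 corresponds to [101]. The most frequent pattern is the best candidate for sharing a single adder across coefficients.

```python
from collections import Counter


def cs_pattern(spacing):
    """Binary pattern of a two-bit CS, e.g. spacing 2 -> '[101]'."""
    return "[1" + "0" * (spacing - 1) + "1]"


def two_bit_cs_counts(coeffs, width):
    """Count two-nonzero-bit sub-expressions across non-negative integer coefficients.

    Each pair of '1' bits in a coefficient's width-bit binary form is a candidate CS,
    keyed by the distance between the two bits.
    """
    counts = Counter()
    for c in coeffs:
        bits = format(c, f"0{width}b")
        ones = [pos for pos, bit in enumerate(bits) if bit == "1"]
        for i in range(len(ones)):
            for j in range(i + 1, len(ones)):
                counts[ones[j] - ones[i]] += 1
    return counts


if __name__ == "__main__":
    counts = two_bit_cs_counts([0b101, 0b10101, 0b1101], width=8)
    for spacing, freq in counts.most_common():
        print(cs_pattern(spacing), freq)
```

A full CSE pass would replace the most frequent pattern with a new signal, remove the covered bits, and repeat; the counting shown here is only the first step of such a loop.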
The overall FIR filter architecture, a transposed form realized with thirteen CSs, is shown in Fig. 5. It contains the input signal $x_0$, the output signal $y$, the thirteen CSs ($x_1$ to $x_{13}$), and the convolution computation parts ($h_0 * x$ to $h_{69} * x$) for the filter coefficients. The accumulation part has a symmetrical structure because the filter has linear-phase characteristics. In the common sub-expression part, the CS [101] is labeled with the mark 2, meaning a right shift of two bits. Similarly, the CS [10101] is labeled with the mark 4 because it is shifted four bits to the right relative to the CS [101]. Each CS is connected to the other filter coefficients for the convolution computation. Finally, the processed CSs are accumulated and delayed in the accumulation block. This architecture requires no multipliers.
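A behavioral model of a transposed-form FIR filter with symmetric (linear-phase) coefficients is sketched below. It is a generic illustration, not the CS-based datapath of Fig. 5: all coefficient products of the current input are formed at once, only the first half of them is computed because of the coefficient symmetry, and the partial sums move through a chain of delay registers toward the output. The result is checked against a direct-form convolution.

```python
def direct_fir(h, x):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def transposed_fir(h, x):
    """Transposed direct-form FIR for a symmetric (linear-phase) coefficient set h."""
    if list(h) != list(h)[::-1]:
        raise ValueError("this sketch assumes symmetric (linear-phase) coefficients")
    n_taps = len(h)
    half = (n_taps + 1) // 2
    z = [0] * (n_taps - 1)                     # delay registers between the adders
    y = []
    for xn in x:
        # Multiplier block: by symmetry only the first half of the products is distinct.
        distinct = [hk * xn for hk in h[:half]]
        products = distinct + distinct[: n_taps // 2][::-1]
        y.append(products[0] + (z[0] if z else 0))
        # Each register takes the next product plus the register one step further from the output.
        for i in range(n_taps - 2):
            z[i] = products[i + 1] + z[i + 1]
        if z:
            z[-1] = products[-1]
    return y


if __name__ == "__main__":
    h = [1, 3, 5, 5, 3, 1]                     # hypothetical even-length symmetric filter
    x = [1, 2, -1, 0, 4, 3, 0, 0]
    print(transposed_fir(h, x) == direct_fir(h, x))   # True
```

In the transposed form, the critical path is one multiplier (or shift-add network) plus one adder regardless of the number of taps, which is why it suits high-order filters.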
Interpolation Filter Design Using Spline Function
Spline-based interpolation is a convolution-based interpolation in which the interpolation kernel is a piecewise polynomial generated by a B-spline. It consists of two operations: a preliminary iterative filtering that obtains the spline coefficients, and a mixed discrete-continuous convolution that generates the reconstructed samples at the desired interpolation instants. B-splines of degree three (cubic) are usually preferred because they provide sufficient quality at an acceptable computational load. As shown in Fig. 6, cubic spline interpolation yields a piecewise curve with continuous first- and second-order derivatives, in which a third-degree polynomial is constructed between each pair of adjacent points. In Fig. 2, the samples output by the FIR filter are denoted $y(nT_s)$; they are taken at uniform intervals $T_s$. The samples at the correct symbol timing, $t = kT$, are interpolated from the samples $y(nT_s)$ by the interpolator. From interpolation theory, the reconstructed signal $y_{recon}(t)$ can be expressed as
$$y_{recon}(t) = \sum_{n} y(nT_s)\, h(t - nT_s),$$
where $h(t)$ is the interpolation function. For cubic B-spline interpolation, the kernel $h(t)$ is replaced by the cubic B-spline and the samples by the B-spline coefficients, so that the reconstructed sample at $t = kT$ becomes
$$y_{recon}(kT) = \sum_{i} c\big((m_k - i)T_s\big)\, \beta^3(i + \mu_k),$$
where $\beta^3$ is the cubic B-spline function, $c(nT_s)$ are the B-spline coefficients, $m_k$ is the integer part of $kT/T_s$, and $\mu_k = kT/T_s - m_k$ is the fractional interval [9]. Setting $\mu_k = 0$ (interpolating exactly at the sample instants) and using $\beta^3(0) = 2/3$ and $\beta^3(\pm 1) = 1/6$, the coefficients are determined from
$$y(mT_s) = \frac{1}{6}c\big((m-1)T_s\big) + \frac{2}{3}c(mT_s) + \frac{1}{6}c\big((m+1)T_s\big).$$
Taking the z-transform, this relation can be expressed as [9]:
$$C(z) = \frac{6}{z + 4 + z^{-1}}\, Y(z).$$
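As shown in Fig. 9, the interpolator consists of two parts: an IIR filter and an approximate FIR filter [10]. The IIR filter determines the B-spline coefficients from the input samples $y(nT_s)$, and the Farrow interpolation filter then calculates the reconstructed samples $y_{recon}(t)$ from polynomial approximations [10]. With this piecewise polynomial model, reconstructed samples can be generated at arbitrary positions. For a cubic polynomial, the Farrow structure is represented by a 4 × 4 matrix. For the cubic B-spline interpolation kernel, the reconstructed samples are generated by the following equation [11]:
$$y_{recon}\big((m_k+\mu_k)T_s\big) = \frac{1}{6}\begin{bmatrix}\mu_k^3 & \mu_k^2 & \mu_k & 1\end{bmatrix}\begin{bmatrix}-1 & 3 & -3 & 1\\ 3 & -6 & 3 & 0\\ -3 & 0 & 3 & 0\\ 1 & 4 & 1 & 0\end{bmatrix}\begin{bmatrix}c((m_k-1)T_s)\\ c(m_kT_s)\\ c((m_k+1)T_s)\\ c((m_k+2)T_s)\end{bmatrix},$$
where $m_k$ is the index of the sample preceding the interpolation instant and $\mu_k$ is the fractional interval.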
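The cubic B-spline Farrow computation can be sketched as follows, using the standard uniform cubic B-spline basis matrix (an assumption about the exact form used in [11]). Each row of the matrix acts as a fixed 4-tap sub-filter on the B-spline coefficients, and the sub-filter outputs are combined with powers of the fractional interval using Horner's rule. The sketch assumes the four coefficients around the interpolation instant are already available from the prefilter.

```python
# Uniform cubic B-spline basis matrix, scaled by 1/6 in the code below.
# Rows multiply mu^3, mu^2, mu, 1; columns act on c[m-1], c[m], c[m+1], c[m+2].
BSPLINE_MATRIX = (
    (-1,  3, -3, 1),
    ( 3, -6,  3, 0),
    (-3,  0,  3, 0),
    ( 1,  4,  1, 0),
)


def farrow_cubic_bspline(c4, mu):
    """Interpolate between c[m] and c[m+1] at fractional interval mu in [0, 1).

    c4 = (c[m-1], c[m], c[m+1], c[m+2]) are B-spline coefficients (not raw samples).
    """
    # Fixed sub-filters: one 4-tap FIR branch per power of mu.
    branches = [sum(row[j] * c4[j] for j in range(4)) / 6 for row in BSPLINE_MATRIX]
    # Horner's rule: ((b3 * mu + b2) * mu + b1) * mu + b0
    acc = 0.0
    for b in branches:
        acc = acc * mu + b
    return acc


if __name__ == "__main__":
    print(farrow_cubic_bspline((19, 20, 21, 22), 0.5))   # 20.5
```

Only the multiplications by mu depend on the timing; the branch filters are constant, which is what makes the Farrow form attractive for hardware.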
The IIR filter has two poles, at $p_1 = -2 + \sqrt{3}$ and $p_2 = 1/p_1$. The preliminary iterative filter is divided into two filters, $H_1$ and $H_2$ [12], where $H_1$ is a one-tap IIR filter and $H_2$ is an approximate FIR filter.
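The prefilter with poles $p_1$ and $1/p_1$ (the roots of $z^2 + 4z + 1 = 0$) can be implemented as a causal recursion followed by an anti-causal one; the anti-causal pass realizes the pole $1/p_1$, which lies outside the unit circle, in a stable way. The sketch below uses the exact two-pass recursion with mirror-symmetric boundaries that is widely used for B-spline interpolation, rather than the approximate-FIR form of $H_2$ described here. The tolerance that truncates the causal initialization is an assumed parameter.

```python
import math

POLE = math.sqrt(3.0) - 2.0                # p1 = -2 + sqrt(3); the pole 1/p1 is realized by the anti-causal pass
GAIN = (1.0 - POLE) * (1.0 - 1.0 / POLE)   # overall gain, equal to 6 for the cubic B-spline


def bspline_coefficients(samples, tol=1e-12):
    """Cubic B-spline prefilter with mirror-symmetric boundary extension.

    Returns c such that (c[m-1] + 4*c[m] + c[m+1]) / 6 equals samples[m] up to rounding.
    """
    n = len(samples)
    if n < 2:
        return [float(s) for s in samples]
    c = [GAIN * s for s in samples]
    # Causal initialization: c+[0] = sum_k p1^k c[k], truncated once |p1|^k < tol.
    horizon = min(n, math.ceil(math.log(tol) / math.log(abs(POLE))))
    acc, zk = 0.0, 1.0
    for k in range(horizon):
        acc += zk * c[k]
        zk *= POLE
    c[0] = acc
    # Causal pass: 1 / (1 - p1 z^-1)
    for k in range(1, n):
        c[k] += POLE * c[k - 1]
    # Anti-causal initialization for the mirror boundary at the last sample
    c[n - 1] = (POLE / (POLE * POLE - 1.0)) * (c[n - 1] + POLE * c[n - 2])
    # Anti-causal pass: -p1 / (1 - p1 z)
    for k in range(n - 2, -1, -1):
        c[k] = POLE * (c[k + 1] - c[k])
    return c


if __name__ == "__main__":
    ys = [math.sin(0.3 * k) for k in range(30)]
    c = bspline_coefficients(ys)
    # Reconstruction at a sample instant reproduces the input sample.
    print(abs((c[9] + 4 * c[10] + c[11]) / 6 - ys[10]) < 1e-9)   # True
```

Because $|p_1| \approx 0.268$, the impulse response of the anti-causal part decays quickly, which is what allows it to be approximated by a short FIR filter in hardware.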
Proposed Partially Folded FIR Filter and Spline Interpolator Using Sub-Filter
Proposed Partially Folded FIR Filter Architecture
Folding techniques have been proposed as a means of reducing hardware complexity. FIR filters are ideal candidates for folding because they consist of repeated multiplications. A significant advantage of the folded FIR filter architecture is that it reduces hardware complexity compared with the corresponding unfolded scheme, and it does not suffer from clock skew problems. Combinations of folded and unfolded filters are also much more efficient than fully folded filters. A partially folded filter is an intermediate form between the folded and unfolded forms, featuring higher throughput than the fully folded filter while requiring less hardware than the unfolded one.

In this paper, a partially folded FIR filter in transposed form is designed using 6-to-1 multiplexers (MUXes) instead of delay registers. By using 6-to-1 MUXes in the multiplier block, as shown in Fig. 7, the total area of the FIR filter is expected to be smaller than that of the structure using delay registers [12]. Because the number of unfolded filter taps is 138 and the folding factor is 6, the architecture is composed of 23 stages (138/6 = 23), each including a MUX that selects filter coefficients according to the control signal. The advantage of this architecture is that it can select coefficients without the extra delay operations of conventional folded architectures, which greatly reduces hardware complexity. The data flow of stage 1 is shown in Table 5. These processes continue until all the values processed in the multiplier block have been fed to the registers. At this point, the control signal of the 2-to-1 MUX becomes "1", and the value of $d_6$ at stage 2 is fed to the adder at stage 1. This completes the first clock cycle of the output computation. However, six clock cycles are required to obtain each of the other outputs.
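The effect of folding on the arithmetic can be illustrated with a behavioral model. The sketch below does not reproduce the transposed register data flow of Fig. 7 or Table 5. It only shows how 23 stages, each reusing one multiplier over six cycles with a MUX selecting coefficient $h(6s + t)$ in cycle $t$, compute the same outputs as a 138-tap direct-form filter at the cost of six clock cycles per output. The function names and the coefficient-to-stage assignment are hypothetical.

```python
def direct_fir(h, x):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def partially_folded_fir(h, x, fold=6):
    """Behavioral model of a folded FIR filter (hypothetical helper, not Fig. 7's data flow).

    len(h) // fold stages each own one multiplier; in clock cycle t a per-stage
    MUX selects coefficient h[s * fold + t], so every output takes `fold` cycles.
    Returns the outputs and the number of clock cycles used.
    """
    n_taps = len(h)
    if n_taps % fold:
        raise ValueError("the number of taps must be a multiple of the folding factor")
    n_stages = n_taps // fold                  # e.g. 138 taps / 6 = 23 stages
    y, cycles = [], 0
    for n in range(len(x)):
        acc = 0
        for t in range(fold):                  # control counter shared by all stage MUXes
            for s in range(n_stages):          # the stages run in parallel in hardware
                k = s * fold + t               # coefficient index selected by stage s's MUX
                if n - k >= 0:
                    acc += h[k] * x[n - k]
            cycles += 1
        y.append(acc)
    return y, cycles


if __name__ == "__main__":
    h = [(7 * k) % 11 - 5 for k in range(138)]  # hypothetical integer coefficients
    x = [(3 * n) % 7 - 3 for n in range(20)]
    y, cycles = partially_folded_fir(h, x)
    print(y == direct_fir(h, x), cycles)       # True 120
```

The model makes the trade-off explicit: 23 multipliers instead of 138, in exchange for six cycles per output, consistent with the six-fold latency reported in Sect. 5.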
Proposed Cubic B-Spline Interpolator Using Sub-Filter
This section presents the cubic B-spline interpolator architecture using a sub-filter. A disadvantage of the conventional cubic B-spline interpolator is that its error increases on steep slopes, which degrades the performance of the filter. As shown in Fig. 8, samples can be estimated from the spline polynomial after a time offset t. The upper dashed curve deviates more from the original signal, shown as a solid line. To reduce the error, a sample closer to the original curve can be obtained by moving the sample at $y_0$ to a position with the value $y_0 + \alpha$; shifting the sample up or down by $\alpha$ improves the interpolation performance. The shift $\alpha$ can be expressed in the form of a sub-filter, as shown in Fig. 9(a). In addition, by Eq. (10), the overall spline interpolation architecture can be divided into three parts: a composed IIR filter, an approximate FIR filter, and a cubic B-spline reconstruction filter.

The Farrow architecture with continuous delay control has been widely used to design interpolation filters [4]. Its disadvantage is that the numbers of constant multipliers and delays grow with the filter coefficients. In contrast, the cubic B-spline interpolation architecture requires only six constant multipliers, twelve adders, and three delay elements. Figure 13 shows the frequency characteristics and phase delay of the spline interpolation filter using the sub-filter, in which the approximate FIR filter is modeled according to the specifications in Table 6. A fixed-point simulation was carried out for the approximate FIR filter design, and Fig. 11 shows the magnitude response of the approximate FIR filter. To verify the efficiency of the cubic B-spline filter, a sub-filter can be modeled. First, the step size, which is known to affect the steady-state error, is considered.
The minimum step size is set to 0.0035 because this value yields the optimal steady-state error. After the minimum step size is chosen, the initial values are set to 0 and then increased or decreased by the initial step size value of 1. The results are applied to the sub-filter shown in Fig. 9(a). The coefficients $c(nT_s)$ generated by the sub-filter in Fig. 9(a) are fed to the cubic B-spline interpolation Farrow architecture that produces the reconstructed samples, as shown in Fig. 10. If the final pass-band ripple error at the output of the architecture in Fig. 10 is reduced, the step size is halved and the previous steps are repeated. This continues until the pass-band ripple error reaches its smallest value. The procedure for determining the sub-filter can be described as follows.
1. First, define the sub-filter applied to the previous part of the reconstruction filter. filter length.
A sub-filter obtained by the above method can be expressed as Eq. (11).
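One reading of the step-size search described above is a greedy coordinate search with step halving, sketched below. The cost function stands in for the pass-band ripple error measured after the Fig. 10 architecture, which cannot be reproduced here; the quadratic cost in the example is purely hypothetical. The initial step of 1 and the minimum step of 0.0035 follow the description above.

```python
def step_halving_search(cost, n_coeffs, init_step=1.0, min_step=0.0035):
    """Greedy coordinate search with step halving (one possible reading of the procedure).

    All coefficients start at 0. In each round every coefficient is moved by +step
    or -step if that lowers the cost; the step is then halved, and the search stops
    once the step falls below min_step.
    """
    coeffs = [0.0] * n_coeffs
    best = cost(coeffs)
    step = init_step
    while step >= min_step:
        for i in range(n_coeffs):
            for delta in (step, -step):
                trial = coeffs.copy()
                trial[i] += delta
                err = cost(trial)
                if err < best:                 # keep the move only if the error decreases
                    coeffs, best = trial, err
                    break
        step /= 2
    return coeffs, best


if __name__ == "__main__":
    # Hypothetical stand-in for the pass-band ripple error of the Fig. 10 architecture
    def toy_ripple(v):
        return (v[0] - 0.3) ** 2 + (v[1] + 0.2) ** 2

    print(step_halving_search(toy_ripple, 2))
```

With steps 1, 1/2, ..., 1/256 the coefficients end on a grid of about 0.0039, so the minimum step size effectively sets the resolution of the sub-filter coefficients.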
Figures 12 and 13 compare the ripple error and phase delay of the continuous delay control architecture and the proposed architecture. Phase delay is an important measure of interpolation quality: if the sample rate conversion increases the phase delay, the synchronization between the input and output must be adjusted. Therefore, evaluating the ripple error and phase delay after interpolation indicates whether synchronization is achieved. The responses for the delay values d = 0.1, 0.2, ..., 0.5 are shown in Figs. 12 and 13.
They show that the magnitude response for d = 0.1 equals that for d = 0.9 for the even-length (L = 4) filter, and that the corresponding phase delay responses are symmetric with respect to the curve for d = 0.5. At low frequencies the phase delay curves are nearly constant, but at high frequencies they approach the integer delay, which is 2 in this case, as shown in Fig. 13. For the continuous delay control architecture, the pass-band ripple error and phase delay values in Figs. 12 and 13 are 0.0262 and 0.00612, respectively, whereas those of the proposed architecture are 0.0115 and 0.00383. The proposed architecture therefore outperforms the continuous delay control structure in both ripple error and phase delay.
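The symmetry between d and 1 − d can be illustrated with the phase delay $\tau_p(\omega) = -\angle H(e^{j\omega})/\omega$ of the 4-tap cubic B-spline reconstruction weights alone. The sketch omits the prefilter and the sub-filter, so its numbers are not those of Figs. 12 and 13. Because the weights for 1 − d are the weights for d in reverse order, the two magnitude responses are identical and the two phase delays sum to L − 1 = 3 samples.

```python
import cmath


def bspline_weights(d):
    """4-tap cubic B-spline reconstruction weights for fractional delay d in [0, 1]."""
    return [
        (1 - d) ** 3 / 6,
        (3 * d**3 - 6 * d**2 + 4) / 6,
        (-3 * d**3 + 3 * d**2 + 3 * d + 1) / 6,
        d**3 / 6,
    ]


def freq_response(h, w):
    """H(e^{jw}) = sum_k h[k] e^{-jwk}."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))


def phase_delay(h, w):
    """tau_p(w) = -arg H(e^{jw}) / w in samples; no unwrapping, so valid only while |arg H| < pi."""
    return -cmath.phase(freq_response(h, w)) / w


if __name__ == "__main__":
    for d in (0.1, 0.3, 0.5, 0.7, 0.9):
        h = bspline_weights(d)
        print(d, [round(phase_delay(h, f * cmath.pi), 4) for f in (0.1, 0.2, 0.3)])
```

The check is restricted to low and mid frequencies because the sketch does not unwrap the phase.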
Results and Comparison
The proposed partially folded FIR filter and overall interpolation filter were designed in Verilog HDL, and their functionality was verified by simulation with MATLAB and ModelSim 6.0 SE. The proposed architectures were synthesized with appropriate timing and area constraints using Synopsys design tools and a 90-nm CMOS technology optimized for a 1.1-V supply voltage. Table 7 shows the implementation results of the proposed partially folded FIR architecture, the unfolded architecture, and the conventional folded architecture. The proposed partially folded FIR filter architecture operates at a clock frequency of approximately 200 MHz and requires approximately 60% and 12% fewer gates than the unfolded and conventional folded architectures, respectively. The latency of the proposed architecture is six times longer than that of the unfolded architecture because the folded architecture has a folding factor of six. In the unfolded part, the architecture using coefficients in binary representation has 6% less hardware complexity than the architecture using coefficients in CSD representation. Table 8 shows the implementation results of the overall interpolation filter architectures. The proposed cubic B-spline interpolation filter architecture using the sub-filter has 16% less hardware complexity than the conventional continuous delay control architecture and operates at a clock frequency of approximately 385 MHz. Because the proposed interpolation filter has a folding factor of six, its latency is six times longer than that of the unfolded architecture.
Table 8 Implementation results of the proposed Cubic B-spline structure using sub-filter and continuous delay control architecture.
Conclusion
This paper presents the design and implementation of a partially folded FIR filter and a cubic B-spline interpolation filter using a sub-filter for ATSC broadcasting DTV systems. Implementing a low-complexity FIR filter required optimizing the filter coefficients; to this end, a common sub-expression elimination method that shares sub-expressions of coefficients in binary representation was used. Instead of delay units, a method of controlling the filter coefficients with multiplexers was proposed. As a result, the proposed architecture has 60% less hardware complexity than the unfolded architecture and 12% less than the conventional folded architecture.
The overall interpolation filter architecture using the sub-filter and cubic B-spline achieves more efficient interpolation, with lower pass-band ripple error and phase delay than the conventional continuous delay control architecture, and it has 16% less hardware complexity. The proposed FIR filter and interpolator have potential applications in DTV tuners for ATSC broadcasting DTV systems.
