Abstract-This paper studies the design and multiplier-less realization of a new software radio receiver (SRR) with reduced system delay. It employs low-delay finite-impulse response (FIR) and digital allpass filters to effectively reduce the system delay of the multistage decimators in SRRs. The optimal least-squares and minimax designs of these low-delay FIR and allpass-based filters are formulated as a semidefinite programming (SDP) problem, which allows a zero-magnitude constraint at $\omega = \pi$ to be incorporated readily as additional linear matrix inequalities (LMIs). By implementing the sampling rate converter (SRC) using a variable digital filter (VDF) immediately after the integer decimators, the need for an expensive programmable FIR filter in the traditional SRR is avoided. A new method for the optimal minimax design of this VDF-based SRC using SDP is also proposed and compared with the traditional weighted least squares (WLS) method. Other implementation issues, including the multiplier-less and digital signal processor (DSP) realizations of the SRR and the generation of the clock signal in the SRC, are also studied. Design results show that the system delay and implementation complexity (especially in terms of high-speed variable multipliers) of the proposed architecture are considerably reduced compared with conventional approaches.
There are several important contributions to the realization of digital IFs of SRRs [1] [2] [3] (see also the references therein), most of which are based on the architecture shown in Fig. 1(a). The analog IF signal is first digitized at a bandwidth of, say, 20 to 40 MHz. A programmable digital decimator and a sample rate converter (SRC) are then employed to isolate the desired user's channel from the signal spectrum and convert it to an appropriate sampling rate for further processing. The programmable digital decimator usually consists of multiple stages of decimators, as shown in Fig. 1(b), to reduce implementation complexity and power dissipation. As the sampling rate of the baseband signal is much lower than that at the IF, each stage of the decimator consists of a bandlimiting (anti-aliasing) digital filter and a downsampler (decimator) to filter out the unwanted signals and lower the sampling rate. By selecting an appropriate number of stages, different integer downsampling ratios can be implemented. The programmable FIR filter (PFIR) is used to remove the residual interference from adjacent channels. This is necessary because the sampling rate is usually not an integer multiple of the channel spacing, so the multistage decimators, which implement an integer downsampling ratio, are unable to remove this residual interference. Together with the SRC, which provides the necessary arbitrary rate-change factor, the receiver can accommodate signals with a wide variety of bandwidths.
One drawback of this conventional structure is that the output of the multistage decimators, which is obtained by downsampling the high-rate IF signal from the ADC, has to be upsampled again in order to carry out the arbitrary sample rate conversion. Another important problem is the high complexity of the PFIR, which requires a considerable number of high-speed variable multipliers, especially for wideband signals. Recently, the authors have proposed a new digital IF architecture for SRRs, shown in Fig. 2 [4], [5]. The SRC, which is realized using a Farrow-based variable digital filter (VDF) [6], [7], is placed immediately after the multistage decimators. The basic idea of the VDF-based SRC is to provide a variable fractional delay in the passband and additional attenuation in the stopband. This allows us to replace the PFIR by a HBF with fixed coefficients, if the arbitrary rate-change factor is properly chosen. This new architecture eliminates the need for the PFIR, which is usually a bottleneck in software radio applications for wideband signals. As a result, the implementation complexity is significantly reduced because the fixed coefficients of the SRR can be efficiently implemented using sum-of-powers-of-two (SOPOT) coefficients or canonical signed digit (CSD) representations [8]. Apart from the limited number of variable multipliers required in the Farrow structure, the entire SRR can be implemented without any multiplications.
In [4], the multistage decimators and the HBF are realized using linear-phase FIR filters. The use of linear-phase filters usually results in a longer system delay compared with approximately passband linear-phase (low-delay) FIR or IIR filters. This is undesirable in some applications, and it motivates us to study in this paper the application of low-delay FIR filters, digital allpass filters and the SRC, together with their efficient realizations, in order to reduce the system delay of the new SRR. The design of the low-delay FIR and allpass-based filters is performed using semidefinite programming (SDP) [9] [10] [11]. Furthermore, it was found for the low-delay FIR decimators that the constraint of zero magnitude response at $\omega = \pi$, which is desirable to attenuate the aliasing components before decimation, can be readily incorporated in the SDP approach. To design the allpass filters using SDP, the frequency specification is first formulated as a set of matrix inequalities, which is a bilinear function of the filter coefficients and the ripple to be minimized. The overall design problem turns out to be a quasiconvex constrained optimization problem, and it can be solved through a series of convex optimization sub-problems and the bisection search algorithm [9]. In addition, the design of the VDF-based SRC is further studied. This SRC is also applicable to software radio transmitters and base stations [12], as they also require arbitrary sampling rate conversion. In particular, a new SDP method, which is optimal in the minimax sense, is proposed and compared with the weighted least squares (WLS) method [6], [7]. Design results show that both methods give similar performance when the order of interpolation is small, and that the design time of WLS is significantly lower. If higher order interpolation and additional constraints are required, the SDP method is more flexible and yields better results, at the expense of increased design time.
Other implementation issues of the SRC, such as the generation of the clock signal and control parameters, are also investigated. In particular, a flexible clocking generation scheme to accommodate different communication standards and a unit to calculate the control parameter in the interpolation part of the SRC are proposed.
As mentioned earlier, another objective of this paper is to study the efficient realization of the proposed low-delay receivers. Two approaches are considered. The first one is to realize the SRR using a DSP. It is assumed that the DSP is fast enough to deal with the decimated signal in the SRR and that its output will be further processed in the baseband. This approach is more suitable for software radio applications with large downsampling ratios. The second one is the multiplier-less hardware realization, which is more desirable when the downsampling ratio is small, i.e., for high-rate operation. In the latter approach, the fixed coefficients of the SRR can be efficiently implemented as a limited number of shifts and additions by employing the SOPOT representations. These SOPOT coefficients are obtained by the random search algorithm in [4], [13]. The multiplier-block (MB) technique [14] is also employed to further reduce the implementation complexity. Design results show that the complexity of the proposed allpass-based SRR is lower than that of the SRR using low-delay FIR filters with the same design specifications, for both DSP and multiplier-less implementations. Both approaches compare favorably with the conventional approach in terms of the number of general multipliers required and the system delay, as demonstrated by a design example for a multistandard receiver for the GSM, W-CDMA, CDMA2000, and Hiperlan/2 wireless interfaces. A reduction in system delay ranging from 10.4% to 15% is achieved over the linear-phase counterpart in [4], at the expense of a modest increase in arithmetic complexity. It should be noted that the proposed techniques for realizing the low-delay decimators are also applicable to conventional receivers.
The rest of this paper is organized as follows: Section II is devoted to the design and implementation of the proposed low-delay FIR and allpass-based SRR. Comparisons and detailed design examples for different communication standards are illustrated in Section III. Section IV describes other implementation issues of the SRC for the SRR. Finally, conclusions are drawn in Section V.
II. PROPOSED LOW-DELAY SRRs
In this section, the design and implementation of the proposed low-delay FIR and allpass-based digital IF architecture for SRRs are described. Fig. 2 shows the new digital IF architecture proposed in [4], [5]. In Fig. 2(b), the digitized IF signal from the high-speed ADC is first passed through the compensated cascaded integrator-comb (CIC) filter and is decimated by a factor of . Its output is then fed to the multistage decimators, which are realized using general low-pass anti-aliasing filters, denoted by LPF#1, LPF#2, and LPF#3 in Fig. 2(c). The number of low-pass anti-aliasing filters required depends on the maximum downsampling ratio of the receiver. Without loss of generality, we assume that our receiver consists of three stages so that it can support signal bandwidths ranging from the GSM to the Hiperlan/2 standards (i.e., downsampling ratios from 4 to 295.3849; the maximum downsampling ratio of the SRR is 512). The maximum downsampling ratio can be increased, to say 1024 and higher, by increasing the number of anti-aliasing filters and the decimation factor of the CIC filter. Unlike the conventional receivers in [1] [2] [3], the output of the multistage decimators is fed to the SRC, which is implemented using a Farrow-based VDF. Finally, the output of the VDF-based SRC is fed to a half-band filter (HBF) with fixed coefficients to reduce the residual interference. An advantage of this architecture is that it eliminates the need for the PFIR of the traditional receiver, which is usually a bottleneck in software radio applications for wideband signals. The overall downsampling ratio of the proposed SRR is given by (2.1) where , which is a positive power-of-two integer, is the downsampling ratio of the compensated CIC filter; , which is chosen to lie between 1 and 2, is the arbitrary downsampling ratio of the SRC; and is the number of the remaining 2-to-1 decimators to be selected.
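The factorization described for (2.1) can be illustrated with a short sketch. Since the equation itself is not reproduced above, the sketch assumes that the overall ratio is the product of a power-of-two part (shared between the CIC filter and the selected 2-to-1 stages) and an SRC factor in the range [1, 2). The function name and the returned pair are hypothetical and only serve the illustration.

```python
import math


def split_downsampling_ratio(ratio):
    """Split ratio into (k, m_src) with ratio = 2**k * m_src and 1 <= m_src < 2.

    Assumption: k is the total power-of-two exponent contributed by the CIC
    filter and the 2-to-1 stages; m_src is the arbitrary SRC ratio.
    """
    if ratio < 1.0:
        raise ValueError("downsampling ratio must be at least 1")
    # frexp returns ratio = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(ratio)
    return exponent - 1, 2.0 * mantissa


if __name__ == "__main__":
    # 4 and 295.3849 are the extreme ratios quoted in the text; 512 is the maximum.
    for target in (4.0, 295.3849, 512.0):
        k, m_src = split_downsampling_ratio(target)
        print(f"{target}: 2**{k} x {m_src:.6f}")
```

How the exponent is divided between the CIC decimation factor and the number of selected 2-to-1 stages is a separate design choice that this sketch does not address.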
In general, the VDF-based SRC is more complicated and involved to design and realize than the other digital filters in the SRR. Therefore, it is preferable to implement the SRC after the compensated CIC filter and the multistage decimators so that the operating rate of the SRC can be lowered. The system delay of the proposed SRR with given by (2.2) where is the group delay of the CIC filter;
is the group delay of the second-order CIC compensator to be described later in Section II-A; , , and are the group delays of the LPF#1, LPF#2, and LPF#3, respectively; is the group delay of the SRC as a function of ; is the group delay of the HBF. Note that if one of the decimation filters is not selected, the corresponding group delay should be zero. It can be seen that the system delay mainly depends on the group delay of the LPFs and HBF since they increase rapidly with the downsampling ratios of , and . As a result, if low-delay FIR or allpass filters are used to realize the multistage decimators and HBF, then the system delay can be greatly reduced. Next, let us go through the architecture of the proposed SRR in detail. The techniques to be described in Sections II-A-C are also applicable to traditional receivers [2] , [3] though our primary interest will be the architecture proposed in [4] , [5] .
A. Second-Order CIC Compensator
Here, the design and implementation of the second-order CIC compensator to compensate for the passband droop of the basic CIC filter are described. The basic CIC filter [15] is commonly employed when a large downsampling ratio is required, because of its reasonable performance and low hardware complexity. The transfer function of the CIC filter is given by (2.3) where ; is the number of CIC stages. One drawback of the CIC filter is the passband droop that limits the quality of the anti-aliasing filters. In [5], we proposed a second-order CIC compensator with the following transfer function:
where and are real-valued constants to be determined and . As shown in Fig. 3 (a), it is placed after the CIC filter. This compensator can also be viewed as the equalizer in the interpolated FIR filters [16] . Its frequency response, as can be seen from Fig. 4 , is periodic, which is designed to equalize the passband droop of the CIC filter. Given the frequency response of the CIC filter in (2.3), the constants and can be readily determined using the Parks-McClellan algorithm. To reduce the implementation complexity, the constants and are expressed as the following CSD or SOPOT representations [8] : where and ; and are positive integers and their values determine the dynamic range of the coefficients; is the number of terms used in the coefficient approximation. Using (2.5), the coefficient multiplications can be efficiently implemented as limited number of shifts and additions only. These SOPOT coefficients can be obtained by a number of methods [4] , [8] , [13] , [17] [18] [19] . In this paper, the random search algorithm [4] , [13] is employed to minimize the total number of SOPOT terms subject to the given specifications in the frequency domain. The resulting SOPOT coefficients are:
; . The frequency responses of the basic CIC filter, the CIC compensator, and the compensated CIC filter for and are shown in Fig. 4 . The worst case passband deviation and aliasing attenuation of the compensated CIC filter for and are 0.0085 and 112.34 dB, while those for the CIC filter are 0.0338 and 112.36 dB, respectively. Therefore, the CIC compensator improves the passband droop by a factor of four while maintaining a comparable aliasing attenuation. It also has a low coefficient dynamic range compared with the interpolated second-order polynomial (ISOP) filters in [2] . Using the noble identity [20] , the compensated CIC filter in Fig. 3 (a) can be implemented more efficiently as shown in Fig. 3(b) , and the structure of the basic CIC filter is shown in Fig. 5 . Next, we shall consider the design and implementation of the low-delay low-pass anti-aliasing filters in the multistage decimators.
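The passband droop that the compensator is meant to equalize can be seen from a direct evaluation of the CIC magnitude response. The sketch below assumes the standard CIC form, normalized to unit DC gain; the decimation factor, number of stages and passband edge are illustrative values and are not the ones used in the paper.

```python
import math


def cic_magnitude(omega, decimation, stages):
    """Magnitude of an N-stage CIC filter at frequency omega (rad/sample at the
    input rate), assuming H(z) = [(1 - z**-M) / (M * (1 - z**-1))]**N."""
    denom = decimation * math.sin(omega / 2.0)
    if abs(denom) < 1e-12:
        return 1.0  # limit value at omega = 0 (unit DC gain)
    return abs(math.sin(decimation * omega / 2.0) / denom) ** stages


if __name__ == "__main__":
    M, N = 8, 5                        # illustrative values only
    edge = 0.25 * math.pi / M          # hypothetical passband edge
    droop_db = -20.0 * math.log10(cic_magnitude(edge, M, N))
    print(f"droop at passband edge: {droop_db:.3f} dB")
```

The nulls of this response fall at multiples of 2*pi/M, which is where the aliasing bands are centered after decimation by M; the droop printed above is the quantity a compensator has to flatten.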
B. Multistage Decimators 1) Design:
Conventionally, the low-pass anti-aliasing filters are implemented using halfband filters [2] , [3] . In [4] , linearphase FIR low-pass anti-aliasing filters (LPFs): LPF#1, LPF#2, and LPF#3, as shown in Fig. 2(c) , are proposed to improve the performance of the SRR. Their coefficients are readily obtained using the Parks-McClellan algorithm and efficiently implemented using SOPOT coefficients and MB. As mentioned earlier, this approach usually yields longer system delay. In this paper, we propose to realize these multistage decimators using low-delay FIR and digital allpass filters in order to reduce the system delay of the SRR. First of all, let us consider the design of the allpass-based decimation filters with the following transfer function (2.6) where is the filter order; with , and 's are real-valued coefficients. Substituting into (2.6), we have , where , and . It can be seen that the allpass filter has a unit magnitude response and its phase response can be used to approximate a desired phase response. Here, it is used to realize the low-pass anti-aliasing filters in the multistage decimators as a parallel interconnection of two allpass sections, Fig. 6 (a), as follows:
Since fractional delays are, in general, not required in the multistage decimators, one of the allpass sections is chosen as a signal delay in order to reduce the implementation complexity. The desired phase response of the allpass filter for the low-pass filters is given by (2.8) where and are the passband and stopband edges of the th low-pass filter in the multistage decimators. It should be noted that a zero at is structurally imposed in , which is desirable to attenuate the aliasing components. A number of methods have been proposed for designing allpass filters [21] [22] [23] [24] . In this work, the SDP approach [9] is employed. This approach is able to design causal-stable digital allpass filters with a prescribed pole radius constraint and minimax design criterion. Additional linear constraints such as flatness or zero magnitude response at certain frequencies can be incorporated. Interested readers are referred to [9] for more details.
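The unit-magnitude property stated for the allpass function in (2.6) can be checked numerically. The sketch below assumes the usual mirror-image form, in which the numerator is the denominator polynomial with reversed coefficients; the coefficient values are arbitrary and are not taken from the designs in this paper.

```python
import cmath
import math


def allpass_response(a, omega):
    """A(e^{jw}) for A(z) = z**-N * D(1/z) / D(z), with D(z) = sum_n a[n] z**-n.

    Assumption: a holds real coefficients with a[0] = 1.
    """
    order = len(a) - 1
    z = cmath.exp(1j * omega)
    den = sum(a[n] * z ** (-n) for n in range(order + 1))
    num = sum(a[n] * z ** (n - order) for n in range(order + 1))
    return num / den


if __name__ == "__main__":
    coeffs = [1.0, 0.5, 0.2]  # arbitrary stable example (poles inside the unit circle)
    for k in range(5):
        w = k * math.pi / 4.0
        resp = allpass_response(coeffs, w)
        print(f"w = {w:.3f}: |A| = {abs(resp):.6f}, phase = {cmath.phase(resp):.4f}")
```

Because the magnitude is fixed at one, only the phase is available as a design freedom, which is why the design problem is posed as a phase approximation.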
For the design of low-delay FIR decimators, the SDP approach in [11] is also employed. This is because it is possible to incorporate the zeros at for the low-delay anti-aliasing filters. Traditionally, linearly constrained linear-phase FIR filters are designed using a linear programming approach [25] . To the authors' best knowledge, the optimal minimax design of nonlinear-phase FIR filters with linear constraints and convex quadratic constraints has not been reported. Next, we show that this problem can be solved readily using SDP. More precisely, the th low-pass filter in the multistage decimators of length to be designed is given by (2.9) where , with the following desired frequency response: (2.10) where is the corresponding group delay. This will reduce to the linear-phase case when . Let be the number of zeros to be imposed at for . This is equivalent to for (2.11) Expanding (2.11) and after slight manipulation, one gets a set of linear equality constraints as follows:
where , , and . Here, denotes the -th entry of matrix . This will be used to eliminate the redundant variables in the SDP method to be described later in this section. To minimize the maximum ripple of the approximation error is equivalent to the following: for and (2.13) where ; is a positive weighting function. To solve (2.13) using SDP, we densely discretize over the band of interest into a set of frequency points , . This yields subject to (2.14a) where ; ; ;
. Using the Schur complement [10] , it can be shown that (2.14a) is equivalent to subject to (2.14b) where , and means that matrix is positive semidefinite. Since is affine in , it is equivalent to a set of linear matrix inequalities (LMIs) [10] . In order to simultaneously solve the SDP problem in (2.14b) and the constraint in (2.12), the dependent variables can be expressed as a linear combination of independent variables. The number of variables to be optimized is therefore reduced. It not only speeds up the optimization process but also structurally imposes the desired constraint. where ; . Theoretically, it is possible to determine whether a feasible solution exists for the SDP problem, and if so, it is possible to determine the global optimal solution, since the problem is convex. Moreover, the SDP problem is very general in that other design criteria such as least squares, and least squares with peak error constraints can be employed, possibly with linear and convex quadratic constraints. Due to page limitations, their illustrations are omitted.
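The zero constraints in (2.11) are linear in the impulse response, which is what allows them to be combined with the LMIs. As a sketch of one standard way to write them (the exact matrix of (2.12) is not reproduced above, so this form is an assumption), a filter has K zeros at z = -1 exactly when the moments sum_n (-1)^n n^k h(n) vanish for k = 0, ..., K-1. The function names are hypothetical.

```python
def zero_constraint_rows(length, num_zeros):
    """Rows of a matrix C such that C @ h = 0 imposes num_zeros zeros at z = -1."""
    return [[((-1) ** n) * (n ** k) for n in range(length)]
            for k in range(num_zeros)]


def constraint_residuals(h, num_zeros):
    """Evaluate each linear constraint for the impulse response h."""
    rows = zero_constraint_rows(len(h), num_zeros)
    return [sum(c * x for c, x in zip(row, h)) for row in rows]


if __name__ == "__main__":
    # (1 + z**-1)**3 has three zeros at z = -1, so all three residuals vanish.
    print(constraint_residuals([1, 3, 3, 1], 3))
    # (1 + z**-1)**2 has only two, so the third constraint is violated.
    print(constraint_residuals([1, 2, 1], 3))
```

Solving these equalities for K of the coefficients in terms of the others is one way to eliminate the dependent variables before the optimization.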
2) Multiplier-Less Realization: As mentioned earlier, the fixed coefficients of the multistage decimators can be efficiently implemented without multiplications using SOPOT coefficients [8] and the MB technique [14], [26]. When applying the MB technique to the realization of digital infinite-impulse response (IIR) filters, Dempster and Macleod [26] reported that the cascade structure is in general more efficient. Therefore, the allpass filters in our decimators are implemented using a cascade of first- and second-order sections [20] as shown in Fig. 6. More precisely, let be the th root of , , in (2.6). For real-valued , the first-order section has the form for (2.18) and are the number of first- and second-order sections, respectively. The total number of sections in the allpass function is . For DSP implementation, the multiplications in the first- and second-order sections are implemented by a dedicated high-speed multiplier as shown in Fig. 6(b) and (c), respectively. For multiplier-less (hardware) implementation, the fixed coefficients of the first- and second-order sections can be represented using SOPOT coefficients. For the second-order section, the MB technique is employed to further reduce the complexity as shown in Fig. 6(d). The specifications and performances of the LPFs in the multistage decimators are summarized in Table I. The SOPOT coefficients of the designed LPFs are shown in Tables II-IV. The frequency responses and the corresponding group delays of the allpass-based LPFs are shown in Fig. 7. Fig. 8(a) shows the pole-zero plot of LPF#3. It can be seen that all poles of the filter are inside the unit circle. Due to page limitations, the pole-zero plots of LPF#1 and LPF#2 and the details of the low-delay FIR decimators are omitted. The multiplier-less realization of the latter follows closely the approach presented in [4]. Their performance comparison will be presented in Section III.
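To make the SOPOT representation of a fixed coefficient concrete, the following sketch approximates a real value by a small number of signed powers of two. It uses a simple greedy rule, which is not the random search algorithm of [4], [13] used in this paper; the function names and the exponent range are assumptions made for the illustration.

```python
import math


def sopot_approx(value, max_terms, min_exp=-12, max_exp=0):
    """Greedy signed power-of-two approximation; returns a list of (sign, exponent)."""
    terms = []
    residual = value
    for _ in range(max_terms):
        if residual == 0.0:
            break
        mag = abs(residual)
        e_lo = math.floor(math.log2(mag))
        # Consider the two neighbouring exponents, clipped to the allowed range.
        candidates = [max(min_exp, min(max_exp, e)) for e in (e_lo, e_lo + 1)]
        exp = min(candidates, key=lambda e: abs(mag - 2.0 ** e))
        sign = 1 if residual > 0 else -1
        if abs(residual - sign * 2.0 ** exp) >= mag:
            break  # another term would not reduce the error
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms


def sopot_value(terms):
    """Value represented by a list of (sign, exponent) terms."""
    return sum(sign * 2.0 ** exp for sign, exp in terms)


if __name__ == "__main__":
    terms = sopot_approx(0.625, max_terms=3)
    print(terms, sopot_value(terms))
```

Each term corresponds to one shift and one addition or subtraction in hardware, so the number of terms is a direct proxy for adder cost. A search that minimizes the total term count subject to the frequency-domain specification, as done in the paper, will generally do better than this per-coefficient greedy rule.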
Next, we shall consider the design and implementation of the sample rate converter for arbitrary sample rate conversion.
C. SRC
The design of programmable SRCs with arbitrary conversion factors was studied in detail by Ramstad [27] . In general, there are two approaches to implement a SRC with different tradeoff between the operating rate and the hardware complexity for SRRs. One is to employ the structure in Fig. 9(a) [28] where the input signal is first up-sampled by a factor of by inserting zeros between successive time samples. This creates images in the frequency domain, which are then removed by an -band interpolated filter with spectral support from to . If is sufficiently large, further interpolation with an irrational downsampling ratio can be achieved simply by a loworder interpolator such as Lagrange interpolation [29] , cubic spline [30] and a low-order fractional-delay digital filter (FDDF) [28] , etc. As an example, the cubic interpolator is able to provide rather accurate fractional delays up to about . After which, both the amplitude and phase responses deviate considerably from an ideal FDDF [13] , [31] . Therefore, an -band interpolated filter should be used to upsample the input signal so that it can be fitted into the operating range of the cubic interpolator. It is also required to remove the images created by the upsampler due to the limited stopband attenuation of the cubic interpolator. One drawback of employing this structure in the SRR is that the output of the multistage decimators, which is obtained by downsampling the high-rate IF signal from the ADC, has to be upsampled again by the -band filter. To overcome this problem, the functions of the -band filter and the low-order interpolator can be simultaneously implemented using a VDF [6] , [7] , [32] . A VDF is a digital filter whose frequency and/or phase responses can be controlled by a parameter . The ideal frequency response of the VDF-based SRC is given by (2.20) where is the group delay of the SRC. and are the passband and stopband edges of the SRC, respectively. 
In the passband, it behaves like a FDDF with a parameter to provide the required arbitrary fractional delays. In the stopband, it helps to attenuate the undesirable frequency components. More precisely, the impulse response of the VDF, , is approximated by an th-order polynomial in variable as follows:
The -transform of (2.21) is then given by Fig. 10 . It consists of a set of subfilters followed by the multiplications with the appropriate powers of the parameter . It computes the required delayed (fractional) samples of the signal components in the passband, while attenuating those in the stopband. For modest downsampling ratios, the VDF-based SRC in Fig. 9(b) is more efficient than the structure in Fig. 9 (a) because its coefficients can be jointly optimized to fulfill the given spectral and fractional-delay specifications. In the proposed SRR, the downsampling ratio of the SRC, , is chosen to lie between 1 and 2. Thus, the VDF-based SRC leads to a better performance without having to increase the sampling rate as in the -band filter approach [28] . As a result, the operating rate of the multistage decimators can be significantly lowered by a factor of , say 4 to 8 in our example. In general, will increase with the accuracy required. The VDF-based SRC can be designed using the WLS [4] , [6] , [7] 
1) Weighted Least Squares (WLS) Approach:
In the WLS approach, the following least-squares cost function is minimized: (2.24) where ; is a positive weighting function; is the spectral support over which is to be approximated, and is the tuning space, which is chosen to be . It can be shown that the matrix is symmetric and positive definite. Consequently, all its eigenvalues are real and positive, and the matrix is nonsingular [20, p. 55].
2) SDP Approach: The problem of designing the VDFs in the minimax sense can be formulated as (2.26) Densely discretizing the frequency variable and the control parameter over the spaces and into a set of points , , and , , we obtain the following equivalent problem of (2.26) subject to (2.27) for , and , where ; and . Using Schur complement [10] , the constraints in (2.27) can be rewritten in the following LMIs: (2.28) which is affine in the variable vector . Defining the augmented variable , the problem in (2.27) can be cast into the standard SDP problem in (2.17) .
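The Schur complement step used to obtain the LMIs can be written out for a single grid point. The notation below is an assumption, since the symbols of (2.27) and (2.28) are not reproduced above: $\alpha_R$ and $\alpha_I$ denote the real and imaginary parts of the weighted approximation error at one frequency and parameter sample, and $\delta$ is the bound on the squared error to be minimized.

```latex
\delta - \alpha_R^2 - \alpha_I^2 \ge 0
\quad\Longleftrightarrow\quad
\begin{bmatrix}
\delta   & \alpha_R & \alpha_I \\
\alpha_R & 1        & 0        \\
\alpha_I & 0        & 1
\end{bmatrix} \succeq 0 .
```

The equivalence holds because the lower-right $2 \times 2$ block is the identity, which is positive definite, and its Schur complement in the matrix is exactly $\delta - \alpha_R^2 - \alpha_I^2$. Since $\alpha_R$ and $\alpha_I$ are affine in the filter coefficients, every entry of the matrix is affine in the variables, so the constraint is an LMI.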
The WLS approach is attractive for its simplicity and short design time. Additional linear equality constraints can also be incorporated using the Lagrange multiplier method, by solving a quadratic programming problem with linear equality constraints. It is also known as the eigenfilter method. For the VDF-based SRC, the design time for the WLS approach on a PIII-866 MHz personal computer is 14 s for . For VDFs with a low order of interpolation, say , the WLS and SDP approaches yield similar performance [32]. However, when is increased and additional constraints are required, the SDP approach is more flexible and yields better results, at the expense of more design time. For example, when is increased to 7 with the same subfilter length, the worst-case stopband attenuation and the design time are respectively 93.2 dB and 28 s for the WLS approach, as compared to 98.1 dB and 43 min for the SDP approach (all the SDP designs are carried out using the LMI toolbox in MATLAB). Fig. 11 shows the corresponding frequency responses and the group delays of the VDF designed.
The multiplier-less realization of the VDF-based SRC was studied in [4] [5] [6]. In particular, all the subfilters in the Farrow structure are implemented in their transposed forms as shown in Fig. 10(b). By representing all these coefficients as SOPOT coefficients and employing the MB technique, the total number of additions can be kept to a minimum by reusing the intermediate results generated. As a result, the VDF-based SRC is free of variable multipliers except for the limited number of variable multipliers in the interpolation part of the Farrow structure. The specifications and the performances of the VDF-based SRC so obtained are summarized in Table I and their frequency responses are shown in Fig. 12. Due to page limitations, Table V only shows the SOPOT coefficients of the first subfilter . The results for the other subfilters and the real-valued coefficients are omitted.
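The interpolation part of the Farrow structure, where the only variable multiplications occur, amounts to evaluating a polynomial in the control parameter whose coefficients are the subfilter outputs. The sketch below shows this with Horner's rule. The two subfilters used in the example implement plain linear interpolation and are chosen only because the result is easy to check by hand; they are not the subfilters designed in this paper, and the function name is hypothetical.

```python
def farrow_output(window, subfilters, phi):
    """One output sample of a Farrow structure.

    window[n] holds x[k - n]; subfilters[m] holds the taps of the m-th subfilter,
    so that the overall impulse response is sum_m subfilters[m][n] * phi**m.
    """
    acc = 0.0
    # Horner's rule: one multiplication by phi per polynomial order.
    for taps in reversed(subfilters):
        branch = sum(c * s for c, s in zip(taps, window))
        acc = acc * phi + branch
    return acc


if __name__ == "__main__":
    # Linear interpolation: h(n, phi) = [1 - phi, phi].
    linear = [[1.0, 0.0], [-1.0, 1.0]]
    print(farrow_output([5.0, 4.0], linear, 0.25))
```

With polynomial order L, Horner's rule needs L multiplications by the control parameter per output sample, while the subfilters themselves have fixed coefficients and can be realized with shifts and additions.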
D. HBF
In this subsection, the design and implementation of the HBF shown in Fig. 2(b) are presented. In [4], the HBF is implemented using linear-phase FIR filters because this leads to more flexibility in choosing the cutoff frequency [4], [5]. Although a HBF has fewer nonzero coefficients than a general LPF, the difference in hardware complexity here is rather small when the coefficients are implemented as SOPOT coefficients and multiplier blocks. In order to reduce the system delay of the SRR, we propose to realize the HBF using low-delay FIR filters and the allpass filters in (2.6) with the following transfer function:
The desired phase response of the allpass filter is given by (2.30) where is the passband edge of the HBF. Note, due to the structural constraints of the HBF, the stopband edge is given by . For the low-delay FIR HBF, in (2.29) is replaced by a FIR function with the following desired response: (2.31) where is the filter length of the HBF. The design and multiplier-less realization of this HBF are also based on the methods described in Section II-B. Table I shows the specifications and the performances of the HBF, and its SOPOT coefficients are listed in Table VI . The frequency response and the group delays of the allpass-based HBF are shown in Fig. 7 . Fig. 8(b) shows its pole-zero plot.
III. DESIGN EXAMPLES
In this section, we demonstrate the application of the proposed low-delay SRR to support the GSM, W-CDMA, CDMA2000, and Hiperlan/2 standards. The hardware complexities and the performances of the SRR using the allpass-based and low-delay FIR filters for both DSP and multiplier-less (hardware) implementations are examined and compared. A comparison between the proposed SRR and traditional programmable receivers is also presented. First of all, let us assume that the digitized IF signal is sampled at 80 M samples per [35] . It also includes the configurations and computational complexities for both DSP and multiplier-less implementations of the SRR. It can be seen that the computational complexities of the allpass-based SRR for the four communication standards are lower than those of the SRR using low-delay FIR filters for both DSP and multiplier-less implementations, especially for wideband applications. In addition, the system delay of the low-delay SRR is (866.46, 60.08, 194.76, 11) samples lower for GSM, W-CDMA, CDMA2000 and Hiperlan/2, respectively (i.e., a reduction of 10.8%, 11.26%, 10.4% and 15%), as compared with the linear-phase counterpart, at the expense of a modest increase in hardware complexity. The target specifications of the SRR are 0.015 dB in passband deviation, 100 dB in stopband attenuation and 35 dB in fractional-delay error. By employing the random search algorithm [4], [13], the SOPOT coefficients of all the components, as shown in Tables I to VI, are obtained. Table VIII shows the passband deviations, stopband attenuations and group delay errors of the SRR using the allpass-based and low-delay FIR filters with both real-valued and SOPOT coefficients for different operating ranges of , i.e., cascading different components. It can be seen that the performances of the SRR using real-valued and SOPOT coefficients are similar.
As an illustration, the frequency responses of the SRR with , i.e., cascading the LPF#3, HBF and the VDF-based SRC with , using the SOPOT allpass-based and low-delay FIR filters are shown in Fig. 13. The frequency responses of the SRR using the allpass-based filters with SOPOT coefficients are shown in Fig. 14(a) and (b), respectively, for the following operating ranges: a) , i.e., cascading the HBF and the VDF-based SRC with ; b) , i.e., cascading the LPF#1, LPF#2, LPF#3, HBF and the VDF-based SRC. Since the proposed SRR is considerably different from the traditional programmable receiver, it is very difficult to make an exact comparison. In order to give the readers an idea of the potential benefits and hardware savings of the proposed SRR, a comparison with the programmable receiver proposed in [2] is considered below. The architecture in [2] consists of a CIC filter with , an ISOP sharpening filter, five modified HBFs (MHBFs) as the multistage decimators, and a PFIR. Since a SRC was not designed in [2], we assume that it is realized using the same VDF-based SRC that we have proposed in Section II-C, so that the SRCs of the two receivers have the same complexity. Furthermore, as the programmable receiver proposed in [2] is designed to be linear-phase, a SRR using the proposed technique but employing linear-phase FIR filters [4] is also included in the comparison. Table IX shows the hardware complexities of the two linear-phase receivers, excluding the VDF-based SRC. It can be seen that the major hardware cost of the architecture in [2] is the variable multipliers required in the PFIR. Although the multiplications can be time multiplexed using a high-speed multiplier, this will limit the maximum clock speed of the receiver for wideband applications, i.e., small downsampling ratios. In the proposed SRR, the PFIR is replaced by a HBF with fixed coefficients, which results in very low implementation complexity, thanks to the novel VDF-based SRC. Therefore, the number of variable multipliers can be drastically reduced. Note that the stopband attenuation of the linear-phase SRR is slightly lower than that in [2]. However, it considerably outperforms [2] in passband deviation and in the number of variable multipliers, as shown in Table IX.
Table X shows the hardware complexities of the proposed low-delay SRR, excluding the VDF-based SRC, using both allpass-based and low-delay FIR filters. It is observed that in order to reduce the system delay from (8012.77, 533.49, 1870.56, 71) to (7146.31, 473.41, 1675.8, 60) samples for GSM, W-CDMA, CDMA2000 and Hiperlan/2, respectively, the complexity is increased from 178 to 286 adders for the FIR realization. It can also be seen that for DSP implementation, the allpass-based SRR has a lower hardware cost (43 fewer multipliers and five fewer adders) than that using the low-delay FIR filters. For multiplier-less (hardware) implementation, it still requires 51 fewer adders than the FIR realization. Table XI shows the total number of multipliers and adders required to implement the whole low-delay SRR. Note that although the multiplier-less SRR requires three variable multipliers in the interpolation part of the VDF-based SRC shown in Fig. 10(a), this number is still much lower than that of the PFIR approach reported in [2]. Although not shown here due to page limitations, it is also possible to reduce the system delay in the traditional receiver by employing the techniques described in Section II-B. However, this receiver still requires a considerable number of variable multipliers in the PFIR filter.
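To put the delay and complexity figures quoted above on a common scale, the following short sketch computes the relative delay reduction per standard and the relative increase in adders for the FIR realization. It uses only the numbers stated in the text; the variable and function names are chosen here for illustration.

```python
# Figures quoted in the text: system delay in samples and adder counts
# for the FIR realization, before and after the delay reduction.
STANDARDS = ["GSM", "W-CDMA", "CDMA2000", "Hiperlan/2"]
DELAY_BEFORE = [8012.77, 533.49, 1870.56, 71.0]
DELAY_AFTER = [7146.31, 473.41, 1675.8, 60.0]
ADDERS_BEFORE, ADDERS_AFTER = 178, 286


def percent_change(old, new):
    """Signed relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Positive values mean the delay became shorter.
delay_reduction = {
    name: -percent_change(before, after)
    for name, before, after in zip(STANDARDS, DELAY_BEFORE, DELAY_AFTER)
}
adder_increase = percent_change(ADDERS_BEFORE, ADDERS_AFTER)

for name, reduction in delay_reduction.items():
    print(f"{name:11s} delay reduced by {reduction:5.2f} %")
print(f"adders increased by {adder_increase:5.2f} %")
```

With these figures the delay reduction is on the order of 10 to 16 percent per standard, obtained at the cost of roughly 61 percent more adders in the FIR realization.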
IV. OTHER IMPLEMENTATION ISSUES OF THE SRC
In this section, other implementation issues of the SRC for the SRR are presented. First of all, the flexible generation of the clocking signal for the sample rate conversion is critical in order to support multiple standards in SRRs. In particular, it requires the generation of clocking signals with different frequencies and high spectral purity. Although the sampling rate at the input of the SRR is fixed, the clock rate in the interpolation part of the VDF-based SRC has to be varied according to the required downsampling ratio of the receiver, which depends on the communication standard to be supported. These clocking signals can be generated by a direct digital frequency synthesizer (DDFS) [36]. The coordinate rotation digital computer (CORDIC)-based DDFS architecture proposed in [37] is particularly suitable for the SRC because of its high spectral purity and efficient multiplier-less realization. In the CORDIC-based DDFS, a digital sine wave with a given frequency is generated using a phase accumulator and a phase-to-amplitude converter based on the CORDIC algorithm. The DDFS is driven by a clock signal at a fixed frequency, which is considerably higher than the frequencies to be generated. At each time instant, the appropriate value of the sine wave is calculated using the CORDIC algorithm, which can be performed by a sequence of shift-and-add operations. This yields an efficient multiplier-less implementation of the DDFS with high phase resolution, high precision, and high spur-free dynamic range (SFDR). To generate the required clocking signal, the digital values of the reference sine wave generated by the DDFS are sent to a digital-to-analog converter (DAC) to produce a staircase-like analog approximation of the sine wave. After appropriate low-pass filtering, a comparator can be used to generate the desired binary clocking signal.
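The phase-accumulator and shift-and-add structure described above can be sketched as follows. This is a generic textbook rotation-mode CORDIC in fixed-point arithmetic, not the specific architecture of [37]; the word lengths (`FRAC_BITS`, `ITERATIONS`, `acc_bits`), the function names, and the idealized comparator (which skips the DAC and low-pass filter) are all assumptions made for illustration.

```python
import math

FRAC_BITS = 16          # assumed fixed-point fractional word length
ITERATIONS = 16         # assumed number of CORDIC micro-rotations
ONE = 1 << FRAC_BITS
# Elementary rotation angles atan(2^-i) in fixed-point radians.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]
# Pre-scaled start value that compensates the CORDIC gain.
GAIN = round(ONE * math.prod(1.0 / math.sqrt(1.0 + 4.0 ** -i)
                             for i in range(ITERATIONS)))


def cordic_sin(angle_fx):
    """Sine of a fixed-point angle in [-pi/2, pi/2] using shifts and adds."""
    x, y, z = GAIN, 0, angle_fx
    for i in range(ITERATIONS):
        if z >= 0:      # rotate counter-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:           # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y


def ddfs_samples(tuning_word, num_samples, acc_bits=24):
    """Sine samples with f_out = tuning_word * f_clk / 2**acc_bits."""
    mask = (1 << acc_bits) - 1
    phase, out = 0, []
    for _ in range(num_samples):
        # Phase-to-angle mapping; floating point is used only for readability.
        theta = 2.0 * math.pi * phase / (1 << acc_bits)
        # Fold the angle into the CORDIC convergence range [-pi/2, pi/2].
        if theta > 1.5 * math.pi:
            theta -= 2.0 * math.pi
        elif theta > 0.5 * math.pi:
            theta = math.pi - theta
        out.append(cordic_sin(round(theta * ONE)) / ONE)
        phase = (phase + tuning_word) & mask   # accumulator wraps modulo 2^N
    return out


def clock_from_sine(samples):
    """Idealized comparator: binary clock from the sign of the sine wave."""
    return [1 if s > 0 else 0 for s in samples]


sine = ddfs_samples(tuning_word=1 << 20, num_samples=32)  # period: 16 samples
clock = clock_from_sine(sine)
```

Changing `tuning_word` changes the output frequency without touching the fixed driving clock, which is the property the SRC relies on to support different downsampling ratios.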
Another implementation issue of the SRC is the calculation of the fractional-delay parameter for each output sample of the SRC. This requires the fractional-delay calculation unit shown in Fig. 15, which is based on the fractional part of , to calculate the required value for each output sample of the SRC. For example, as shown in Fig. 12(b), if , the fractional delay is 0.2 for the first output sample of the SRC. The corresponding fractional-delay parameter is equal to 0.3. Similarly, the second output sample has a fractional delay of 0.4 and is equal to 0.1, and so on. In general, let be the fractional-delay parameter at the th output sample of the SRC. The fractional-delay parameter can be computed from the fractional part of , Fig. 15, as follows: (4-1) where and denotes the fractional part of the value .
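The sketch below reproduces the two numeric examples in the preceding paragraph. It rests on two assumptions that are inferred only from the quoted values (fractional delay 0.2 gives parameter 0.3, fractional delay 0.4 gives parameter 0.1) and are not taken from (4-1): a downsampling ratio of 1.2, and the mapping "parameter = 0.5 minus fractional delay". The function name and the output index starting at 1 are likewise hypothetical.

```python
from fractions import Fraction


def fractional_delay_parameters(ratio, num_outputs):
    """Fractional-delay parameter for output samples n = 1..num_outputs.

    ASSUMPTION: the parameter is 0.5 minus the fractional part of n * ratio.
    This mapping is inferred from the two worked values in the text and is
    not a transcription of (4-1).
    """
    params = []
    for n in range(1, num_outputs + 1):
        position = n * ratio            # output instant in input-sample units
        fractional_delay = position % 1 # fractional part of the position
        params.append(Fraction(1, 2) - fractional_delay)
    return params


# ASSUMPTION: ratio 1.2, chosen so that the first two fractional delays are
# 0.2 and 0.4 as in the example. Exact rationals avoid rounding drift.
params = fractional_delay_parameters(Fraction(6, 5), 5)
print([float(p) for p in params])
```

Exact rational arithmetic is used here only to keep the illustration free of floating-point drift; a hardware unit would more likely keep the running fractional part in a fixed-point accumulator that wraps modulo one.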
V. CONCLUSION
The design and multiplier-less realization of a new SRR with reduced system delay is presented. It employs low-delay FIR and digital allpass filters to effectively reduce the system delay of the multistage decimators in SRRs. The optimal least-square and minimax designs of these low-delay FIR and allpass-based filters are formulated as an SDP problem, which allows zero magnitude constraint at to be incorporated readily as additional LMIs. By implementing the sampling rate conversion using a VDF immediately after the integer decimators, the need for an expensive programmable FIR filter in the traditional SRR is avoided. The designs of the VDF-based SRC using the WLS and SDP methods are formulated and compared. Other implementation issues, including the multiplier-less and DSP realizations of the various digital filters and the generation of the clock signal in the SRC, are also studied. Design results show that the proposed architecture considerably reduces the system delay and implementation complexity (especially in terms of high-speed variable multipliers) as compared with conventional approaches.
