Introduction
The aim of a digital filter is to provide a magnitude response with zero value at frequencies where the stopband is planned and a unit value at frequencies where the input signal is intended to pass through the system. Numerous filter design methods are based on an approximation of the ideal magnitude, without considering the phase characteristic at all. However, in many practical filter applications, a linear phase is the most important feature of a digital filter. For example, in telecommunication systems, preserving the signal shape is a primary goal.
Filter design methods based on the equiripple magnitude error in all bands produce an elliptic filter as a result. In this paper, we consider two different structures for the realization of filters which simultaneously provide elliptic magnitude and approximately linear phase characteristics. The traditional structure is based on a selective filter in cascade with an appropriate group delay corrector (EC), while the alternative realization consists of two all-pass filters in parallel (PA).
The PA filters are characterized by exceptionally low coefficient sensitivity in pass-bands [1]. For a long time, papers have been dealing with methods to improve the design of standard filters (low-pass, high-pass, etc.). Nowadays, the PA structure is capable of realizing notch and comb filters [2], [3], differentiators [4], [5], Hilbert transformers [6], [7], etc. PA filters also represent convenient building blocks for the realization of filter banks [8], [9], which play a very important role in signal compression. The PA half-band recursive filters are a standard structure used to realize efficient down-sampling or up-sampling filtering tasks. The task performed by a filter bank is a combination of the common operations of spectral translation, bandwidth reduction and sample rate changes [10], [11].
During the last decade, software defined radio (SDR) has become a popular platform for the realization of digital communication systems. For a high performance SDR system, field programmable gate arrays (FPGAs) are commonly used as key components. FPGAs show high efficiency for digital signal processing applications because they are suitable for the implementation of fully parallel algorithms [12]. In the SDR receiver, the most demanding component is the channelizer, which operates at the maximum sampling rate. High-speed finite impulse response (FIR) filters with linear phase are commonly applied in the channelizer [13]. An alternative solution could be an approximately linear phase PA filter of significantly lower order.
For real-time applications, a lower group delay of the filter is a crucial characteristic, regardless of the type of realization (software or hardware). Different structures for the realization of the filter [14] will require a different number of adders, multipliers and delay elements, which affect power consumption and the occupied silicon area on the chip. This paper is a further step in PA filter research with the intention of analyzing whether the PA filter realization could match the traditional solution.
The Traditional Approach to Obtaining an Approximately Linear Phase Filter
The traditional approach starts with the design of an elliptic filter which satisfies the given magnitude specifications. The problem of the nonlinear phase of the elliptic filter can be solved by introducing a phase or group delay corrector of the order $N_c$ in cascade. In Fig. 1, $H_{N_e}(z)$ denotes the transfer function of the elliptic filter of the order $N_e$. The EC filters obtained from Fig. 1 satisfy both the predefined magnitude and phase specifications simultaneously. An approximately linear phase is achieved with a corrector of the order $N_c$, which is often significantly higher than the order $N_e$ of the elliptic filter.
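The reason a cascaded corrector can reshape the group delay without touching the magnitude can be checked numerically. The following is a minimal sketch in plain Python, not the design used in the paper: the first-order low-pass section, the first-order all-pass corrector and both coefficient values are hypothetical stand-ins for the elliptic filter and its corrector.

```python
import cmath
import math

def lowpass(z, a=0.6):
    # Hypothetical first-order IIR section standing in for the elliptic filter.
    return (1 - a) / (1 - a / z)

def corrector(z, c=-0.4):
    # First-order all-pass section A(z) = (c + z^-1) / (1 + c z^-1), |A| = 1 on the unit circle.
    return (c + 1 / z) / (1 + c / z)

def cascade(z):
    # EC-style structure: selective filter followed by the all-pass corrector.
    return lowpass(z) * corrector(z)

def group_delay(h, w, dw=1e-5):
    # Numerical group delay: minus the derivative of the phase with respect to frequency.
    ratio = h(cmath.exp(1j * (w + dw))) / h(cmath.exp(1j * (w - dw)))
    return -cmath.phase(ratio) / (2 * dw)

w = 0.3 * math.pi
z = cmath.exp(1j * w)
print(abs(lowpass(z)), abs(cascade(z)))                      # magnitudes coincide
print(group_delay(lowpass, w) + group_delay(corrector, w),   # group delays add up
      group_delay(cascade, w))
```

The cascade keeps the magnitude of the selective filter and adds the group delay of the corrector to it, which is why the total delay of an EC filter grows with the corrector order.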
Selective Filters Obtained with the Parallel Connection of Two All-pass Filters (PA)
The PA filter configuration is given in Fig. 2. The obtained filters are doubly-complementary [8] and ideal candidates for application in a multirate filter bank [15].
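This structure is especially useful when both complementary filters need to be implemented. The particular configuration allows the second (complementary) filter to be obtained at the price of only one additional adder. The transfer functions of the complementary filters are given by
$$H_p(z) = \frac{1}{2}\left[z^{-N_d} + (-1)^p H_{N_a}(z)\right], \qquad (1)$$
where the parameter $p$ has the value $p = 0$ for the low-pass and $p = 1$ for the high-pass filter. The order of the IIR all-pass filter is denoted by $N_a$, while $N_d$ is the order of the delay line in the other parallel branch. For the design of a low-pass/high-pass filter pair, the condition $N_a = N_d + 1$ needs to be fulfilled. To obtain a pass-band/stop-band filter [...] On the unit circle $z = e^{j\omega}$, with $\varphi_{N_a}(\omega)$ denoting the phase of the all-pass filter $H_{N_a}(z)$, $H_p(z)$ can be expressed as
$$H_p(e^{j\omega}) = \frac{1}{2}\left[e^{-jN_d\omega} + (-1)^p e^{j\varphi_{N_a}(\omega)}\right]. \qquad (2)$$
The magnitude response is given by
$$\left|H_p(e^{j\omega})\right| = \left|\cos\frac{\varphi_{N_a}(\omega) + N_d\omega + p\pi}{2}\right|. \qquad (3)$$
The phase response is given by
$$\varphi_p(\omega) = \frac{\varphi_{N_a}(\omega) - N_d\omega + p\pi}{2}. \qquad (4)$$
The corresponding group delay of the PA filter is
$$\tau(\omega) = \frac{\tau_{N_a}(\omega) + N_d}{2}, \qquad (5)$$
where $\tau_{N_a}(\omega)$ is the group delay of the IIR all-pass filter $H_{N_a}(z)$ and $N_d$ is the constant group delay which corresponds to the delay line with an ideal linear phase $-N_d\omega$.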
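The doubly-complementary behaviour of the parallel structure can be illustrated with a short numerical sketch. It assumes that each output is half the sum or difference of the delay-line branch and the all-pass branch, with the sign change placed on the all-pass branch; the second-order all-pass coefficients below are hypothetical and chosen only so that $N_a = N_d + 1$ holds.

```python
import cmath
import math

def allpass(coeffs, z):
    # Real all-pass A(z) = z^-N * D(1/z) / D(z), with D(z) = sum_k coeffs[k] * z^-k,
    # coeffs[0] = 1 and N = len(coeffs) - 1.
    n = len(coeffs) - 1
    den = sum(c * z ** (-k) for k, c in enumerate(coeffs))
    num = sum(c * z ** (-(n - k)) for k, c in enumerate(coeffs))
    return num / den

def pa_pair(coeffs, n_d, w):
    # Returns (H_0, H_1) at frequency w: delay line z^-n_d in parallel with the all-pass.
    # Placing the sign (-1)^p on the all-pass branch is an assumption of this sketch.
    z = cmath.exp(1j * w)
    delay = z ** (-n_d)
    a = allpass(coeffs, z)
    return 0.5 * (delay + a), 0.5 * (delay - a)

coeffs = [1.0, 0.0, 0.5]   # hypothetical all-pass of order N_a = 2
n_d = 1                    # delay line order, so that N_a = N_d + 1

for w in (0.0, 0.25 * math.pi, 0.5 * math.pi, math.pi):
    h0, h1 = pa_pair(coeffs, n_d, w)
    # Columns: |H_0|, |H_1| and the power sum |H_0|^2 + |H_1|^2.
    print(round(abs(h0), 4), round(abs(h1), 4), round(abs(h0) ** 2 + abs(h1) ** 2, 12))
```

Both outputs share the same two branches, so the complementary filter costs one extra adder, and the power sum stays at one for every frequency because both branches have unit magnitude.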
The magnitude response of the PA filter depends only on the all-pass sub-filter phase difference (3). Therefore, it is natural to define the problem of the design of a selective filter as a phase approximation problem [16]. In all the following examples, the equiripple approximation approach to the linear phase is adopted. Filter design can also be done by using other approximation methods, for instance, via a flat delay filter [17], [18].
As a consequence, according to (3), PA filters have elliptic-like magnitude characteristics. The magnitude error extrema are located at the same frequencies as the extrema of the IIR all-pass phase error curve. Therefore, the obtained filters are compared with adequate standard elliptic filters with appropriate group delay correctors. The EC filters not only fulfill the same magnitude specifications as the PA filters, but also provide a similar group delay error. According to (4), the PA filter can provide an arbitrary phase shape [19] if the delay line is substituted with an IIR all-pass filter of the same order. In that case, both all-pass filter phases approximate the same predefined ideal phase of the chosen shape. At frequencies where the phase difference is close to $2k\pi$ radians, the pass-bands are obtained for $p = 0$, while at regions where the phase difference is close to $(2k + 1)\pi$ radians, the stop-bands are realized. In practice, in many filter applications, the linear phase is a desirable feature of crucial importance. That is the reason why many papers still deal with IIR linear phase filter design [20], [21]. FIR filters can provide the ideal linear phase, but to satisfy the given magnitude specifications the order of the filter sometimes needs to be very high, which could limit their application. The FIR filter is a typical solution in applications which do not tolerate phase distortions (digital communications, audio signal processing, etc.). New techniques for FIR filter design are still emerging [22], [23], but the ideal linear phase is achieved at the cost of a very high filter order. On the other hand, it is very difficult to compare the FIR and the IIR solutions, so we focus on designing PA filters which we intend to compare with a standard solution consisting of an elliptic filter with a cascaded group delay corrector.
To achieve an approximately linear total phase, the PA filters have a pure delay in parallel with the IIR all-pass filter, as given in Fig. 2. The design of the PA filter reduces to the determination of an IIR all-pass filter with an adequate phase. It is important to point out two facts:
First, in addition to the standard realization structures (the parallel or cascade connection of first- and second-order sections), the all-pass filters could be realized by a multiplier extraction method, i.e. with the least possible number of multipliers [24], or even as a multiplierless IIR filter [25]. The reduced number of multipliers certainly leads to lower power consumption. In all the given examples, in order to achieve minimal consumption, all all-pass filters (the IIR all-pass filter from the PA structure and the correctors from the EC structure) will be realized with a minimal number of multipliers.
Second, there is the fact that in practice half-band filters are widely used for bringing efficiency to multi-rate applications. The half-band filter poles have symmetry and every other coefficient as a consequence is equal to zero. Hence, it is natural to expect that the half-band filter has reduced power consumption compared to a filter of the same order without magnitude characteristic symmetry.
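The link between pole symmetry and vanishing coefficients can be seen by expanding one group of four symmetric poles. The sketch below assumes a pole quadruple {p, p*, -p, -p*}; the pole radius and angle are hypothetical values used only for illustration.

```python
import math

def poly_mul(p, q):
    # Product of two polynomials in z^-1 given as coefficient lists.
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def second_order(r, theta):
    # Denominator 1 - 2 r cos(theta) z^-1 + r^2 z^-2 of a complex-conjugate pole pair.
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def fourth_order_section(r, theta):
    # Pole pair at angle theta combined with its mirror image at angle pi - theta.
    return poly_mul(second_order(r, theta), second_order(r, math.pi - theta))

section = fourth_order_section(0.8, 1.1)   # hypothetical pole radius and angle
print([round(c, 12) for c in section])     # coefficients of z^-1 and z^-3 vanish
```

Only the coefficients of z^-2 and z^-4 survive, so a fourth-order section built this way needs two multipliers instead of four.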
The Design of an Approximately Linear Phase IIR All-pass Filter
To obtain a low-pass and complementary high-pass filter realized by the parallel structure shown in Fig. 2, it is necessary to solve the system of equations (6),
where $\omega_k$ are the frequencies at which the phase error curve has maxima and minima, $\varepsilon_1$ and $\varepsilon_2$ are the maximal allowed phase errors, and the parameters $m_1$ and $m_2$ are the numbers of phase error extrema in the pass-band and stop-band, respectively. For that purpose, we have applied the algorithm for the design of an all-pass filter with a quadratic phase $\varphi(\omega) = a\omega^2 + b\omega + c$, explained in detail in [19], with $a = 0$. The parameter $r$ in (6) defines the nature of the first phase error extremum. For $r = 0$, the first extremum is a minimum, which corresponds to an all-pass filter of odd order with a real pole at the frequency $\omega = 0$. In all other cases, the value of $r$ is equal to 1. The phases of the low-pass and the corresponding high-pass PA filter are displayed in Fig. 3. In the pass-band, the phase difference is close to zero, while in the stop-band the phase of the IIR all-pass filter deviates from the delay line phase by $\pi$ radians.
The pass-band and stop-band PA filters are designed by solving the corresponding system of equations (7).
The total number of the phase error curve extrema $m_1 + m_2 + m_3$ is equal to $N_a$, which represents the all-pass filter order. Since the focus of the paper is on hardware complexity and power consumption analysis, we will avoid a detailed explanation of the design procedure. For more details, the reader should see [2] and [19].
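The pass-band and stop-band behaviour of the phase difference can be observed on a small example. This is a sketch only: it evaluates the phase of a given all-pass filter rather than designing one, and the second-order coefficients and the delay order are hypothetical, so the phase error is far larger than in an equiripple design.

```python
import math

def allpass_phase(coeffs, w):
    # Phase of A(z) = z^-N * D(1/z) / D(z) on the unit circle, D(z) = sum_k coeffs[k] * z^-k:
    # phi(w) = -N w + 2 atan2(sum_k c_k sin(k w), sum_k c_k cos(k w)).
    # atan2 returns the principal value, which is adequate for this low-order example.
    n = len(coeffs) - 1
    s = sum(c * math.sin(k * w) for k, c in enumerate(coeffs))
    c_sum = sum(c * math.cos(k * w) for k, c in enumerate(coeffs))
    return -n * w + 2.0 * math.atan2(s, c_sum)

def phase_difference(coeffs, n_d, w):
    # Difference between the all-pass phase and the delay-line phase -n_d * w.
    return allpass_phase(coeffs, w) + n_d * w

coeffs = [1.0, 0.0, 0.5]   # hypothetical all-pass of order 2
n_d = 1                    # hypothetical delay line order

for w in (0.1 * math.pi, 0.9 * math.pi):
    print(round(w / math.pi, 2), round(phase_difference(coeffs, n_d, w), 4))
```

Near zero frequency the difference stays close to zero, and near half the sampling frequency it stays close to minus pi, which is the pattern that separates the pass-band from the stop-band.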
Structures for Filter Realization
In general, cascade realization is based on first- and second-order sections to ensure that all coefficients are real. The numbers of multipliers, adders and delay elements are the same as in the canonic direct form realization. The multipliers are responsible for the major part of the digital filter power consumption. The all-pass filters are realized with a minimal number of multipliers in order to minimize power consumption and the chip area.
All considered elliptic filters are realized by implementing cascaded first- and second-order sections, except in the case of half-band filters. The obtained results confirmed that in the half-band filter case, no gain can be achieved by implementing a method for the extraction of a minimal number of multipliers [24]. The half-band filter transfer function already has a minimum number of non-zero coefficients. In order to get a structure for hardware realization which contains a minimal number of multipliers, second- and fourth-order sections are formed, as given in Fig. 5 and Fig. 6. Moreover, the used approach also gives optimal results for multi-band elliptic filters with a symmetrical magnitude characteristic (Filter 3 in Tab. 1). The half-band filter poles and zeros show symmetry. When four of them are gathered in a fourth-order section (Fig. 4), only two out of the four coefficients are nonzero. The phase and group delay correctors are also all-pass filters and could be realized with a minimal number of multipliers, in the same manner as the PA filters.
Regardless of the given filter specification, the PA filter proves to be a better choice if the minimum number of adders, delays and especially multipliers is the primary goal. Note that the results given in Tab. 1 are obtained by applying the method described in [24], by using the Type 1B first-order and Type 2A second-order sections shown in Fig. 7 and Fig. 8, respectively. They were selected among other configurations due to the minimal number of delay elements and adders. These sections (Type 1B and Type 2A) are applied for the realization of all group delay correctors in the EC structure and IIR all-pass filters in the PA filter structure, except for the half-band filters (Filter 1 and Filter 2). Approximately linear phase PA filters contain only delay elements in one of the parallel branches (without adders or multipliers). This fact has a fundamental impact on the total hardware and power consumption. Besides the previously mentioned advantages, the obtained PA filter group delay is significantly lower compared to its EC filter counterpart.
Nonlinear phase filters realized as a parallel connection of two all-pass IIR filters also show benefits compared to the elliptic filters with corresponding correctors, despite the increased number of adders and multipliers compared to the linear phase filters. The analysis of the quadratic phase filter hardware complexity is described in paper [26] .
Evaluation
To obtain the all-pass filter transfer functions of the order $N_a$, we adopted a filter design algorithm based on phase approximation [16]. In the other parallel branch, a delay line of the order $N_d$ is positioned (Fig. 2). The value $N_a - N_d + 1$ corresponds to the number of filter bands. In all the presented examples, a minimal stop-band attenuation $a_{min} = 55$ dB is chosen, as displayed in Fig. 9. The displayed characteristics correspond to the first example, named Filter 1. For the PA filter, the dependence between the minimum stop-band attenuation $a_{min}$ and the maximum pass-band attenuation $a_{max}$ of the complementary filter is straightforward, since the two filters are power-complementary:
$$a_{max} = -10\log_{10}\left(1 - 10^{-a_{min}/10}\right).$$
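For the chosen 55 dB stop-band attenuation, the pass-band attenuation of the complementary filter can be evaluated directly. The sketch below assumes only that the two outputs are power-complementary, so that their squared magnitudes sum to one; the function name is illustrative.

```python
import math

def complementary_passband_attenuation(a_min_db):
    # With |H_0|^2 + |H_1|^2 = 1, a stop-band attenuation a_min of one filter fixes
    # the largest pass-band attenuation of the other filter at the same frequencies.
    stopband_power = 10.0 ** (-a_min_db / 10.0)
    return -10.0 * math.log10(1.0 - stopband_power)

a_max_db = complementary_passband_attenuation(55.0)
print(a_max_db)   # on the order of 1e-5 dB
```

A stop-band attenuation of 55 dB therefore leaves a pass-band ripple in the complementary filter that is negligible for practical purposes.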
Because of the very small maximal allowed phase error and its equiripple nature, the group delay error is almost equiripple in the pass-band. The group delay error is only slightly increased near the pass-band and stop-band boundary frequencies. The phase of the delay line is ideally linear at all frequencies. The IIR all-pass filter phase approximates the ideal linear phase in all bands, with a $\pi$ rad phase jump in every transition zone between an adjacent pass-band and stop-band, as displayed in Fig. 3. The standard elliptic filter of the order $N_e$, considerably lower than $N_a$, achieves the same magnitude specifications, as shown in Fig. 9. The group delay corrector of the order $N_c$ is introduced to provide the same maximum group delay error as the PA filter.
Among the numerous filter designs, we experimented with 4 typical designs which are chosen to exhibit the compiled results. The order of the filters, the number of elements for hardware realization and the value of the group delay are listed in Tab. 1, for all examples.
The first two examples, Filter 1 and Filter 2, are half-band filters. Because of the symmetry of the magnitude response, the group delay correctors in the EC configuration are of the same order (Tab. 1). In order to obtain a PA half-band filter, the numbers of phase error extrema in the pass-band, $m_1$, and in the stop-band, $m_2$, have to be the same, i.e. $m_1 = m_2 = N_a/2$. The group delays of the PA filters and the corresponding EC filters are displayed in the following figures.
Filter 1 is a PA filter realized with a parallel connection of a delay line of the order 25 and an all-pass filter of the order 26, where $m_1 = m_2 = 13$. As a consequence, the magnitude response of the PA filter has 13 frequency points at the attenuation value of 55 dB (Fig. 9). The transition zone boundary frequencies are $0.46\pi$ and $0.54\pi$. The same magnitude restrictions are achieved with an elliptic filter of order 10. To obtain the same group delay error as the PA filter, the elliptic filter requires a corrector of the order 16 (Tab. 1). The analysis has revealed that the hardware realization of the PA filter demands 51 delay elements, 28 multipliers and 28 adders. To realize a low-pass and high-pass EC filter, it is necessary to use 52 delay elements, 74 multipliers and 104 adders.
Tab. 1. The number of hardware components for parallel all-pass filters (PA) and elliptic filters with correctors (EC).
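The delay element counts quoted for Filter 1 can be cross-checked against the filter orders. This is a minimal sketch under two assumptions: every block uses as many delay elements as its order (as in a canonic realization), and the low-pass and high-pass EC filters are two separate cascades. The multiplier and adder counts depend on the chosen section types and are not modelled here.

```python
def pa_delay_elements(n_a, n_d):
    # One all-pass branch of order n_a plus one delay line of order n_d,
    # shared by the low-pass and the high-pass output.
    return n_a + n_d

def ec_pair_delay_elements(n_e, n_c_low, n_c_high):
    # Two separate cascades: elliptic filter plus corrector for each of the two filters.
    return (n_e + n_c_low) + (n_e + n_c_high)

print(pa_delay_elements(26, 25))            # 51
print(ec_pair_delay_elements(10, 16, 16))   # 52
```

Under these assumptions the counts agree with the 51 and 52 delay elements reported for Filter 1.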
Filter 2 is also a half-band filter. The PA filter consists of a delay line of the order 17 in parallel connection with an all-pass filter of the order 18. As in the case of Filter 1, the attenuation in the stop-band is 55 dB. The lower order of the all-pass filter is the reason that the transition zones are wider, as can be noticed in Fig. 10 and Fig. 11. The transition zone boundary frequencies are now $0.44\pi$ and $0.56\pi$.
Both filters realized with a PA structure have exactly the same group delay regardless of the filter specifications. The half-band filters obtained with an EC structure also have the same group delay level (Filter 1 and Filter 2), as shown in Fig. 10 and Fig. 11. The EC multi-band filters and filters without symmetry about the frequency $F_{sampling}/4$ have different group delay levels in the bands, as displayed in Fig. 12 and Fig. 13. It can be noticed that in all cases the PA filters have a significantly lower group delay compared to their EC counterparts. Note that the values given in Tab. 1 are determined so that the PA and corresponding EC filters fulfill the same magnitude specifications. The order of the corrector in the EC structure is chosen to provide, as closely as possible, the same group delay error already achieved with the PA filter. In all the given examples, the group delay error is about 0.05 samples (Fig. 10). Similar results were obtained during the analysis of the performance of Filter 4. The group delay of the PA filters has a value of 25.5 samples. The elliptic filters from the EC solution are of the order 9. The low-pass filter corrector is of the order 10 and the high-pass filter corrector is of the order 26. The low-pass and high-pass EC filters have a group delay of 39.99 and 35.79 samples, respectively. Lower-order correctors, of the orders 8 and 24, provide a maximal group delay error of 0.2 and 0.12 samples, respectively. The group delay is still higher than in the PA solution (33.6 and 33.2 compared to 25.5 samples), with similar power consumption compared to the PA filters.
FPGA Implementation
Both coupled all-pass and elliptic filters with phase correctors were described in the VHDL hardware description language and implemented in a state-of-the-art FPGA device from the Xilinx Virtex-6 family (XC6VLX75T). For all hardware implementations, we adopted the same fixed-point arithmetic, representing each coefficient with 32 bits (8 bits for the integer part and 24 bits for the fractional part). The placement and routing criteria used during the implementation process were balanced between speed and power consumption. The implementation results are summarized in Figs. 14 and 15, which illustrate the FPGA resource requirements and power consumption, respectively.
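The effect of the 32-bit coefficient format can be sketched in software. The example assumes a signed two's complement word in which the sign bit is counted among the 8 integer bits, with rounding to the nearest level and saturation on overflow; these details, and the function names, are assumptions of the sketch rather than a description of the VHDL implementation.

```python
FRAC_BITS = 24    # bits of the fractional part
TOTAL_BITS = 32   # full coefficient word length

def to_fixed(x):
    # Scale by 2^24, round to the nearest integer and saturate to the signed 32-bit range.
    scaled = round(x * (1 << FRAC_BITS))
    lowest = -(1 << (TOTAL_BITS - 1))
    highest = (1 << (TOTAL_BITS - 1)) - 1
    return max(lowest, min(highest, scaled))

def to_float(q):
    # Convert the stored integer back to the real value it represents.
    return q / (1 << FRAC_BITS)

coefficient = 0.1   # hypothetical filter coefficient
quantized = to_float(to_fixed(coefficient))
print(quantized, abs(quantized - coefficient))
```

With 24 fractional bits, the rounding error of a coefficient inside the representable range is bounded by half of the least significant bit, that is 2^-25.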
It can be seen from Fig. 14 that PA filters require significantly fewer FPGA resources to be implemented, which is in accordance with the results obtained during the filter analysis. Given that both filter topologies use cascaded sections containing arithmetic circuits such as adders and multipliers, it was expected in theory and confirmed in practice that both filter implementations use a significant number of DSP blocks. FPGA devices of the Virtex-6 family contain DSP48E1 48-bit DSP slices [27] .
In addition to occupying less silicon area, PA filters are significantly less power-hungry, as illustrated in Fig. 15 . As evidenced, the majority of power in all implementations is consumed by the filter logic as well as by the interconnections between the logic blocks (signals). Right after these two parts, the DSP blocks stand as a third major consumer of power.
Finally, a filter implementation CAD tool reported that the PA filters are significantly faster in signal filtering, as they are able to accept a new sample of the input signal more frequently compared to the EC implementations, as evidenced in Tab. 2. Everything described above leads to the conclusion that, from a hardware point of view, PA filters are without compromise a better choice compared to their EC counterparts. There will be no trade-offs (no need to sacrifice implementation area to improve the filtering speed), only pure benefits in increased filtering speed and decreased implementation area as well as power consumption.
Conclusion
For real-time applications, lower group delay and reduced hardware complexity are the most important characteristics of the implemented filter. We performed a comprehensive analysis of the hardware complexity and power consumption of a PA filter and an adequate EC counterpart. The analysis revealed that the PA filters are an optimal choice compared to EC filters in applications where both complementary selective filters are of importance. All considered PA and EC filters meet the same magnitude specifications. The orders of the group delay correctors in the EC filters are chosen to achieve a group delay error almost equal to that of the PA solution. It was found that PA filters are capable of operating at higher frequencies compared to equivalent EC filters. The obtained results confirmed that in the half-band filter case, no gain is achieved by implementing the method for the extraction of a minimal number of multipliers. Moreover, the PA filter introduces a significantly lower group delay. Those properties make PA filters superior to their EC counterparts and more suitable for both hardware and software implementation in many different domains, especially in telecommunications, where the need to separate noise from the signal is of utmost importance.
