Abstract - Software-Defined Radio (SDR) technology is evolving rapidly, offering higher flexibility for wireless communication networks. For the sake of performance and power consumption, filtering is commonly implemented in hardware using FPGAs. Pulse shaping in the transmitter and the corresponding matched filtering in the receiver, which together satisfy the Nyquist intersymbol interference (ISI) criterion, are no exception. To decrease the FPGA resources used by filters, to increase speed and to decrease power consumption, the filter coefficients can be optimized by expressing them in canonical signed digit (CSD) form, using as few arithmetic operations per filter as possible while maintaining acceptable filter characteristics. In this paper a new method to decrease the number of nonzero signed digits is presented. With this method a reduction of up to 65% of the nonzero signed digits per filter is realized, while the ISI ratio is decreased as well.
I. INTRODUCTION
Software-Defined Radio (SDR) is a rapidly evolving technology which enables the flexibility required in modern-day wireless communication networks by shifting functionality from hardware to far more dynamic software. A generic hardware platform provides the minimal analog functions and a well-chosen set of hardware accelerators to offload computationally intensive tasks from the processor. Filters are among these accelerators; their important specifications are size, power consumption and delay. Pulse shaping in the transmitter and the corresponding matched filtering in the receiver, which together satisfy the Nyquist intersymbol interference (ISI) criterion, are no exception. In order to decrease the number of operations needed to realize these filters, their coefficients are optimized by expressing them in canonical signed digit (CSD) format. As a CSD digit can take 3 values (+1, 0 and -1), more information can be represented using fewer nonzero digits compared to binary, without increasing the hardware complexity, as an addition and a subtraction are basically the same operation. If K is the total number of nonzero digits in the CSD representation of a coefficient, then K shifters and K-1 adders per coefficient are needed. While the CSD representation of the coefficients can considerably simplify the filter implementation, additional approximation of the coefficients can be beneficial. On the other hand, decreasing the number of nonzero signed digits per coefficient increases the quantization error, which results in suboptimal filters. Thus we want to decrease the overall number of adders needed to realize the filter while preserving the desired pulse shaping filter characteristics, such as the ISI level after matched filtering and the ratio of peak ripple to the average level in the pass band, and while limiting the error with respect to the ideal filter.
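As an illustration of the digit savings, the following minimal Python sketch converts an integer (a coefficient scaled to a fixed word length) to CSD form and counts its nonzero digits. The function names `to_csd` and `csd_nonzero` and the example value 231 are illustrative assumptions and are not taken from the filters designed in this paper.

```python
def to_csd(value: int) -> list[int]:
    """Return the CSD digits (+1, 0, -1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
        else:
            # Choose +1 if value = 1 (mod 4) and -1 if value = 3 (mod 4),
            # so that the next digit is guaranteed to be zero.
            d = 2 - (value % 4)
            digits.append(d)
            value -= d
        value //= 2
    return digits


def csd_nonzero(value: int) -> int:
    """Number of nonzero signed digits, i.e. K in the text."""
    return sum(1 for d in to_csd(value) if d != 0)


if __name__ == "__main__":
    v = 231  # hypothetical scaled coefficient, binary 11100111
    print(to_csd(v), csd_nonzero(v), bin(v).count("1"))
```

For the value 231 (binary 11100111, six nonzero bits) the sketch yields four nonzero signed digits, which under the rule above corresponds to K = 4 shifters and K-1 = 3 adders.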
In the literature, various algorithms for designing FIR filters with CSD coefficients have been proposed [1]-[7]. However, pulse shaping and matched filtering are more specific in the sense that the coefficients of these filters should be optimized by taking into account the combination of both transfer functions. In this case, the optimization of the filter coefficients becomes harder, since for every set of coefficients the convolution of the impulse responses of the pulse shaper and the matched filter has to be evaluated in order to verify whether the set is optimal or not. In this paper a new method is presented to find a coefficient set for the pulse shaping and matched filter with a preset, limited number of nonzero signed digits per coefficient in CSD format. The idea is to increase the accuracy of the central coefficients, which contribute 90% of the filter power, while decreasing the number of nonzero signed digits for the rest of the coefficients. These remaining filter coefficients are rounded to the nearest CSD representation with a single nonzero signed digit.
The rest of this paper is organized as follows: in the next section the problem is discussed in more detail, and in the third section the new method is described. Results and a design example are discussed in the fourth section, while the final section provides some conclusions.
II. PROBLEM DEFINITION
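Let us consider the root-raised cosine (RRC) filter as the pulse shaping filter. The frequency response of the RRC filter is given as [8]:

$$H_T(f) = H_R(f) = \begin{cases} \sqrt{T_s}, & |f| \le \dfrac{1-\beta}{2T_s} \\ \sqrt{\dfrac{T_s}{2}\left[1 + \cos\left(\dfrac{\pi T_s}{\beta}\left(|f| - \dfrac{1-\beta}{2T_s}\right)\right)\right]}, & \dfrac{1-\beta}{2T_s} < |f| \le \dfrac{1+\beta}{2T_s} \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

in which $\beta$ is the roll-off factor, $T_s$ is the symbol period, and $H_T(f)$ and $H_R(f)$ are the frequency responses of the transmitter and receiver filter, respectively. Its continuous-time and finite-spectrum nature would require an infinite number of taps and infinite precision for the coefficients. In practice the number of taps has to be finite, usually up to 100. Together with the matched receive filter the overall frequency response $H(f) = H_T(f) \cdot H_R(f)$ should ideally yield a Nyquist filter, hence satisfying the criterion for zero ISI [8]:

$$h(N \pm kM) = 0, \quad k = 1, 2, \ldots, L, \qquad h(N) \neq 0 \qquad (2)$$

in which $h(n)$ represents the impulse response coefficients of $H(f)$ at time $nT$, with $T = T_s/M$ the sampling period, $M$ is the oversampling factor, $h(N)$ is the central coefficient of the impulse response and $L$ is the symbol length of the filter.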
The ISI will be zero for filters with infinite precision coefficients. However, any quantization of the filter coefficients will result in nonzero ISI. The conventional measure of the ISI distortion can be expressed as the ratio of the sum of the amplitudes of the $h(N \pm kM)$ terms over the amplitude of the central impulse response coefficient [3]:

$$\mathrm{ISI} = \frac{\sum_{k=1}^{L} \left( |h(N + kM)| + |h(N - kM)| \right)}{|h(N)|} \qquad (3)$$
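The ISI measure described above can be evaluated numerically from the two impulse responses. The following is a minimal sketch in plain Python, assuming an odd-length combined response whose centre tap is the sampling instant; the function names and the use of 20 log10 for the dB conversion are assumptions made for illustration.

```python
import math


def convolve(a, b):
    """Full linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def isi_ratio(h_tx, h_rx, M):
    """Sum of |h| at the symbol-spaced taps around the centre, over |h(centre)|."""
    h = convolve(h_tx, h_rx)
    centre = len(h) // 2
    # Taps at centre +/- k*M share the same residue modulo M as the centre.
    isi = sum(abs(h[n]) for n in range(centre % M, len(h), M) if n != centre)
    return isi / abs(h[centre])


def to_db(ratio):
    """Amplitude ratio in dB (assumed 20*log10); zero ISI maps to -inf."""
    return 20 * math.log10(ratio) if ratio > 0 else float("-inf")


if __name__ == "__main__":
    # Rectangular pulse, two samples per symbol: the matched pair is ISI-free.
    print(isi_ratio([1.0, 1.0], [1.0, 1.0], M=2))
```

The rectangular pulse in the usage example gives a triangular combined response whose only symbol-spaced tap is the centre one, so the ratio is zero; quantized RRC pairs give a small positive ratio instead.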
The number of coefficients to be optimized depends on the filter symbol length and the oversampling factor. If the filter symbol length is 8 and the oversampling factor is also 8, the filter has 8 · 8 + 1 = 65 coefficients. Due to the filter symmetry, the number of coefficients that actually has to be optimized is roughly halved, to 33.
The set of filter coefficients with infinite precision is given by the inverse Fourier transform (IFT) of equation (1). In order to be able to realize the filter in an FPGA, the coefficients should be written with finite precision, thus limiting the number of nonzero signed digits per tap. In the CSD format each coefficient, $h(n)$, is given by [1]:

$$h(n) = \sum_{i=1}^{N} s_{n,i} \, 2^{-i} \qquad (4)$$

in which $s_{n,i} \in \{-1, 0, 1\}$ and $N$ represents the bit length of the coefficient.
First, we define the theoretical minimal number of operations per filter as the number of coefficients per filter: since no coefficient can be written with fewer than one nonzero signed digit in CSD format, this number is a lower bound on the number of operations per filter. Quantizing each coefficient to the nearest CSD representation with a limited number of nonzero signed digits increases the quantization error. However, the quantization error of individual coefficients is not a sufficient criterion; the overall error of the chosen set of coefficients with respect to the ideal set has to be considered, and it has to be minimal compared to all other potential sets.
Another quality indicator of a pulse shaping filter with quantized coefficients is the ISI ratio. Hence, two performance indicators need to be taken into account in order to find the optimum set of coefficients: equations (3) and (5) both need to be minimized. At the same time the ratio of the peak ripple to the average level in the pass band should be kept under -30 dB [3].
III. METHOD DESCRIPTION
Let F be the collection which contains all the sets of filter coefficients with a word length of 16 bits. Let S be the sub-collection of F which contains all the sets of coefficients in which the CSD representation of every tap does not exceed the maximal number of nonzero signed digits per tap, K.
The maximal number of nonzero signed digits in one of the sets in S will be K*M, where M here denotes the number of coefficients. We want to find the optimum set of coefficients out of S, such that the maximal number of nonzero signed digits per coefficient, K, is fixed beforehand and the total number of nonzero signed digits in the set does not exceed M*K.
Rounding all the coefficients to the nearest CSD representation with a fixed number of nonzero signed digits does not necessarily yield the optimal solution. To broaden the search space, all sets including the D nearest CSD representations of each coefficient will be included. However, if more options per coefficient are taken into account the number of sets in S will increase exponentially with the number of coefficients to be optimized. So the design space S might become too large to scan for the optimum set of coefficients, hence, a way to decrease the number of sets in S is required.
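To make the notion of the D nearest CSD representations concrete, the sketch below enumerates, for one coefficient, all fixed-point values whose CSD form has at most K nonzero digits and returns the D closest ones. It is a brute-force illustration in plain Python, not the implementation used for the results in this paper; the function names, the 8-bit word length in the example and the exclusion of the all-zero word (every coefficient keeps at least one nonzero digit) are assumptions.

```python
def csd_weight(value: int) -> int:
    """Number of nonzero digits in the CSD form of an integer."""
    weight = 0
    while value != 0:
        if value % 2:
            value -= 2 - (value % 4)  # strip a +1 or -1 digit
            weight += 1
        value //= 2
    return weight


def nearest_csd_candidates(coeff: float, bits: int, K: int, D: int) -> list[float]:
    """Return the D values closest to coeff that use 1..K nonzero CSD digits."""
    scale = 1 << (bits - 1)          # coefficients are assumed to lie in [-1, 1]
    target = coeff * scale
    allowed = [v for v in range(-scale, scale + 1) if 1 <= csd_weight(v) <= K]
    allowed.sort(key=lambda v: abs(v - target))
    return [v / scale for v in allowed[:D]]


if __name__ == "__main__":
    print(nearest_csd_candidates(0.3, bits=8, K=2, D=3))
    print(nearest_csd_candidates(0.3, bits=8, K=1, D=1))
```

With K = 1 and D = 1 the function reduces to rounding to the nearest signed power of two, which is the treatment applied to the non-central coefficients.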
If D is the number of candidate CSD representations per coefficient (each with at most K nonzero signed digits) and M is the total number of coefficients, then the sub-collection S will have $D^M$ potential sets. One should keep in mind that the convolution has to be computed for every potential set in order to check equation (6).
In order to decrease the search space, only the D nearest CSD representations with K nonzero signed digits of the central coefficients that contribute 90% of the total filter power are included in the search space, while all other coefficients are rounded to the nearest CSD representation with a single nonzero signed digit. This is also the theoretical lower bound of the number of operations per coefficient. Representing these coefficients with only one nonzero signed digit is permissible since most of them have a low value: even if their quantization error is high, it will not have much impact on the ISI ratio or on the overall error with respect to the case with infinite precision taps. Doing so decreases the number of sets in sub-collection S by a factor that grows with the ratio of coefficients that do not take part in 90% of the filter power. When the symbol length of the RRC filter is increased this ratio increases too, which means that the number of sets in sub-collection S decreases further. In turn, the total number of coefficients written with one nonzero signed digit in CSD format increases. This brings the total number of operations per filter towards the theoretical minimal number of operations. So the ratio between the theoretical minimal number of operations per filter and the number of operations per filter after optimization will approach 1 as the symbol length increases (see Table II, ratio γ).
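One possible way to select the central coefficients is sketched below in plain Python. It assumes that "central" means a window grown symmetrically around the centre tap until the accumulated power reaches the requested fraction of the total; this reading, the function name and the toy coefficient values are assumptions for illustration.

```python
def central_indices(h, fraction=0.9):
    """Indices of the smallest symmetric window around the centre tap
    whose power (sum of squares) reaches `fraction` of the total power."""
    total = sum(x * x for x in h)
    centre = len(h) // 2
    lo = hi = centre
    power = h[centre] ** 2
    while power < fraction * total and (lo > 0 or hi < len(h) - 1):
        if lo > 0:
            lo -= 1
            power += h[lo] ** 2
        if hi < len(h) - 1:
            hi += 1
            power += h[hi] ** 2
    return list(range(lo, hi + 1))


if __name__ == "__main__":
    h = [0.1, 0.5, 1.0, 0.5, 0.1]  # toy symmetric impulse response
    print(central_indices(h))       # the remaining taps get a single digit
```

All taps outside the returned window would then be rounded to a single nonzero signed digit, while the taps inside it keep D candidates each.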
Consider a filter with a length of 8 symbols and an oversampling ratio of 8. The number of coefficients to be optimized will be $\lceil (8 \cdot 8 + 1)/2 \rceil = 33$; hence, with D=3, S will contain $3^{33}$ possible sets of coefficients to be checked. Out of these 33 taps just the 13 central taps make up 90% of the filter power, for which D=3 CSD options per tap are taken into account, while for the 20 remaining taps a single CSD option is chosen. So by using this method to decrease the number of sets in S the new search space will have $20 \cdot 3^{13}$ sets to search over. The number of sets in S is decreased by a factor of about $3^{17}$.
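The arithmetic of this example can be checked with a few lines of Python. The sketch simply reproduces the counts as they are stated in the text (including the count of 20 times 3 to the power 13 for the reduced space); the function and variable names are illustrative assumptions.

```python
import math


def search_space_sizes(symbols=8, oversampling=8, D=3, central=13):
    """Reproduce the search-space counts quoted in the text."""
    taps = symbols * oversampling + 1          # 65 taps in total
    unique = math.ceil(taps / 2)               # 33 taps after using symmetry
    remaining = unique - central               # 20 single-digit taps
    full = D ** unique                         # 3^33 sets without reduction
    reduced = remaining * D ** central         # 20 * 3^13, as stated in the text
    return taps, unique, full, reduced


if __name__ == "__main__":
    taps, unique, full, reduced = search_space_sizes()
    # Reduction factor expressed as a power of 3.
    print(taps, unique, full, reduced, math.log(full / reduced, 3))
```

The base-3 logarithm of the reduction factor comes out between 17 and 18, in line with the stated factor of about 3 to the power 17.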
In terms of FPGA resources used by the filter, we gain from the higher number of coefficients written with a single nonzero signed digit. Since the absolute value of most of these coefficients is nearly zero and the coefficient values fluctuate little far from the filter center, most of these coefficients are rounded to the same value. Most of them are therefore written with the same expression, which decreases the area used in the FPGA even further. Also, by using horizontal and vertical sub-expression elimination [9, 10] the number of adders and shifters is reduced significantly.
The algorithm of this method is given below:
Step 1 Start from the set of filter coefficients with infinite precision, $h_\infty(n)$.
Step 2 Find the central coefficients that take part in 90% of the filter power.
Step 3 Find the D nearest CSD representations of these coefficients that do not exceed the maximal number K of nonzero signed digits per coefficient.
Step 4 Find the nearest CSD representation with a single nonzero signed digit for each of the remaining coefficients.
Step 5 Fill in the search space S with all possible combinations of the candidates from Steps 3 and 4.
Step 6 Calculate the convolution of the pulse shaper and the matched filter for each set in S in order to evaluate equation (3).
Step 7 Continue searching the space S until the optimal set is found, i.e. the set that fulfills (6) and (7) while keeping the ratio of the peak ripple to the average pass-band level below -30 dB.
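The steps above can be sketched end to end on a small example. The plain-Python code below is an illustrative sketch only: it uses the standard closed-form RRC impulse response, a short 8-bit filter, an exhaustive search over the candidates of the central taps, the ISI ratio as the primary criterion and the squared error to the ideal coefficients as a tie-breaker. The function names, these parameter values and the omission of the pass-band ripple constraint are assumptions made to keep the sketch short.

```python
import itertools
import math


def csd_weight(v):
    """Number of nonzero digits in the CSD form of integer v."""
    w = 0
    while v != 0:
        if v % 2:
            v -= 2 - (v % 4)  # strip a +1 or -1 digit
            w += 1
        v //= 2
    return w


def candidates(coeff, bits, K, D):
    """Steps 3 and 4: the D nearest values with 1..K nonzero CSD digits."""
    scale = 1 << (bits - 1)
    allowed = [v for v in range(-scale, scale + 1) if 1 <= csd_weight(v) <= K]
    allowed.sort(key=lambda v: abs(v - coeff * scale))
    return [v / scale for v in allowed[:D]]


def rrc(L, M, beta):
    """Step 1: truncated RRC impulse response, L symbols, M samples per symbol."""
    half, h = L * M // 2, []
    for n in range(-half, half + 1):
        t = n / M
        if t == 0:
            h.append(1 - beta + 4 * beta / math.pi)
        elif abs(abs(4 * beta * t) - 1) < 1e-12:
            a = math.pi / (4 * beta)
            h.append(beta / math.sqrt(2)
                     * ((1 + 2 / math.pi) * math.sin(a) + (1 - 2 / math.pi) * math.cos(a)))
        else:
            num = (math.sin(math.pi * t * (1 - beta))
                   + 4 * beta * t * math.cos(math.pi * t * (1 + beta)))
            h.append(num / (math.pi * t * (1 - (4 * beta * t) ** 2)))
    return h


def isi_ratio(h, M):
    """Step 6: ISI ratio of h convolved with its (identical) matched filter."""
    g = [sum(h[i] * h[n - i] for i in range(len(h)) if 0 <= n - i < len(h))
         for n in range(2 * len(h) - 1)]
    c = len(g) // 2
    return sum(abs(g[n]) for n in range(c % M, len(g), M) if n != c) / abs(g[c])


def search(h_ideal, M, K=2, D=3, bits=8, fraction=0.9):
    """Steps 2 to 7 on the unique half of a symmetric filter."""
    c = len(h_ideal) // 2
    total = sum(x * x for x in h_ideal)
    power, first_central = h_ideal[c] ** 2, c
    while power < fraction * total and first_central > 0:
        first_central -= 1
        power += 2 * h_ideal[first_central] ** 2  # symmetric pair of taps
    options = [candidates(h_ideal[i], bits, K, D) if i >= first_central
               else candidates(h_ideal[i], bits, 1, 1) for i in range(c + 1)]
    best = None
    for half in itertools.product(*options):
        h = list(half) + list(half[-2::-1])       # mirror to the full filter
        err = sum((a - b) ** 2 for a, b in zip(h, h_ideal))
        key = (isi_ratio(h, M), err)
        if best is None or key < best[0]:
            best = (key, h)
    return best


h0 = rrc(4, 2, 0.25)
h0 = [x / max(h0) for x in h0]                    # normalise the peak to 1
(isi, err), h_best = search(h0, M=2)
print(isi, err, h_best)
```

Because only the central taps carry more than one candidate, the product in `search` stays small even though every candidate set requires a full convolution.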
IV. RESULTS AND DESIGN EXAMPLE
MATLAB R2013a was used to test the algorithm and to design RRC filters with different symbol lengths as pulse shaping filters. The parameters used for the simulation are given in Table I.
It is known from [3] that a good approximation of FIR filter coefficients is typically achieved with 2 to 4 nonzero signed digits per tap. So for each central coefficient that contributes to 90% of the filter power the three nearest combinations (D=3) with a maximum of 2 nonzero signed digits per coefficient were taken (the worst case is considered). The choice of the parameter D is crucial for the algorithm since it defines the dimension of the search space. The other coefficients were rounded to the nearest CSD representation with a single nonzero signed digit, thus reaching the theoretical lower bound of the number of operations per coefficient. In this way the number of coefficients written with more than one nonzero signed digit was decreased. A graph showing the ratio of coefficients assigned a single signed digit over the coefficients with more signed digits is given in Figure 1. Figure 1 also shows the ratio γ between the theoretical minimal number of operations per filter and the number of operations after optimization. By increasing the symbol length this ratio approaches 1, since 90% of the filter power will be concentrated in fewer central coefficients, which in turn increases the number of coefficients written with just one operation. In Table II the data taken from the simulation are summarized. The number of taps also gives the theoretical minimal number of operations per filter. The starting point is the truncated RRC filter with finite coefficient precision but without optimization of the number of nonzero signed digits. The ISI ratio is calculated at the output of the matched filter, after the convolution of the two RRC filters.
The peak-ripple ratio is the ratio between the peak ripple and the average level in the pass band of the matched filter frequency response. Comparing the number of nonzero signed digits needed in the finite precision case without optimization with the number needed after using the proposed method, it can be seen that around 65% fewer nonzero signed digits are used to describe the set of filter coefficients (column V). Also, the ratio between the theoretical number of operations per filter and the number of operations per filter after optimization increases with the filter length (column VI). So the optimization is more effective for longer filters, and the number of operations per filter gets closer to the theoretical minimal number of operations per filter. At the same time the ISI ratio after matched filtering is decreased by between 20 and 60 dB compared with the starting point, while the peak-ripple ratio is kept under -30 dB, as proposed in [3]. The error in the time domain between the starting point and the optimized value lies between -15 and -17 dB, which is acceptable based on [3]. The computation time is on the order of seconds (up to 30 seconds) for short filter lengths (up to 17 symbols), while for larger filter lengths (up to 50 symbols) it is on the order of minutes (up to 2 minutes). Since the optimization is done beforehand, this computation time is acceptable.
A. Comparison with simulated annealing algorithm
We compared our results with the simulated annealing (SA) algorithm in [7]. For this comparison an RRC filter with a roll-off factor of 0.25, a precision of 10 bits per coefficient and an oversampling ratio of 2 was taken, as described in [7]. The results are given in Table III (cf. Table IV in [7]), where the ISI ratio is given in linear scale. It is seen that the ISI ratio after matched filtering is nearly the same or better with our algorithm, while up to 78% fewer nonzero signed digits per filter are used.
For the SA algorithm no data are presented for the error in the time domain, while our algorithm finds the optimum coefficient set with minimal ISI and minimal error. Moreover, the computation time of our algorithm is on the order of seconds, and it always gives the optimum solution within S since the whole space S is searched. The SA algorithm, on the other hand, does not give the optimal solution on the first run for filter lengths above 39, so more runs of the algorithm are required in order to find a reliable solution, which increases the computation time too.
Design Example: As a design example we take an RRC filter with a symbol length of 6, an oversampling factor of 8 and a precision of 16 bits per coefficient. In total, 25 coefficients have to be optimized. Figure 2 shows the magnitude response after matched filtering for the two cases: before and after optimization of the coefficients. It is seen that the stopband attenuation after matched filtering is around 34 dB ($\delta_b = -34$ dB). The ISI ratio from (3) is -71 dB. Figure 3 shows the impulse response after matched filtering for both cases: before and after optimization of the coefficients. The coefficients which are far from the filter center are rounded to CSD format with a single nonzero signed digit. Their quantization error in absolute terms is small due to their limited contribution to the total filter power. On the other hand, some of the central coefficients that take part in 90% of the filter power have a higher quantization error. By increasing the maximum number of nonzero signed digits for the central coefficients, say to 4 signed digits, the quantization error will decrease too. The total quantization error in this case is -16.33 dB, which is within the acceptable range [3].
The quantized coefficients for the design example filter are given in Table IV. It can be noticed that most of the coefficients which are written with one operation share the same expression, which means that the FPGA resources used for the filter implementation can be reduced further by exploiting methods for sub-expression elimination [9, 10]. For example, the coefficients $h(5)$, $h(6)$ and $h(7)$ have the same expression. The number of operations used for this implementation is reduced by 65% compared with the starting case before optimization, as shown in Table II (row 6).
V. CONCLUSIONS
A new approach to design multiplier-less pulse-shaping and matched filters with a minimal number of nonzero signed digits is introduced. The new approach uses more nonzero signed digits for the central filter coefficients that contribute to 90% of the filter power, while rounding the others to the nearest CSD value with a single nonzero signed digit. It was shown that up to 65% fewer nonzero signed digits per filter were used compared to the case before optimization. At the same time the ISI ratio was decreased by between 20 dB and 60 dB, while the ratio of the peak ripple in the stop band to the average level in the pass band was kept under -30 dB.
Comparison with the simulated annealing algorithm shows that the ISI ratio was nearly the same or better with the new algorithm, while up to 78% fewer nonzero signed digits were used. This in turn reduces the FPGA resources used for the pulse shaping filter realization by 65% compared with the FPGA realization without optimization.
ACKNOWLEDGEMENT
Part of the work was supported by the iMinds IoT research program.
