In this paper, we study the hardware implementation of arbitrary sample rate conversion (ASRC) using recently proposed variable fractional delay filter (V-FDF) structures. For the last three decades, the Farrow structure has been the most commonly used solution for implementing V-FDFs. In this work, we develop and compare implementations of several recently proposed V-FDF options based on the Newton structure, on both ASIC and FPGA targets. The results show that the recently proposed solutions offer similar ASRC performance while using up to three times fewer resources than the classical Farrow structure. The generic nature of these filters makes them suited to a large number of standards.
I. INTRODUCTION
For many years, the evolution of digital signal processing (DSP) in telecommunication systems has focused on improving processing power. More recently, increasing attention has been given to improving the system's energy efficiency and to building lower-cost hardware. One of the main drivers of this trend is the growth of the Internet of Things (IoT) domain, where billions of devices and their corresponding gateways are being deployed. To make this deployment possible at an acceptable cost, both devices and gateways need to be highly optimized. A main part of every telecommunication system is the interface between the radio frequency and baseband domains, implemented digitally in modern systems and known as the digital front-end (DFE) [1]. The DFE has two main roles: sample rate conversion (SRC) and filtering [2]. In this work, we are concerned with SRC. Two types of SRC exist in the DFE. The first is coarse SRC, where the sampling rate is increased or reduced by a large integer factor. The other type is known as fine SRC or arbitrary sample rate conversion (ASRC). Generally, fine SRC is more complicated to implement than coarse SRC because of the extra precision required.
To implement ASRC, a variable fractional delay filter (V-FDF) is the most efficient solution. The Farrow structure [3], dating back to 1988, is the most widely adopted implementation option and is found in most of today's systems. An advantage of this structure is its ability to implement any ASRC operation with a reconfigurable conversion factor. However, more efficient structures can be found in the literature. In 2009, the Newton structure [4] was adapted to ASRC in [5], although limited to the filtering response of Lagrange interpolation. More recent work generalized this structure to Spline and Hermite interpolations [6] [7]. Hermite interpolation is preferred for the DFE because of its wide passband and good SRC image rejection performance.
In this paper, we investigate the practical hardware complexity of these recently proposed solutions. The rest of the paper is organized as follows. Section II presents an overview of both the most common classical SRC solutions in modern DFE systems and the recently proposed Newton structures. In Section III, the implementation methodology based on the pipeline approach is developed, and the hardware architecture model is detailed. In Section IV, we implement the developed architectures on both FPGA and ASIC targets. We then compare the Hermite-based Newton structures to the modern SRC solutions in terms of both complexity and performance. The conclusion is presented in Section V.
II. ASRC SOLUTIONS FOR DFE SYSTEMS
One of the main DFE roles in modern transceiver systems is adapting the sampling rate between the radio frequency and baseband domains [1] [2]. In practice, this conversion is done using both coarse and fine-tuned SRC modules. The SRC operation can be used at different locations in the same DFE of a radio transceiver. Moreover, fine SRC has other applications in the DFE, most notably for implementing synchronization functions. ASRC modules are therefore a main part of today's transceivers, and their implementation efficiency plays an important role in optimizing the total system. Coarse SRC is usually implemented using cascaded-integrator-comb (CIC) filters [8], because of their efficient structure consisting of only registers and adders. However, CIC filters cannot implement all kinds of SRC operations and are only practical for coarse-factor SRC. For fine-tuned SRC, ASRC modules are required, and the Farrow structure is the most common solution [3]. This structure implements a polynomial-based V-FDF. Polynomial based means that the filter impulse response is constructed [...]
To perform the ASRC operation, the value of µ is updated for every output sample, as explained in [5]. The µ value is then used by the V-FDF to calculate the value of the corresponding output. In practical implementations, there are two important points to consider. First, the number of input and output samples is different, since the sampling rate is modified. Second, the hardware module operates at a single digital clock rate higher than both the input and output sampling rates. Therefore, a control module is required alongside the V-FDF to manage the sample stream and to calculate the required output instant µ for each sample. This control function is configured using the up-sampling U and down-sampling D input parameters, which represent the SRC factor R = U/D. The proposed ASRC module architecture is shown in Figure 1.
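The per-output update of µ can be illustrated with a minimal Python sketch. It assumes that µ is the fractional offset of the output instant from the preceding input sample, expressed in input sample periods and lying in [0, 1); the exact convention used in [5] may differ. The function name `asrc_schedule` and the accumulator scheme are hypothetical and are not taken from the implemented controller.

```python
from fractions import Fraction

def asrc_schedule(U, D, num_outputs):
    """Yield (input_index, mu) for each output sample of an SRC with factor R = U/D.

    Assumption: output m is located at time m * D / U in input sample periods,
    so mu is the fractional part and input_index the integer part of that time.
    """
    acc = 0  # phase accumulator, in units of 1/U of an input sample period
    n = 0    # index of the input sample that precedes the output instant
    for _ in range(num_outputs):
        yield n, Fraction(acc, U)
        acc += D              # advance by one output period
        while acc >= U:       # crossed one or more input sample boundaries
            acc -= U
            n += 1

# Up-sampling by 3/2: outputs fall at 0, 2/3, 4/3 and 2 input periods.
for index, mu in asrc_schedule(3, 2, 4):
    print(index, mu)
```

Because the accumulator only uses integer additions and comparisons on U and D, the conversion factor is tracked without accumulating rounding error, which is one reason an integer U/D parameterization is convenient for a control module.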
The implementation of the V-FDF using a Farrow structure of order 3 is shown in Figure 2-a. This structure consists of multiple FIR filters G_i(z) whose outputs are combined with µ according to the Horner scheme to produce the final output y[m]. For a theoretical understanding of the structure, the reader can refer to [3] and [9]. The Farrow structure can implement any polynomial-based filter response; however, the simplest and most commonly used option is Lagrange interpolation. To achieve side-lobe rejection levels of at least 30 dB, order 5 is required.
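As an illustration of the Horner scheme, the following Python sketch evaluates a Farrow structure of order 3 for cubic Lagrange interpolation. The window alignment (x[n-1], x[n], x[n+1], x[n+2]) and the convention that µ in [0, 1) is measured from x[n] are assumptions made for this example; Figure 2-a may use a different indexing. The names `FARROW_LAGRANGE3` and `farrow_output` are hypothetical.

```python
from fractions import Fraction as F

# Row i holds the taps of sub-filter G_i; columns weight x[n-1], x[n], x[n+1], x[n+2].
# These are the standard cubic Lagrange coefficients for nodes at -1, 0, 1, 2.
FARROW_LAGRANGE3 = [
    [F(0),     F(1),     F(0),     F(0)],      # G_0: coefficient of mu^0
    [F(-1, 3), F(-1, 2), F(1),     F(-1, 6)],  # G_1: coefficient of mu^1
    [F(1, 2),  F(-1),    F(1, 2),  F(0)],      # G_2: coefficient of mu^2
    [F(-1, 6), F(1, 2),  F(-1, 2), F(1, 6)],   # G_3: coefficient of mu^3
]

def farrow_output(window, mu):
    """Interpolate at position n + mu from window = [x[n-1], x[n], x[n+1], x[n+2]]."""
    # Each FIR branch computes one polynomial coefficient from the input window.
    branch = [sum(g * x for g, x in zip(row, window)) for row in FARROW_LAGRANGE3]
    # Horner scheme: ((c3 * mu + c2) * mu + c1) * mu + c0
    y = branch[-1]
    for c in reversed(branch[:-1]):
        y = y * mu + c
    return y

# Samples of p(t) = t^3 - 2t + 1 at t = -1, 0, 1, 2; a cubic is reproduced exactly.
print(farrow_output([2, 1, 0, 5], F(1, 2)))  # p(1/2) = 1/8
```

In this form, an order-N filter needs N + 1 branches of N + 1 taps each, which is where the quadratic growth of the Farrow structure comes from.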
A structure developed in [5] modifies the original Newton structure [4] into a V-FDF form compatible with ASRC. The structure of order 3 is shown in Figure 2-b. Unlike the Farrow structure, the Newton structure is designed for Lagrange interpolation only. Structurally, however, an order-N interpolation can be achieved with linear complexity O(N) through the Newton structure, compared to quadratic complexity O(N^2) for the Farrow structure. Later work extended the Newton structure to implement Spline [6] and Hermite [7] interpolation. Spline interpolation is of limited interest for multi-standard DFE systems because of its very narrow passband and its poor scaling with interpolation order. These two architectures are very promising from a structural point of view. In this paper, we are interested in investigating the hardware implementation complexity of these recent structures and comparing them to the ASRC solutions in current use. In the next section, we develop the methodology used to implement these structures in order to compare their complexity on both ASIC and FPGA targets.
Figure 2: (a) Farrow structure of order 3 [1]; (b) Newton structure for Lagrange order 3 [5]; (c) Newton structure for Hermite order 3 [7].
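The linear complexity of the Newton form can be seen in a short Python sketch based on the textbook Newton backward-difference formula for Lagrange interpolation. This is not the modified structure of [5] nor the Hermite extension of [7]; the function name `newton_lagrange` and the delay convention (d measured backwards from the newest sample x[n]) are assumptions made for this example.

```python
from fractions import Fraction

def newton_lagrange(history, d):
    """Evaluate the Lagrange polynomial through history at time n - d.

    history = [x[n], x[n-1], ..., x[n-N]], newest sample first.
    """
    # Cascade of first-order differences (1 - z^-1): diffs[k] is the
    # k-th backward difference of x at time n.
    diffs = []
    row = list(history)
    while row:
        diffs.append(row[0])
        row = [a - b for a, b in zip(row, row[1:])]
    # Nested evaluation of
    #   x - d * (D1 - (d - 1)/2 * (D2 - (d - 2)/3 * D3 ...)),
    # which needs only one delay-dependent multiplication per order.
    order = len(diffs) - 1
    y = diffs[order]
    for k in range(order, 0, -1):
        y = diffs[k - 1] - (d - (k - 1)) / k * y
    return y

# Samples of p(t) = t^3 - 2t + 1 at t = 2, 1, 0, -1 (newest first);
# a delay d = 3/2 from t = 2 evaluates the polynomial at t = 1/2.
print(newton_lagrange([5, 0, 1, 2], Fraction(3, 2)))  # p(1/2) = 1/8
```

In a hardware realization the differences would be produced by a chain of registers and subtractors rather than recomputed per output, so each additional order adds one difference stage and one multiplier.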
III. HARDWARE IMPLEMENTATION METHODOLOGY
This section develops the implementation approach used to obtain the results discussed in Section IV. The complete filter is composed of two modules, the controller and the V-FDF filter. As discussed in the last section, the controller is responsible for managing the SRC operation, while the V-FDF filter is only responsible for calculating the output samples.
To implement the ASRC controller, a finite state machine (FSM) is the most appropriate approach. This FSM takes the U and D parameters as inputs and has four states. Starting from the initial IDLE state, the FSM enters the NEUTRAL state when exactly one output sample corresponds to the current input. It enters the INTERPOLATION state when there are more outputs than inputs, and the DECIMATION state in the opposite case. The value of µ is continuously updated to keep track of the output sample time delay.
Since there can be more outputs than inputs or vice versa, a mechanism to block or advance the sample stream at certain instants is needed. In this work, we used a handshake protocol based on "ready to send and receive" signals. Each module is responsible for signaling its own status. Between the controller and the V-FDF filter modules, extra control signals indicate whether a register update or a new calculation is required.
To implement the V-FDF module, we use a pipeline approach, which is best suited to multi-standard DFEs, where processing speed is prioritized. The pipeline approach consists of breaking the filter structure into stages, with each stage containing only one calculation operation. The objective is to minimize the critical path length and thereby maximize the operating frequency. The pipeline architecture for the modified Newton structure for Hermite interpolation of order 3 is shown in Figure 3.
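One possible reading of these states can be sketched in Python as follows. The sketch assumes that the state is decided per input sample from the number of output instants falling within that input period (none, exactly one, or several); this interpretation, the function name `classify_inputs`, and the accumulator scheme are hypothetical and do not describe the implemented FSM.

```python
from enum import Enum

class State(Enum):
    IDLE = 0           # initial state, before any sample is processed
    NEUTRAL = 1        # exactly one output for the current input
    INTERPOLATION = 2  # several outputs for the current input
    DECIMATION = 3     # no output for the current input

def classify_inputs(U, D, num_inputs):
    """Return the assumed controller state for each input sample, for R = U/D."""
    states = []
    acc = 0  # offset of the next output instant from the current input, in 1/U periods
    for _ in range(num_inputs):
        outputs = 0
        while acc < U:   # output instant lies within the current input period
            outputs += 1
            acc += D     # move to the next output instant
        acc -= U         # move to the next input sample
        if outputs == 0:
            states.append(State.DECIMATION)
        elif outputs == 1:
            states.append(State.NEUTRAL)
        else:
            states.append(State.INTERPOLATION)
    return states

print(classify_inputs(3, 2, 3))  # up-sampling by 3/2
print(classify_inputs(2, 3, 4))  # down-sampling by 2/3
```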
The quantization of this implementation is done using a fixed-point representation. The quantization parameters of each signal are found by developing the analytical expression of the quantization error [10]. This expression is then used to find the signal-to-quantization-noise ratio (SQNR). The optimal quantization parameters are then found by exhaustive search for a given SQNR. Considering, for example, an input signal quantized on 2 and 16 bits for the integer and fractional parts respectively, Figure 3 shows the quantization parameters for a tolerated SQNR degradation of less than 0.6 dB. This value is chosen so that the deterioration of the effective number of bits due to quantization is negligible. The signals are quantized relative to the input, where the number of added bits for the integer and fractional parts is shown between braces, e.g. {x.y} → (2+x.16+y). For the quantization of the fractional delay µ on 6 bits, we referred to the work presented in [11]. Finally, the U and D parameters are quantized on 18 unsigned bits for an ASRC precision of 5 ppm.
Using the approach developed above, the hardware implementations of the different V-FDF options shown in Figure 2 are developed with the same quantization performance. An implementation of the CIC filter is also developed to compare the complexity of fine and coarse SRC solutions.
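The brace notation and the word lengths quoted above can be made concrete with a small Python sketch. It assumes that the integer part includes the sign bit, that out-of-range values saturate, and that the precision of a B-bit parameter is its relative step 2^-B; none of these conventions is stated in the text, and the helper names are hypothetical.

```python
import math

def quantize(value, int_bits, frac_bits):
    """Round value to a signed fixed-point grid and saturate.

    Assumption: int_bits counts the sign bit, so the range is
    [-2**(int_bits - 1), 2**(int_bits - 1) - 2**-frac_bits].
    Python's round() rounds ties to even.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (int_bits - 1 + frac_bits))
    highest = (1 << (int_bits - 1 + frac_bits)) - 1
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale

def extended_format(added, base=(2, 16)):
    """Map the brace notation {x.y} to absolute widths (2 + x, 16 + y)."""
    return base[0] + added[0], base[1] + added[1]

def bits_for_ppm(ppm):
    """Smallest word length B whose step 2**-B does not exceed ppm parts per million."""
    return math.ceil(math.log2(1e6 / ppm))

print(extended_format((1, 2)))  # {1.2} -> (3, 18)
print(quantize(5.0, 2, 16))     # saturates just below 2.0
print(bits_for_ppm(5))          # 18, since 2**-18 is about 3.8 ppm
```

Under this reading, 17 bits would give a step of about 7.6 ppm, so 18 bits is the smallest word length compatible with a 5 ppm target.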
IV. IMPLEMENTATION RESULTS AND DISCUSSION
We start the discussion by comparing the filtering performance of the different structures. The objective of an SRC filter is to remove the signal images at multiples of the input sampling frequency F_s. As shown in Figure 4, the first side-lobe attenuation for Lagrange interpolation of order 5 is around −33 dB. For Hermite interpolation of both order 3 and order 5, however, the side-lobe rejection improves by around 10 dB. All three responses have a maximally flat passband and zeros at multiples of F_s. The only major compromise is that Hermite interpolation of order 3 has weaker zeros. However, this may be tolerable for SRC applications that do not have high image attenuation requirements, which is the case for the SRC modules in the DFE that come after the filtering operations.
To develop the ASIC implementation, we used the Cadence Encounter tools with the X-FAB XH018 technology. The results are summarized in Table I. The implementations were optimized to operate at 180 MHz. Implementing Lagrange interpolation using the Newton structure requires only 70% of the resources used by the reference Farrow structure. However, the Newton Hermite structure of order [...]
For the FPGA implementations, we used the Xilinx Virtex-6, which gave the results shown in Table II. The dynamic power draw of the SRC cores in this case is estimated using the Xilinx Power Estimator (XPE) for an operating frequency of 160 MHz, a signal toggle rate of 20 MHz, and a 1.0 V core voltage. The objective is not to compare the ASIC and FPGA implementations, but rather to study the implementation complexity on FPGA. The complexity ordering between the Newton and Farrow structures stays the same. However, implementing a CIC filter using look-up tables on an FPGA is not optimal, and its relative complexity is much larger on FPGA than on ASIC. Regardless, the results clearly show that the Hermite-based Newton structures are an advantageous replacement for the widely used Farrow structure, offering improved performance at a lower complexity cost.
V. CONCLUSION
In this paper, we developed the hardware implementation of different ASRC modules, including the recently proposed modified Newton structures for Hermite interpolation. The quantized hardware architectures were developed using a pipeline approach, and the implementation was then carried out on both FPGA and ASIC. The results showed that the different structures are able to operate at very high frequencies, making them useful not only for IoT standards but also for high-performance wireless standards. The results also validated the high efficiency of the recently proposed Newton structure for Hermite interpolation relative to the classical Farrow and Newton structures.
