This paper presents an FPGA implementation of a high-performance FM modulator and demodulator for software-defined radio (SDR) systems. Each component of the proposed FM modulator and demodulator has been optimized so that the overall design is high-speed, area-efficient, and low-power. The modulator and demodulator contain an optimized direct digital frequency synthesizer (DDFS) based on the quarter-wave symmetry technique, which generates the carrier frequency with a spurious-free dynamic range (SFDR) of more than 64 dB. The FM modulator uses a pipelined version of the DDFS to support up-conversion in the digital domain. The proposed FM modulator and demodulator have been implemented and tested using an XC2VP30-7ff896 FPGA as the target device. The FM modulator and FM demodulator operate at maximum frequencies of 334.5 MHz and 131 MHz and require around 1.93 K and 6.4 K equivalent gates, respectively. Power was calculated with XPower for a 10 kHz triangular wave input and a system clock frequency of 100 MHz. The FM modulator consumes 107.67 mW, while the FM demodulator consumes 108.67 mW for the same input at the same data rate.
Introduction
In prevalent audio broadcasting applications such as the private mobile radio (PMR) and digital audio broadcasting-terrestrial (DAB-T) standards, voice transmission requires excellent clarity along with source stability. The frequency modulation (FM) scheme is used in most of these standards. Traditionally, FM signals were generated using analog components to support the audio broadcasting standards. However, difficulties arose in the analog FM modulation scheme because of the voltage-controlled oscillator (VCO). With a VCO it is very difficult to obtain good clarity and source stability in the FM-modulated or demodulated signal, as the VCO lacks linearity over the desired frequency range. Therefore, digital implementations of the FM modulation scheme have evolved to replace their analog counterparts. Nowadays, digital FM modulation and demodulation techniques are widely used to obtain superior performance and good voice clarity in audio broadcasting systems. To ensure linearity over the entire frequency range, designers replace the VCO with a direct digital frequency synthesizer (DDFS), sometimes referred to as a numerically controlled oscillator (NCO). Considerable research has been performed on different digital FM modulator architectures. Some of this work has stressed reducing the distortion effects of quantization noise, which arises from the bit resolution at the input and output of the DDFS [1]. Other work has focused on area optimization and low power consumption [2] [3] [4]. In the present work, a high-speed, low-power, and reduced-area digital FM modulator has been implemented in an FPGA device to support audio broadcasting in a software-defined radio (SDR) system.
International Journal of Reconfigurable Computing

Various architectures [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] exist for implementing a digital FM demodulator on a single chip, although their performance has mostly been limited by analog signal processing accuracy. The fundamental problem in FM demodulation is how to accurately discriminate a small frequency deviation of the FM-modulated signal from its center frequency. The phase-locked loop (PLL) method is one of the popular techniques for FM demodulation. It can be easily implemented in integrated form, but sudden departures of the VCO from linearity in some portions of the frequency range degrade the overall system performance. Digital PLLs overcome some of the bottlenecks of analog PLLs [15]. For this reason, present FM demodulators mostly use the digital phase-locked loop (DPLL) for frequency discrimination. The DPLL tracks the variations in the received signal phase and frequency. There are also other techniques by which the frequency can be computed from the ratio of the in-phase (I) and quadrature (Q) components. Modern communication revolves around high-speed, high-data-rate transmission and reception. DPLL-based implementations of FM demodulators on DSPs often do not meet such demanding requirements of a wireless communication system. An alternative is to implement the demodulator in an FPGA, which offers flexibility and modularity. In this work, a reduced-area, low-power, and high-speed linear digital FM demodulator using the DPLL technique [5, 6] has been implemented towards the development of an SDR system. Component-wise improvements have been carried out to obtain a compact architecture, a faster system clock, and lower power consumption compared with existing implementations of the digital FM demodulator. In SDR applications, small area and low power consumption with high data rate support are the key concerns.
Targeting next-generation SDR-based wireless communication transceivers, all the basic components of the DPLL-based FM demodulator are fully optimized in this work without altering the system output behavior relative to previous DPLL-based FM demodulator implementations.
The rest of the paper is organized as follows. Section 2 describes the principle and architecture of the FM modulator and the DPLL-based FM demodulator, along with the architecture of their individual components. Section 3 presents the FPGA implementation results: synthesis results, simulation results, on-chip-verified results, and comparison results. Conclusions are summarized in Section 4.
Architecture of Digital FM Modulator and DPLL-Based FM Demodulator

FM Modulator.
In the FM modulation technique, which is a kind of angle modulation, the instantaneous frequency of the carrier signal varies linearly with the baseband modulating message signal m(t) as follows:

$$s(t) = A_c \cos\left[2\pi F_c t + 2\pi K_f \int_0^t m(\tau)\,d\tau\right], \quad (1)$$
where $A_c$ is the amplitude of the carrier, $F_c$ is the carrier frequency, and $K_f$ is the frequency deviation constant. The architecture of the FM modulator is shown in Figure 1. The FM modulator consists of (1) an FM data generator, (2) an interpolator with an interpolation factor of 32, (3) an accumulator, and (4) a DDFS block. The frequency control word (FCW) signal is used to select the carrier frequency. The accumulator block adds the instantaneous frequency of the input audio signal to the selected carrier frequency. Finally, the DDFS block takes this frequency as its input and generates the FM-modulated signal. The architecture of the DDFS block is discussed in a later section.
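The data flow described above (an adder that combines the carrier control word with the message sample, followed by a phase-accumulating DDFS) can be illustrated with a minimal Python sketch. It is not the authors' Verilog design: the function names, the ideal floating-point cosine used in place of a ROM, and the direct use of the message sample as a frequency offset (no explicit $K_f$ scaling) are assumptions made for illustration. The 18-bit control word and 100 MHz clock are the values quoted in the paper.

```python
import math

ACC_BITS = 18   # phase accumulator width, matching the 18-bit FCW in the paper
F_CLK = 100e6   # system clock in Hz, as quoted in the paper

def fm_modulate(samples, carrier_fcw, acc_bits=ACC_BITS):
    """Return FM output samples in [-1, 1] for signed integer message samples."""
    modulus = 1 << acc_bits
    phase = 0
    out = []
    for m in samples:
        # Adder stage: carrier control word plus the instantaneous message value.
        fcw = (carrier_fcw + m) % modulus
        # DDFS phase accumulator: integrates the instantaneous frequency.
        phase = (phase + fcw) % modulus
        # Phase-to-amplitude conversion (ideal cosine, hypothetical stand-in for the ROM).
        out.append(math.cos(2 * math.pi * phase / modulus))
    return out

def fcw_to_hz(fcw, f_clk=F_CLK, acc_bits=ACC_BITS):
    """Output frequency in Hz produced by a given frequency control word."""
    return fcw * f_clk / (1 << acc_bits)
```

With a zero message the output is an unmodulated carrier at `fcw_to_hz(carrier_fcw)`; a nonzero message sample shifts the per-clock phase increment, which is the discrete-time counterpart of integrating m(t) inside the carrier phase.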
(1) FM Data Generator. The FM modulation scheme is commonly used to support audio processing in audio broadcasting systems. Generally, the audio signal is processed at rates between 44 kbps and 320 kbps. The FM input data is sampled at each FM symbol clock and stored in a register for further processing. The digitized input data is passed through a serial-to-parallel converter to generate the 8-bit FM input data. The architecture of the FM data generator is shown in Figure 2.
(2) Interpolator. The interpolator block is used in the FM modulator to obtain a better power level for the FM transmission. In this work, an interpolation factor of 32 is used between two consecutive audio samples. The circuit first calculates the difference between the two samples and then divides this difference by 32. To perform the division by 32, the new FM input data is shifted by one bit before the subtraction and by four bits after it. The result is then added to the previous input data on every symbol clock. One subtractor, one adder, and some registers are required to perform the interpolation in hardware. The architecture of the interpolator is shown in Figure 3.
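The interpolation step can be sketched in a few lines of Python. This is a behavioral illustration only, under stated assumptions: the function name is hypothetical, and instead of splitting the shift into one bit before and four bits after the subtraction as the paper describes, the sketch keeps five fractional bits in a wider accumulator so that the division by 32 remains a pure shift.

```python
def interpolate_by_32(prev_sample, new_sample):
    """Return 32 values stepping linearly from prev_sample toward new_sample."""
    diff = new_sample - prev_sample   # subtractor
    acc = prev_sample << 5            # accumulator with 5 fractional bits (assumption)
    out = []
    for _ in range(32):
        out.append(acc >> 5)          # drop the fractional bits: divide by 32
        acc += diff                   # adder: advance by (new - prev) / 32 per clock
    return out
```

For example, `interpolate_by_32(0, 64)` steps through 0, 2, 4, and so on up to 62, so the next input sample (64) continues the ramp without a jump.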
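The power spectral density of the designed FM modulator is shown in Figure 4.

DPLL-Based FM Demodulator.
The PLL-based FM demodulator was conceived in the early 1970s [16, 17]. The input frequency-modulated signal can be expressed as follows:

$$V_i(t) = \sin\left(\omega_i t + \theta_i(t)\right). \quad (2)$$

The feedback loop mechanism of the PLL makes the DDFS generate a sinusoidal signal $V_0(t)$ with the same frequency as $V_i(t)$:

$$V_0(t) = \cos\left(\omega_i t + \theta_0(t)\right). \quad (3)$$

The output of the phase detector, which is the product of these two signals, is found using a familiar trigonometric identity:

$$K_d V_i(t) V_0(t) = K_d \sin\left(\omega_i t + \theta_i(t)\right)\cos\left(\omega_i t + \theta_0(t)\right) = \frac{K_d}{2}\sin\left(2\omega_i t + \theta_i(t) + \theta_0(t)\right) + \frac{K_d}{2}\sin\left(\theta_i(t) - \theta_0(t)\right), \quad (4)$$

where $K_d$ is the gain of the phase detector. The first term in (4) corresponds to the high-frequency component. The second term corresponds to the phase difference between $V_i(t)$ and $V_0(t)$. The phase difference, that is, $(\theta_i(t) - \theta_0(t))$, between the modulated signal and the carrier produces the desired original signal with frequency $\omega_i$.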
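The split of the phase detector output into a high-frequency term and a phase-difference term can be checked numerically. The sketch below assumes a unit-amplitude input of the form sin(ω_i t + θ_i) and a DDFS output of the form cos(ω_i t + θ_0); the function name and argument names are illustrative, not taken from the paper.

```python
import math

def phase_detector_terms(w_i, t, theta_i, theta_0, k_d=1.0):
    """Return (product, high_frequency_term, phase_difference_term)."""
    # Multiplier output: gain times the product of the two signals.
    product = k_d * math.sin(w_i * t + theta_i) * math.cos(w_i * t + theta_0)
    # sin(A) * cos(B) = 0.5 * [sin(A + B) + sin(A - B)]
    high = 0.5 * k_d * math.sin(2 * w_i * t + theta_i + theta_0)
    low = 0.5 * k_d * math.sin(theta_i - theta_0)
    return product, high, low
```

The third returned value does not depend on t, which is why a lowpass loop filter that suppresses the term at twice the carrier frequency leaves a signal driven only by the phase error.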
The single most important point to realize when designing the PLL is that it is a feedback system and, hence, is characterized mathematically by the same equations that apply to other, more conventional feedback control systems. A mathematical model of the digital PLL system can be derived to analyze its transient and steady-state responses. The block diagram of a typical digital PLL system in the z domain [18] (discrete time) and its transformation into the s domain [19] (continuous time) using the bilinear transformation are shown in Figure 5.
The transfer function of the system is
The second-order DPLL system improves the performance of the loop in terms of speed and locking range compared to the first-order DPLL system. For this reason, the DPLL system used here is a second-order system. The unit step response of the system, obtained using MATLAB, is shown in Figure 6. From the figure it can be seen that the system is stable, with overshoot in the transient state.
The complete FM receiver consists of the basic building blocks as shown in Figure 7 . The FM receiver consists of four basic parts: (1) Phase Detector (PD), (2) Loop Filter (LF), (3) Direct Digital Frequency Synthesizer (DDFS), and (4) FIR Filter.
Phase Detector
The phase detector detects the phase error between the incoming frequency-modulated signal from the ADC and the output generated by the DDFS. This operation needs one register and one multiplier module. The modified radix-4 Booth-encoded Wallace-tree multiplier [20] [21] [22] architecture is used instead of a signed arithmetic multiplier. This architecture has been chosen because it reduces the number of partial products to N/2 for an N × M bit multiplication. Conventionally, a radix-4 Booth multiplier follows three basic steps: (1) generate the reduced partial products according to Booth's algorithm, (2) reduce the number of additions of the partial products, and (3) use a high-speed adder such as a carry look-ahead adder (CLA) for the last two rows of the partial product tree. For signed multiplication, the sign extension scheme is combined with Booth's algorithm, which is known as the modified Booth algorithm. To multiply X by Y using the radix-4 modified Booth algorithm, the bits of the multiplier are grouped three at a time, and each group is encoded into one of {−2, −1, 0, 1, 2} as per Table 1.

Figure 6: Step response of PLL-based FM demodulator (axes: time in seconds, amplitude).
The modified Booth encoder, shown in Figure 8(a), is implemented using logic gates. The partial products are generated using the Booth decoder shown in Figure 8(b). The block diagram of the Booth-encoded Wallace-tree multiplier [21] is shown in Figure 9.
While generating the partial products from the modified Booth decoder, we followed Fadavi-Ardekani's [23] sign extension prevention scheme. A Wallace-tree carry-save adder structure [24] has been used for adding Pi + 1 to Pi in a parallel fashion until only the last two rows remain. The last two rows are added using a very high-speed carry look-ahead adder (CLA) to obtain the final multiplication result. The block diagram of the designed 8 × 8 bit multiplier using the modified Booth algorithm is shown in Figure 9. Here the multiplicand is X and the multiplier is Y. The Y input is encoded by the Booth encoder to generate the encoded signal, which the Booth decoder uses together with X to generate the partial product terms. After all the partial products have been generated, the Wallace tree performs the addition in a parallel fashion. Finally, the CLA completes the multiplication of the two 8-bit numbers.
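The arithmetic behind the radix-4 modified Booth recoding can be illustrated with a short Python sketch. It models only the recoding and the resulting partial products, using the standard radix-4 rule (digit = y[i−1] + y[i] − 2·y[i+1]); it does not model the Wallace tree, the CLA, or the sign extension prevention scheme, and the function names are hypothetical.

```python
def booth_radix4_digits(y, bits=8):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits, LSB first."""
    y &= (1 << bits) - 1               # two's-complement bit pattern of Y
    padded = y << 1                    # append the implicit bit y[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        b0 = (padded >> i) & 1         # y[i-1]
        b1 = (padded >> (i + 1)) & 1   # y[i]
        b2 = (padded >> (i + 2)) & 1   # y[i+1]
        digits.append(b0 + b1 - 2 * b2)  # one of -2, -1, 0, 1, 2
    return digits

def booth_multiply(x, y, bits=8):
    """Multiply X by a signed `bits`-wide Y using bits/2 Booth partial products."""
    digits = booth_radix4_digits(y, bits)
    # Each partial product is a multiple of X, weighted by a power of 4.
    return sum((d * x) << (2 * i) for i, d in enumerate(digits))
```

An 8-bit multiplier yields four digits, so only four partial products (each 0, ±X, or ±2X) need to be summed, which is the N/2 reduction mentioned in the text.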
Loop Filter
The loop filter, a first-order lowpass filter, removes the high-frequency components of the phase detector output given by (4). Figure 10 shows the block diagram of the first-order loop filter used in the DPLL-based FM demodulator system. The transfer function of the loop filter is given by

$$H(z) = \frac{1}{1 - \alpha z^{-1}}. \quad (6)$$
Equation (6) can be implemented in hardware by adding the output signal from the phase detector (PD OUT) to the register output multiplied by a coefficient α = (1 − 1/16) = 15/16 = 0.9375, which is chosen to ensure system stability. Multiplication by a factor of 1/16 is implemented by a 4-bit right shift instead of a multiplier.
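A behavioral Python sketch of this shift-based recursion is given below. It assumes that the register holds the previous filter output, so that each new output equals PD OUT plus 15/16 of the previous output, with the 15/16 factor formed as y − (y >> 4); the function name and integer word handling are illustrative assumptions rather than details from the paper.

```python
def loop_filter(pd_out, state=0):
    """First-order recursive lowpass: y[n] = x[n] + (15/16) * y[n-1]."""
    out = []
    for x in pd_out:
        # (15/16) * state is formed as state - state/16, with /16 as a 4-bit right shift.
        state = x + state - (state >> 4)
        out.append(state)
    return out
```

For a single sample of 256 followed by zero, the output is 256 and then 240, that is, 256 × 15/16. For a constant input the output settles near 16 times the input, which reflects the DC gain 1/(1 − α) of this recursion.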
Direct Digital Frequency Synthesizer
The DDFS finds wide use as a component in modern communication systems, radio detectors, electronic warfare, high-precision measurement systems, and high-precision biomedical applications. The DDFS accepts an arbitrary frequency as its reference frequency and, depending on the frequency control word, generates one or more frequencies. The DDFS architecture was first given in [25]. The arithmetic blocks required to build a DDFS are a phase accumulator, which generates the phase of the cosine waveform, and a phase-to-amplitude converter. Considerable research has been performed on high-performance circuits for phase-to-amplitude conversion, as summarized in [26] [27] [28]. The quarter-wave symmetry ROM technique is very useful where a very low phase resolution is used [29]. Many ROM compression techniques have been proposed, but they are not suitable for low bit resolutions because they increase the error. The ROM-based DDFS has been designed for waveform synthesis in the DPLL-based FM demodulator. In the DPLL-based FM demodulator, the quadrature output of the DDFS is not required. For this reason, the ROM-based (LUT) architecture is considered superior to the CORDIC-based architecture [30] for the phase-to-amplitude conversion. To overcome the disadvantages of the ROM-based DDFS, namely, high power consumption and low speed, a pipelined ROM-based DDFS approach has been adopted in the present work. Pipelining helps reduce the power consumption and also maximizes the operating frequency. The designed pipelined look-up-table-based DDFS architecture is shown in Figure 11.
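The quarter-wave symmetry idea can be sketched behaviorally in Python. The sketch uses the parameters quoted for this DDFS in this section (an 18-bit control word, 1024 phase points per cosine cycle, 8-bit output) but is otherwise an assumption-laden illustration: the names are hypothetical, the ROM is sampled at half-step phase offsets so that a 1's complement of the address mirrors it exactly, the sign is applied by plain negation, and pipelining is not modeled.

```python
import math

ACC_BITS = 18    # phase accumulator / FCW width quoted in the paper
ADDR_BITS = 10   # 1024 phase points per cosine cycle
AMP_MAX = 127    # peak value of a signed 8-bit output

QUARTER = 1 << (ADDR_BITS - 2)   # 256 ROM entries cover 0..pi/2 only
# Half-step offset (assumption): entry i holds cos((i + 0.5) * pi / 512).
ROM = [round(AMP_MAX * math.cos((i + 0.5) * math.pi / (2 * QUARTER)))
       for i in range(QUARTER)]

def ddfs_lookup(phase):
    """Map an 18-bit phase value to a signed cosine sample via the quarter ROM."""
    addr = phase >> (ACC_BITS - ADDR_BITS)   # keep the top 10 phase bits
    quadrant = addr >> (ADDR_BITS - 2)       # top 2 bits select the quadrant
    index = addr & (QUARTER - 1)             # low 8 bits address the ROM
    if quadrant & 1:
        index = (QUARTER - 1) - index        # 1's complement mirrors the address
    value = ROM[index]
    if quadrant in (1, 2):
        value = -value                       # cosine is negative in quadrants 1 and 2
    return value

def ddfs(fcw, n):
    """Generate n cosine samples for a given frequency control word."""
    phase, out = 0, []
    for _ in range(n):
        out.append(ddfs_lookup(phase))
        phase = (phase + fcw) & ((1 << ACC_BITS) - 1)
    return out
```

Only 256 of the 1024 cosine values are stored; the two most significant address bits decide whether the stored value is read forwards or backwards and whether its sign is flipped, which is the source of the ROM area saving.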
The DDFS consists of a phase accumulator, a ROM look-up table, two 1's complementers, a pipeline register, and an XOR gate. The designed DDFS has a free-running frequency of 1 MHz and requires 1024 sample values to define one cycle of a cosine signal. The DDFS generates the cosine waveform by addressing the cosine ROM LUT at a frequency set by an 18-bit control word. If the reference system clock (Fclk) is set to 100 MHz, the frequency resolution is 100 MHz / 2^18 ≈ 381.47 Hz. According to the accumulation rate in the phase accumulator set by the FCW, the ROM produces the cosine waveform at the programmed frequency. In this implementation, the frequency control word (FCW) and the output have been chosen to be 18 bits and 8 bits wide, respectively, which provides a spurious-free dynamic range (SFDR) of 64.3 dB. As the design is pipelined, frequency switching incurs a latency of 2 clock cycles.
MATLAB version 7.4.0 is used for the performance analysis of the designed DDFS blocks. The floating-point cosine wave generated using the MATLAB built-in function and the cosine wave generated by the proposed pipelined ROM-based DDFS have been compared. The results are shown in Figure 12. Here FLTPNT COSINE is the MATLAB-generated cosine wave and FXDPOINT COSINE is the cosine wave generated by the proposed ROM-based DDFS. The error between these two signals in the first quadrant is shown in Figure 13 (only one quadrant is needed because the quarter-wave symmetry property has been adopted). The minimum error is −0.0088 and the maximum error is 0.0089, which is simply the quantization error (as 8 bits of amplitude are used in the proposed design).
FIR Filter
At the last stage of the receiver, a lowpass finite impulse response (FIR) filter is used to perform the signal shaping. Here a 16-tap transposed FIR filter architecture [31] is used, as shown in Figure 14. This filter is essentially an averaging filter, since its output is equal to the average of its last n input samples, where n is the number of taps. Because the total propagation delay of a direct-form digital FIR filter grows with the chain of additions over the 16 data samples, a transposed FIR filter architecture is chosen [32] [33] [34] in the present implementation. All the coefficients are equal to 1/16, which can be implemented by a 4-bit right shift. Hence no multiplier is required.
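The transposed averaging structure can be sketched behaviorally in Python. The sketch assumes that each input is shifted right by four bits before entering the adder chain (one possible arrangement; shifting after the additions would preserve more precision), and the function name and register model are illustrative, not the authors' implementation.

```python
def transposed_fir16(samples, taps=16, shift=4):
    """16-tap averaging FIR in transposed form with all coefficients equal to 1/16."""
    regs = [0] * (taps - 1)   # delay registers sitting between the adders
    out = []
    for x in samples:
        p = x >> shift        # coefficient 1/16 applied as a 4-bit right shift
        out.append(p + regs[0])
        # Each register stores the shared product plus the old value of its neighbor.
        regs = [p + regs[i + 1] for i in range(taps - 2)] + [p]
    return out
```

In this form every adder sees only one register output and the shared product, so the critical path is a single addition regardless of the number of taps, which is the delay advantage over the direct form. A constant input of 16 ramps the output from 1 up to 16 over the first sixteen samples and then holds it there.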
FPGA Implementation Details
Synthesis Results.
The proposed FM demodulator has been described using the Verilog hardware description language, and Xilinx ISE 9.2i is used for synthesis and FPGA implementation. The Xilinx XC2VP30-7FF896 device has been used as the target device for FPGA implementation, XST as the synthesis tool, and XPower for power calculation. The power is calculated from the simulation-based switching activities of all the signals. The synthesis results for the FM modulator and demodulator are listed in Table 2. Table 3 lists the detailed dynamic power analysis results obtained by applying a 100 Mbps data rate to the FM modulator and demodulator. Table 4 shows the component-wise implementation reports.
Simulation Results.
For the post-place-and-route simulation in FPGA, the ModelSim XE 6.3c Starter version from Mentor Graphics is used as the logic simulator. The modulated response of a 10 kHz triangular wave is shown in Figure 15. The demodulated response of the FM-modulated signal is shown in Figure 16. The carrier frequency has been taken to be 1.5 MHz by setting the frequency control word to 512 and the input clock to 100 MHz, with a modulation index of 10. In Figure 15, the signals from the top represent the input triangular wave (TRIANG INPUT), the frequency control word (FCW) for setting the carrier frequency, and the modulated data (FM MOD). In Figure 16, the signals from the top for the triangular wave are the demodulated output data (FM DEMOD), the modulated input data (FM MOD), the FIR filter output (TRI FIR), the loop filter output data (TRI LOOP), the DDS output data (TRI DDS), and finally the phase detector output (TRI PD). In the initial simulation phase, the demodulated output overshoots because the phase synchronization is still converging; after that the system is stable.
On-Chip-Verified Results.
The designed system has been downloaded to the Virtex-II Pro University Board using the Xilinx iMPACT tool. Xilinx ChipScope Pro 9.2i has been used to capture the demodulated (FM DEMOD) data in order to verify the FPGA implementation of the designed circuit. Here 2048 samples of the output have been captured after implementing the design in the FPGA. The captured output results are shown in Figure 17 for the triangular input. It can be concluded from inspection of these figures that
Comparison Results.
By optimizing the basic components of the FM demodulator, the hardware usage has been reduced and the performance improved. Table 5 summarizes the comparison with other ROM compression techniques for implementing direct digital synthesis. Table 6 shows the comparison with other existing FPGA implementations of FM demodulators [7, 11].
The proposed circuit has been synthesized using Leonardo Spectrum 2005b.24 Level 3 from Mentor Graphics with TSMC 350 nm (typical) as the target technology library. During synthesis, speed was the main constraint for the designed circuit. Another FM receiver circuit has also been designed and synthesized using Leonardo Spectrum 2004a.63 from Mentor Graphics with TSMC 350 nm (fast) as the target technology library. From Tables 7 and 8, it is observed that the FM demodulator designed in this work performs better than the available DPLL-based FM demodulators [5, 6].
Conclusions
A new high-performance digital FM modulator and a digital phase-locked loop-based FM demodulator have been proposed in this paper. The FM modulator and demodulator are designed to satisfy the constraints of personal wireless communication and digital audio broadcasting applications. Component-wise optimization has made the overall design superior to other implementations. FPGA implementation of the proposed design has been carried out for quick prototyping of the digital FM modulator and demodulator chip. The simulation and synthesis results of the FM modulator show that digital up-conversion is feasible, as it achieves a maximum clock frequency of 334.5 MHz. From the on-chip-verified results it can be clearly seen that the proposed FM demodulator recovers the signal in its original form while using only 6.4 K equivalent gates. The comparison results for both FPGA and ASIC implementations show that the proposed design is superior to the existing digital FM chips. Hence it is concluded that the designed high-performance FM modulator and demodulator can be easily fitted into next-generation software-defined radio-based handsets, where low power and minimum hardware utilization with maximum clock frequency are desired features.
