Abstract: Dynamic partial reconfiguration (DPR) is an efficient technique for realising several application categories, such as signal processing, on low-cost field programmable gate arrays (FPGAs). The present work demonstrates a generic framework for implementing a software defined radio (SDR) based communication system using DPR. The design switches contexts between two partial reconfiguration blocks, namely a spectrum estimation block and a frequency shift keying (FSK) receiver block. The former uses a streaming fast Fourier transform (FFT) and the latter uses frequency shifting and filtering stages. The complete FSK receiver is simulated using ModelSim. A Xilinx Zynq 7010 SoC with DPR is used for implementation. An FSK signal with a symbol rate of 64 kbps drives the analogue to digital converter (ADC) input. The work demonstrates a novel direction for SDR implementation with DPR on low-area FPGAs. The results show that 45% fewer FPGA DSP48 slices are used compared to the implementation without DPR. Reduced power dissipation is observed as a by-product.
Introduction
Dynamic partial reconfiguration (DPR) is a method for field programmable gate arrays (FPGAs) in which a portion of the FPGA is configured on the fly (at run time) with the required functionality, so that the same FPGA region can serve different applications. The most important advantage of DPR is that the functionality of the FPGA can be changed or updated at a later time. DPR makes only a portion of the FPGA, the dynamic area, reprogrammable, without disturbing the static area or the system setup. The flow of data is unaltered and the FPGA programming is upgraded on the fly. The subsystem functionality may, for example, be loaded with the next version of an algorithm. A partial bitstream is used for DPR instead of a full bitstream. As a result, reconfiguration times are reduced considerably. Researchers are exploring DPR to achieve several advantages such as low area, low power and multiple configurations.
Most research in DPR is application specific: a particular application is prototyped using the DPR technique, and researchers address challenges that are specific to each design. As a result, even though DPR has been studied for the last two decades, there is no major industrial utilisation of this feature. There is therefore a need to study DPR for low-area signal processing applications. The key element of the present research is to establish a framework for realising any software defined radio (SDR) digital signal processing application using DPR. DPR considerably extends the functionality available from a single FPGA, and SDR is one application that benefits. In SDR-type applications, time multiplexing of functionalities is possible, so the DPR way of programming the FPGA yields good results. The hardware is switched, not the software. The major challenges in using DPR for SDR applications are given below.
• Most SDR applications demand streaming sample processing. In data stream processing there is continuity from one block of samples to the next, and the DPR module must be able to process the data correctly.
• To implement the FSK demodulator effectively using DPR, the receiver architecture needs to be organised to ensure immediate usability after configuration.
• A state machine-based controller must be implemented on the ARM Cortex-A9 processor to configure each partial reconfiguration block according to the sequence of operations in the SDR and spectrum sensing algorithms. This state machine should be able to operate with two partial bit files, which can have asynchronous operations and different execution times, latencies and initial conditions.
• Reducing the latency and improving the throughput in the presence of unavoidable reconfiguration times.
• Finding enough memory resources on the FPGA for storing intermediate results.
• Optimising the design for fewer FPGA resources, especially the dedicated DSP48 blocks of Xilinx FPGAs, so that the design fits in a low-cost FPGA.
In this paper, an architecture addressing the above challenges is proposed and verified on a Zynq FPGA board. The remainder of this paper is organised as follows. Section 2 discusses the related work done by other researchers. Section 3 illustrates the architecture of the proposed design. Section 4 gives the implementation details. Section 5 presents the VHDL simulation results. Section 6 presents the hardware implementation and verification. Section 7 presents the conclusion of the work.
Literature survey
The present work is part of ongoing research towards developing a digital demodulator using DPR. The work presented in Reddy et al. (2015) discusses the block-based processing concept for developing an FSK demodulator, addressing some of the challenges discussed above. That work analyses the advantages of a dual memory architecture for partial reconfiguration. At present, Xilinx FPGAs offer only a single partial reconfiguration memory. The present research continues the work of Reddy et al. (2015) by prototyping the proposed block-based processing on current Zynq FPGAs, which are based on a single partial reconfiguration memory. In the early 1970s, the need for reconfiguration was identified and the first equipment released was 'SPEAK easy' (Redpitaya, 2016). The concept of a reconfigurable terminal first appeared in the military area. The requirement that a flexible terminal be adaptive and reconfigurable was well explained by Polydoros et al. (2003).
A system is said to be adaptive if it can respond to application changes by properly altering the numerical values of a set of parameters (Harada et al., 2000).
Joseph and Nirmal (2012) realised an SDR on a partially reconfigurable Xilinx FPGA using different modulation techniques and reported reduced power, increased speed and reduced area.
Much attention has been given to the 'software radio' concept, both recently and in professional journals (Mitola, 1995; Doner, 1996). Karabi and Abolhassani (2014) describe an application of reconfigurable hardware cores for SoC SDR in a communication network. Their framework architecture is based on the partial reconfiguration technique and opens a wide horizon in hardware design. The first digital processing step is to down-convert the real signal into a complex baseband signal, which is done by multiplying the IF signal with sine and cosine signals from a numerically controlled oscillator (NCO). The CORDIC algorithm is one of the common methods for generating sine and cosine functions in digital hardware. Since CORDIC is an iterative algorithm, a pipelined version would most likely be used to provide sufficient throughput. It is also possible to use an optimised, reconfigurable CORDIC processor to achieve higher performance and a smaller area requirement (Keller, 2000).
In order to support a change of channel between transmission slots, a down-converter must be able to adapt quickly to different frequencies. The performance of a run-time reconfigurable system is dictated by the reconfiguration downtime. According to Seng et al. (2002), the overall performance usually improves despite the configuration overhead if reconfigurable hardware is used as an accelerator for software functions.
Compton and Hauck (2002) and Hartenstein (2001) proposed approaches to improve performance, flexibility and adaptability in a large range of signal processing applications. Communication devices can support multiple standards and switch from one to another. To support different standards, as in SDR systems, the key challenge is to build the communication device around a flexible device that can dynamically configure itself to run different algorithms.
An FPGA can implement any digital circuit using its programmable structure (Kuon et al., 2007). FPGAs integrate a partial reconfiguration feature, which helps in changing the functionality of the FPGA. Further work related to FPGAs, partial reconfiguration and QAM modulation is surveyed in Kuon et al. (2007).
Becker et al. (2007) describe an accelerator approach that results in a smaller bitstream and shorter programming delays, since DPR allows much faster reconfiguration by changing only a small portion of the FPGA. A radio system is divided into sub-modules and uses DPR to exchange the sub-modules without interrupting the overall data flow. The sub-modules share the available logic resources on the FPGA and thus allow the use of a smaller and cheaper FPGA, similar to DSP threads sharing a processing unit. The contribution of that paper is an evaluation of the proposed buffered architecture, with a Digital Audio Broadcast (DAB) EUREKA 147 receiver (Comitee, 2006) used as the example application.
Field programmable gate arrays (FPGAs) have steadily gained importance since their introduction in the 1980s, both in commercial and in research settings. FPGAs are gaining popularity due to their increased capability and performance. They contain thousands of look-up tables (LUTs), flip-flops and a large variety of other built-in digital components. DPR is a feature available on most recent FPGAs. Sedcole et al. (2006) emphasise the enormous possibilities that partial reconfiguration offers for implementing system adaptivity. As a result of this and other advancements, FPGAs have evolved beyond their early role as prototyping and hardware emulation platforms and are now widely used in both industrial and research applications.
According to Koch and Tørresen (2010), partial runtime reconfiguration makes it possible to fit circuits on an FPGA that would exceed the device capacity in a purely static implementation. This implementation technique is applicable only if the system contains modules with mutually exclusive functionality or if the utilisation of some modules allows time-multiplexing of the same FPGA resources. Partial runtime reconfiguration can also be used to accelerate a system. The execution time of each task can be reduced by providing more area for that task and sharing the same area among tasks with the help of runtime reconfiguration. Assuming that a task can start only after the completion of its predecessor, the total execution time (latency) of all tasks can be reduced. For example, in a system providing hardware acceleration for secured SSL network data transfer, runtime reconfiguration can be used to switch between an accelerator module for the asymmetric key exchange and a symmetric cipher module that is used for the data encryption.
In summary, partial runtime reconfiguration can help to use smaller devices, which saves monetary cost, power consumption and space in a system. Some systems can also use it to reduce latency or increase throughput. In spite of these benefits, partial runtime reconfiguration is still exotic and not popular in industrial systems, as discussed in Kao (2005). Runtime reconfiguration is not widely applied because not all systems are suitable for profitably applying this technique (e.g., if all modules are active at all times) and because tool support for implementing such systems is weak. Moreover, partial runtime reconfiguration is not supported in all FPGA devices.
Proposed design and high level architecture
The proposed architecture for partial reconfiguration-based SDR is shown in Figure 1.
The ARM core in the Zynq SoC (the processing system, PS) acts as the controller block and also carries out the partial reconfiguration. The two modules, spectrum estimation and FSK demodulator, are each compiled as a partial bit file. Their input and output ports are kept at the same bit widths so that partial reconfiguration is possible. At power-on, the FPGA fabric of the Zynq (the programmable logic, PL) is loaded with the full bit file and the ARM Cortex-A9 processor runs the given software. The software is developed such that the spectrum estimation block is configured first after power-on. The spectrum estimation block continuously checks for the presence of signal power in the band 0-62.5 MHz. This is because the ADC samples at 125 MHz, so the spectrum that can be sensed is 0 to 125/2 MHz. The spectrum peak and its index value are read by the processor continuously. When the spectrum peak crosses a predefined threshold, the processor infers the presence of a signal at the frequency associated with the index value. From the index value, the processor decides the carrier frequency of the modulated signal. The spectrum estimation is based on a streaming FFT calculation. Next, the demodulator partial bit file is loaded in place of the spectrum estimation block in the dynamic portion of the PL. After the demodulator is configured, the carrier frequency is set through the nco_freq_word tuning word, based on the peak index value. The signal generator output is applied to the ADC, which is the same for both partial bit files. The output of the FSK demodulator is observed on an oscilloscope.
While the demodulator is running in the PL, the PS continuously reads the symbol magnitude levels. When the magnitude levels fall below a certain level, it is inferred that the signal frequency has changed or the signal has been discontinued. The processor senses this condition and loads the spectrum estimation partial bit file again. This process repeats continuously, and the associated states and signals are made available on the FPGA board. The developed architecture is suitable for SDR where run-time configuration and detection of signal parameters are desirable. The design metrics optimised in this work are area, cost and power, and the following analysis explains them in detail. A specific SDR receiver application is considered here for analysis. The FSK demodulator and spectrum estimation blocks can be time-multiplexed, so the partial reconfiguration approach can be readily applied. The intermediate results can be saved in memory. A generalised approach to partial reconfiguration on an FPGA is presented here.
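The switching policy described above can be sketched as a two-state controller. The following Python sketch is illustrative only: the names `Mode`, `next_mode` and `run`, and the default threshold values, are hypothetical and do not come from the PS software of this work, and the actual loading of a partial bit file is left out.

```python
from enum import Enum, auto


class Mode(Enum):
    SPECTRUM = auto()  # spectrum estimation partial bit file is active
    DEMOD = auto()     # FSK demodulator partial bit file is active


def next_mode(mode, peak_value, symbol_magnitude,
              peak_threshold, magnitude_floor):
    """Return the partial bit file that should be active next."""
    # A spectrum peak above the threshold means a signal is present:
    # switch to the demodulator.
    if mode is Mode.SPECTRUM and peak_value > peak_threshold:
        return Mode.DEMOD
    # Low symbol magnitudes mean the signal moved or stopped:
    # switch back to spectrum estimation.
    if mode is Mode.DEMOD and symbol_magnitude < magnitude_floor:
        return Mode.SPECTRUM
    return mode


def run(readings, peak_threshold=1000, magnitude_floor=50):
    """Replay (peak_value, symbol_magnitude) readings; return the mode trace."""
    mode = Mode.SPECTRUM  # spectrum estimation is configured first
    trace = []
    for peak_value, symbol_magnitude in readings:
        mode = next_mode(mode, peak_value, symbol_magnitude,
                         peak_threshold, magnitude_floor)
        trace.append(mode)
    return trace
```

In a real controller, each change of `Mode` would trigger the download of the corresponding partial bit file, and only the reading that belongs to the active block would be valid.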
Let A, B, C and D be four different time-multiplexed blocks that need to be configured. Each of these blocks has its own power, resources (area) and associated cost. With the partial reconfiguration approach, the memory blocks are configured outside the partial reconfiguration zone and a state machine is maintained to control the memory and partial bit file settings. The proposed partial reconfiguration-based realisation configures the partial bit files one after another and maintains the state of each bit file output through memory blocks to achieve an optimal solution. As explained in the previous sections, the DPR-based solution yields the following parameters.
Power consumption with DPR:
$$P_{DPR} = \max(P_A, P_B, P_C, P_D) + \max(P_{AB}, P_{BC}, P_{CD})$$
Resources needed with DPR:
$$R_{DPR} = \max(R_A, R_B, R_C, R_D)$$
The corresponding values for the case without DPR are as follows.
Power consumption without DPR:
$$P_{WO\_DPR} = P_A + P_B + P_C + P_D$$
Resources needed without DPR:
$$R_{WO\_DPR} = R_A + R_B + R_C + R_D$$
Hence, it is evident from the above discussion that the power consumption, resources needed and cost function with DPR are improved compared with their counterparts without DPR. Optimising $R_{DPR}$ increases the logic density. It also yields savings in $P_{DPR}$ and $C_{DPR}$ as a by-product.
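As a rough illustration of the maximum-versus-sum comparison above, the sketch below evaluates both cases for four blocks. The function names and all numeric values (in milliwatts) are made up for illustration; they are not measurements from this work.

```python
def with_dpr(block_values, transition_values=()):
    """Largest block value plus the largest transition term (0 if none given)."""
    overhead = max(transition_values) if transition_values else 0
    return max(block_values) + overhead


def without_dpr(block_values):
    """All blocks are present at once, so their values add up."""
    return sum(block_values)


# Hypothetical power figures in mW for blocks A, B, C, D and for the
# AB, BC, CD transition terms.
block_power_mw = [500, 400, 300, 200]
transition_power_mw = [50, 40, 30]

p_dpr = with_dpr(block_power_mw, transition_power_mw)
p_wo_dpr = without_dpr(block_power_mw)
saving = 1 - p_dpr / p_wo_dpr
```

The same two functions apply to resources by passing block areas and omitting the transition terms.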
Spectrum estimation
Spectrum estimation is implemented with a streaming FFT module.
The pipelined, streaming I/O solution pipelines several Radix-2 butterfly processing engines to offer continuous data processing, as shown in Figure 2. Each processing engine has its own memory banks to store the input and intermediate data. The core can simultaneously perform transform calculations on the current frame of data, load input data for the next frame and unload the results of the previous frame. The user can continuously stream in data and, after the calculation latency, continuously unload the results. If preferred, this design can also calculate one frame by itself or frames with gaps in between (Xilinx, Inc. US, 2011). Figure 3 shows the peak detection logic of the spectrum estimation. The 1024-point FFT is part of the spectrum estimation block and generates the outputs FFT_re_out and FFT_im_out. These outputs are applied as inputs to the peak detection logic. First, the magnitude is computed using multipliers and an adder. The maximum value of the magnitude is tracked over the total index range, from 1 to 1024, and is indicated as max_value in Figure 3. The index value is also stored in a register whenever a new max_value is detected. The max_value and the corresponding index value are presented at the output at peak_detection_end. These values are fed to the processor (PS).
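The peak detection step can be sketched in a few lines of Python. This is a behavioural sketch under stated assumptions, not the VHDL of this work: it assumes the magnitude is the squared magnitude re² + im² (two multipliers and one adder), and the function name `detect_peak` is hypothetical.

```python
def detect_peak(fft_re, fft_im):
    """Return (max_value, index), with the index counted from 1 as in the text."""
    max_value = -1
    max_index = 0
    for index, (re, im) in enumerate(zip(fft_re, fft_im), start=1):
        # Squared magnitude: two multiplications and one addition.
        magnitude = re * re + im * im
        # Store the index only when a new maximum is found.
        if magnitude > max_value:
            max_value = magnitude
            max_index = index
    return max_value, max_index
```

For example, `detect_peak([1, 0, 3, -4], [0, 2, 4, 0])` scans the squared magnitudes 1, 4, 25 and 16 and reports the third bin.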
Area efficient SDR receiver without and with dynamic partial reconfiguration

FSK demodulation
The block diagram of the FSK demodulator is shown in Figure 4. The FSK demodulator consists of the following blocks: spectrum shifting, decimation, digital down-converter (DDC), complex multiplication, symbol filtering, magnitude computation and decision device. The FSK demodulator is explained in more detail in Reddy et al. (2016).
The FSK input is applied to the spectrum shifting block. The spectrum is shifted based on the value of the NCO word, which can be configured by the user through a virtual input control. The shifted signal is fed to the next block, decimation. The amount of decimation can also be configured using the virtual input.
The decimated signal is fed to the DDC. The output of the DDC is applied to two parallel paths, each tuned to one FSK tone. In both paths, complex multiplication is used for symbol shifting. The upper data path is for FSK tone 0 and the lower data path is for FSK tone 1.
For each symbol, the magnitude is calculated by the magnitude computation logic. Based on the magnitudes of the two symbols, the decision device decides between symbol 0 and symbol 1. Finally, the decision device produces the demodulated bit, which is the FSK demodulated data output.
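The two-path detection described above can be illustrated with a simplified floating-point model. This sketch makes several assumptions that are not taken from the implemented design: a plain sum over one symbol stands in for the symbol filter, the larger tone-0 power is mapped to bit 0, and the tone frequencies (±32 kHz), the sample rate and all function names are hypothetical.

```python
import cmath
import math


def tone_power(samples, tone_hz, sample_rate_hz):
    """Shift one tone to DC, sum over the symbol and return |sum|^2."""
    acc = 0j
    for n, x in enumerate(samples):
        # Complex multiplication with the conjugate tone (symbol shifting).
        acc += x * cmath.exp(-2j * math.pi * tone_hz * n / sample_rate_hz)
    return abs(acc) ** 2


def demodulate_symbol(samples, tone0_hz, tone1_hz, sample_rate_hz):
    """Decision device: pick the tone with the larger power."""
    p0 = tone_power(samples, tone0_hz, sample_rate_hz)
    p1 = tone_power(samples, tone1_hz, sample_rate_hz)
    return 0 if p0 > p1 else 1


def make_symbol(bit, tone0_hz, tone1_hz, sample_rate_hz, n_samples):
    """Generate one noiseless complex-baseband FSK symbol for testing."""
    f = tone1_hz if bit else tone0_hz
    return [cmath.exp(2j * math.pi * f * n / sample_rate_hz)
            for n in range(n_samples)]
```

With tones at ±32 kHz, a 1.25 Msps sample rate and 20 samples per symbol, the off-tone path sums over roughly one full rotation and so contributes very little power, which is what makes the comparison reliable in this noiseless model.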
Implementation
The Redpitaya development board has several features that suit communication demodulator applications. The present work uses the 14-bit ADC available on this board. The ADC takes the signal from the signal generator, which acts as the input to the spectrum sensing block and the FSK demodulator. Figure 5 shows the Redpitaya development board (Redpitaya, 2016).
The Redpitaya FPGA board has two analogue to digital converters. The ADCs use LTC-type technology. Figure 6 shows the block diagram of the ADC.
Spectrum estimation
This section describes the implementation aspects and simulation of the spectrum estimation and FSK demodulator architectures used for partial reconfiguration. Table 1 shows the specifications of the spectrum estimation block considered for the present design. Table 2 shows the specifications of the FSK demodulation block considered for the present design.
FSK demodulation
Simulation results
This section presents simulation results for the two DPR blocks. Figure 7 shows the simulation results of the spectrum estimation. The signal clk is the main operating clock, which runs at 125 MHz. The signal pinc_in is the 32-bit DDS frequency word, which is the input of the DDS IP core. This frequency word is calculated relative to the 125 MHz clock frequency. It is set to '0A3D70A3' in hexadecimal, which corresponds to a frequency of 5 MHz, so the DDS generates sine and cosine signals at 5 MHz. The signals din1_re and din1_im are the inputs of the spectrum estimation. The red circle on the signal corr_val_out12 marks the computed peak, and the corresponding index is available on the signal below it. The signal max_corr_index_out12 shows the index of the maximum peak detected; this value can be observed clearly in Figure 8.
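The relation between the frequency word and the output frequency can be checked with a short calculation. The sketch below assumes the usual phase-accumulator relation, word = f × 2^32 / f_clk, with truncation to an integer; the function names are hypothetical and are not part of the DDS IP core.

```python
def dds_phase_increment(freq_hz, clock_hz, width_bits=32):
    """Frequency word for a phase accumulator (integer inputs, truncated)."""
    return (freq_hz << width_bits) // clock_hz


def dds_output_frequency(word, clock_hz, width_bits=32):
    """Output frequency in Hz produced by a given frequency word."""
    return word * clock_hz / (1 << width_bits)


# 5 MHz output from a 125 MHz clock, as in the simulation.
word = dds_phase_increment(5_000_000, 125_000_000)
word_hex = format(word, "08X")
```

Truncation rather than rounding reproduces the value quoted in the text; the resulting output frequency is within a fraction of a hertz of 5 MHz.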
In Figure 8, max_corr_val_out12 is the maximum peak value detected by the peak detection logic. The peak is detected at index 983, which corresponds to a frequency of 5 MHz for an FFT length N of 1024: index 983 lies 41 bins below N, and 41 × 125 MHz / 1024 ≈ 5 MHz. So, the spectrum estimation produces the correct output.
Figure 9 shows the top-level module simulation of the FSK demodulator. Here, clk is the main operating clock, which runs at 125 MHz. The second signal, rst, is the asynchronous reset of the module. The signal fsk_bit is the input data, generated at 64 kbps. The input data enable is named fsk_bit_en_rising. FSK modulated data is generated to validate the implemented demodulator. The modulator requires two tones, which are generated for input data bits '0' and '1'. The I and Q signals of the FSK modulated data can be observed on cosine_out and sine_out in Figure 9. These I and Q signals are given as inputs to the FSK demodulator. The signals complx_multiplier_out_i_1 and complx_multiplier_out_q_1 are the outputs of the complex multiplier. The signal sig_demod_out is the FSK demodulated output data bit. The demodulator detects the same bits as the input, as indicated by the arrow in Figure 9.
Figure 10 shows the results of the complex multiplier. The signals ar, ai, br and bi are the real and imaginary inputs, and pr and pi are the outputs. Of the four inputs, two are real parts and two are imaginary parts. The I and Q signals of the FSK modulated data are applied as two inputs, and the other two are driven with the cosine and sine generated by the DDS.
Figure 11 shows the results of the low-pass filter with a sampling frequency of 1.25 Msps and a clock rate of 125 MHz. The cut-off frequency is around 42 kHz. The filter input is din and the output is dout. Four such filters are used in the design; the figure shows the results of one of them.
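The index-to-frequency conversion used for the peak in Figure 8 can be checked with a small sketch. It assumes a 0-based bin index and treats bins in the upper half of the FFT as negative frequencies; the function name is hypothetical, and the sign of the result depends on the I/Q convention of the DDS.

```python
def bin_to_frequency_hz(index, n_fft, sample_rate_hz):
    """Map a 0-based FFT bin index to a signed frequency in Hz."""
    # Bins in the upper half of the FFT alias to negative frequencies.
    if index >= n_fft // 2:
        index -= n_fft
    return index * sample_rate_hz / n_fft


# Peak index from the simulation, 1024-point FFT, 125 MHz sample rate.
peak_frequency_hz = bin_to_frequency_hz(983, 1024, 125_000_000)
bin_spacing_hz = 125_000_000 / 1024
```

The magnitude of `peak_frequency_hz` is about 5.005 MHz, which is within one bin spacing (about 122 kHz) of the 5 MHz test tone.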
The signals symbol_0_power and symbol_1_power are the magnitudes computed for the two symbols, shown in Figure 12. The magnitude comparator decides whether the detected bit is '0' or '1'. If symbol_0_power is greater than symbol_1_power, the detected bit is '0'; otherwise it is '1'.
Hardware verification
The developed VHDL code is synthesised for the Zynq FPGA using Xilinx ISE tools. Table 3 shows the device utilisation summary of the spectrum estimation and FSK demodulator blocks. The area analysis of the implementations without and with DPR is shown in Table 4. The implementation without DPR requires both the spectrum estimation and FSK demodulation modules to be present at the same time, so the area required is the sum of the two modules. In the implementation with DPR, only one module is present at a time, so the area required is the maximum of the two modules, which results in a smaller area. The details are given in Tables 3 and 4.
Figure 13 shows the experimental setup for DPR. The left instrument is the signal generator, which generates the FSK modulated input data, and the right instrument is the oscilloscope used to observe the demodulated data. Two partial bit files are generated for the Zynq FPGA, one for spectrum sensing and the other for the FSK demodulator. The system switches between these two bit files automatically as required. The spectrum sensing bit file is loaded to find the frequency of the signal, and the corresponding NCO word is calculated for the FSK demodulator. The FSK demodulator bit file is loaded immediately after the frequency of the input signal has been computed. The FSK demodulator is given the NCO word, shifts the signal according to the NCO frequency and then performs demodulation. If the input signal frequency changes, the spectrum sensing bit file is loaded again automatically. This process of partial bit file configuration repeats whenever the input signal frequency changes. The demodulated output is observed on the oscilloscope (top right instrument in Figure 13) and compared with the generated bit sequence.
The observed output shows the same bit pattern that is set in the signal generator. The pattern '10101010' is applied in the signal generator, and the same pattern can be seen on the oscilloscope. The configuration of the two bit files one after another, depending on the input signal conditions, is thus verified under practical test conditions. Table 5 shows the power analysis carried out for both modules with the XPower tools provided by Xilinx. The power dissipation in a CMOS circuit is classified as dynamic power and static power. The power consumption in an FPGA is predominantly dynamic power. Power is estimated for an FPGA at three different levels, namely the concept phase, the design phase and the system integration phase. The dynamic power is calculated with the following formula.
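$$P_{dynamic} = C_{EFF} \, V_{DD}^{2} \, f$$
where $C_{EFF}$ is the effective switched capacitance, $V_{DD}$ is the supply voltage and $f$ is the clock frequency, with
$$C_{EFF} = \sum_{i} \alpha_i C_{Li}$$
where $\alpha_i$ is the switching activity factor at node $i$ and $C_{Li}$ is the load capacitance at node $i$.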
To obtain the full functionality without DPR, both the FSK demodulator and spectrum estimation blocks are present at the same time, so the power required is the sum of the two blocks' power, which is 1.092 W. With DPR, only one block is configured at a time, so the power required is only the maximum of the two blocks, which is 0.549 W. The same principle applies to area.
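As a quick check of these figures, the relative power saving follows directly from the two values quoted above, using the symbols $P_{DPR}$ and $P_{WO\_DPR}$ from the analysis in Section 3:

$$\frac{P_{WO\_DPR} - P_{DPR}}{P_{WO\_DPR}} = \frac{1.092 - 0.549}{1.092} = \frac{0.543}{1.092} \approx 0.497$$

This is a saving of roughly 49.7%. Since 1.092 W is the sum of the two blocks and 0.549 W is the larger of the two, the smaller block accounts for the remaining 0.543 W.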
Conclusion
The present work demonstrates a novel architecture for SDRs under a DPR framework. The DPR framework allows the spectrum estimation block, the demodulator block and any other block to be configured in a time-multiplexed manner. This is an essential requirement in SDR, as several functionalities of the radio need to be developed through software. In most applications these modules are not required to run concurrently. This opens a window of opportunity for optimising important design metrics, achieving low area, low power and low cost at the same time without compromising the throughput of the system too much. By swapping configurations on the fly at run time with partial reconfiguration, it is possible to develop a complete SDR application with multiple contexts in a single configuration.
The work demonstrates spectrum estimation and FSK demodulation configured using DPR. The results show correct functionality up to 64 kilosymbols per second. They demonstrate full functional validation, including run-time state detection and partial reconfiguration of the spectrum estimation and FSK demodulator blocks. The complete SDR receiver is implemented through two partial bit files. The work is realised on a Zynq 7010 SoC platform with slice utilisation of 18%, BRAM36 utilisation of 41%, BRAM18 utilisation of 9% and DSP48 utilisation of 32%. The power saving and average area saving amount to 49% and 40% respectively, compared with the case without DPR. These savings are traded off against latency and throughput. The work will be continued towards a cognitive radio (CR) transmitter and receiver that detect white space in order to utilise the available spectrum efficiently, with the spectrum shared dynamically between licensed and cognitive users.
