Abstract
Introduction
With the recent increase in semiconductor integration density, the capacity of Field Programmable Gate Arrays (FPGAs) has also increased. As capacity grows, larger modules can be loaded into an FPGA, so complex algorithms can also be processed in hardware.
Unlike an Application Specific Integrated Circuit (ASIC), an FPGA allows a module to be corrected quickly and reused when a problem occurs in its operation. However, synthesizing modules for an FPGA takes a long time on a general desktop PC, as opposed to a server-class PC. In particular, when various items need to be tested at different frequencies of the clock used in the created module, synthesis must be repeated every time the clock frequency changes [1]. This paper therefore designs hardware whose clock frequency can be controlled from an application program, without a re-synthesis step, even when the frequency of the clock used by the internal module of the FPGA changes, and proposes a method to solve the problems that this causes.
IJACT 16-4-5
Design of Hardware Loaded with Mixed-Mode Clock Manager Module and Application Program

Mixed-Mode Clock Manager
The Mixed-Mode Clock Manager (MMCM) module [2] is provided by Xilinx, a supplier of programmable logic devices that develops FPGAs and synthesis tools. The MMCM module generates multiple clocks with different frequencies from the input clock, named CLKIN1. Its parameters control the clock frequency, so a clock of the desired frequency is derived from the input clock. Table 1 summarizes the input/output signals used in the MMCM module.
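As a point of reference, the relation between the MMCM input and output frequencies can be written as follows. The symbols M (feedback multiplier), D (input divider), and O (CLKOUT0 output divider) are notation introduced here for illustration and are not taken from Table 1. The 100 MHz input and the values of M, D, and O in the example are assumed, not the settings used in this paper.

```latex
f_{\mathrm{VCO}} = f_{\mathrm{CLKIN1}} \cdot \frac{M}{D},
\qquad
f_{\mathrm{CLKOUT0}} = \frac{f_{\mathrm{VCO}}}{O}
                     = f_{\mathrm{CLKIN1}} \cdot \frac{M}{D \cdot O}
% Example with assumed values: f_CLKIN1 = 100 MHz, M = 10, D = 1
% gives f_VCO = 1000 MHz; O = 10 yields 100 MHz and O = 20 yields 50 MHz.
```

Under these assumptions, changing only the output divider is enough to switch between the two clock frequencies used later in the experiments.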
Hardware Design Method for Controlling Input Clock
To control the clock frequency from the application program, the MMCM module must be added to a hardware wrapper. As shown in Figure 1, the clock supplied to the wrapper is the clock used by the system bus. This clock enters the MMCM module through the CLKIN1 port, and the parameter values set in the application program are transferred to the MMCM module via the Advanced Peripheral Bus (APB). The clock with the desired frequency is then output from the CLKOUT0 port and used as the input clock of the user module. With this method, the system bus clock and the module clock can have different frequencies, and a change in the module clock frequency can be made from the application program without re-synthesizing the module.
However, the difference between the module clock and the system bus clock causes a data synchronization problem. It can be solved by adding, between the system bus and the user modules, an asynchronous bus module that receives both the existing input clock and the clock output from the MMCM module [5]. The asynchronous bus module handles the master bus and the slave bus separately. For a master read, the address that the module requests from the external memory and the size of the requested data are stored in the asynchronous bus module. Once the requested address and data size have been transferred to the external memory, the returned MR_DATA and the MR_LAST signal, which indicates the last data word, are stored in the asynchronous bus module, and the P_MR_DATA and P_MR_LAST signals are then transmitted to the user module, completing the read operation. The write operation of the master bus works similarly. In the case of the slave bus, read and write operations are selected by the Write Enable signal on a single address, so this part of the asynchronous bus module can be designed more easily than the master part. Figure 2 shows the input and output data of the asynchronous bus module.
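The master read path described above can be illustrated with a small behavioral model. This is only a software sketch under stated assumptions, not the hardware design of this paper: the class and method names (AsyncReadBridge, PostRequest, TakeRequest, StoreData, FetchData) are hypothetical, the two clock domains are represented simply by which side calls which method, and clock-domain-crossing effects such as metastability are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical types: a read request and one returned data word.
struct ReadRequest { uint32_t addr; uint32_t size; };
struct ReadBeat    { uint32_t data; bool last; };

// Behavioral model of the buffering done by an asynchronous bus module.
class AsyncReadBridge {
public:
    // Module-clock side: the user module posts the address and size it wants.
    void PostRequest(uint32_t addr, uint32_t size) {
        requests_.push_back(ReadRequest{addr, size});
    }

    // Bus-clock side: take the stored request to forward it to external memory.
    bool TakeRequest(ReadRequest* req) {
        assert(req != nullptr);
        if (requests_.empty()) return false;
        *req = requests_.front();
        requests_.pop_front();
        return true;
    }

    // Bus-clock side: store one returned word (MR_DATA) with its MR_LAST flag.
    void StoreData(uint32_t mr_data, bool mr_last) {
        beats_.push_back(ReadBeat{mr_data, mr_last});
    }

    // Module-clock side: hand over one word as P_MR_DATA / P_MR_LAST.
    bool FetchData(ReadBeat* beat) {
        assert(beat != nullptr);
        if (beats_.empty()) return false;
        *beat = beats_.front();
        beats_.pop_front();
        return true;
    }

private:
    std::deque<ReadRequest> requests_;  // requests waiting for the bus side
    std::deque<ReadBeat>    beats_;     // data waiting for the module side
};
```

In this model the module side never touches the bus directly: it only posts requests and fetches buffered words, which is why the two sides can run at unrelated rates.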
Input Clock Control in Application Program
The application program is responsible for controlling the FPGA loaded with the designed module and for initializing the memory that the module accesses [3]. It controls the FPGA and the memory used by the FPGA through a PCI Express (PCIe) interface [4].
The basic control process proceeds in the following order: reset of the entire system, initialization of the clock frequency used in the module, initialization of the memory, and transmission of the module start signal. Figure 3 shows the internal code used to initialize the clock frequency used in the module. The MMCM::SetClock function calculates the parameter values to be used in the MMCM from the values passed to it and then transmits them to the MMCM module. The parameters are set through the Divide and Multiplier values that are used to derive a clock of the desired frequency. dwID is the address of the corresponding module when the application program accesses it.
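The parameter calculation described above can be sketched as a search over divider and multiplier values. This is a minimal illustration and not the code of Figure 3: the names MmcmParams, FindMmcmParams, WriteReg, and SetClockSketch, the register offsets, the integer-only parameter ranges, and the VCO limits are all assumptions made for the example, and real limits depend on the device and speed grade.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for the settings sent to the MMCM.
struct MmcmParams {
    uint32_t multiplier;    // M: feedback multiplier (integers only in this sketch)
    uint32_t divide;        // D: input clock divider
    uint32_t out_divide;    // O: CLKOUT0 divider (integers only in this sketch)
    double   achieved_mhz;  // frequency actually produced by (M, D, O)
};

// Assumed VCO operating range in MHz; not taken from the paper.
const double kVcoMinMhz = 600.0;
const double kVcoMaxMhz = 1200.0;

// Exhaustively search (M, D, O) for the output closest to target_mhz.
// Returns false if no combination keeps the VCO inside the assumed range.
bool FindMmcmParams(double in_mhz, double target_mhz, MmcmParams* out) {
    assert(in_mhz > 0.0 && target_mhz > 0.0 && out != nullptr);
    bool found = false;
    double best_err = 0.0;
    for (uint32_t d = 1; d <= 106; ++d) {
        for (uint32_t m = 2; m <= 64; ++m) {
            const double vco = in_mhz * m / d;
            if (vco < kVcoMinMhz || vco > kVcoMaxMhz) continue;
            for (uint32_t o = 1; o <= 128; ++o) {
                const double freq = vco / o;
                const double err = std::fabs(freq - target_mhz);
                if (!found || err < best_err) {  // keep the first best match
                    found = true;
                    best_err = err;
                    *out = MmcmParams{m, d, o, freq};
                }
            }
        }
    }
    return found;
}

// Hypothetical register write over PCIe/APB; here it only records the access.
struct RegWrite { uint32_t id, offset, value; };
std::vector<RegWrite> g_reg_log;
void WriteReg(uint32_t dwID, uint32_t offset, uint32_t value) {
    g_reg_log.push_back(RegWrite{dwID, offset, value});
}

// Sketch of a SetClock-style routine: compute the parameters, then send them
// to the module addressed by dwID. Offsets 0x0/0x4/0x8 are made up.
bool SetClockSketch(uint32_t dwID, double in_mhz, double target_mhz) {
    MmcmParams p;
    if (!FindMmcmParams(in_mhz, target_mhz, &p)) return false;
    WriteReg(dwID, 0x0, p.multiplier);
    WriteReg(dwID, 0x4, p.divide);
    WriteReg(dwID, 0x8, p.out_divide);
    return true;
}
```

In the control sequence above, a call such as SetClockSketch(dwID, 100.0, 50.0) would sit between the system reset and the memory initialization. The 100 MHz input is again an assumed value.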
Experiments and Results
The FPGA board used in the experiment was a VC707 board equipped with a Xilinx Virtex-7 FPGA. The module mounted on the FPGA was a Single Instruction Multiple Thread (SIMT)-based processor, a scaled-down version of NVIDIA's Fermi architecture [6]. With the system bus clock fixed, the MMCM module was controlled to observe the change of the input clock and the resulting time of an array addition operation. Figure 4 shows the simulation results after the input clock is controlled from the application program without the FPGA re-synthesis step. In Figure 4a the input clock is 100MHz, and in Figure 4b it is 50MHz. Figures 5a and 5b show the processing time for each input clock on the actual FPGA. The total time is the sum of the processor initialization time and the processor calculation time, and the processing time indicates the time the processor spent on calculation.
In the second experiment, the processing performance of an application that implements a simple Neural Network was compared. As shown in Figure 6, the application predicts the next output value when three different inputs pass through the Neural Network.
Figure 6. Input and output data of the Neural Network Application
In the application initialization phase, the weight and bias data are initialized to random values in the FPGA control application and stored in the memory. The Neural Network uses the exponential function and the hyperbolic tangent function. The SIMT-based processor used in this experiment has no unit for processing special functions, so the control application prepares the function values in a lookup table (LUT) in advance and loads them into the memory. Lastly, the Neural Network application is loaded into the code area of the memory, and the processor start signal is sent to process the application. While the Neural Network is being processed, the weights and biases are updated through repeated learning, so efficient processing using multiple threads is required; because the weights and biases are loaded into the memory, application performance can be improved through efficient access. Table 3 shows the results of processing the Neural Network application on a general-purpose processor, an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, and on the SIMT-based processor at 100MHz and 50MHz. The Intel Core processor took approximately 2.38 seconds. In contrast, the SIMT-based processor produced the desired results in only 38ms at 100MHz and 77ms at 50MHz.
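The lookup-table preparation mentioned above, in which the control application precomputes the special-function values, can be sketched as follows. This is a minimal example under assumptions: the table layout, the input range, the number of entries, and the nearest-entry lookup are illustrative choices, and the names Lut, BuildLut, LookupLut, TanhD, and ExpD are hypothetical. The paper does not state how its table is organized.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical table: uniformly spaced samples of a function on [lo, hi].
struct Lut {
    double lo;
    double hi;
    std::vector<float> table;
};

// Wrappers with a plain function-pointer signature.
inline double TanhD(double x) { return std::tanh(x); }
inline double ExpD(double x)  { return std::exp(x); }

// Host side: precompute the table before loading it into the memory.
Lut BuildLut(double (*fn)(double), double lo, double hi, std::size_t entries) {
    assert(fn != nullptr && entries >= 2 && hi > lo);
    Lut lut{lo, hi, std::vector<float>(entries)};
    const double step = (hi - lo) / static_cast<double>(entries - 1);
    for (std::size_t i = 0; i < entries; ++i) {
        lut.table[i] = static_cast<float>(fn(lo + step * static_cast<double>(i)));
    }
    return lut;
}

// Processor side (modeled in software): clamp the input to the table range
// and return the nearest precomputed entry instead of evaluating the function.
float LookupLut(const Lut& lut, double x) {
    double clamped = x;
    if (clamped < lut.lo) clamped = lut.lo;
    if (clamped > lut.hi) clamped = lut.hi;
    const double pos = (clamped - lut.lo) / (lut.hi - lut.lo) *
                       static_cast<double>(lut.table.size() - 1);
    const std::size_t idx = static_cast<std::size_t>(std::lround(pos));
    return lut.table[idx];
}
```

For example, BuildLut(TanhD, -4.0, 4.0, 1025) gives a table with a spacing of 1/128, so a nearest-entry lookup of the hyperbolic tangent is off by at most about half that spacing. A finer table or interpolation would reduce the error at the cost of memory or extra arithmetic.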
Conclusion
This paper proposed a method that mounts an MMCM module in the hardware and controls it from the application program, so that the clock frequency can be changed without spending time on FPGA re-synthesis. Because the clock frequency can be changed flexibly, the module clock can become asynchronous with the system bus clock; this problem is solved by adding an asynchronous bus module. With the proposed method, a variety of experiments at different clock frequencies could be performed without unnecessary re-synthesis time. In experiments that moved data stored in one array to another array and processed a simple Neural Network application, 100MHz and 50MHz were used as input clock frequencies. The experimental results confirmed that the processing time was 38ms at 100MHz and 77ms at 50MHz, and that changing the input clock frequency itself took almost no time.
