Abstract: This paper presents a framework based on embedded SoC systems that allows control algorithms implemented on FPGA circuits to be tested and used. A real-time Linux operating system is used for monitoring and parameterization. Different methods for data exchange between the operating system and the control unit are introduced. The delay between the operating system and the control unit during data exchange was measured, and the results are discussed with respect to the real-time functionality of the system.
Introduction
The goal is to design an FPGA-based embedded control interface capable of running a real-time operating system that can be accessed and administered remotely. The FPGA core runs a control circuit implemented by the user, and the operating system can exchange data in real time with the hardware-implemented control algorithm. The control algorithm in the FPGA core must be able to function in real time, in parallel with the embedded processor. The paper focuses on the data exchange interface between the real-time operating system and the hardware-implemented control algorithm; it does not cover the implementation of a specific control algorithm. The goal was to build a single-board computer system that integrates all major components necessary for process control.
To achieve the highest transfer rates between the control circuit and the operating system, it is good practice to integrate the processor running the operating system on the same chip as the control circuit. The most common method is to run the operating system on embedded soft-core [4] MicroBlaze processors [1] [2] [3]. This approach uses valuable FPGA resources and can only operate at a maximum clock frequency of 100 MHz. We chose a SoC (System on Chip) circuit that contains a reconfigurable core and integrated ARM processor cores that can be clocked at up to 1 GHz. The paper also focuses on the data exchange module between the processor and the control unit. Other implementations we studied use modules provided by the development environment [2] or commercially available modules [3]. These modules are easy to implement but are not as customizable or as fast as the circuits we recommend. These problems were approached from both a hardware and a software point of view, as presented in the following sections.
Hardware implementation
Figure 1 shows the application diagram of the system. The Linux operating system runs on the ARM processor; the different methods of building the controller interface are presented in this chapter. The development board used for the implementation is based on the Zynq chip architecture designed by Xilinx. The Zynq architecture combines two physically implemented ARM Cortex-A9 processor cores and a reconfigurable FPGA core on the same chip. Communication between the FPGA core and the processor cores is realized through the AXI Interface. The development board used was the Digilent Zynq ZYBO Z-7010, which has a processor clock frequency of 650 MHz and an FPGA core clock frequency of 100 MHz. The board contains a microSD card reader, and the operating system resides on the flash card. The Ethernet TEMAC controller is used to connect the system to the network. For rapid programming of these systems Xilinx provides a development environment called Vivado Design Suite, which can automatically generate various interface circuits.
Three possible solutions are presented for implementing an interface through which the operating system can exchange data with the control circuit on the FPGA: data exchange through shared memory, data exchange through register interfaces supervised by a state machine, and data exchange through register interfaces supervised by an embedded soft-core MicroBlaze processor. In all three cases the amount of FPGA resources needed for the interface module is examined. The resource tables presented apply only to the interface module itself; a specific control circuit was not implemented. The pros and cons of these interfaces in terms of speed and usability are also discussed.
Data exchange through shared memory
The first approach was to create a Dual Port BRAM [5] memory block that can be written and read by two peripherals at the same time. The A-port of the memory is connected to a BRAM Controller Interface, which is provided as an IP core (Intellectual Property Core) by the Vivado Design Suite environment. This core maps the BRAM memory into the AXI address space so that the operating system can access it through the AXI Interface. The B-port of the memory is connected to the control circuit implemented by the user. The operating system and the control circuit exchange data through the BRAM memory. A major disadvantage of this interface is the lack of a supervising circuit. The operating system always has to poll the memory to obtain the state of the control circuit, which puts the operating system under high load due to the many I/O operations. Moving large chunks of data also increases latency. For the system to work, at least three blocks are needed: the Zynq processing core (the physical processor that runs the operating system), the AXI Interconnect block, and a Reset block through which the operating system can reset its peripherals. The control circuit would be the hardware_controller_0 block. The table below shows the resource estimation of the design.
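The polling behaviour described in this subsection can be sketched in C as follows. This is only an illustration under stated assumptions, not the implementation used in the paper: the mailbox layout (one status word followed by a few data words) and the names bram_mailbox, MB_READY and mailbox_poll are hypothetical, and an ordinary structure stands in for the memory-mapped BRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox layout inside the dual-port BRAM (assumption: the
 * paper does not define one). The first word is a status flag written by
 * the control circuit, the remaining words carry the payload. */
#define MB_DATA_WORDS 4u
#define MB_READY      1u

struct bram_mailbox {
    volatile uint32_t status;              /* set to MB_READY by the FPGA side */
    volatile uint32_t data[MB_DATA_WORDS]; /* payload written by the FPGA side */
};

/* Poll the status word at most max_polls times. On success, copy the payload
 * to out, clear the flag and return the number of polls that were needed.
 * Return -1 if the flag never became ready. Each loop iteration is one read
 * of the shared memory, which is the I/O load mentioned in the text. */
static long mailbox_poll(struct bram_mailbox *mb, uint32_t *out, long max_polls)
{
    assert(mb != NULL && out != NULL);
    for (long polls = 1; polls <= max_polls; ++polls) {
        if (mb->status == MB_READY) {
            for (size_t i = 0; i < MB_DATA_WORDS; ++i)
                out[i] = mb->data[i];
            mb->status = 0; /* hand the mailbox back to the control circuit */
            return polls;
        }
    }
    return -1;
}
```

Because there is no supervising circuit, the caller has to choose how often and how long to poll. A short polling interval lowers the detection delay but raises the processor load, which is the trade-off that motivates the interrupt-driven interfaces in the next subsections.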
Data exchange through register interfaces supervised by a state machine
In this approach the development environment automatically generates a frame circuit that implements an interrupt controller and a register interface in the VHDL hardware description language. Our task is to implement a state machine that supervises the control circuit, realizes a protocol for managing the instructions in the registers, and generates the interrupt vectors. Figure 3 shows the block diagram of the VHDL circuit. The AXI Interface, the AXI Interrupt Interface and the register blocks are generated automatically by the development environment; only the state machine and the control circuit have to be implemented. One advantage of this system is the use of interrupts, so the operating system does not have to poll the content of the registers all the time. When intervention by the operating system is needed, the module generates an interrupt. A disadvantage of this approach is that complex control circuits or complex communication protocols need a sophisticated state machine, and after any change in the communication protocol or in the control circuit the user has to rewrite the state machine to fit the new environment. This approach is well suited to simple control circuit implementations such as PWM signal generation, or to implementing a loop for testing purposes. Figure 4 shows the implemented design of the hardware_controller_fsm_0 block containing the VHDL code. The table below shows the resource estimation of the design.
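To make the role of the supervising state machine more concrete, the following C fragment models one possible handshake in software. It is a sketch under assumptions: the states, the register names cmd_reg and ack_reg, and the protocol itself are hypothetical and are not taken from the VHDL design in Figure 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a hypothetical supervising state machine. The states,
 * register names and handshake are assumptions made for illustration. */
enum fsm_state { FSM_IDLE, FSM_BUSY, FSM_IRQ };

struct fsm {
    enum fsm_state state;
    uint32_t cmd_reg; /* written by the operating system to start a job */
    uint32_t ack_reg; /* written by the operating system to clear the interrupt */
    bool     irq;     /* interrupt line towards the processor */
};

/* Advance the model by one clock cycle. ctrl_done models the "finished"
 * signal of the user's control circuit. */
static void fsm_step(struct fsm *f, bool ctrl_done)
{
    assert(f != NULL);
    switch (f->state) {
    case FSM_IDLE:
        if (f->cmd_reg != 0) {  /* a command arrived: start the job */
            f->cmd_reg = 0;
            f->state = FSM_BUSY;
        }
        break;
    case FSM_BUSY:
        if (ctrl_done) {        /* job finished: notify the operating system */
            f->irq = true;
            f->state = FSM_IRQ;
        }
        break;
    case FSM_IRQ:
        if (f->ack_reg != 0) {  /* acknowledged: drop the interrupt line */
            f->ack_reg = 0;
            f->irq = false;
            f->state = FSM_IDLE;
        }
        break;
    }
}
```

Even this three-state model shows why protocol changes are costly: every new command or response type adds states and transitions that would have to be rewritten in the hardware description.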
Data exchange through register interfaces supervised by an embedded soft-core MicroBlaze processor
In this approach data exchange also takes place through registers, but instead of a state machine a MicroBlaze processor supervises the control circuit. Since the MicroBlaze core is a fully functional processor core, it can run programs written in the C++ programming language, and the control algorithm can be written as a program, making a separate control circuit unnecessary. The core can be parameterized by the operating system through the registers, but it functions in parallel with it. An advantage of this system is that it can be adapted easily to changes in the environment simply by changing its program. Running a plain program without an operating system, the core can perform real-time control tasks. Figure 5 presents the implemented circuit: a GPIO module was added to the design so that the MicroBlaze can communicate with the outside world. The GPIO module is connected to the AXI Interface, so it is also accessible to the operating system. The table below shows the resource estimation of the design.
Software implementation
This section starts with the presentation of the device driver [6]. The goal is to give the user function calls that are device independent. Our device driver maps these function calls to device-specific instructions, hiding the raw device structure from the user. When the device driver is loaded into the kernel, it creates a file in the /proc directory of the operating system that serves as an interface between the user application and the hardware module. The user can read and write this file using the standard C file operation functions (fopen(), fread(), fwrite(), fclose()).
After the hardware interfaces presented in the previous sections have been built with the Vivado Design Suite development environment, a device tree can be created [7]. This is a file that specifies the address map of all peripheral devices connected to the AXI Interface. The interfaces created are directly addressable but are not loaded automatically at system startup, since none of the modules above has plug and play capabilities. To implement kernel modules for such devices the platform device API is used. A platform driver structure is implemented as follows:
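From the user's side, access through the /proc file might look like the following sketch. It is not the authors' application code: the helper names are hypothetical, the one-word payload format is an assumption, and the path is passed in by the caller because the paper does not name the file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write one 32-bit word to the driver's file. The single-word format is an
 * assumption made for this example. Returns 0 on success, -1 on error. */
static int device_write_word(const char *path, uint32_t value)
{
    assert(path != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(&value, sizeof value, 1, f);
    int rc = fclose(f); /* fclose flushes the buffered word to the driver */
    return (n == 1 && rc == 0) ? 0 : -1;
}

/* Read one 32-bit word from the driver's file. With an interrupt-driven
 * driver this call may block until the hardware has data to report.
 * Returns 0 on success, -1 on error. */
static int device_read_word(const char *path, uint32_t *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(value, sizeof *value, 1, f);
    fclose(f);
    return (n == 1) ? 0 : -1;
}
```

A caller would pass the path of the entry created by the driver, for example a hypothetical /proc/mydrv. Since only standard C file functions are involved, the same helpers can be exercised against an ordinary file during development.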
    static struct platform_driver mydrv_driver = {
        .driver = {
            .name = DRIVER_NAME,
            .owner = THIS_MODULE,
            .of_match_table = mydrv_of_match,
        },
        .probe = mydrv_probe,
        .remove = mydrv_remove,
        .shutdown = mydrv_shutdown,
    };

After the module is loaded into the kernel it has to be initialized, and the physical address of the hardware module has to be mapped into the virtual memory address space of the Linux system, from where the user can access it. This is done by the mydrv_probe() function, which is called when the module is inserted into the kernel. If the module uses interrupts, they have to be registered with the kernel using the request_irq() function. When the module is removed from the kernel, the mydrv_remove() function is called; it releases the allocated virtual memory and, if interrupts were used in the module, removes them from the kernel using the free_irq() function. The mydrv_of_match table lists the compatible strings that the kernel matches against the device tree; the physical address and parameters of the device are then read from the matching device tree node.
To exchange data between the kernel module and the hardware module, the kernel module needs to implement the following file operations structure:
    static const struct file_operations proc_mymodul_operations = {
        .open = mymodul_open,
        .read = mymodul_read,
        .write = mymodul_write,
        .release = mymodul_release,
    };

When the user accesses the driver through the file in the /proc directory, the mymodul_open() function is called. The fread() function in user space calls the mymodul_read() function in kernel space. If the module implements interrupt handling, the mymodul_read() function blocks execution with the wait_event_interruptible() function call until the hardware module generates the interrupt. The write function works in the same way as the read function, the only difference being that it does not block. When an interrupt occurs, the kernel calls the interrupt handler function specified in the request_irq() function call. The interrupt handler can handle the interrupt in kernel space, or it can wake up a process running in user space through the read function.
The Linux operating system
The task of the operating system is to monitor the implemented control circuit and give feedback to the user connected from a remote computer. Delays caused inside the operating system are examined; delays on the network are not dealt with. The system needs to handle errors and exceptions according to a predetermined set of rules which can be upgraded remotely. Linux was chosen because it is a widely used open source operating system. The system is based on an existing Linux distribution called Arch ARM Linux with a modified Linux 3.14 kernel. For the kernel to be compatible with the Xilinx Zynq chip architecture, the kernel source provided and maintained by Xilinx was used and modified. To make the operating system compatible with the development board, the board manufacturer Digilent Inc. provides a basic kernel configuration file. We patched the kernel source with the RT-Preempt [8] patch, which provides a preemptible locking mechanism so that tasks can be preempted even while they are executing a critical section, greatly improving the real-time response of the Linux system.
After compilation, the resulting custom kernel replaced the stock kernel of the Arch ARM Linux distribution, leaving the rest of the distribution intact. This step shortened the development cycle because the focus was only on the kernel; all other programs needed are already present in this Linux distribution, so the network can be configured with a focus on security [9]. For security reasons all network communication is done through an encrypted SSH channel: the embedded Linux system runs an OpenSSH [10] server provided by the Arch Linux distribution. Users can thus connect remotely to the SoC over an encrypted channel.
To test the response time of the system, a hardware module of the kind discussed in section 2.2. of this paper was implemented with a custom state machine that measures the elapsed time between the generation of an interrupt signal and the response of the operating system to this event. After the measurement, the operating system can read back the measured value through a register. The state machine operates at a clock frequency of 100 MHz. Figure 6 shows how the state machine operates. It generates an interrupt signal, waits for the response to be written to reg1, and then stores the counter value in reg2. Since one clock period at 100 MHz is 10 ns, this value multiplied by 10 gives the elapsed time in nanoseconds.
Two scenarios were tested. In the first, interrupts are handled in kernel space and a predeclared response is given to a specific interrupt. In the second, the interrupt wakes up a process in user space that computes a response. In both cases the elapsed time was measured using the state machine presented above.
Interrupts handled in kernel space have the fastest response time: both the standard kernel and the RT-Preempt kernel could capture an interrupt, generate a response and write a register within 9.13 µs. To test the kernel's response time, 50 measurements were made using the state machine presented above. The minimum delay was 4.33 µs, the maximum was 9.13 µs, and the average response time was 7.14 µs. This means that an event in the FPGA circuit can be detected by the user within at most 9.13 µs. Interrupts that wake up a user space process show large differences in response time between the standard and the RT-Preempt kernel. Table 4 shows that the delay of the standard kernel is more than three times that of the RT-Preempt kernel.
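The conversion from the counter value in reg2 to elapsed time can be written out as a small C helper. The function name and the generalization to an arbitrary clock frequency are assumptions added for illustration; the text only states the 100 MHz case, where the factor is 10 ns per tick.

```c
#include <assert.h>
#include <stdint.h>

#define FSM_CLOCK_HZ 100000000ull /* 100 MHz, as stated in the text */

/* Convert a counter reading to nanoseconds for a counter clocked at
 * clock_hz. At 100 MHz this reduces to ticks * 10. The 64-bit
 * intermediate cannot overflow for any 32-bit tick count. */
static uint64_t ticks_to_ns(uint32_t ticks, uint64_t clock_hz)
{
    assert(clock_hz > 0);
    return ((uint64_t)ticks * 1000000000ull) / clock_hz;
}
```

With this helper a reading of 913 ticks corresponds to 9130 ns, that is 9.13 µs, and the resolution of the measurement is one clock period, 10 ns.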
Conclusion
An embedded Linux operating system combined with reconfigurable FPGA circuits can be used to control real-time systems. The control circuit can not only be implemented but can also be monitored and administered in real time, and responses to many error scenarios can be predefined. The system can be administered remotely over an encrypted communication channel. If the user needs complex algorithms to run alongside a circuit partly implemented on the FPGA, it is best to use the RT-Preempt kernel to achieve better response times.
