Applying FPGAs and real-time bare-metal code in experimental robots has many benefits but puts high demands on the skills of students and researchers. This contribution presents an interface between FPGA, hard-macro microcontroller cores and the popular middleware ROS implemented on a single chip.
I. INTRODUCTION
The key development presented in this paper is the integration of programmable logic with the Robot Operating System (ROS) [1] middleware on a single chip. This adds the various benefits of using a field-programmable gate array (FPGA) to the system while masking the complexity of programmable hardware from coders.
The system can be set up to handle all signal processing on board, or the processing can be offloaded to an external ROS node. The two Advanced RISC Machine (ARM) cores on the selected board are booted in an asymmetric multi-processing configuration. The interface to ROS is handled by a core running Linux, the control algorithm runs on a bare-metal core, and the interaction with external inputs and outputs is handled by the FPGA. The system is distributed, mirroring memory across multiple FPGAs, which allows for varied architectures and deterministic timing. The combination of robotic middleware and FPGAs in a distributed system enables researchers to build complex experimental robots while working within hard timing constraints.
A. Related Work
Similar research includes the proposal of ROS-compliant FPGA components [6] to quickly interface between ROS and programmable hardware. That approach differs from the ROS-enabled hardware framework developed here in how hardware and software communicate: it uses a FIFO buffer and runs Linux (Xillinux) symmetrically. This makes the design more portable, since the chip is not required to contain two ARM cores, whereas this paper's architecture requires a dual-core chip able to run an asymmetric setup. On the other hand, the presented framework uses each processor and the programmable hardware according to their individual strengths, allowing the bare-metal core to handle real-time control calculations.
Acutronic has launched H-ROS [7], the hardware robot operating system, for plug-and-play robot modules. It is described as a System on Module (SoM) and can be attached to different robotic components in order to control them as ROS nodes. The system includes ROS 2 which is customized for real-time applications so it offers modularity as well as deterministic timing. However, H-ROS is proprietary so it is not as suitable for research purposes as this paper's presented framework. Regardless, it is an alternative tool to be considered for quickly prototyping new robotics architectures.
*This work was not supported by any organization.
1 B. Strohmer and L. Larsen are with the Advanced Computer Systems Group under the Faculty of Engineering at the University of Southern Denmark.
2 A. Bøgild and A. Sørensen are with the Welfare Technology Group under the Faculty of Engineering at the University of Southern Denmark.
The functionality for mirroring memory across distributed FPGAs was originally implemented in the TosNet framework [3]. In order to adapt it to a custom architecture, knowledge of hardware programming is required to design components to interface with TosNet. Unity-Link [4] addressed this issue, adding a software interface which allows users to issue standardized commands through serial communication to TosNet. The ROS-enabled framework is developed with the same idea as Unity-Link but replaces the standardized commands with a ROS node interface.
This research adds to the existing solutions by offering an asymmetric multi-processing setup with jitter in the hundreds of microseconds and control loop speeds in the low kHz range over a distributed network.
B. System Overview
The Xilinx Zynq-7000 series system on chip (SoC) combines FPGA/programmable logic (PL) and two hard-macro ARM Cortex-A9 application processors (PS) in a single chip. This provides an opportunity for tight physical integration of a three-stage controller architecture which interfaces with robotic middleware, runs control algorithm calculations and handles low-level actuator signalling and sensor feedback. For the higher levels in the controller, Ubuntu Linux and ROS Kinetic are selected to run on one core. The second core handles the PS-PL interface by translating values and performing necessary mathematical calculations based on the controller type. Finally, hardware interfacing and memory distribution are synthesized in programmable logic. The FPGA block in Figure 1 represents either a single FPGA or a network of distributed FPGAs. In the case of a distributed network, there are multiple SoCs but the slave nodes only utilize the FPGA portion of the chip whereas the master node is set up as shown in the block diagram. By using only programmable logic for the slaves, these nodes can be interchanged with more space-efficient FPGA boards. The distributed FPGAs are connected with TosNet [3], which mirrors memory across multiple nodes connected by optical cables. TosNet is used as a black box in the framework and communication is handled by following the communication protocol outlined by the user guide [5].
II. METHODS
The evaluation of the framework is designed to give metrics helpful for researchers in experimental robotics. The speed of the control loop is a useful parameter to understand which types of actuators can be controlled. The latency provides insight into the possibility of synchronous control of distributed nodes. Finally, the jitter indicates how reliably the system meets timing constraints. These test parameters are investigated with a bit flip test where a bit is passed through the system and flips an output pin each time it is received. The bit flip test is initiated by the ROS node and propagated through the system using the data flow shown in Figure 1. The expected outcome is a pulse width modulated (PWM) signal with a 50% duty cycle on the output pin of the FPGA. Each interval during which the signal stays high or low corresponds to one complete cycle of the control loop.
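The metrics above can be derived from a list of measured pulse widths and delays. The following is a minimal sketch, not the authors' analysis code: the function names and dictionary keys are hypothetical, and jitter is taken as maximum minus mean, which is how Section III defines it.

```python
from statistics import mean


def loop_metrics(pulse_widths_s):
    """Summarise bit flip pulse widths (in seconds) as loop-speed metrics.

    Each pulse width is treated as one control loop cycle, so the mean
    loop frequency is the reciprocal of the mean pulse width.
    """
    if not pulse_widths_s:
        raise ValueError("need at least one pulse width")
    mean_width = mean(pulse_widths_s)
    return {
        "mean_period_s": mean_width,
        "mean_frequency_hz": 1.0 / mean_width,
        # Loop speed jitter: maximum pulse width minus the mean.
        "jitter_s": max(pulse_widths_s) - mean_width,
    }


def latency_metrics(delays_s):
    """Summarise measured delays (in seconds) between two output pins."""
    if not delays_s:
        raise ValueError("need at least one delay")
    mean_delay = mean(delays_s)
    return {
        "mean_latency_s": mean_delay,
        # Latency jitter: maximum delay minus the mean delay.
        "latency_jitter_s": max(delays_s) - mean_delay,
    }
```

Calling `loop_metrics` on pulse widths captured from the FPGA output pin (for example, exported from an oscilloscope or logic analyzer) would give the mean loop frequency and loop speed jitter; `latency_metrics` applies the same definitions to master-slave or slave-slave delays.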
III. RESULTS & DISCUSSION
The control loop speed is limited by the chosen TosNet setup. The maximum frequency at which a TosNet node can run is set by TosNet's read/write cycle frequency. In order to ensure data is only read or written once per cycle, the distributed memory component is synchronized with TosNet. TosNet's cycle frequency is determined by the number of nodes and registers in the network. Therefore, it has varying speeds from 300 Hz up to 25 kHz [2, pg 66] based on the selected architecture. The framework setup for these tests uses 4 registers and 8 nodes (1 master, 7 slaves), which limits the frequency to ∼3 kHz.
The tables in Figure 2 show the results of the bit flip test for each metric. The mean control loop frequency when writing to 2 slave nodes is approximately 1.48 kHz. The corresponding loop period (1/1.48 kHz ≈ 676 μs) is approximately the number of slave nodes written to multiplied by the TosNet cycle time (∼333 μs). This is due to the current setup, where the processors sequentially update data to the FPGA, sending only one outgoing control parameter per TosNet cycle. The results show that TosNet occasionally misses a cycle, creating a maximum pulse width for the slave of 1.3 ms. The loop speed jitter is determined by subtracting the mean from the maximum pulse width and is shown to be about the length of one control loop due to the missed TosNet cycle. The latency is considered to be the mean delay, while the latency jitter is the maximum delay minus the mean delay. The delay is created because the two slave nodes are written to sequentially from ROS and only one control parameter is updated per bare-metal loop. The master-slave mean latency is recognizable as the control loop period plus some data-passing overhead. The master-slave latency is twice the TosNet cycle time in this setup, as the flag will first be sent to one slave and then the next, creating a delay of 2 TosNet cycles. The slave-slave latency is the same as one TosNet cycle for the same reason. The jitter is significantly decreased when measuring latency between slave nodes, which highlights the deterministic timing within the TosNet network. The slave-slave latency is not affected by a missed TosNet cycle because all slaves are equally affected.
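The timing relationships in this paragraph can be checked with a few lines of arithmetic. The sketch below is an idealized model under stated assumptions, not a description of the framework's implementation: it assumes exactly one slave is updated per TosNet cycle in a fixed order, ignores data-passing overhead and missed cycles, and uses hypothetical names throughout.

```python
def sequential_update_model(cycle_time_s, n_slaves):
    """Idealized timing when one slave is updated per TosNet cycle.

    Assumes a fixed update order, no overhead and no missed cycles.
    """
    if cycle_time_s <= 0:
        raise ValueError("cycle_time_s must be positive")
    if n_slaves < 1:
        raise ValueError("n_slaves must be at least 1")
    # Every slave is revisited once all slaves have been written in turn.
    loop_period_s = n_slaves * cycle_time_s
    return {
        "loop_period_s": loop_period_s,
        "loop_frequency_hz": 1.0 / loop_period_s,
        # Consecutively written slaves are updated one cycle apart.
        "slave_slave_offset_s": cycle_time_s,
    }


# TosNet cycle frequency of roughly 3 kHz, two slaves written to.
model = sequential_update_model(cycle_time_s=1 / 3000, n_slaves=2)
print(model["loop_frequency_hz"])
```

With a cycle time of 1/3000 s and two slaves, the model gives a loop frequency of about 1.5 kHz, close to the measured mean of 1.48 kHz; the remaining difference would be consistent with overhead and occasional missed cycles, which the model leaves out.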
IV. CONCLUSION
The research shows that using TosNet to ensure deterministic timing across a distributed network comes at the cost of limiting the control loop speed. The presented framework is confirmed to be a viable control option for distributed robotics architectures which require feedback control loop speeds of up to 1.48 kHz and can tolerate jitter of up to 673 μs (see Figure 2).
