This paper presents the design and implementation of an automated gateware discovery mechanism using generic reconfigurable computing hardware and toolflows. The mechanism was devised as part of an effort to build a software system that improves performance and reduces the software development time spent on operating gateware. It accomplishes this by reusing existing device drivers within the framework of the chosen technology, namely OF (Open Firmware).
INTRODUCTION
FPGAs (Field Programmable Gate Arrays) are semiconductor devices containing programmable logic components and programmable interconnects that can be reconfigured many times with different functions. Over the years there has been increased use of hybrid FPGA-CPU architectures to speed up computationally intensive algorithms and problems; this constitutes the broad area called "reconfigurable computing". The use of FPGAs in conjunction with microprocessors allows reconfigurable computers to offer much of the flexibility of a general-purpose computing architecture while providing many of the performance benefits of an algorithm implemented in a hard-wired chip. A recent example of a hardware platform that utilises the hybrid Xilinx FPGA-PowerPC architecture is the ROACH (Reconfigurable Open Architecture Computing Hardware) series of hardware boards. ROACH is a South African collaborative project in conjunction with the University of California, Berkeley research group CASPER (Center for Astronomy Signal Processing and Electronics Research) and the NRAO (National Radio Astronomy Observatory) group. The ROACH is an example of an FPGA-based reconfigurable computer which is managed by an off-the-shelf operating system, here Linux. Gateware, a term coined at the University of California, Berkeley, is the design logic that goes into the FPGA. In a DSP (digital signal processing pipelined) approach, gateware design implementations like correlators and spectrometers (instruments) need the right software (device drivers) to detect the function implemented and operate on it. For each gateware design implementation, writing a piece of software for that particular instrument can be an arduous task.
* Thanks to SA SKA bursary for funding.
The objective of this paper is to present the design and implementation of a mechanism which automates gateware detection for reconfigurable hardware designs and simplifies the task of writing software for each gateware image programmed. Projects relying heavily on reconfigurable computing resources may find this mechanism useful in saving the time and effort spent on implementing new ways to interact with FPGA designs.
BACKGROUND
This section provides a brief background on the device detection mechanisms available in conventional computer platforms and FPGA-based software systems. We conclude the section by giving the reasons for choosing OF (Open Firmware) and Linux to automate gateware detection on the ROACH platform.
Device Detection in Conventional Computer Platforms
Most x86-based machines are built around PCI (Peripheral Component Interconnect). PCI makes device configuration and detection possible in a straightforward way. The kernel scans the PCI bus and determines what devices are attached, along with their respective addresses. For this, PCI relies on a configuration header being available for each device attached to the bus. Devices that are not enumerable (devices not attached to the bus) are made available at known locations, and this information is hard-coded for the kernel to use. In short, PCI builds information about the system from the data available in the PCI configuration registers.
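The bus scan described above can be pictured with a minimal Python sketch. It is a model rather than kernel code: the configuration space is simulated with a dictionary, the vendor and device ID values are made-up entries, and the names `read_config_word` and `scan_bus` are hypothetical. The only PCI convention it relies on is that a vendor ID of 0xFFFF indicates an empty slot.

```python
# Simulated PCI configuration space: (bus, device, function) -> header fields.
# The entries are made-up examples, not a description of real hardware.
CONFIG_SPACE = {
    (0, 0, 0): {"vendor_id": 0x8086, "device_id": 0x1237},
    (0, 3, 0): {"vendor_id": 0x10EE, "device_id": 0x0007},
}

NO_DEVICE = 0xFFFF  # vendor ID read back when no device responds


def read_config_word(bus, dev, func, field):
    """Return a configuration header field, or 0xFFFF if the slot is empty."""
    header = CONFIG_SPACE.get((bus, dev, func))
    return header[field] if header else NO_DEVICE


def scan_bus(bus):
    """Probe all 32 device slots (function 0 only) on one bus."""
    found = []
    for dev in range(32):
        vendor = read_config_word(bus, dev, 0, "vendor_id")
        if vendor == NO_DEVICE:
            continue  # nothing attached in this slot
        device = read_config_word(bus, dev, 0, "device_id")
        found.append((bus, dev, vendor, device))
    return found
```

Calling `scan_bus(0)` returns one tuple per populated slot, which is the kind of device list the kernel builds from the configuration headers.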
ACPI (Advanced Configuration and Power Interface) makes the BIOS (Basic Input-Output System) the layer responsible for passing on information about the underlying hardware; together with information from the ACPI tables, the OS (Operating System) assembles the list of devices. It relies on AML (ACPI Machine Language) methods, written by developers, that are included as definition blocks. This makes device description more sophisticated. ACPI assembles system information with data from the ACPI tables, mainly the definition blocks in the DSDT (Differentiated System Description Table).
OF provides firmware support and makes device discovery simpler [1]. It gathers information about the system by assembling the device tree. Buses and interconnects that are not self-enumerable are also represented in the device tree. The information is not hard-coded; it is represented in this format and passed on to the kernel. The kernel uses the information supplied for initialisation, device detection and binding the appropriate driver to each device.
Device Detection in FPGA-based Software Systems
Some FPGA software systems, such as ReConfigMe [2] and GATOS [3], dedicate their efforts to solving the problems of dynamic FPGA resource allocation, partitioning and memory management of FPGA resources, or virtualization between software and hardware tasks on FPGA-based systems. For a larger project like MeerKAT, a precursor instrument for the SKA (Square Kilometer Array) where high-speed data is handled, the performance of the OS is very important. CASPER (Center for Astronomy Signal Processing and Electronics Research) was an early adopter of BORPH (Berkeley Operating System for Reprogrammable Hardware) [4] on its reconfigurable hardware platforms BEE2 (Berkeley Emulation Engine 2) and ROACH. Even though the usability of the OS increased, it was noticed that BORPH, a modified Linux kernel, used a significant number of system calls to perform a read or write operation [5], which is not considered optimal. One way to enhance performance is to conform to the Linux device driver framework, where the complex abstraction is moved away from the kernel to userspace.
Choice of OF and Linux
From the hardware detection mechanisms, we see that OF provides a mechanism for describing a device in a format that is easy to understand and record, namely the device tree [6]. Once the device is described, we require a mechanism to operate it. Hence we explored some of the software detection mechanisms for FPGAs commonly used in the radio astronomy field, namely BORPH (an operating system that extends a standard Linux kernel to include support for FPGA resources in a reconfigurable computer) and Linux. We prefer Linux because we want to bind the device described in the device tree to a device driver. The advantages of using device drivers, whether existing or custom-written, to operate a device are as follows. It reduces the effort spent on writing software for a particular device. One can benefit from the growing database of supported device drivers. One can also take advantage of the performance benefits of a Linux system by conforming to the unified device driver framework.
METHODOLOGY
A review was conducted of the user requirements generated for the MeerKAT project, and the requirements were refined through an iterative process relying on both literature review and discussion with experts.
The refined user requirements are listed below:
• Investigate OF infrastructure with a view to extending it to include suitably designed gateware images and provide an architectural design.
• Describe gateware in device tree and verify detection and loading of appropriate device driver by the OS.
• Understand OF platform bus initialisation and probing capabilities to ensure correct loading of modified device drivers.
• Generate gateware designs that can be applied to radio astronomical instruments.
• Write device drivers to operate new instruments described and detected from the device tree.
• Implement the software and utilities required to demonstrate the concept.
DESIGN
A system architecture diagram elaborating the design concept on the ROACH platform is shown in Figure 1. The design stages in the diagram are explained below, from generating gateware, through programming the FPGA, to device detection at the various levels of the software. The PowerPC microprocessor configures the FPGA on the ROACH with user gateware designs, giving it a functionality that can be described in textual DTS (Device Tree Source) files [7]. U-Boot, a bootloader for the PowerPC system, probes the supplied file and adds it to the list of physical devices attached to the PowerPC, which is represented as a device tree. Adding the functionality programmed onto the FPGA in this way extends the database of devices available to the bootloader to operate and pass on to the kernel, thereby providing run-time configuration support. When the Linux kernel is loaded, it is provided with a Flattened Device Tree (FDT), which contains information on the hardware devices that it needs to operate. The actual probing and detection of devices within the OF framework then starts, and the corresponding device drivers are fetched to operate on the gateware designs implemented on the FPGA. User applications can now communicate with gateware designs as if they were physical devices, using the same Unix device node abstraction.
Fig. 1. Automated gateware mechanism system architecture on ROACH hardware board
Gateware Design
The CASPER approach uses the MSSGE (Matlab, Simulink, System Generator and EDK) toolchain to generate gateware for the FPGA. In simple terms, an application model file described in a graphical programming language called Simulink is passed through the MSSGE toolchain to generate a bitstream, also referred to as a gateware image, which is then programmed onto the FPGA. The FPGA changes its functionality as different pieces of gateware are programmed.
FPGA Programming Design
The PowerPC is the CPU (Central Processing Unit) of the ROACH hardware platform. It interacts with the FPGA using the SelectMAP interface [8]. The FPGA can be programmed through the SelectMAP interface at different software levels. At the most basic level, FPGA programming can be achieved using JTAG tools. At the bootloader and kernel level, we implement the SelectMAP interface in order to program the FPGA.
Device Detection Design
This is the core design stage where we integrate OF components and concepts into our system architectural design. The gateware programmed on the FPGA is the device that gets represented in the form of a device tree at the bootloader level.
From the bootloader, the device tree is converted into a kernel-recognised format called the FDT.
Fig. 2. Graphical representation of PPC440EPx device tree
The device tree is the mechanism by which we pass information about an underlying platform (ROACH) to the bootloader and kernel. It is the fundamental OF data structure that makes it possible to extend the device database. If we translate the PPC440EPx functional diagram [9] to an OF device tree, we derive the graphical representation in Figure 2. Each device is represented as a node in the tree structure. The nodes are arranged hierarchically and provide structure to the system. In the figure, the PLB-OPB (Processor Local Bus, On-Chip Peripheral Bus) bus infrastructure hosts a number of devices that are represented as nodes of a tree. New devices are added to this nodal structure according to their place on the bus infrastructure, thereby extending the tree of devices available to the system. This paper concentrates on describing gateware pieces programmed on the FPGA and thereby extending the database of devices resident in a platform. As depicted in Figure 3, the EBC (External Bus Controller) bus infrastructure shaded in blue is added to the device tree, and the yellow shaded blocks are the gateware implementations that we describe in device tree format. This diagram extends the device dictionary available to U-Boot and Linux for probing, detection and operation of devices through loading of the appropriate device drivers.
Fig. 3. Graphical representation of ROACH device tree
The Linux kernel has supported device trees for a long time, with OF being increasingly used on PowerPC platforms. OF provides the kernel with a mechanism to discover and register devices dynamically, thereby eliminating the need to hard-code details of the underlying devices. Using its OF support code and the information from the device tree, Linux looks for the devices and loads the device drivers needed to operate each device identified.
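The lookup-and-bind step can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the kernel's OF matching code: the tree fragment, the node name `gateware-uart`, the driver names and the compatible strings in the match tables are all illustrative. Each node lists its compatible strings from most specific to most generic, and the first string that appears in some driver's match table decides the binding.

```python
# Device tree fragment as nested dicts (illustrative node names and strings).
TREE = {
    "plb": {
        "opb": {
            "serial": {"compatible": ["ns16550"]},
            "ebc": {
                "gateware-uart": {
                    "compatible": ["xlnx,opb-uartlite-1.00.b"],
                },
            },
        }
    }
}

# Hypothetical drivers, each with the compatible strings it claims.
DRIVERS = {
    "of_serial": ["ns16550"],
    "uartlite": ["xlnx,opb-uartlite-1.00.b"],
}


def iter_nodes(node, path=""):
    """Yield (path, node) for every node; child nodes are the dict values."""
    for name, value in node.items():
        if isinstance(value, dict):
            child_path = f"{path}/{name}"
            yield child_path, value
            yield from iter_nodes(value, child_path)


def bind_drivers(tree, drivers):
    """Map each node path to the first driver matching a compatible string."""
    bound = {}
    for path, node in iter_nodes(tree):
        for compat in node.get("compatible", []):
            match = next(
                (name for name, table in drivers.items() if compat in table),
                None,
            )
            if match is not None:
                bound[path] = match
                break  # the most specific matching string wins
    return bound
```

Bus nodes without a compatible property, such as `plb` here, are simply walked through and left unbound.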
IMPLEMENTATION AND RESULTS
This section elaborates on how the design components get assembled together or modified to develop an automated gateware detection system using OF.
Gateware Implementations

Serial loopback implementation
The serial loopback gateware implementation was done and tested on the ROACH platform. The FPGA design chosen was a Xilinx UARTLite OPB serial core module pulled from the EDK library. The design is implemented such that TX and RX are tied together to create a loopback mode, and the test involves transmitting a character and receiving the same character back. The control and status registers serve to enable interrupts and FIFOs and to capture errors in the process. The implementation imparts a UART personality to the gateware design, meaning that we can treat the design on the FPGA as a serial device to operate on.
Simple data capture implementation
The data capture gateware implementation was done and tested on the ROACH2 platform, the next-generation ROACH hardware board built by the MeerKAT digital back end team. The implementation emulates an audio device running on the FPGA. The iADC attached to the ROACH2 board acts as a soundcard, capturing samples (in this case noise) and multiplying them by the gain we set before outputting the data to the snapblocks. The embedded PowerPC runs the user audio application that sets the gain and reads back the data for visualisation through the device nodes.
Device Description
The device tree, an OF contribution, is a simple tree structure consisting of nodes and properties, with nodes corresponding directly to devices. The respective platform device trees are stored in .dts format under the arch/powerpc/boot/dts directory. A device tree is assembled with information about the system to be used. For this implementation, the ROACH2 platform consists of, amongst others, one 32-bit PowerPC CPU, a processor local bus, interrupt controllers, four user-configurable serial ports, NOR flash and GPIO controllers. We start by building a skeleton of the device tree with a unique platform name that consists of the manufacturer and model name. In Figure 6, we see that the compatible property has the value "kat,roach2". This provides a unique identification for the device tree. Linux uses the compatible value to choose the right platform to run on. The chosen node, the last node in the device tree, is a special node that does not represent any device. It stores environment information such as the boot arguments, or information for choosing the default input and output devices. In our implementation, the bootargs environment variable value is built in statically with information from the DTS and passed to the kernel at boot time. It can also be changed dynamically using the U-Boot FDT commands explained in the next subsection (5.3).
As mentioned above, data can be supplied to Linux in the form of an OF device tree, which gives the flexibility and convenience to describe a device. Gateware implementations programmed on the FPGA are described in a similar manner with nodes and properties. Additional properties can be specified, such as xlnx,family, which is useful for distinguishing between different FPGA families.
Fig. 6. ROACH device tree entries
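To make the node-and-property structure concrete, the following Python sketch renders a nested dictionary as device tree source text. It is an illustration only and does not reproduce the entries of Figure 6: apart from the "kat,roach2" compatible value, the chosen node and the xlnx,family property name, everything here (the `gateware-uart` node, its compatible string, the "virtex6" value and the bootargs string) is an assumption made for the example. The renderer handles only string properties and child nodes.

```python
def render_node(name, node, indent=0):
    """Render one node as DTS lines (string properties and child nodes only)."""
    pad = "\t" * indent
    lines = [f"{pad}{name} {{"]
    for key, value in node.items():
        if isinstance(value, dict):
            lines.extend(render_node(key, value, indent + 1))
        else:
            lines.append(f'{pad}\t{key} = "{value}";')
    lines.append(f"{pad}}};")
    return lines


def render_dts(root):
    """Return complete DTS text for a tree rooted at '/'."""
    return "\n".join(["/dts-v1/;", ""] + render_node("/", root)) + "\n"


# Skeleton with a platform name, one gateware node and the chosen node last.
ROACH2 = {
    "compatible": "kat,roach2",
    "gateware-uart": {                      # hypothetical gateware node
        "compatible": "xlnx,opb-uartlite-1.00.b",
        "xlnx,family": "virtex6",           # illustrative value
    },
    "chosen": {
        "bootargs": "console=ttyS0,115200",  # illustrative boot arguments
    },
}

dts_text = render_dts(ROACH2)
```

The resulting `dts_text` has the familiar shape of a .dts file: a root node carrying the platform compatible value, nested child nodes in braces, and the chosen node holding bootargs.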
Device Discovery and Enumeration
Fig. 7. Adding a device dynamically to the FDT blob from U-Boot
The dynamic extension of the device tree source based on OF properties and methods is illustrated in Figure 7. In our example we want to enter information on the Xilinx UARTlite serial device "SERIAL_DEV" dynamically by parsing through the FDT blob. We can either enter the UARTlite serial device entry into the DTS source and pass it to the kernel, or add the serial device entry on the fly by inspecting similar serial device entries. "fdt print /plb/opb/serial" lists the properties associated with the ns16550 serial port device. We create a new device node by issuing "fdt mknode" and set properties and values for the node using "fdt set". We can confirm the node creation and its associated properties by issuing "fdt print /plb/opb/SERIAL_DEV".
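The effect of the three U-Boot commands can be mimicked with a toy Python model. This is a sketch of the sequence only, not of U-Boot itself: the class `FdtBlob` and its methods are hypothetical stand-ins for "fdt mknode", "fdt set" and "fdt print", the blob is a nested dictionary rather than a real flattened tree, and the compatible string given to the new node is an assumption. The node is written as SERIAL_DEV with an underscore, since device tree node names cannot contain spaces.

```python
class FdtBlob:
    """Toy stand-in for a flattened device tree edited from the bootloader."""

    def __init__(self, tree):
        self.tree = tree

    def _lookup(self, path):
        """Walk a '/'-separated path and return the node dictionary."""
        node = self.tree
        for part in path.strip("/").split("/"):
            if part:
                node = node[part]
        return node

    def mknode(self, path, name):
        """Model of 'fdt mknode <path> <name>': create an empty child node."""
        parent = self._lookup(path)
        if name in parent:
            raise KeyError(f"{path}/{name} already exists")
        parent[name] = {}

    def set(self, path, prop, value):
        """Model of 'fdt set <path> <prop> <value>'."""
        self._lookup(path)[prop] = value

    def print_node(self, path):
        """Model of 'fdt print <path>': list the node's own properties."""
        node = self._lookup(path)
        return [
            f'{key} = "{value}";'
            for key, value in node.items()
            if not isinstance(value, dict)
        ]


# Start from a tree that already holds an ns16550 serial port entry.
blob = FdtBlob({"plb": {"opb": {"serial": {"compatible": "ns16550"}}}})
blob.mknode("/plb/opb", "SERIAL_DEV")
blob.set("/plb/opb/SERIAL_DEV", "compatible", "xlnx,opb-uartlite-1.00.b")
```

After these calls, `blob.print_node("/plb/opb/SERIAL_DEV")` lists the property just set, mirroring the confirmation step performed with "fdt print".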
Device Operation and Control
The Linux kernel uses OF functions to discover and register devices dynamically with the information from the FDT. Most of the physical devices registered are operated with a device driver. The device driver matches on the compatible property and determines how to configure the device based on the matching description in the device tree.
Builtin serial UARTLite driver
Figure 8 shows the output of the UARTlite device driver with debug messages enabled while the serial device is being operated. The output of the driver with and without gateware bearing a serial personality is displayed for comparison. The sample output with the serial bit file programmed demonstrates the behaviour of a serial device with loopback mode enabled and interrupts disabled. The transmitted character "A" is displayed from the receive buffer. The ULITE_STATUS_RXVALID flag becomes active upon receiving the character.
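The loopback behaviour seen in the driver output can be reproduced with a small Python model. It is a sketch of the observable behaviour, not of the gateware or of the Linux driver: the class `LoopbackUart` is hypothetical, transmission is treated as instantaneous, and the two status bit values are assumed to match those used by the Linux UARTlite driver and should be checked against it.

```python
from collections import deque

# Status register bits (assumed to match the Linux UARTlite driver).
ULITE_STATUS_RXVALID = 0x01  # receive FIFO holds at least one character
ULITE_STATUS_TXEMPTY = 0x04  # transmit FIFO is empty


class LoopbackUart:
    """Toy model of a UART whose TX line is wired straight back to RX."""

    def __init__(self):
        self.rx_fifo = deque()

    def write_tx(self, char):
        # In loopback mode every transmitted character arrives at the receiver.
        self.rx_fifo.append(char)

    def read_status(self):
        status = ULITE_STATUS_TXEMPTY  # TX drains immediately in this model
        if self.rx_fifo:
            status |= ULITE_STATUS_RXVALID
        return status

    def read_rx(self):
        # Reading removes the oldest character from the receive FIFO.
        return self.rx_fifo.popleft()


uart = LoopbackUart()
uart.write_tx("A")
rx_valid = bool(uart.read_status() & ULITE_STATUS_RXVALID)
received = uart.read_rx()
```

With interrupts disabled, a driver would poll the status register in the same way: the RXVALID bit becomes set once the transmitted character has looped back, and it clears again when the receive FIFO has been read out.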
Custom built audio driver
Gateware with an audio device personality is programmed on the FPGA, so the context of programming audio device drivers in Linux now applies. The iADC acts as a capture device and we need a device file, namely "/dev/dsp", to access the sampled data, which here is not audio but radio astronomical signals. We want to tune an audio device with a gain setting; in radio astronomy terms, we want to amplify or reduce the signal strength by setting the gain value. The main function of a mixer in audio devices is to set the gain level. This also requires an audio device file, "/dev/mixer". Reading the "/dev/dsp" device file activates the A/D (Analog to Digital) converter for signal capture. Analog data is converted to digital samples and stored in BRAMs (Block RAMs). When a sound application program like aumix tries to use the mixer device, the data stored in the BRAMs is displayed. The kernel functions assembled together in Table 1 provide a complete audio device driver for controlling the gateware programmed on the FPGA with an audio device personality.
One of the useful features of Linux is that it provides the capability to load and unload a driver from userspace at run-time. This is very useful for testing the driver and its associated functions, and is accomplished using the Linux commands insmod and rmmod. The kernel functions corresponding to the insmod and rmmod commands, bram_init() and bram_exit(), can be seen in Table 1.
The aumix userspace audio application sets the gain using its controls, and the corresponding gain value is used as the input gain value for running the FPGA design. As the gain value changes, the output data streamed from the BRAM also changes. This example demonstrates that one can use existing userspace applications by providing a custom-built audio driver to run the FPGA design. This reduces the software development time, as existing userspace applications are used.
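The relationship between the gain setting and the captured data can be summarised in a few lines of Python. This is a behavioural sketch under simplifying assumptions, not the driver listed in Table 1: the class `GainCapture`, its method names and the BRAM depth are hypothetical, the samples are plain integers, and the gain is applied as a simple multiplication.

```python
class GainCapture:
    """Toy model of the capture path: ADC samples scaled by a gain into BRAM."""

    def __init__(self, adc_samples, bram_depth=8):
        self.adc_samples = list(adc_samples)
        self.bram_depth = bram_depth  # hypothetical number of stored samples
        self.gain = 1

    def set_gain(self, gain):
        """What a gain change made through the mixer device ultimately does."""
        self.gain = gain

    def capture(self):
        """What a read of the capture device triggers: fill BRAM and return it."""
        window = self.adc_samples[: self.bram_depth]
        return [sample * self.gain for sample in window]


capture_path = GainCapture([1, -2, 3])
before = capture_path.capture()
capture_path.set_gain(4)
after = capture_path.capture()
```

Here `before` holds the unscaled samples and `after` the same samples multiplied by four, which is the change in streamed data that a user would observe after adjusting the gain control in the audio application.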
CONCLUSIONS AND FUTURE WORK
The present implementation focuses on providing the basic infrastructure for building an automated gateware detection system using OF. We used simple gateware designs, the serial and audio implementations, to demonstrate proof of concept of device detection. The serial device implementation establishes that gateware programmed on the FPGA can be treated just like any other physical peripheral, that OF device trees can be extended to describe the gateware, and that the gateware can be operated by an existing device driver that is loaded by the Linux kernel at run-time. The audio device implementation establishes that the OF probing infrastructure can be utilised to load custom device drivers for operating instruments programmed on the FPGA.
It will be interesting and useful to extend the infrastructure by testing complex gateware designs, especially instrument designs used in radio astronomy such as correlators and spectrometers. More contributions in the form of Linux kernel device drivers to support these instrument designs would also be useful, not only for radio astronomy but also for the open source community.
