, Luca Pezzarossa a , Martin Schoeberl a , Jens Sparsø
Introduction
Packet-switched networks-on-chip (NoCs) have become the preferred paradigm for interconnecting the many cores (processors, hardware accelerators etc.) found in today's complex application-specific multi-processor systems-on-chip [1, 2] and general-purpose chip multi-processors [3, 4] .
In the multi-processor systems-on-chip domain, a significant amount of previous research has targeted the generation of application-specific NoC platforms, e.g., [5, 6]. With the growing cost of developing and fabricating complex VLSI chips, application-specific platforms are only feasible for very few ultra-high-volume products. In all other cases, a cost-efficient platform must support a range of applications with related functionality. This implies that the hardware resources and the functionality they implement should be as general-purpose and generic as possible, targeting a complete application domain instead of a single application. This view is expressed in the principle "provide primitives, not solutions" that is well-known and accepted in the field of computer architecture. We adopt this view, striving to avoid hardware resources for dedicated and specific functionality.
The application domain we target is real-time systems. In real-time systems, the whole architecture needs to be time-predictable to support worst-case execution time (WCET) analysis. A NoC for real-time systems needs to support guaranteed-service (GS) channels. Furthermore, many hard real-time applications have multiple modes of operation. To support applications that change between operating modes, the NoC must be able to reconfigure the virtual circuits (VCs) at run-time.
This paper proposes and evaluates a flexible and resource-efficient network interface (NI) for hard real-time systems. Our NoC implements VCs using static scheduling and time-division multiplexing (TDM). A VC provides GS channels in the form of a guaranteed minimum bandwidth and a maximum latency. Furthermore, transfer of data between an on-chip memory and the NoC is coupled with the TDM schedule so that we can give end-to-end guarantees for the movement of data from one core-local memory to another core-local memory.
This architecture avoids both physical VC buffers in the NIs and credit-based flow control among the NIs that are found in most other NoC designs [7, 8, 9] .
Moreover, the usage of TDM schedules leads to a reduced hardware complexity due to the lack of buffering in the routers and due to a static traffic arbitration.
The main contribution of the paper and a key feature of this NI is its very efficient support for mode changes. The active schedule can be switched from one TDM period to the next, without breaking the communication flow of VCs that persist across the switch. This contrasts with the AEthereal family of NoCs [10, 9], which provides similar functionality at a higher hardware cost and with a longer reconfiguration time.
Our NI can store multiple TDM schedules and it supports instant switching from one schedule to another, synchronously across all NIs. The last TDM period of one schedule can be followed immediately by the first TDM period of a new schedule. This allows VCs that persist across a schedule switch to be mapped to different paths, without any interference to their data flow. This again avoids the fragmentation of resources seen in the previously published solutions [10, 9], in which no changes can be made to circuits that persist across a mode change and where the set-up of a new circuit is limited to using free resources.
If the schedule tables are too small to store all necessary schedules, our NoC can transparently transmit new schedules via the standard VCs. In this way, we avoid fixed allocation of resources for schedule transmission.
The NI presented here is an extension of [11], which is part of the Argo NoC [12]. A preliminary version of the new NI was published in [13]. In the rest of the paper, we refer to the NoC that uses the new NI as the Argo 2.0 NoC.
The main contributions of this paper are:
• support of instant reconfiguration of VCs;
• a more elaborate analysis of the TDM schedule distribution through the NoC;
• variable-length packets to reduce the packet header overhead, resulting in shorter schedules and/or higher bandwidth on the VCs;
• interrupt packets to support multicore operating systems;
• a more compact TDM schedule representation in the NIs, reducing the schedule memory requirements;
• analysis of the effect on the TDM period length of using GS communication for reconfiguration;
• a discussion on the scalability of the architecture.
This paper is organized into seven sections. Section 2 presents related work.
Section 3 provides background on mode changes, TDM scheduling, and the Argo NoC. Section 4 presents the Argo 2.0 architecture in detail. Section 5 describes the reconfiguration method and its utilization. Section 6 evaluates the presented architecture. Section 7 concludes the paper.
Related Work
This section presents a selection of NoCs that offer GS connections and
that support run-time reconfiguration of the GS provided. One approach to implementing GS connections is to use non-blocking routers in combination with mechanisms that constrain packet injection rates. These NoCs are reconfigured by resetting the parameters that regulate the packet injection rates to the new requirements. arbiters of the router and by bounding the injection rate at the source NI.
Connections are set up and torn down by programming the crossbar switches, which is done using best effort (BE) traffic. In Mango, we can observe that the reconfiguration directly interacts with the rate control mechanism in the NIs, the crossbars, and the arbiters in the routers. In addition, the fact that GS connections are programmed using BE packets may compromise the time-predictability of performing a reconfiguration.
The NoC used in the Kalray MPPA-256 processor [15] uses flow regulation, output-buffered routers with round-robin arbitration, and no flow control.
Network calculus [16] is used to determine the flow regulation parameters that
constrain the packet injection rates such that buffer overflows are avoided and GS requirements are fulfilled. The Kalray NoC is configured by initializing the routing tables and injection rate limits in the NIs.
IDAMC [17] is a source-routed NoC using credit-based flow control and virtual channel input buffers together to provide GS. IDAMC provides GS connections by implementing the Back Suction scheme [18], which prioritizes non-critical traffic while the critical traffic progresses to meet the deadline.
To our knowledge, details on how reconfiguration is handled in Kalray, Mango and IDAMC have not been published. However, we can safely assume that setting up a new connection must involve the initialization and modification of flow regulation parameters, and tearing down a connection must involve draining in-flight packets from the VC buffers in the NoC.
An alternative to the usage of non-blocking routers in combination with constrained packet injection rates is VC switching implemented using static scheduling and TDM. These NoCs can be reconfigured by modifying the schedule and routing tables in the NIs and/or in the routers.
The AEthereal family of NoCs [10, 9] uses TDM and static scheduling to provide GS. The original AEthereal NoC [19] supports both GS and BE traffic. The scheduling tables are in the NIs and the routing tables are in the routers. Reconfiguration is performed by writing into these tables using BE traffic. Analogously to the Mango NoC approach, using BE traffic may compromise the time-predictability of a (re)configuration. The dAElite NoC [9] focuses on multicast and overcomes this problem by introducing a separate dedicated NoC with a tree topology for the distribution of the schedule and routing information during run-time reconfiguration.
The aelite NoC [10] only supports GS traffic and it is based on source routing.
is presented in [20, 21]. A more mathematical framework for modeling the dynamic behavior of reconfigurable NoCs is developed in [22], where it is used to formulate NoC reconfiguration as a dynamic optimization problem.
Background
This section provides background on the T-CREST platform, the Argo NoC,
TDM scheduling, and reconfiguration for mode changes.
The T-CREST Multicore Platform
T-CREST is a multicore platform to support real-time systems [23]. The vision of T-CREST is to provide a time-predictable computer architecture to enable WCET analysis. The project includes the time-predictable processor Patmos [24]. Several processors are connected to two NoCs: (1) to the Argo NoC [12] to support message passing between processor-local scratchpad memories (SPMs) and (2) to the Bluetree memory tree [25]. This memory tree connects all processors to a real-time memory controller [26] to support time-predictable access to a shared SDRAM memory.
T-CREST includes a compiler that supports the instruction set of Patmos [27]. The compiler optimizes for the WCET and interacts with AbsInt's WCET analysis tool [28], which has been extended to support Patmos as well.
The T-CREST platform has been evaluated with an avionic use case [29].
CompSoC platform [31], and the Epiphany processor [32]. Argo uses a very efficient NI architecture [11] in which the DMA controllers have been integrated with the TDM mechanism in the NI. This integration avoids all the buffering and flow control that is found in most NoCs. In addition, the NI hardware is dominated by area-efficient memory structures in the form of configuration tables.
TDM Scheduling
A parallel application on a multicore platform can be described as a set of tasks mapped to a set of processors. The steps of mapping a real-time application onto a multi-core platform and the generation of a TDM schedule for the
By assigning the tasks to the processing nodes, it is possible to derive a core communication graph (Figure 1(b)). The assignment of tasks to processing nodes must be performed in a way that minimizes the total number of hops for traffic. For this graph, the vertices represent the processing nodes, and the edges represent the set of VCs between each pair of processing nodes.
TDM scheduling shares the resources of the NoC in time between multiple VCs. The Argo NoC uses the scheduler described in [33]. This approach divides the time into TDM periods, and a period is further divided into timeslots.
The scheduler is an off-line software tool that uses the bandwidth requirements and a description of the NoC topology to compute a schedule that avoids deadlocks and collisions, and that ensures in-order arrival of packets. The static schedule is stored in the NIs of the NoC and specifies the route of each packet and the timeslot in which each packet is injected into the router. From the bandwidth requirements and the created schedule, we can calculate the minimum frequency at which the schedule should run.
Figure 1(c) shows (part of) two TDM periods for the traffic out of the processor P0 (VCs c1 and c4). VC c1 has been assigned four timeslots and VC c4 has been assigned two times three timeslots, which will use two different (shortest) paths through the NoC (c4' and c4").
The length of our TDM schedule period is typically 10 to 100 clock cycles as seen, for example, in Table 2 and in [33]. This is very short compared to the periods or minimum inter-arrival times of tasks and it is considerably shorter than the execution time of a communicating task. Moreover, the amount of data moved in a single NoC packet is smaller than the size of the sent and received messages. In other words, the scheduler does not schedule for entire messages between tasks, but for small and frequent network packets. This leads to an efficient use of the allocated bandwidth, for both periodic and non-periodic communication flows between the tasks.
Our method allows the calculation of the maximum latency of a message.
This calculation also considers the maximum waiting time for the first timeslot assigned to a specific communication channel. Due to the fine granularity of the schedule, this waiting time is negligible with respect to the sending time of the entire message. Message passing between tasks is performed at a higher level in software by using the services provided by the Argo NoC [35].
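The latency argument above can be illustrated with a small model. The following is a minimal sketch, not the scheduler's actual analysis: it assumes that a VC is described only by the cycles within the TDM period at which it may start a packet, that every packet of the VC carries the same number of payload words, and that the NoC traversal adds a fixed `channel_latency`. All function and parameter names are hypothetical.

```python
from math import ceil


def worst_case_wait(period, start_cycles):
    """Upper bound on the cycles spent waiting for the VC's next packet slot.

    period: length of the TDM period in clock cycles.
    start_cycles: cycles within the period at which the VC may inject a packet.
    """
    starts = sorted(set(start_cycles))
    if not starts or starts[0] < 0 or starts[-1] >= period:
        raise ValueError("start cycles must lie in [0, period)")
    # Cyclic distance from each start to the next one; a single start wraps
    # around to itself, which gives a full period.
    gaps = [
        (starts[(k + 1) % len(starts)] - starts[k]) % period or period
        for k in range(len(starts))
    ]
    return max(gaps)


def message_latency_bound(period, start_cycles, words_per_packet,
                          message_words, channel_latency):
    """Coarse upper bound: initial wait + whole periods needed + NoC latency."""
    words_per_period = len(set(start_cycles)) * words_per_packet
    periods_needed = ceil(message_words / words_per_period)
    return (worst_case_wait(period, start_cycles)
            + periods_needed * period
            + channel_latency)
```

For a short period, the waiting term is small compared with the `periods_needed * period` term of any message that spans many packets, which is the point made in the text.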
Argo NoC Architecture
As already mentioned, in the Argo NI architecture, the TDM-driven DMA controllers are integrated into the NI. This avoids buffering and flow control and leads to an efficient NI architecture. Both issues represent overhead and dedication of resources to specific purposes, something we aim to avoid.
In the Argo NoC, when multicast or broadcast is needed, it is implemented by setting up dedicated VCs from the transmitter to all receivers. This avoids the cost of the dedicated resources mentioned above, and it is not as inefficient as it may first seem: the NI offers a logical DMA controller per VC, and in combination with the fine-grained TDM scheduling, all the point-to-point communications are interleaved. Thus, the latency of multiple equivalent point-to-point VCs is not necessarily longer than the latency of a multicast or broadcast.
NIs to be off by several clock cycles. More details on this aspect are found in [36, 37].
The Argo router is a pipelined crossbar that routes incoming packets according to the routing information contained in the packet header. Argo supports both synchronous and asynchronous router implementations.
In this paper, where the focus is on reconfiguration and NI design, we assume for simplicity a synchronous implementation of the router, as shown in Figure 3.
However, the NoC is compatible with any of the Argo routers [38]. The router shown in Figure 3 consists of three pipeline stages: link traversal, header passing unit (HPU), and crossbar. The header of an incoming packet is read in the HPU and, based on the route in the header, the packet is routed to the output port in the crossbar stage.
Reconfiguration for Mode Changes
Finally, we provide some background on reconfiguration for mode changes.
In a parallel application, a mode change is defined as a change in the subset of the executing software tasks during normal operation. Mode changes can be triggered as part of the normal operation of the system or in response to external events [39, p. 340]. In normal operation, a mode change is triggered at a well-defined moment in the application execution. As a response to an external event, a mode change is triggered to adapt the system behavior to new environmental conditions. For example, an external alarm may require the execution of a set of tasks to manage specific situations.
Packet Format

The microarchitecture of the Argo 2.0 NI supports three types of network packets: data packets, interrupt packets, and configuration packets. Figure 4 shows the general packet format, which contains a 32-bit header followed by n 32-bit payload words. For configuration and interrupt packets, n = 1, and for data packets n = 1, 2, ..., 15. The variable length of data packets allows quite long packets, which may be used to reduce the header overhead for VCs that require high bandwidth. The two most significant bits of the header contain the packet type.
The next 14 header bits contain the write address in the target SPM where the payload data of the packet will be written. The last 16 header bits contain the route that the packet will take through the NoC. In Figure 4, 'PL' stands for packet length.
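The header layout described above (2 type bits, 14 address bits, 16 route bits) can be sketched as a pair of pack and unpack helpers. This is only an illustration of the bit layout: the numeric packet type codes below are placeholders chosen for the example, since the actual encodings are not given in the text, and the function names are hypothetical.

```python
TYPE_BITS, ADDR_BITS, ROUTE_BITS = 2, 14, 16

# Placeholder type codes for illustration only; the real encodings are an
# assumption and are not taken from the paper.
TYPE_DATA, TYPE_INTERRUPT, TYPE_CONFIG = 0, 1, 2


def pack_header(pkt_type, write_addr, route):
    """Build a 32-bit header word: [31:30] type, [29:16] address, [15:0] route."""
    if not 0 <= pkt_type < (1 << TYPE_BITS):
        raise ValueError("packet type does not fit in 2 bits")
    if not 0 <= write_addr < (1 << ADDR_BITS):
        raise ValueError("write address does not fit in 14 bits")
    if not 0 <= route < (1 << ROUTE_BITS):
        raise ValueError("route does not fit in 16 bits")
    return (pkt_type << (ADDR_BITS + ROUTE_BITS)) | (write_addr << ROUTE_BITS) | route


def unpack_header(word):
    """Split a 32-bit header word into (type, write address, route)."""
    pkt_type = (word >> (ADDR_BITS + ROUTE_BITS)) & ((1 << TYPE_BITS) - 1)
    write_addr = (word >> ROUTE_BITS) & ((1 << ADDR_BITS) - 1)
    route = word & ((1 << ROUTE_BITS) - 1)
    return pkt_type, write_addr, route
```

The three fields add up to exactly 32 bits, so no header bit is left unused in this reading of the format.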
sequence of packets sent during several consecutive TDM periods. If the sender process needs to notify the receiver when the DMA transfer is complete, the sender can mark the last packet to generate an interrupt at the destination core. We call this a local interrupt, as it is generated and processed in the processor node that receives the message.
Interrupt packets are used to generate an interrupt in a remote processor core, and this feature is needed to support multicore operating systems. When an interrupt packet arrives at the remote core it generates an interrupt. We call this a remote interrupt, as it is triggered by a remote core.
Configuration packets are used to write configuration data into the tables of a remote NI. The data of a configuration packet is written word by word into the tables of the NI.
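The effect of variable-length data packets on header overhead can be made concrete with a little arithmetic. The sketch below assumes only what the packet format states: one 32-bit header word followed by n payload words, with n between 1 and 15. Reading a fixed 3-word packet as one header plus two payload words (n = 2) is an assumption made for the comparison; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

MAX_PAYLOAD_WORDS = 15  # upper limit of n for data packets


def payload_fraction(n):
    """Fraction of link words that carry payload in a packet with n payload words."""
    if not 1 <= n <= MAX_PAYLOAD_WORDS:
        raise ValueError("n must be between 1 and 15")
    return Fraction(n, n + 1)


def link_words(message_words, n):
    """Total words on the link (payload plus one header per packet) for a message."""
    if not 1 <= n <= MAX_PAYLOAD_WORDS:
        raise ValueError("n must be between 1 and 15")
    packets = ceil(message_words / n)
    return message_words + packets
```

Under these assumptions, a 30-word message occupies 45 link words with n = 2 but only 32 link words with n = 15, which is why longer packets can shorten the schedule or raise the bandwidth of a VC.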
Compact Schedule Representation
The AEthereal family of NoCs and the original Argo NoC use a fixed 3-word packet format. In both designs the TDM counter is incremented once every 
Transmitting Packets
The transmit module of the NI, shown in Figure 6 , consists of the following components: the TDM controller, the schedule table, the DMA table, the 
Figure 6: A block diagram of our NI. The block diagram is split into two parts: the transmit module and the receive module.
corresponding to c4" in cycle 13. As the implementation is pipelined, the
Figure 1 and Figure 5 this feature may be used for VC c4 where packets c4' and c4" may be sent along different routes.
To program a TDM schedule into the NIs, information must be written into the TDM controller, the schedule table, and the reconfiguration controller of every NI. This can be done by the local processors or by a remote master processor sending out configuration packets as explained in the next subsection.
The entries in the DMA table can be written and read by the local processor.
Receiving Packets
The receive module shown in Figure 6 consists of two blocks: the receive unit and the interrupt (IRQ) unit. The receive unit processes incoming packets depending on the packet type. Incoming data packets carry the target address as part of the header and the data payload is written directly into the SPM as it is being received. For each packet, the receive unit increments the target address for each write into the SPM. If the data packet is the last packet of a DMA transfer, the target address of the last word is written into the IRQ FIFO for local interrupts.
If the received packet is a configuration packet, the data payload is written into one of the NI tables in the transmit module. The data structures in these blocks are mapped into a private address space of the NI and the address of the configuration packet header points into this address space.
If the received packet is an interrupt packet, the data payload is written into the SPM and the target address is written into the remote interrupt FIFO.
The IRQ unit contains two FIFO queues that store interrupts. One queue is for external interrupts communicated using the interrupt packet format. The other is for local interrupts that are generated when the last packet belonging to a message is received.
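The behavior of the receive module for the three packet types can be summarized in a small software model. This is a behavioral sketch under stated assumptions, not the hardware design: memories are modeled as dictionaries with word addressing, packet types are symbolic strings instead of the 2-bit header encoding, and the `last_of_transfer` flag stands in for whatever marking the sender uses on the last packet of a DMA transfer. All names are hypothetical.

```python
from collections import deque

# Symbolic packet types; the real 2-bit encodings are not modeled here.
DATA, INTERRUPT, CONFIG = "data", "interrupt", "config"


class ReceiveUnitModel:
    """Behavioral model of the receive unit and the two IRQ FIFOs."""

    def __init__(self):
        self.spm = {}                    # word address -> value
        self.config_tables = {}          # private NI address space
        self.local_irq_fifo = deque()    # last packet of a DMA transfer
        self.remote_irq_fifo = deque()   # interrupt packets

    def receive(self, pkt_type, address, payload, last_of_transfer=False):
        if not payload:
            raise ValueError("a packet carries at least one payload word")
        if pkt_type == DATA:
            # Payload goes straight to the SPM, one address per word.
            for offset, word in enumerate(payload):
                self.spm[address + offset] = word
            if last_of_transfer:
                # Record the target address of the last word written.
                self.local_irq_fifo.append(address + len(payload) - 1)
        elif pkt_type == INTERRUPT:
            for offset, word in enumerate(payload):
                self.spm[address + offset] = word
            self.remote_irq_fifo.append(address)
        elif pkt_type == CONFIG:
            # Written word by word into the NI tables, not into the SPM.
            for offset, word in enumerate(payload):
                self.config_tables[address + offset] = word
        else:
            raise ValueError("unknown packet type")
```

Because every payload word is written to its final location on arrival, the model needs no receive buffer, which mirrors the argument made for the small area of the receive module.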
The transmit and receive modules share one port to the SPM. To allow sustained and concurrent 32-bit reads and writes, the SPM uses a double-width read/write port. The associated buffering and arbitration are implemented by the SPM arbiter.
The data payload of incoming packets is written directly to its target address.
Therefore, there is no need for buffers or flow control in the NI, or for extra DMA controllers in the processor to copy the received data out of the NI. This makes the area of the receive module very small.
Reconfiguration
This section describes how we support mode changes by reconfiguration.
Firstly, we present the underlying observations and ideas, and introduce the architectural features supporting reconfiguration. Secondly, we discuss several ways in which the reconfiguration mechanism can be used by an application requiring mode changes.
Key Observations and Ideas
As we target domain-specific platforms that support a multitude of applications, our primary concern is to avoid adding resources that are specialized for one use. Therefore, we decided to use the available NoC for reconfiguration commands and transmission of schedules. This contrasts with a dedicated (re)configuration network, as for example used in dAElite [9]. Given a fixed amount of hardware resources for the NoC, a dedicated reconfiguration NoC establishes a static split of bandwidth between regular traffic and configuration traffic. We prefer to use all hardware resources to provide as much total bandwidth as possible, leaving it to the application programmer to allocate bandwidth for schedule transmission and regular traffic.
can be used to support reconfiguration and mode changes in several ways, as described at the end of this section.
Reconfiguration Controller
To support reconfiguration, we add a reconfiguration controller to the NI and connect the receive unit to the configuration bus, as seen in Figure 6 .
Connecting the receive unit to the configuration bus allows the receive unit
at the end of the TDM period. In this way, the flushing of the network at the end of a TDM period and the filling at the beginning of the next TDM period are overlapped. Because of this, and because an NI needs two clock cycles to process the reconfiguration command, a reconfiguration request issued by the master processor in TDM period i can take effect at the earliest from TDM period i + 3. This argument assumes that the period of the TDM schedule is longer than 6 + 2 cycles. In the rare case where the schedule is shorter, it can be unrolled two or more times within a TDM period.
All the tables that contain configuration data in the NI are connected to the receive unit through the configuration bus. The receive unit writes the incoming NoC configuration packets into these tables. Therefore, we can also use the NoC to transmit new schedules from the master core to the slave cores by sending the schedules using configuration packets. This transmission is transparent to the slave core.
Using the Reconfiguration Features
The reconfiguration mechanism described above can be used to implement reconfiguration in several ways when an application requests a reconfiguration:
1. In cases where the schedule table and the DMA table have sufficient capacity to store all possible configurations, these can be loaded into the NIs when the platform is booted. In this way, a master only needs to send reconfiguration requests to the NIs, and this method has the lowest reconfiguration latency.
2. Another approach is first to transmit the new schedule and then send a reconfiguration request. As seen in the next section, the time required to first distribute and then activate a new schedule is relatively short and comparable to the reconfiguration seen in other NoCs.
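The choice between the two approaches listed above can be sketched as a small decision function for a master core. This is an illustrative sketch only: it assumes that the master knows which schedules are already resident in the NIs, and that distribution and activation happen strictly one after the other, so that their worst-case times simply add up. The function name, the step labels, and the additive bound are assumptions, not part of the described hardware or software.

```python
def plan_reconfiguration(target, resident_schedules, t_recon, t_st):
    """Return (steps, worst-case clock cycles) for switching to `target`.

    t_recon: worst-case time to activate an already loaded schedule.
    t_st:    worst-case time to transmit the schedule to all slave NIs.
    """
    if target in resident_schedules:
        # Approach 1: schedule was loaded earlier, only a request is needed.
        return ["send reconfiguration request"], t_recon
    # Approach 2: distribute the schedule first, then activate it.
    # Assumption: the two phases do not overlap.
    return ["transmit schedule", "send reconfiguration request"], t_st + t_recon
```

In this model, the first approach is always at least as fast as the second, at the price of schedule table capacity in every NI.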
Evaluation
This section evaluates the proposed architecture in terms of six criteria: (i) the TDM period extension due to statically allocating VCs for reconfiguration,
(ii) the impact of variable-length packets on the schedule period, (iii) the storage size of the schedule in the schedule For space reasons, we leave out the H264-1080p benchmark, as its communication pattern is identical to that of H264-720p.
Virtual Circuits For Configuration
This subsection evaluates the TDM period extension due to statically allocating VCs for configuration packets in the application-specific schedules from Table 1 shows the TDM period of the baseline schedules without VCs for configuration and of the baseline schedules plus the added VCs for configuration.
The VCs for configuration connect the master core to all slave cores in the platform.
When the traffic patterns from the MCSL benchmarks are mapped to the cores of the platform, we choose the core with the smallest outgoing bandwidth as the reconfiguration master core. We believe that this mimics a real application best, since in most cases the reconfiguration master would not have a high communication load.
The Sparse benchmark in Table 1 reaches this I/O bound.
Table 1 shows that the TDM periods of some benchmarks are only increased by two clock cycles when VCs for configuration are added. These two slots are the two words of the configuration packet from the master to the node that has the highest incoming bandwidth requirements. The TDM periods of the All2all and FFT-1024 benchmarks are increased considerably, because all cores have high outgoing bandwidths.
Variable-Length Packets
This subsection evaluates the TDM period reduction of the MCSL benchmarks, when allowing variable-length packets. We let the scheduler route fewer packets with more payload words, such that the number of payload bytes during one TDM period is the same as without variable-length packets. In Table 2 
Schedule Storage Size
This subsection evaluates the size of the schedules in the schedule schedule table at the same time, avoiding the need to transmit a new schedule from the master core to all the slave cores through the NoC.
The minimum and maximum number of bytes that are required in the DMA between the schedules of an application.
In the original version of Argo and in Argo 2.0, the number of entries in the DMA tables of each node is the same, since it is determined by the application.
Therefore, we only compare the number of schedule table entries in each node of the original version of Argo against the number of entries in Argo 2.0, in Table 4.
The average reduction in the schedule table entries of each node is 58%. This improvement is due to the new and more efficient architecture of the Argo 2.0 NI.
Worst-case Reconfiguration Time
This subsection gives an overview of how to calculate the worst-case reconfiguration time $T_{recon}$ of a new schedule $C_{new}$. $T_{recon}$ depends only on the currently executing schedule $C_{curr}$. From Figure 7 we see that:

where $P_{curr}$ is the TDM period of $C_{curr}$. We calculate $T_{recon}$ of the MCSL benchmarks and an All2all schedule, shown in Table 5. An application programmer needs to add the software overhead of setting up DMA transfers and triggering a reconfiguration request to the numbers in the table.

Benchmark    $T_{recon}$ (clock cycles)
FFT-1024     222
Fpppp        285
RS-dec       231
RS-enc       219
H264-720p    234
Robot        381
Sparse       90
All2all      225
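As a plausibility check on the listed values, the sketch below tests one possible reading of the relation between $T_{recon}$ and $P_{curr}$: that a request issued in TDM period i takes effect from period i + 3, so that the worst case is three TDM periods. This reading is an assumption made here for illustration, motivated by the i + 3 statement in the reconfiguration controller description, and it is not a reproduction of the paper's equation. The implied TDM periods computed from it are therefore also assumptions.

```python
# Values listed per benchmark (clock cycles), as they appear in the text.
LISTED_CYCLES = {
    "FFT-1024": 222, "Fpppp": 285, "RS-dec": 231, "RS-enc": 219,
    "H264-720p": 234, "Robot": 381, "Sparse": 90, "All2all": 225,
}

PERIODS_UNTIL_ACTIVE = 3  # assumption: request in period i, active from i + 3


def reconfig_time(p_curr):
    """Worst-case reconfiguration time under the three-period assumption."""
    return PERIODS_UNTIL_ACTIVE * p_curr


def implied_period(t_recon):
    """TDM period implied by a listed value, if the assumption fits it."""
    period, remainder = divmod(t_recon, PERIODS_UNTIL_ACTIVE)
    if remainder:
        raise ValueError("value is not a whole number of assumed periods")
    return period


implied = {name: implied_period(t) for name, t in LISTED_CYCLES.items()}
```

All eight listed values are multiples of three, so the assumption is at least consistent with them; it does not by itself confirm the form of the original equation.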
For the benchmarks presented in Table 5, $T_{recon}$ is between 135 and 381 clock cycles, depending on the current benchmark. For the maximum number of entries shown in Table 3 and assuming a schedule table of
A NoC schedule is different in each NI and, with the compact schedule representation that we evaluated in subsection 6.3, the schedules for each NI might be of different sizes. The $T_{st}$ of transferring $C_{new}$ to the slave NIs is the maximum of the individual worst-case latencies for each slave NI. We calculate $T_{st}$ as:

Here $i$ is the slave NI from the set $N$ of nodes in the platform, $L^i_{curr}$ is the worst-case latency of waiting for a time slot to slave $i$, $S^i_{new}$ is the number of words of $C_{new}$ to be sent to slave $i$, $P^i_{curr}$ is the number of words that can be sent in one packet towards slave $i$ in $C_{curr}$, $B^i_{curr}$ is the bandwidth of $C_{curr}$ towards slave $i$, $L_{curr}$ is the TDM period of $C_{curr}$, and $L^i_{chan}$ is the NoC latency in clock cycles to slave $i$.
We apply (2) to calculate the worst-case schedule transmission time between the schedules of the MCSL benchmarks and an All2all schedule, shown in Table 6. We see that the $T_{st}$ in Table 6 is between 519 and 3822 clock cycles. The Sparse benchmark, as $C_{curr}$, results in the lowest $T_{st}$, as Sparse has the shortest TDM period, and thus the highest bandwidth to the slaves.
In the rare case that a schedule needs to be transmitted to the slave NIs, our approach is still comparable. The maximum schedule transmission time in Table 6 is 3822 clock cycles. For Argo 2.0, this transmission represents the transmission of 255 VCs. In this time interval, AEthereal and dAElite can only set up 16 and 64 VCs, respectively; this does not include tearing down VCs.
Hardware Results
This subsection presents the evaluation of the Argo 2.0 FPGA implementation presented here with respect to hardware size and maximum operating frequency. All the results presented in this section were produced using Xilinx ISE Design Suite (version 14.7) and targeting the Xilinx Virtex-6 FPGA (model XC6VLX240T-1FFG1156). All the synthesis properties were set to their defaults, except for the synthesis optimization goal, which was set to area or
Table 7 shows the comparison of the Argo 2.0 implementation to the TDM-based NoCs aelite and dAElite [9], and to the IDAMC [17] NoC that uses a classic router designed with virtual channel buffers and flow control. The table shows
The results in Table 7 show that overall the Argo 2.0 NoC implementation is smaller than the other NoCs. The results also show that the numbers for IDAMC are much higher than those for aelite, dAElite and Argo 2.0. This is due to its use of virtual channel buffers and the flow control mechanisms.
The results in Table 7 show that the maximum frequency $f_{max}$ of the
As mentioned, the Argo 2.0 NoC is designed to be used in a domain-specific platform. Therefore,
Table 8, we used BRAM to implement the tables in the NI.
In terms of $f_{max}$, the Argo 2.0 5-port router implementation optimized for area is around 33% faster than the 3-port one, since it uses BRAM instead of distributed memory (implemented using FFs), and around 7% faster than the original Argo.
Scalability
The results in the previous subsections are based on a 16-core platform. As the number of cores in the platform increases, we consider the hardware size and the TDM period to evaluate the scalability of the new reconfiguration capability of Argo 2.0. We consider the hardware size of the NoC per core and the extension of the TDM period due to statically allocating VCs for reconfiguration. As the number of cores increases, the hardware size of one NI and one router increases due to the number of required entries in the schedule table and in the DMA 
Conclusion
This paper presented an area-efficient time-division multiplexing network-on-chip that supports reconfiguration for mode changes. The NoC addresses hard real-time systems and provides guaranteed-service VCs between processors. The NI provides reconfiguration capabilities of end-to-end VCs to support mode changes at the application level. For the set of benchmarks used for evaluation, we showed that the TDM period overhead of statically allocating VCs for configuration was on average 10%. Furthermore, we showed that our compact schedule representation reduces the memory requirements by more than 50% on average.
We evaluated an implementation of the proposed architecture in terms of hardware cost and worst-case reconfiguration time. The results show that the proposed architecture is less than half the size of NoCs with similar functionality and that the worst-case reconfiguration time is comparable to those NoCs.
If the new schedule is already loaded in the schedule table, the worst-case reconfiguration time is significantly shorter.
Acknowledgment
The work presented in this paper was funded by the Danish Council for Independent Research | Technology and Production Sciences under the project RTEMP, contract no. 12-127600.
Source Access
The presented work is open source and can be downloaded from GitHub and built under Ubuntu as described in the Patmos reference handbook [41, Chap. 6].
