Abstract-Lightweight Real-Time Operating Systems have gained widespread use in implementing embedded software on lightweight nodes. However, bare metal solutions are chosen, e.g., when the reactive (interrupt-driven) paradigm better matches the programmer's intent, when the OS features are not needed, or when the OS overhead is deemed too large. Moreover, other approaches are used when real-time guarantees are required. Establishing real-time and resource guarantees typically requires expert knowledge in the field, as no turn-key solutions are available to the masses.
I. INTRODUCTION
Resource-constrained platforms such as the ARM-7 and ARM Cortex Mx/Rx architectures have gained widespread use for cost-, size- and power-efficient implementations of embedded real-time applications. The Artemis Research Agenda predicts the number of embedded processors to be over 40 billion by 2020 [1]; their functionality is to a large extent dependent on embedded software. Unlike programs for general-purpose processors, embedded software typically expresses interaction with internal and external peripherals and typically operates under resource and timing constraints.
Lightweight Real-Time Operating Systems have gained widespread use in implementing embedded software on such platforms. A prominent example thereof is FreeRTOS [2], whose success is (arguably) due to its familiar API, documentation, wide platform support, mature/stable code base, open source licence, etc. However,
1) bare metal solutions are chosen, e.g.,
• when the reactive (interrupt-driven) paradigm better matches the programmer's intent,
• when the OS features are not needed, or
• whenever the OS overhead is deemed too large;
2) other approaches are used when real-time guarantees are required, e.g., [3] and [4].
Co-funded by EU-ERDF, 978-1-4799-0658-1/13/$31.00 2013 IEEE
Establishing real-time and resource guarantees typically requires expert knowledge in the field, as no turn-key solutions are available to the masses. In this paper we set out to bridge the gap between bare metal solutions and traditional Real-Time OS paradigms. Our goal is to meet the intuition of the programmer and to provide a resource-efficient (w.r.t. CPU and memory) implementation with established properties, such as bounded memory usage and guaranteed response times. We outline a roadmap for Real-Time For the Masses (RTFM) and report on the first step: an intuitive programming API backed by an efficient scheduler suitable for resource and timing analysis.
Our approach is based on the reactive programming paradigm, where run-to-completion jobs are triggered either from the system's environment or programmatically inside the system. Since such systems are inherently concurrent, critical sections are typically introduced to avoid race conditions on shared variables or other system resources. In the context of real-time applications, reactions to events are typically associated with priorities or timing constraints expressed in terms of deadlines.
By associating events with interrupts and implementing the corresponding reaction (job) directly in the respective interrupt handler (ISR), reactive (event/interrupt-driven) systems can be straightforwardly implemented and efficiently scheduled directly by the hardware. However, in order to support shared resources between jobs we need to adopt a resource management protocol. We choose the Stack Resource Policy (SRP) [5], for which we can construct an efficient implementation that exploits commonplace interrupt hardware. Moreover, SRP brings the additional benefits of deadlock-free execution on a single stack, bounded priority inversion, etc. SRP has been extensively studied over the last decades and a rich set of methods for system analysis has been developed.
In this paper, the programming model is restricted to SRP with single-unit resources, static priorities and sporadic tasks (deadline < inter-arrival time). These restrictions allow us to develop a set of kernel primitives that utilise the underlying interrupt hardware for static priority preemptive scheduling under SRP. In particular, we show that the job request and admission mechanisms for SRP can be handled with zero overhead on common micro-controllers that support nested, priority-based interrupt handling.
8th IEEE International Symposium on Industrial Embedded Systems (SIES 2013)
Furthermore, we report on the development of KCC (Kernel Configuration Compiler), a tool to automatically derive resource ceilings and to synthesise a target- and application-specific kernel configuration from an XML system model (derived from a C program). Additionally, the tool performs basic stack depth analysis and, in order to determine schedulability, response time analysis given inter-arrival and worst-case execution times for jobs and critical sections.
We characterise the kernel primitives for the market-dominating ARM Cortex M0/M3 architectures and find that job and resource requests introduce only a few bytes of memory overhead, and that the CPU overhead is in the sub-microsecond range even for entry-level M0 MCUs like the NXP LPC11. In comparison, the overhead is less than a tenth of that of the eventing mechanism in FreeRTOS; in fact, the proposed scheduler is hard to beat even by means of manual coding.
II. REAL-TIME FOR THE MASSES, ROADMAP
With the ambition to facilitate real-time programming for the masses outside the relatively small and domain-specific communities, we propose the following steps:
Step 1 Basic Kernel Primitives
• Choosing a suitable task model for RTFM. We choose the task model of SRP with jobs and resources.
• Specifying an intuitive programming API to RTFM kernels allowing both direct application development and the use of RTFM as a micro-kernel for other OS/RTOS. To this end, we propose a platform-independent C code API, compatible with gcc-based tool chains.
• An efficient implementation of kernel primitives for commonplace hardware. Here we present kernel primitives for static priority SRP scheduling for ARM Cortex M0/M3.
• A tool for target- and application-specific kernel configuration with support for a basic resource and schedulability analysis. Here we present the KCC tool, supporting response time and basic stack depth analysis, producing kernel configurations for ARM Cortex M0/M3.
Step 2 Infrastructure
• Automatic generation of XML system models from C programs.
• Implementing virtual interrupt sources necessary to support platforms with insufficient hardware capabilities.
• Implementing support for job requests with arguments (messages) and virtual timers (for postponed messages).
• Extending basic system analysis to exploit additional information, such as task offsets.
Step 3 Other Programming Models and True Parallelism
• Mapping of more elaborate programming models to RTFM. These include TinyTimber [6] , the REKO framework [7] , and TinyOS [8] .
• Investigating other scheduling approaches for RTFM, including dynamic priority SRP, hierarchical SRP, and M-SRP for multicore systems.
• Extending basic program analysis and kernel configuration to support the additional programming models and scheduling approaches.
The presented approach is influenced and motivated by previous work on Concurrent Reactive Objects and their implementation in TinyTimber [6]. The first step covers scheduling primitives with a simple C code API and basic tools. The second step further facilitates program development and increases the applicability of RTFM. Support for virtual timers and job requests with arguments forms the necessary infrastructure for mapping more elaborate programming models to the RTFM kernel, offering further abstractions and implementing OS-like services.
Other lightweight approaches include EDFI [9] . The EDFI approach is similar to ours in that it adopts the notions of SRP. However, it relies on (dynamic) EDF-based scheduling, which precludes exploiting static hardware priorities for efficient scheduling. Another approach, combining non-preempting background tasks with interrupts, is used in TinyOS [8] , which allows for a bounded stack depth. However, TinyOS provides no native real-time support, and hence the performance of a system relies entirely on the programmer's ability to split tasks into sufficiently short sections. In [10] , a method to preemptively execute native TinyOS tasks is suggested. This method could be deployed using the proposed RTFM SRP kernel and is a subject of future work. Yet another mechanism for single-stack execution is proto-threads, used in e.g. Contiki [11] . However, there is no notion of timing or priorities for proto-threads, hence no native real-time support.
III. TASK MODEL
We adopt the task model used by Baker [5] :
$J$: A job $J$ is a finite sequence of instructions to be executed on a single processor. $J$ denotes both a job execution request and its execution.
$p(J)$: Defines the (base) priority of $J$. $p(J') > p(J)$ indicates that expediting $J'$ is sufficiently important that completion of $J$ is permitted to be delayed.
$\pi(J)$: Defines the preemption level of a job, defined so that a job $J'$ may preempt $J$ only if $\pi(J') > \pi(J)$.
$R$: A nonpreemptable resource $R$ can be claimed by a job for the execution of a critical section.
We restrict the SRP model from [5] to single-unit resources.
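Following the abstract resource ceiling definition [5], we define:
$\lceil R \rceil$: The (static) current ceiling of resource $R$,
$$\lceil R \rceil = \max\big(\{0\} \cup \{\pi(J) \mid J \in L(R)\}\big),$$
where $L(R)$ is the set of jobs that (may) request $R$.
$\Pi$: The (dynamic) current system ceiling,
$$\Pi = \max\big(\{0, \pi(J_{cur})\} \cup \{\lceil R \rceil \mid R \in R_{claimed}\}\big),$$
where $J_{cur}$ is the currently executing job (if any), and $R_{claimed}$ the set of currently claimed (outstanding) resources.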
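As an illustration of these definitions, the following minimal C sketch computes resource ceilings and the system ceiling for a small, hypothetical system model. It assumes the usual SRP reading: a resource ceiling is the highest preemption level among the jobs that may request the resource, and the system ceiling is the maximum over the running job's level and the ceilings of all claimed resources, with 0 meaning "none". The tables `job_level` and `uses` and both function names are illustrative assumptions, not part of the RTFM API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical system model (illustration only, not the RTFM API). */
#define NUM_JOBS 3
#define NUM_RESOURCES 2

/* Preemption level pi(J) of each job; a higher value is more urgent.
   Level 0 is reserved for "no job / no resource". */
static const unsigned job_level[NUM_JOBS] = {1, 2, 3};

/* uses[j][r] != 0 iff job j may request resource r, i.e. J is in L(R). */
static const unsigned char uses[NUM_JOBS][NUM_RESOURCES] = {
    {1, 0}, /* job 0 requests resource 0 */
    {1, 0}, /* job 1 requests resource 0 */
    {0, 1}, /* job 2 requests resource 1 */
};

/* Static ceiling of resource r: highest level among jobs that may request it. */
unsigned resource_ceiling(size_t r)
{
    assert(r < NUM_RESOURCES);
    unsigned ceiling = 0;
    for (size_t j = 0; j < NUM_JOBS; j++) {
        if (uses[j][r] && job_level[j] > ceiling)
            ceiling = job_level[j];
    }
    return ceiling;
}

/* Dynamic system ceiling: maximum over the running job's level
   (0 when idle) and the ceilings of all currently claimed resources. */
unsigned system_ceiling(unsigned cur_level,
                        const unsigned char claimed[NUM_RESOURCES])
{
    unsigned pi = cur_level;
    for (size_t r = 0; r < NUM_RESOURCES; r++) {
        if (claimed[r]) {
            unsigned c = resource_ceiling(r);
            if (c > pi)
                pi = c;
        }
    }
    return pi;
}
```

In this model, jobs 0 and 1 (levels 1 and 2) share resource 0, so its ceiling is 2; while job 0 holds that resource the system ceiling is 2, which keeps job 1 out but still lets job 2 (level 3) preempt.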
Under SRP, a job execution request for $J$ is blocked until $p(J)$ is the highest priority of all outstanding job execution requests and $\Pi < \pi(J)$.
IV. PROGRAMMER'S API
Listing 2 gives an example for the LPC11x (ARM Cortex M0), defining two jobs (j1 and j2) and their priorities (1, 2), 2 being higher. Since both jobs request r1, the resource ceiling is 2.
Listing 2: example program for the LPC11x (only the closing fragment of the listing, /* Enable HW sources */ }, is recoverable)
V. STATIC PRIORITY SRP SCHEDULING
We assign static preemption levels $\pi(J) = p(J)$, hence we may use $\pi$ and $p$ interchangeably in the following. We assume that the hardware supports prioritised interrupt nesting, which (as described below) allows us to view the interrupt hardware as a preemptive static priority scheduler for our system. For the discussion, we exemplify the kernel primitives (in C/assembler) using the ARM Cortex Mx architectures (the kernel primitives can be devised similarly for other architectures).
We define a mapping $H$ from priorities $p$ to interrupt priorities $H(p)$, where a higher $p$ gives a higher priority on the underlying hardware (typically 0 is the highest hardware priority). We assume that the numbers of priorities and ISR vectors (interrupt sources) are sufficient. Interference (due to preemption) for a job $J_j$ is defined as $\mathrm{sum}(\{E(J_i) \mid p(J_i) > p(J_j)\})$ [12], where $E(J)$ is the execution time for $J$, under the assumption that we select the oldest highest priority job first. However, typical interrupt controllers choose the ISR on the basis of the vector position, rather than the time of arrival (for reasons of hardware implementation complexity). This has the implication that the interference is in the worst case sum({E(
B. Resource request/release
Under SRP, when claiming a resource $R$, the system ceiling $\Pi$ should be set to $\max(\lceil R \rceil, \Pi)$ for the duration of the critical section. We present two efficient approaches to manipulate the system ceiling.
1) Source Masking, All Cortex Mx: An interrupt IRQn($J$) may be taken only if the corresponding source is enabled. Hence, we can emulate the effect of setting $\Pi = x$ by disabling sources corresponding to jobs $J$ with $p(J) \le x$. Listing 3 (SOURCE MASKING) gives an example implementation for the M0/M1 architecture (having 4 priority levels).
To ensure that interrupt masking has taken effect before entering the critical section, reordering directives [13] and memory and instruction barriers may be required [14]. A safe approach is to enforce barriers after (potentially) raising, and before restoring, the priority. This approach is highly portable to architectures supporting a centralized interrupt mask (the LockMask needs to be adapted to the number of priority levels of the target architecture).
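The source masking idea can be sketched as a host-side simulation. This is not the paper's Listing 3: the interrupt-enable register is modelled as a plain variable (`irq_enable`), and the per-source priorities, `lock_mask`, `mask_lock` and `mask_unlock` are hypothetical names chosen for illustration. On a real target the mask would typically be a precomputed table, the register accesses would go to the interrupt controller, and the barriers discussed above would be placed where the comments indicate.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SOURCES 4u /* hypothetical number of interrupt sources */

/* Simulated interrupt-enable register: bit n set means source n is enabled.
   On hardware this role is played by the interrupt controller. */
static uint32_t irq_enable = (1u << NUM_SOURCES) - 1u;

/* Hypothetical job priority bound to each source (1 = lowest). */
static const unsigned source_prio[NUM_SOURCES] = {1, 2, 2, 3};

/* Bit mask of all sources whose job priority is <= ceiling. */
uint32_t lock_mask(unsigned ceiling)
{
    uint32_t mask = 0;
    for (unsigned n = 0; n < NUM_SOURCES; n++) {
        if (source_prio[n] <= ceiling)
            mask |= 1u << n;
    }
    return mask;
}

/* Claim a resource with the given ceiling: disable every source at or
   below the ceiling and return the previous enable state. Because bits
   are only cleared, a nested claim with a lower ceiling cannot lower
   the effective system ceiling. */
uint32_t mask_lock(unsigned ceiling)
{
    uint32_t saved = irq_enable;
    irq_enable &= ~lock_mask(ceiling);
    /* real target: barrier here, after (potentially) raising the ceiling */
    return saved;
}

/* Release: restore the enable state saved by the matching mask_lock. */
void mask_unlock(uint32_t saved)
{
    /* real target: barrier here, before restoring */
    irq_enable = saved;
}
```

A critical section is then bracketed as `uint32_t s = mask_lock(2); /* ... */ mask_unlock(s);`, and such sections nest as long as releases occur in reverse order of the claims.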
2) Global Priority, Cortex M3 and above: For architectures that directly support changing the base priority (global priority level), an efficient implementation is straightforward. Only sources with higher priority than the current base priority can be admitted. In effect, the system ceiling for SRP scheduling is implemented directly by the hardware.
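The base priority approach can likewise be sketched as a host-side simulation. The variable `basepri` stands in for the hardware base priority register, with 0 taken to mean "no masking" and numerically lower values meaning higher hardware priority, as is conventional on Cortex M3. The number of levels, the mapping `hw_prio` (one possible choice of $H$), and the functions `admitted`, `basepri_lock` and `basepri_unlock` are assumptions made for illustration; real hardware additionally keeps the priority in the upper bits of the field, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define HW_LEVELS 8u /* hypothetical: 3 implemented priority bits */

/* Simulated base priority register; 0 means masking is disabled. */
static uint32_t basepri = 0;

/* One possible mapping H: job priority p in 1..HW_LEVELS-1 to a hardware
   priority, where a higher p gives a numerically lower (more urgent) value.
   Hardware priority 0 is left unused so that every job can be masked. */
uint32_t hw_prio(unsigned p)
{
    assert(p >= 1 && p < HW_LEVELS);
    return HW_LEVELS - p;
}

/* Simulated admission rule: a source is admitted only if its hardware
   priority is strictly higher (numerically lower) than the base priority. */
int admitted(unsigned p)
{
    return basepri == 0 || hw_prio(p) < basepri;
}

/* Claim a resource: raise the base priority to H(ceiling) unless the
   system ceiling is already at least that high. Returns the old value. */
uint32_t basepri_lock(unsigned ceiling)
{
    uint32_t saved = basepri;
    uint32_t h = hw_prio(ceiling);
    if (saved == 0 || h < saved)
        basepri = h;
    return saved;
}

/* Release: restore the base priority saved by the matching claim. */
void basepri_unlock(uint32_t saved)
{
    basepri = saved;
}
```

After `basepri_lock(3)`, jobs with priority 3 or lower are held back while a job with priority 4 is still admitted, which mirrors setting the system ceiling to 3.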
VI. RTFM UTILITIES
The presented RTFM API together with the static priority SRP kernel primitives is sufficient to implement reactive software on lightweight platforms. However, using the API requires the programmer to manually derive resource ceilings and to establish resource usage and real-time properties. To this end, we have developed a Kernel Configuration Compiler (KCC) that, given an XML model defining the target architecture and chosen scheduler, the jobs, their resource requests, and environment bindings, produces a target-/application-specific kernel configuration. Furthermore, given WCET for jobs and critical sections, deadlines and inter-arrival times, the tool performs a schedulability test through basic SRP response time analysis and a basic, yet safe, stack memory analysis. The KCC will be described in a forthcoming paper.
VII. EVALUATION AND FUTURE WORK
The kernel primitives have been implemented for the NXP LPC11c24 (M0) / LPC1769 (M3) platforms and tested with both gcc v4.6 and v4.7. The implementation is expected to work for all other M0/M3 MCUs, hence a large portion of the embedded market is already covered. Table I shows a preliminary evaluation of CPU usage (in cycles) and memory overhead (in bytes). The devices operate at their maximum speeds of 48 and 120 MHz, respectively. The test programs were compiled using gcc with the option -Os. Job Latency/Job OH denotes the cost of invoking a higher priority job from a lower one, Lock OH/Unlock OH reports the overhead for resource management, while Critical Section ID reports the number of cycles with disabled interrupts used for resource management. The footprint excludes the CMSIS SystemInit (368 bytes). The RTFM overhead is constant, is hard to beat even with manual coding, and amounts only to the necessary and sufficient protection mechanisms presented.
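The schedulability test performed by KCC (Section VI) is left to a forthcoming paper, so its details are not known here. As an indication of the kind of computation involved, the sketch below implements the textbook response time recurrence for static priority scheduling with a blocking term, $R_i = C_i + B_i + \sum_{j \in hp(i)} \lceil R_i / T_j \rceil C_j$, iterated to a fixed point. The `job_t` type, its fields and `response_time` are hypothetical, the blocking term is taken as an input rather than derived from the critical sections, and jobs of equal priority are ignored; it should not be read as KCC's actual analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job description; all times are in the same unit (e.g. cycles). */
typedef struct {
    unsigned prio;          /* static priority, higher value = more urgent */
    unsigned long wcet;     /* worst-case execution time C */
    unsigned long period;   /* minimum inter-arrival time T */
    unsigned long deadline; /* relative deadline D */
    unsigned long blocking; /* worst-case blocking B from lower-priority jobs */
} job_t;

/* Worst-case response time of jobs[i], or 0 if the bound exceeds its
   deadline. Iterates R = C + B + sum over higher-priority jobs of
   ceil(R / T_j) * C_j until the value stops changing. */
unsigned long response_time(const job_t *jobs, size_t n, size_t i)
{
    assert(i < n);
    unsigned long base = jobs[i].wcet + jobs[i].blocking;
    unsigned long r = base;
    for (;;) {
        unsigned long next = base;
        for (size_t j = 0; j < n; j++) {
            if (j != i && jobs[j].prio > jobs[i].prio) {
                assert(jobs[j].period > 0);
                /* number of releases of job j within a window of length r */
                unsigned long releases =
                    (r + jobs[j].period - 1) / jobs[j].period;
                next += releases * jobs[j].wcet;
            }
        }
        if (next > jobs[i].deadline)
            return 0; /* bound exceeds the deadline: not schedulable */
        if (next == r)
            return r; /* fixed point reached */
        r = next;
    }
}
```

A job set passes this test when `response_time` returns a non-zero value for every job; the iteration terminates because the bound only grows and is cut off at the deadline.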
