INTRODUCTION
The port-based object (PBO) is a new software abstraction for designing dynamically reconfigurable real-time software (DRRTS). It forms the basis of a programming model that provides very specific guidelines for control engineers to create and integrate DRRTS components, yet is flexible enough for almost any type of control application. The PBO is supported by an implementation based on domain-specific real-time operating system (RTOS) mechanisms. Together, the PBO and RTOS mechanisms form a software framework which supports the design and implementation of sensor-based control systems.
The software framework was developed as part of the Chimera RTOS Project [35] in the Advanced Manipulators Laboratory at Carnegie Mellon University (CMU). It is an offshoot of a project to develop reconfigurable robots [26] . We refer to the theory behind this framework as the Chimera Methodology. That methodology, and the corresponding RTOS mechanisms needed to support it, are the subject of this paper.
The following goals for a robotics programming environment were initially set forth by several robotic projects at CMU:
• Support reconfigurable robots;
• Integrate multiple sensors; David B. Stewart
• Enable real-time sampling rates of up to 1000Hz;
• Change controllers dynamically;
• Execute code transparently on multiple processors; and
• Promote collaboration in the lab through code sharing.
The Chimera Methodology is the solution to meeting the above goals. The key contributions of our solution are the following:
• A detailed definition of a port-based object, which combines the port-automaton algebraic model of concurrent processes with the software abstraction of an object, to obtain a model for creating and integrating dynamically reconfigurable real-time software components.
• Operating system services, including a PBO framework process, a multiprocessor state variable communication mechanism, and automated timing and analysis of a configuration of PBOs, which create a framework for straightforward implementation of applications which use PBOs.
In addition, perhaps the most important criterion for achieving our goals is to hide the real-time programming and analysis details. The target users of our framework are control engineers, who do not have extensive background in real-time systems or software engineering. Rather, their strength lies in developing control systems, and they want a high-level tool that is easy to use and frees them from low-level implementation details such as programming timers, analyzing schedulability, synchronizing processes, or communicating in a multiprocessor environment. Chimera satisfies this primary criterion.
The background and the terminology used in this paper are given in Section 2. The architectural components of the framework are considered in Section 3. The focus of Section 4 is the domain-specific communication in a multiprocessor environment. Details of the PBO framework process are given in Section 5. The resulting framework also has many advantages which go beyond these original goals. These other advantages, which fuel our current research in this area, are summarized in Section 6.
BACKGROUND
The Origin of the Chimera Project: Reconfigurable Robots
Robotic manipulators, such as those found on the production line of car assembly plants, have always been created with a fixed posture. However, depending on the task to be performed, some postures are unsuitable.
For example, a manipulator is like a human's arm. Consider the difficulty in picking up the receiver on a modern pay telephone with the left hand. Because of the construction of the phone, it is much easier to use one's right hand to pick up the receiver. The same type of problem occurs when using robots. Different robots are better able to perform different tasks. But requiring multiple robots is expensive.
To solve this problem, CMU developed the Reconfigurable Modular Manipulator System (RMMS) [27]. In this system, modular joints and links were created, such that they can be assembled quickly (i.e., within minutes), to create robots in a variety of configurations. The genesis of the Chimera Project was to create the software environment for the RMMS [26].
In addition, we identified several other cases where software needed to be reconfigured alongside the hardware. This included adding and removing sensors, changing the computing hardware from single processor to multiprocessing environments, and reusing generic software components with non-reconfigurable robots.
The need for dynamic reconfiguration came from the need to change control algorithms on-the-fly, in order to support more intelligent control strategies. However, in reviewing the capabilities of our framework, other major advantages of dynamically reconfigurable systems emerged, including the ability to implement real-time systems using time-based decomposition, and the ability to perform maintenance on-the-fly, especially for real-time systems that cannot be shut down, or are too expensive to reboot. Additional benefits of our framework which go beyond the initial goals are discussed in Section 6.
Modular Components vs. Reconfigurable Components
Modular software is characterized by many guidelines, such as a simple structure, data encapsulation, functional and informational cohesion, and separation of the interface specification from the internal behavior implementation [20, 25, 32]. The degree of modularity refers to a subjective measurement used to describe the extent to which a software module follows these guidelines. For example, a system decomposed into modules may be classified as "somewhat modular" or "highly modular", depending on a software engineer's assessment of how well each module meets the defined criteria [6].
Reconfigurable components are modular components with the highest degree of modularity. Most importantly, a reconfigurable component is a module designed to have replacement independence. In a modular system, there is often only one way to piece all the components together, because the interface of each module is designed according to the other modules it interacts with. For example, if you write a C or C++ module, and #include a .h file of another module, then your module becomes dependent on the interfaces of that other module. In contrast, interface specifications for reconfigurable components are designed according to a predefined standard, and not according to the interfaces of other modules with which they will be integrated. Interaction between components is through these standard interfaces only.
As an example, Purtilo created the Polylith Software Bus for designing reconfigurable distributed systems [21]. Software components were made to interface with the bus, and not with other modules. The bus was implemented as a message passing transmission layer. However, the unbounded execution and blocking times of the transmission layer prevent the approach from being used directly for real-time systems. Nevertheless, Polylith demonstrates a software method for achieving replacement independence.
Static Configurability vs. Dynamic Reconfigurability
An important distinguishing feature between our approach and many other efforts in configurable systems is that of static configurability versus dynamic reconfigurability of software components.
In statically configurable systems, reusable software modules are selected and integrated off-line, and only executed after configuration is complete. Examples of these methods include real-time object-oriented programming [5, 28], software synthesis [1, 24, 30], also known as automatic code generation, interface adaptation [4, 13, 17, 22], and the Polylith Software Bus mentioned above. The static nature of these systems is a result of the need to create or generate "glue" code to integrate the components for each different configuration, then compile and link the application with this new code.
In contrast, dynamically reconfigurable systems can be modified on-line, without the need to recompile and re-link the application or to shut down and reboot the system. For example, the Regis environment [16] uses detailed software models and operating system services to obtain dynamic reconfigurability. Like Polylith, modules are defined according to a standard interface, rather than to other modules. Regis then uses the Darwin configuration language, based on the Conic [19] interface adaptation method, to interactively structure the components using input and output communication objects. Regis, however, has not been applied to real-time system design. One of its primary limiting factors for use in a real-time environment is that, like the Polylith transmission layer, communication objects are based on message passing, with no considerations for execution or blocking times of processes. STER [2] is another method based on Conic, which can be used to create reconfigurable real-time systems. However, STER sacrifices the ability to perform dynamic reconfiguration in favor of providing real-time guarantees.
The research we present in this paper uses a similar concept as Regis and Polylith, by designing components according to a pre-defined standard interface, rather than to the interface of other modules. Like Regis, detailed software models are also defined, and operating system services are created which directly support those models. Our work, however, concentrates on many of the real-time system issues not addressed by the Polylith and Regis environments. The major differences in our work include a software abstraction that is specific to control processes, predictable communication and synchronization based on distributed shared memory, a detailed software structure that enables the automated timing of real-time properties of an application, and a programming interface that is designed especially for control engineers. In Section 5.2, we also show that static configurability is a subset of dynamic reconfigurability, and thus our solution provides the same benefits as those that only support static configurability.
Reconfigurable Software vs. Generic Software
Reconfigurable software does not necessarily imply generic software, as it is possible to have both hardware dependent and application dependent components which are not generic, but are reconfigurable. In this section, we define our classifications of reconfigurable software components. Examples of components are given later in Section 3.3.
A generic component (GC) is a module that is neither hardware dependent nor application dependent. The component can be configured for different types of hardware, and can be used in different applications.
Hardware dependent components are software modules that can only be executed when specific hardware is part of the system. Hardware dependent components can be of two types:
Hardware dependent interface components (HDIC) are used to convert hardware dependent signals into hardware independent data, such that other GCs can interface with these modules. The HDIC is an interface to the application hardware such as robotic actuators, switches, sensors, and displays. They differ from RTOS I/O device drivers, in that they are processes with their own thread of control, and have the same standard interface as other software components, rather than being defined as system calls which are called by other processes.
Hardware dependent computational components (HDCC) provide functionality similar to that of GCs, but with better performance or added functionality, due to hardware-specific optimizations or modifications of the GC. Unlike HDICs, they do not communicate directly with hardware; they are simply dependent on having specific hardware as part of the system.
Application dependent components (AC) are modules used to implement the specific details of an application. As the name implies, these components are not reusable across different applications. Ideally, ACs are eliminated, since they must be redeveloped for each new application. Modules initially defined as an AC, however, can often be transformed into a GC by converting hard-coded information into variable input. The input can then be obtained from the user through a teleoperating device or keyboard, from a configuration file, or from an external subsystem.
ARCHITECTURAL VIEW OF THE PORT-BASED OBJECT
There are several approaches to attacking the problem of creating a general programming environment for sensor-based control that meets all the goals listed in Section 1. One approach is to create the most general architecture in which every possible situation that may arise can be mapped into it. Generalizing every possible scenario is impractical. A refined version of this approach is to create a domain specific software architecture that can handle every possible situation that may arise within the domain. NASREM [3] is an example of such an architecture for the robotics domain. However, using the NASREM model was difficult, partially due to the complexity resulting from trying to accommodate every possible scenario, and partially because there is no easy way to map from the theoretical architecture to a practical implementation.
We used an alternate approach, based on domain-specific elemental units. A framework is designed which uses these elemental units as building blocks to incrementally create larger, more complex applications.
Various domain specific software architectures can then be created using the framework, depending on how these building blocks are ultimately assembled.
We select the independent process as our elemental process model. An independent process does not have to communicate or synchronize with any other component in the system, and thus integration is simple. A system that consists only of independent components, however, is very limiting, as there is no means to share data or resources. Nevertheless, this extreme emphasizes a desire to keep the framework simple.
Rather than trying to achieve the most general model of a task, we attempt to get as close as possible to this "ideal" simple case of an independent process. The simpler each component, the simpler it will be to integrate them.
Steenstrup, Arbib, and Manes formalized the algebra of independent concurrent processes with their port-automaton theory [31]. They model a concurrent process as an independent automaton, which operates on the state of the environment.
When a process needs information, it obtains the most recent data available from its input ports. This port can be viewed metaphorically as a window in your house, and whatever you see out the window is what you get. There is no synchronization with other processes, and there is no knowledge as to the origin of the information that is obtained from this port.
When a process generates new information that might be needed by other processes, it sends this information to its output ports. An output port is like a door in your home: you can open it, place items outside for others to see, then close it again. As with the input ports, there is no synchronization with other processes, nor is there any knowledge as to who might look at this information that you placed on the output ports.
Lyons and Arbib [15] applied this model specifically to robotics, and showed that a stable control system can be achieved using a formal method they termed Robot Schemas. As compared to the Robot Schemas model, we extend the port-automaton model to include multiple input and output ports per process, and we create a framework for directly implementing software components using this computational model.
In addition to the independent process, we select the object as an elemental software abstraction. As stated by Wegner, an object is the atomic unit of encapsulation, with operations that control access to the data [39] .
The term object does not imply "object-oriented design", which is an extension to objects to include classes and inheritance. The references to objects in this proposal are thus classified as object-based design, as defined by Wegner's distinction of that term and object-oriented design [40] .
The Port-Based Object
We combine the algebraic model of a port automaton with the software abstraction of an object, to create the port-based object (PBO), shown in Figure 1. In our diagrams, we draw a PBO as a round-corner rectangle, with input and output ports drawn as arrows entering and leaving the side of the rectangle. Configuration constants are drawn as arrows entering/leaving the top of the rectangle. Resource ports are shown as arrows entering/leaving the PBO from the bottom.
A PBO is an independent concurrent process, whose functionality is defined by the methods of a standard object. Communication with other modules is restricted to its input ports and output ports, as defined by the port-automaton theory. There is no explicit synchronization with other processes. The configuration constants are used to reconfigure GCs for use with specific hardware or applications.
In addition to input and output ports, we also define resource ports, which are needed to create an environment for multi-sensor integration. The resource ports connect to sensors and actuators, via I/O device drivers, which are not PBOs. The details of accessing the sensor or actuator are thus encapsulated within the PBO, resulting in an HDIC (as defined in Section 2.4).
By modelling PBOs to have optional configuration constants and resource ports, we have been able to use the same PBO model for all types of reconfigurable modules, including GCs, HDCCs, HDICs, and ACs. A sample library of PBO objects for robotic manipulators is shown in Table 1 . The library represents a subset of PBOs that were created in our laboratory at CMU.
An important note about the functional descriptions of the modules, is that the framework is designed independent of the granularity of functionality in each PBO. The granularity is defined by the software architect who decomposes an application into modules; our framework then provides the mechanisms for quickly realizing each of these modules by using the PBO model to implement them as reconfigurable objects.
Similarly, the framework does not define the type nor semantics of the port variables. A variable type mechanism [36] is used so that data transmitted over the ports can be any type. For example, it can be raw data, such as input from an A/D or D/A converter; processed data, such as positions and velocities; or processed information, such as structures describing types and locations of objects in the environment.
In our implementation, a configuration file is used to specify the information shown in the PBO Name and Ports column of Table 1 . For example, the configuration file puma.rmod for module puma is shown in Figure 2 . The MODULE line specifies the name of the object module file name (extension omitted) of the code used for this PBO. Multiple PBOs can specify the same code, for cases where multiple versions of the ters for the configuration file, we created our own conventions for mapping names, as shown in Table 2 .
Note that these conventions are personal preferences, and not specifications of the framework. TASKTYPE is either periodic or aperiodic. The FREQ line is the initial frequency at which a periodic process corresponding to this PBO executes; the frequency can be changed dynamically. The SVARALIAS line is used for mapping internal and external names, as described in the next section. The LOCAL line marks the beginning of module-specific information. Any information after this line is not used by the PBO framework, but rather passed on to the initialization code of the PBO. Lines beginning with # are comments, and are ignored. 
Configurations
As defined by Dorf [7] , "a control system is an interconnection of components forming a system configuration which will provide a desired system response." Each component can be mathematically modelled using a transfer function, which computes an output response for any given input response. The port-automaton theory provides an algebraic model for these types of control systems. By incorporating the model into the PBO, our PBOs also provide this same model suitable for control engineers. PBOs are configured to form a control system in the same way as a control engineer configures a system using transfer functions and block diagrams. This approach allows us to satisfy an important criterion for our DRRTS framework:
to make a framework for control engineers, rather than for software or real-time system engineers.
A configuration is a set of PBOs which are interconnected to provide the required open-loop or closed-loop system. This set of PBOs can be specified in a variety of ways. In our implementation, this included using a graphical software assembly tool, using a command-line interface, through the network from an external planning subsystem, or embedded using the C programming language.
A configuration is valid only if for every PBO selected, any data that it requires at its input ports is produced by one of the other PBOs as output. As per the port-automaton theory, the control engineer does not have to be concerned with how data gets from the output of one PBO to the input of another PBO. The communication is embedded in the framework, such that it is transparent to the control engineer. A configuration also cannot have two PBOs which produce the same output, otherwise there may be a conflict as to which output should be used at a given time.
Port names are used to perform the bindings between input and output ports. Whenever two PBOs exist with matching input and output ports, the framework creates a communications link from the output to the input.
If necessary, the output can be fanned into multiple inputs. Our framework uses an internal/external name separation for the ports, such that the name used to code the PBO can be independent of the name used for linking that object to other PBOs. The mapping between internal and external name is done in the SVARALIAS lines of the configuration file for each PBO. The left value represents the name used in the configuration, and the right value is the name used internally by the PBO. If an SVARALIAS line is not specified, then the default is that both the internal and external names are the same.
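The name-based binding described above can be illustrated with a minimal sketch. It is not the Chimera implementation: the dictionary layout, the function name `bind_ports`, the port names (`x_ref`, `x_in`, `x_d`, `q_d`), and the `logger` module are all hypothetical, chosen only to show aliasing and fan-out.

```python
from collections import defaultdict

def bind_ports(pbos):
    """Return {external_name: (producer, [consumers])} for a list of PBOs.

    Each PBO is a dict with hypothetical keys:
      name    - PBO name
      inputs  - internal names of its input ports
      outputs - internal names of its output ports
      alias   - {internal_name: external_name}; a missing entry means the
                internal and external names are the same (the default case)
    """
    producers = {}
    consumers = defaultdict(list)
    for pbo in pbos:
        alias = pbo.get("alias", {})
        for port in pbo["outputs"]:
            producers[alias.get(port, port)] = pbo["name"]
        for port in pbo["inputs"]:
            consumers[alias.get(port, port)].append(pbo["name"])
    # A link exists wherever an output name matches one or more input names.
    return {ext: (producers[ext], sorted(consumers[ext]))
            for ext in producers if ext in consumers}

pbos = [
    {"name": "tball", "inputs": [], "outputs": ["x_ref"]},
    # cinterp is coded against "x_in" but linked under the external name "x_ref"
    {"name": "cinterp", "inputs": ["x_in"], "outputs": ["x_d"],
     "alias": {"x_in": "x_ref"}},
    {"name": "ginvkin", "inputs": ["x_d"], "outputs": ["q_d"]},
    {"name": "logger", "inputs": ["x_d"], "outputs": []},  # second reader of x_d
]
links = bind_ports(pbos)
for name, (src, dsts) in sorted(links.items()):
    print(name, ":", src, "->", dsts)
```

In this sketch the output `x_d` is fanned into two inputs without the producer knowing how many readers exist, which is the property the port-automaton model requires.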
The correctness of a configuration is verified analytically using set equations, where the elements of the sets are the state variables. If X_j is a set representing the input variables of module j, and Y_j is a set representing the output variables of module j, then a configuration is legal only if the following two conditions are true:
$$ \bigcup_{j=1}^{k} X_j \subseteq \bigcup_{j=1}^{k} Y_j \qquad (1) $$
$$ Y_i \cap Y_j = \emptyset \quad \text{for all } i, j \text{ such that } 1 \le i, j \le k \text{ and } i \ne j \qquad (2) $$
where k is the number of modules in the configuration. (1) represents our first condition that there must be a corresponding output port for every input port. (2) ensures that two PBOs do not produce the same output.
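As an illustration, the two legality conditions (every input variable is produced as some output, and no output variable is produced by two modules) can be checked with ordinary set operations. The following is a sketch under assumed names: the function `check_configuration`, the tuple layout, and the state-variable names are hypothetical and are not part of the framework.

```python
def check_configuration(modules):
    """Check the two legality conditions for a configuration.

    modules: {module_name: (inputs, outputs)}, where inputs and outputs are
    sets of state-variable names (hypothetical layout).
    Returns (missing, duplicated):
      missing    - inputs that no module produces (condition 1 violated)
      duplicated - outputs produced by more than one module (condition 2 violated)
    """
    all_inputs = set().union(*(x for x, _ in modules.values()))
    all_outputs = set().union(*(y for _, y in modules.values()))
    missing = all_inputs - all_outputs
    seen, duplicated = set(), set()
    for _, outputs in modules.values():
        duplicated |= seen & outputs   # any overlap with earlier outputs
        seen |= outputs
    return missing, duplicated

legal = {
    "tball": (set(), {"x_ref"}),
    "cinterp": ({"x_ref"}, {"x_d"}),
}
illegal = dict(legal)
illegal["ctraj"] = (set(), {"x_ref"})              # second producer of x_ref
illegal["ginvkin"] = ({"x_d", "q_m"}, {"q_d"})     # q_m is never produced

legal_result = check_configuration(legal)
illegal_result = check_configuration(illegal)
print(legal_result)
print(illegal_result)
```

A configuration is legal exactly when both returned sets are empty.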
Configuration Examples
In this section, we illustrate through examples how we can create configurations out of PBOs stored in a library. Details of designing individual PBOs are given later in Section 5. In our implementation, a configuration can be assembled graphically using the Onika visual programming environment [8] .
Cartesian Control of the Reconfigurable Modular Manipulator System
Figure 3(a) shows a configuration, using modules from our sample library shown in Table 1 , to perform teleoperated Cartesian control of the RMMS. The configuration of the RMMS robot is not known beforehand. Rather, its configuration is read from EPROMs embedded in the robot during initialization. From that
configuration, the rmms module outputs the N DOF and DH configuration constants. Those constants are used as input to the gfwdkin and ginvkin modules, which can be configured for any robot based on N DOF and DH [11] . A teleoperation interface is provided by the 6-DOF trackball, and the cinterp module is used to generate intermediate trajectory points for the robot, because the tball module typically executes at a much lower frequency than the other modules.
The software framework does not pose any constraints on the frequency of each PBO. Rather, as defined by the port-automaton theory, every PBO is an independent concurrent process which can execute at any frequency. Whenever that process needs data from its input ports, it retrieves the most recent data available.
When it completes its processing, it then places any new data onto its output ports.
A configuration can be executed in either a single- or multi-processor environment. In a multi-processor environment, the control engineer only needs to specify which processor to use for each PBO. The communication between PBOs and synchronization of their processes is otherwise identical, and fully transparent to the control system engineer, as detailed in Section 4.
[Figure panel (e): Example of application-specific autonomous execution of a Puma 560 using HDCCs]
Cartesian teleoperation of a Puma 560
Suppose that a Puma 560 robot is to be used instead of the RMMS. The rmms module can be replaced with the puma robot interface module, as shown in Figure 3(b) . Since the Puma is a fixed configuration robot, its N DOF and DH parameters are constant. Instead of reading these values from the robot, they can instead be hard-coded into the puma module, and output as configuration constants. There is no need to change any other module, since the gfwdkin and ginvkin modules will configure themselves during initialization for the Puma based on the new values of N DOF and DH.
Improving performance of a Puma 560
GCs are useful for enabling rapid prototyping, but they may not always be computationally efficient. For example, the computation of the forward kinematics based on the DH configuration constants and using matrix operations will naturally be slower than performing similar computations for a specific robot, such as the Puma 560, where the DH parameters are constant, and unnecessary computations (such as multiply by zero or 1, or computing sin(π/2)) can be eliminated.
To improve the performance of an application, an HDCC can be created. The pfwdkin and pinvkin modules are examples of such components. They compute the forward and inverse kinematics specifically for a Puma 560, and they execute faster than their generic counterparts. It is then desirable to replace gfwdkin with pfwdkin, and ginvkin with pinvkin, as shown in Figure 3(c) , whenever the puma HDIC is used.
In order for an HDCC to replace a GC, it must provide at least the same outputs as the GC, and must not require any additional inputs as compared to the GC.
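The replacement rule above reduces to two set comparisons, as the following sketch shows. The function name `can_replace` and the port names (`q_m`, `x_m`, `NDOF`, `DH` written as plain strings) are assumptions made for illustration; the actual port lists of gfwdkin and pfwdkin are those of Table 1.

```python
def can_replace(gc, hdcc):
    """Return True if hdcc may stand in for gc.

    gc and hdcc are (inputs, outputs) pairs of sets of port names.
    The HDCC must provide at least the GC's outputs and must not
    require any input that the GC did not require.
    """
    gc_inputs, gc_outputs = gc
    hdcc_inputs, hdcc_outputs = hdcc
    return hdcc_outputs >= gc_outputs and hdcc_inputs <= gc_inputs

# Hypothetical port sets: the generic module needs the robot description,
# the Puma-specific module has it built in.
gfwdkin = ({"q_m", "NDOF", "DH"}, {"x_m"})
pfwdkin = ({"q_m"}, {"x_m"})

forward = can_replace(gfwdkin, pfwdkin)   # HDCC replacing GC
reverse = can_replace(pfwdkin, gfwdkin)   # GC replacing HDCC
print(forward, reverse)
```

Under these assumed port sets the check is asymmetric: the generic module could not be dropped in for the Puma-specific one unless some other PBO supplies the extra inputs it needs.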
Even when an HDCC is created to replace a GC, it does not eliminate the usefulness of the GC. For example, in order to improve fault tolerance of an application, the GC can still be used as a standby module, or as shown in Figure 3(d) , it can execute in parallel with the HDCC, albeit at a lower frequency, in order to provide consistency checks.
Autonomous Execution of a Puma 560
As an example of an AC, suppose that a custom autonomous trajectory module ctraj is created to replace the teleoperation module tball, as shown in Figure 3 (e). The AC can be integrated into the system by defining it as a port-based object.
Even though a module is application dependent, it does not have to be hardware dependent. Thus if the hardware for the application is changed, the AC does not necessarily have to change. Figure 3 (f) shows this by replacing rmms with puma, but not changing the trajectory of the robot's end effector, as defined by ctraj.
Cartesian Control of a Torque-Mode Robot
As a more elaborate example of a configuration, a telerobotic Cartesian visual servoing subsystem is shown in Figure 4 (note that these modules are from a different PBO library than the one defined in Table 1). Input can come either from a user through a trackball or from an external vision subsystem. The port interfaces of each PBO can easily be determined simply by looking at the inputs and outputs of each object. Despite the seemingly complex communication paths between objects, communication remains transparent to the control system engineer, and lines are simply drawn between matching input and output ports.
The configuration of PBOs is not the only part of a subsystem. As shown in Figure 5 , PBOs can interface with device drivers, external subsystems, software libraries, special purpose processors, and user interfaces.
In this paper, however, we focus on the configurations of port-based objects, and their associated communication, synchronization and analysis.
INTER-OBJECT COMMUNICATION
Integrating software components such that all communication is performed in a predictable and timely manner is perhaps the most difficult aspect of creating reconfigurable real-time systems. In order to support the port-based object model, we need a communication mechanism that meets the following requirements:
• Support the port-automaton model of independent processes. That is, read the input ports at the beginning of each cycle to obtain the most recent data available, and write to the output ports at the end of each cycle.
• Disallow synchronization or communication between processes except through ports.
• Target data transfers with low volume but at high frequency (1000 Hz).
• Have a simple and straightforward binding scheme for communication links, to dynamically reconfigure a subsystem in bounded time.
• Fan an output into multiple inputs.
• Support transparent multiprocessing, such that the processes which are communicating can be either on the same or different processors, without any difference in the communication.
• Allow communication between processes which may be executing at different frequencies.
We designed a communication mechanism which meets all of these requirements, based on the combined use of local and distributed shared memory. The domain-specific solution takes advantage of several of the characteristics of control systems, such as the assumption on transfer rates and the need to only read the most recent data, rather than all data. The solution provides a level of performance and predictability that has not previously been achieved using more general message-passing mechanisms.
Port communication has often (and more typically) been implemented using some form of message passing.
This alternative was considered, but not selected for several reasons:
• Using messages, we could not support the port-automaton theory, because the most recent data is not always readily available. For example, if the process producing the data is faster, then the messages may be queued, and the message received by the consumer might not contain the most recent data. 
• Fanning output to multiple inputs is difficult, because it requires a message to be duplicated for each input, or requires a more complex mechanism to ensure that messages are not deleted until all processes needing it have used it. Duplicating messages based on the number of recipients also violates the port-automaton theory, which states that a process is unaware of the destination of the data on its output ports.
• The overhead with sending messages, especially in a multiprocessor environment, is much higher than that achievable using shared memory. This factor is especially important considering some data must be transferred 1000 times per second.
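The first drawback, stale data when the producer is faster than the consumer, can be seen in a small simulation. This is only an illustrative sketch: the 5:1 rate ratio and the variable names are assumptions, and a plain FIFO queue stands in for a generic message-passing port.

```python
from collections import deque

queue = deque()      # message-passing port: every sample is queued in order
latest = None        # state-variable style port: only the newest value is kept

queue_reads = []     # what the consumer receives from the queue
svar_reads = []      # what the consumer receives from the state variable

for cycle in range(1, 11):          # producer runs for 10 cycles
    sample = cycle                  # the sample value is just the cycle number
    queue.append(sample)
    latest = sample
    if cycle % 5 == 0:              # consumer runs once per 5 producer cycles
        queue_reads.append(queue.popleft())   # oldest unread message
        svar_reads.append(latest)             # most recent data

print("queue:", queue_reads)
print("state variable:", svar_reads)
```

With the queue, the consumer falls further behind on every cycle and the backlog grows without bound, whereas the state variable always yields the most recent sample, as the port-automaton model requires.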
These drawbacks of message passing systems led to our design of a mechanism based on distributed shared memory. Our work focuses on loosely coupled shared memory architectures.
State Variable Communication
The communication between PBOs is performed via state variables stored in global and local tables, as shown in Figure 6 . Every I/O port and configuration constant is defined as a state variable (SVAR) in the global table, which is stored in shared memory.
A PBO can only access the local table, where only the subset of data from the global table that is needed by that PBO is kept. Since every PBO has its own local table, no synchronization is needed to read from or write to it. A PBO process can thus execute independently of other processes, by using the data in its local table. Consistency between the global and local tables is maintained by the SVAR mechanism, as detailed in the remainder of this section. As an example, Figure 7 shows the contents of the global and local tables for the sample configuration that was illustrated in Figure 3.
Although there is no explicit synchronization or communication among processes, we must ensure that accesses to the same SVAR in the global table are mutually exclusive, which creates potential implicit blocking. The locking mechanism we use enables us to maintain the autonomous execution model of the PBO, while also ensuring the integrity of the communication.
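The global/local table arrangement can be sketched as follows. This is a single-process stand-in, not the Chimera SVAR mechanism: the class names `GlobalTable` and `PBO`, their methods, and the variables `x` and `y` are hypothetical, and Python's `threading.Lock` is used only as a placeholder for whatever multiprocessor locking protects the real global table.

```python
import threading

class GlobalTable:
    """Shared table of state variables; every access holds one lock."""
    def __init__(self, names):
        self._lock = threading.Lock()   # placeholder for the real locking scheme
        self._values = {name: None for name in names}

    def read(self, names):
        with self._lock:
            return {n: self._values[n] for n in names}

    def write(self, updates):
        with self._lock:
            self._values.update(updates)

class PBO:
    """A port-based object that computes only on its own local table."""
    def __init__(self, table, inputs, outputs, step):
        self.table, self.inputs, self.outputs = table, inputs, outputs
        self.step = step
        self.local = {}                 # private copy; needs no synchronization

    def cycle(self):
        self.local.update(self.table.read(self.inputs))   # copy inputs in
        self.step(self.local)                              # compute locally
        self.table.write({n: self.local[n] for n in self.outputs})  # copy out

def produce(local):
    local["x"] = local.get("x", 0) + 1

def consume(local):
    local["y"] = 2 * local["x"]

table = GlobalTable(["x", "y"])
producer = PBO(table, [], ["x"], produce)
consumer = PBO(table, ["x"], ["y"], consume)

for _ in range(3):        # the producer runs three times as often
    producer.cycle()
consumer.cycle()          # the consumer sees only the most recent x
result = table.read(["x", "y"])
print(result)
```

The lock is held only for the short copy-in and copy-out steps, which is why the approach depends on each port carrying a small amount of data per cycle.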
Locking the Global SVAR Table
The method we use to lock the global table and preserve the autonomous execution model of the PBO is based on the assumption that the amount of data communicated via the ports on each cycle of a PBO is relatively small. That is, each INVAR or OUTVAR is only a few tens to a few hundreds of bytes. This is in contrast to communication systems, where images may require many thousands or millions of bytes per cycle. The exact value of what we mean by 'small' depends on the particular configuration, and is quantified as part of our analysis in Section 4.4. For this reason, our framework, which defines objects that are specific to the control systems domain, cannot be applied directly to a general communication system. However, as was shown in Figure 4, we can use the external subsystem interface to interact with a communication system.
The global table can be accessed by processes executing on different CPUs, and therefore a multiprocessor solution for locking the table is required. The type of multiprocessor synchronization we require for the global SVAR table has been addressed in [23]. The shared memory protocol (SMP) is presented there as an extension of the single processor priority ceiling protocol [29]. The protocol involves defining global semaphores for locking the global shared memory, and placing priority ceilings on accessing the semaphores to bound the waiting time of higher priority jobs.
Unfortunately, there are several problems which prevent the use of SMP within our framework:
• This method assumes that the local scheduling on each processor is the rate monotonic algorithm, with static priorities. As discussed in [37], it is desirable to use mixed or dynamic priority algorithms for scheduling reconfigurable systems, for which the protocol is not suitable.
• One assumption of the SMP is that the delay to access the backplane bus from any processor is negligible compared to the task execution times. Unfortunately this is usually not the case; buses like the VMEbus are implemented using a static-priority processor assignment which is not under the control of software. Therefore, the time to wait for the bus can be significant if a process is on a low-priority CPU.
• There is significant overhead associated with implementing SMP, which prevents its use with control applications requiring frequencies over 1000Hz.
The complexity and overhead of SMP can be reduced significantly for the port-based communication by selecting a single lock for the entire table, instead of a separate lock for each state variable. Selecting a single lock for the entire table is not as restrictive as it seems, since a shared bus connects the shared memory to local memory. Even if multiple tasks have separate locks, only one of them can physically access the shared memory at once; other tasks must wait for the bus even while in their critical section.
An alternate solution for synchronizing access to the global state variable table is to use spin-locks [19].
When a task must access the global table, it first locks the processor on which it is executing. Locking the CPU ensures that the task does not get swapped out while holding the critical global resource. The task then tries to obtain a global lock by performing an atomic read-modify-write instruction, which is supported by most processors. If the lock is obtained, the task reads or writes the global table, then releases the lock while still locked onto the local CPU. It then releases its lock on the local processor. If the lock cannot be obtained because it is held by another task, then the task spins on the lock. The task holding the global lock is guaranteed to be on a different processor and cannot be preempted, so it will release the lock shortly.
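The locking sequence described above can be sketched as follows. This is a minimal illustration and not the Chimera implementation: Python threads stand in for tasks on separate processors, a non-blocking `threading.Lock.acquire` stands in for the atomic read-modify-write instruction, and `lock_cpu`, `unlock_cpu`, `update_svar`, and the SVAR name `q_ref` are hypothetical names introduced only for this example.

```python
import threading
import time


class GlobalTableLock:
    """Spin-lock standing in for the single lock on the global SVAR table."""

    def __init__(self):
        # A non-blocking acquire acts as the atomic test-and-set here:
        # exactly one caller sees True until the flag is released.
        self._flag = threading.Lock()

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            time.sleep(0)  # demo only: yield so the holder can run in this interpreter

    def release(self):
        self._flag.release()


def lock_cpu():
    """Hypothetical placeholder; an RTOS would disable preemption here."""


def unlock_cpu():
    """Hypothetical placeholder; an RTOS would re-enable preemption here."""


def update_svar(lock, table, name, delta):
    lock_cpu()                   # 1. stay on this CPU while holding the lock
    lock.acquire()               # 2. spin until the global lock is obtained
    try:
        table[name] = table.get(name, 0) + delta   # 3. short critical section
    finally:
        lock.release()           # 4. release the global lock first
        unlock_cpu()             # 5. then release the local CPU


def run_demo(workers=2, rounds=500):
    """Each worker plays a task on its own processor updating one SVAR."""
    lock, table = GlobalTableLock(), {}

    def task():
        for _ in range(rounds):
            update_svar(lock, table, "q_ref", 1)

    threads = [threading.Thread(target=task) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table["q_ref"]
```

Because the update inside the critical section is a non-atomic read followed by a write, the final count matches `workers * rounds` only if the lock really serializes the accesses. The order of release (global lock first, local CPU second) mirrors the order given in the text.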
In theory, locking the CPU can lead to possible missed deadlines or priority inversion. However, considering the practical aspects of real-time computers, it is not unusual that a real-time microkernel locks the CPU for up to 100 µsec in order to perform system calls such as handling timer interrupts, scheduling, and performing full context switches [35]. Furthermore, many RTOSs are created such that periods and deadlines of processes are rounded to the nearest multiple of the system clock, since more accurate timing is not available to the scheduler. In these systems, if the total time that a CPU is locked in order to transfer a state variable is small compared to the resolution of the system clock, then locking the local CPU has a negligible effect on the predictability of the system.
In comparing this method to SMP, the lock can be viewed as a single global semaphore, and since all tasks can access it, its priority ceiling is constant, which is the maximum task priority in the system. Since there is only one lock, there is no possibility of deadlock. A task busy-waits with the local processor locked until it obtains the lock and goes through its critical section. In Section 4.4 it is shown that for configurations where the volume of data transferred between objects is small, there is a bounded waiting time for obtaining the global lock, even on hardware such as the VMEbus that only has fixed priority bus arbitration.
Initialization
The global table is initialized in shared memory based on the contents of an SVAR configuration file. The file includes the names and types of all the INVARS, OUTVARS, and configuration constants. A default configuration should be provided in conjunction with each PBO library, which defines the name, type, and size of each variable. For example, Table 2 shows the information that would be included in the SVAR configuration file for the library that was shown in Table 1. For any application, a programmer may update this default configuration file, by adding new variables to it if they make use of the SVARALIAS facility to change some of the port names, or by deleting variables which are not required for the application.
A local table is only created and initialized when a PBO is created, and contains those SVARs given in the .rmod configuration file (as was shown in Figure 2). Information about the type and size of those SVARs is retrieved from the global table, which is why it does not have to be specified in the .rmod file. For each SVAR, a pointer to the corresponding SVAR in the global table is also stored in the local table, and is used during the block transfer operations.
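The relationship between the two tables can be illustrated with a small sketch. It is an assumption-laden model rather than the Chimera data structures: the class names `GlobalTable` and `LocalTable`, their methods, and the SVAR name `q_ref` are hypothetical, every SVAR is modeled as a list of floats, and the locking of the global table is left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalTable:
    """Shared SVAR table: name -> list of floats (size fixed at definition)."""
    svars: dict = field(default_factory=dict)

    def define(self, name, size):
        self.svars[name] = [0.0] * size


class LocalTable:
    """Per-PBO copy of only the SVARs that the PBO names as ports."""

    def __init__(self, global_table, invars, outvars):
        self._global = global_table   # plays the role of the stored pointers
        self.invars, self.outvars = list(invars), list(outvars)
        # Sizes come from the global table, so the port list needs names only.
        self.data = {n: list(global_table.svars[n])
                     for n in self.invars + self.outvars}

    def read_invars(self):
        """Block-copy INVARS from the global table into the local table."""
        for n in self.invars:
            self.data[n][:] = self._global.svars[n]

    def write_outvars(self):
        """Block-copy OUTVARS from the local table into the global table."""
        for n in self.outvars:
            self._global.svars[n][:] = self.data[n]


def demo():
    g = GlobalTable()
    g.define("q_ref", 6)
    producer = LocalTable(g, invars=[], outvars=["q_ref"])
    consumer = LocalTable(g, invars=["q_ref"], outvars=[])
    producer.data["q_ref"][0] = 1.5      # local write, invisible to others
    consumer.read_invars()
    before = consumer.data["q_ref"][0]
    producer.write_outvars()             # publish to the global table
    consumer.read_invars()
    return before, consumer.data["q_ref"][0]
```

The demo shows that a value written into one PBO's local table reaches another PBO only after the producer copies it out and the consumer copies it in, which is why each PBO can compute on its local table without synchronization.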
Configuration constants create a necessary order of initialization for PBOs. Any PBO that has an output configuration constant (which we call OUTCONST), must be initialized before any other PBO which has that constant as input (called an INCONST) is created. INVARS and OUTVARS, on the other hand, do not have a necessary order, since in control systems they often form a closed loop system, and thus the order of initialization of PBOs is not obvious. Beyond the necessary order for constants, ordering initialization of PBOs in a closed loop system to ensure stability of the control system is generally application specific, and thus not discussed further within our framework. Our framework, however, can support any order of initialization of the PBOs, as dictated by the application.
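One way a configuration manager might derive a valid creation order from the constants is a topological sort, sketched below. The dictionary layout, the function name `init_order`, and the module and constant names in the usage note are assumptions made for illustration; the source does not prescribe an algorithm. Only OUTCONST to INCONST dependencies are considered, since INVARS and OUTVARS impose no order.

```python
from graphlib import TopologicalSorter


def init_order(pbos):
    """Return a creation order in which every OUTCONST producer precedes
    the PBOs that take that constant as an INCONST.

    pbos maps a PBO name to {"outconst": set_of_names, "inconst": set_of_names}.
    """
    # Which PBO produces each configuration constant.
    producer = {}
    for name, ports in pbos.items():
        for const in ports["outconst"]:
            producer[const] = name

    # graph[name] is the set of PBOs that must be initialized before name.
    graph = {
        name: {producer[c] for c in ports["inconst"]
               if c in producer and producer[c] != name}
        for name, ports in pbos.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

For example, with a hypothetical `robot` module producing a constant `NDOF` that a `ctrl` module consumes, `init_order` places `robot` before `ctrl`; modules without constants may appear anywhere in the order. `TopologicalSorter` raises `graphlib.CycleError` if the constants form a cycle, which would indicate an invalid configuration.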
Analysis of SVAR mechanism
In this section, we show that on a fixed priority hardware platform such as the VMEbus, it is possible to provide predictable high performance communication using the SVAR mechanism. The analysis assumes that data from the input ports is transferred once from the global table to the local table at the beginning of a process's cycle, and data destined for the output ports is transferred from the local table to the global table upon completion of the process's cycle. Ensuring that communication occurs at these specified times is handled by the PBO framework as described later in Section 5.
Transfer Times
To ensure predictable communication, the time required to transfer data between the local and global tables for each task must be computed. Let $t_{IP}$ be the time required to transfer the INVARS and $t_{OP}$ be the time required to transfer the OUTVARS of a PBO P, assuming no waiting for the bus. These values are computed as

$$t_{IP} = V_1 + \sum_{i=1}^{n_{IP}} \left( V_a + R(x_{Pi}) \right); \qquad t_{OP} = V_1 + \sum_{o=1}^{n_{OP}} \left( V_a + R(x_{Po}) \right) \qquad (3)$$

where $V_1$ is the overhead for locking and unlocking the table, excluding waiting time for the bus; $V_a$ is the overhead of transferring each additional variable; $n_{IP}$/$n_{OP}$ are the number of INVARS/OUTVARS for object P; $x_{Pi}$/$x_{Po}$ is the number of transfers required for the INVAR/OUTVAR i/o of object P; and $R(x)$ is the time required for x transfers. $V_1$, $V_a$, and $R(x)$ are dependent on the speed of the hardware. These values can be measured initially for each type of hardware supported, then used by a configuration manager for estimating communication times. As an example, $V_1$, $V_a$, and $R(x)$ were measured in our laboratory. The breakdown of times for an Ironics IV3230 single board computer [9] with a 25MHz MC68030 processor on a VMEbus is shown in Table 3. A VMETRO 25 MHz VBT-321 VMEbus analyzer [38] was used to time the communication, and provided a resolution of better than 1 µsec. The global state variable table was stored within the dual-ported memory of a second IV3230.
Note that the value of R(x) is not linear. This is due to the underlying block copy routine, which has a better average time per transfer for larger transfers. Through interpolation, different transfer sizes can be estimated, and more measurements of R(x) with different values of x can give more accurate results. However, for purposes of discussion and examples in this paper, the values shown are sufficient.
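The estimation procedure can be sketched as below, reading the transfer time of a port set as the lock overhead $V_1$ plus, for each variable, the per-variable overhead $V_a$ and the block-copy time $R(x)$. All numbers are hypothetical placeholders in microseconds and are not the measurements of Table 3; `R_SAMPLES`, `r_time`, and `transfer_time` are names introduced for this example, and linear interpolation between samples is one possible choice.

```python
from bisect import bisect_left

# Hypothetical values in microseconds; NOT the measurements of Table 3.
V1 = 10.0   # lock + unlock overhead
VA = 2.0    # overhead per variable
# (number of transfers, measured time); the curve is deliberately non-linear.
R_SAMPLES = [(0, 0.0), (1, 2.0), (6, 7.0), (32, 20.0)]


def r_time(x):
    """Estimate R(x) by linear interpolation between measured samples."""
    xs = [p[0] for p in R_SAMPLES]
    if not 0 <= x <= xs[-1]:
        raise ValueError("x lies outside the measured range")
    i = bisect_left(xs, x)
    if xs[i] == x:                       # exact measurement available
        return R_SAMPLES[i][1]
    (x0, y0), (x1, y1) = R_SAMPLES[i - 1], R_SAMPLES[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def transfer_time(sizes):
    """Time to move one port set, given the transfer count of each variable.

    Like the simplified equation, V1 is charged even when sizes is empty.
    """
    return V1 + sum(VA + r_time(x) for x in sizes)
```

With these placeholder numbers, a module with two six-transfer INVARS would be estimated at `transfer_time([6, 6])`, that is 10 + 2 × (2 + 7) = 28 µsec. Adding more measured samples to `R_SAMPLES` tightens the estimate without changing the code.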
The values in Table 3 can be substituted into equation (3) to estimate the transfer times for each port-based object. As an example, consider the joint control of a robot with built-in controllers, whose configuration is shown in Figure 8, and assume that $N_{DOF} = 6$. The values of $t_{IP}$ and $t_{OP}$ for each module were estimated.
These estimates were then compared to actual transfer times measured with the VMETRO analyzer. As can be seen in Table 4, the estimates and actual times are sufficiently close to use the estimates for further analysis. This aspect is important since it is not desirable, and perhaps not feasible, to time the communication of every software module for every type of hardware. For simplicity, (3) assumes that $V_1$ is always present, even if n is 0, such as $n_{IP}$ for jtball. In practice, if there are no transfers to be made, the global table is not locked. As a result, the actual measured time is very small, and accounts only for the overhead of testing whether a transfer must be made.
Waiting Time for Global Table Lock
Until now, we assumed there is no contention for the global table's lock. Next, we compute the worst-case waiting time for the lock by each task. Let $L_{pj}$ be the maximum time that task p on processor j will hold the global table lock. Therefore $L_{pj} = \max(t_{Ip}, t_{Op})$. Let $M_j$ be the longest time that the global lock is held by any task on processor j:

$$M_j = \max_{1 \le p \le N_j} L_{pj} \qquad (4)$$

where $N_j$ is the number of tasks on processor j.
Ideally, if multiple tasks are trying to obtain the lock, the one with the highest priority succeeds. Unfortunately, on a shared bus where each processor has a fixed priority, such as the VMEbus that is not using round-robin bus arbitration, that is not the case. Instead, the task inherits the priority of the processor. For the remainder of the analysis, assume that the hardware is a fixed-priority VMEbus, such that the lowest numbered processor has highest priority. For different hardware configurations, the following analysis may have to be redone, and perhaps a different form of locking for the global table may be appropriate.
Any task on processor k attempting to lock the global table must wait for tasks on all higher priority processors. Furthermore, the task may also have to wait for a task currently holding the lock on a lower priority processor. Based on the locking mechanism described in Section 4.2, only one task on any processor can request the lock at once, and therefore there is no contention with other tasks on the same processor. 
In Section 4.2, an assumption was made that the state variable table mechanism was valid as long as the amount of data to be transferred is small. That assumption is now quantified, as the volume of data affects the maximum waiting time of each task. Let W k be the worst case waiting time for any task on processor k.
Since this is waiting time and not blocking time (a waiting task is in the running state, a blocked task is suspended), $W_k$ can be added to the worst-case execution time of a task. It is computed as

$$W_k = W_{kLO} + W_{kHI} \qquad (5)$$

where $W_{kLO}$ and $W_{kHI}$ are the maximum time the task may have to wait for a task to release the lock on a lower or higher priority processor respectively. $W_{kLO}$ is computed simply as the longest time any single task on a lower priority processor may hold the lock. Therefore,

$$W_{kLO} = \max_{k < j \le r} M_j \qquad (6)$$

where r is the number of processors.
Next, $W_{kHI}$ is computed. For $k=1$, there are no higher priority processors, thus $W_{1HI}=0$ and $W_1 = W_{1LO}$. For $k>1$, the potential locking of the table for all tasks on processors 1 to $k-1$ must be considered. Under the assumption that the volume of data is small, the bandwidth required to transfer all the data is much less than the total bandwidth of the bus. Therefore, in the worst case, all tasks on higher-priority processors may require the lock at the same time. $W_{kHI}$ is thus computed as the sum of the lock-holding times of all tasks on higher-priority processors:

$$W_{kHI} = \sum_{i=1}^{k-1} \sum_{j=1}^{N_i} \left( t_{I,ij} + t_{O,ij} \right) \qquad (7)$$

The notation $t_{I,ij}$ is the same as $t_{IP}$, where a process P is referred to by the processor number i and task ID j; $t_{O,ij}$ is defined likewise.
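The waiting-time computation can be sketched as follows, under the reading that the worst-case wait on processor k is the longest single lock hold on any lower-priority processor plus the sum of all input and output transfer times of tasks on higher-priority processors. The function name `worst_case_wait`, the data layout, and the numbers in the usage note are hypothetical and are not the values of Tables 4 and 5.

```python
def worst_case_wait(hold_times):
    """Return [W_1, ..., W_r] for a fixed-priority bus.

    hold_times[i] lists (t_I, t_O) pairs for the tasks on processor i + 1;
    index 0 is the highest-priority processor.
    """
    # M_j: longest time any single task on processor j holds the lock.
    m = [max((max(t_i, t_o) for t_i, t_o in tasks), default=0.0)
         for tasks in hold_times]

    waits = []
    for k in range(len(hold_times)):
        # One task on a lower-priority processor may already hold the lock.
        w_lo = max(m[k + 1:], default=0.0)
        # Every task on every higher-priority processor may go first,
        # once for its INVARS and once for its OUTVARS.
        w_hi = sum(t_i + t_o
                   for tasks in hold_times[:k]
                   for t_i, t_o in tasks)
        waits.append(w_lo + w_hi)
    return waits
```

For instance, with two tasks on processor 1 holding the lock for (30, 20) and (10, 40) µsec and two tasks on processor 2 holding it for (15, 5) and (0, 25) µsec, the sketch gives a wait of 25 µsec on processor 1 and 100 µsec on processor 2. The asymmetry shows why tasks with a low data volume are better placed on higher-priority processors.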
As an example, equations (5), (6), and (7) were applied to the sample configuration that was shown in Figure 8, with estimated locking times as were shown in Table 4. Task periods and cycle times (before adding maximum waiting time) were arbitrarily assigned to illustrate the computations. Assuming that puma_pidg and grav_comp are on processor 1, and diff and jtball are on processor 2, the resulting computations are shown in Table 5. The adjusted execution time should be used in a schedulability analysis. In our applications, the average case is significantly lower than the worst case. To compensate, a soft real-time scheduling algorithm can be used to schedule the less critical tasks on lower priority processors [37], while hard real-time critical tasks should be placed on the high-priority processors.
Another consideration for assigning tasks to processors is the volume of data that needs to be transferred. The computations of $W_k$ show that it is preferable for tasks producing a low volume of data to be placed on higher priority processors, since that significantly reduces $W_{kHI}$ for tasks on lower priority processors. A different assignment of tasks to processors can lead to very different results. A configuration manager can perform the above analysis for various configuration possibilities, in order to optimize the global allocation of processes to processors.
FRAMEWORK PROCESS FOR IMPLEMENTING PORT-BASED OBJECTS
The analysis in previous sections was performed with an underlying assumption that communication and synchronization is handled by the framework. In this section, we look at the details of the PBO, and show how we have implemented the framework as part of the RTOS, which allows a control engineer to easily implement individual software components using the PBO model.
An Inside-Out Programming Paradigm
Creating the code for PBOs is an "inside-out" programming paradigm as compared to traditional coding of real-time processes, as shown in Figure 9. The grey area shows operating system code, while the black areas show the user code. The traditional approach is the one used by most current RTOSs. Processes are created, each with its own main(). The process executes user code and controls the flow of the program. It invokes the operating system, typically via a system call, whenever an OS service is required. OS services include communication, synchronization, programming timers, and creating new processes.
The PBO method, on the other hand, provides a consistent structure for every process, and thus operating system services such as communication, synchronization, scheduling, and process management are performed in a predictable manner. The operating system calls a PBO's method only when necessary, to perform user-defined functions. This pre-defined structure also allows the RTOS to continually time the PBO code, using an automated task profiling mechanism as described in [34].
A PBO process is realized by creating a single, standard process, which we call the framework process (pboframe()). Every process in the system uses this same framework, and takes a PBO as an argument. The PBO defines the module-specific user code, the I/O ports and configuration constants, the type of process (e.g. periodic process or aperiodic server), and the timing parameters such as frequency, deadline, and priority.
The Framework Process
The framework process implements a finite state machine with four states (NOT-CREATED, OFF, ON, and ERROR), as shown in Figure 10. [...] will execute the off method, followed by the kill method, then enter the NOT-CREATED state.
The framework process, as shown, evolved over several years as we designed and tested many variations, in order to obtain a common program structure for all software components, such that no exceptions existed.
The diagram represents the best structure that we found. Many design decisions are implied by the detailed PBO framework, which we discuss next.
Notes about Framework Process
Despite the seeming complexity of the framework, dissecting it into pieces shows that it is indeed rather simple. In the steady state, PBO processes are all in the ON state, and executing their cycle method once per cycle or event, going back to sleep until the next wakeup signal. Note that the only difference between a periodic process and an aperiodic server is the source of the wakeup signal. For a periodic process, the wakeup signal is received from the timer. For the aperiodic processes, the process blocks on a semaphore, message, or event, as defined by the sync method of the PBO.
Aperiodic servers can use the same fundamental structure as periodic processes, as a result of the underlying timing error detection and handling mechanism built into the Chimera RTOS [34]. The framework can define aperiodic processes as either deferrable or sporadic servers, and use them with either the rate monotonic static priority [14] or maximum-urgency-first dynamic priority scheduling [37] algorithms.
The remainder of the framework handles the initialization and termination, reconfiguration, and error handling for the PBO. To support dynamic reconfiguration, a 2-stage initialization and termination is used.
High-overhead initialization of a new process can be performed upon system start-up, or in the background, in preparation for being activated. The initialization includes creating a process's context, dynamically allocating its memory, creating a local table and translating I/O port symbols into pointers to the global table, and calling the user-defined init method. The process then waits in the OFF state, and can be viewed as being in a standby mode for a dynamic reconfiguration. When an on signal is received, the local table is updated to reflect the current state of the system, and execution begins.
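The two-stage life cycle can be sketched as a small table-driven state machine. This is an illustrative subset, not the Chimera pboframe(): the class names, the `wakeup` signal name, and the exact action lists are assumptions, the transition from ON to OFF calling the off method is assumed by symmetry, and the error, clear, and reinit paths are omitted. Port transfers are framework work and are only recorded in a trace here.

```python
class NullPBO:
    """Hypothetical PBO whose methods do nothing; a module would override them."""

    def init(self): pass
    def on(self): pass
    def cycle(self): pass
    def off(self): pass
    def kill(self): pass


class FrameworkProcess:
    # (state, signal) -> (actions in order, next state).
    TRANSITIONS = {
        ("NOT-CREATED", "init"): (["init"], "OFF"),
        # On activation both INVARS and OUTVARS are read before the on method.
        ("OFF", "on"): (["read_invars", "read_outvars", "on"], "ON"),
        ("ON", "wakeup"): (["read_invars", "cycle", "write_outvars"], "ON"),
        ("ON", "off"): (["off"], "OFF"),
        ("ON", "kill"): (["off", "kill"], "NOT-CREATED"),
    }

    def __init__(self, pbo):
        self.pbo = pbo
        self.state = "NOT-CREATED"
        self.trace = []

    def signal(self, sig):
        try:
            actions, next_state = self.TRANSITIONS[(self.state, sig)]
        except KeyError:
            raise ValueError(
                f"signal {sig!r} not handled in state {self.state}") from None
        for action in actions:
            self.trace.append(action)
            method = getattr(self.pbo, action, None)
            if method is not None:       # user-defined method of the PBO
                method()
        self.state = next_state
```

A control engineer would supply only the PBO methods; the order of port transfers and method calls is fixed by the table. Sending `init`, `on`, `wakeup`, and `kill` in turn walks the process from NOT-CREATED through OFF and ON and back to NOT-CREATED.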
Perhaps one of the keys to supporting dynamic reconfiguration is handling the initialization of the port variables. Before calling the on method, it is necessary to read the OUTVARS in addition to the INVARS. This solution resulted after many trials with initialization, trying to determine the best way to ensure that when a process is activated, its view of the environment is correct. Since the process was not previously executing, it is possible that some other process was generating those OUTVARS. In general, a process that outputs these OUTVARS knows the same values on the subsequent cycle because they are in the local table.
The time for a process P to be activated, $C_{on,P}$, is bounded if the user-defined on method is bounded. It is

$$C_{on,P} = t_{IP} + t_{OP} + C'_{on,P} + \Delta_{OS}$$

where $t_{IP}$ and $t_{OP}$ are defined in equation (3), $C'_{on,P}$ is the execution time of the user-defined on method of process P, and $\Delta_{OS}$ is the operating system overhead for sending a signal and performing a context switch.
Typical values for $C_{on,P}$ in our system were on the order of (100 µsec + $C'_{on,P}$).
The deactivation of a process was similarly time-bounded. Thus, the time to perform a dynamic reconfiguration was the sum of the time to activate new processes and deactivate running processes which formed the differential between the old and new configurations. Since it is generally possible to activate and deactivate several processes within a millisecond, it is possible to execute one controller module one cycle, and on the next cycle to be executing a different controller, as a result of a dynamic reconfiguration in between.
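A configuration manager could check whether a reconfiguration fits between two cycles with a calculation like the one below. It assumes that the activation time of a process is the sum of its INVAR and OUTVAR transfer times, the execution time of its on method, and the operating system overhead, and it takes deactivation times as given inputs since the source does not spell out their form. The function names and all numbers in the usage note are hypothetical.

```python
def activation_time(t_in, t_out, c_on_user, delta_os):
    """Bound on activating one process: port reads, on method, OS overhead."""
    return t_in + t_out + c_on_user + delta_os


def reconfiguration_time(activations, deactivations):
    """Total time to switch configurations: activate the new processes and
    deactivate the old ones that differ between the two configurations."""
    return sum(activations) + sum(deactivations)


def fits_in_cycle(activations, deactivations, cycle_time):
    """True if the whole reconfiguration completes within one cycle."""
    return reconfiguration_time(activations, deactivations) <= cycle_time
```

As a usage example with placeholder values in microseconds, two activations of `activation_time(30, 20, 50, 50)` = 150 and 120 plus two deactivations of 100 and 90 total 460, which fits within a 1000 µsec cycle.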
As a precautionary measure in case of transient overload during a dynamic reconfiguration, an illegal configuration flag is temporarily set, which indicates that not all required output is being produced. The framework provides this mechanism primarily for use by HDICs that send output to actuators. The most critical processes in the system, which should have a priority higher than the processes being reconfigured, can test this flag and, if it is set, can choose to ignore their INVARS and go into locally stable execution. For example, a robot interface module can elect to keep the velocity of the joints constant until the dynamic reconfiguration is complete, as signalled by the illegal configuration flag being reset. Determining when it is safe to perform a dynamic reconfiguration is beyond the scope of the framework. The framework provides the mechanisms only. Developing policies which will ensure stable execution during a reconfiguration is usually application specific. In our experiments, we used a conservative approach of ensuring that the robot was temporarily at rest (i.e., velocity and acceleration were both zero) before a dynamic reconfiguration began.
Further research is required in order to develop more aggressive policies.
Error detection and handling is implemented using a global error handling mechanism, as described in [35] .
Whenever an error occurs, an error signal is generated, and a user-defined error handler is called. The framework automatically initializes this error handler to be the error method of the PBO. The purpose of the error method is to attempt automated recovery. If that fails, the process goes into the ERROR state, indicating that user intervention is required. Once the user attempts to fix the problem, a clear signal is generated. Whenever one or more processes are in the error state, the illegal configuration flag described above is set.
This again indicates to critical modules that there exists at least one other module in the configuration not producing output as required, and therefore the INVARS of that process may not be properly computed.
During a dynamic reconfiguration, it is possible that a new process is created in the background, such that it updates some of the configuration constants. Before this process can be activated, some of the running processes might need to be re-initialized, if they have configured themselves previously based on different INCONSTS. In this case, a reinit signal is sent to those processes which have the updated OUTCONST as an INCONST. The new process is only allowed to be activated after those processes have re-initialized.
Statically configurable processes are subsets of dynamically reconfigurable processes. These processes go directly to the ON state after initialization. Therefore, if the OFF state is removed, and any transition emanating from the OFF state removed, the result is a framework process that can be used for statically configurable systems. In our applications requiring only static configurability, however, we still define the modules as dynamically reconfigurable, but send an on signal immediately after the init signal.
Coding a Port-Based Object
The structure of a PBO is designed so that a control engineer can simply define module-specific code, and not be concerned with any of the details of creating a real-time process. A template can be created for any specific PBO, given an .rmod file, as was shown in Figure 2. The control engineer then only has to fill in the blanks, which is to define the methods of the PBO to perform the module-specific functionality. As an example, Figure 11 shows the template and control engineer's code for the tball module. The regular font shows the template code for tball, given the information in Table 1, while the bold font shows code written by the control engineer.
The encapsulated data of the object is stored in the structure called tballLocal_t. Part of this structure is generated by the framework, to include pointers to the local SVAR table. The remainder of the structure is user-definable, so that control engineers can place any "global" variables for their PBO there. The data in this structure is made available to every method. The methods all have specific names of the form xxxYyyy, where xxx represents the module name and Yyyy represents the method name, as was depicted in Figure 10.
The rigid process structure provides strict guidelines for control engineers, telling them exactly where to put what kind of code. It removes any guesswork, reduces the amount of code they must write, and guarantees that synchronization and communication works from the beginning.
Hardware/software co-design: One form of system-level design of embedded applications uses a mixture of hardware and software components [10] . The co-design approach allows the hardware and DRRTS components to be tightly coupled throughout the design process.
Tele-configuration of services: Telecommunication companies, such as cable, satellite TV, and telephone, are increasingly transmitting control information over the data transmission medium. DRRTS components can be used to dynamically change the configuration, hence the services provided, or to perform remote upgrades and maintenance of existing services.
Evolutionary design: Complex systems may require continuous hardware and software upgrades during the lifetime of the system in response to technological advancements, environmental change, or alteration of system goals. DRRTS components are designed to undergo such evolution, as individual modules can be replaced incrementally and independently.
Flexibility for fine-tuning after implementation: A DRRTS framework offers considerable flexibility for fine tuning an application to "make it work". It can have reconfiguration options such as switching between static and dynamic scheduling algorithms, using time-based decomposition to improve CPU utilization, and easily converting interrupt handlers to aperiodic servers and vice versa.
Increased reliability through automated analysis: The internal structure of a DRRTS component is well-defined, based on a theoretical model. This structure allows for the automation of such things as performance measurement, configuration verification, and scheduling analysis.
These additional benefits fuel our current research effort into developing advanced RTOS technology for supporting dynamically reconfigurable systems.
