Abstract-In this paper we develop a framework for integrating real-time software modules that comprise a reconfigurable multi-sensor based system. Our framework is based on the proposed concept of a global database of state information through which real-time software modules exchange information. This methodology allows the development and integration of reusable software in a complex multiprocessing environment. A reconfigurable sensor-based control system consists of many software modules, each of which can be modelled using a simplified version of a port automaton. Our new state variable table mechanism can be used in both statically and dynamically reconfigurable systems, and it is completely processor independent. Individual modules may also be combined into larger modules to aid in building large systems, and to reduce bus and CPU utilization. An efficient implementation of the state variable table mechanism, which has been integrated into the Chimera II Real-Time Operating System, is also described.
0-7803-0737-2/92 $03.00 1992 © IEEE
I. INTRODUCTION
Real-time sensor-based control systems are complex. In order to develop such systems, control strategies are needed to interpret and process sensing information for generating control signals. There has been considerable effort devoted to addressing this aspect of real-time control systems. However, even with robust control algorithms, a sophisticated software environment is necessary for efficient implementation into a robust system. The level of sophistication is even greater if this system is to be generalized so that it is reconfigurable and can perform more than a single task or application. Obviously, a real-time operating system (RTOS) is part of this software environment. However, it is also necessary to have a layer of abstraction between the RTOS and control algorithms that makes the implementation efficient, allows for easily expanding and/or changing the control strategies, and reduces development costs by incorporating the concept of reusable software. The development of this layer of abstraction is further motivated by the realization that real-time control systems are typically implemented in open-architecture multiprocessor environments. Several issues, such as configuring reusable modules to perform a job, allocating modules to processors, communicating between various modules, synchronizing modules running on separate processors, and determining correctness of a configuration, arise in this context.
In this paper we develop a framework for integrating real-time software modules that comprise a reconfigurable multi-sensor based system. Our framework is based on the proposed concept of a global database of state information through which real-time software modules exchange information. This methodology allows the development and integration of reusable software in a complex multiprocessing environment.
We define a control module as a reusable software module within a real-time sensor-based control subsystem. A reconfigurable system consists of many control modules, each of which can be modelled using a simplified version of a port automaton [22], as shown in Fig. 1. Each module has zero or more input ports, and zero or more output ports. Each port corresponds to a data item required or generated by the control module. A module which obtains data from sensors may not have any input ports, while a module which sends new data to actuators may not have any output ports. We assume that each control module is a separate task. A control module can also interface with other subsystems, such as vision systems, path-planners, or expert systems.
A link between two modules is created by connecting an output port of one module to an appropriate input of another module. A legal configuration is obtained if for every input port in the system, there is one, and only one, output port connected to it. An extension of the port automata theory is presented in [12], where a split connector allows a single output to be fanned into multiple output ports, and a join connector allows multiple input ports to be merged into a single input port. The split connector replicates the output multiple times. For the join connector, a combining algorithm, such as a weighted average, is required to merge the data.
... lack the flexibility required for the design and implementation of reconfigurable systems. The design of these programming environments is generally based on heuristics rather than on software architecture models, and lends itself only to single-configuration systems. The environments also do not make clear distinctions between module interfaces and module content, and thus lack a concrete framework which would allow development of modules independent of the target application and target hardware.
In this paper, we propose a method of using state variables for systematically integrating reusable control modules in a real-time multiprocessor environment. Our design can be used with both statically and dynamically reconfigurable systems. Section II describes the design issues to be considered, and some of the assumptions we have made about the target environment. Section III gives the architectural details of our control module integration. Section IV describes an efficient implementation of the state variable table mechanism, which has been integrated into the Chimera II Real-Time Operating System [23]. Finally, Section V summarizes the use of state variables for module integration in a reconfigurable system.
II. DESIGN ISSUES AND ASSUMPTIONS
In order to design a general mechanism which can be used to integrate control modules in a multiprocessor environment, some architectural knowledge of the target hardware is required. We assume an open-architecture multiprocessor system, which contains multiple general purpose processors (such as MC68030, Intel 80386, SPARC, etc.), which we call Real-Time Processing Units (RTPUs), on a common bus (such as VMEbus, Multibus, Futurebus, etc.). Each processor has its own local memory, and some memory in the system is shared by all processors.
Given an open-architecture target environment, the following issues must be considered:
Processor transparency: In order for a software module to be reusable, it must be designed and written independent of the RTPU on which it will finally execute, since neither the hardware nor software configuration is known a priori.
Task synchronization: Sensors and actuators may be operating at different rates, thereby requiring different tasks to have different frequencies. In addition, system clocks on multiple processors may not be operating at exactly the same rate, causing two tasks with the same frequency to have skewing problems. The module integration must not depend on task frequencies or system clocks for synchronization.
Data integrity: When two modules communicate with each other, a complete set of data must be transferred. It is not acceptable for part of a data set to be from the current cycle, while the rest of the data set is from a previous cycle.
Predictability: In real-time systems, it is essential that the communication between modules is predictable, so that worst-case execution and blocking times can be bounded.
We assume that the entire global state variable table has a single lock. It is possible for each variable to have its own lock, in which case the locking overhead increases to (m+n)A. The advantage of using a single lock is described in Section IV.A.
The bus utilization B for k modules in a particular configuration, in transfers per second, is then
We use "transfers per second" instead of CPU execution time or bus utilization time as a base measure for the resource requirements of the communication mechanism, since it is a hardware-independent measurement. Thus, using our state variable table design, we can accurately determine the CPU and bus utilization required for the inter-module communication within a configuration.
A configuration is legal if the following holds true:
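$$\left(\bigcap_{j=1}^{k}\,\bigcap_{i} y_{ij} = \emptyset\right) \wedge \left(\left(\bigcup_{j=1}^{k}\,\bigcup_{i} x_{ij}\right) \subseteq \left(\bigcup_{j=1}^{k}\,\bigcup_{i} y_{ij}\right)\right) \tag{3}$$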
The first term represents the intersection of all output variables from all modules. If two modules have the same outputs, then a join connector is required. Modules with conflicting outputs can modify their output port variables, such that they are two separate, intermediate variables. A join connector is a separate module which performs some kind of combining operation, such as a weighted average. Its input ports are the intermediate variables, while its single output port is the output variable that was originally in conflict. The bandwidth required can then be calculated by treating the join connector as a regular module. Split connectors are not required in our design, since multiple tasks can specify the same input port, in which case data is obtained from the same location within the global state variable table.
The second term in (3) states that for every input port, there must be a module with a corresponding output port.
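The two conditions in (3) can be checked mechanically from the module definitions alone. The following is a minimal sketch, assuming each module is described only by the names of its input and output state variables; the function name, the module names, and the variable names are all hypothetical and are not taken from the Chimera II implementation.

```python
from collections import Counter


def check_configuration(modules):
    """Check the two conditions of a legal configuration.

    modules maps a module name to (input_names, output_names).
    Returns (is_legal, conflicting_outputs, unsatisfied_inputs).
    """
    produced = Counter()
    required = set()
    for inputs, outputs in modules.values():
        produced.update(set(outputs))
        required.update(inputs)

    # First term: no state variable may be written by more than one module.
    conflicts = {name for name, count in produced.items() if count > 1}
    # Second term: every input variable must be some module's output.
    missing = required - set(produced)
    return (not conflicts and not missing), conflicts, missing


# Hypothetical teleoperation configuration; all names are illustrative.
config = {
    "trackball": (set(), {"x_ref"}),
    "inverse_kinematics": ({"x_ref"}, {"q_ref"}),
    "pid_controller": ({"q_ref", "q_meas"}, {"torque"}),
    "robot_interface": ({"torque"}, {"q_meas"}),
}
legal, conflicts, missing = check_configuration(config)
```

A non-empty `conflicts` set identifies the variables for which a join connector would be needed, while a non-empty `missing` set lists input ports that no module in the configuration supplies.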
Using state variables for module integration is processor independent. Whether multiple modules run on the same RTPU, or each module runs on a separate RTPU, the maximum bus bandwidth required for a particular configuration remains constant, as computed in (2). In the next section we give more details on typical modules within a reconfigurable sensor-based control system.
A. Control Module Library
The state variable table mechanism is a means of integrating control modules, which have been developed with a reusable and reconfigurable interface. Once a module is developed, it can be placed into a library, and incorporated into a user's application as needed. A sample control module library is shown in Fig. 3. The classification of different module types is for convenience only. There is no difference in the interfaces of, say, a robot interface module and a digital controller module. We expect that existing robot control libraries (e.g. ...
The robot provides its own controller, to which reference joint positions must be sent. The position-mode robot interface is a module for this type of robot interface. Other actuators or computer controlled machinery may also have similar interface modules. The frequency of these modules is generally dependent on the robot hardware; sometimes it is fixed, other times it may be set depending on the application requirements.
The sensor modules are similar to the robot interface modules, in that they communicate with device hardware, such as force sensors, tactile sensors, and vision subsystems. In the case of a force/torque sensor, a 6-DOF force/torque sensor module inputs raw strain gauge values and converts them into an array of force and torque values, in Newtons and Newton-meters respectively. For a visual servoing application [18], much of the reading and preprocessing of images is performed by specialized vision subsystems. These systems may generate some data, from which a new desired Cartesian position is derived, as illustrated by the visual servoing interface module. The teleoperation input modules are also sensor modules. They have been classified separately in order to distinguish user input from other sensory input. In our control module library the teleoperation modules read from a 6-DOF trackball, thus both modules are similar. The difference is the type of preprocessing performed by each module, allowing the trackball to be used either for generating velocities (which can be integrated to obtain positions), or forces, for when the robot is in contact with the environment. Trajectory generators are another way of getting desired forces or positions into the control loop. The input may come from outside the control loop, such as from the user (e.g. keyboard), from a predefined trajectory file, or from a path-planning subsystem.
Differentiator and integrator modules perform time differentiation and integration respectively. For example, joint velocities may be obtained by differentiating joint positions. Only the value of the current cycle is supplied as input. Previous values are not required, as the modules are designed with memory, and keep track of the positions and velocities of previous cycles. The current time is assumed to be known by all modules.
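The idea of a module "with memory" can be illustrated with a short sketch. It assumes a backward-difference derivative and rectangular (Euler) integration, which are choices made here for brevity and are not stated in the text; the class and method names are hypothetical.

```python
class Differentiator:
    """Keeps the previous sample so that only the current value is an input."""

    def __init__(self):
        self.prev_value = None
        self.prev_time = None

    def cycle(self, value, now):
        if self.prev_time is None or now <= self.prev_time:
            rate = [0.0] * len(value)  # no history yet: report zero rate
        else:
            dt = now - self.prev_time
            rate = [(v - p) / dt for v, p in zip(value, self.prev_value)]
        self.prev_value = list(value)
        self.prev_time = now
        return rate


class Integrator:
    """Accumulates a rate into a running total across cycles."""

    def __init__(self, initial):
        self.total = list(initial)
        self.prev_time = None

    def cycle(self, rate, now):
        if self.prev_time is not None:
            dt = now - self.prev_time
            self.total = [s + r * dt for s, r in zip(self.total, rate)]
        self.prev_time = now
        return list(self.total)
```

Because each object stores its own history, the caller passes only the current value and the current time on every cycle, which matches the port interface described above.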
Digital controller modules are generally the heart of any configuration. In our sample library, we have trajectory interpolators, a PID joint position controller, a resolved acceleration controller [...
Given a library of modules, several legal configurations may be possible. Fig. 4 shows one possible configuration for a teleoperated robot with a torque-mode interface. Each module is a separate task and can execute on its own RTPU, or multiple modules may share the same RTPU, without any code modification. The state variable table mechanism allows the frequency of each task to be different. The selection of frequencies will often be constrained by the available hardware.
Digital control modules do not directly communicate with hardware, and can execute at any frequency. Generally the frequency for the control modules will be a multiple of the robot interface frequency. When using the state variable table for communication between the modules, any combination of frequencies among tasks will work. This allows frequencies to be set as required by the application, as opposed to being constrained by the communications protocol.
B. Reusable Modules and Reconfigurable Systems
The primary goal of the global state variable table mechanism is to integrate reusable control modules in a reconfigurable, multiprocessor system. The previous section gave examples of control modules, and a sample configuration. In this section, we will give an example of reconfiguring a system to use a different controller, without changing the sensor and robot interface modules. The change in configurations can occur either statically or dynamically. In the static case, only the task modules required for a particular configuration are created. In the dynamic case, the union of all task modules required is created during initialization of the system. Assuming we are starting up using configuration (a), the inverse kinematics task is turned on immediately after initialization, causing it to run periodically, while the damped least squares and time integrator tasks remain blocked, or off. At the instant that we want the dynamic change in controllers, we block the inverse kinematics task and turn on the damped least squares and time integrator tasks. On the next cycle, the new tasks will automatically update their own local state variable table, and execute a cycle of their loop, instead of the inverse kinematics task doing so. Assuming the on and off operations are fairly low overhead (which they are in our implementations), the dynamic reconfiguration can be performed without any loss of cycles. Note that for a configuration to properly execute, the set of modules must be schedulable on the available RTPUs, as described in [24].
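The dynamic case can be pictured as follows: all tasks exist from initialization, and a reconfiguration only changes which of them are on. This is a simplified single-threaded sketch; the `Task`, `run_cycle`, and `reconfigure` names are hypothetical, and the real on and off operations are operating system services whose interface is not shown in the text.

```python
class Task:
    """A periodic task that is created blocked (off)."""

    def __init__(self, name):
        self.name = name
        self.on = False
        self.cycles = 0

    def cycle(self):
        self.cycles += 1


def run_cycle(tasks):
    """Run one period: only tasks that are on execute their loop body."""
    ran = []
    for task in tasks.values():
        if task.on:
            task.cycle()
            ran.append(task.name)
    return ran


def reconfigure(tasks, turn_off, turn_on):
    """Switch configurations between two cycles."""
    for name in turn_off:
        tasks[name].on = False
    for name in turn_on:
        tasks[name].on = True


# The union of the tasks needed by both configurations is created up front.
names = ["inverse_kinematics", "damped_least_squares", "time_integrator"]
tasks = {name: Task(name) for name in names}
tasks["inverse_kinematics"].on = True
before = run_cycle(tasks)
reconfigure(tasks, ["inverse_kinematics"],
            ["damped_least_squares", "time_integrator"])
after = run_cycle(tasks)
```

In this sketch the switch takes effect on the very next cycle, so no period passes without a controller running.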
Note that open-ended outputs are fine (e. 
C. Combining Modules
The model of our control modules allows multiple modules to be combined into a single module. This has two major benefits:
1. complex modules can be built out of smaller, simpler modules, some or all of which may already exist, and hence be reused; and
2. the bus and processor utilization for a particular configuration can be improved.
For maximum flexibility, every component is a separate module, hence a separate task. This structure allows any component to execute on any processor, and allows the maximum number of different multiprocessor configurations. However, the operating system overhead of switching between these tasks can be eliminated if each module executes at the same frequency on the same processor. Multiple modules then make up a single larger module, which can be defined to be a single task. The preprogram and copy statements are provided by our svar implementation. The pausing and looping are handled by the operating system. Therefore, modules can be defined as subroutine components with a standard interface, which are called at the appropriate time by the above generic framework.
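A minimal sketch of such a framework is given here, assuming a dictionary-based table and an ordinary mutex in place of the real table lock. The `GlobalTable` and `ModuleTask` classes, the component functions, and the variable names are all hypothetical; they only illustrate how several subroutine components can run inside one task while exchanging intermediate values locally.

```python
import threading


class GlobalTable:
    """Global state variable table guarded by a single lock."""

    def __init__(self, initial):
        self._lock = threading.Lock()  # stand-in for the table lock
        self._values = dict(initial)

    def read(self, names):
        with self._lock:
            return {name: self._values[name] for name in names}

    def write(self, values):
        with self._lock:
            self._values.update(values)


class ModuleTask:
    """One task built from several components with a standard interface."""

    def __init__(self, table, inputs, outputs, components):
        self.table = table
        self.inputs = tuple(inputs)    # fixed once, like a preprogram step
        self.outputs = tuple(outputs)
        self.components = list(components)
        self.local = {}                # local state variable table

    def run_cycle(self):
        # Copy inputs from the global table into the local table.
        self.local.update(self.table.read(self.inputs))
        # Call each component in order; results stay in the local table.
        for component in self.components:
            self.local.update(component(self.local))
        # Copy only the declared outputs back to the global table.
        self.table.write({name: self.local[name] for name in self.outputs})


def scale(local):
    return {"q_scaled": 2.0 * local["q_ref"]}


def offset(local):
    return {"q_cmd": local["q_scaled"] + 1.0}


table = GlobalTable({"q_ref": 3.0, "q_cmd": 0.0})
task = ModuleTask(table, ["q_ref"], ["q_cmd"], [scale, offset])
task.run_cycle()
result = table.read(["q_cmd"])
```

The intermediate variable `q_scaled` never reaches the global table, which is the source of the bus savings when modules are combined.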
A. Locking Mechanism
So far we have assumed that tasks can transfer data as needed. However, since the global state variable table must be accessed by tasks on multiple RTPUs, appropriate synchronization is required to ensure data integrity. A task which is updating the table must first lock it, to ensure that no other task reads the data while it is changing. Two locking possibilities exist:
1. keep a single lock for the entire table
2. lock each variable separately
The main advantage of the single lock is that locking overhead is minimized. A module with multiple input or output ports only has to lock the table once before transferring all of its data. There appear to be two main advantages of locking each variable separately: 1) multiple tasks can read or write different parts of the table simultaneously, and 2) transfers of data for multiple variables by a low priority task can be preempted by a higher priority task. Closer analysis, however, shows that locking each variable separately does not have these advantages. First, because the bus is shared, only one of multiple tasks holding a per-variable lock can access the table at any one time. Second, we will show later that the overhead of locking the table, which in effect is the cost of preemption, is often greater than the time for a task to complete its transfer. A single lock for the entire table is thus recommended.
Next, an appropriate locking mechanism must be selected. Simple mechanisms like local semaphores and only locking the CPU cannot be used, because they are only valid for single-processor applications. Multiprocessor mechanisms available include spin-locks [15], message passing, remote semaphores [23], and the multiprocessor priority ceiling protocol [20].
Message passing, remote semaphores, and the multiprocessor priority ceiling protocol all require significant overhead, which is typically an order of magnitude greater than the data transfer itself. For example, the remote semaphores in Chimera II take a minimum of 44 µsec for the locking and unlocking operations, and as much as 200 µsec if the lock is not obtained on the first try and forces the task to block [23]. A typical transfer, on the other hand, may consist of 6 joint positions and 6 joint velocities, for a total of 12 transfers. On a typical VMEbus system, the raw data transfer (i.e. excluding all overhead) takes approximately 16 µsec. Message passing and the multiprocessor priority ceiling protocol would require significantly more overhead than the remote semaphores. It is thus not reasonable to use the higher level synchronization primitives for locking the state variable table.
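As a rough check using only the figures quoted above, the ratio of remote semaphore overhead to raw transfer time for the 12-transfer example is

$$\frac{44\ \mu\text{sec}}{16\ \mu\text{sec}} = 2.75, \qquad \frac{200\ \mu\text{sec}}{16\ \mu\text{sec}} = 12.5$$

so even an uncontended remote semaphore costs nearly three times the transfer it protects, and a blocking one costs more than twelve times as much.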
The simplest multiprocessor synchronization method is the spin-lock, which uses an atomic test-and-set (TAS) operation. The TAS instruction reads the current lock value from memory, then writes 1 into that location. If the original value is 0, then the task acquires the lock; otherwise the lock is not obtained, and the task must try again. The read and write portions of the instruction are guaranteed to be atomic, even among multiple processors. To release the lock, 0 is written to the memory location. The number of bus transfers required to acquire and release the spin-lock is A = 2r + 1, where r is the number of retries needed to obtain the lock.
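The TAS protocol can be sketched in a few lines. Python has no hardware test-and-set, so this sketch emulates the atomicity of the instruction with an internal mutex; the `SharedWord` class and the `try_acquire` and `release` helpers are hypothetical names. The transfer counter assumes that each TAS counts as one read plus one write and that the release counts as one write, which is one reading of the formula for A rather than something the text spells out.

```python
import threading


class SharedWord:
    """One lock word in shared memory, with a bus-transfer counter."""

    def __init__(self):
        self.value = 0
        self.transfers = 0
        self._atomic = threading.Lock()  # emulates hardware atomicity

    def test_and_set(self):
        """Read the old value and write 1, as one indivisible step."""
        with self._atomic:
            old = self.value
            self.value = 1
            self.transfers += 2  # assumed: one read plus one write
            return old

    def clear(self):
        """Release the lock by writing 0."""
        with self._atomic:
            self.value = 0
            self.transfers += 1


def try_acquire(word):
    """A single TAS attempt; True means the caller now holds the lock."""
    return word.test_and_set() == 0


def release(word):
    word.clear()


word = SharedWord()
first = try_acquire(word)    # lock was free, so this attempt succeeds
second = try_acquire(word)   # lock is held, so this attempt fails
release(word)
third = try_acquire(word)    # free again after the release
```

Under the counting assumption above, a task that needs r TAS attempts before succeeding and then releases the lock generates 2r + 1 transfers.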
If a task does not get the lock on the first try, it must continually retry (or spin, hence the name spin-lock). If it retries as fast as possible, then the task may use up bus cycles which could instead be used by the task holding the lock to transfer the data. A small delay, which we call the polling time, should be placed between each retry. The polling time can be arbitrarily set, and usually some form of compromise is chosen. A polling time that is too short results in too much bus bandwidth being used for retry operations, while a polling time that is too long results in waiting much longer for a lock than necessary, hence wasting valuable CPU cycles. In our system, the polling time is 25 µsec, which has so far been satisfactory for all of our experiments.
Unfortunately, using a simple locking mechanism like the spin-lock does not guarantee a bounded execution time while waiting for or holding the lock. In [15], several schemes are described which do offer bounded execution time. However, each of these requires some form of hardware support that is not available. In particular, all methods require a round-robin bus arbitration policy. The VMEbus offers round-robin bus arbitration for a maximum of 4 bus masters (every RTPU is a bus master, and some special purpose processors and direct-memory-access (DMA) devices may also be bus masters). More than 4 bus masters causes some of the bus masters to be daisy-chained priority driven. In some installations, the system controller only has single-level arbitration, and no round-robin arbitration is possible. Consequently, the bounded locking mechanisms break down. To bound the waiting time for a spin-lock, we have implemented the mechanism described below.
First, to ensure that a task is not swapped out while it holds a lock, it will disable all interrupts on its own RTPU, thus allowing it to perform the transfer uninterrupted. Considering that the resolution of the system clock is generally on the order of milliseconds, and with the assumption that transfers are relatively short (i.e. less than a few tens of microseconds), disabling preemption while the transfer is occurring will have negligible effect on most real-time scheduling algorithms.
Interruptions in using the bus may come from other RTPUs trying to gain the lock. In the worst case, each other RTPU will perform one TAS instruction during every polling cycle. The maximum number of interruptions is thus controllable by setting an appropriate polling time.
Without a bounded waiting time locking mechanism, it is not possible to guarantee that tasks will get the data they require on time, every time. As an alternative, a time-out mechanism is used, so that if the lock is not gained within a pre-specified time or number of retries, then the transfer is not performed. The maximum waiting time for the lock is then the time-out period, which is also equal to the polling time multiplied by the maximum number of retries. For most tasks in a control system, missing an occasional cycle is not critical. In such a case, the value from the previous cycle still remains in the local table, and will be used during the next cycle. When using the time-out mechanism, error handlers should be installed to detect tasks that suffer successive time-out errors. Discussion on handling these errors is beyond the scope of this paper.
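The time-out behaviour described above can be sketched as a bounded retry loop. This is an illustration only: `lock_with_timeout`, `refresh_local`, and their parameters are hypothetical names, the lock itself is passed in as a pair of callables, and the delay function is injectable so the logic can be exercised without real waiting.

```python
import time


def lock_with_timeout(try_lock, polling_time, max_retries, sleep=time.sleep):
    """Try once, then retry up to max_retries times, polling in between.

    On failure the total delay is polling_time * max_retries.
    """
    if try_lock():
        return True
    for _ in range(max_retries):
        sleep(polling_time)
        if try_lock():
            return True
    return False


def refresh_local(local, global_table, names, try_lock, unlock,
                  polling_time, max_retries, sleep=time.sleep):
    """Copy inputs into the local table, or keep stale values on time-out."""
    if not lock_with_timeout(try_lock, polling_time, max_retries, sleep):
        return False  # values from the previous cycle remain in local
    try:
        for name in names:
            local[name] = global_table[name]
    finally:
        unlock()
    return True


# A lock that is never free: the local table keeps last cycle's value.
delays = []
local = {"q_meas": 1.0}
ok = refresh_local(local, {"q_meas": 2.0}, ["q_meas"],
                   try_lock=lambda: False, unlock=lambda: None,
                   polling_time=25e-6, max_retries=3, sleep=delays.append)
```

A caller could count consecutive `False` results to drive the kind of error handler mentioned above.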
B. Performance
A summary of the performance of our svar implementation is shown in Tables I and II. Measurements were taken from an Ironics IV3230 single-board computer [7], with a 25 MHz MC68030 processor, on a VMEbus, using a VMETRO 25 MHz VBT-321 VMEbus analyzer [27]. The bus arbitration scheme of the Ironics IV3230 is set to release-on-request. The global state variable table is stored within the dual-ported memory of a second IV3230 RTPU.
As seen from Table I, a significant overhead is incurred in VMEbus transfers, even when using the simplest of synchronization mechanisms. The time to obtain the global state variable table lock using TAS involves a subroutine call to an assembly language routine which performs the MC68030 TAS instruction [16], and checking the return value for a 1 or 0. Releasing the lock involves resetting it to 0. Locking and unlocking the CPU is performed by trapping into kernel mode, modifying the processor priority level, then returning to user mode. The subroutine call overhead involves passing one pointer argument on the stack.
The lcopy() routine is used to perform a block transfer. It is an optimized form of the standard C routine bcopy(). It can only transfer multiples of 4 bytes (the width of the VMEbus data paths). Blocks are 16 bytes (4 transfers) each. The time in Table I is the subroutine call overhead, which includes passing three arguments on the stack. If the transfer is not a multiple of the block size, then an additional 3 µsec overhead results for the incomplete block, but that time is incorporated into the raw data transfer time. The raw data transfer time is the time for sending the specified amount of data. Note that each float is exactly one transfer. The 9 µsec transfer time for 6 floats includes the 3 µsec overhead because the transfer is not a multiple of 16 bytes.
Our svar mechanism gives the ability to preprogram a set of variables to transfer on every cycle. Multiple variables are then transferred together as a single block, hence the lock is only acquired once per cycle. The additional overhead per variable is the time to update the pointers between transfers of each individual variable. Table II gives a summary of the times for various transfers between the global and local state variable tables, using both the single-variable and multivariable transfers. When using the single-variable transfer, a subroutine call and variable locking is required for each variable. Therefore, for the case 6*float[32], the routine is called six times, and the transfer size each time is 32 floats. For the multivariable transfer, the subroutine call and locking overhead is only incurred once for all the variables. In the case of 6*float[32], 192 floats are sent consecutively. Note that the multivariable transfer requires a preprogram operation, which is performed during initialization. It can take anywhere from 25 µsec to a few milliseconds, depending on the number of variables being programmed, and the size of the state variable table. The overhead savings of using the multivariable transfer is greatest when modules have a large number of variables with short transfer sizes.
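The trade-off between the two transfer modes can be expressed as a simple cost model. The sketch below follows the description above: the single-variable mode pays the call and locking overhead once per variable, while the multivariable mode pays it once in total plus a pointer update per variable. The function names are hypothetical, the model charges one pointer update for every variable as a simplification, and the numbers in the example are arbitrary placeholder units, not the measurements of Tables I and II.

```python
def single_variable_cost(sizes, call_overhead, lock_overhead, per_transfer):
    """Each variable pays the subroutine call and the locking overhead."""
    return sum(call_overhead + lock_overhead + per_transfer * n
               for n in sizes)


def multivariable_cost(sizes, call_overhead, lock_overhead,
                       pointer_update, per_transfer):
    """One call and one lock in total, plus a pointer update per variable."""
    return (call_overhead + lock_overhead
            + sum(pointer_update + per_transfer * n for n in sizes))


# Placeholder costs in arbitrary units (NOT measured values).
CALL, LOCK, POINTER, TRANSFER = 4, 6, 1, 1

many_short = [1] * 6    # six variables of one float each
many_long = [32] * 6    # six variables of 32 floats each
one_short = [1]         # a single one-float variable

short_single = single_variable_cost(many_short, CALL, LOCK, TRANSFER)
short_multi = multivariable_cost(many_short, CALL, LOCK, POINTER, TRANSFER)
long_single = single_variable_cost(many_long, CALL, LOCK, TRANSFER)
long_multi = multivariable_cost(many_long, CALL, LOCK, POINTER, TRANSFER)
```

With these placeholder costs the relative saving is large for many short variables and small for long ones, and a lone variable is slightly cheaper in single-variable mode, which is the qualitative pattern the text describes.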
In our experiments using this implementation, all modules use the multivariable transfer. The small loss in performance for transferring a single variable is negligible compared to the gains of the multivariable transfer when more than one variable is transferred, and to the benefit of consistency when all modules use the same transfer mode.
V. SUMMARY
In this paper we first presented a simplified port automaton model for the definition of reusable and reconfigurable control tasks. Using this model we developed a state variable table mechanism, based on global shared memory, to integrate control modules in a multiprocessor open-architecture environment. Using the mechanism, control modules can be reconfigured both statically and dynamically. The maximum bus bandwidth required for the interprocessor communication can be calculated exactly, based on the module definitions. The mechanism allows control tasks of arbitrary frequencies to communicate with each other without the need for any special provision. The mechanism is also robust when clocks on multiprocessors suffer skewing problems.
We showed examples of a control module library, a teleoperation control module configuration, and a reconfigurable application. The state variable table mechanism has been implemented as part of the Chimera II Real-Time Operating System. Several implementation issues were also considered, the most prominent being the locking mechanism used to ensure proper control module synchronization and data integrity. We chose to lock the entire state variable table with a single lock, using a high-performance spin-lock with CPU locking. Detailed performance measurements are given, highlighting the overhead versus raw data transfer execution times. The multiprocessor control module integration using state variables has proven to be an extremely valuable method for building reconfigurable systems. This method is being used at Carnegie Mellon University with the Direct Drive Arm II [8] [28], the Reconfigurable Modular Manipulator System II [21], the Troikabot System for Rapid Assembly [9], and the Self-Mobile Space-Manipulator [4], and at the Jet Propulsion Laboratory, California Institute of Technology, on a Robotics Research 7-DOF redundant manipulator [26]. These systems all share the same software framework. In many cases, the systems also share the same software modules. The sensors and control algorithms used for any particular experiment on any of these systems can be reconfigured in a matter of seconds, and in some cases dynamically.
