Abstract: Digital control for embedded systems often requires hard real-time computation that satisfies high control-loop bandwidth, low latency, and low-power requirements. In particular, the emerging applications of Micro ElectroMechanical Systems (MEMS) sensors, and their increasing integration, present a challenging requirement to embed ultralow-power digital control architectures for these lithographically formed micro-structures. Controlling electromechanical structures of such a small scale with naive digital controllers can be prohibitively expensive, in both power and cost, for portable or battery-operated applications. In this paper, we describe how control systems can be transformed into a set of cooperating parallel linear systems and demonstrate, for the first time, that this parallelization can reduce the total number of instructions executed, thereby reducing power, at the expense of a controlled loss in control fidelity. Since the error tolerance of linear feedback control systems is mathematically well-posed, this technique opens up a new, independent dimension for system optimization. We present a novel Computer-Aided Design (CAD) method to evaluate control fidelity with varying timescales on the controller, and we analyze the trade-off between performance and power dissipation. We propose a CAD metric for control fidelity and demonstrate the potential for power savings using this decomposition on two different control problems.
I. INTRODUCTION
Digital feedback controllers make up a substantial part of modern embedded systems such as ink-jet printers and anti-lock brake systems in cars. Their discrete-time realization, largely designed using classical unit-time sampling techniques, tends to dominate the design performance criteria because of real-time sampling and actuation constraints. These controllers are often implemented on micro-controllers for low-bandwidth requirements and on digital signal processors or Field-Programmable Gate Arrays (FPGAs) for higher-bandwidth requirements. Their implementation exploits the linear systems approach for control realization by performing matrix-multiplication-based updates of a set of state variables, or by direct digital realization as an Infinite Impulse Response (IIR) digital filter.
In practice, however, such a strategy can lead to very inefficient, impractical, and expensive designs. For example, the performance of Atomic Force Microscopy (AFM) depends on the performance of the feedback control of the scanning cantilever, which is based on distributed FPGA implementations of the control algorithm to maintain the required loop latency. Another example is the design of portable MEMS sensors, whose diminutive physical size places very high demands on the loop bandwidth of the controller [1], [2]. Typically, linearly scaling down the size of a MEMS device linearly increases the required control-loop bandwidth, which, together with the added requirements of noise shaping and filtering, makes the control algorithms computation-intensive, with sample rates on the order of a few MHz. These strict design requirements seriously hamper the development of small, portable MEMS-based systems. Indeed, all current closed-loop MEMS designs are based on analog controller designs.
A promising alternative is to use a custom low-power hardware multi-threaded architecture [3]. Such an architecture enables processes that can be interrupted and interleaved at the granularity of single instruction cycles to meet timing (jitter) obligations. Further, we can substantially reduce latency and power by partitioning the control system based on bandwidth requirements and by performing only the most critical computations on the fastest thread, letting the slower parts run on slower threads. This nontrivial and difficult-to-analyze technique lowers the total number of instructions executed per second, which translates directly to significant power savings.
It is also possible to attempt algebraic manipulation [4] of the control algorithm to reduce latency and power without losing significant control fidelity. However, this technique has several problems. First, even if the transformation is algebraically correct, round-off error and numerical stability issues often lead to unstable results [5]. Second, the space of possible control algorithms that might be suitable for control decomposition is very large, and only a minuscule fraction of them are algebraically equivalent to the desired control law. Instead, we posit that it should be possible to use well-defined controller performance metrics to evaluate the fidelity and power dissipation of a given embedded control design. This approach effectively increases the level of abstraction in the design, since a candidate controller need not be equivalent in any sense other than the evaluation of the performance metric.
In this paper, we take the first steps toward this approach by using a well-defined metric for control stability and closed-loop performance. We use an exhaustive-search-based computer-aided design (CAD) method to explore the design space by running a partitioned controller at different rates of computation. The metric allows us to explore trade-offs between performance and computation power and clearly shows that we can raise the level of abstraction for control design and optimization beyond traditional methods. The rest of the paper is organized as follows: Section II presents related work; Section III introduces the H-infinity framework, a basic control design methodology and performance metric, with two case studies; Section IV presents our analysis of computation cost relative to performance within a bounding envelope of deviation from an ideal continuous controller. Finally, we also demonstrate the use of Voltage Frequency Scaling (VFS) to reduce computation power significantly.
II. RELATED WORK
Ad hoc techniques for low-power multi-rate control have been used to implement a hard disk drive (HDD) controller in [6]. Ad hoc techniques for multi-rate control with interlacing of computations have been simulated and analyzed in [7], where the results show that the control performance deteriorates only marginally while the computation energy is reduced significantly. Multi-rate digital control for motion control applications has been demonstrated, through various examples, in [8].
Software-based power estimation for optimizing power in embedded systems is discussed in [9], [10]. An optimizing compiler that minimizes execution time to save power has been proposed in [11]. A multiple clock and voltage domain, power-aware, tile-based embedded multiprocessor architecture has been proposed and analyzed in [12]. Architectural trade-offs for MEMS closed-loop control have been presented in [13], and a case study of hardware multi-threaded architectures for control has been discussed in [3].
III. MULTI-RATE DIGITAL CONTROLLER IMPLEMENTATION

A. Control Design
The H-infinity framework has been used as the basic control design methodology and the control performance metric [14]. In this type of design, one starts by determining the points at which disturbances and noise are most likely to enter the system dynamics and then designs a feedback controller that minimizes the effect of these disturbances on specific signals that one wants to regulate, typically tracking errors (i.e., the difference between a signal and its desired value) and control signals. Specifically, denoting by $w$ the vector of exogenous noise/disturbance signals that affect the dynamics of the process, sensors, and actuators and by $z$ the vector of signals that one wants to regulate, an H-infinity controller minimizes the worst-case root-mean-square gain
$$\max_{w \neq 0} \frac{\|z\|_{\mathrm{rms}}}{\|w\|_{\mathrm{rms}}} \tag{1}$$
where the maximum is taken over all possible disturbance signals w. One can capture band-limited disturbances by introducing weighting pre-filters in w, and one can also introduce weighting post-filters in z to capture the fact that one may only want to regulate this signal tightly over specific bands of frequency. The value in equation (1) obtained by a specific controller is called the H-infinity norm of the closed-loop system. For linear time-invariant continuous-time processes, the controller that minimizes the H-infinity norm is also linear time-invariant and operates in continuous time, thus requiring an infinite amount of computation. In practice, this cannot be implemented, and thus one needs to approximate this continuous-time controller by a discrete-time controller that can be implemented digitally. Since the approximation is no longer the optimal continuous-time controller, the discrete-time digital controller will not be able to achieve the minimum value for the H-infinity norm in equation (1), but ideally it does not increase it by much. A side effect of this analysis is the generation of a lower bound for fidelity, which allows sensible trade-offs to be derived.
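For a stable single-input single-output system, the H-infinity norm reduces to the peak magnitude of the frequency response, so a dense frequency sweep gives a simple estimate of it. The following is a minimal sketch under that assumption; the function name `hinf_norm_sweep`, the grid limits, and the lightly damped second-order example are illustrative choices and are not taken from the designs in this paper.

```python
import math

def hinf_norm_sweep(num, den, w_min=1e-2, w_max=1e2, points=20001):
    """Approximate sup_w |G(jw)| on a log-spaced frequency grid.

    num, den: polynomial coefficients in s, highest power first.
    Only meaningful for stable SISO systems; a finite grid yields
    a lower bound on the true peak.
    """
    def polyval(coeffs, s):
        acc = 0j
        for c in coeffs:
            acc = acc * s + c      # Horner evaluation
        return acc

    peak = 0.0
    for i in range(points):
        w = w_min * (w_max / w_min) ** (i / (points - 1))
        g = polyval(num, 1j * w) / polyval(den, 1j * w)
        peak = max(peak, abs(g))
    return peak

# Hypothetical example: G(s) = 1 / (s^2 + 2*zeta*s + 1) with zeta = 0.1
zeta = 0.1
peak = hinf_norm_sweep([1.0], [1.0, 2 * zeta, 1.0])
# Closed-form resonance peak of this particular example, for comparison
exact = 1.0 / (2 * zeta * math.sqrt(1 - zeta ** 2))
```

In a design flow, the same kind of evaluation would be applied to the weighted closed-loop map from w to z rather than to a bare second-order system.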
Traditionally, a digital controller is obtained by discretizing a continuous-time controller using the zero-order-hold (ZOH) method with a fixed sampling interval T. For a sufficiently high sampling rate, the resulting closed loop exhibits an H-infinity norm very close to that of the continuous-time controller. However, if computational constraints limit the sampling rate, the performance may degrade significantly, and the closed-loop system may even become unstable.
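To make the ZOH step concrete, the sketch below discretizes a single first-order section b/(s + a), for which the ZOH equivalent has a closed form. The helper names and the numerical values of a, b, and T are hypothetical and serve only as an illustration; a full controller would be handled section by section or with a state-space routine.

```python
import math

def zoh_first_order(a, b, T):
    """ZOH-discretize K(s) = b / (s + a), a != 0, with sampling interval T.

    Returns (phi, gamma) for the update
        x[k+1] = phi * x[k] + gamma * u[k],   y[k] = x[k].
    """
    phi = math.exp(-a * T)
    gamma = (b / a) * (1.0 - phi)
    return phi, gamma

def step_response(phi, gamma, steps):
    """Response of the discrete section to a unit step input."""
    x, out = 0.0, []
    for _ in range(steps):
        out.append(x)              # y[k] = x[k]
        x = phi * x + gamma * 1.0
    return out

# Hypothetical section: pole at 200 rad/s, unit DC gain, T = 1 ms
phi, gamma = zoh_first_order(a=200.0, b=200.0, T=1e-3)
y = step_response(phi, gamma, 50)
```

For a piecewise-constant input, the samples produced this way coincide with the continuous-time response at the sampling instants, which is why ZOH is a natural baseline; the degradation discussed above comes from the closed loop seeing a slower, held control signal.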
B. Multi-rate Digital Implementation
Here, we propose an alternative approach motivated by the observation that the optimal continuous-time controller generally exhibits a few high-frequency modes that need to be implemented at a high sampling rate, while its low-frequency modes can be implemented at a lower sampling rate. In particular, suppose that the transfer function of the continuous-time controller is denoted by $K(s)$ and that we perform a partial fraction expansion to decompose this transfer function as a sum of two transfer functions:
$$K(s) = K_1(s) + K_2(s)$$
where $K_1(s)$ is a first-order transfer function whose single pole is the fastest pole of $K(s)$ and $K_2(s) = K(s) - K_1(s)$ collects the remaining terms of the expansion.
When the fastest pole of $K(s)$ has a non-zero imaginary part, $K_1(s)$ is instead a second-order transfer function with the two fastest (complex conjugate) poles of $K(s)$. For digital implementation, we now have two degrees of freedom in selecting the sampling rate, as we can discretize and implement $K_1(s)$ and $K_2(s)$ with two different sampling intervals $T_1$ and $T_2$, respectively.
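The split into a fast and a slow part can be sketched for the simplest case of distinct real poles, where each residue follows directly from the cover-up rule. The transfer function below and the helper names `residues` and `split_fastest` are hypothetical; a complex-conjugate fastest pair would have to be kept together as one second-order term, which this sketch does not handle.

```python
def polyval(coeffs, s):
    """Evaluate a polynomial, coefficients given highest power first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * s + c
    return acc

def residues(num, poles):
    """Residues of N(s) / prod(s - p_i): distinct poles, strictly proper."""
    res = []
    for i, p in enumerate(poles):
        denom = 1.0
        for j, q in enumerate(poles):
            if j != i:
                denom *= (p - q)
        res.append(polyval(num, p) / denom)
    return res

def split_fastest(num, poles):
    """Return ((r, p) of the fastest pole, [(r, p), ...] of the rest).

    K1(s) = r / (s - p) for the first item; K2(s) is the sum of the others.
    """
    terms = sorted(zip(residues(num, poles), poles),
                   key=lambda rp: abs(rp[1]), reverse=True)
    return terms[0], terms[1:]

# Hypothetical K(s) = (s + 3) / ((s + 1)(s + 10)(s + 200))
k1, k2_terms = split_fastest([1.0, 3.0], [-1.0, -10.0, -200.0])

# Sanity check at s = 2: K1(2) + K2(2) should reproduce K(2)
s = 2.0
k_sum = k1[0] / (s - k1[1]) + sum(r / (s - p) for r, p in k2_terms)
```

Here `k1` isolates the pole at -200 rad/s, so only that term would need the short sampling interval, while the terms in `k2_terms` could be discretized with a longer one.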
C. Case Studies
We use two case studies to demonstrate the benefits that can be drawn from this approach: the position control of Quanser's DC servo motor [15] and the control of a MEMS tunneling accelerometer [16].
a) DC servo: Quanser's DC servo motor can be modeled by the following second-order transfer function from the input voltage to the shaft angle:
For the purposes of H-infinity design, we considered the effect of an additive input disturbance with energy concentrated below 100 rad/s (which is roughly twice the system's bandwidth) and flat-spectrum measurement noise with an amplitude 100 times smaller than that of the disturbance. The signals to be regulated by the H-infinity controller are the shaft angle and the applied voltage. Through the use of a weighting post-filter, we mostly penalized the control signal at frequencies above 1000 rad/s to prevent high-frequency noise from flowing through the feedback loop. These design choices led to the following H-infinity controller:
It is interesting to note that although the poles of $K_1(s)$ (with absolute value 215 rad/s) are not much faster than those of $K_2(s)$, this decomposition can still lead to significant computational savings, as we shall see below.
b) MEMS tunneling accelerometer: The MEMS tunneling accelerometer described in [16] is analyzed similarly. In this type of accelerometer, the feedback controller should use the applied voltage to cancel acceleration and one effectively gets the readout equation:
where Vapp(s) is the control voltage.
For the purpose of H-infinity design, the goal is to reject from the measured voltage the effect of acceleration, as well as noise. In constructing the vector w of exogenous inputs, noise terms were scaled to have unit standard deviation [16]. Additionally, a fictitious low-frequency noise term is added to the measured voltage to make the controller insensitive to low-frequency measurement biases such as 1/f amplifier noise. These design choices led to the following H-infinity controller:
In this case as well, the poles of $K_1(s)$ (141881 rad/s) are not much faster than those of $K_2(s)$ (66478 rad/s), and yet significant computational savings can be achieved with this decomposition.
IV. MODEL FOR COMPUTATION COST
The model for computation power cost focuses on the execution of the control algorithm and conservatively assumes that the cost of data acquisition and actuation is constant. Both examples are single-input single-output (SISO) control systems, where the decomposed controllers share the same input and their outputs are added together to generate the control output. The two controllers run as two separate threads of execution on two similar processors with no communication except the summation of the outputs, which is done in the faster thread. For the purpose of implementing digital control, the two example controllers above are discretized using zero-order-hold (ZOH) and implemented as shown in Figure 1.
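The execution model described above can be sketched as a single loop in which the slow section is updated only every few fast ticks and its output is held in between, with the summation performed on the fast side. This is a simplified, sequential stand-in for two hardware threads; the class and function names, the two first-order sections, and the rate ratio are assumptions made for illustration only.

```python
import math

class FirstOrderZOH:
    """ZOH-discretized r / (s + a) with its own sampling interval T."""
    def __init__(self, r, a, T):
        self.phi = math.exp(-a * T)
        self.gamma = (r / a) * (1.0 - self.phi)
        self.x = 0.0

    def step(self, u):
        y = self.x                          # output before the state update
        self.x = self.phi * self.x + self.gamma * u
        return y

def run_multirate(fast, slow, ratio, inputs):
    """Fast section runs every tick; slow section every `ratio` ticks.

    Returns (outputs, fast_updates, slow_updates).
    """
    outputs, y_slow = [], 0.0
    fast_updates = slow_updates = 0
    for k, u in enumerate(inputs):
        if k % ratio == 0:
            y_slow = slow.step(u)           # held between slow updates
            slow_updates += 1
        y_fast = fast.step(u)
        fast_updates += 1
        outputs.append(y_fast + y_slow)     # summation done in the fast loop
    return outputs, fast_updates, slow_updates

# Hypothetical sections: shared unit-step input, T2 = 10 * T1
T1, ratio = 1e-3, 10
fast = FirstOrderZOH(r=5.0, a=200.0, T=T1)
slow = FirstOrderZOH(r=1.0, a=2.0, T=T1 * ratio)
outputs, n_fast, n_slow = run_multirate(fast, slow, ratio, [1.0] * 1000)
```

The update counters make the saving visible: over the same interval the slow section executes one tenth as many updates as it would at the fast rate.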
A. Instruction Count
As a first-order approximation of the computation cost, we determine the total number of processor instructions per second executed to implement a given control algorithm. For the purpose of our model, it is convenient to implement the controller on a processor as cascaded second-order sections (SOS) in direct form II, also known as biquads (Figure 2).
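A direct form II biquad needs only two delay elements per section, which is what makes its per-sample instruction count small and predictable. The sketch below shows the recurrence in floating point; the class name and the example coefficients are hypothetical, and the library assembly routines used for the instruction counts in this paper are not reproduced here.

```python
class BiquadDF2:
    """Direct form II second-order section.

    H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)
    """
    def __init__(self, b0, b1, b2, a1, a2):
        self.b = (b0, b1, b2)
        self.a = (a1, a2)
        self.w1 = 0.0   # w[n-1]
        self.w2 = 0.0   # w[n-2]

    def step(self, x):
        # Recursive part first, then the feed-forward taps
        w0 = x - self.a[0] * self.w1 - self.a[1] * self.w2
        y = self.b[0] * w0 + self.b[1] * self.w1 + self.b[2] * self.w2
        self.w2, self.w1 = self.w1, w0
        return y

def cascade(sections, x):
    """Pass one input sample through the sections in series."""
    for s in sections:
        x = s.step(x)
    return x

# Hypothetical cascade: 1 / (1 - 0.5 z^-1) followed by (1 + z^-1)
sections = [BiquadDF2(1.0, 0.0, 0.0, -0.5, 0.0),
            BiquadDF2(1.0, 1.0, 0.0, 0.0, 0.0)]
impulse = [1.0, 0.0, 0.0, 0.0]
response = [cascade(sections, x) for x in impulse]
```

Each call to `step` performs a fixed number of multiply-accumulate operations, so the cost per sample scales with the number of sections.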
To evaluate instruction count, we used a Texas Instruments TMS320C2801 (Digital Signal Controller), which is a 32-bit, 8-level deep pipelined architecture offering up to 100 MIPS of performance, and an ARM7TDMI, a 32/16-bit RISC architecture offering up to 130 MIPS of performance. The total number of instructions required to implement a biquad section on each processor is determined through their individual library assembly routines [17], [18]. In Table I we list the total number of biquad sections needed to implement our control examples and the total number of instructions executed per sample time on the two processors. For a single-rate controller running at a frequency Fs and its decomposed version running at frequencies Fs1 (fast) and Fs2 (slow), the total number of instructions per second is determined as follows:
• Single-rate: Fs * instructions per biquad * number of biquad sections
• Multi-rate: Fs1 * instructions per biquad * number of biquad sections (fast) + Fs2 * instructions per biquad * number of biquad sections (slow)
Searching the sampling parameter space (sample time pairs for the fast and slow controllers) of the H-infinity framework, we generate a family of curves for the power/performance trade-off. It is then easy to determine Pareto optimal points suitable for controller realization. Figure 3 shows the computation-cost/performance trade-off for the DC servo motor control, where the H-infinity norm is plotted against the total number of instructions/second (TMS320C2801) for a single-rate controller and for a multi-rate controller running at different rates of computation. The plot highlights the family of curves and the Pareto optimal points for single-rate and multi-rate implementation of control.
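The two instruction-count expressions and the selection of Pareto optimal points can be sketched as follows. All numbers are hypothetical placeholders rather than values from Table I, and the H-infinity norms attached to each candidate are assumed to come from a separate closed-loop evaluation.

```python
def single_rate_ips(fs, instr_per_biquad, n_biquads):
    """Instructions per second for a single-rate controller."""
    return fs * instr_per_biquad * n_biquads

def multi_rate_ips(fs1, fs2, instr_per_biquad, n_fast, n_slow):
    """Instructions per second for the fast and slow sections combined."""
    return instr_per_biquad * (fs1 * n_fast + fs2 * n_slow)

def pareto_front(points):
    """Keep (cost, norm) points not dominated by any other point.

    Lower is better on both axes.
    """
    front = []
    for cost, norm in sorted(set(points)):
        # Sorted by cost, so a point survives only if it improves the norm
        if not front or norm < front[-1][1]:
            front.append((cost, norm))
    return front

# Hypothetical numbers, not taken from Table I
INSTR_PER_BIQUAD = 20
single = single_rate_ips(10_000, INSTR_PER_BIQUAD, 3)
multi = multi_rate_ips(10_000, 1_000, INSTR_PER_BIQUAD, 1, 2)
candidates = [(single, 1.2), (multi, 1.3), (300_000, 1.5), (900_000, 1.1)]
front = pareto_front(candidates)
```

In an exhaustive search, `candidates` would hold one entry per sample-time pair, and `pareto_front` would return the configurations worth considering for realization.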
B. Power Estimation Model
The power estimation model measures the power consumption of each thread, assumed to be running on individual processors in an architecture similar to Synchroscalar [12]. Although different strategies for realizing digital control would result in a different mix of instructions, and hence different power consumption, choosing the well-studied biquad ensures that realistically optimized estimates can be determined.
It is important to note that by decomposing the control, we benefit from reduced computation order in addition to enabling multi-rate computation. Therefore, once we know the total number of instructions/second and the instruction distribution for a control realization, we can use the results from a cycle-accurate energy consumption measurement of a processor instruction set [19] to estimate the power dissipation. Figure 4 illustrates the Pareto optimal points for the case study of MEMS accelerometer control, where the H-infinity norm is plotted against the total power dissipation for an ARM7TDMI processor.
In Table II we show the computation cost for DC motor control and compare the results for the single-rate uniformly sampled controller against a few configurations of the multi-rate implementation, for similar values of the H-infinity norm. It is interesting to note that performance is non-monotone in the sample rates and that careful selection of rates can lead to higher overall fidelity than is achievable by single-rate sampling. In Table III we show the power consumption and the H-infinity norm for the MEMS single-rate controller versus a few configurations of the multi-rate controller, implemented on an ARM7TDMI.
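The estimate described above amounts to weighting a per-instruction energy by the instruction mix and multiplying by the instruction rate. The sketch below illustrates that arithmetic; the instruction classes, the mix, and the energy figures are invented placeholders and are not the measurements reported in [19].

```python
def estimate_power_watts(ips, mix, energy_nj):
    """Estimate average power from an instruction rate and mix.

    ips:        total instructions per second
    mix:        instruction class -> fraction of executed instructions
    energy_nj:  instruction class -> energy per instruction in nanojoules
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix must sum to 1")
    joules_per_instr = sum(mix[c] * energy_nj[c] for c in mix) * 1e-9
    return ips * joules_per_instr

# Hypothetical mix and per-class energies, for illustration only
mix = {"mac": 0.5, "load_store": 0.3, "alu": 0.2}
energy_nj = {"mac": 2.0, "load_store": 1.5, "alu": 1.0}
power = estimate_power_watts(240_000, mix, energy_nj)
```

Because the estimate is linear in the instruction rate, any reduction in instructions per second obtained from the multi-rate split carries over proportionally to the estimated power, as long as the instruction mix stays the same.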
C. Voltage Frequency Scaling
In this section we postulate that an architecture similar to Synchroscalar [12] can be used to implement decomposed control algorithms to further reduce power. Each thread of computation is mapped onto a different processor core running at its optimized voltage and frequency requirements. The voltage/frequency pairs are statically scheduled after determining the real-time requirement of control loops. This [20]. Advantages of statically scheduled voltage/frequency pairs are more prominent if we consider, for example, a computation architecture that needs to control a three-dimensional (3-D) MEMS accelerometer. In our MEMS control example, we needed a fifth-order controller for one-dimensional (1-D) control. For a 3-D MEMS accelerometer, we would therefore need to execute at least three times the total number of instructions executed for the 1-D controller, within the same execution deadline. Such a controller would require a very fast processor that consumes a lot of power. Instead, we can partition each fifth-order controller, as before, into two separate threads and run them individually as three fast and three slow threads on Synchroscalar. Table IV
V. CONCLUSIONS
In this work, we have shown that it is possible to raise the level of abstraction in the design power analysis for linear feedback controllers in a way that relies directly on well-known metrics for the design of such controllers. This is in contrast to algebraic methods, which can reach only a small fraction of the same design space. The new techniques exploit simple multi-thread or multi-core architecture tricks to allow practical implementation where the dependence between the partitioned controllers is minimal. Using these techniques, very substantial power savings are obtained in a way that is independent of conventional algebraic tricks such as subexpression decomposition and related high-level synthesis techniques. Although the current approach is exhaustive, for practical-scale designs it can be performed in a very reasonable amount of time. Further work in this area will likely target more efficient exploration techniques as well as alternative design metrics suitable for sensors and other related designs.
