Abstract: The goal of the PAWiS project is to develop both efficient system architectures and the related design methodology for power-aware wireless sensor and actor network nodes, so that inefficiencies can be captured in every aspect of the system. These aspects include all layers of the communication system, the targeted class of application, the power supply and energy management, the digital processing unit and the sensor-actor interface. The proof of concept will be based on a prototype system that allows future integration into a single SiP/SoC. The project is supported by Infineon Austria and started only recently; therefore the main focus of this paper is on the design approach. Copyright © 2002 IFAC
INTRODUCTION
Sensor network nodes are built around microwatt radio and digital baseband transceivers that feature a low duty cycle (< 1 %) and low throughput (1 bps to 10 kbps). They unify nearly all design disciplines in one package: MEMS-based sensing technology, signal conditioning, A/D and D/A conversion, digital signal processing, protocol layers such as a power-aware Media Access Controller (MAC) and routing layer, antenna design, energy management and energy scavenging.
The topic of Wireless Sensor and Actor Networks (WSANs) has so far been researched mainly in academia; however, interest from industry has grown in the recent past. Most applications that make use of these networks require energy autonomy for the complete lifetime of the network, which can be many years or even decades. Hence, reducing the average current consumption of a single sensor node to a few tens of µA is compulsory. Designing such a heterogeneous, extremely efficient system is a highly challenging task that requires new approaches in many different aspects of the system design and even in the design methodology itself. A µW node would enable the deployment of large maintenance-free networks with numerous nodes that do not require battery replacement during their lifetime. Alternatively, these nodes could run from low cost energy scavenging systems that extract energy from different environmental sources (e.g. light, vibrations).
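The relation between average current and network lifetime stated above can be checked with simple arithmetic. The following is a minimal sketch, assuming an ideal battery (no self-discharge, constant voltage); the 1000 mAh capacity and the 3 V supply are hypothetical values chosen only for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_years(capacity_mah: float, avg_current_ua: float,
                   usable_fraction: float = 1.0) -> float:
    """Years of operation at a constant average current.

    Idealized: ignores self-discharge, temperature and voltage droop.
    """
    if avg_current_ua <= 0:
        raise ValueError("average current must be positive")
    usable_uah = capacity_mah * 1000.0 * usable_fraction  # mAh -> µAh
    return usable_uah / avg_current_ua / HOURS_PER_YEAR


def average_power_uw(avg_current_ua: float, supply_v: float) -> float:
    """Average power in µW for a given average current and supply voltage."""
    return avg_current_ua * supply_v


if __name__ == "__main__":
    capacity_mah = 1000.0  # hypothetical battery capacity
    supply_v = 3.0         # hypothetical supply voltage
    for current_ua in (10.0, 30.0, 100.0, 1000.0):
        years = lifetime_years(capacity_mah, current_ua)
        power = average_power_uw(current_ua, supply_v)
        print(f"{current_ua:7.0f} µA  {power:7.0f} µW  {years:6.2f} years")
```

Under these assumptions, only average currents in the range of a few tens of µA or below yield lifetimes of several years from such a battery.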
STATE OF THE ART SENSOR NODES
Numerous research groups and companies design wireless sensor node solutions, publish papers on them and offer them commercially, each with emphasis on one or more aspects of optimization. A very low power System-on-Chip (SoC) sensor node has been built in the course of the WiseNet research project (Enz et al., 2004), where the receiver draws only 2 mA (permanently on) at an operating voltage of 0.9-1.5 V. The technology used is a standard low cost 0.18 µm digital CMOS process. At the Berkeley Wireless Research Lab different nodes have been built, from the MICA family, now commercially available, to the smart dust nodes (Warneke et al., 2001), which show some future concepts far beyond the state of the art. The sensor node built at our department uses only commercially available components but relies on high bit rate transceivers with a short turnaround time and a very efficient CSMA protocol for low throughput applications (Mahlknecht and Rötzer, 2004; Mahlknecht and Böck, 2004). Within the EYES project, Infineon has developed highly efficient Wireless Sensor Network node hardware in collaboration with the project partners, particularly with TU Berlin and the Universities of Ferrara and Rome (Eyes, 2005). This hardware is based on the Infineon TDA525x radio transceiver family together with a TI MSP430 microcontroller, which is used in most sensor node implementations.
A comparison of state of the art wireless sensor nodes offered by other companies (Moteiv, 2005; Sensicast, 2004; Crossbow, 2005) and nodes for research purposes in academia (e.g. Berkeley motes) shows that commercially offered nodes are neither low cost (on the order of US$ 100) nor as low power as required to run for the whole lifetime. There are also some single chip solutions on the market that include a microcontroller with analog interfaces as well as a radio transceiver (Chipcon CC1010, Nordic nRF24E1, Chipcon CC2430, CC2530). These implementations are also not as energy efficient as desirable (< 50 µW with a routing delay of 10-100 ms is required in many applications). This conclusion is based on datasheet information and real world experiments. Most of these nodes combine a standard 8051 CPU core with a radio transceiver without optimizing the overall system for a targeted class of applications. Another weakness is the lack of a true wakeup receiver architecture that allows a node to remain in an ultra low power listening mode. Even though proposals have been made (Gu and Stankovic, 2004; Rabaey, 2001), no efficient implementation is available yet. A periodic wakeup is supported by on-chip hardware on the novel CC1100 transceiver from Chipcon; based on datasheet values, the receiver consumes only 15 µA with a periodic wakeup of 1 s. However, one second may be too long for short latency multi-hop applications, and shortening the wakeup period significantly increases the power consumption. The Tinymote sensor node developed at ICT uses a wakeup scheme similar to the one described above. Experimental results showed that the average power consumption of a sensor node can be as low as 95 µW when forwarding packets at a rate of 10 packets per minute with a guaranteed hop-to-
How to minimize it to a sub µAmpere level for a large dynamic range?
• Oscillator start-up time: How to minimize the settling time of an oscillator in order to reduce the turn-on time?
• Find a common denominator for a generic sensor-actor interface that is flexible enough to support a number of different sensors and energy efficient enough not to compromise the overall node's efficiency.
• Process technology, leakage current: Decreasing the feature size in a semiconductor process yields higher integration densities but unfortunately also increases leakage currents for technological reasons.
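The periodic-wakeup trade-off noted earlier for the CC1100 (low average current at a 1 s wakeup period, higher consumption at shorter periods) can be illustrated with a simple duty-cycle model. This is only a rough sketch; the receive current, listen time and sleep current used below are hypothetical placeholders, not datasheet values.

```python
def average_current_ua(period_s: float, listen_s: float,
                       rx_current_ua: float, sleep_current_ua: float) -> float:
    """Average current of a receiver that listens for listen_s every period_s.

    Simplified model: start-up and settling overhead is assumed to be
    included in listen_s.
    """
    if period_s <= 0 or not 0 <= listen_s <= period_s:
        raise ValueError("need 0 <= listen_s <= period_s and period_s > 0")
    duty = listen_s / period_s
    return duty * rx_current_ua + (1.0 - duty) * sleep_current_ua


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the trend.
    rx_current_ua = 15_000.0   # receiver on
    sleep_current_ua = 1.0     # receiver off, wakeup timer running
    listen_s = 0.001           # listen window per wakeup
    for period_s in (1.0, 0.1, 0.01):
        avg = average_current_ua(period_s, listen_s,
                                 rx_current_ua, sleep_current_ua)
        # Worst-case added latency per hop equals one wakeup period.
        print(f"period {period_s * 1000:6.0f} ms -> {avg:8.1f} µA average")
```

In this model the average current grows roughly in inverse proportion to the wakeup period, which is why short hop latencies are expensive without a true wakeup receiver.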
[Figure: SiP/SoC sensor node implementation. Recoverable block labels: Upper Layer Protocols, Security Function. Grey shaded rectangles indicate those blocks where we expect to achieve the largest efficiency gains in the overall system design.]
The SiP/SoC architecture will be designed to support different classes of applications, including applications where ad-hoc multi-hop communication is required as well as applications that demand a short latency (< 10-100 ms) between hops or real-time communication. The main challenge is to find, among a large variety of design options, a very efficient overall system architecture that maps application requirements to protocol requirements and down to the hardware with as little energy consumption as possible in the final design. The question is how to find this optimum system architecture. It can only be found by applying the proper design methodology. This includes identifying strategies for energy reduction at the application layer and maintaining low power constraints across all layers down to the hardware implementation. To reach this goal, it is mandatory to investigate the interdependencies between all functional units as well as between all design hierarchies.
DESIGN METHODOLOGY
In engineering and development, questions about design decisions arise frequently. These decisions are mostly driven by the experience and instinct of the engineers. Although each decision may be optimal in itself, it concentrates only on a detail of the total problem. The sum of all decisions therefore leads to a local optimum for the total system, but most probably overlooks other local optima that would result in even better performance. To find the best local optimum within technological and/or physical restrictions (subsequently called "global optimum"), another design approach is necessary. We apply a methodology to find this global optimum for a particular system. To this end, the total system is modeled at a very abstract level in a so-called virtual prototype. The virtual prototype is a software simulation framework that models the system at a chosen level of detail. It allows certain system properties to be simulated as a function of adjustable parameters. It is assembled from abstract and/or functional modules, each of which implements a model that provides results for its properties as accurately as possible.
Similar approaches have been introduced in (Silva et al., 2001), ) and (Lizhi et al., 2004). (Silva et al., 2001) uses UML for abstract modeling and numerous universal tools for code and netlist generation; we assume that such automatically generated results contain wrapper structures and other overhead, which is not suitable for tiny embedded systems. ) describes the so-called "platform based design", which enables heavy module reuse through a full top-down methodology. Since the authors do not report on restrictions and performance properties mirrored from the bottom layer back up to the top-level optimization model, this approach does not seem able to fully utilize the capabilities of the semiconductor process. (Lizhi et al., 2004) uses an analytical model to describe the data link layer (DLL) of the network protocol stack. While this approach allows deep insight into functionality and behavior, it gives only low accuracy.
In our design methodology all kinds of compositions and features can be simulated and compared easily through parameterization. Different implementations of certain blocks (e.g. SAR or dual-slope ADC, different network protocols) as well as very flexible adjustment of the partitioning of a functional block (e.g. implementing parts of an algorithm in hardware or software, analog/digital partitioning of the transceiver) ensure that unusual or disliked solutions are not ruled out prematurely. The parameterization consists of system, architecture, circuit-design and technology-specific parameters (e.g. ADC resolution, partitioning bounds, connectivity, bus communication protocol, leakage power, switching power). The proposed methodology is a true top-down approach: all design decisions are taken at the system level. This makes it possible to change combinations of implementations and to perform cross-layer optimization instead of optimizing every module's implementation on its own. Nevertheless, the bottom layer (implementation) has to be treated carefully in order to obtain accurate simulation models. As shown in Fig. 2, the possible implementations pose constraints on the architecture which have to be considered within the virtual prototype.
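The difference between optimizing every module on its own and optimizing at the system level can be illustrated with a toy design-space exploration. The sketch below is not the PAWiS virtual prototype; the module names, the options, all energy figures and the cross-module penalties are hypothetical and serve only to show why module-wise choices can miss the better system-level optimum.

```python
from itertools import product

# Hypothetical stand-alone energy per measurement cycle in µJ.
STANDALONE_UJ = {
    "adc": {"sar": 2.0, "dual_slope": 1.5},
    "mac": {"csma": 6.0, "tdma": 5.0},
    "filter": {"hardware": 1.0, "software": 0.8},
}

# Hypothetical cross-module penalties in µJ, applied when both options
# of a pair are selected together (e.g. the CPU stays awake longer).
COUPLING_UJ = {
    frozenset({("adc", "dual_slope"), ("filter", "software")}): 3.0,
    frozenset({("mac", "tdma"), ("filter", "software")}): 1.0,
}


def system_energy_uj(config: dict) -> float:
    """Total energy of one configuration including coupling penalties."""
    chosen = set(config.items())
    total = sum(STANDALONE_UJ[module][option]
                for module, option in config.items())
    total += sum(penalty for pair, penalty in COUPLING_UJ.items()
                 if pair <= chosen)
    return total


def module_wise_choice() -> dict:
    """Pick the cheapest option of every module in isolation."""
    return {module: min(options, key=options.get)
            for module, options in STANDALONE_UJ.items()}


def system_level_choice() -> dict:
    """Exhaustively evaluate all combinations at the system level."""
    modules = list(STANDALONE_UJ)
    candidates = (dict(zip(modules, combo))
                  for combo in product(*(STANDALONE_UJ[m] for m in modules)))
    return min(candidates, key=system_energy_uj)


if __name__ == "__main__":
    for name, config in (("module-wise", module_wise_choice()),
                         ("system-level", system_level_choice())):
        print(f"{name:12s} {config} -> {system_energy_uj(config):.1f} µJ")
```

With these made-up numbers the module-wise selection picks the software filter because it is cheapest in isolation, whereas the system-level search selects the hardware filter and ends up with a lower total. Exhaustive enumeration is only feasible for small design spaces; it is used here purely for clarity.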
Fig. 2: Design Methodology
In the first phase one particular (sub-optimal) system architecture is selected and simulated. The models are built to estimate the power consumption and timing. The virtual prototype is extended to simulate the fully functional system. This includes an executing CPU, the radio transceiver, memories and sleep/wakeup modes. Several virtual prototypes are then instantiated and connected with a network simulator to simulate the complete network. To find an optimum system, only relative accuracy across design changes is required in order to compare alternatives, whereas absolute accuracy is secondary. Most model parameters will be taken from experience and rough (guided) estimates.
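A first-phase energy model of the kind described above can be as simple as a table of module states with a power figure and an active time per cycle. The following minimal sketch is an assumption about how such a model might look, not the actual PAWiS framework; the class, the function names and all numbers are hypothetical. It also ranks the modules by their share of the energy, which is the kind of information needed to decide where refinement pays off.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PowerState:
    module: str      # e.g. "cpu", "radio"
    power_uw: float  # power drawn while in this state, in µW
    seconds: float   # time spent in this state per cycle


def energy_per_module_uj(states: Iterable[PowerState]) -> Dict[str, float]:
    """Energy per cycle and module in µJ (µW times s)."""
    energy: Dict[str, float] = {}
    for state in states:
        energy[state.module] = (energy.get(state.module, 0.0)
                                + state.power_uw * state.seconds)
    return energy


def average_power_uw(states: Iterable[PowerState], cycle_s: float) -> float:
    """Average node power over one cycle of length cycle_s."""
    if cycle_s <= 0:
        raise ValueError("cycle length must be positive")
    return sum(energy_per_module_uj(states).values()) / cycle_s


def rank_modules(states: Iterable[PowerState]) -> List[Tuple[str, float]]:
    """Modules sorted by their share of the total energy, largest first."""
    energy = energy_per_module_uj(states)
    total = sum(energy.values())
    return sorted(((m, e / total) for m, e in energy.items()),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 10 s cycle; rough guided estimates only.
    cycle = [
        PowerState("sleep_domain", 3.0, 10.0),
        PowerState("cpu", 3000.0, 0.015),
        PowerState("radio", 30000.0, 0.003),  # receive
        PowerState("radio", 45000.0, 0.002),  # transmit
    ]
    print(f"average power: {average_power_uw(cycle, 10.0):.2f} µW")
    for module, share in rank_modules(cycle):
        print(f"{module:12s} {share:6.1%}")
```

Because alternatives are compared against each other, a common error in the absolute numbers matters less than a consistent treatment of all candidates.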
In the second phase the modules with the most potential for energy saving are determined. Multiple types of every module as well as various combinations and alternatives of modules and architectures are simulated. The simulation models are refined and extended to simulate more functionality and behavior. We will therefore have to "dive" deeply into the implementation details of some modules (e.g. analog leakage current of CMOS circuit prototypes, wakeup receiver implementations such as (Gu and Stankovic, 2004), and network protocols such as (El-Hoiydi, 2003; Safwat, 2003)). In this phase the system architecture is optimized even further.
In the third phase a real prototype is built. Due to financial and timing limitations, we will implement only some parts of the total system on a test chip. This chip is then mounted on a PCB which holds the remaining (commercially available) parts, combined with an FPGA that realizes custom logic, to form the total system. The presented methodology applies at the system level as suggested by (Chou, 2005). This enables structural changes at the topmost layer and yields a higher potential for improvements than the optimization of individual, predefined modules. Third party modeling and simulation frameworks will be utilized and combined to implement the virtual prototype. By forcing the design engineers to concentrate on the system level and motivating them to leave beaten tracks by introducing novel structures and architectures, we ensure a streamlined and systematic approach to the overall design goal of optimizing the power consumption.
DETAILED FIGURES OF THE PROPOSED APPROACH
We propose to explore the following approach based on preliminary research results:
• Design Methodology: Optimization at system level before going into implementation details.
• Consider all components of the system. Understand their dependencies in terms of functionality and power consumption. For this task we first plan to develop simple energy models. Where necessary (based on the relevance of the power consumption of the functional blocks) we refine the models of the single components and subcomponents to understand how these affect the overall power consumption. Our approach is then to focus on those blocks where most of the inefficiencies can be captured.
• Explore efficient partitioning between tasks (applications, sensor reading, middleware, low level protocols) and find an adequate platform for each task. For instance, a reconfigurable platform optimized for protocol processing such as the one proposed in will be considered for the lower level protocols. It uses a combination of PAL (programmable array logic) and LUT (look-up table) blocks, i.e. hybrid cells each consisting of a small PAL block for control and an array of LUTs and flip-flops for data processing.
• Keep the design as simple as possible. We think that simplicity is one key strategy to reach the desired goal. The reason is that more complex systems tend to consume more power due to the number of transistors that are switched and the leakage current that increases with chip area. This does not exclude the use of parallel, very specialized hardware structures that can run at low clock speed and be powered down completely when not in use.
• Switch off the main transceiver as much as possible. This should be possible with the help of a second receiver (wakeup receiver) that offers lower performance than the main receiver but consumes only a fraction of its energy. This radio may make extensive use of passive structures like MEMS. The wakeup receiver only needs to decode incoming low bit rate wakeup preambles in order to decide whether or not to wake up the main receiver. In scientific literature proposals for wakeup receivers have been made (Gu and Stankovic, 2004; Rabaey, 2001 interference. An alternative might be to use the simple modulation scheme only for the wakeup radio and to rely on a very agile and more complex high bit-rate main radio transceiver that handles the actual packet transmissions and receptions. High bit-rate transceivers need to be turned on only for a very short time, so a higher power consumption in the active mode can be tolerated; however, the turn-on time becomes a very critical parameter.
• Make use of available IP cores implementing highly energy optimized CPU cores used for application processing and integrate a very low power mode from which the CPU can wake up quickly (wakeup based on RTC trigger, on-time < 10 µs with DCO oscillator). We will evaluate different architectures of power aware IP cores that are commercially available, such as the CoolRisc from Xemics, the eCog1 from Cyan, or the 8051, and choose an appropriate core for simulation.
• Exploit parallelism at lowest-possible clock speed. A sensor node runs different concurrent tasks with widely different requirements (sensing -> low duty cycle, MAC -> real time, application -> dependent on the task). Hence, a bus-based heterogeneous architecture exploiting task-level parallelism is a natural choice. The components can either be a processor or configurable hardware blocks tuned to the respective application. Each processor/hardware block must be tuned to the application, with only the flexibility needed by the application.
• Dynamic voltage and frequency scaling. Based on performance requirements (real time) and operating temperature, the voltage can be reduced to a minimum level in each operating stage. To control this, a dedicated power management engine is proposed. In standby or for RAM retention the voltage of single functional blocks can be reduced below 1 V for 0.18 µm and 0.13 µm CMOS processes. Ultra low voltage CMOS technology based on SOI (Silicon on Insulator) has proven its feasibility for voltages down to 0.5 V (by Emmicroelectronics) but might not be applicable due to the lack of an available process.
• Make use of power domains: Detach any unused blocks from power supply.
• Use specialized small low power SRAM blocks. SRAM is very energy and space consuming. We expect to need no more than 512 bytes of RAM. However, this would probably not allow porting the widely used operating system TinyOS developed for wireless sensor network nodes. The question of how much RAM and FLASH memory is required will be determined by the class of applications and the final sensor node system architecture.
• Minimize access to the global SRAM. Use small register banks for context switch and for keeping state information. Avoid copying data packets from the communication interface at all (zerocopy architecture).
• Minimize current consumption by aggressive use of passives. This is investigated mainly for the wakeup transceiver, which will probably have the highest duty cycle of all on-chip components (highest on/off ratio). For these modules, every µA of additional current significantly affects the overall energy consumption. In (Ruby, 2001) passive structures based on BAW/FBAR and RF-MEMS (Clark et al., 2000) are proposed.
• Enable the main system blocks to trigger themselves for task execution, i.e. the sensor-actor interface wakes up the measurement module and notifies the application CPU only when changes exceeding a programmable threshold have been observed.
• Use different internal voltage levels by means of on-chip DC/DC voltage down converters or low quiescent current LDOs. Different voltages are used for different power down stages and system blocks to minimize overall power consumption.
• Use a standard digital CMOS process (most probably 0.13 µm) for most or even all of the SiP/SoC to achieve the low cost target over the long term.
• Analog components such as wakeup radio, voltage converters or mixed signal sensor-actor interface may be integrated in a second chip based on BiCMOS depending on the simulation results comparing CMOS implementations. Investigations have to find out whether this partitioning pays off in terms of price/performance/power ratio.
• Find the right trade-off between analog and digital. Especially the analog/digital partitioning of the radio transceiver is not a straightforward decision. Comparison between different implementation types based on high level models will help to choose the most efficient design.
• Find a power-adaptive system architecture with a corresponding protocol that is able to adapt its processing performance to the (instantaneously) available power.
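The remark in the list above that high bit-rate transceivers tolerate a higher active power while making the turn-on time critical can be made concrete with a per-packet energy estimate. The sketch below assumes that the radio draws its full active power during turn-on and that the packet is sent at the nominal bit rate; the two radio parameter sets and the packet size are hypothetical.

```python
def on_time_s(turn_on_s: float, bitrate_bps: float, packet_bits: int) -> float:
    """Time the radio is powered for one packet: turn-on plus airtime."""
    if bitrate_bps <= 0 or packet_bits < 0 or turn_on_s < 0:
        raise ValueError("invalid radio parameters")
    return turn_on_s + packet_bits / bitrate_bps


def energy_per_packet_uj(power_mw: float, turn_on_s: float,
                         bitrate_bps: float, packet_bits: int) -> float:
    """Energy per packet in µJ (mW times s gives mJ, times 1000 gives µJ)."""
    return power_mw * 1000.0 * on_time_s(turn_on_s, bitrate_bps, packet_bits)


def turn_on_fraction(turn_on_s: float, bitrate_bps: float,
                     packet_bits: int) -> float:
    """Share of the on-time (and, in this model, energy) spent on turn-on."""
    return turn_on_s / on_time_s(turn_on_s, bitrate_bps, packet_bits)


if __name__ == "__main__":
    packet_bits = 256  # hypothetical packet size
    radios = {
        # name: (active power in mW, turn-on time in s, bit rate in bps)
        "low-rate radio": (10.0, 0.001, 10_000),
        "high-rate radio": (30.0, 0.001, 1_000_000),
    }
    for name, (power_mw, turn_on_s, bitrate) in radios.items():
        energy = energy_per_packet_uj(power_mw, turn_on_s, bitrate, packet_bits)
        share = turn_on_fraction(turn_on_s, bitrate, packet_bits)
        print(f"{name:16s} {energy:7.2f} µJ per packet, "
              f"{share:5.1%} spent on turn-on")
```

With these illustrative values the high-rate radio needs far less energy per packet despite its higher active power, but most of that energy is spent during turn-on, so reducing the turn-on time becomes the dominant lever.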
OUTLOOK
In previous applied research at the Institute of Computer Technology of the Vienna University of Technology (Mahlknecht, 2004; Rötzer, 2005) as well as in the course of the now completed EU-funded project EYES (IST 2001 34734) with Infineon as industry partner, significant experience has been gathered in the area of WSANs. Together with the many valuable solutions published by the very active research community, this makes the project consortium confident that it can develop the most efficient overall system architecture and dedicated hardware solutions for µW sensor and actor nodes.
