Abstract - This paper gives an overview of the EU-FP6 "Smart Chips for Smart Surroundings" (4S) [1] project. The overall mission of the 4S project is to define and develop efficient (ultra low-power), flexible, reconfigurable core building blocks, including the supporting tools, for future ambient systems. Dynamic reconfiguration offers the flexibility and adaptability needed for future ambient devices and provides the efficiency these systems need. It enables systems that can adapt to rapidly changing environmental conditions and that can communicate over heterogeneous wireless networks. It also reduces risk: reconfigurable systems can adapt to standards that vary from place to place or that have changed during or after product development.
I. INTRODUCTION
The overall mission of the 4S project (Smart Chips for Smart Surroundings) is to define and develop efficient (ultra low-power), flexible, reconfigurable core building blocks for future ambient systems including the supporting tools. As an application we have chosen a concrete worldwide broadcast radio application (DRM) and video that can be used in an ambient system scenario.
Ambient systems (also known as ambient intelligence or ubiquitous computing) are networked embedded systems intimately wirelessly integrated with everyday environments and supporting people in their activities. These systems will create a smart surrounding for people to facilitate and enrich daily life and increase productivity at work. It is likely that these systems will be quite different from current computer systems, and will have to be based on radically new architectures comprising a set of reconfigurable "building blocks" (IP blocks) and flexible interconnection mechanisms. These components often have conflicting requirements; they have to be flexible, adaptive as well as energy-efficient and low-cost.
Hence, the systems architecture of future ambient devices poses a lot of challenges: these devices have a very small energy budget, they are always operational (although quite often in a low-power mode), are small in size but might require a performance that exceeds the levels of current PDA
The most important characteristic is the lifetime of a communication stream. We aim to develop a SoC for a multimedia terminal where we can assume that the data streams are semi-static and have periodic behavior. This means that for a long period of time subsequent data items of a stream follow the same route. This will last for seconds or more, because a user will listen to the radio or hold a phone conversation for a considerable time. However, the control system might change some settings of processes due to changing environmental conditions.
According to the type of services required, the following types of traffic can be distinguished in the network:
* GT (guaranteed throughput): the part of the traffic for which the network has to give real-time guarantees (i.e. guaranteed bandwidth, bounded latency).
* BE (best effort): the part of the traffic for which the network guarantees only fairness but does not give any bandwidth or timing guarantees.
In our proposed NoC we support both GT 
IV. DESIGN TRAJECTORY
Proper development tooling is essential for programmable devices. This is a major requirement for the system engineer to program the reconfigurable device.
Reconfigurable processors substantially reduce development cycles and costs normally associated with ASIC design, including nonrecurring engineering (NRE) costs, mask sets, fabrication runs, and perhaps most importantly, respins. However, controlling the development time and costs in a reconfigurable processor design requires a comprehensive set of tools: a design environment with a graceful flow from systems design to executable files that configure the reconfigurable architecture, and a run-time system that maps the processes to processors.
In fact, the availability of high-level design entry tooling is critical for the viability of any reconfigurable architecture. In the 4S project we will develop methods and techniques to support the mapping of typical algorithms found in the ambient intelligence application domain onto heterogeneous reconfigurable architectures. These techniques have to identify the characteristic properties of the algorithm at hand and match these with the characteristic properties of the different target technologies (analogue, bit-level reconfigurable, word-level reconfigurable, programmable etc.). Figure 1 shows the design trajectory of the 4S project: the compile-time design flow and the run-time flow (controlled by the RTOS). The design-time tool chain is based on existing tools for the tiles, with possible extensions. This allows the integration of the implementation results of the various tiles, providing co-simulation, combined power estimation and performance characteristics. For each task, a set of precompiled functions with tile-specific characteristics concerning power, tile utilization and performance is provided.
At run-time the operating system (RTOS) dynamically selects the required task from the set. The decision which of the available tasks from the set is utilized is based on the actual needs of the application. The selection criteria can be current power constraints (e.g. low battery), utilization of resources of the hardware platform by other applications (e.g. the coarse-grained reconfigurable tile is currently utilized by another application) or user demands (e.g. user wishes higher audio quality). In this section we give an overview of the design methodology and of the existing and developed tools.
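The run-time selection described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the 4S RTOS code; the `Variant` record, its fields, the `select_variant` function and all numbers are assumptions made for this example. It filters the precompiled variants of one task by the three criteria named in the text (power constraints, tiles occupied by other applications, user quality demands) and then picks the variant with the lowest power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One precompiled implementation of a task (all fields are hypothetical)."""
    tile: str          # tile type the variant runs on
    power_mw: float    # estimated power draw
    quality: int       # relative output quality, higher is better


def select_variant(variants, busy_tiles, power_budget_mw, min_quality):
    """Return the lowest-power variant that fits the current situation, or None."""
    feasible = [
        v for v in variants
        if v.tile not in busy_tiles           # tile not used by another application
        and v.power_mw <= power_budget_mw     # e.g. tighter budget on low battery
        and v.quality >= min_quality          # e.g. user asks for higher quality
    ]
    return min(feasible, key=lambda v: v.power_mw, default=None)


# Hypothetical variants of one task; the numbers are invented for illustration.
FFT_VARIANTS = [
    Variant("ARM", power_mw=120.0, quality=2),
    Variant("Montium", power_mw=40.0, quality=2),
    Variant("FPGA", power_mw=90.0, quality=3),
]
```

With these invented figures, the Montium variant is chosen when all tiles are free, and the FPGA variant is chosen when the Montium tile is busy or when a quality of 3 is requested.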
A. Task graph
The whole software trajectory starts with a high-level, task-level system description of the application. We assume that the applications are written in C/C++ in terms of task graphs consisting of functional processes with standard 4S inter-process communication primitives.
As a first step in the design methodology, the application software, written in C/C++, can be simulated and validated on a functional verification platform. The advantage of this early simulation is that the overall application structure can be verified independently of the actual functional implementation. Note that by writing applications as communicating processes the programmer automatically does the (manual) partitioning.
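To make the idea of an application written as communicating processes concrete, the following is a minimal functional sketch. It uses Python rather than the C/C++ of the project, and the `Channel` class is a hypothetical stand-in: the actual 4S inter-process communication primitives are not specified in this text. The three processes and the sequential `run_graph` driver are likewise assumptions for illustration.

```python
from collections import deque


class Channel:
    """FIFO channel between two processes (hypothetical stand-in for 4S primitives)."""

    def __init__(self):
        self._items = deque()

    def write(self, item):
        self._items.append(item)

    def read(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


def source(out, samples):
    """Process 1: push all input samples into the graph."""
    for s in samples:
        out.write(s)


def scale(inp, out, gain):
    """Process 2: multiply every sample by a gain."""
    while len(inp):
        out.write(gain * inp.read())


def sink(inp):
    """Process 3: collect the results."""
    result = []
    while len(inp):
        result.append(inp.read())
    return result


def run_graph(samples, gain=2):
    """Run source -> scale -> sink one after another; a purely functional check."""
    a, b = Channel(), Channel()
    source(a, samples)
    scale(a, b, gain)
    return sink(b)
```

Because each process touches only its channels, the structure of the graph can be checked before any process has a tile-specific implementation, and the process boundaries already define the partitioning.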
B. Compiling individual functional processes to processors
Functional processes can be implemented on various hardware/software tiles (e.g. implementation of a 256-point FFT on an ARM, on a Montium or on an embedded FPGA); the 4S inter-process communication primitives must be mapped onto specific NoC capabilities. The next step is that individual functional processes are compiled or synthesized to appropriate processing tiles. Individual functional processes might have functionally equivalent implementations on more than one processing platform. At design-time all these implementations will be generated. At run-time the operating system decides which implementation will be used, depending on available tiles, QoS and energy constraints. The tools for compiling processes to processing tiles are not developed in this project; we assume they are available. However, when small hardware changes require adaptation of the tools, this will be done within the project, although it is not the main focus of the 4S project.
C. Annotation of process implementations
The process implementations derived in the previous step will be annotated with performance characteristics (e.g. number of clock cycles, energy consumption, memory requirements, average load on a processing element). These performance figures are used by the run-time system to find the most suitable processing element for each process [9].
For most processing tiles there are tools available to derive the performance figures, but for other processing tiles the performance figures will be derived in the 4S project (either measured or derived from datasheets).
Table 1
SystemC is used as the modeling language because of its flexibility and the wide range of abstraction levels it covers. It is suitable for high-level interface definition as well as low-level, (nearly) hardware-accurate modeling.
D. Run-time tools
To support run-time adaptive behaviour, trade-offs between different parameter sets have to be made to determine the best set for the current situation. In the 4S project we introduce a run-time control system, which is based on a model that selects at run-time a set of parameters that minimizes the cost while satisfying the requested quality.
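Written as a formula, the selection described above is a constrained minimization. The symbols are notation assumed here for illustration and are not taken from the 4S project: P is the set of available parameter sets, C(p) the cost of parameter set p (for example its energy consumption), Q(p) the quality it delivers, and Q_req the requested quality.

```latex
p^{*} = \arg\min_{p \in P} C(p)
\quad \text{subject to} \quad
Q(p) \ge Q_{\mathrm{req}}
```

If no parameter set satisfies the constraint, the request cannot be served at the requested quality; how the control system reacts in that case is not described here.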
The run-time system consists of a collection of tools that are controlled by a distributed operating system called OSYRES [8]. The task of OSYRES is to start a new application graph by allocating processes to processing elements and application channels to NoC links. Finding the right processor for a certain process and finding the appropriate communication path are performed by the spatial mapping tool (SMIT) [9]. Based on the result of SMIT, OSYRES will install the required processes (which might mean reconfiguration or program reloading) and will initiate the right communication mechanisms.
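The allocation of processes to processing elements can be pictured with a simple greedy heuristic. This is only an illustrative sketch and not the algorithm used by SMIT, which is described in [9]; the function name, the data layout, the energy figures and the assumption that each tile hosts at most one process are all hypothetical. Routing of channels over NoC links is left out.

```python
def map_processes(implementations, tiles):
    """Greedily assign each process to a free tile.

    implementations: {process: {tile_type: energy_cost}}  (hypothetical annotation)
    tiles:           {tile_name: tile_type}
    Returns {process: tile_name}; raises RuntimeError if a process cannot be placed.
    Assumes one process per tile, which is a simplification.
    """
    free = dict(tiles)
    mapping = {}
    # Place the processes with the fewest alternative implementations first.
    for proc in sorted(implementations, key=lambda p: len(implementations[p])):
        options = [
            (implementations[proc][ttype], name)
            for name, ttype in free.items()
            if ttype in implementations[proc]
        ]
        if not options:
            raise RuntimeError(f"no free tile for process {proc!r}")
        _, chosen = min(options)  # lowest energy cost, ties broken by tile name
        mapping[proc] = chosen
        del free[chosen]
    return mapping
```

Placing the least flexible processes first is a common heuristic choice: a process that only has a Montium implementation claims the Montium tile before a process that could also run on an ARM.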
This instantiation is performed when an application is started. However, when certain events happen, the mapping might be reconsidered and/or communication links might be rerouted. Events that might trigger a (re-)mapping could be:
* the user starts an extra application that needs to be mapped on processing tiles,
* the user decides to kill an application, which frees its occupied processing tiles,
* the QoS of the wireless link changes and therefore extra functionality (e.g. extra filtering) has to be performed that needs extra processing resources,
* the user wants to listen to another broadcast station that happens to use another set of parameters, and therefore the baseband processing tasks have to be updated,
* at regular intervals the system tests whether the current mapping is still sufficiently optimal.
Changing the mapping can mean that an entire application graph needs to be removed and replaced by another graph, or that only a single process in a process graph is changed or moved to another tile.
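The trigger events listed above could be modelled as follows. This is a hedged sketch, not OSYRES code: the event names, the `needs_remap` function and the relative `tolerance` threshold are assumptions introduced here. In particular, the text only says these events might trigger a re-mapping; the sketch simplifies this by always reconsidering the mapping except on the periodic check, where it does so only when the current mapping is noticeably worse than the best one known.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical names for the trigger events listed in the text."""
    APP_STARTED = auto()
    APP_KILLED = auto()
    LINK_QOS_CHANGED = auto()
    STATION_CHANGED = auto()
    PERIODIC_CHECK = auto()


def needs_remap(event, current_cost=0.0, best_cost=0.0, tolerance=0.1):
    """Decide whether an event should lead to a new mapping.

    Every event except the periodic check reconsiders the mapping (a
    simplification). The periodic check does so only if the current mapping
    costs more than `tolerance` (relative) above the best known mapping.
    """
    if event is not Event.PERIODIC_CHECK:
        return True
    return current_cost > best_cost * (1.0 + tolerance)
```

Whether a positive decision replaces a whole application graph or moves a single process is a separate choice that this sketch does not model.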
V. FIRST RESULTS AND CONCLUSION
Currently the first prototype of 4S (BCVP) is operational. The OSYRES operating system is running on the two ARM9 cores. A first FPGA board is also operational. The FPGA board can be used for functional verification of sub-modules (e.g. the Montium core and the NoC), but can also be used as an interface to the PACT/XPP. The Montium is running on the FPGA at 9 MHz and shows the same results as predicted by the RTL simulation. The specification and design of the HiCVP chip is work in progress and will be finalized at the end of 2005.
It is envisaged that, in the long run, work performed within this project will lay the foundations for the development of a new range of ultra low-power components, architectures, tools, guidelines and standards that underpin the future development of ambient systems.
