Abstract
Introduction
Sensor network applications include environmental monitoring, structural sensing, battlefield communication, traffic, health, and security monitoring, and other automation techniques. Consequently, sensor network research spans architecture, application optimization, communication protocol design, and the development of efficient communication hardware. Small form factor, low power budget, low resource availability, and real-time requirements are some of the characterizing factors of sensor nodes. Each of these is an additional constraint imposed on the designers of such networks. Among these various avenues for improvement, in this paper we focus on sensor network applications and their characteristics, and explore ways in which they can influence the design of the underlying architecture.
Knowledge of the underlying hardware aids efficient software development. A top-down approach to software development may be well suited to scenarios where the processor architecture is already well defined and the scope for optimization, if any, lies only on the software front. With sensor applications and sensor network research, however, the scenario is quite different. Sensor applications require special-purpose hardware that caters to a different set of requirements. Medical applications, for example, need highly non-intrusive, tiny sensors that are harmless to the human body as foreign bodies, whereas sensors spread over military terrain have to be more tolerant of physical impact and a wide operating temperature range. The physical environments in which the sensors are placed and the differences in their utility make it important to study sensor networks on a case-by-case basis.
A unique set of findings for each sensor application is valuable, but the sheer number of applications that sensor networks are finding today rules out the feasibility of developing a processor unique to each and every application. This brings us to a point where we have to trade off the amount of customization available on a processor against the performance-cost benefits one would like to achieve. We believe that this aspect needs to be reflected in the research focus in this area. The major contributions of this paper are: (a) to study some of the most important applications for sensor networks, including those in TinyBench [8] and SenseBench [27], which includes a careful profiling of application behavior and its microarchitectural implications; (b) to characterize the workload that is prominently visible among sensor network applications; (c) to characterize the workload that is unique to a class of sensor network applications; and (d) to propose optimizations to existing sensor network architectures based on the observations made in (a), (b), and (c).
In order for the characterization to be useful, the set of applications used for the purpose should be representative of the domain being analyzed. To this end, we present a thorough survey on sensor network applications, classify them based on the core functionality an application serves, and characterize each class independently.
Microarchitectural characteristics of programs serve as an important input to architectural design decisions. The code size of an application influences critical decisions that depend on the footprint of a program. The execution pattern and optimizations of bottleneck regions improve the response time. The memory access patterns (both spatial and temporal) provide avenues for various memory placement and memory design optimizations. Studying the composition of the dynamic instruction stream aids instruction set architecture optimization. Dynamic instruction execution sequences help architects not only understand the behavior of a program but also improve functional unit design to reduce the overall area or even increase transistor utilization.
In this paper, we architecturally characterize sensor network applications and compare the results for different classes of sensor network applications. First, we collect a representative set of applications from the TinyOS benchmarks [10], TinyBench [8], and [28], and we build some simple applications of our own. We then group them into classes in order to compare the architectural findings across classes. Specifically, we study the following architectural characterizations and optimizations.
• Find the most frequently executed instructions
• Find the most frequently executed pairs and triples of instructions
• Instruction-set and footprint optimization by combining frequently executed pairs of instructions
• Memory behavior of the applications
Related Works
Wireless sensor networks (WSNs) have found applications in many disciplines, including environmental engineering, military/security applications, and civil engineering. The main challenge is to make the sensor node a low-energy device so that it can scavenge energy from various sources. From an architectural perspective, the processor inside the node should be designed to consume little energy while coordinating with other components in a manner that minimizes the total energy consumption. The initial microprocessors used in a Mote were from the Atmel (AVR) family; they were synchronous processors and not specifically designed for sensor node applications. Past research [5, 9] has argued that an asynchronous processor design is the better choice for saving energy, since a synchronous design can waste energy in clocking the processor and other components. ARM cores and variations of ARM cores such as StrongARM and XScale have also been used to study the energy-performance tradeoffs for sensor node applications.
Ekanayake et al. [5] have designed a low-energy asynchronous processor that takes only 24 pJ/instruction, whereas Atmel or ARM family processors take energy on the order of nJ/instruction. They design a new ISA and new coprocessors, including timers, radio units, and a processor core, for a low-energy design. However, they do not provide the motivation or reasoning behind the selection of this instruction set, which could help the architecture community understand the ISA design better in tandem with sensor network benchmarks. Similarly, Hempstead et al. [9] have designed an event processor along with some hardware accelerators to improve performance and energy consumption. Nazhandali et al. [28] have designed a sub-threshold sensor network processor that can run at very low voltage and hence at very low frequency. This sensor network processor is a CISC architecture and consumes 1.6 pJ/instruction.
While these processors are well optimized for sensor network applications, some key insights into how a particular ISA should be designed for these architectures would be very helpful. Past benchmarks such as MediaBench [25] for multimedia applications, NetBench [26], and MiBench [7] for embedded applications do provide an architectural characterization and some insight, or a platform, for building an optimized architecture. Ideally, the design of the ISA and of other components should result from the characterization of sensor network applications on a base processor. There are two benchmark suites for sensor network applications that serve this purpose, TinyBench [8] and SenseBench [27]. In TinyBench, all the applications target TinyOS, so the suite does not generalize well to a broader study of architecture exploration. While SenseBench provides a set of generalized benchmarks, it does not cover all the applications, and its architectural characterization is limited to code size, energy per benchmark, and real-time performance requirements. Instead, we make our benchmark suite more comprehensive by extensively scanning the research literature and by performing a larger set of architectural characterizations. Architectural characteristics also vary across classes of sensor node applications, for example between security/military applications and environmental/structural monitoring applications, where the computational requirements are different. Therefore, we present an architectural characterization of a comprehensive set of sensor network applications (which we call WiSeNBench), group them into different classes, and compare our findings across these classes.
WiSeNBench: Wireless Sensor Network Benchmark
In this section, we describe our benchmark suite WiSeNBench in detail. WiSeNBench consists of a large spectrum of sensor network applications and core algorithms that are mainly used inside sensor network applications. Identifying and collecting this set of applications required non-trivial effort because of the plethora of existing wireless sensor network applications and the many more that are not yet explored. To make sure that these applications cover many different classes, we had to rigorously scan the research literature in different domains of wireless sensor network research. Specifically, we look at various cryptographic applications [24], security protocols [29], digital signal processing (DSP) applications [3], hashing techniques [36], message digests [31], random number generators [33], compression techniques [30], routing [4], applications related to computational geometry [23], some basic algorithms [27], and many pertinent survey papers [1, 22].
Based on this study, we identify the potential applications that will run on the sensor network processor and collect the optimized code for these applications. The main problem in the collection phase was to find optimized code in one generic language instead of code written in a specialized language (such as nesC [6]) or targeted at a very specific architecture [8]. While we could find the optimized code for many applications (about 70% of those in the suite), we also develop optimized code from scratch for the applications (about 30%) for which we could not find optimized code written in a generic language (such as C). After the identification and collection phase, we obtain a good representative set of a wide variety of sensor network applications. To characterize it better, we categorize these benchmarks into various classes: 1) Compression 2) Routing 3) Security 4) Computational geometry 5) DSP 6) Basic algorithms. Table 1 shows different classes of benchmarks with a brief description and the reference if applicable.

Table 1. Classes of benchmarks with a brief description and reference.

• [...] scans the data and then stores the data with the associated count.
• Hash algorithms [19]: A set of hash algorithms that produce fixed-length data for indexing and better search.
• Bloom filter [18]: A Bloom filter consists of a set of hash algorithms and a hash table to resolve containment queries.
• MD5 [31]: Message digest algorithm 5 (MD5) is a hash function that creates a 128-bit digest for integrity checking.

Routing/Radio
• S-MAC [37]: S-MAC is an energy-efficient medium access control (MAC) protocol.
• Ad-hoc routing [34]: A routing technique for a distributed multihop wireless network with a shared wireless channel.
• EnergyEff routing [32]: A unidirectional level-hop routing algorithm that assigns each intermediate node a level for reaching the sink node.

Cryptography/Security
• RC5 [16]: RC5 is a fast block cipher from RSA Data Security with a variable key size and number of rounds. We consider both encryption and decryption in this case.
• TEA [20]: The tiny encryption algorithm (TEA) is a block cipher that is very simple to design and has a very small code size.
• Crypto [13]: Crypto3 is a cryptographic technique to encrypt passwords in a Unix-based system.
• RC6 [17]: RC6 is an advanced version of RC5 for data security.
• SPINS [29]: SPINS is a security protocol for sensor networks with two components: (1) SNEP and (2) TESLA.

Computational geometry
• Voronoi diagrams [12]: A Voronoi diagram is a special decomposition of a metric space determined by a set of distinct points.
• Delaunay triangulation [11]: The Delaunay triangulation of a set of points is a triangulation of the points with some specific property.
• Localization [21]: Localization algorithms for a sensor node to approximate its position.
Although each benchmark can be run as a separate standalone binary, we preferred to combine all the applications into one single binary and create a unified framework. The motivation behind creating this framework is to enable centralized control of the inputs to these benchmarks, a simpler simulation platform, and easier statistics collection.
Experimental Setup
In this section, we explain our experimental setup, which includes the compilation of the unified benchmark, the simulation, and the statistics collection. We use the ARM SimpleScalar simulator [35], which was extended from the original SimpleScalar simulator [2]. We set up a gcc cross-compilation suite for the ARM processor to compile the benchmark into a single static binary, as SimpleScalar ARM does not handle dynamically linked binaries with shared object files. Since we are only interested in the architecture of a very simple RISC processor with no out-of-order execution and no caches, we conservatively use the sim-safe simulator for our experiments. We modify the code of the sim-safe simulator to extract the relevant statistics, as explained in detail below along with other implementation issues.
Using the gcc cross-compilation suite, we first create the binary and then collect statistics for the individual functions in the binary. Although some benchmarks use multiple functions, we combine the results of all related functions. We use the ARM disassembler in the binutils toolset to disassemble the generated binary and feed the disassembly to a Perl script, which parses it and generates a C header file containing a large structure with an initializer for every function. Each function initializer mainly consists of the following three entries: <function-name,start-pc,end-pc>. This header file is used with the sim-safe simulator. Since we intend to do architectural characterization on a per-function basis, we identify the function range of every executed instruction in the sim-safe execution loop and do the processing required to collect the statistics. At the end of the simulation, the simulator prints all the statistics to a file. Specifically, we consider the following statistics:
• Codesize - This is the footprint of the application and a crucial factor in the design of a resource-constrained sensor node processor.
• Memory accesses - We characterize the memory access behavior by counting the number of load/store instructions executed by a particular application.
• Loads - The percentage of loads among the memory accesses is another important factor in terms of energy consumption for a sensor network device.
• Frequent instructions - To make some instructions power-conscious during their execution, it is important to understand the distribution of frequently executed instructions. Architects can optimize these instructions for energy savings and performance.
• Frequent pairs of instructions - We also find the frequent pairs of instructions executed within a specific function. This gives us an idea of the quantitative improvement we can achieve in energy savings and performance. The same analysis can also be done statically to reduce the code size by combining frequent instruction pairs into a single instruction.
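The per-function bookkeeping described before the list of statistics (a table of <function-name,start-pc,end-pc> entries that is consulted for every simulated instruction) can be sketched as follows. This is a minimal illustration in Python, not the Perl script or the modified sim-safe code used in this work. The sample disassembly, the function names in it, and the helpers `parse_function_ranges`, `to_c_initializers`, and `function_for_pc` are all hypothetical, and the input is assumed to follow the usual `objdump -d` layout.

```python
import re
from bisect import bisect_right

# Hypothetical excerpt in the style of `objdump -d` output (not taken from the paper).
SAMPLE_DISASSEMBLY = """\
00008000 <tea_encrypt>:
    8000:\te92d4030 \tpush\t{r4, r5, lr}
    8004:\te5901000 \tldr\tr1, [r0]
    8008:\te12fff1e \tbx\tlr

0000800c <fir_filter>:
    800c:\te5902000 \tldr\tr2, [r0]
    8010:\te12fff1e \tbx\tlr
"""

LABEL_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")  # "00008000 <name>:"
INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s")            # "    8000:\t..."


def parse_function_ranges(text):
    """Return a list of (name, start_pc, end_pc) sorted by start_pc.

    end_pc is the address of the last instruction of the function (inclusive).
    """
    ranges = []
    name, start, last = None, None, None
    for line in text.splitlines():
        label = LABEL_RE.match(line)
        if label:
            if name is not None and last is not None:
                ranges.append((name, start, last))
            name, start, last = label.group(2), int(label.group(1), 16), None
            continue
        insn = INSN_RE.match(line)
        if insn and name is not None:
            last = int(insn.group(1), 16)
    if name is not None and last is not None:
        ranges.append((name, start, last))
    return sorted(ranges, key=lambda r: r[1])


def to_c_initializers(ranges):
    """Format the ranges as C struct initializers, one function per line."""
    return "\n".join('    {"%s", 0x%x, 0x%x},' % r for r in ranges)


def function_for_pc(ranges, pc):
    """Map a program counter to the enclosing function name, or None."""
    starts = [r[1] for r in ranges]
    i = bisect_right(starts, pc) - 1
    if i >= 0 and ranges[i][1] <= pc <= ranges[i][2]:
        return ranges[i][0]
    return None


if __name__ == "__main__":
    table = parse_function_ranges(SAMPLE_DISASSEMBLY)
    print(to_c_initializers(table))
    print(function_for_pc(table, 0x8004))
```

A simulator loop would call a lookup like `function_for_pc` once per executed instruction and update the counters of the returned function. The binary search keeps that lookup cheap even when the unified binary contains many functions.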
Results
Having described the complete benchmark suite in Section 3 and the experimental setup in Section 4, we now present some characterization results based on the parameters discussed in the previous section.
CodeSize
Codesize, or the footprint of a program, is a very important parameter as it directly determines the amount of memory required for a particular application. We statically compute the codesize for the ARM ISA and present the results in Figure 1. We can see that most of the DSP applications have a much larger code size, whereas most other applications, except MD5 and RC5, have a code size of less than 500 bytes.
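As an illustration of how a static code size can be derived from per-function address ranges, the sketch below sums the sizes of all functions that belong to one benchmark. It assumes fixed-width 4-byte ARM instructions (no Thumb code) and inclusive end addresses. The function names and addresses are made up for the example, and `code_size_bytes` is a hypothetical helper rather than part of the toolchain used here.

```python
INSTRUCTION_BYTES = 4  # assumption: fixed-width 32-bit ARM encoding, no Thumb


def code_size_bytes(ranges, function_names):
    """Sum the static size of the named functions.

    `ranges` holds (name, start_pc, end_pc) tuples, where end_pc is the
    address of the last instruction, so each function occupies
    end_pc - start_pc + INSTRUCTION_BYTES bytes.
    """
    wanted = set(function_names)
    return sum(
        end - start + INSTRUCTION_BYTES
        for name, start, end in ranges
        if name in wanted
    )


if __name__ == "__main__":
    # Hypothetical ranges for illustration only.
    ranges = [
        ("tea_encrypt", 0x8000, 0x8008),
        ("tea_decrypt", 0x800C, 0x8018),
        ("fir_filter", 0x801C, 0x8020),
    ]
    # A benchmark made of several related functions is reported as one number.
    print(code_size_bytes(ranges, ["tea_encrypt", "tea_decrypt"]))
```

Grouping by a list of function names mirrors the way results of related functions are combined into one benchmark figure.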
Memory Accesses
A whole host of program optimization techniques are aided by knowledge of the memory access behavior of a program. We compute the total number of memory accesses to quantify how memory intensive each benchmark is. We present the memory accesses (loads and stores) as a percentage of the total executed instructions. Figure 2 shows the percentage of memory accesses for all of the benchmarks.
We can see that in most applications 40-60% of the executed instructions are memory accesses, while in some applications, such as the DSP applications and Fibonacci numbers, more than 60% of the executed instructions access memory.
Load accesses
To characterize the memory accesses further, we calculate the percentage of loads among the memory accesses. As Figure 3 shows, the percentage of loads is larger than 50% for almost all applications, which means that there are more loads than stores (which is also intuitive). We also find that the basic algorithms (especially the sorting algorithms) have a higher percentage of loads, whereas for the DSP applications loads and stores are almost evenly distributed.
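The two memory metrics used above, memory accesses as a percentage of executed instructions and loads as a percentage of memory accesses, can be computed from a dynamic instruction trace as in the following sketch. The mnemonic sets are a simplified assumption (a real ARM trace has more load/store variants, and a multi-register transfer is counted here as a single access), and `memory_profile` and the sample trace are hypothetical.

```python
# Simplified, assumed mnemonic sets; a real ARM trace has more variants.
LOAD_MNEMONICS = {"ldr", "ldrb", "ldrh", "ldm"}
STORE_MNEMONICS = {"str", "strb", "strh", "stm"}


def memory_profile(trace):
    """Return memory accesses as % of instructions and loads as % of accesses.

    `trace` is the dynamic instruction stream as a list of mnemonics.
    Each load/store instruction counts as one memory access.
    """
    total = len(trace)
    loads = sum(1 for m in trace if m in LOAD_MNEMONICS)
    stores = sum(1 for m in trace if m in STORE_MNEMONICS)
    accesses = loads + stores
    return {
        "memory_pct": 100.0 * accesses / total if total else 0.0,
        "load_pct_of_memory": 100.0 * loads / accesses if accesses else 0.0,
    }


if __name__ == "__main__":
    # Made-up trace for illustration only.
    trace = ["ldr", "add", "ldr", "str", "mov", "cmp", "ldr", "b", "str", "add"]
    print(memory_profile(trace))
```

In this made-up trace, half of the ten instructions access memory and three of those five accesses are loads, so the two reported values are 50% and 60%.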
Frequent instructions
We find the frequent instructions for each application in WiSeNBench to see whether any particular instruction is a good candidate for optimization in terms of energy or performance. Although we gather results for all the benchmarks, due to space constraints we present the results for only two applications, each representing a class of applications: 1) TEA, from the cryptographic class, and 2) FIR, from the DSP class. We present the frequent instructions as a percentage of the total executed instructions. For TEA, the results are shown in the graph on the left in Figure 4; we find that about 7 frequent instructions account for 95% of the executed instructions, with load and add instructions at the top of the list. The corresponding results for FIR are presented in the graph on the right in Figure 4. For FIR, the instructions are more widely distributed, with load and branch instructions at the top of the list at about 28% and 10% of the total instructions, respectively.
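A frequency analysis of this kind reduces to counting mnemonics in the dynamic trace and asking how many distinct instructions are needed to cover a given share of it. The sketch below shows one way to do this. `instruction_mix`, `instructions_to_cover`, and the sample trace are hypothetical and do not reproduce the TEA or FIR measurements.

```python
from collections import Counter


def instruction_mix(trace):
    """Return [(mnemonic, percent_of_trace)] sorted from most to least frequent."""
    counts = Counter(trace)
    total = len(trace)
    return [(m, 100.0 * c / total) for m, c in counts.most_common()]


def instructions_to_cover(trace, target_pct):
    """Smallest number of distinct instructions covering target_pct of the trace."""
    total = len(trace)
    covered = 0
    for rank, (_, count) in enumerate(Counter(trace).most_common(), start=1):
        covered += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_pct * total:
            return rank
    return 0  # only reached for an empty trace


if __name__ == "__main__":
    # Made-up trace for illustration only.
    trace = ["ldr"] * 4 + ["add"] * 3 + ["mov"] * 2 + ["b"]
    print(instruction_mix(trace))
    print(instructions_to_cover(trace, 90))
```

A small coverage number, as reported for TEA, suggests that optimizing a handful of instructions affects most of the execution, while a flat distribution, as reported for FIR, spreads the benefit more thinly.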
Frequent pairs of instructions
We also find the frequent pairs of instructions to further characterize application behavior and to look for possible optimizations that combine a frequent pair of instructions. Although there is a tradeoff in combining instructions, doing so can certainly reduce code size, and it may also lower energy consumption and improve performance. Once again, we present the results for only the TEA and FIR applications; they are shown in Figure 5. The behavior is very similar to that found in the frequent instructions analysis. For TEA, a small number of pairs accounts for most pair occurrences, whereas for FIR the occurrences are distributed over many pairs of instructions. Interestingly, the 2nd and 3rd most frequent pairs, < mov, load > and < load, mov >, are the same in both FIR and TEA. In TEA, the < add, load > instruction pair is the more frequent one (possibly due to array accesses), whereas in FIR < load, br > is more frequent, at about 9%.
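The pair analysis and the code-size effect of combining a pair can be sketched as follows. Pairs are counted over adjacent instructions in the trace, and the fusion estimate greedily replaces non-overlapping occurrences of one pair in a static instruction sequence by a single combined instruction. This is an optimistic, assumption-laden estimate: it ignores operand encoding, branch targets that land between the two instructions, and whether a combined instruction is implementable at all. `frequent_pairs`, `fused_length`, and the sample sequence are hypothetical.

```python
from collections import Counter


def frequent_pairs(trace):
    """Count adjacent instruction pairs; consecutive pairs overlap by design."""
    return Counter(zip(trace, trace[1:]))


def fused_length(sequence, pair):
    """Instruction count after greedily fusing non-overlapping occurrences of pair."""
    length = 0
    i = 0
    while i < len(sequence):
        if i + 1 < len(sequence) and (sequence[i], sequence[i + 1]) == pair:
            i += 2  # both instructions are replaced by one fused instruction
        else:
            i += 1
        length += 1
    return length


if __name__ == "__main__":
    # Made-up static sequence for illustration only.
    seq = ["mov", "ldr", "mov", "ldr", "add", "ldr", "mov"]
    pairs = frequent_pairs(seq)
    print(pairs.most_common(2))
    best_pair = ("mov", "ldr")
    saved = len(seq) - fused_length(seq, best_pair)
    print(saved, "instructions saved by fusing", best_pair)
```

With fixed-width 4-byte instructions, each saved instruction corresponds to 4 bytes of code, provided the fused instruction itself still fits in one word.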
Conclusion
Sensor networks, though classified under one umbrella, have varied requirements and utilities. To efficiently design protocols, architectures, and applications, it is important to characterize the applications and categorize them based on their effects at the microarchitectural level. We present a new, comprehensive set of benchmarks called WiSeNBench in a unified framework. We show that WiSeNBench effectively characterizes the myriad applications that sensor networks often deal with and provides insight into the behavior of each of them. Architectural characterization was performed using the ARM SimpleScalar simulator. We find that the code size of MD5, RC5, and the DSP applications is larger than that of the other categories. On the instruction stream composition front, we find that the basic algorithms execute a larger percentage of loads and that the DSP applications have loads and stores almost evenly distributed. We also believe that there is potential for further research on ISA design based on the results presented on frequent instructions and frequent pairs of instructions.
