Using the existing microcontrollers in radio frequency identification (RFID)-based passive nodes requires too much power. There is a need to explore specific instruction sets of such microcontrollers based on target applications for low power. In this context, this paper describes a low-power wireless distributed single instruction multiple data (SIMD) architecture concept. This concept involves having the data and program instructions stored on a powered interrogator providing wireless supervisory control for the passive node that has a basic processing core called the remote execution unit (REU). This paper also proposes a novel REU architecture using a minimal instruction set architecture (MISA) based on 8051 µC with the goal of reducing power consumption of a wireless passive node. Post-layout simulation results indicate significant reduction in power consumption and the area occupied by REU when compared to an existing 8051 core design. This low-power programmable REU architecture targets internet of things (IoT) and passive RFID applications.
Introduction
The internet of things (IoT) is an emerging information technology revolution for enhancing ubiquitous computing and networking. IoT expands communication from human-human to human-thing and thing-thing, the latter also known as machine-to-machine (M2M) (Tan and Wang, 2010). Radio frequency identification (RFID) and sensing technologies form an integral part of the IoT network, allowing an intelligent way of identifying, tracking, monitoring and managing things (Yan and Huang, 2008; Meng and Jin, 2011). All forms of external sensory data are collected, processed and transmitted to the internet through various network interfaces. A basic framework of the IoT is shown in Figure 1 (Tan and Wang, 2010). This architecture exploits the integration of the internet, associated communication networks and edge technologies such as RFID for interfacing with the physical world. The RFID-based interrogator acts as a gateway device that collects object-connected data from passive RFID data carrier nodes. This information channel provides reliable and relatively detailed information at any time and place to a host system or any other information management system connected to the internet for processing. This combination of IoT with RFID sensing enables a potentially wide variety of applications in areas such as industrial control and supply chain management, consumer electronics, home automation, traffic management, space exploration, etc.
The terminal nodes or devices of the IoT network, such as sensors, RFID tags, and data processing and communication circuits, play an important role in providing highly effective IoT services. The wireless sensor network (WSN) is an important component of the IoT network that senses or measures data related to physical phenomena and transfers this data to the sink node or interrogator. Conventional sensor nodes rely on the limited lifetime of their batteries and thus form a disposable system (Chiang et al., 2004). The high maintenance cost of replacing the batteries of such numerous nodes, especially in hard-to-service areas, is a major challenge. Hence, the energy management of the terminal node of the IoT is an important factor in extending the lifetime of the sensor network. A wireless passive sensor network (WPSN) is a system of battery-free sensor nodes remotely powered by an RF source (Fernandes et al., 2011; Isik and Akan, 2009; Bereketli and Akan, 2009). A WPSN is a cost-efficient and non-disposable system that operates on the received incoming power and is based on passive RFID concepts, especially with regard to energy harvesting techniques (Nintanavongsa et al., 2012). Figure 2 presents the basic blocks of an RF-based wireless passive sensor node architecture, which consists of a sensing unit, a communication unit, a processing unit and a power source. One of the most important differences between the node architectures of a WPSN and a WSN is the power unit. The power unit of the WSN is typically the battery and its related circuitry, whereas the power unit of the WPSN is basically an RF-to-DC converter-capacitor network. The converted DC power is used to operate the node or is kept in a charge capacitor for future use. A WPSN with effective energy management of the node will further enable the IoT to address a much wider application space.
Table 1 presents a current overview of the power consumption of various types of RFID-based passive nodes, including WPSN nodes. The RFID tag-based digital processor design reported in Man et al. (2007) and Yang et al. (2010) is a conventional fixed-function IC, implemented as a non-programmable state machine that responds with a hard-coded ID when queried by the interrogator. In Cho et al. (2005) and Yin et al. (2010), the sensor-integrated passive RFID tag has a fixed ID assigned to each sensor in order to support maintenance and field deployment of many sensors. The associated digital processor does not support arbitrary computation and typically reports sensed data in addition to providing the RFID tag functionality. There has been considerable research on reducing the power consumption of such nodes (Man et al., 2007; Yang et al., 2010; Sai et al., 2012a, 2012c, 2010d, 2012e, 2010). The wireless identification and sensing platform (WISP) is a battery-free sensing and computation platform that uses a low-power, fully programmable microcontroller to enhance the functionality of RFID tag-based sensing (Joshua et al., 2006; Alanson et al., 2007). In such platforms, the wireless passive RFID sensing design is compliant with the ultra-high frequency (UHF) RFID interrogator. Table 1 clearly illustrates the significant increase in power consumption from a typical passive RFID tag to the enhanced passive RFID sensing platforms. The RFID-based sensing platforms use programmable microcontrollers for managing the passive sensor node (Joshua et al., 2006; Alanson et al., 2007), but such microcontrollers consume significant amounts of power, especially in the context of passive sensing. Hence, exploration of application-based microcontroller implementations that rely on a reduced instruction set architecture (ISA) is necessary.
The microcontroller/processor design can be tailored to the specific target application to further reduce the power requirements at the node (Sai et al., 2012b; Sai, 2013). This paper introduces a significant contribution towards achieving a low-power sensor node processor and highlights the importance of customising a processor through a reduced ISA for low power. The paper is organised as follows. Section 2 describes a wireless distributed single instruction multiple data (SIMD) processor architecture concept. Section 3 introduces an 8051 minimal instruction set architecture (MISA)-based REU design. Section 4 presents the high-level computer-aided design (CAD) flow and post-layout implementation details of the REU design, and compares the REU with an 8051 core design with respect to chip area and power consumption. Finally, Section 5 presents the conclusions and further scope of this research.
Wireless distributed SIMD processor architecture concept
SIMD architectures execute the same instruction on multiple data items simultaneously, using processors with multiple processing units. Applications in which the same operation is applied to a large number of data points can take advantage of a SIMD architecture. Owing to the higher level of parallelism available, an instruction can be applied to all of the data in the processing units within a single operation. Rather than only minimising the energy usage of a conventional processor design, the approach taken here distributes the processor design itself geographically over multiple passive remote units for low-power processing.
A typical RFID interrogator wirelessly transmits commands to the remote passive RFID sensing tag, which then executes these commands and responds to the interrogator (Dontharaju et al., 2007). Passive RFID tags use a CMOS chip to provide the logic that responds to commands from an interrogator. The commands from the interrogator can be viewed as instructions issued to a digital computer. Thus, the interrogator and tag combination can be viewed as a complete processor or as multiple processing units. This forms the basis of the distributed concept. The low-power distributed architecture concept splits the architecture into two basic design blocks (active and passive) to support multiple remote passive processors with wireless reconfigurability in the form of a SIMD architecture. The active block acts as the central controller for the entire system. Each of the multiple passive processors is called a wireless node (WN), as shown in Figure 3. The WN wirelessly executes instructions issued by the active block. Because RFID is distributed by design, it is a natural fit for this concept, and the data exchange between the active block and the WN takes place through RF signals. This system is thus viewed as a wireless SIMD architecture (Sai et al., 2012d). In an RFID-based sensing system setup, the active block acts as the interrogator and the WN acts as the passive tag.
A conventional processor is thus distributed into the two blocks described above to support multiple remote passive processors with wireless reconfigurability. Conventional processors have their basic blocks, such as control, memory and ALU, connected and hard-wired as a single processing unit. Here, the conventional processor architecture is distributed into the design blocks of the SIMD architecture, namely the active block and the WN block. The active block contains the major components, such as the control and storage units, that are larger in size and/or consume a considerable amount of power (for example, the controller, RAM, ROM, etc.) and, as the name suggests, is always connected to the power supply. Owing to the availability of a continuous power supply, the active block has the flexibility to be a classical von Neumann or Harvard type architecture overall. Commands are stored on this block, which transmits them wirelessly to the passive block. The core of the passive block consists of a digital processor referred to as the remote execution unit (REU). The intent is to keep the REU design as simple as possible so as to maintain low power requirements, and any unnecessary complexity on the passive REU is moved onto the active powered block. In other words, the active block is an RF-equipped control and storage block, and the WN core block is a MISA-based REU with minimal storage capacity. In this scenario, the program to be executed by the REU is stored in the active block and the commands are transmitted to the REU one at a time.
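The split described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation used in this work: the class names `ActiveBlock` and `WirelessNode`, the command names `ADD` and `XOR`, and the idea that each node starts with a locally held 8-bit value are all hypothetical. The RF link is replaced by a plain method call.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WirelessNode:
    """Passive node (hypothetical model): no program memory, one 8-bit value."""
    acc: int = 0

    def execute(self, command: str, operand: int) -> None:
        # The node only reacts to the single command it has just received.
        if command == "ADD":
            self.acc = (self.acc + operand) & 0xFF
        elif command == "XOR":
            self.acc = (self.acc ^ operand) & 0xFF
        else:
            raise ValueError(f"unsupported command: {command}")


@dataclass
class ActiveBlock:
    """Powered block (hypothetical model): stores the program and issues it."""
    program: List[Tuple[str, int]]
    nodes: List[WirelessNode]

    def run(self) -> List[int]:
        # Commands are issued one at a time; each command is applied by
        # every node to its own data, which is the SIMD aspect.
        for command, operand in self.program:
            for node in self.nodes:
                node.execute(command, operand)
        return [node.acc for node in self.nodes]


# Two nodes holding different data execute the same two commands.
nodes = [WirelessNode(acc=1), WirelessNode(acc=2)]
results = ActiveBlock(program=[("ADD", 5), ("XOR", 0xFF)], nodes=nodes).run()
print(results)  # [249, 248]
```

The program lives only in `ActiveBlock`, so the node model needs no instruction storage, which mirrors the intent of moving complexity onto the powered side.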
The main focus of the paper is the design and implementation of an 8051-MISA-based REU. This paper also introduces the associated elements and concepts of the proposed REU design that operates remotely and wirelessly from the interrogator.
Proposed REU
Small form factor, a low power budget and real-time requirements are the major characterising factors of a passive REU. The 8051 ISA was chosen because the 8051 is still one of the most popular embedded processors. This research provides a design space exploration and an 8051 ISA tailored for low-power applications.
8051-MISA for REU
The active block, in other words the interrogator, transmits program instructions to the REU, which executes them and returns the results to the interrogator. The REU, for example, has the capability to perform simple arithmetic and logical functions such as XOR (exclusive-or), ADD (addition), etc., that are compatible with the 8051. The interrogator and the REU together form a complete processor. The REU supports a subset of the 8051 instruction set (Table 2) as part of the REU design. The most demanding instructions, such as DIV (divide), MUL (multiply) and DA (decimal adjust), are not included in the MISA in order to keep the REU as minimal as possible, but form part of the interrogator's instruction set. The MISA can be further enhanced on a case-by-case basis for any chosen target application during the REU design process. Table 3 contains the notes on the data addressing mnemonics of the 8051 instruction set used in the illustrations in this section. Consider, for instance, an ADD operation, A = A + R1, where A and R1 denote registers in the REU. The interrogator sends out the A and R1 values, which are loaded and stored in the temporary storage on the REU. The ADD operation is then performed by the REU and the computed result is sent back to the interrogator on request. The interrogator contains the main memory, which acts as the major storage area for large data items. The temporary storage on the REU is just enough to support a basic set of instructions such as load, store and, in this example, the ADD operation. A possible storage unit may have nine 8-bit registers representing the eight working registers (R0-R7) and an A register.
Table 2 REU-8051 instruction subset (MISA)
Table 3 REU-8051 data mnemonics
Rn: Working registers (R0-R7)
#data: 8-bit constant embedded in instruction
A: Accumulator
The two instructions highlighted only in bold (not bold-italic) in Table 2 are the instructions that have been modified to suit the REU requirements. The first is the MOVX (move data) instruction (MOVX @Ri, A), whose 8-bit opcode is '11110010'. Its functionality differs from the typical 8051 behaviour in order to suit the target REU core design: upon execution of this instruction, the data available in the accumulator register is transferred to a destination register. The destination register holds the data that is transmitted out of the REU on request from the interrogator. The other modified instruction is NOP (no operation), which can be used as an external reset sent by the interrogator to the REU in order to clear all the data left by the previous set of instructions. Thus, a selected set of instructions amounting to less than half of the instructions usually supported by an 8051 contributes towards significantly reducing the power consumption of the system.
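The ADD example and the two modified instructions can be sketched as a small behavioural model in Python. This is a hedged illustration only, not the REU design itself: the class name `REU`, the attribute `out` standing in for the destination register, and the choice of a full state clear for NOP are assumptions. The opcode values are the standard 8051 encodings (ADD A,Rn = 0x28-0x2F, XRL A,Rn = 0x68-0x6F, MOV A,#data = 0x74, MOV Rn,#data = 0x78-0x7F, MOVX @R0,A = 0xF2, NOP = 0x00), and only this handful of instructions is modelled.

```python
# Standard 8051 opcodes; for the Rn forms the low three bits select R0-R7.
NOP, MOV_A_IMM, MOVX_R0_A = 0x00, 0x74, 0xF2
ADD_A_RN, XRL_A_RN, MOV_RN_IMM = 0x28, 0x68, 0x78


class REU:
    """Behavioural sketch of a few MISA instructions (hypothetical model)."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.a = 0          # accumulator A
        self.r = [0] * 8    # working registers R0-R7
        self.carry = 0      # carry flag
        self.out = 0        # assumed name for the destination register

    def execute(self, opcode: int, data_in: int = 0) -> None:
        base, n = opcode & 0xF8, opcode & 0x07
        if opcode == NOP:
            self.reset()                    # modified NOP: external reset
        elif opcode == MOV_A_IMM:
            self.a = data_in & 0xFF
        elif opcode == MOVX_R0_A:
            self.out = self.a               # modified MOVX: A -> destination
        elif base == MOV_RN_IMM:
            self.r[n] = data_in & 0xFF
        elif base == ADD_A_RN:
            total = self.a + self.r[n]
            self.carry = total >> 8
            self.a = total & 0xFF
        elif base == XRL_A_RN:
            self.a ^= self.r[n]
        else:
            raise ValueError(f"opcode {opcode:#04x} is not modelled here")


# A = A + R1, as issued instruction by instruction by the interrogator.
reu = REU()
reu.execute(MOV_A_IMM, 0x12)        # MOV A, #12h
reu.execute(MOV_RN_IMM | 1, 0x34)   # MOV R1, #34h
reu.execute(ADD_A_RN | 1)           # ADD A, R1
reu.execute(MOVX_R0_A)              # result made available for read-out
print(hex(reu.out))                 # 0x46
```

Treating NOP as a reset departs from a stock 8051, where NOP has no effect, so any interrogator-side software written against this model would need to avoid using NOP as a delay.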
REU architecture
A high-level view of the architecture of the REU design is shown in Figure 4. The REU architecture mainly consists of three blocks: a controller, an ALU and a register file. The decoded input instruction frame generated by a frontend block of a node acts as the input to the REU, as shown in Figure 4. The opcode is an 8-bit 8051 instruction opcode that generally encodes a source register and the destination register. The REU, with its 116 built-in instructions, supports both 8-bit and 16-bit variable-length 8051 instructions. The 16-bit 8051 instructions commonly carry an additional 8 bits of data, represented as the data_in input port in Figure 4. The ALU is responsible for arithmetic and logic operations on 8-bit operands, each of which is implemented as a combinational block. The register file is implemented as a sequential block, triggered by a clock signal, that acts as a temporary data memory. It consists of nine 8-bit registers that represent the eight working registers (R0-R7) and an A register. The controller is modelled behaviourally as a sequential logic block based on a set of states for every instruction. Each state is triggered by the rising edge of the received clock signal. In each state, a group of signals is either set or reset according to the received instruction. Table 4 describes each set of signals connected internally or externally to or from the controller, the ALU and the register file. It should be noted that the ALU computation is implemented as combinational logic and executes in one cycle, but the initial and final sets of cycles are needed at the start and end of every operation to set or reset signals of the REU and/or of a potential frontend block typically used in passive nodes.
Table 4 REU signals (width in bits)
acc_data (8): Accumulator data to be read is stored in this register for an ALU operation when acc_rd is set.
src_cy (1): Carry bit necessary for ALU computation.
reg_data (1): Data from one of the eight registers to be read is stored in this register for an ALU operation when reg_rd is set.
ALU result register: The computed ALU result is placed in this register and is later stored into the accumulator or one of the registers of the register bank, depending on whether acc_wr or reg_wr is set.
The main source of power reduction is the customisation of the REU ISA implementation for low-power applications. As the program to be executed by the REU is stored on the interrogator side, the need for program memory at the REU is eliminated. There may still be a need for local scratch pad memory at the REU, although the number of bytes is drastically reduced in order to satisfy the power requirements. The REU executes the commands wirelessly issued by the interrogator. The MISA chosen for the REU consists of about 116 instructions compatible with the 8051 ISA. The choice of MISA relies on the set of instructions whose operations depend on the nine 8-bit registers (R0-R7 and/or A). The other tunable low-power factors for the REU design are the use of a low operating clock frequency and the application of clock gating techniques (the ctr signal in Figure 4 can be used for effective clock gating of the REU).
REU implementation and results

REU design implementation using CAD tools
The REU architecture introduced in the previous section consists of three major modules: an ALU, a register file and a controller. Each logical module was modelled using the very-high-speed integrated circuit hardware description language (VHDL), a hardware description language used in electronic design automation. Each module was first simulated and verified independently using Mentor Graphics' ModelSim tool. Upon successful individual verification, the modules were combined to form the final REU design, which was again verified for the overall expected operation. After successful verification of the entire VHDL-based REU file, a synthesised net-list for the design was generated using Synopsys Design Compiler. The 'dc_shell' command interface provides a script execution environment based on the tool command language (TCL), which includes setup environment variables, constraints, etc. The main parameter for synthesis is the clock frequency specified in the TCL script. During the synthesis process, Design Compiler, after executing the related TCL script, reads in the synthesisable REU VHDL file and generates a synthesised cell-level net-list in Verilog along with the necessary timing constraint file, called the Synopsys design constraints (SDC) timing file. The generated Verilog net-list file was compiled, simulated and verified along with the target library using ModelSim. The design has been successfully synthesised using a target 45 nm PTM technology for a supply voltage of 1.1 V (Arizona State University, Predictive Technology Model (PTM)).
The Cadence Encounter tool was used to perform the physical place and route of the obtained design net-list of standard cells, taking the SDC timing file into consideration. At the end of the place and route process for the REU layout, a net-list file and a standard delay format (SDF) file were generated. This net-list file was simulated and verified for the expected functional operation using ModelSim along with the SDF file. The final post-layout REU design was successfully generated and verified for the expected operation.
Comparisons and results
The current 8051 models used in wireless biomedical sensor applications, embedded systems, etc., typically run at clock frequencies of 50 MHz or greater (Li et al., 2009; Saponara et al., 2004; Iozzi et al., 2005; Saponara et al., 2007). An existing 8051 microcontroller core model (Oregano Systems, MC8051 IP Core User Guide (Version 1.3)) was identified and used as a reference model for comparison with the REU. This model and its derivatives are used in wireless sensor applications (Li et al., 2009). This 8051 core is a fully synchronous design compatible with the Intel 8051 µC. It has a higher average performance than the traditional 8051 architecture, as it executes most instructions in one clock cycle.
The 8051 core model consists of four major blocks: an ALU, a control unit, a serial interface unit and a timer-counter. This 8051 core model has been debugged so that the entire design compiles successfully. The synthesised core design and its corresponding layout have been successfully generated for a target 45 nm PTM technology. The synthesised REU design and its corresponding layout have also been successfully generated for the same target technology, libraries and supply voltage, which makes a good case for an accurate power comparison. Figure 5 summarises the total power consumption values for the 8051 core and the REU at clock frequencies of 1 MHz, 10 MHz, 30 MHz, 60 MHz and 80 MHz and presents a comparison graph for direct data visualisation. These power values were generated by the Cadence Encounter power option based on the input target clock frequency and a supply voltage of 1.1 V for both designs, which were synthesised for a target 45 nm PTM technology. It should be noted that the reported 8051 core power values do not include the power consumption of ROM or of internal or external RAM, whereas the REU includes a minimal scratch pad memory of nine 8-bit registers. The areas of the 8051 core and REU layouts, without pads, as estimated by Cadence Encounter are 48,174 µm² and about 7,917 µm² (91 µm × 87 µm) respectively. Figure 6 shows the layout of the REU design with marked dimensions.
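As a quick arithmetic check on the layout figures above, the following Python lines recompute the REU area from its quoted dimensions and the relative area saving against the 8051 core. The variable names are illustrative only; the numbers are those reported in the text.

```python
# Layout areas without pads, as reported for the 45 nm PTM implementations.
core_area_um2 = 48_174      # 8051 core
reu_width_um, reu_height_um = 91, 87

# The quoted REU area follows directly from the quoted dimensions.
reu_area_um2 = reu_width_um * reu_height_um
assert reu_area_um2 == 7_917

# Relative reduction in area of the REU with respect to the 8051 core.
area_reduction_pct = 100 * (1 - reu_area_um2 / core_area_um2)
print(f"REU area: {reu_area_um2} um^2, reduction: {area_reduction_pct:.1f}%")
```

The result, roughly 83.6%, is consistent with the area saving of about 84% quoted in the comparison.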
It can be clearly seen from Figure 5 that the total power consumption of both models increases as the frequency increases. As the frequency increases, the dynamic power consumption increases, which in turn contributes to the overall increase in total power consumption. The leakage power consumption of the REU is about 79% lower than that of the 8051 core design over the 1 MHz to 80 MHz frequency range. The total power consumption of the proposed REU design is about 78% lower and its occupied core area is about 84% smaller than those of an existing 8051 µC core, with both designs implemented using the same technology and libraries. The small area and low power consumption of the REU compared with the 8051 core are mainly due to the fact that the REU supports less than half of the regular 8051 ISA. In addition, a power comparison of various low-power 8051 µC implementations is summarised in Table 5. It can be seen that the REU design has a much better power saving efficiency than the RFID-based passive node microcontrollers as well as the 8051 µC used in biomedical sensor nodes, as shown in Tables 1 and 5. The minimum clock frequency acceptable for a target application can be further used to optimise the REU design.
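The frequency dependence discussed above follows the standard first-order CMOS power model, given here as a general textbook relation rather than a model fitted to the reported data. The symbols are assumptions introduced for illustration: \alpha is the switching activity factor, C_L the switched load capacitance, V_{DD} the supply voltage (1.1 V in this work), f the clock frequency and I_{leak} the total leakage current.

```latex
P_{\text{total}} = P_{\text{dyn}} + P_{\text{leak}}, \qquad
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f, \qquad
P_{\text{leak}} = I_{\text{leak}} \, V_{DD}
```

In this model only the dynamic term scales with f, while the leakage term depends mainly on the number and type of cells, which is in line with a smaller design showing lower leakage across the whole frequency range.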
Conclusions
In summary, this paper provides the basis for a low-power distributed wireless SIMD architecture concept whose instructions are based on the 8051-MISA. The proposed low-power 8051-MISA-based REU has been implemented using state-of-the-art CAD tools. The post-layout REU design results show a significant reduction in power consumption and occupied area with respect to an 8051 core typically used in biomedical sensor applications. Incorporating sensors with the proposed REU has the potential to lower deployment cost, enabling sensor networks to be deployed in RFID and IoT-based applications where a viable solution was previously too costly to develop. Such a device will contribute to the continued development of the IoT, RFID and M2M.
