Abstract-Healthcare solutions are benefitting from Internet of Things technology through the introduction of wearable healthcare devices. Though these small form-factor wearable devices promise great benefits, guaranteeing a long device operating lifetime remains the biggest challenge because of high energy consumption. In this paper, a reduced hardware architecture system-on-chip targeting the digital block design is proposed to achieve higher energy efficiency. The design has been verified by synthesis onto an FPGA and implemented in silicon using the Silterra 180nm process. Results show that the proposed design achieves a reduction of up to 24% in leakage power and 15% in dynamic power over the reference design. In addition, 24.3% of excess area is removed by the proposed reduced hardware architecture technique.
I. INTRODUCTION
Emerging wireless sensor network technologies are now leading the trend toward the Internet of Things (IoT), in which uniquely identifiable objects, devices, and things can be connected with each other through the internet [1]. IoT technology has been adopted in environmental monitoring, energy and infrastructure management, building and home automation, transport systems, medical and healthcare systems, and other areas [2].
A wearable healthcare device, typically an IoT sensor node, enables continuous monitoring, actuation, and logging of patient bio-signal data, which can help medical personnel to diagnose, prevent, and respond to various illnesses [32]. Though these IoT sensor nodes show great benefits, challenges remain, including small form factor and affordable cost; the most critical issue is the node's operating lifetime, which is limited by the battery energy supply. Typically in an IoT configuration, data from the nodes are immediately sent wirelessly to IoT gateway devices, which aggregate data from multiple nodes and process these data before sending them to the cloud over the network [3].
Different IoT sensor node platforms are available, developed for either commercial or academic purposes. For instance, commercial node platforms include the AVR Raven from Atmel, the Arduino BT from Arduino, and Shimmer from Intel [4]. In academic research, wireless node platforms include
This paper proposes a reduced hardware architecture system-on-chip (SoC) targeting the sensor node's digital block in order to achieve energy efficiency in the IoT healthcare sensor node. From our observation, the majority of healthcare applications use a limited number of off-the-shelf sensors, connected to the nodes through a multi-point serial communication bus. In addition, only a limited number of general-purpose I/Os (GPIOs) and peripherals are utilized. Thus, by carefully reducing the peripherals and GPIOs without affecting the flexibility to configure the sensor nodes for different healthcare applications, the new reduced hardware architecture can reduce the sensor node's power consumption and area as well as its cost.
The paper is organized as follows. Section II explains the design of the reduced hardware architecture SoC IoT sensor node. A power and area comparison between the proposed and reference SoC microcontrollers is discussed in Section III, and the conclusion is given in Section IV.
II. METHODOLOGY
This section explains the standard sensor node architecture and discusses the hardware architecture considerations for designing an energy-efficient digital system architecture. The proposed design architecture is then implemented on an FPGA device for functional verification, followed by an ASIC silicon implementation.
A. Sensor Node Architecture
The basic hardware modules required for a typical sensor node are shown in Fig. 1 and fall into two categories, namely digital and analog blocks [15]. Typically in a sensor node, analog blocks such as the sensors are located on a multi-sensor board, while the wireless transceiver is on a transceiver board [16]. The wireless transceiver and the multi-sensor board typically interface with the processor in the digital block through serial communication, namely I2C or UART, or through GPIOs or ADC peripheral modules [16]. Note that the proposed reduced hardware architecture focuses on the digital block, which includes the processor, memory, and peripherals.
There are two types of sensor nodes in a Wireless Body Sensor Network (WBSN) for medical and healthcare applications, namely the sensing node and the stimulating node [17]. The sensing node gathers information, processes signals, stores data, and transmits wirelessly. The stimulating node is typically used for medical treatment when needed, such as drug delivery and nerve stimulation. The two node types have different operating characteristics: the sensing node is in the standby state most of the time and periodically wakes up to perform sensing, minimal data processing, and data transmission, as shown in Fig. 2. Prior work in [17] optimizes these work and standby states to achieve an energy-efficient sensor node.
In comparison with IoT gateways, which collect and aggregate data from sensor nodes and perform complex signal processing before transmitting to the cloud, IoT sensor nodes have a much shorter work period, as shown in Fig. 2(a). Because the work time (T_work) is much shorter than the standby time (T_standby), the sensor node's power consumption largely depends on the leakage power consumed during the standby state.
to Phase I and Phase III. This is because the sensing node has little data processing activity: it may either check the sensed data for anomalies before encapsulating it in a packet, or packetize the sensed data immediately before transmitting it wirelessly.
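The effect of a short work period on the energy budget can be illustrated with a simple duty-cycle model. The following is a minimal sketch rather than the method used in this work: the function names and all numeric values (power levels and timings) are hypothetical placeholders chosen only for illustration.

```python
def average_power(p_work_mw, p_standby_mw, t_work_s, t_standby_s):
    """Time-weighted average power over one work/standby cycle."""
    period = t_work_s + t_standby_s
    return (p_work_mw * t_work_s + p_standby_mw * t_standby_s) / period


def standby_energy_share(p_work_mw, p_standby_mw, t_work_s, t_standby_s):
    """Fraction of the per-cycle energy that is spent in the standby state."""
    e_work = p_work_mw * t_work_s
    e_standby = p_standby_mw * t_standby_s
    return e_standby / (e_work + e_standby)


# Hypothetical values (assumptions, not measurements from this paper):
# 10 mW while working for 10 ms, 0.1 mW of leakage during 990 ms of standby.
P_WORK_MW, P_STANDBY_MW = 10.0, 0.1
T_WORK_S, T_STANDBY_S = 0.010, 0.990

avg = average_power(P_WORK_MW, P_STANDBY_MW, T_WORK_S, T_STANDBY_S)
share = standby_energy_share(P_WORK_MW, P_STANDBY_MW, T_WORK_S, T_STANDBY_S)
print(f"average power: {avg:.3f} mW, standby share of energy: {share:.1%}")
```

With these placeholder numbers, standby accounts for roughly half of the energy per cycle even though the standby power is one hundredth of the work power, which is why leakage matters for a node that sleeps most of the time.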
B. Proposed Reduced Hardware Design Architecture
The usual approach to saving energy is to turn off peripheral modules whenever they are not in use. However, leakage power persists even after the peripheral modules have been turned off [33]. This leakage is due to current leaking through the additional decoupling capacitors that are added to alleviate power-gating noise. Leakage power dominates total power dissipation and cannot be ignored in nanometer process technologies [33]. Therefore, we propose eliminating the unused digital hardware from the design in order to cut off not only dynamic power but also leakage power. : Fig 3 (a). Then, the hardware in the shaded blocks, such as L-1 UARTs, M-1 I2Cs, N SPIs, and K-k GPIO pins, is removed (instead of power gated) from the SoC design, as shown in Fig. 3(b), to achieve higher energy efficiency.
Fig. 4. Power Consumption on Reference Design
As an example, the synthesized reference design, a Silicon Labs C8051F38x with 6 timers, 25 GPIO pins, 2 UARTs, 2 I2Cs, and 1 SPI, shows that a total of 30.78% of power is consumed by serial communications, GPIOs, and other peripherals (15.77%, 2.47%, and 12.54%, respectively), as shown in Fig. 4. This power consumption needs to be addressed, as these blocks together consume more power than the processor because of the excess hardware in the system. Fig. 4 also shows that the RAM has the highest power consumption in the design, because it was built from D flip-flops rather than SRAM to keep the design simple.
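The 30.78% figure is simply the sum of the three reported shares. The short check below is a sketch that uses only the percentages quoted in the text; the dictionary keys are illustrative names, not identifiers from the design.

```python
# Power shares of the reference design as reported for Fig. 4 (percent of total).
reference_power_share = {
    "serial_communication": 15.77,
    "gpio": 2.47,
    "other_peripherals": 12.54,
}

# Round to two decimals to match the precision of the reported values.
peripheral_total = round(sum(reference_power_share.values()), 2)
print(f"serial comm. + GPIO + other peripherals: {peripheral_total}% of total power")
```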
Compared with the power gating technique, which is also able to reduce the leakage power of the system, the reduced architecture has the benefit of a 100% reduction in leakage power for the unneeded hardware. Moreover, the reduced architecture removes excess area, whereas power gating adds overhead circuitry (i.e., sleep transistors, header or footer cells, and decoupling capacitors). A prior experiment in [33] shows that even when 100% of the area of an active circuit with uniform current distribution is power gated, the leakage power of the circuit is reduced by only 92%, while the area overhead increases by 15%.
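The difference between the two techniques can be expressed as the leakage that remains in an unused block. The sketch below is illustrative only: it normalizes the block's leakage to 1.0 (an assumption), takes the 92% reduction for power gating from the figure quoted from [33], and uses a hypothetical helper name.

```python
def residual_leakage(block_leakage, reduction_fraction):
    """Leakage left after a technique removes `reduction_fraction` of it."""
    return block_leakage * (1.0 - reduction_fraction)


# Normalized leakage of one unused peripheral block (arbitrary unit, assumption).
UNUSED_BLOCK_LEAKAGE = 1.0

# Power gating: 92% leakage reduction, the figure quoted from [33].
gated = residual_leakage(UNUSED_BLOCK_LEAKAGE, 0.92)

# Reduced architecture: the block is removed, so none of its leakage remains.
removed = residual_leakage(UNUSED_BLOCK_LEAKAGE, 1.00)

print(f"residual leakage, power gated: {gated:.2f}")
print(f"residual leakage, removed:     {removed:.2f}")
```

Under this normalization, about 8% of the block's leakage survives power gating, while removal leaves none.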
However, the reduced hardware architecture technique could increase the signal processing latency of each sensor node. In the reduced hardware architecture, multiple sensors must share a single communication channel (I2C or SPI). Therefore, they cannot operate at the same time, and a polling operation is required, as shown in Fig. 5. Assuming that the operation time of each sensor is equal and that all sensors communicate over a single I2C channel, Sensor 2 can operate only after Sensor 1, followed by Sensor 3. This increases the time taken for signals to be fetched and processed in the system for each IoT healthcare node.
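The latency cost of sharing one channel can be sketched by comparing sequential polling with an idealized case in which every sensor has its own channel. This is a minimal model under stated assumptions: the 2.0 ms operation time is a hypothetical value, bus arbitration and protocol overhead are ignored, and the function names are illustrative.

```python
def polling_schedule(op_times_ms):
    """Return (start, end) times for sensors served one after another on one bus."""
    schedule = []
    t = 0.0
    for op_time in op_times_ms:
        schedule.append((t, t + op_time))
        t += op_time
    return schedule


def dedicated_channel_latency(op_times_ms):
    """Latency when each sensor has its own channel and all operate in parallel."""
    return max(op_times_ms)


# Three sensors with equal operation time (2.0 ms is a hypothetical value).
op_times_ms = [2.0, 2.0, 2.0]

schedule = polling_schedule(op_times_ms)
shared_bus_latency = schedule[-1][1]
parallel_latency = dedicated_channel_latency(op_times_ms)

for index, (start, end) in enumerate(schedule, start=1):
    print(f"Sensor {index}: {start:.1f} ms to {end:.1f} ms")
print(f"shared bus: {shared_bus_latency:.1f} ms, dedicated channels: {parallel_latency:.1f} ms")
```

In this model the time to collect one reading from every sensor grows linearly with the number of sensors on the shared bus, which is the trade-off accepted in exchange for removing the extra serial modules.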
From a computing paradigm perspective, the reduced architecture approach shifts the SoC from a general-purpose system to a single-purpose (i.e., application-specific) system. A limited number of peripherals can target only certain applications. Thus, the flexibility of the system is affected, as is the system design time for other applications.
C. Reduced Digital Hardware Implementation
To identify the digital hardware components essential for different healthcare applications, we reviewed microcontrollers from several companies, namely Silicon Labs [18], Microchip Technology [19], Freescale [20], Texas Instruments [21], and Atmel [22], across several target applications. The wearable healthcare devices proposed in [23] [24] [25] [26] were studied as well. Table I shows that only a few sensors, indicators, or actuators are needed for each application, and most of them use multi-point serial communication, GPIOs, or ADCs. This finding highlights the potential of a reduced hardware architecture for the sensor node.
Based on the sensor node architecture and the survey of several healthcare nodes, a few hardware configurations must be considered in the proposed design architecture, including the processor, memory, serial communication, general-purpose input/output, and analog blocks. The reduced hardware implementation is shown in Fig. 6.
Datapath width, number of cores, and instruction set architecture must be considered during processor selection. Processors with a wider datapath, such as 16-bit to 32-bit processors, normally require a real-time operating system, which is more than a sensor node with little data processing activity needs [28]. Similarly, the work in [29] indicates that a multi-core processor consumes too much power for the low computation needs of biomedical signal processing. Another reference [30] shows that a reduced instruction set computer (RISC) processor has a major power advantage over a CISC processor for low-performance applications.
This work employs a single-core 8-bit RISC processor that is power-efficient and sufficient for the low-complexity data processing and signal transfer of a sensor node. An 8051 core with 128 bytes of internal RAM for data and 128 bytes of special function registers is chosen. An external ROM of up to 64kB is chosen for storing the application program, for convenience in flashing the program.
2) Serial Communication
Several types of serial communication modules are available, such as UART, SPI, I2C, and USB. The survey results summarized in Table I show that one UART and two SPIs are enough for a single IoT healthcare node. Therefore, one UART was selected as one of the serial communication modules. However, our reduced architecture approach led to the selection of I2C rather than SPI, because I2C has a lower pin count than SPI. As I2C supports multi-point communication, one I2C module is adequate.
3) General Purpose Input Output
Sixteen GPIO pins are provided, based on the earlier survey and on consideration of the pin-sharing architecture with other modules and a simple Design-For-Test (DFT) architecture. Eight pins are shared with the inputs and outputs of the UART, I2C, external interrupts, and timers. Even when all 8 shared pins are in use, another 8 pins remain available for connecting other sensors, indicators, or actuators. In addition, all GPIO pins are multiplexed with the SFR address and data buses for DFT purposes.
4) Peripherals
The other peripherals included in the design are the interrupt, timer, and watchdog timer. The interrupt allows a sensor or the IoT gateway to trigger an event on the sensor node for special purposes, such as when abnormal data are received. The timer is used for periodic signal sensing, while the watchdog timer is needed to reset the design when a malfunction occurs.
Fig. 7. System under Test Hardware Connection
Since the proposed design architecture focuses mainly on reducing the digital blocks, analog blocks such as the ADC, power management unit, and radio transceiver are not implemented. Analog modules, namely the radio transceiver and the ADC for sensors, can be attached externally to the I2C bus of the system.
D. Design Flow
The proposed reduced SoC sensor node architecture was implemented in the Verilog hardware description language (HDL). The design was then synthesized for an Altera Cyclone II FPGA on a DE-2 development board for functional verification and fast prototyping. A system under test was specially designed for functional verification, as shown in Fig. 7. The components used in the test system are a DS1631 temperature sensor, an Arduino Mega 2560 board, and two XBee Pro S2Bs as wireless transceivers. The test system reads temperature data from the sensor and periodically transmits the data to the gateway device based on the IEEE 802.15.4 standard.
The ASIC physical design flow was performed after the RTL design had been functionally verified on the FPGA. The RTL was translated into a gate-level netlist before being taken through the physical design flow. The layout was designed using the Silterra 180nm CMOS generic process with six metal layers. The physical design flow, which consists of floor-planning, placement, clock tree synthesis, and routing, was timing- and power-driven. Sign-off validation, including design rule check (DRC), layout versus schematic (LVS), static timing analysis (STA), and gate-level simulation (GLS), was carried out post-layout with the appropriate sign-off tools.
Fig. 8 shows the layout of the SoC sensor node using the Silterra 180nm generic process. There are 64 I/O pads, including 4 VDD, 4 VSS, 4 IOVDD, and 4 IOVSS. The total chip area is 1.54mm x 1.47mm, with a core size of 1.04mm x 0.97mm and a core utilization of 35.6%. Since this is an I/O-dominated design, the core size could actually be reduced to 0.5mm x 1.0mm.
The proposed reduced hardware architecture is compared with a commercial microcontroller that has a different architecture. The Silicon Labs C8051F38x was chosen as the reference design because it also employs an 8051 processor [31]. Table II summarizes the peripherals and sensor interfaces of the proposed and C8051F38x designs. Note that the proposed design has far fewer digital components than the C8051F38x: one UART, one I2C, and a total of 16 GPIOs. Power and area were compared between the reduced hardware architecture and the C8051F38x architecture. To ensure a fair comparison between the two designs, the commercial microcontroller was synthesized using the same process technology. In terms of area, the proposed design (4.05 x 10^5 um^2) is 24.3% smaller than the Silicon Labs C8051F38x (5.35 x 10^5 um^2), which has a larger number of peripherals.
Fig. 9 shows the power consumption comparison between the proposed and C8051F38x architectures in three differently configured applications, namely full utilization of blocks, a pulse oximeter, and a fall detection system. In the application where all digital components are fully utilized, the proposed design dissipates around 15% less dynamic power and 24% less leakage power than the C8051F38x.
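The area comparison reported in this section (5.35 x 10^5 um^2 for the reference design and 4.05 x 10^5 um^2 for the proposed design) can be checked with a short calculation. The sketch below uses only those two values; the function and constant names are illustrative. It also shows why the baseline matters: the same difference is 24.3% when measured against the reference design but larger when measured against the proposed design.

```python
REFERENCE_AREA_UM2 = 5.35e5  # Silicon Labs C8051F38x, synthesized in the same process
PROPOSED_AREA_UM2 = 4.05e5   # proposed reduced hardware architecture


def percent_reduction(reference, proposed):
    """Area saved by the proposed design, relative to the reference design."""
    return 100.0 * (reference - proposed) / reference


def percent_larger(reference, proposed):
    """How much larger the reference design is, relative to the proposed design."""
    return 100.0 * (reference - proposed) / proposed


reduction = percent_reduction(REFERENCE_AREA_UM2, PROPOSED_AREA_UM2)
larger = percent_larger(REFERENCE_AREA_UM2, PROPOSED_AREA_UM2)
print(f"proposed design is {reduction:.1f}% smaller than the reference")
print(f"reference design is {larger:.1f}% larger than the proposed design")
```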
For the pulse oximeter application, the proposed design dissipates around 1.34% less dynamic power and 24% less leakage power than the C8051F38x. The pulse oximeter application requires 1 UART, 1 SPI/I2C, 1 ADC, and 2 GPIOs; thus the proposed design switches off the timers and 14 GPIOs. To meet the application requirement, an external ADC chip is attached to the I2C bus in our proposed design.
In the fall detection application, the proposed design dissipates almost the same dynamic power as the C8051F38x but achieves 24% less leakage power. The motion capture application requires only 1 UART, 2 SPI/I2C, and 1 ADC. Our proposed design uses the I2C bus to attach the additional device, at the expense of a slight degradation in overall device speed.
Comparing the reference and proposed designs for the three applications in Fig. 9, we observe that the total power consumption of the processor and memory is constant, while the total power consumption of the serial communication, GPIOs, and other peripherals is reduced. The total power savings for applications I, II, and III due to the power reduction in the hardware mentioned previously are 14.77%, 2.77%, and 0.15%, respectively.
A reduced digital hardware architecture SoC design targeting healthcare applications is proposed in this paper, mainly to reduce power consumption in IoT sensor nodes. The proposed SoC achieves better energy efficiency than the reference design by removing peripheral modules that are unnecessary for the individual healthcare IoT node. Results show that unused peripherals and components in the SoC lead to excessive power consumption, especially leakage power, in addition to excess chip area. The proposed reduced architecture can therefore prolong battery life and reduce chip size in IoT healthcare nodes.
Future research will extend to different hardware choices, such as replacing the I2C with SPI, to observe their effect on the energy efficiency of the system. In addition, essential hardware blocks such as hardware accelerators might be added rather than removed, as IoT nodes are trending toward intensive signal processing, as shown in some recent research works.
