Emerging embedded nonvolatile memory solution for ultra low power microcontroller systems by 林越 正紀 & Hayashikoshi Masanori
Doctoral Thesis
Emerging embedded nonvolatile memory 









September 2018

Abstract
This thesis reports an emerging embedded nonvolatile memory solution for ultra-low-power
microcontroller systems.
Semiconductor devices are now used everywhere. Among microcontrollers, devices with embedded
flash memory have become the mainstream: they account for 70% of the total microcontroller
market, and their average market growth rate of about 16% is high compared with the growth rate
of the microcontroller market as a whole. Thus, nonvolatile memory devices typified by Flash
memory are core technologies for all industries. Currently, the Flash-MCU is the mainstream
microcontroller. In the future, as semiconductor miniaturization and the greening of society
advance, the conventional embedded memories (SRAM and Flash memory) will face technology
limitations such as the leakage current that grows with process miniaturization, and a
next-generation nonvolatile memory (NVRAM) is therefore expected. This study aims to overcome
the challenges of embedding nonvolatile memory and to investigate solutions for ultra-low-power
microcontroller systems.
Chapter 2 points out the critical issues of embedded nonvolatile memory for microcontroller
systems. The main issues of embedded nonvolatile memory design are summarized as five
challenges: low voltage operation, high endurance, high speed access, high density, and low
leakage current system design for low power dissipation. Each challenge is explained to clarify
the objective of the study.
The next three chapters demonstrate practical nonvolatile memory design techniques that address
these challenges. Chapter 3 discusses a capacitor-coupled EEPROM design, with a
capacitor-coupled EEPROM cell and a dual-mode sensing scheme, for low voltage, high endurance,
and high speed access.
Chapter 4 discusses a practical high density 1T-4MTJ MRAM (Magnetic Random Access Memory)
design, with a 1T-4MTJ cell and a voltage-offset self-reference sensing scheme, for high
endurance, high density, and high speed access.
Chapter 5 discusses the zero standby microcontroller system technology, with a normally-off
system architecture and its power management scheme, for low voltage operation and low leakage
current from the viewpoint of both hardware and software technologies.




Chapter 1 Introduction
1.1. Background of research area
1.2. Objective of this study
1.3. Overview of this thesis
Chapter 2 Issue of embedded nonvolatile memory for low power microcontroller systems
2.1. Introduction
2.1.1. Flash memory
2.1.2. Nonvolatile RAM
2.2. Low voltage operation
2.3. High reliability
2.4. High speed access
2.5. High density
2.6. Low standby leakage current
2.7. Summary
Chapter 3 A dual-mode sensing scheme of capacitor-coupled EEPROM cell
3.1. Introduction
3.2. Capacitor-coupled EEPROM cell
3.2.1. Memory cell structure
3.2.2. Memory cell operation
3.3. Dual-mode sensing (DMS) scheme
3.3.1. Sensing operation
3.3.2. Load transistor optimization
3.3.3. Array architecture
3.3.4. Soft error immunity
3.4. Simulated results
3.4.1. Sensing speed enhancement
3.4.2. Endurance improvement
3.5. Summary
Chapter 4 A high-density and high-speed 1T-4MTJ MRAM with voltage offset self-reference sensing scheme
4.1. Introduction
4.2. 1T-4MTJ memory cell structure
4.3. Voltage offset self-reference sensing scheme
4.3.1. Concept of read-out scheme for 1T-4MTJ cell
4.3.2. SRSV read-out sense amplifier
4.3.3. Simulation results
4.4. Test chip implementation and evaluation
4.5. Hybrid embedded MRAM solution with 1T-1MTJ and 1T-4MTJ cells
4.6. Summary
Chapter 5 Low-power multi-sensor system with power management and nonvolatile memory access control for IoT applications
5.1. Introduction
5.2. Normally-off architecture for low-power multi-sensor systems
5.2.1. Key technology of Normally-off computing
5.2.2. Normally-off multi-sensor systems
5.2.3. Hierarchical power gating control with activity localization
5.3. Normally-off power management method
5.3.1. Task scheduling technology
5.3.2. Autonomous standby mode transition technology
5.4. Nonvolatile memory access control technology
5.5. Evaluation result
5.6. Summary




List of figures
Figure 1.1: Market overview of microcontrollers 
Figure 1.2: MCU applications and market volume 
Figure 1.3: Trend of microcontroller with embedded nonvolatile memory 
Figure 1.4: Outline of this thesis 
Figure 2.1: Comparison of microcontrollers by type of embedded memory
Figure 2.2: Operating mechanism of Flash memory 
Figure 2.3: Nonvolatile memory summary 
Figure 2.4: Operating principle of MRAM cell 
Figure 2.5: Comparison of Field-MRAM cell and STT-MRAM cell 
Figure 2.6: Trend of microcontroller with embedded Nonvolatile memory 
Figure 2.7: Trend of dynamic power and sub-threshold leakage dissipation 
Figure 2.8: Paradigm shift for power savings 
Figure 3.1: A newly proposed capacitor-coupled EEPROM cell. 
(a) A memory cell circuit, (b) A cross sectional view
Figure 3.2: The equivalent circuits in the standby state.
(a) The memory cell storing “H” data. (b) The memory cell storing “L” data.
Figure 3.3: “H” data sensing operation. 
(a) The memory cell circuit of DMS scheme. (b) The waveforms of readout voltage.
Figure 3.4: “L” data sensing operation. 
(a) The memory cell circuit of DMS scheme. (b) The waveforms of readout voltage.
Figure 3.5: The memory cell array architecture using the DMS scheme 
Figure 3.6: The clock timing diagram 
Figure 3.7: A simulated signal amplitude on the bit line. The simulation was carried out with a
3 V power supply, a cell current of 15 μA, a bit-line capacitance of 250 fF, and a Cs
capacitance of 25 fF.
Figure 3.8: The cell current dependence of the access time. 
Figure 3.9: The general endurance characteristics corresponding to the 1-Mb level EEPROM 
memory transistor. 
Figure 3.10: The estimated endurance characteristics in the case of the low voltage and short 
time programming. 
Figure 4.1: Circuit and layout diagrams of Field MRAM and STT MRAM 
Figure 4.2: Cell size trend of Field-MRAM and STT-MRAM 
Figure 4.3: Bird's-eye views of MRAM memory cell structures with conventional 1T-1MTJ and
proposed 1T-4MTJ
Figure 4.4: Top view SEM photograph of 1T-4MTJ in 130 nm technology 
Figure 4.5: Concept of readout scheme for 1T-4MTJ cell 
Figure 4.6: SRSV sensing circuit 
Figure 4.7: Self-reference amplifier portion of SRSV 
Figure 4.8: Data Amplifier portion of SRSV 
Figure 4.9: Simulation waveform of the readout operation 
(at MR = 40%, RMTJ (data "0") = 40 kΩ)
Figure 4.10: Read current of 1T-1MTJ and 1T-4MTJ MRAM 
          (x16 bits word organization) 
Figure 4.11: Block diagram of proposed embedded WWL driver
Figure 4.12: Layout of proposed embedded WWL driver
Figure 4.13: Array area comparison 
Figure 4.14: Block diagram of 1Mbit MRAM core 
Figure 4.15: 1Mb MRAM micrograph 
Figure 4.16: Measurement result of 1T-4MTJ cell  
(VCC versus Read time shmoo plot) 
Figure 4.17: MCU chip image of on-chip hierarchical MRAM solution 
Figure 4.18: Chip area reduction effect of on-chip hierarchical MRAM solution 
 
Figure 5.1: The relationship between the power consumption and the number of nodes 
Figure 5.2: Activity localization for efficient normally-off computing by task scheduling 
Figure 5.3: System diagram of normally-off multi-sensor node 
Figure 5.4: Concept of hierarchical power gating (PG) control 
Figure 5.5: Simulation results of the proposed hierarchical power-gating (PG) control in the case
of a fire alarm system
Figure 5.6: MCU switching task scheduling
Figure 5.7: Difference in energy consumption when using each standby mode of a commercially
available MCU: idle mode, three-level standby modes (deep sleep, normal sleep, light
sleep), and power-off mode [43].
Figure 5.8: Concept of proposed NVM architecture 
Figure 5.9: Block diagram of nonvolatile memory access controller 
Figure 5.10: Intruder detection system 
Figure 5.11: Outline of intruder detection evaluation system 
Figure 5.12: Block diagram of evaluation board 
Figure 5.13: Photograph of evaluation board 
Figure 5.14: Task flow of intruder detection system 
Figure 5.15: Evaluation results 

List of tables 
Table 3.1: The voltage conditions in the write, standby, and read operations: 
(a) Conventional and (b) Proposed. 
 
Table 4.1: Technology feature of 1Mb-MRAM test chip 
 
Table 5.1: Available standby modes of Renesas 32-bit MCU: RX63N [62] 
Table 5.2: States of the IR sensor and motion sensor depending on the distance of an approaching object
Table 5.3: Breakdown of energy consumption in the intruder detection system evaluation
Table 5.4: Selected appropriate standby mode of MCU according to distance of approaching at 





Chapter 1   Introduction 
1.1. Background of research area 
Semiconductor devices are now used everywhere. Among microcontrollers, devices with embedded
flash memory have become the mainstream and account for 70% of the total microcontroller
market. The market overview of microcontrollers is shown in Figure 1.1. The average growth rate
of this market is about 16%, which has remained high compared with the growth rate of the
microcontroller market as a whole. Thus, nonvolatile memory devices typified by Flash memory
are core technologies for all industries. Microcontroller applications and market volume





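Figure 1.1: Market overview of microcontrollers
Figure 1.2: MCU applications and market volume
Figure notes: *) MCU TAM: WSTS. Flash memory is embedded in almost all microcontroller devices.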
 




1.2. Objective of this study 
As mentioned in the previous section, nonvolatile memory is a key device for recent
microcontroller systems. The trend of microcontrollers with embedded nonvolatile memory is shown
in Figure 1.3.
From the end of the 1970s through the 1980s, single-chip microcontrollers with embedded mask ROM
appeared, making possible generalization through instruction set architectures as well as
real-time control. This brought a major evolution in performance and ease of use. In the late
1980s, microcontrollers with EPROM (Erasable Programmable Read Only Memory) or OTP (One Time
Programmable read only memory) appeared, allowing the user to write the program data at the
production stage. This greatly improved development and production costs. Then, in the
mid-1990s, microcontrollers with embedded Flash memory (Flash-MCUs) appeared, making it possible
to rewrite the program data after production. As a result, mass production could be set up as
soon as program development was completed, and the development period could be shortened. In
addition, the commonization of microcontrollers brought a big change in production and
distribution costs. Currently, the Flash-MCU is the mainstream microcontroller. In the future,
as semiconductor miniaturization and the greening of society advance, the conventional embedded
memories (SRAM and Flash memory) will face technology limitations such as the leakage current
that grows with process miniaturization, and a next-generation nonvolatile memory (NVRAM) is
therefore expected.
This study aims to overcome the challenges of embedding nonvolatile memory and to investigate
solutions for ultra-low-power microcontroller systems.


























Figure 1.3: Trend of microcontroller with embedded nonvolatile memory
Annotations recovered from the figure:
- Generalization with instruction set architecture
- Real-time control
- Write program at production
- Rewrite program after production
- Enables reduced development time
- Unified memory
- Enables reduced standby current
- Instant on/off













1.3. Overview of this thesis 
Figure 1.4 presents a simplified outline of this thesis. First, the background and objective of
this study are described. Chapter 2 then points out the critical issues of embedded nonvolatile
memory for microcontroller systems. The main issues of embedded nonvolatile memory design are
summarized as five challenges: low voltage operation, high endurance, high speed access, high
density, and low energy consumption system design for low power dissipation. Each challenge is
explained to clarify the objective of the study.





Figure 1.4: Outline of this thesis
Challenges of embedded nonvolatile memory for microcontrollers:
1. Low voltage operation
2. High endurance
3. High speed access
4. High density
5. Low energy consumption
Chapter 3: Approach of capacitor-coupled Flash memory
- Capacitor-coupled cell
- Dual-mode sensing scheme
Chapter 4: Approach of high density 1T-4MTJ MRAM
- High density 1T-4MTJ cell
- Voltage-offset self-reference sensing scheme
- Hierarchical embedded MRAM architecture
Chapter 5: Approach of zero standby microcontroller system
- Architecture with Normally-off technology
- Power management technology
- Task scheduling
- Autonomous standby mode transition control
- Nonvolatile memory access control
 




The next three chapters demonstrate practical nonvolatile memory design techniques that address
these challenges. Chapter 3 discusses a capacitor-coupled EEPROM design, with a
capacitor-coupled EEPROM cell and a dual-mode sensing scheme, for low voltage, high endurance,
and high speed access.
Chapter 4 discusses a practical high density 1T-4MTJ MRAM (Magnetic Random Access Memory)
design, with a 1T-4MTJ cell and a voltage-offset self-reference sensing scheme, for high
endurance, high density, and high speed access.
Chapter 5 discusses the zero standby microcontroller system technology, with a normally-off
system architecture and its power management scheme, for low voltage operation and low leakage
current from the viewpoint of both hardware and software technologies.





Chapter 2   Issue of embedded nonvolatile memory for low power microcontroller systems
This chapter compares microcontrollers by the kind of embedded nonvolatile memory they use and
describes the issues in realizing future low power microcontroller systems.
 
2.1. Introduction 
Figure 2.1 compares microcontrollers by the kind of embedded nonvolatile memory, such as Flash
memory and nonvolatile RAM (NVRAM). As mentioned in Section 1.2, the Flash-MCU is currently the
mainstream microcontroller. In the future, NVRAM is expected because the conventional embedded
memories (SRAM and Flash memory) face technology limitations, such as the leakage current that
grows with process miniaturization. In a microcontroller with embedded NVRAM, the NVRAM is used
as universal

























2.1.1. Flash memory 
The operating mechanism of Flash memory is shown in Figure 2.2. The memory transistor of Flash
memory is formed with a control gate (CG) and a floating gate (FG) over a tunnel oxide. The
threshold voltage (Vth) of the memory transistor is determined by the number of electrons
injected into the FG. Flash memory therefore needs a high voltage to inject electrons into or
release them from the FG, and its write cycle endurance is limited by the damage that this
injection and release cause to the tunnel oxide.
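The dependence of Vth on the stored charge can be made concrete with the first-order floating-gate relation ΔVth = −Q_FG / C_CG, where Q_FG is the charge on the FG and C_CG is the capacitance between the CG and the FG. This relation is a standard textbook approximation rather than something stated in this thesis, and the function name and the 1 fF capacitance used below are illustrative assumptions. A minimal Python sketch:

```python
# Sketch only: first-order threshold-voltage shift of a floating-gate transistor.
# The relation dVth = -Q_FG / C_CG is a textbook approximation, not from the thesis.

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def vth_shift(n_electrons: float, c_cg_fg_farads: float) -> float:
    """Return the Vth shift in volts, as seen from the control gate.

    n_electrons    -- number of electrons stored on the floating gate
    c_cg_fg_farads -- control-gate to floating-gate capacitance (assumed value)
    """
    q_fg = -n_electrons * ELEMENTARY_CHARGE  # stored electrons carry negative charge
    return -q_fg / c_cg_fg_farads            # positive shift for injected electrons


if __name__ == "__main__":
    # Hypothetical example: 10,000 electrons on a cell with 1 fF coupling capacitance.
    print(f"Vth shift: {vth_shift(10_000, 1e-15):.3f} V")
```

With the assumed 1 fF capacitance, 10,000 stored electrons shift Vth by about 1.6 V, which shows how the number of electrons on the FG sets the threshold.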
 
Figure 2.2: Operating mechanism of Flash memory
- MOS transistor: Vth is constant.
- Flash memory: Vth is decided by the number of electrons injected into the FG.


















2.1.2. Nonvolatile RAM 
The characteristics of various nonvolatile RAMs are summarized in Figure 2.3. The feature
expected of a next-generation nonvolatile memory used as embedded memory is the combination of
the high speed accessibility of RAM and the non-volatility of Flash memory.
 
 | PRAM | FeRAM | MRAM (Field) | MRAM (STT) | ReRAM | Flash memory | SRAM
Cell read | non-destructive | destructive | non-destructive | non-destructive | non-destructive | non-destructive | non-destructive
… | … | … | … | … | (TaO) | Si | Pure CMOS
Cell area (F²) | 6~20 | 20~40 | 10~30 | 6~20 | ~30 | 20~30 | 120
Write access | 0.1s~1ms | <50ns | <10ns | <10ns | <50ns | 30μs | 5ns
Read access | <20ns | <50ns | <10ns | <10ns | <50ns | 10~30ns | 5ns
Write # | 10^12 | 10^8~10^12 | 10^16 (∞) | 10^16 (∞) | 10^2~10^9 | 10^5 | ∞
High 


























• Magnetic field 








• Cell area  

















Here, MRAM, which is a promising candidate for embedded NVRAM in microcontrollers, is explained
as an example. The operating principle of the MRAM cell is shown in Figure 2.4. An MRAM cell
consists of one Magnetic Tunnel Junction (MTJ) element and one transistor. The MTJ element
includes a fixed ferromagnetic layer, a tunnel barrier, and a free ferromagnetic layer. The free
magnetic moment is engineered to have two stable states, parallel (0) and anti-parallel (1) to
the reference moment. The change in tunneling resistance between the states is characterized by
the magnetoresistance ratio (MR), defined by R1 = R0 × (1 + MR), where R0 and R1 are the
resistances in states 0 and 1. In a write operation, the magnetic direction of the free
ferromagnetic layer is inverted by the combined magnetic field generated by the currents in the
Bit Line (BL) and the Write Word Line (WWL). The written direction is determined by the
direction of the bit line current. In a read operation, the resistance of the MTJ is detected
from the difference of
















Figure 2.4: Operating principle of MRAM cell
- Write operation: invert the magnetic direction of the ferromagnetic film with the synthetic
  magnetic field of BL (HBL) and WWL (HWWL).
- Other labels recovered from the figure: tunnel film, MTJ element, read operation.
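As a small worked example of the MR definition R1 = R0 × (1 + MR), the following Python sketch computes the two MTJ resistances and the resulting read currents. The values MR = 40% and R0 = 40 kΩ are taken from the caption of Figure 4.9; the 0.3 V read bias and the function names are assumptions made only for illustration.

```python
# Sketch only: MTJ resistance states from the MR definition R1 = R0 * (1 + MR).
# The read bias voltage used in the example is a hypothetical value.


def mtj_resistances(r0_ohm: float, mr: float) -> tuple[float, float]:
    """Return (R0, R1): parallel and anti-parallel resistances in ohms."""
    return r0_ohm, r0_ohm * (1.0 + mr)


def read_currents(v_read: float, r0_ohm: float, mr: float) -> tuple[float, float]:
    """Return (I0, I1): read currents in amperes for data "0" and data "1"."""
    r0, r1 = mtj_resistances(r0_ohm, mr)
    return v_read / r0, v_read / r1


if __name__ == "__main__":
    r0, r1 = mtj_resistances(40e3, 0.40)      # values from the Figure 4.9 caption
    i0, i1 = read_currents(0.3, 40e3, 0.40)   # 0.3 V bias is an assumption
    print(f"R0 = {r0 / 1e3:.1f} kOhm, R1 = {r1 / 1e3:.1f} kOhm")
    print(f"I0 = {i0 * 1e6:.2f} uA, I1 = {i1 * 1e6:.2f} uA")
```

Under these assumptions R1 is 56 kΩ, and the read currents are about 7.5 μA and 5.4 μA, so the sense amplifier has to resolve a current difference of roughly 2 μA.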
 






The comparison of the Field-MRAM cell and the STT-MRAM cell is shown in Figure 2.5. Whereas
the Field-MRAM cell is written by the combined magnetic field of the BL and WWL currents, the
STT-MRAM cell is written by the spin torque produced by a current flowing through the MTJ
element. The direction of the free ferromagnetic layer is determined by the direction of that
current. The required current decreases as the MTJ size is reduced. Therefore,





Figure 2.5: Comparison of Field-MRAM cell and STT-MRAM cell
Labels recovered from the figure: (Magnetic field switching type); STT-MRAM; suitable for future
process technology node; suitable for current process technology node.












2.2. Low voltage operation 
As mentioned in Section 2.1.1, Flash memory needs a high voltage to inject electrons into or
release them from the FG. Flash memory therefore has an on-chip high voltage (around 10 V)
generator, which must produce this voltage from the power supply voltage (Vcc). However, Vcc
has decreased with process miniaturization, and it has become difficult to generate the high
voltage efficiently. Furthermore, no other circuit in the microcontroller needs the high voltage.
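To illustrate why a lower Vcc makes on-chip high voltage generation harder, the sketch below estimates how many pump stages an idealized Dickson charge pump would need to reach about 10 V. The thesis does not say which generator circuit is used, so the Dickson topology, the no-load formula Vout = (N + 1)(Vcc − Vd), and the 0.6 V diode drop are all assumptions chosen for illustration.

```python
# Sketch only: stage count of an idealized, unloaded Dickson charge pump.
# Topology, formula, and the 0.6 V diode drop are assumptions, not from the thesis.
import math


def ideal_dickson_vout(vcc: float, stages: int, v_drop: float = 0.6) -> float:
    """No-load output voltage of an ideal N-stage Dickson pump (no parasitics)."""
    return (stages + 1) * (vcc - v_drop)


def stages_needed(vcc: float, v_target: float = 10.0, v_drop: float = 0.6) -> int:
    """Smallest stage count whose ideal output reaches v_target."""
    gain_per_stage = vcc - v_drop
    if gain_per_stage <= 0:
        raise ValueError("Vcc must exceed the diode drop")
    return max(0, math.ceil(v_target / gain_per_stage) - 1)


if __name__ == "__main__":
    for vcc in (3.0, 1.8, 1.2):
        n = stages_needed(vcc)
        print(f"Vcc = {vcc} V -> {n} stages, ideal Vout = {ideal_dickson_vout(vcc, n):.1f} V")
```

Under these assumptions a 3 V supply needs 4 stages, while 1.8 V and 1.2 V supplies need 8 and 16 stages. A real pump with load current and parasitic capacitance would need more, which is one way to see the efficiency problem described above.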




2.3. High reliability 
The comparison of write cycle endurance is shown in Figure 2.3. As mentioned in Section 2.1.1,
the write cycle endurance of Flash memory is limited by the damage to the tunnel oxide caused
by injecting electrons into or releasing them from the FG. The actual write cycle endurance is
around 10^4 to 10^6 cycles. The write cycle endurance of Flash memory therefore needs to be
improved for embedded use in microcontrollers. On the other hand, the write cycle endurance of
NVRAM such as MRAM is not limited, while other NVRAMs such as PRAM (Phase change Random Access
Memory), FeRAM (Ferroelectric Random Access Memory), and ReRAM (Resistance Random Access
Memory) are still undergoing device improvement.
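The practical meaning of these endurance figures can be estimated with simple arithmetic. The Python sketch below computes how long a single memory location lasts when it is rewritten at a fixed interval. The endurance values come from Figure 2.3 and the text above, while the one-write-per-second workload and the function name are hypothetical.

```python
# Sketch only: time until a single memory location reaches its write endurance limit.
# The write interval is a hypothetical workload, not a figure from the thesis.

SECONDS_PER_YEAR = 365 * 24 * 3600


def lifetime_years(endurance_cycles: float, write_interval_s: float) -> float:
    """Years until one location wears out when rewritten every write_interval_s seconds."""
    return endurance_cycles * write_interval_s / SECONDS_PER_YEAR


if __name__ == "__main__":
    for name, cycles in [("Flash memory (1e5 cycles)", 1e5), ("MRAM (1e16 cycles)", 1e16)]:
        print(f"{name}: {lifetime_years(cycles, 1.0):.3g} years at one write per second")
```

At one write per second to the same location, 10^5 cycles are used up in about 28 hours, whereas 10^16 cycles would last hundreds of millions of years, which is why such endurance can be treated as unlimited in practice.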
 
 
2.4. High speed access 
The trend of microcontrollers with embedded nonvolatile memory is shown in Figure 2.6. Current
volatile memory such as SRAM offers high speed access of 5~10 ns but is not suited to high
density integration. The access speed of Flash memory is not as high as that of NVRAM such as
MRAM. For use as embedded memory in microcontrollers, it





























Figure 2.6: Trend of microcontroller with embedded Nonvolatile memory (recovered axis label:
integration density)
 
2.5. High density 
As shown in Figure 2.6, current volatile memory such as SRAM offers high speed access of
5~10 ns but is not suited to high density integration. Cell size reduction is therefore
required for embedded memory in microcontrollers. In particular, the Field-MRAM cell is larger
than the Flash memory cell, so its cell size must be reduced before it can replace the high
density ROM area.
 
2.6. Low standby leakage current 
The trend of dynamic power and sub-threshold leakage dissipation is shown in Figure 2.7. The
gate length shrinks with process miniaturization, and as a result the sub-threshold leakage
power increases. Reducing standby power, not only at the process level but also at the system
level, is becoming much more important for future





































































- No DC current during standby
 
Figure 2.8: Paradigm shift for power savings 
  
The paradigm shift for power savings with embedded NVRAM is shown in Figure 2.8. The embedded
memories currently used in microcontrollers, SRAM and Flash memory, suffer from the SRAM
leakage current caused by process miniaturization and from the time and power overhead of
restoring data from Flash memory at power-on. NVRAM can overcome these problems and realize
very low power operation with frequent




To realize an embedded nonvolatile memory solution for ultra-low power microcontroller
systems, the challenges mentioned above must be overcome: low voltage operation, high
reliability, high speed access, high density, and low standby leakage current. In this thesis,
the author’s approach to overcoming the challenges of embedding nonvolatile memory and





















Chapter 3   A dual-mode sensing scheme of capacitor-coupled EEPROM cell
This chapter describes a dual-mode sensing (DMS) scheme for a capacitor-coupled EEPROM cell.
A new memory cell structure and a new sensing scheme are proposed and evaluated. The new
memory cell combines an EEPROM cell with a DRAM cell. The DMS scheme uses the charge-mode
sensing of the DRAM cell in addition to the current-mode sensing of the EEPROM cell. With this
DMS technique, the sensing speed is enhanced by 36% at a cell current of 15 μA by virtue of the
additional charge-mode sensing. Furthermore, the stress applied to the tunnel oxide of the
memory transistor can be relieved by decreasing the programming voltage and shortening the
programming time. Therefore, this memory cell structure and sensing scheme make it possible to
realize both high-speed sensing in low-voltage operation and high endurance.
3.1. Introduction 
Many types of memory cell structures and memory cell array architectures have been proposed
for EEPROMs [2-5]. For use in the memory cards of hand-held computers, low-voltage operation
and high endurance are the most important design issues for high-performance EEPROMs [6-7]. In
low-voltage operation, the sensing speed and sensing margin are degraded. For high endurance,
the stress applied to the tunnel oxide of the memory transistor must be relieved [8-10].
Decreasing the programming voltage and shortening the programming time are effective in
improving the endurance characteristics. However, low voltage and short time programming
decrease the cell current, and the small cell current also degrades the sensing speed and
sensing margin. To overcome these problems, we propose a capacitor-coupled EEPROM cell and a
dual-mode sensing (DMS) scheme [11].
The capacitor-coupled EEPROM cell combines an EEPROM cell with a DRAM cell. The DMS scheme
uses the charge-mode sensing of the DRAM cell in addition to the current-mode sensing of the
EEPROM cell. The current-mode sensing is based on the previously proposed dynamic sensing
schemes that use differential sensing and source biasing techniques [12-13].
This chapter is organized as follows. Section 3.2 describes the newly proposed
capacitor-coupled EEPROM cell and its operation. Section 3.3 describes the concept and
operation of the DMS scheme and shows the memory cell array architecture that uses it.
Section 3.4 describes the enhancement of the sensing speed and discusses the possibility of
endurance improvement. Finally, a conclusion is given in Section 3.5.
  
3.2. Capacitor-coupled EEPROM cell 
 
3.2.1. Memory cell structure 
 
A memory cell circuit of the newly proposed capacitor-coupled EEPROM cell is shown in
Figure 3.1 (a). This memory cell combines an EEPROM cell with a DRAM cell. In the
capacitor-coupled EEPROM cell, the DRAM cell enhances the sensing speed and the EEPROM 
cell holds the data stored in the DRAM cell. Thus, the refresh operation for the DRAM cell is 
eliminated. 
The memory cell is composed of a select transistor ST, a memory transistor MT of the 
floating-gate-type EEPROM cell, and a capacitor Cs. Cs is formed between the drain and the 
control gate of MT. Cs acts as the storage capacitor of the DRAM cell. The signal charge 
corresponding to the data written in MT is also stored in Cs. 
As an example, a cross-sectional view of a capacitor-coupled EEPROM cell with a
three-dimensional structure is shown in Figure 3.1 (b). By adding a storage-node polysilicon layer
to the conventional EEPROM process, the cell area increase due to the additional capacitor can 
be minimized. The additional capacitor Cs is formed between the storage-node polysilicon layer 
and the control-gate polysilicon layer. The cell area penalty is estimated to be less than 10% 
compared with the conventional EEPROM cell by using this simple process technology. 
 
 
3.2.2. Memory cell operation 
 
The voltage conditions in the write, standby, and read operations are shown in Table 3.1. The
write operation is carried out in the conventional EEPROM manner. To write “H” data, bit line
BL and word line WL are pulled up to a high voltage Vpp, control gate line CGL is grounded,
and source line SL is left floating. Electrons are thus transferred from the floating gate to
the drain through the tunnel oxide of MT, and MT goes into the depletion state.
The equivalent circuit of the memory cell storing “H” data in the standby state is shown in
Figure 3.2 (a). In the standby state, SL goes to the Vcc level and CGL is grounded. As MT is in
the depletion state, the storage node of Cs is charged to the Vcc level and kept at “H” through
the channel resistance R of MT. Thus, in contrast with the conventional EEPROM cell, SL is kept
at the Vcc level, and the “H” data stored in Cs is maintained by MT in the depletion state.
Therefore, there is no need to refresh the data stored in Cs.
To write “L” data, WL and CGL are pulled up to the high voltage Vpp, and BL and SL are
grounded. Electrons are thus transferred from the drain to the floating gate through the
tunnel oxide of MT, and MT enters the enhancement state. The MT drain is grounded through BL,
and the storage node of Cs is discharged to the ground level.
The equivalent circuit of the memory cell storing “L” data in the standby state is shown in
Figure 3.2 (b). As MT is in the enhancement state, the storage node of Cs is left floating at
“L”. During the standby state, the “L” data is kept by the leakage current to the substrate, as
in usual DRAM operation. Therefore, the capacitor-coupled EEPROM cell is a kind of DRAM cell





Figure 3.1: A newly proposed capacitor-coupled EEPROM cell. 














Table 3.1: The voltage conditions in the write, standby, and read operations. 





Figure 3.2: The equivalent circuits in the standby state. 





3.3. Dual-mode sensing (DMS) scheme 
 
3.3.1. Sensing operation 
 
The proposed DMS scheme suitable for the capacitor-coupled EEPROM cell is described. 
The memory cell circuit of the DMS scheme and its waveforms of readout voltage in the case of 
sensing “H” data are shown in Figure 3.3 (a) and (b), respectively. In consideration of 
low-voltage operation, Vcc is defined as 3 V in the DMS operation. In the read operation, SL is 
kept at the Vcc level and CGL is grounded. In the standby state, BL is precharged to the Vcc/2 
level, which is equal to 1.5 V. This precharge level of BL is low enough to prevent the decrease 
of the charge stored in the MT floating gate caused by the inverse electric field applied to the 
tunnel oxide. Thus, this read disturbance problem can be avoided by the Vcc/2 precharging of BL 
in the DMS scheme. As MT is in the depletion state, the storage node of Cs is kept “H” by MT as 
described previously. In the active cycle, WL is selected and BL is pulled down through the load 
transistor LT. The charge stored in Cs is transferred to BL, and BL goes high rapidly. This 
operation is the charge-mode sensing. Simultaneously, BL is charged through MT, and goes still 
higher. This operation is the current-mode sensing. 
The DMS operation has two modes of sensing. The solid line in Figure 3.3 (b) shows the 
waveform of readout voltage in the DMS scheme, and the broken line shows that in the 
current-mode only sensing. The signal amplitude in the DMS scheme is increased compared with 
the current-mode only sensing. The shadowed part corresponds to the improvement of the signal 
amplitude by virtue of the additional charge-mode sensing. 
The memory cell circuit of the DMS scheme and its waveforms of readout voltage in the 
case of sensing “L” data are shown in Figure 3.4 (a) and (b), respectively. As MT is in the 
enhancement state, the storage node of Cs is kept in the “L” floating state. In the active cycle, 
WL is selected. The charge stored in the BL capacitance is transferred to Cs and BL goes low 
rapidly. Simultaneously, BL is discharged through LT, and goes still lower. As in the case of 
sensing “H” data, the shadowed part corresponds to the improvement of the signal amplitude by 
virtue of the additional charge-mode sensing. Therefore, this charge-mode sensing increases the 








Figure 3.3: “H” data sensing operation. 
(a) The memory cell circuit of DMS scheme. (b) The waveforms of readout voltage. 
 
 
Figure 3.4: “L” data sensing operation. 




3.3.2. Load transistor optimization 
 
The optimization of the LT size is a key point in the ideal DMS operation. In the standby 
state, BL is precharged to the Vcc/2 level. When “L” data is sensed, MT is in the 
enhancement state, and BL is pulled down to the ground level through LT. When “H” data is 
sensed, MT is in the depletion state, and BL is pulled up toward the Vcc level through MT and ST. 
However, LT is also in the ON state, and the “H” level of BL is determined by the ratio of the cell 
current to the current drive capacity of LT. The transistor size of LT should be optimized to 
equalize the “H” sensing speed to the “L” sensing speed. Here, the current drive capacities of the 
memory cell and LT are denoted as Ids and ILT, respectively, as shown in Figure 3.3 (a). The 
transistor size of LT is optimized so that ILT is equal to Ids/2. With this sizing, the net pull-up 
current for “H” data (Ids - ILT) equals the pull-down current for “L” data (ILT). 
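
The sizing rule can be checked with a first-order current balance. The sketch below is only an illustration under simplifying assumptions: the cell and LT are treated as ideal current sources, and the charge-mode contribution of Cs is ignored. The function name net_bitline_currents is hypothetical, and the 15 uA cell current is borrowed from the simulation conditions quoted in Section 3.4.1.

```python
def net_bitline_currents(ids_ua: float, ilt_ua: float) -> tuple[float, float]:
    """Return (net pull-up current for "H", pull-down current for "L") in uA.

    Assumption: the memory cell and LT behave as ideal current sources.
    For "H" data the depletion-state cell sources Ids while LT sinks ILT.
    For "L" data the enhancement-state cell is off, so only LT sinks ILT.
    """
    pull_up_h = ids_ua - ilt_ua
    pull_down_l = ilt_ua
    return pull_up_h, pull_down_l


ids = 15.0        # uA, cell current (value quoted in Section 3.4.1)
ilt = ids / 2.0   # sizing rule from the text: ILT = Ids/2
up, down = net_bitline_currents(ids, ilt)
print(f"H net pull-up: {up} uA, L pull-down: {down} uA")
```

Under these assumptions, any other choice of ILT makes one of the two data polarities slower than the other, which is why the balance point sits at Ids/2.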
 
 
3.3.3. Array architecture 
 
The memory cell array architecture using the DMS scheme is shown in Figure 3.5, and the 
clock timing diagram is shown in Figure 3.6. In Figure 3.5, VBL is equal to the Vcc/2 level. 
Sense amplifiers SA1 and SA2 are located alternately at both ends of bit lines BL1 and BL2 to 
relax the layout pitch of sense amplifiers. The bit-line precharge transistors PT1 and PT2 and the 
load transistors LT1 and LT2 are connected to bit lines BL1 and BL2, respectively. 
In the read operation, BLT2 goes “L,” and BL1 and BL2 are disconnected from SA2 and 
SA1, respectively. This architecture is similar to the open bit-line scheme of the DRAM. The 
datum written in the memory transistor MT1 is sensed and latched by the sense amplifier SA1, as 
well as the datum in MT2 by the sense amplifier SA2. 
In the differential sensing cycle, BLT1 goes “L,” and BL1 and BL2 are disconnected from 
SA1 and SA2, respectively. Thus, the datum written in the memory transistor is sensed in an 
isolated manner to enhance the sensing speed. 
In the restore cycle, BLT1 goes “H,” and BL1 and BL2 are reconnected to SA1 and SA2, 
respectively. In this cycle, when the memory transistor is in the depletion state, the storage node 
of the capacitor is charged to the Vcc-Vth level through the sense amplifier and then it is charged 
to the full Vcc level by the memory transistor in the depletion state. Thus, BLT1 does not have to 
be boosted to write full Vcc level data. 
This architecture offers a practical implementation of the DMS scheme, which is suitable for 






Figure 3.5: The memory cell array architecture using the DMS scheme 
 
 
Figure 3.6: The clock timing diagram 
  
3.3.4. Soft error immunity 
 
In the case of the DRAM cell, the alpha-particle-induced soft error problem is an important 
issue. Using the proposed capacitor-coupled EEPROM cell, the decrease of the charge stored in 
Cs caused by collecting the alpha-particle-induced charge is compensated by MT. Therefore, the 
cell mode of the soft error does not occur, and the soft error rate is improved compared with usual 
DRAM operation. 
Furthermore, the bit-line mode of the soft error is important in the dynamic sensing scheme. 
Using this DMS technique, the signal amplitude on the bit line is increased by virtue of the 
additional charge-mode sensing. Therefore, the bit-line mode of the soft error is also improved. 
 
 
3.4. Simulated results 
 
3.4.1. Sensing speed enhancement 
 
A simulated signal amplitude on the bit line is shown in Figure 3.7. This simulation was 
carried out with a 3V power supply, a cell current of 15 uA, a bit-line capacitance of 250 fF, and 
Cs capacitance of 25 fF. The signal amplitude in the DMS scheme is increased by 100 mV at 5 ns 
after word-line selection, which is a 120% improvement compared with the current-mode only 
sensing. Thus, the high-speed sensing is realized. 
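
As a rough plausibility check of these figures, the sketch below computes the ideal charge-sharing step between Cs and the bit line for “H” data, using the capacitances and supply voltage quoted above. It assumes complete, lossless charge transfer, so it gives an upper bound on the charge-mode contribution rather than the simulated value at 5 ns. It also derives the amplitudes implied by reading "increased by 100 mV, a 120% improvement" as a gain relative to current-mode only sensing; that reading and the function name charge_share_step are assumptions of this sketch.

```python
def charge_share_step(v_cs, v_bl, c_s, c_bl):
    """Ideal bit-line voltage change after Cs and the bit line share charge."""
    v_final = (c_s * v_cs + c_bl * v_bl) / (c_s + c_bl)
    return v_final - v_bl


VCC = 3.0        # V, power supply (from the text)
C_BL = 250e-15   # F, bit-line capacitance (from the text)
C_S = 25e-15     # F, Cs capacitance (from the text)

# Upper bound of the charge-mode contribution for "H" data:
# Cs starts at Vcc and BL starts at the Vcc/2 precharge level.
ideal_step_mv = charge_share_step(VCC, VCC / 2, C_S, C_BL) * 1e3

# Amplitudes implied by a 100 mV gain that equals a 120% improvement.
gain_mv = 100.0
current_only_mv = gain_mv / 1.20
dms_mv = current_only_mv + gain_mv

print(f"ideal charge-sharing step: {ideal_step_mv:.1f} mV")
print(f"current-mode only at 5 ns: {current_only_mv:.1f} mV")
print(f"DMS at 5 ns:               {dms_mv:.1f} mV")
```

The simulated 100 mV gain lies below the ideal charge-sharing bound, which is consistent with the charge transfer through the cell being incomplete or partly offset by LT at 5 ns; the simulation itself remains the authoritative result.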
The cell current dependence of the access time is shown in Figure 3.8. Here, TSE is defined 
as the period to obtain the signal amplitude of 200 mV after the word line is activated. In the 
proposed DMS scheme, TSE is improved by 36% at the cell current of 15 uA by virtue of the 
additional charge mode sensing. 
This DMS scheme has two merits. One is the enhancement of the sensing speed at the same 
cell current of the memory transistor. The other is the decrease of the cell current to obtain the 
same sensing speed compared with the current-mode only sensing scheme. 
 
 
3.4.2. Endurance improvement 
 
In consideration of the above merits, the possibility of endurance improvement is discussed. As 
  
both programming and erasing operations are carried out through the tunnel oxide, the endurance 
characteristics of the memory transistor are very important. The general endurance characteristics 
corresponding to the 1-Mb level EEPROM memory transistor are shown in Figure 3.9. Here, the 
horizontal axis shows the erase-program cycles. The upper vertical axis shows the threshold 
voltage of the memory transistor in the enhancement state, and the lower vertical axis shows the 
cell current of the memory transistor in the depletion state. In Figure 3.9, the window narrowing 
caused by the charge trapping in the tunnel oxide of the memory transistor appears at a higher 
cycle limit. This window narrowing degrades the sensing margin and the sensing speed. Here, 
the sensing operation is regarded as normal as long as TSE does not exceed 10 ns. The cell 
current required for the current-mode only sensing should then be more than 17 uA. Thus, endurance 
is limited to 10^5 cycles. On the other hand, using the DMS technique, the sensing speed is hardly 
degraded with the decrease of the cell current. Thus, the endurance characteristics are improved 
compared with those in the current-mode only sensing. Assuming that an intrinsic breakdown of 
the tunnel oxide can be extended, the DMS scheme has a possibility to improve the endurance 




Figure 3.7: A simulated signal amplitude on the bit line. The simulation was carried out with a 







Figure 3.8: The cell current dependence of the access time. 
 
 
Next, the possibility of endurance improvement is discussed from another point of view. For 
high endurance, the stress applied to the tunnel oxide of the memory transistor must be relieved. 
Decreasing the programming voltage and shortening the programming time are effective in 
improving the endurance characteristics. The estimated endurance characteristics in the case of 
low voltage and short time programming are shown in Figure 3.10. Here, TSE for 
successful sensing is defined as more than 8 ns. Using the DMS technique, the cell current can be 
decreased by 12 uA at TSE of 8 ns, which is a 55% improvement compared with the 
current-mode only sensing. The threshold voltage can also be decreased while the leakage current 
through the memory transistor is prevented. Thus, the stress of the tunnel oxide is relieved 
compared with that in the current-mode only sensing scheme. Therefore, the endurance 
















A capacitor-coupled EEPROM cell and a dual-mode sensing (DMS) scheme are proposed 
and estimated. This memory cell combines an EEPROM cell with a DRAM cell, and the cell area 
penalty is estimated to be less than 10% compared with the conventional EEPROM cell. Using 
this DMS technique, the signal amplitude on the bit line is increased by 120% at 5 ns after 
word-line selection, and the sensing speed is enhanced by 36% at the cell current of 15 uA by 
virtue of the additional charge-mode sensing. Furthermore, the cell current can be decreased by 
55% compared with the current-mode only sensing scheme. Therefore, the stress applied to the 
tunnel oxide of the memory transistor can be relieved by decreasing the programming voltage 
and shortening the programming time, and it is possible to improve the endurance characteristics. 
With this memory cell structure and sensing scheme, it is possible to realize high-speed 
sensing in low-voltage operation and high endurance. The capacitor-coupled EEPROM cell and 







Chapter 4   A high-density and high-speed 1T-4MTJ 
MRAM with voltage offset self-reference 
sensing scheme 
A 1-Transistor 4-Magnetic Tunnel Junction (1T-4MTJ) memory cell is proposed for the field 
type of Magnetic Random Access Memory (MRAM). The proposed 1T-4MTJ memory cell array 
achieves 44% higher density than the conventional 1T-1MTJ array thanks to the common access 
transistor structure in a 4-bit memory cell. We also propose a self-reference sensing scheme, 
which can read out with write-back in four clock cycles. A 1-Mbit MRAM test chip was designed 
and fabricated successfully using a 130-nm CMOS process. By applying the high-density 1T-4MTJ 
cell and partially embedding the wordline driver peripheral circuits into the cell array, the 1-Mbit 
macro size is 4.04 mm², which is 35.7% smaller than the conventional one. Measured data show that 
the read access time is 56 ns at a typical supply voltage of 1.5 V and 25°C.  Combining conventional 
high-speed 1T-1MTJ caches with the proposed high-density 1T-4MTJ user memories is an effective 
on-chip hierarchical non-volatile memory solution, which can be implemented for low-power MCUs and 




The demand for low-power and cost-effective microcontrollers (MCUs) and systems-on-a-chip 
(SoCs) is rapidly increasing for Internet-of-Things (IoT) applications. Such MCUs and SoCs for 
sensor nodes, which are composed of a CPU, external/embedded memories, RF circuits for 
connectivity, and sensors, strongly need low standby power for intermittent operation with long 
waiting periods [14]. Embedded SRAMs are widely used in MCUs and SoCs as caches and data 
memories. However, embedded SRAMs consume standby power to retain the stored data. In 
IoT sensor node systems, external non-volatile memories (NVMs) are combined with embedded 
SRAMs to achieve zero standby power while keeping the stored data. In that case, there are power 
overheads in storing/loading the data to/from the external NVM. In order to reduce the power 
overhead and remove the memory transfer operations, embedded flash solutions for MCUs/SoCs 
have been reported [15-17]. These embedded flash systems offer zero standby power 
and much higher density than embedded SRAM, but many additional process steps are needed 
to make the floating gate structure and to tolerate high voltages of over 10 V. Embedded flash 
also needs a charge pump circuit, which incurs area overhead and power overhead at power-on/off. 
Other emerging NVMs, such as Resistive RAM, Phase-Change RAM, and Ferroelectric 
RAM, have been reported [18-23], but those memories, including embedded flash memories, 
cannot achieve the 10^16 read/write operations that are required of an alternative to SRAM/DRAM 
from the endurance point of view. 
Meanwhile, the magnetic RAM (MRAM), which is non-volatile, can achieve high 
endurance of over 10^16 read/write operations. The fundamental device of MRAM is the 
Magnetic Tunnel Junction (MTJ), which has a stacked structure with a fixed or reference 
ferromagnetic layer, a tunnel barrier, and a free ferromagnetic layer [24]. The free magnetic moment 
is engineered to have two stable states, parallel and anti-parallel to the reference moment. Many 
papers on MRAM have been published [25-44]. These MRAMs are divided into 
two groups according to the writing mechanism. One is the field type MRAM and the other is the 
spin-transfer torque (STT) MRAM. Both types of MRAM have favorable attributes for a 
universal memory, namely non-volatility, low-voltage operation, and high write/read endurance, to 
replace the embedded SRAM/DRAM or flash memories. MRAM has a significant area advantage 
compared to SRAM, but still has a disadvantage compared to flash memory. A field type 
MRAM in 130 nm has been reported [25]. After that, to realize a high-density field type 
MRAM array, a cross-point MRAM cell was proposed [26, 27]. However, the sneak current 
between a bitline (BL) and the adjacent BL degrades the signal noise margin, resulting in a 
significantly slower readout speed. 
On the other hand, spin-transfer torque (STT) MRAM has been developed for advanced 
technology nodes [28-31]. The field MRAM MTJ is written by the synthetic magnetic field of the 
BL and the write wordline (WWL), whereas the STT MRAM MTJ is written by the tunnel bias, 
which requires a much larger current than readout [32]. To stably write the “0” data, which 
corresponds to the parallel state, sufficient electrons must tunnel from the reference 
layer to the free layer. The opposite polarity writes the “1” data, which corresponds to the 
anti-parallel state. Thermal energy disturbs the free moment, crucially initiating the precession write 
process but also causing disturbs during data retention and readout [33]. Therefore, an 
STT MRAM design must accommodate a certain level of soft errors occurring during write, 
storage, and read.  
Now, we consider the embedded MRAM for MCUs and SoCs. The major production nodes of 
low-end and middle-range MCUs are 90 nm to 180 nm. In these cases, field MRAM is expected to be 
the best candidate. Cost-effective embedded MRAM solutions for MCUs and SoCs at 90 nm and 
older process nodes are discussed. 
This chapter is organized as follows. In Section 4.2, we first discuss the 1T-4MTJ memory cell 
expected for cost-effective embedded MRAM at 90 nm and older processes. In Section 4.3, 
we introduce the proposed voltage offset self-reference sensing scheme for feasible readout 
operation. In Section 4.4, we show the design and evaluation results of our test chips 
fabricated in 130 nm CMOS technology with 1T-4MTJ elements. In Section 4.5, the hybrid 
embedded MRAM solution with 1T-1MTJ and 1T-4MTJ cells is described as a cost-effective 
approach that keeps CPU performance. A brief summary is given in Section 4.6.  
 
 
4.2. 1T-4MTJ memory cell structure 
 
We propose a 1T-4MTJ cell as a non-volatile memory solution that can replace embedded 
flash. A 1T-4MTJ cell consists of one transistor and four MTJ elements connected in parallel 
along one BL. The 1T-4MTJ cell solves the sneak current problem and also realizes a smaller cell 
size. It also achieves fast read access owing to the low memory cell resistance of the 
parallel-connected cell structure.  The comparison of cell circuit and layout of field MRAM 































The cell size trend of field MRAM and STT MRAM is shown in Figure 4.2. At 90 nm and 
older process nodes, the cell size of STT MRAM becomes much larger than that of field 
MRAM, because the transistor of the STT MRAM cell must be larger than that of the field 
MRAM cell to ensure sufficient cell current in the write operation. Therefore, field MRAM is 
effective at 90 nm and older process nodes and has an advantage from the process cost point 
of view. The major production nodes of low-end and middle-range MCUs are 90 nm to 180 nm. So, the 
cell size reduction of field MRAM is very important for embedded MRAM applied to MCUs. 
Therefore, 1T-4MTJ MRAM is expected to provide cost-effective embedded MRAM solutions for 


































Figures 4.3(a) and 4.3(b) show bird's-eye views of the proposed 1T-4MTJ and conventional 
1T-1MTJ cells, respectively. In the conventional 1T-1MTJ cell, a local interconnect layer (LI) 
and a local interconnect via (LV) are required in each cell to bypass the WWL and BL, resulting 
in a larger cell area. Meanwhile, the proposed 1T-4MTJ cell places four MTJ elements in parallel 
on an extended LI layer, reducing the area penalty by sharing it among the four MTJs. In the case of 
the 130 nm CMOS process, the area of the proposed 1T-4MTJ cell is reduced by 44% compared to that 
of the 1T-1MTJ cell. Figure 4.4 is the top view SEM photograph of the proposed cell fabricated by 130 








WWL  : Write Word Line
WL     : Word Line




























4.3. Voltage offset self-reference sensing scheme 
 
4.3.1. Concept of read-out scheme for 1T-4MTJ cell 
 
In the single-ended bitline structure, a differential sense amplifier scheme with a reference 
bitline is often used for high-speed read access. In that case, the small differential voltage or 
current between the reference bitline and the readout bitline is sensed by the differential sense 
amplifier. The conventional 1T-1MTJ cell can use such a differential current sense amplifier for the 
readout operation. However, the proposed 1T-4MTJ cell cannot use the differential sensing scheme with 
reference cells, because the four MTJ elements are connected to the same bitline and wordline 
through the common access transistor, and there are 2^4 (equal to 16) bitline current 
combinations according to the stored data. We therefore propose a new self-reference sensing scheme 
(SRSV), which can read out with write-back in four clock cycles. Figure 4.5 illustrates the 




































[Figure 4.5 labels: cell currents Ic0, Ic1, Ic2, Ic3; cases "Stored data=0" and "Stored data=1"]
Cell current comparison:
Dout=0, if ΣIc0,1,2,3 (1st read) > ΣIc0,1,2,3 (2nd read)
Dout=1, if ΣIc0,1,2,3 (1st read) = ΣIc0,1,2,3 (2nd read)
 
Figure 4.5: Concept of readout scheme for 1T-4MTJ cell 
 
 
The proposed SRSV readout scheme has the following four steps in four clock cycles. 
 
(1) 1st step: The 1st readout operation is performed to generate a self-reference voltage level 
according to the stored data combination of the four MTJ elements. When the selected 
wordline is activated, the currents Ic1, Ic2, Ic3 and Ic4 flow in the respective MTJ elements, 
and the sum of these currents flows in the corresponding bitline. The readout current, which 
has 2^4 multi-levels, is converted to a voltage level and temporarily held in the proposed 
sense amplifier after the wordline is deactivated. The detailed function is described in the 
next sub-section.  
(2) 2nd step: In the second clock cycle, the data 1 is written in the target MTJ regardless of 
its stored data. 
(3) 3rd step: In the third clock cycle, the 2nd readout operation is performed as a differential 
sensing against the 1st readout data. If the original stored data before the write operation of 
the 2nd step is 0, there is a difference in readout current between the 1st and 2nd readouts. 
Conversely, there is no difference between the 1st and 2nd readout currents if the original 
stored data is 1. Thus, the proposed sense amplifier compares the 1st and 2nd readout 
operations, and outputs the data 0 or 1 depending on whether a difference exists or not. 
  
(4) 4th step: When the data 0 is read out at the 3rd step, data 0 is written back to the target cell in 
the 4th clock cycle. If the stored data is 1, the write-back operation is omitted, because 
there is no need to overwrite the 1 data in the target MTJ element. 
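
The four steps can be sketched as a small behavioral model. This is an idealized illustration, not the circuit itself: each MTJ is modeled as a pure resistor, the read bias V_READ is a hypothetical value, and the comparison is exact rather than offset-based. The resistance for data "0" (40 kΩ) and the MR ratio (40%) are the cell characteristics quoted in Section 4.3.3; the function names are hypothetical.

```python
from itertools import product

R_P = 40e3              # ohm, data "0" (parallel) resistance, from Section 4.3.3
MR = 0.40               # magnetoresistance ratio, from Section 4.3.3
R_AP = R_P * (1 + MR)   # ohm, data "1" (anti-parallel) resistance
V_READ = 0.1            # V, hypothetical read bias across each MTJ


def bitline_current(bits):
    """Sum of the four MTJ currents that share one access transistor."""
    return sum(V_READ / (R_AP if b else R_P) for b in bits)


def srsv_read(cell, target):
    """Four-step self-reference read of cell[target]; cell is a list of 4 bits."""
    i_first = bitline_current(cell)    # step 1: first read, held as the reference
    cell[target] = 1                   # step 2: write "1" into the target MTJ
    i_second = bitline_current(cell)   # step 3: second read and comparison
    data = 0 if i_first > i_second else 1
    if data == 0:                      # step 4: write back only if it was "0"
        cell[target] = 0
    return data


# Check every stored pattern and every target bit.
for pattern in product((0, 1), repeat=4):
    for target in range(4):
        cell = list(pattern)
        assert srsv_read(cell, target) == pattern[target]
        assert tuple(cell) == pattern   # stored data is restored after the read

currents = [bitline_current(p) for p in product((0, 1), repeat=4)]
print(f"bit-line current range: {min(currents)*1e6:.2f} to {max(currents)*1e6:.2f} uA")
```

Because the bit-line current depends on the data in all four MTJs, the reference has to be taken from the same cell just before the write, which is what step 1 does in this model.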
 
 
4.3.2. SRSV read-out sense amplifier 
 
A self-reference sensing scheme has been reported for the 1T-1MTJ cell [36], using unbalanced 
transistors in the readout amplifier. This self-reference sensing is effective for 1T-1MTJ, but it 
cannot be applied to the 1T-4MTJ cell. Figure 4.6 illustrates the schematic diagram of the proposed 
self-reference sense amplifier with voltage offset (SRSV), which uses a charge sharing scheme. 
The SRSV has two amplifier stages, for generating the reference voltage and for comparing the data. 
In the first stage, a current sense amplifier is used to sense the bitline current, which is converted 
to a voltage level as a reference using the charge sharing circuit. In the second stage, the 
differential voltage sense amplifier compares the 2nd readout data with the generated self-reference 
data.  Figures 4.7 and 4.8 show the detailed operations of the proposed SRSV circuit. During the 
pre-read period, the VSA node is held at a constant voltage (VCC>VSA), and the voltage level of BL 
is kept at the intermediate level VSA-Vth. The 1T-4MTJ cell current is transformed to the Vgs voltage 
on the QND transistor gate, corresponding to the Ids current (= cell current of stored data "0" or "1"). 
Capacitances C1 and C2 are charged to Vgs. Next, φ1 is turned off and φ2 is turned on just before 
the end of pre-read. The tuning voltage is applied to C3 while φ1 is on. The charge sharing between 
C2 and C3 proceeds when φ2 is on (φ1 is off). Hereby, the voltage Vgs-1/2ΔV is induced 
on RSAO. The exact voltage offset is achieved by adjusting the tuning voltage as shown in Figure 4.8. 
ΔV is the voltage difference on SAO between data "1" and data "0". The read is performed after the Hi 
write cycle. The Ic(1) current (= cell current of data "1") flows through the QND transistor, whose gate 
is biased at Vgs. The Vgs voltage defined at pre-read is held by C1. Then, the drain voltage of the QND 
transistor (=SAO) varies according to the stored data "0" or "1". The relation between stored data 
and SAO/RSAO is summarized below. 
Stored data "0": SAO=Vgs-ΔV < RSAO=Vgs-1/2ΔV 
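
The role of the tuning voltage can be illustrated with an ideal charge-conservation model of the C2/C3 sharing step. The sketch below assumes lossless capacitors and no parasitics; the capacitor values, Vgs and ΔV are hypothetical numbers chosen only for illustration, and the function names are not from the thesis.

```python
def shared_voltage(c2, v2, c3, v3):
    """Voltage after connecting C2 (at v2) and C3 (at v3), by charge conservation."""
    return (c2 * v2 + c3 * v3) / (c2 + c3)


def tuning_voltage(vgs, delta_v, c2, c3):
    """Voltage to preset on C3 so that sharing with C2 yields Vgs - delta_v/2."""
    target = vgs - delta_v / 2
    return (target * (c2 + c3) - c2 * vgs) / c3


# Hypothetical values for illustration only (not taken from the thesis).
VGS, DELTA_V = 0.80, 0.10     # V
C2, C3 = 100e-15, 100e-15     # F

v_tune = tuning_voltage(VGS, DELTA_V, C2, C3)
rsao = shared_voltage(C2, VGS, C3, v_tune)
sao_data0 = VGS - DELTA_V     # SAO level for stored data "0", as stated in the text

print(f"tuning voltage = {v_tune:.3f} V, RSAO = {rsao:.3f} V, SAO(0) = {sao_data0:.3f} V")
```

In this model the reference RSAO lands halfway between Vgs and Vgs-ΔV, so the data amplifier sees margins of equal magnitude and opposite sign around the reference, provided the SAO level without a current difference stays at Vgs.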













Self-reference Amp. Data Amp.
 















































Figure 4.8: Data amplifier portion of SRSV 
 
 
4.3.3. Simulation results 
 
Figure 4.9 shows the simulated waveform for the proposed SRSV readout operation at the 
typical process condition, a typical supply voltage of 1.5 V, and 25°C. The cell characteristics are 
MR=40% and RMTJ (data "0")=40 kΩ. The simulation result shows that the read-out access time (tAC) 
is 56 ns and the readout cycle time (tC) is 80 ns at a 50 MHz clock frequency. The actual sensing 
time in the 2nd readout is 16.2 ns, as shown in Figure 4.9. 
Figure 4.10 shows the estimated read current of 1T-1MTJ and 1T-4MTJ MRAM (x16 bits 
word organization). In the 1T-4MTJ cell, the read operation is performed in four steps: 1st read, 
“1” write, 2nd read, and write back. So, there is a read current penalty because of the extra steps. 
The read current of 1T-1MTJ is 17.7 mA (tC=20 ns), and that of 1T-4MTJ is 18.9 mA in the case of 
stored data “1” and 28.2 mA in the case of stored data “0” (tC=80 ns). The read current penalty of 
the 1T-4MTJ read cycle is estimated to be +7% or +59% for stored data “1” and “0”, 
respectively. Though the read current of 1T-4MTJ is larger than that of 1T-1MTJ because of the extra 
steps, 1T-4MTJ is expected to provide cost-effective embedded MRAM because of its smaller cell size. 
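
The timing and current figures above can be cross-checked with simple arithmetic. In the sketch below, the decomposition of the access time into two full clock cycles plus the 2nd-readout sensing time is an assumption made for illustration, not a statement from the simulation; all numeric inputs are the values quoted in the text.

```python
CLOCK_MHZ = 50.0
CYCLE_NS = 1e3 / CLOCK_MHZ          # 20 ns per clock cycle

t_cycle_4mtj_ns = 4 * CYCLE_NS      # four-step read: 80 ns
# Assumption: data is valid once the 2nd readout (third cycle) has sensed.
t_access_est_ns = 2 * CYCLE_NS + 16.2

I_1MTJ_MA = 17.7                    # mA, 1T-1MTJ read current
I_4MTJ_MA = {"1": 18.9, "0": 28.2}  # mA, 1T-4MTJ read current per stored data

penalty_pct = {d: (i / I_1MTJ_MA - 1) * 100 for d, i in I_4MTJ_MA.items()}

print(f"cycle time: {t_cycle_4mtj_ns:.0f} ns, estimated access: {t_access_est_ns:.1f} ns")
for data, pct in penalty_pct.items():
    print(f"stored data {data}: read current penalty {pct:+.1f}%")
```

Under that assumption the estimate is close to the reported 56 ns access time, and the current ratios reproduce the quoted +7% and +59% penalties after rounding.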
The hybrid embedded MRAM solution with 1T-1MTJ and 1T-4MTJ cells as the 
























































Figure 4.10: Read current of 1T-1MTJ and 1T-4MTJ MRAM (x16 bits word organization) 
  
4.4. Test chip implementation and evaluation 
 
A 1-Mbit MRAM test chip was designed and fabricated using a 130-nm bulk CMOS process 
with four copper metal layers. In this test chip design, we propose WWL drivers embedded in 
the cell array to reduce the area overhead, as shown in Figure 4.11. In the conventional 1T-1MTJ 
memory cell, the active area, including the source and drain diffusion and the poly gate in the 
front-end-of-line (FEOL), is effectively utilized. In contrast, the proposed 1T-4MTJ memory cell has 
some dead space in the FEOL because only one access transistor is placed in each 4-bit cell 
with four MTJ elements. To use this open space effectively, the WWL driver circuits are embedded 
in that region. The layout image of the proposed embedded WWL driver is shown in Figure 4.12.  
The area comparison of the 1-Mbit macros of the conventional 1T-1MTJ and of 1T-4MTJ with 
and without the embedded WWL driver is shown in Figure 4.13. The 1T-4MTJ macro realizes a 23.4% 
reduction compared to the conventional 1T-1MTJ one due to the small memory cell. With 

































































[Figure 4.13 legend: SA & direct peri.; WWL & WL driver; BL driver & Y gate; memory cell.
Area reductions marked in the figure: -23.4%, -35.7%, -12.3%. Note in figure: without I/O & peri.]
 
Figure 4.13: Array area comparison 
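
The percentages in this comparison can be related to absolute areas with a short calculation. The sketch below assumes that the 4.04 mm² macro size stated for the 1T-4MTJ design with embedded wordline drivers and the 23.4% and 35.7% reductions all refer to the same conventional 1T-1MTJ baseline; whether the figure's "without I/O & peri." basis matches the stated macro size is not confirmed by the text, so the derived areas are indicative only.

```python
MACRO_EMBEDDED_MM2 = 4.04   # mm^2, 1T-4MTJ macro with embedded WWL driver (from the text)
RED_EMBEDDED = 0.357        # reduction vs. conventional, with embedded WWL driver
RED_PLAIN = 0.234           # reduction vs. conventional, without embedded WWL driver

# Implied conventional 1T-1MTJ macro area (assumes a common baseline).
conventional_mm2 = MACRO_EMBEDDED_MM2 / (1 - RED_EMBEDDED)
# Implied 1T-4MTJ macro area without the embedded WWL driver.
plain_mm2 = conventional_mm2 * (1 - RED_PLAIN)
# Additional saving from embedding the driver, in percentage points of the baseline.
extra_points = (RED_EMBEDDED - RED_PLAIN) * 100

print(f"conventional 1T-1MTJ: {conventional_mm2:.2f} mm^2")
print(f"1T-4MTJ, no embedded driver: {plain_mm2:.2f} mm^2")
print(f"extra saving from embedded driver: {extra_points:.1f} points")
```

The 12.3-point difference between the two reductions matches the third annotation in the figure, which suggests that it denotes the additional saving from the embedded WWL driver.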
  
The block diagram of the 1-Mbit MRAM macro is depicted in Figure 4.14, and the die 
micrograph of the test chip is displayed in Figure 4.15. The sub-block size is 64 kbit, and the 
1-Mbit macro is built up with hierarchical BL and WL structures. The features of the 1-Mbit 1T-4MTJ 
MRAM test chip are summarized in Table 1.  The measured shmoo plot of the supply voltage VCC versus 
readout time is shown in Figure 4.16. At 1.5 V typical supply voltage and 25°C, 2nd readout time 






[Figure labels (block diagram): Global Column Select Line; 64k bit sub array]
 


























1.700V    +. . . . . . . ***************************+
1.650V    !.         .   ***************************!
1.600V    !.         .    **************************! 
1.550V    !.         .   ***************************!
1.500V    +.         .    **************************!
1.450V    !. . . . . . . **************************+
1.400V    !.         .    **************************!
1.350V    !.         .    **************************!
1.300V    !.         .     *************************!
1.250V    !.         .     ***********************!
1.200V    +. . . . . . . . . ***********************+
1.150V    !.         .        **********************!
1.100V    !.         .         *********************!
1.050V    !.         .         *********************!
1.000V    !.         .         *********************!
950.0MV   +. . . . . . . . . . . . . . . . . . . . .+
900.0MV   !.         .         .         .         .!
+---------+---------+---------+---------+-








Figure 4.16: Measurement result of 1T-4MTJ cell 













4.5. Hybrid embedded MRAM solution with 1T-1MTJ 
and 1T-4MTJ cells 
 
Another advantage of the proposed 1T-4MTJ cell is that it is completely compatible with the 
1T-1MTJ cell in the fabrication process. This means that both 1T-1MTJ cells and 1T-4MTJ cells can 
be implemented on one die without any additional process steps. However, the proposed 1T-4MTJ 
MRAM needs four clock cycles to read out the data, which is a disadvantage in operating speed. 
Meanwhile, the conventional 1T-1MTJ MRAM can read out within one clock cycle with no need to 
write back, achieving a 5.6x faster read access time. We therefore propose an on-chip hierarchical 
MRAM solution, shown in Figure 4.17; its chip area reduction effect is shown in Figure 4.18. 
An example MCU design indicates a 20% chip size reduction while keeping its performance, by 
incorporating a 32-Mbit 1T-4MTJ MRAM macro as data memory and a 4-Mbit 1T-1MTJ MRAM macro 
as fast program memory. In this estimation, it is assumed that the total memory area (cache plus 
user data memory) accounts for 70% of the chip area and the CPU accounts for 30%. 
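
A back-of-the-envelope estimate shows that a reduction of this order is plausible. The sketch below is not the estimation method of the thesis: it assumes a baseline in which all 36 Mbit are implemented as 1T-1MTJ, that macro area scales linearly with capacity, and that the 35.7% macro-level reduction measured on the 1-Mbit test chip carries over to the 32-Mbit data memory. All three assumptions are made here only for illustration.

```python
MEMORY_SHARE = 0.70       # memory fraction of chip area (assumption stated in the text)
CPU_SHARE = 0.30          # CPU fraction of chip area (assumption stated in the text)
DATA_MBIT = 32            # 1T-4MTJ data memory capacity
PROGRAM_MBIT = 4          # 1T-1MTJ fast program memory capacity
MACRO_REDUCTION = 0.357   # 1T-4MTJ macro area reduction from the 1-Mbit test chip

# Fraction of the baseline memory area that is converted to 1T-4MTJ
# (assumes area proportional to capacity in an all-1T-1MTJ baseline).
converted_fraction = DATA_MBIT / (DATA_MBIT + PROGRAM_MBIT)

chip_reduction = MEMORY_SHARE * converted_fraction * MACRO_REDUCTION
new_chip_area = CPU_SHARE + MEMORY_SHARE * (1 - converted_fraction * MACRO_REDUCTION)

print(f"estimated chip area reduction: {chip_reduction * 100:.1f}%")
print(f"relative chip area after conversion: {new_chip_area:.3f}")
```

This first-order figure is in the same range as the 20% reduction reported above; the difference is expected, since the thesis estimate is based on the actual macro designs rather than on linear scaling.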
The bank (sub-array) interleave operation (pipeline operation), such as data transfer on the same 
bank and same word line and data transfer on a pre-activated different bank, is an effective method to 






















































A 1T-4MTJ memory cell has been proposed for the field type of MRAM. The proposed 1T-4MTJ 
memory cell array achieves 44% higher density than the conventional 1T-1MTJ array thanks to 
the common access transistor structure in a 4-bit memory cell. We also proposed a self-reference 
sensing scheme, which can read out with write-back in four clock cycles. A 1-Mbit MRAM test 
chip was designed and fabricated successfully using a 130-nm CMOS process. By applying the 
1T-4MTJ high-density cell and partially embedding the wordline driver into the cell array, the 1-Mbit 
macro size is 4.04 mm², which is 35.7% smaller than the conventional one. Measured data show that 
the read access time is 56 ns at a typical supply voltage of 1.5 V and 25°C.  Combining conventional 
high-speed 1T-1MTJ caches with the proposed high-density 1T-4MTJ user memories is an effective 
on-chip hierarchical non-volatile memory solution, which can be implemented for low-power 







Chapter 5   Low-power multi-sensor system with 
power management and nonvolatile 
memory access control for IoT applications 
 
 
We propose a low-power multi-sensor system with power management and nonvolatile 
memory access control for IoT applications, which achieves almost zero standby power in the 
no-operation modes. A power management scheme with activity localization can reduce the 
number of transitions between power-on and power-off modes by rescheduling and bundling 
task procedures. In addition, autonomous standby mode transition control selects the optimum 
standby mode of the microcontroller, reducing total power consumption. We demonstrate the system 
on an evaluation board as a use case of IoT applications, observing a 91 percent power reduction by 
combining task scheduling with autonomous standby mode transition control. 
Furthermore, we propose a new nonvolatile memory access control technology and estimate its 
potential for further power reduction in the future. 
 
5.1. Introduction 
Widely deployed sensor nodes play a significant role in gathering large amounts of real-time 
information in the Internet of Things (IoT) era. The number of sensor nodes is rapidly increasing 
along with device technology scaling and progress in wireless communication. The cyber-physical 
system [45], which combines large-scale data processing by servers and the cloud on the internet 
with sensor nodes that control sensors and actuators over the network, is one picture of what 
next-generation embedded systems aim for. In the future, information equipment, humans, and things 
will be connected to the network, and large amounts of data will be handled to realize advanced 
services. Thus, sensor nodes are used extensively to gather real-time information in the social and 
natural environments, and their volume will increase with the development of cyber-physical 
systems. Many advanced big-data services are provided by using real-time information 




The relationship between the power consumption and the number of nodes for various
applications is shown in Figure 5.1. Especially in IoT applications, it becomes important to
reduce the power consumption of huge numbers of sensor nodes without reducing the processing
performance.
With technology scaling, static power increases rapidly due to the large leakage of
MOSFETs, becoming comparable to dynamic power consumption [46]. So far, power reduction
techniques such as clock gating, power gating [47], [48], [49] and dynamic
voltage frequency scaling (DVFS) [50], [51], [52], [53] have been proposed. However, these
have focused on dynamic power reduction. From the software development point of view, a
design aid for multi-core embedded systems with an energy model has been proposed [54], but it
still targets dynamic power. In sensor nodes that realize advanced services in the big data
era, it is necessary to reduce both the dynamic and standby power of the multiple sensors while
providing higher processing performance in the microcontroller units (MCUs). Usually, while
waiting for sensor sampling, the sensors and MCU in the sensor nodes consume standby power,
which should be reduced as much as possible.
In this paper, we demonstrate the low-power effect of combining task scheduling and
autonomous standby mode transition control in a multi-sensor system without reducing the
processing performance. In the evaluated use case, an optimal combination of previously
proposed task scheduling algorithms [55], [56] and optimal standby mode selection [57] is applied.
Furthermore, we propose a new nonvolatile memory access control technology, and estimate the



















Figure 5.1: The relationship between the power consumption and the number of nodes   
  
5.2. Normally-off architecture for low-power multi-sensor systems
 
In this section, we describe the key technology of normally-off computing and a normally-off
computing system for multi-sensor systems.
 
 
5.2.1. Key technology of Normally-off computing 
 
The basic concept of normally-off computing is zero standby power in the no-operation mode,
while dynamic and static power are consumed only for the duration of the task workloads
involved. To reduce the total power and improve the energy efficiency of the sensor network
system, it is important to consider the breakeven time (BET). Otherwise, normally-off
computing systems can in some cases consume more power than conventional systems, because
typical systems consume additional power in the wake-up and shutdown transition phases of
each electrical component. The key challenge in reducing the total power of a normally-off
computing system is to maximize the no-operation period and/or to minimize the BET, by the
following two approaches. One is to maximize task workload efficiency by rescheduling or
other architectural techniques. The other is to reduce the power overhead of the wake-up and
shutdown transition phases by introducing a new type of embedded non-volatile memory [58]
instead of conventional embedded/external flash memories.
To maximize task workload efficiency, we propose activity localization by task
rescheduling for achieving normally-off computing [59], [60]. Figure 5.2 shows an example of
activity localization for efficient normally-off computing by task scheduling. According to the
application, the operation subsets (op1-op8) are specified with processing sequences assigned to
unit-A and unit-B. Subset op5 in unit-A is executed after op3 in unit-B.
Originally, the waiting period between op1 and op5 in unit-A is shorter than the BET, so unit-A
should be kept in power-on mode. By introducing activity localization with task rescheduling to
lengthen the waiting period between op1 and op5 in unit-A, unit-A can transition to power-off
mode and save energy effectively, because the waiting time is extended beyond the BET.
Another approach to reducing the total power in normally-off computing is to reduce the
power overhead of the wake-up and shutdown transition phases. This is expected from new types
of embedded non-volatile memory. Current embedded/external flash memories need a charge pump
circuit to generate a high voltage internally for writing and erasing data, which consumes a
large wake-up and shutdown power overhead. Meanwhile, new non-volatile memories have the
advantage of less power overhead at power-on/off transitions, because they need no charge pump
circuits and have small writing power. These new types of non-volatile memory can shorten the
BET, enabling fine-grained power gating in normally-off computing.
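
The break-even reasoning in this sub-section can be illustrated with a minimal sketch. It assumes a simple model that is not taken from the thesis: powering off saves the difference between idle power and power-off power for the whole idle period, and costs a fixed transition energy (shutdown plus wake-up). The function names and all numeric values are hypothetical.

```python
def break_even_time(e_transition_j, p_idle_w, p_off_w=0.0):
    """Idle duration (s) at which powering off costs the same as staying on."""
    saved_power_w = p_idle_w - p_off_w
    if saved_power_w <= 0:
        raise ValueError("power-off state must draw less power than idling")
    return e_transition_j / saved_power_w


def worth_powering_off(idle_time_s, e_transition_j, p_idle_w, p_off_w=0.0):
    """True when the idle period exceeds the BET, so power-off saves energy."""
    return idle_time_s > break_even_time(e_transition_j, p_idle_w, p_off_w)


# Hypothetical numbers: same idle power, different transition overheads.
# A large overhead stands in for a charge-pump-based flash memory,
# a small one for a new type of non-volatile memory.
bet_large_overhead = break_even_time(e_transition_j=2e-3, p_idle_w=1e-3)
bet_small_overhead = break_even_time(e_transition_j=0.2e-3, p_idle_w=1e-3)
```

Under these assumed values, a 1 s idle period is too short to justify power-off with the large overhead but long enough with the small one, which is the sense in which a lower transition overhead shortens the BET.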
 
 
5.2.2. Normally-off multi-sensor systems
 
Figure 5.3 illustrates system diagrams of multi-sensor nodes. The conventional sensor node
consists of sensor modules and a microcontroller (MCU). Power is always supplied to all
sensor modules and the MCU. Sampling and averaging of the sensor data are performed
periodically by the MCU according to the requirements of the system. After that, the processed
data are sent out to the sensor network through the sensor-net interface. In the conventional
system, standby power is consumed in both active mode and standby (waiting) mode. Meanwhile, a
system diagram of the normally-off sensor node [42] is illustrated in Figure 5.3 (b). It is
composed of a power management unit with a real-time clock (RTC), a sensor module connected to
several types of sensors, and an MCU, which has a wireless network interface unit. The sensor
module has a data interface for each sensor, a sensor controller (MCUlow), and sensor data
buffers to store the sensed data from the multiple sensors. The power supplies Vcc are
independently connected to the sensors, sensor controller, sensor data buffers, and MCU,
respectively. The roles of these













[Figure 5.2: Example of activity localization by task scheduling. Labels: op1, op5, op6, op8. BET: break-even time. If t < BET, no chance for power-off; if t > BET, chance for power-off.]
 











































































(b) Normally-off system 
 
Figure 5.3: System diagram of normally-off multi-sensor node   
  
(a) Power Management Unit
The Power Management Unit controls power on/off in all units according to the data stored in
the Power Statement Register. It manages task-level scheduling for activity localization and
autonomous standby mode transition.
 
(b) Sensor controller
In the proposed normally-off system, a sensor controller is newly added simply to execute
the data sampling of the sensors instead of MCUhigh. In contrast, sampling is done by the main
MCU in the conventional system.
 
(c) NVRAM buffer
Sensor sampling data are accumulated in this buffer by the Sensor Controller. The
accumulated and bundled sampling data are forwarded to MCUhigh, where the data
processing is executed. The NVRAM buffer reduces the number of power on/off transitions,
reducing the power overhead of transitions.
 
The target of normally-off power management in this work is to achieve optimal power
control in consideration of the break-even time (BET) in sensor nodes. We propose hierarchical
power gating and autonomous standby mode transition control, combined with activity
localization by task scheduling. These ideas make it possible to optimize energy in a
multi-sensor network system and to improve usability by easing application software
development. The details of the two ideas are described in the following sub-sections.
Here, the sensor modules and microcontroller are normally off and/or intermittent, and the
power management unit and real-time clock (RTC) are always on.
 
 
5.2.3. Hierarchical power gating control with activity localization 
Here, the basic idea of activity localization is explained. The concept of hierarchical power
gating (PG) control is shown in Figure 5.4. In conventional PG control, the MCU is activated
every time a sensor data sampling process is performed (Figure 5.4 (a)). Therefore, the total
energy consumption grows with the number of MCU power-on/off cycles. In hierarchical PG
control, sensing data are temporarily buffered in the sensor data buffer, and the MCU is then
activated and processes them as a bundle (Figure 5.4 (b)). Therefore, it is possible
to optimize the number of power on/off cycles and greatly decrease the total energy consumption.
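
The effect of bundling on the number of MCU power cycles can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation model behind Figure 5.5: each MCU activation is assumed to cost a fixed transition energy, each sample a fixed processing energy, and the energy of the sensor controller and buffer is ignored. All names and values are hypothetical.

```python
import math


def mcu_energy(n_samples, bundle_size, e_transition, e_process):
    """MCU energy when samples are processed in bundles of bundle_size.

    bundle_size = 1 models conventional PG control (one activation per
    sample); larger values model hierarchical PG control with buffering.
    """
    activations = math.ceil(n_samples / bundle_size)
    return activations * e_transition + n_samples * e_process


# Hypothetical energy units: 5 per power-on/off cycle, 1 per processed sample.
conventional = mcu_energy(n_samples=100, bundle_size=1, e_transition=5, e_process=1)
hierarchical = mcu_energy(n_samples=100, bundle_size=10, e_transition=5, e_process=1)
```

In this model the processing energy is unchanged by bundling; only the transition term shrinks, in proportion to the bundle size.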
Figure 5.5 shows the energy consumption of conventional and proposed PG control versus the
sampling period of a temperature sensor, for instance in the case of a fire alarm system. The energy














Figure 5.5: Simulation results of proposed hierarchical power-gating (PG) control






















(a) Conventional PG control             (b) Hierarchical PG control 
 
Figure 5.4: Concept of hierarchical power gating (PG) control   
  
5.3. Normally-off power management method 
 
The target of normally-off power management in this work is to achieve optimal power
control, for which two ideas are adopted. One is a task scheduling technology and the
other is an autonomous standby mode transition technology. These ideas make it possible to
optimize energy in a multi-sensor network system and to improve usability by easing
application software development.
 
 
5.3.1. Task scheduling technology 
 
In most embedded systems, tasks arrive periodically and are then executed. The input interval
length is defined as the time between the arrivals of successive tasks. In recent IoT
applications, processors are required to execute a wide variety of tasks with different input
intervals and execution times, as shown in Figure 5.6 (a). Each task recurs periodically. In
Figure 5.6 (b), there are two processors: MCUhigh, a high-performance MCU, and MCUlow,
an energy-efficient MCU. In this figure, the y-axis shows the relative power consumption.
However, ASAP (as soon as possible) scheduling, which is the simplest algorithm, uses only
MCUhigh and cannot take advantage of the heterogeneous multi-core configuration at all.
Therefore, adaptive task scheduling, which includes execution timing management and dynamic
active core selection, is indispensable for low-power embedded systems. To cope with this
challenge, several energy-efficient algorithms have been proposed.
When the deadline is longer than the input interval, lumped scheduling, which
executes multiple tasks consecutively, is adopted as shown in Figure 5.6 (c). To realize this,
a task is not executed immediately after it becomes available. Its execution is postponed for
as long as the deadline constraint can still be met. After several tasks are ready, execution
starts, and the tasks are executed consecutively on each MCU [55].
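
A minimal sketch of the lumping idea described above is given below. It is not the algorithm of [55]: it assumes a single core, identical execution times, and a greedy rule that keeps adding arrivals to a batch as long as every task in the batch can still finish before its own deadline when the batch starts at the latest arrival. The function name and the example values are illustrative assumptions.

```python
def lumped_batches(arrivals, deadline, exec_time):
    """Group task arrival times into batches that are executed back to back.

    A batch starts when its last member arrives. An arrival joins the
    current batch only if, with that start time, every task in the batch
    still completes within `deadline` of its own arrival.
    """
    batches = []
    batch = []
    for t in sorted(arrivals):
        candidate = batch + [t]
        start = t  # earliest time at which all candidate tasks have arrived
        feasible = all(
            start + (k + 1) * exec_time <= a + deadline
            for k, a in enumerate(candidate)
        )
        if batch and not feasible:
            batches.append(batch)  # close the current batch
            batch = [t]            # the new arrival opens the next one
        else:
            batch = candidate
    if batch:
        batches.append(batch)
    return batches


# Illustrative values: tasks arriving every 20 s with a 60 s deadline.
example = lumped_batches([0, 20, 40, 60, 80, 100], deadline=60, exec_time=1)
```

With these values each batch holds three tasks, so the core is woken once per three arrivals instead of once per arrival.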
A more effective approach is a task scheduling algorithm that takes advantage of the longer
deadline, as shown in Figure 5.6 (d). In this execution, the usage of the energy-efficient core
(MCUlow) is maximized to maximize energy efficiency during execution and the number of core










5.3.2. Autonomous standby mode transition technology 
 
In some applications, it may not be possible to power off because the sampling period is short.
To maximize the low-power effect, the autonomous standby mode transition technology is proposed:
it autonomously selects and transitions to the optimum standby mode of the MCU to
minimize power consumption by quantifying the constraints of hardware and software.
Selecting the optimal standby mode requires understanding the constraints on both the
hardware side and the software side. The hardware-side constraint is the breakeven time (BET),
which is derived from the standby power consumption specific to each device and the energy
overhead of the state transitions between standby and active. The constraint of the software side is
















[Figure 5.6: task scheduling example with MCUhigh and MCUlow (tasks 1-5). Panel labels: "Scheduling at each Task"; "Scheduling between all Tasks". Annotations: minimal execution of MCUhigh so as not to cause a deadline violation; required MCU performance: Task-A: MCUlow, Task-B: MCUhigh.]

















Commercially available MCUs for sensor network applications support several standby
modes. For instance, the available standby modes of the Renesas 32-bit MCU RX63N [62] are shown
in Table 5.1. Figure 5.7 schematically illustrates the difference in energy consumption when
using each standby mode. In this case, the standby modes have three levels: deep sleep (Mod-3),
normal sleep (Mod-2), and light sleep (Mod-1). In deep sleep mode, the standby power is much
lower than in the normal and light sleep modes; however, the wake-up time is longer and the
power overhead of transitions between active mode and standby mode is much higher than in the
others. Conversely, the light sleep mode has the advantage of a small transition power
overhead, but its standby power reduction is smaller than that of the other modes. There is a
trade-off between the standby power reduction and the transition time/power overheads. The
intersections of the graphs give the breakeven time (BETn) of each standby mode, which shows
that there is an optimal standby mode that minimizes energy consumption. In the case of
Figure 5.7, the optimal standby mode is IDLE when the task cycle is shorter than BET1, Mod-1
from BET1 to BET2, Mod-2 from BET2 to BET3, Mod-3 from BET3 to BET4, and power off when longer
than BET4. The optimum standby mode should be selected dynamically according to the task
workloads in use cases to

































[Figure 5.7 plot. Axes: task cycle (x), energy consumption (y). Labels: energy consumption while active state; IDLE, Mod-1, Mod-2, Mod-3, Power off; crossover points BET1 to BET4. BETn: break-even time of each standby mode.]





Figure 5.7: Difference in energy consumption when using each standby mode: idle mode,
three-level standby modes (deep sleep, normal sleep, light sleep), and power-off mode of a
commercially available MCU [62].
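
The selection rule illustrated by Figure 5.7 can be sketched with a linear energy model: each mode is assumed to cost a fixed transition energy plus its standby power multiplied by the idle time, and the mode with the lowest total is chosen. The crossover points of these lines play the role of BET1 to BET4. The mode parameters below are hypothetical round numbers chosen only to reproduce the ordering in the figure; they are not RX63N datasheet values.

```python
# Hypothetical parameters per mode: transition energy in microjoules and
# standby power in microwatts. Deeper modes trade overhead for lower power.
MODES = {
    "IDLE":      {"e_transition_uj": 0,      "p_standby_uw": 10000},
    "Mod-1":     {"e_transition_uj": 1000,   "p_standby_uw": 5000},
    "Mod-2":     {"e_transition_uj": 5000,   "p_standby_uw": 1000},
    "Mod-3":     {"e_transition_uj": 20000,  "p_standby_uw": 100},
    "Power off": {"e_transition_uj": 100000, "p_standby_uw": 0},
}


def standby_energy_uj(mode, idle_time_s):
    """Energy spent in one idle period: transition overhead plus standby power."""
    return mode["e_transition_uj"] + mode["p_standby_uw"] * idle_time_s


def select_standby_mode(idle_time_s, modes=MODES):
    """Return the name of the mode with the lowest energy for this idle time."""
    return min(modes, key=lambda name: standby_energy_uj(modes[name], idle_time_s))
```

With these assumed parameters the crossovers fall at roughly 0.2 s, 1 s, 17 s, and 800 s, so for example `select_standby_mode(5)` picks Mod-2 and `select_standby_mode(1000)` picks power off.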
  
5.4. Nonvolatile memory access control technology 
 
Furthermore, to make the system effectively normally-off, the program memory and the
data memory must be nonvolatile. In general, however, nonvolatile memory (NVM) is said to
draw more current during read/write periods than volatile memory.
Here we consider a normally-off control method that aims at power reduction by cutting off
power supplies that are not directly related to the current NVM access.
Although low-power embedded nonvolatile memory technologies have been proposed [63],
[64], in them the memory area is composed of a plurality of blocks, and the blocks are not
subjected to power control that considers the power-on overhead energy and time.
Figure 5.8 shows the concept of the proposed NVM architecture. Figure 5.8 (a) is the
conventional NVM architecture, in which memory is accessed all the time during CPU operation
and the whole memory is powered on. Figure 5.8 (b) shows the NVRAM divided into blocks whose
power supplies are controlled independently, so that only the blocks needed for access are
turned on. Unnecessary blocks are turned off immediately. By reading the access address
exchanged between the CPU and the NVM, and determining the necessary block from the code being
read, it is possible to




























(a) Conventional                             (b) Proposed 
 
Figure 5.8: Concept of proposed NVM architecture 
  
Figure 5.9 shows the block diagram of the newly proposed nonvolatile memory (NVM)
access controller. The NVM controller has the following functions for memory power control.
 
 
(a) Control register
 
The control registers comprise a hardware setting register for setting the type and capacity
of the NVM, a memory use area setting register for setting whether or not the mounted memory is
used for each task, and various registers for setting the operation of the power control
function in detail.
 
 
(b) Instruction address judgement
 
The current instruction address can be obtained from the value of the control register
that designates the storage destination of the instruction program, the start address of each
block, and the address accessed by the CPU. When the address of the memory accessed by the CPU
matches the start address of the storage destination block of the instruction program, the
access address becomes the current instruction address.



























 
Figure 5.9: Block diagram of nonvolatile memory access controller 
  
In a block for which instruction address judgment is set to valid, power control is
performed according to the following conditions.
1) Turn on the power of the block that includes the current instruction address.
2) Turn on the power of the block at the address obtained by adding, to the current instruction
address, the value of the register that determines how far ahead blocks are to be turned on.
3) Set the block power supply that does not correspond to the address obtained by adding the




(c) Instruction decode judgment
 
In a block for which instruction decode judgment is set to valid, power control is
performed according to the following conditions.
1) When a branch instruction is read, all blocks that are in standby mode, belong to the
instruction program storage, and are specified in the memory use area are turned on. After the
instruction execution time set in the control register has elapsed, all blocks except the one
currently being accessed are set to standby mode.
2) When a memory access instruction is read, all blocks that are in standby mode, belong to the
data storage destination, and are specified in the memory use area are turned on. The power
stays on until the instruction execution time set in the control register has elapsed and
instructions other than memory access instructions have then been read consecutively for the
specified number of times or more; the blocks are then set to standby mode.
 
 
(d) BET Judgement
 
The breakeven time (BET) is defined as the idle time at which the energy consumption
with power control equals that without power control. BET is determined by preset power
parameters, namely the ratio of the power-on overhead energy to the standby (idling) power.
The power supply of a memory block is turned off only when the idle time is longer than
the BET.
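
The BET judgement for a single memory block can be sketched as below. This is an illustrative model under stated assumptions, not the controller logic itself: the BET is taken as the power-on overhead energy divided by the standby power, idle gaps are assumed to be known in advance, and a powered-off block is assumed to draw no power. Names and values are hypothetical.

```python
def block_idle_energy(idle_gaps_s, p_standby, e_power_on):
    """Energy spent by one NVM block over a list of idle gaps.

    The block is powered off only for gaps longer than the BET; it then
    pays the power-on overhead once. Shorter gaps are spent in standby.
    """
    bet = e_power_on / p_standby
    total = 0.0
    for gap in idle_gaps_s:
        if gap > bet:
            total += e_power_on        # power cut, overhead paid at re-power-on
        else:
            total += p_standby * gap   # too short to pay off, stay in standby
    return total


# Hypothetical units: standby power 2 per second, power-on overhead 10 (BET = 5 s).
gaps = [1, 4, 10, 20]
gated = block_idle_energy(gaps, p_standby=2, e_power_on=10)
always_on = 2 * sum(gaps)
```

Gating every gap regardless of length would cost 40 in this example, more than the 30 obtained with the BET check, which is why the short gaps are left in standby.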
 
 
(e) Power setting (Memory block control)
 
Registers are provided for forcing the power supply of a specified NVM to be always on
and for displaying the power setting status of the NVM. Whether or not to turn on the power
supply of the NVM of each block is determined by the setting registers and the access state.
The following two factors set the power supply state.
 
1) ON setting by instruction address judgment
2) ON setting by instruction decode judgment
 
 
(f) Wait control
 
If the main board accesses the NVM while the NVM power supply is off, the NVM cannot
respond immediately, so wait control asserts the wait signal to the main board. The required
wait time is set in a register according to the characteristics of the device. By waiting
until the internal counter reaches the value of the setting register after the power supply is
turned on again, the NVM can be accessed after the power supply state has stabilized.
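
The wait control can be pictured with a small behavioural model. This is only a sketch under assumptions: the class, its attribute names, and the cycle-based counting are hypothetical stand-ins for the wait setting register and internal counter described above, not a description of the actual hardware.

```python
class NvmBlock:
    """Behavioural model of one NVM block with counter-based wait control."""

    def __init__(self, wait_cycles):
        self.wait_cycles = wait_cycles  # stands in for the wait setting register
        self.powered = False
        self.counter = 0                # internal counter, restarted at power-on

    def access(self):
        """Return the number of wait cycles the requester sees for this access."""
        if self.powered:
            return 0                    # supply already stable, no wait asserted
        self.powered = True             # power the block back on
        self.counter = 0
        while self.counter < self.wait_cycles:
            self.counter += 1           # wait signal stays asserted
        return self.counter

    def power_off(self):
        self.powered = False
```

In this model only the first access after a power-off sees the wait; subsequent accesses to the same powered block proceed without delay.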
 
 
(g) Memory use area designation
 
For each task, the application program sets OFF or standby mode for the NVM in the
memory use area specification register. The control to turn on the power supply of each NVM



















5.5. Evaluation result 
We demonstrate the above task scheduling and autonomous standby mode transition
technologies with an intruder detection monitor system as a use case of IoT applications
(Figure 5.10). The outline of the intruder detection evaluation system is shown in Figure 5.11.
Here, an evaluation board with multiple sensors was developed and demonstrated. The block
diagram and photograph of the evaluation board are shown in Figure 5.12 and Figure 5.13,
respectively. Here, Renesas MCUs of RL78 (16 bits, max. freq. = 32 MHz, average current = 4 mA) and RX63N (32



















In conventional case,
- The sampling period of the infrared sensor does not depend on the distance of the
approaching object. (Predetermined to 50 ms by user program)
- The intruder detection is performed at every sampling of the motion sensor.
Figure 5.10: Intruder detection system 
 





















































Figure 5.13: Photograph of evaluation board 
  
The task flow of the intruder detection system is shown in Figure 5.14. In this use case, the
sampling period of the infrared (IR) sensor and the standby mode of the microcontroller are
determined by the distance of the approaching object using the autonomous standby mode
transition technology, and the intruder detection judgement is relaxed according to the
distance of the approaching object using the task scheduling technology, while the sampling
period of the motion sensor is kept constant to maintain the detection accuracy, as shown in
Table 5.2.






IR sensor sampling period      Prop.  500 ms / 100 ms / 50 ms
Motion sensor sampling period  Conv.  30 ms fixed (judgement: 30 ms fixed)
                               Prop.  30 ms fixed (judgement: 500 ms / 100 ms / 50 ms)
 
[Figure 5.14 flow labels: Start; data sampling at IR sensor (period = 50 ms, fixed); (...) from IR sensor data; data sampling at IR sensor (period = 50/100/500 ms); set IR sampling period and intruder detection period.]
 
Figure 5.14: Task flow of intruder detection system 
  
In this use case, it is assumed that an intruder approaches once every 1,000 s. As a
result, a power reduction of around 91 percent is obtained by combining the task scheduling and
autonomous standby mode transition control technologies (Proposed-1), where the
evaluation period is 1,000 s and the intruder detection time is 3 s per 1,000 s evaluation
period (Figure 5.15). In the demonstrated condition, “conventional” corresponds to ASAP
scheduling in Figure 5.6 (b), and “proposal” corresponds to lumped scheduling in Figure 5.6 (c).
The basic idea of normally-off power management in this work is as follows.
-  Lump the tasks together as much as possible within the deadline period and lengthen
the standby (idle) period with the task scheduling technology.
- Select the optimum standby mode in the extended standby period with the autonomous standby
mode transition technology.
The energy consumption improves further as the number of bundled tasks increases. Both functions












[Figure 5.15 conditions: evaluation period = 1,000 s; intruder detection period of 3 s per 1,000 s.]


















The breakdown of energy consumption in this evaluation is shown in Table 5.3, and the
appropriate MCU standby mode selected according to the distance of the approaching object in
this use case is shown in Table 5.4. In the conventional case, since the IR sensor operates at
a sampling period of 50 ms and the motion sensor operates at a sampling period of 30 ms, these
sensors and the MCU cannot be powered off. In the proposed case, the sampling period of the IR
sensor and the intruder detection judgement are relaxed by the autonomous standby mode
transition and task scheduling technologies, which are controlled by the power manager. The
sampling period of IR sensor is




Energy consumption of each element [J]
Element          Conventional   Proposed-1   Proposed-1 with NVM access control
MCU (RX63N)      161.1          6.9          6.9
MCU (RL78)       -              6.9          6.9
IR sensor        306.5          14.7         14.7
Motion sensor    32.7           11.2         11.2
ZigBee           56.7           6.7          6.7
Power Manager    -              0.2          0.2
NVRAM            -              2.0          1.6
Total            557.0          48.6         48.2
 
Table 5.4: Selected appropriate standby mode of MCU according to distance of  


















extended to 500 ms when the distance of the approaching object is over 10 m, and to 100 ms when
it is 5~10 m. The sampling period of the motion sensor is fixed at 30 ms to maintain the
detection accuracy, and these sampling data are stored in NVRAM. However, the intruder
detection judgement is also relaxed to 500 ms when the distance of the approaching object is
over 10 m, and to 100 ms when it is 5~10 m. This means that the deadline period of the motion
sensor data is 500 ms at > 10 m and 100 ms at 5~10 m, so the number of bundled tasks increases
and the idle time can be extended. As a result, the IR sensor, motion sensor, and MCU can be
powered off in the standby state, and energy consumption is greatly reduced, as shown in
Table 5.3. In this evaluation, the power manager is implemented with FPGAs.
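
The distance-dependent relaxation described above can be written as a small lookup. This is a sketch under assumptions: the 50 ms value below 5 m is inferred from the period values listed for the proposed case in Table 5.2, the treatment of the exact 5 m and 10 m boundaries is a guess, and the function names are hypothetical.

```python
MOTION_PERIOD_MS = 30  # motion sensor sampling period, fixed in this use case


def relaxed_period_ms(distance_m):
    """IR sampling period and judgement deadline (ms) for a given distance."""
    if distance_m > 10:
        return 500
    if distance_m >= 5:
        return 100
    return 50  # assumption: value used below 5 m


def bundled_motion_samples(distance_m):
    """Whole motion-sensor samples that fit into one judgement deadline."""
    return relaxed_period_ms(distance_m) // MOTION_PERIOD_MS
```

With a 500 ms deadline and a 30 ms sampling period, 16 whole samples fit into one deadline, which agrees with the maximum number of bundled tasks quoted for the intruder detection system.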
In addition, we estimated the low-power effect of the newly proposed nonvolatile
memory (NVM) access control technology. In this demonstration, the sensing data of the IR
and motion sensors are stored in nonvolatile memory blocks. The information on which data are
stored in which NVM block is held in the memory use area designation. Thereby, an NVM block
is turned on only when it is accessed, under the control of the NVM access controller. In this
demonstration case, an NVM access power reduction of around 18 percent is estimated by
simulation with the above function. Here, the NVM memory area is divided into 16 blocks and
each block is individually power controlled, as shown in Figure 5.9. As a result, a total power
reduction of around 1 percent is expected in this intruder detection system. In this
application, the power consumed outside the memory is relatively large, so the effect is small,
but this proposal is an important technology for IoT applications requiring low power
consumption. For applications that perform memory access frequently, such as image recognition,
an energy improvement from the proposed NVM access control technology can be expected.
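
The percentages quoted in this section can be cross-checked against the energy values in Table 5.3. The sketch below assumes that the first numeric column of the table is the conventional case, the second is Proposed-1, and the third is Proposed-1 with NVM access control; the variable and function names are illustrative.

```python
# Energy per element in joules, copied from Table 5.3.
conventional = {"MCU (RX63N)": 161.1, "IR sensor": 306.5,
                "Motion sensor": 32.7, "ZigBee": 56.7}
proposed = {"MCU (RX63N)": 6.9, "MCU (RL78)": 6.9, "IR sensor": 14.7,
            "Motion sensor": 11.2, "ZigBee": 6.7, "Power Manager": 0.2,
            "NVRAM": 2.0}
# Assumed third column: identical except for the NVRAM entry.
proposed_nvm_ctrl = dict(proposed, NVRAM=1.6)


def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


total_conv = sum(conventional.values())
total_prop = sum(proposed.values())
total_nvm = sum(proposed_nvm_ctrl.values())

overall = reduction_percent(total_conv, total_prop)       # task scheduling + standby control
nvm_total = reduction_percent(total_prop, total_nvm)      # added effect on the whole system
nvm_only = reduction_percent(proposed["NVRAM"], proposed_nvm_ctrl["NVRAM"])
```

The totals reproduce the roughly 91 percent overall reduction and the roughly 1 percent additional system-level effect. The rounded NVRAM entries give about 20 percent, close to the figure of around 18 percent quoted from simulation; the difference is presumably due to rounding in the table.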
Here, other use cases are considered. As mentioned above, the energy consumption
improves further as the number of bundled tasks increases. In the case of gas sensing, the
sampling period is usually set to 20 s. Under JIA (Japan Gas Appliances Inspection Association)
inspection regulations, when gas leakage is detected, a warning must be issued within 60 s.
Therefore, the deadline period for task scheduling is set to 60 s, and the number of bundled
tasks is 3. The energy improvement is thus estimated at around 20~30 percent, which is modest
compared with the intruder detection system. Here, in the case of the intruder detection
system, the maximum number of bundled tasks is 16










A low-power multi-sensor system with task scheduling and autonomous standby mode
transition control is proposed, and a possible power reduction of 91 percent is
demonstrated in the use case of the intruder detection monitor system. Furthermore, a
low-power effect of around 1 percent from the newly proposed nonvolatile memory access
control technology is estimated. These ideas are key candidates for IoT applications that
require low power consumption.
  
Chapter 6   Conclusion 
 
This thesis reports an emerging embedded nonvolatile memory solution for ultra-low power
microcontroller systems. The study is presented as measures addressing the following three main
points:
1. A dual-mode sensing scheme of capacitor-coupled EEPROM cell (Chapter 3) 
2. A high-density and high-speed 1T-4MTJ MRAM with voltage offset self-reference 
 sensing scheme (Chapter 4) 
3. A Power Management Scheme with Hierarchical Power Gating and Autonomous  
Standby Mode Transition Control for Normally-Off Multi-Sensor Network  
Applications (Chapter 5) 
 
In Chapter 1, the background and objective of this study were introduced. To realize an
embedded nonvolatile memory solution for ultra-low power microcontroller systems, the
above-mentioned challenges must be overcome: low voltage operation, high reliability, high
speed access, high density, and low standby leakage current. This thesis presents the author’s
approach to overcoming the challenges of embedding nonvolatile memory and investigates
solutions for ultra-low power microcontroller systems.
In Chapter 3, a dual-mode sensing (DMS) scheme for a capacitor-coupled EEPROM cell, targeting
high-speed access and high write-cycle endurance, is proposed and evaluated.
This memory cell combines an EEPROM cell with a DRAM cell, and the cell area penalty is
estimated to be less than 10% compared with the conventional EEPROM cell. Using this DMS
technique, the signal amplitude on the bit line is increased by 120% at 5 ns after word-line
selection, and the sensing speed is enhanced by 36% at a cell current of 15 uA by virtue of the
additional charge-mode sensing. Furthermore, the cell current can be decreased by 55%
compared with the current-mode-only sensing scheme. Therefore, the stress applied to the tunnel
oxide of the memory transistor can be relieved by decreasing the programming voltage and
shortening the programming time, which makes it possible to improve the endurance
characteristics. With this memory cell structure and sensing scheme, it is possible to realize
high-speed sensing in low-voltage operation together with high endurance. The capacitor-coupled
EEPROM cell and the DMS scheme are promising candidates for high-performance EEPROMs.
In Chapter 4, a high-density and high-speed 1T-4MTJ MRAM with a voltage offset
self-reference sensing scheme is proposed as a method for chip area reduction. A
1Mbit MRAM with the world’s first 1T-4MTJ cell structure has been verified in a 130nm CMOS
process technology. With the self-reference sense amplifier with voltage offset scheme,
high-speed read access operation of 50MHz@4cycle and tAC=56nsec (simulation result) has been
achieved. In 1T-4MTJ, the active area space in the memory array is effectively utilized. An
array area reduction of 35.7% becomes possible with the embedded WWL driver architecture. By
using 1T-1MTJ and 1T-4MTJ cells, an on-chip hierarchical memory architecture composed of fast
1T-1MTJ cells for cache memory and small 1T-4MTJ cells for large-capacity memory is feasible.
An example microcontroller design indicates a 20% chip size reduction while maintaining
microcontroller performance, by incorporating 32Mb of data memory using 1T-4MTJ cells and 4Mb
of fast program memory using 1T-1MTJ cells.
In Chapter 5, a low-power multi-sensor system with power management and nonvolatile
memory access control for IoT applications, which achieves almost zero standby power in
no-operation modes, is presented. A power management scheme with activity localization
reduces the number of transitions between power-on and power-off modes by rescheduling and
bundling task procedures. In addition, autonomous standby mode transition control selects the
optimum standby mode of the microcontrollers, reducing total power consumption. Using an
evaluation board as an IoT use case, we observe a 91 percent power reduction
from the combination of task scheduling and autonomous standby mode transition control.
Furthermore, we propose a new nonvolatile memory access control technology and estimate its
potential for further power reduction. These technologies are the most promising candidates for





[1] T. Kono, “Embedded Non-Volatile Memory Design for Highly Reliable Applications,” Forum 
on ISSCC, Feb. 19, 2012. 
[2] Y. Terada et al., “A new architecture for the NVRAM – An EEPROM backed-up dynamic 
RAM,” IEEE J. Solid-State Circuits, vol. 23, no. 1, pp. 86-90, Feb. 1988. 
[3] T. Nakayama et al., “A 5-V-only one-transistor 256K EEPROM with page-mode erase,” IEEE
J. Solid-State Circuits, vol. 24, no. 4, pp. 911-915, Aug. 1989.
[4] M. Momodomi et al., “An experimental 4-Mbit CMOS EEPROM with NAND-structured
cell,” IEEE J. Solid-State Circuits, vol. 24, no. 5, pp. 1238-1243, Oct. 1989. 
[5] Y. Terada et al., “120-ns 128K x 8-bit / 64K x 16-bit CMOS EEPROM’s,” IEEE J. Solid-State 
Circuits, vol. 24, no. 5, pp. 1244-1249, Oct. 1989. 
[6] T. Nozaki et al., “A 1-Mb EEPROM with MONOS memory cell for semiconductor disk 
application,” IEEE J. Solid-State Circuits, vol. 26, no. 4, pp. 497-501, Apr. 1991. 
[7] D. Cioaca et al., “A million-cycle CMOS 256K EEPROM,” IEEE J. Solid-State Circuits, vol. 
SC-22, no. 5, pp. 684-692, Oct. 1987. 
[8] V. N. Kynett et al., “A 90-ns one-million erase/program cycle 1-Mbit flash memory,” IEEE J. 
Solid-State Circuits, vol. 24, no. 5, pp. 1244-1249, Oct. 1989. 
[9] M. McConnell et al., “An experimental 4-Mb flash EEPROM with sector erase,” IEEE J. 
Solid-State Circuits, vol. 26, no. 4, pp. 484-491, Apr. 1991. 
[10] M. Momodomi et al., “A 4-Mb NAND EEPROM with tight programmed Vt distribution,” 
IEEE J. Solid-State Circuits, vol. 26, no. 4, pp. 492-496, Apr. 1991. 
[11] M. Hayashikoshi et al., “A dual-mode sensing scheme of capacitor coupled EEPROM cell 
for super high endurance,” in Dig. Tech. Papers, 1991 Symp. VLSI Circuits, May 1991, pp. 
89-90. 
[12] K. Kobayashi et al., “A high-speed parallel sensing architecture for multi-megabit flash 
E2PROM’s,” IEEE J. Solid-State Circuits, vol. 25, no. 1, pp. 79-83, Feb. 1990. 
[13] Y. Terada et al., “High speed page mode sensing scheme for EPROMs and flash EEPROMs 
using divided bit line architecture,” in Dig. Tech. Papers, 1990 Symp. VLSI Circuits, June 
1990, pp. 97-99. 
[14] T. Nakada et al., “Normally-off Computing for IoT Systems,” ISOCC, 2015, pp. 147-148. 
[15] T. Kono et al., “40nm embedded SG-MONOS flash macros for automotive with 160MHz 




[16] Y. Taito et al., “A 28nm embedded SG-MONOS flash macro for automotive achieving 
200MHz read operation and 2.0MB/s write throughput at Tj of 170°C,” ISSCC, 2015, pp. 
132-133. 
[17] H. Mitani et al., “A 90nm Embedded 1T-MONOS Flash Macro for Automotive Applications 
with 0.07mJ/8kB Rewrite Energy and Endurance Over 100M Cycles Under Tj of 175°C,” 
ISSCC, 2016, pp. 140-141. 
[18] M. Ueki, et al., “Low-Power Embedded ReRAM Technology for IoT Applications,” in 
Symp. On VLSI Technology Dig. Tech Papers, pp. T108-T109, 2015. 
[19] Y. Hayakawa, et al., “Highly reliable TaOx ReRAM with centralized filament for 28-nm 
embedded application,” in Symp. On VLSI Technology Dig. Tech Papers, pp. T14-T15, 
2015. 
[20] Y. Choi, et al., “A 20nm 1.8V 8Gb PRAM with 40MB/s Program Bandwidth,” in ISSCC 
Dig. Tech. Papers, pp. 46-48, 2012.  
[21] C. Villa, et al., “A 45nm 1Gb 1.8V Phase-Change Memory,” in ISSCC Dig. Tech. Papers, pp. 
270-271, 2010. 
[22] R. Ogiwara, et al., “Highly Reliable Reference Bitline Bias Designs for 64 Mb and 128 Mb 
Chain FeRAMs,” IEEE J. Solid-State Circuits, pp. 1324-1331, 2015. 
[23] M. Qazi, et al., “A low-voltage 1Mb FeRAM in 0.13μm CMOS featuring time-to-digital 
sensing for expanded operating margin in scaled CMOS,” in ISSCC Digest of Technical 
papers, pp. 208-210, 2011. 
[24] M. Durlam, et al., “Nonvolatile RAM based on magnetic tunnel junction elements,” in 
ISSCC Digest of Technical papers, pp. 130-131, 2000. 
[25] T. Tsuji, et al., “A 1.2V 1Mbit Embedded MRAM Core with Folded Bit-Line Array 
Architecture,” in Symp. VLSI Circuits Dig. of Tech. Papers, pp. 450-453, 2004. 
[26] N. Sakimura et al., “A 512Kb cross-point cell MRAM,” ISSCC Digest of Technical Papers, 
pp. 278-279, 2003. 
[27] Y. Asao et al., “Design and Process Integration for High-Density, High-Speed, and 
Low-Power 6F2 Cross Point MRAM Cell,” in IEDM Technical Digest, pp. 571-574, 2004. 
[28] M. Jefremow et al., “Time-differential Sense Amplifier for sub-80mV Bitline Voltage 
Embedded STT-MRAM in 40nm,” ISSCC, 2013, pp. 216-217. 
[29] T. Andre et al., “ST-MRAM Fundamentals, Challenges, and Applications,” CICC, 2013, pp. 
1-8. 
[30] C. Kim et al., “A Covalent-Bonded Cross-Coupled Current-Mode Sense Amplifier for 




[31] H. Noguchi et al., “A 3.3ns-Access-Time 71.2μW/MHz 1Mb Embedded STT-MRAM Using 
Physically Eliminated Read-Disturb Scheme and Normally-Off Memory Architecture,” 
ISSCC, 2015, pp. 136-137. 
[32] J. C. Slonczewski, “Current-driven Excitation of Magnetic Multilayers,” J. Magn. Magn. 
Mater., vol. 159, pp. L1-L7, 1996. 
[33] W. H. Butler et al., “Switching Distributions for Perpendicular Spin-Torque Devices Within 
the Macrospin Approximation,” IEEE Trans. Magn., vol. 48, pp. 4684-4700, 2012. 
[34] J. DeBrosse et al., “A Fully-Functional 90nm 8Mb STT MRAM Demonstrator Featuring 
Trimmed, Reference Cell-Based Sensing,” Custom Integrated Circuits Conference (CICC), 
pp. 1-3, 2015. 
[35] H. Tanizaki et al., “A high-density and high-speed 1T-4MTJ MRAM with Voltage Offset 
Self-Reference Sensing Scheme,” in A-SSCC, pp. 303-306, Nov. 2006. 
[36] T. Y. Kim et al., “A 75MHz MRAM with Pipe-Lined Self-Reference Read Scheme for 
Mobile/Robotics Memory System,” in A-SSCC, pp. 117-120, Nov. 2005. 
[37] H. Kano et al., “MRAM with improved magnetic tunnel junction material,” in INTERMAG 
Conf., p. BB-04, 2002. 
[38] R. Scheuerlein et al., “A 10 ns read and write non-volatile memory array using a magnetic 
tunnel junction and FET switch in each cell,” ISSCC Digest of Technical Papers, pp. 
128-129, 2000. 
[39] B. N. Engel et al., “A 4-Mb Toggle MRAM Based on a Novel Bit and Switching Method,” 
in IEEE Transactions Vol. 41, Issue 1, pp. 132-136, 2004. 
[40] H. Honigschmid et al., “Signal-Margin-Screening for Multi-Mb MRAM,” in Digest of 
Technical Papers, pp. 467-476, 2006. 
[41] T. Ohsawa et al., “1Mb 4T-2MTJ nonvolatile STT-RAM for embedded memories using 32b 
fine-grained power gating technique with 1.0ns/200ps wake-up/power-off times,” in VLSI 
Circuits (VLSIC), pp. 46-47, 2012. 
[42] K. Tsuchida et al., “A 64Mb MRAM with Clamped-Reference and Adequate-Reference 
Schemes,” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 
258-259, 2010. 
[43] C. J. Lin et al., “45nm Low Power CMOS Logic Compatible Embedded STT MRAM 
Utilizing a Reverse-Connection 1T/1MTJ Cell,” in Electron Devices Meeting (IEDM), pp. 
1-4, 2009. 
[44] S. Chung et al., “Fully integrated 54nm STT-RAM with the smallest bit cell dimension for 




[45] B. H. Krogh, et al., “Cyber-Physical Systems Executive Summary,” CPS Steering Group, 
IEEE Int. Conf. Cyber-Physical Syst., Mar. 6, 2008. 
[46] R. Puri, L. Stok et al., “Keeping Hot Chips Cool,” Proceedings of the 42nd Annu. Design 
Autom. Conf., 2005, pp. 285–288. 
[47] L. Zhao et al., “Geyser-2: The second prototype CPU with fine-grained run-time power 
gating,” in Proc. 16th Asia and South Pacific Design Autom. Conf., 2011, pp. 87–88. 
[48] K. Usami et al., “Design and Control Methodology for Fine Grain Power Gating based on 
Energy Characterization and Core Profiling of Microprocessors,” in proc. 19th Asia and 
South Pacific Design Autom. Conf., 2014, pp. 843-848. 
[49] L. Benini et al., “A survey of design techniques for system-level dynamic power 
management,” IEEE Trans. Very Large Scale Integr. Syst., vol. 8, no. 3, pp. 299–316, Jun. 
2000. 
[50] M. Weiser, et al., “Scheduling for Reduced CPU Energy,” in Proc. 1st USENIX Conf. 
Operating Syst. Des. Implementation, 1994, Art. no. 2. 
[51] F. Yao, et al., “A scheduling model for reduced CPU energy,” in Proc. 36th Annu. Symp. 
Foundations of Computer Science, 1995, pp. 374–382. 
[52] W. Huang, et al., “An optimal speed control scheme supported by media servers for low 
power multimedia applications,” Multimedia Syst., vol. 15, no. 2, pp. 113–124, 2009. 
[53] M. E. T. Gerards, et al., “Optimal DPM and DVFS for Frame-based Real-time Systems,” 
ACM Trans. Archit. Code Optim., vol. 9, no. 4, pp. 41:1–41:23, 2013. 
[54] T. Nakada, et al., “Design Aid of Multi-core Embedded Systems with Energy Model,” IPSJ 
Trans. Adv. Comput. Syst., vol. 7, no. 3, pp. 37–46, 2014. 
[55] T. Nakada, et al., “An Adaptive Energy-Efficient Task Scheduling under Execution Time 
Variation based on Statistical Analysis,” in Proc. IFIP/IEEE Int. Conf. Very Large Scale 
Integration Conf., 2016, pp. 1–7. 
[56] T. Nakada, et al., “Energy-Efficient Continuous Task Scheduling for Near Real-Time 
Periodic Tasks,” in Proc. IEEE Int. Conf. Data Sci. Data Intensive Syst., 2015, pp. 675–681. 
[57] M. Hayashikoshi, et al., “Low-Power Multi-Sensor System with Task Scheduling and 
Autonomous Standby Mode Transition Control for IoT Applications,” in Proc. IEEE Symp. 
Low-Power High-Speed Chips, Apr. 2017, pp. 1–3. 
[58] H. Nakamura, “Normally-Off Computing Project: Challenges and Opportunities,” in Proc. 
19th Asia South Pacific Des. Autom. Conf., Jan. 2014, pp. 1–5. 
[59] M. Hayashikoshi, et al., “Normally-Off MCU Architecture for Low-power Sensor Node,” in 
Proc. 19th Asia South Pacific Des. Autom. Conf., Jan. 2014, pp. 12–16. 
[60] M. Hayashikoshi, et al., “Normally-Off MCU Architecture and Power Management Method 
for Low-Power Sensor Network,” in Proc. 12th Int. SoC Des. Conf., Nov. 2015, pp. 
151–152. 
[61] M. Hayashikoshi, et al., “Low-Power Multi-Sensor System with Normally-off Sensing 
Technology for IoT Applications,” in Proc. 13th Int. SoC Des. Conf., Oct. 2016, pp. 
195–196. 
[62] Renesas RX63N User’s manual: Hardware, pp. 293–295, Apr. 2014. [Online]. Available: 
http://www.renesas.com/en-us/doc/products/mpumcu/doc/rx_family/r01uh0041ej0180_rx63n631.pdf 
[63] T. Kawahara, et al., “Scalable Spin-Transfer Torque RAM Technology for Normally-Off 
Computing,” IEEE Des. Test Comput., vol. 28, no. 1, pp. 52–63, Jan.-Feb. 2011. 
[64] C. Layer, et al., “Low-Power Hybrid STT/CMOS System-on-Chip Embedding Non-Volatile 






1) M. Hayashikoshi, H. Hidaka, K. Arimoto, and K. Fujishima, “A dual-mode sensing scheme 
of capacitor-coupled EEPROM cell,” IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 
569-573, 1992. 
2) Y. Terada, K. Kobayashi, T. Nakayama, M. Hayashikoshi, Y. Miyawaki, N. Ajika, H. Arima,  
T. Matsukawa, T. Yoshihara, “120-ns 128 K*8-bit/64 K*16-bit CMOS EEPROMs,” IEEE J. 
Solid-State Circuits, Vol. 24, no. 5, pp. 1244-1249, 1989. 
3) K. Kobayashi, T. Nakayama, Y. Miyawaki, M. Hayashikoshi, Y. Terada, T. Yoshihara, “A 
high-speed parallel sensing architecture for multi-megabit flash E2PROMs,” IEEE J. 
Solid-State Circuits, vol. 25, no. 1, pp. 79-83, 1990. 
4) H. Hidaka, K. Arimoto, K. Hirayama, M. Hayashikoshi, M. Asakura, M. Tsukude, T. Oishi, 
S. Kawai, K. Suma, Y. Konishi, K. Tanaka, W. Wakamiya, Y. Ohno, K. Fujishima, “A 34-ns 
16-Mb DRAM with controllable voltage down-converter,” IEEE J. Solid-State Circuits, vol. 
27, no. 7, pp. 1020-1027, 1992. 
5) M. Tsukude, K. Arimoto, H. Hidaka, Y. Konishi, M. Hayashikoshi, K. Suma, K. Fujishima, 
“Highly reliable testing of ULSI memories with on-chip voltage-down converters,” IEEE 
Design & Test of Computers, vol. 10, no. 2, pp. 6-12, 1993. 
6) T. Nakada, K. Okamoto, T. Komoda, S. Miwa, Y. Sato, H. Ueki, M. Hayashikoshi, T. 
Shimizu, H. Nakamura, “Design Aid of Multi-core Embedded Systems with Energy 
Model,” IPSJ Transactions on Advanced Computing Systems, Vol. 7, No. 3, pp. 1-10, Aug. 
2014. 
7) T. Nakada, T. Hatanaka, H. Ueki, M. Hayashikoshi, T. Shimizu, and H. Nakamura, “An 
Energy-Efficient Task Scheduling for Near-realtime Systems with Execution Time 
Variation,” IEICE Trans. on Information and Systems, Vol. E100-D, No. 10, pp. 2493-2504, 
Oct. 2017. 
8) M. Hayashikoshi, H. Ueki, H. Kawai, T. Shimizu, K. Nii, and Y. Matsuda, “Low-Power 
Multi-Sensor System with Power Management and Nonvolatile Memory Access Control for 







1) Y. Terada, K. Kobayashi, T. Nakayama, M. Hayashikoshi, Y. Miyawaki, N. Ajika, H. Arima, 
T. Matsukawa, T. Yoshihara, “120-ns 128 K*8b/64 K*16b CMOS EEPROMs,” Solid-State 
Circuits Conference, 1989. Digest of Technical Papers. 36th ISSCC., 1989 IEEE 
International, pp. 136-137, 1989. 
2) K. Kobayashi, T. Nakayama, M. Hayashikoshi, Y. Miyawaki, Y. Terada, H. Arima, T. 
Matsukawa, T. Yoshihara, “A self-timed dynamic sensing scheme for 5V only multi-Mb 
flash E2PROMs,” IEEE Symposium on VLSI Circuits, Dig. Tech. Papers, pp. 39-40, 
1989. 
3) Y. Terada, T. Nakayama, K. Kobayashi, M. Hayashikoshi, S. Kobayashi, Y. Miyawaki, N. 
Ajika, T. Yoshihara, “High speed page mode sensing scheme for EPROMs and flash 
EEPROMs using divided bit line architecture,” IEEE Symposium on VLSI Circuits, Dig. 
Tech. Papers, pp. 97-98, 1990. 
4) M. Hayashikoshi, H. Hidaka, K. Arimoto, and K. Fujishima, “A Dual-mode Sensing Scheme 
Of Capacitor Coupled EEPROM Cell For Super High Endurance,” IEEE Symposium on 
VLSI Circuits, Dig. Tech. Papers, pp. 89-90, 1991. 
5) K. Arimoto, H. Hidaka, M. Hayashikoshi, M. Asakura, K. Fujishima, T. Yoshihara, “A 34ns 
16Mb DRAM with controllable voltage down convertor,” Solid-State Circuits Conference, 
ESSCIRC '91. Proceedings - Seventeenth European, vol. 1, pp. 21-24, 1991. 
6) M. Tsukude, K. Arimoto, H. Hidaka, Y. Konishi, M. Hayashikoshi, “A Testing Technique for 
ULSI Memory with On-chip Voltage Down Converter,” Test Conference, Proceedings., 
International, p. 615, 1992. 
7) H. Tanizaki, T. Tsuji, J. Otani, Y. Yamaguchi, Y. Murai, H. Furuta, S. Ueno, T. Oishi, M. 
Hayashikoshi, H. Hidaka, “A high-density and high-speed 1T-4MTJ MRAM with Voltage 
Offset Self-Reference Sensing Scheme,” IEEE Asian Solid-State Circuits Conference, pp. 
303-306, 2006. 
8) H. Tanizaki, T. Tsuji, J. Otani, Y. Yamaguchi, Y. Murai, M. Hayashikoshi, H. Hidaka, “A 
1Mb High-Density Toggle-MRAM with Symmetrical Read/Write Operations,” 22nd IEEE 
Non-Volatile Semiconductor Memory Workshop, pp. 63-65, 2007. 
9) M. Hayashikoshi, Y. Sato, H. Ueki, H. Kawai, and T. Shimizu, "Normally-off MCU 
architecture for low-power sensor node," in Proc. 19th Asia and South Pacific Design 
Automation Conference (ASP-DAC), pp. 12-16, Jan. 2014. 
10) T. Nakada, T. Shigematsu, T. Komoda, S. Miwa, H. Nakamura, Y. Sato, H. Ueki, M. 
Hayashikoshi, T. Shimizu, “Data-aware power management for periodic real-time systems 
with non-volatile memory,” IEEE Non-Volatile Memory Systems and Applications 
Symposium (NVMSA), pp. 1-6, 2014. 
11) M. Hayashikoshi, H. Ueki, H. Kawai, and T. Shimizu, "Normally-Off MCU Architecture 
and Power Management Method for Low-Power Sensor Network," in Proc. 12th 
International SoC Design Conference (ISOCC), pp. 151-152, Nov. 2015. 
12) T. Nakada, K. Yanagihashi, H. Ueki, T. Tsuchiya, M. Hayashikoshi, H. Nakamura, 
“Energy-Efficient Continuous Task Scheduling for Near Real-time Periodic Tasks,” The 8th 
IEEE International Conference on Internet of Things, 2015. 
13) T. Nakada, T. Hatanaka, H. Ueki, M. Hayashikoshi, T. Shimizu, H. Nakamura: "An adaptive 
energy-efficient task scheduling with energy model," Annual Meeting on Advanced 
Computing System and Infrastructure (ACSI), Jan. 28 2015. 
14) M. Hayashikoshi, H. Noda, H. Kawai, and H. Kondo, “Low-Power Multi-Sensor System 
with Normally-off Sensing Technology for IoT Applications,” in Proc. 13th International 
SoC Design Conference (ISOCC), pp. 195-196, Oct. 2016. 
15) T. Nakada, T. Hatanaka, H. Ueki, M. Hayashikoshi, T. Shimizu, H. Nakamura: "An 
Adaptive Energy-Efficient Task Scheduling under Execution Time Variation based on 
Statistical Analysis," IFIP/IEEE International Conference on Very Large Scale Integration 
(VLSI-SoC) (poster), 7 pages, Sep. 2016. 
16) M. Hayashikoshi, H. Noda, H. Kawai, K. Nii, and H. Kondo, “Low-Power Multi-Sensor 
System with Task Scheduling and Autonomous Standby Mode Transition Control for IoT 
Applications,” in Proc. IEEE Symposium in Low-Power and High-Speed Chips, pp. 1-3, 
Apr. 2017. 
17) T. Nakada, H. Yanagihashi, K. Imai, H. Ueki, T. Tsuchiya, M. Hayashikoshi, H. Nakamura: 
"Energy-aware Task Scheduling for Near Real-time Periodic Tasks on Heterogeneous 
Multicore Processors," IFIP/IEEE International Conference on Very Large Scale Integration 




















1) T. Nakada and H. Nakamura, "Normally-Off Computing," Springer Japan, 
ISBN 978-4-431-56503-1, 2017. 
M. Hayashikoshi is in charge of Chapter 3: Non-volatile Memories, Chapter 5: Technologies 
for Realizing Normally-Off Computing, and Chapter 6: Research and Development of 

































United States Patents 
1) 5,554,868  Semiconductor nonvolatile memory device 
2) 5,297,096  Oscillation switch device  
3) 4,970,727  Semiconductor memory device 
4) 5,132,928  Semiconductor nonvolatile memory device 
5) 5,321,646  Semiconductor dynamic memory device 
6) 6,011,428  Power supply circuit and semiconductor device 
7) 6,097,180  Power supply circuit and semiconductor device 
8) 5,835,419  Semiconductor memory device to prevent malfunction due to disconnection of  
column select line and word line 
9) 5,825,694  Semiconductor memory device to prevent malfunction due to disconnection of  
column select line and word line 
10) 5,986,915  Semiconductor memory device to prevent malfunction due to disconnection of  
column select line and word line 
11) 6,304,503  Semiconductor memory device 
12) 6,483,761  Semiconductor memory device 
13) 5,233,610  Semiconductor memory device 
14) 6,088,819  Semiconductor dynamic memory device and test method 
15) 5,956,281  Semiconductor memory device with shallow substrate voltage with disturb test  
mode and self refresh mode 
16) 6,337,814  Semiconductor memory device with substrate voltage generator  




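1) Japanese Patent Laid-Open No. H4-222994  Nonvolatile semiconductor memory device 
2) Japanese Patent Laid-Open No. H4-14871  Nonvolatile semiconductor memory device and erase and write methods therefor 
3) Japanese Patent Laid-Open No. H4-208566  Nonvolatile semiconductor memory device 
4) Japanese Patent Laid-Open No. H4-159694  Nonvolatile semiconductor memory device 
5) Japanese Patent Laid-Open No. H4-82091  Nonvolatile semiconductor memory device 
6) Japanese Patent Laid-Open No. H1-105396  Nonvolatile semiconductor memory device 
7) Japanese Patent Laid-Open No. H1-130394  Semiconductor memory device 
8) Japanese Patent Laid-Open No. H2-35692  Electrically rewritable nonvolatile semiconductor memory 
9) Japanese Patent Laid-Open No. H2-78099  Semiconductor memory device 
10) Japanese Patent Laid-Open No. H2-66798  Nonvolatile semiconductor memory device 
11) Japanese Patent Laid-Open No. H3-14272  Nonvolatile semiconductor memory device 
12) Japanese Patent Laid-Open No. H3-5995  Nonvolatile semiconductor memory device 
13) Japanese Patent Laid-Open No. H5-62461  Semiconductor memory device 
14) Japanese Patent Laid-Open No. H11-328951  Semiconductor memory device 
15) Japanese Patent Laid-Open No. H6-259150  Voltage supply circuit and internal step-down circuit 
16) Japanese Patent Laid-Open No. H6-60648  Pulse signal generation circuit and semiconductor memory device 
17) Japanese Patent Laid-Open No. H6-21377  Semiconductor memory device 
18) Japanese Patent Laid-Open No. H7-30386  Output circuit 
19) Japanese Patent Laid-Open No. H10-199296  Dynamic semiconductor memory device and test method therefor 
20) Japanese Patent Laid-Open No. H11-86536  Semiconductor memory device 
21) Japanese Patent Laid-Open No. 2002-237197  Semiconductor memory device 
22) Japanese Patent Laid-Open No. 2010-044460  Power supply control device, computer system, power supply control method, power supply control program, and recording medium 
23) Japanese Patent Laid-Open No. 2010-097655  MRAM control device and MRAM control method 
24) Japanese Patent Laid-Open No. 2011-060082  Memory control device 
25) Japanese Patent Laid-Open No. 2013-239220  Semiconductor device 
26) Japanese Patent Laid-Open No. 2012-094201  Semiconductor device 
27) Japanese Patent Laid-Open No. 2012-074110  Semiconductor device 5503480 
28) Japanese Patent Laid-Open No. 2014-203185  Information processing device and information processing method 
29) Japanese Patent Laid-Open No. 2014-203148  Memory control circuit 
30) Japanese Patent Laid-Open No. 2014-203192  Vehicle operation management system, terminal device, control device, and vehicle operation management method 
31) Japanese Patent Laid-Open No. 2015-011474  Semiconductor device 
32) Japanese Patent No. 2569895  Nonvolatile semiconductor memory device and erase and write methods therefor 
33) Japanese Patent No. 2542110  Nonvolatile semiconductor memory device 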
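34) Japanese Patent No. 2634089  Nonvolatile semiconductor memory device 
35) Japanese Patent No. 2641602  Nonvolatile semiconductor memory device 
36) Japanese Patent No. 2656280  Nonvolatile semiconductor memory device 
37) Japanese Patent No. 2082173  Semiconductor memory device 
38) Japanese Patent No. 2670094  Electrically rewritable nonvolatile semiconductor memory 
39) Japanese Patent No. 3118239  Semiconductor memory device 
40) Japanese Patent No. 2621411  Nonvolatile semiconductor memory device 
41) Japanese Patent No. 2504831  Nonvolatile semiconductor memory device 
42) Japanese Patent No. 2072578  Nonvolatile semiconductor memory device 
43) Japanese Patent No. 3373169  Semiconductor memory device 
44) Japanese Patent No. 2851767  Voltage supply circuit and internal step-down circuit 
45) Japanese Patent No. 2787639  Pulse signal generation circuit and semiconductor memory device 
46) Japanese Patent No. 2912498  Semiconductor memory device 
47) Japanese Patent No. 3708561  Output circuit 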






I would like to express my gratitude to Visiting Professor Koji Nii of Kanazawa University for 
giving me the opportunity to study in his laboratory and for his guidance and valuable advice on 
this research. I am grateful as well to Professor Yoshio Matsuda, Professor Junichi Akita, 
Associate Professor Masayuki Miyama, Associate Professor Takeshi Kawae, and Visiting 
Associate Professor Hideyuki Noda of Kanazawa University, who gave me fruitful advice on this 
research. 
This thesis comprises the results of experiments undertaken since the author joined the LSI 
Research Laboratory, Mitsubishi Electric Corp., in 1986 and continued at Renesas Electronics 
Corporation, which was formed through the merger of the semiconductor sectors of Mitsubishi 
Electric Corp. and Hitachi Ltd. in 2003 and the merger with NEC Electronics Corp. in 2010. It 
also includes results obtained from experiments carried out at Kanazawa University, Graduate 
School of Natural Science and Technology, where the author has been a doctoral student since 2012. 
I am exceedingly grateful to Dr. Hideto Hidaka, Fellow of Renesas Electronics Corporation, 
for giving me the opportunity to study as a doctoral student and for his encouragement. My 
thanks also go to Dr. Tadaaki Yamauchi, Vice President of the Automotive Control Project 
Management Division, Renesas Electronics Corporation, and Mr. Hiroyuki Kondo, Vice President 
of Shared R&D Division 1, Renesas Electronics Corporation. 
I greatly appreciate the useful advice and active discussions on design technologies provided 
by Dr. Tsutomu Yoshihara, Dr. Kazuyasu Fujishima, Dr. Kazutami Arimoto, Dr. Toru Shimizu, 
Dr. Hiroyuki Kawai, Mr. Osamu Yamamoto, Mr. Takaharu Tsuji, Mr. Hiroaki Tanizaki, Mr. 
Yasumitsu Murai, Mr. Hiroshi Ueki, Mr. Kohei Sato, Mr. Kenji Noguchi, Mr. Tetsuichiro Ichiguchi, 
Mr. Takashi Kono, Mr. Takashi Ito, and my colleagues at Mitsubishi Electric Corp., currently 
Renesas Electronics Corporation. 
I likewise appreciate the useful advice and active discussions on process technologies provided 
by Dr. Osamu Tsuchiya, Dr. Masahiro Shimizu, Dr. Motoi Ashida, Dr. Shuichi Ueno, Dr. 
Kiyoshi Kawabata, and my colleagues at Mitsubishi Electric Corp., currently Renesas Electronics 
Corporation. 
For the research in Chapter 5, I participated in the Normally-Off Computing Project of the 
New Energy and Industrial Technology Development Organization (NEDO) in Japan from 2011 
to 2016. I greatly appreciate the useful advice and active discussions provided by NEDO. My 
thanks also go to Dr. Kazuhiro Ohnishi of Renesas Electronics Corporation; Professor Hiroshi 
Nakamura, Dr. Shinobu Miwa, and Dr. Takashi Nakada of the University of Tokyo; Professor 
Hitoshi Matsubara, Professor Keiji Hirata, Professor Masashi Toda, Associate Professor Takeshi 
Nagasaki, and Professor Masaaki Wada of Future University Hakodate; Professor Masafumi 
Kimata, Professor Takeshi Fujino, Assistant Professor Masayoshi Shirahata, and Dr. Toshio 
Kumamoto of Ritsumeikan University; Assistant Professor Tetsuya Hirose of Kobe University; 
and Professor Kazutami Arimoto, Professor Yoichiro Satoh, and Dr. Tomonori Yokogawa of 
Okayama Prefectural University. 
I am grateful to the many researchers and engineers at Renesas Electronics Corporation for their 
valuable advice and discussions. Finally, I would like to thank the many researchers who 
participated in lively discussions with me at international conferences and symposiums. 
 
 
