Low power processor architecture and multicore approach for embedded systems by 大谷 寿賀子 & Otani Sugako
Dissertation Abstract
“IoT”, or the “Internet of Things”, has become essential to our society and its infrastructure.
Devices anywhere in the world are linked to networks and control one another while exchanging
information. A microcontroller is one of the important elements of IoT. Microcontroller designers
are strongly urged to achieve both high-performance computation and low power consumption,
combining computing power with environmental friendliness.
This thesis focuses on the development of an efficient microcontroller architecture for IoT. The
central argument is that the key to a low-power processor architecture is how effectively it handles
on-chip memories. Furthermore, collaboration between software and hardware on a multicore
architecture can provide dependable and secure networks.
To test this hypothesis, we introduce the RX processor core, which is suitable for IoT. The RX
instruction set architecture (ISA) and its microarchitecture achieve lower power consumption and
higher performance. We present an eight-core communication SoC with a PCI Express interface; this
multicore SoC realizes a high-performance, power-aware, highly dependable network. We also
demonstrate a secure multimedia system that uses a heterogeneous multicore SoC and software
virtualization.
Chapter 1 Introduction 
“IoT”, or the “Internet of Things”, formerly known as “ubiquitous computing”, has become essential
to our society and its infrastructure. Devices anywhere in the world are linked to networks and
control one another while exchanging information. A microcontroller is one of the important
elements of IoT. Microcontroller designers are strongly urged to achieve both high-performance
computation and low power consumption, combining computing power with environmental friendliness.
Furthermore, as network services gain popularity, the dependability and security of networks become
more important. A key solution to meet these demands is a compact, low-power processor core
combined with multicore technology.
This thesis focuses on the development of an efficient microcontroller architecture for IoT. The
central argument is that the key to a low-power processor architecture is how effectively it handles
on-chip memories. Furthermore, collaboration between software and hardware on a multicore
architecture can provide dependable and secure networks.
1.1 Thesis Contributions 
The main contributions of this dissertation are the following: 
• An RX processor core that is suitable for IoT. The RX instruction set architecture (ISA) and its
microarchitecture achieve lower power consumption and higher performance.
• An eight-core communication SoC with a PCI Express interface. The multicore SoC realizes a
high-performance, power-aware, highly dependable network.
• A secure multimedia system that uses a heterogeneous multicore SoC and software virtualization.
1.2 Thesis Outline 
The outline of the remainder of this thesis is as follows. 
Chapter 2 provides the background and motivation for this work. It discusses the characteristics and
requirements of IoT by presenting four key IoT technologies.
Chapter 3 introduces the RX processor core with a low-power processor architecture. The RX
instruction set architecture (ISA) and its microarchitecture achieve lower power consumption and
higher performance. RXv2 reaches 4.5 CoreMark per MHz, and the RXv2 processor delivers roughly
2.2 to 5.7 times the power efficiency of the previous work. The RXv2 processor also delivers
1.9 to 3.7 times the cycle performance of the previous work in digital signal applications.
Chapter 4 presents an eight-core communication SoC with a PCI Express interface. The SoC,
code-named PEACH, has four PCI Express ports and realizes high-performance communication of
4 x 20 Gbps with a power efficiency of 0.04 W/Gbps. The power efficiency of InfiniBand 4X (a
commodity network device) is 0.083 W/Gbps. Thus, PEACH provides 51.5% better power efficiency than
InfiniBand 4X. We also evaluate the PEARL network system and demonstrate its fault tolerance.
Chapter 5 demonstrates a secure multimedia system by using a heterogeneous multicore SoC with SiP 
and software virtualization. The multicore hypervisor virtualizes hardware resources and prohibits 
operating systems and applications from accessing hardware resources directly. 
Finally, Chapter 6 concludes the thesis and suggests directions for future work. 
 
Figure 1. Thesis outline 
Chapter 2 Background and Motivation 
2.1 Four Key Technologies that support IoT 
Four key technologies support IoT: 1) network technology to link one device to another,
2) technology to control sensors, motors, and other devices, 3) low-power technology to raise
energy efficiency, and 4) security technology (Figure 2).
As the number of devices on networks increases, power consumption has become a major issue.
Sensing modules must always be active to collect information and must have a long service life
within the infrastructure. In IoT applications, it is vital to consider how to link applications
and microcontrollers and how people communicate with electronic devices.
 
2.2 Research Goals 
Given the application and system requirements, we consider four key technologies for an efficient
microcontroller architecture for IoT systems:
• Network technology
• Security technology
• Technology to control sensors, motors, and other devices
• Low-power technology
The architecture and microarchitecture techniques that provide these features are presented in the
following chapters.
Chapter 3 Low-Power MCU Processor Architecture
The basic strategy for reducing power consumption is to lower the operating current and shorten the
operating time. Figure 3 compares the power consumption of a low-power microcontroller with that of
another microcontroller. The blue bar represents the energy-saving microcontroller, which has a
lower operating current and higher performance. It completes the same task in much less time, which
also lets it stay in low-power sleep mode longer. This intermittent-operation strategy of low-power
microcontrollers enables batteries to last a long time.
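The effect of lowering both the operating current and the operating time can be illustrated with a small duty-cycle model. This is only a sketch: the current, timing, and battery figures below are hypothetical values chosen for illustration and are not measurements from the thesis, and the function names are not taken from it.

```python
def average_current_ma(active_ma, sleep_ma, active_s, period_s):
    """Average current over one wake/sleep period of a duty-cycled MCU."""
    if not 0 <= active_s <= period_s:
        raise ValueError("active time must fit inside the period")
    sleep_s = period_s - active_s
    return (active_ma * active_s + sleep_ma * sleep_s) / period_s


def battery_life_hours(capacity_mah, avg_ma):
    """Idealized battery life, ignoring self-discharge and voltage droop."""
    return capacity_mah / avg_ma


# Hypothetical numbers: both MCUs wake once per second to run the same task.
baseline = average_current_ma(active_ma=10.0, sleep_ma=0.001,
                              active_s=0.010, period_s=1.0)
# The low-power MCU draws half the current and finishes in half the time.
low_power = average_current_ma(active_ma=5.0, sleep_ma=0.001,
                               active_s=0.005, period_s=1.0)

print(f"baseline average current : {baseline:.6f} mA")
print(f"low-power average current: {low_power:.6f} mA")
print(f"battery life gain        : {baseline / low_power:.2f}x")
print(f"low-power life on 220 mAh: {battery_life_hours(220, low_power):.0f} h")
```

In this model, halving both the active current and the active time cuts the average current by almost a factor of four, because the sleep current is negligible compared with the active current.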
The design highlights of a low-power processor architecture are the instruction set architecture,
the processor microarchitecture, and the memory access mechanism. These three items are vital to
achieving high performance. The instruction set architecture and the memory access mechanism also
contribute to a low operating current. The most effective way to achieve a low operating current is
to reduce the number of instruction memory accesses, because the memories in a microcontroller
consume a large amount of power.
Application fields of microcontrollers have spread to building automation, medical devices, motor
control, e-metering, and home appliances, and the demand for such highly intelligent systems has
increased. To meet this demand, the scale and complexity of software have begun to rise. The rapid
growth of memory capacity and the advance of microcontroller functions have led to higher
frequencies and higher processing performance in embedded processors. At the same time, many
embedded systems still have severe cost, power consumption, and space constraints. To meet these
requirements, we have developed a new RX processor core (RXv2) architecture (Figure 4).
Figure 4. RX CPU block diagram
RXv2 is the new generation of the RX processor architecture for microcontrollers with
high-capacity flash memory. An enhanced instruction set and a pipeline structure with an advanced
fetch unit (AFU) provide an effective balance between low power consumption and high processing
performance. Enhanced instructions, such as DSP functions and floating-point operations, and a
five-stage dual-issue pipeline together boost the performance of digital signal applications. The
RXv2 processor delivers 1.9 to 3.7 times the cycle performance of the RXv1 in these applications.
The reduction in the number of flash memory accesses achieved by the AFU is the dominant factor in
reducing power consumption. The AFU of RXv2 adopts a branch target cache, which occupies a
comparatively smaller area than a typical cache system. High code density lowers power consumption
by reducing instruction memory bandwidth. The implementation of RXv2 delivers up to a 46% reduction
in static code size and up to a 30% reduction in dynamic code size relative to RISC architectures.
RXv2 reaches 4.5 CoreMark per MHz and operates at up to 240 MHz. The RXv2 processor delivers
roughly 2.2 to 5.7 times the power efficiency of the RXv1.
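As a rough worked example, the figures quoted above can be combined arithmetically. The sketch below multiplies the CoreMark/MHz figure by the maximum frequency, and estimates the relative instruction fetch traffic under the simplifying assumption (not stated in the thesis) that fetched bytes scale linearly with dynamic code size. The function and constant names are illustrative.

```python
COREMARK_PER_MHZ = 4.5   # quoted for RXv2
MAX_FREQ_MHZ = 240       # quoted maximum operating frequency


def coremark_at(freq_mhz, coremark_per_mhz=COREMARK_PER_MHZ):
    """CoreMark score implied by a per-MHz figure at a given clock frequency."""
    return coremark_per_mhz * freq_mhz


def relative_fetch_traffic(dynamic_code_reduction):
    """Instruction bytes fetched relative to the baseline.

    Assumption: fetch traffic is proportional to dynamic code size.
    """
    if not 0.0 <= dynamic_code_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 - dynamic_code_reduction


peak_score = coremark_at(MAX_FREQ_MHZ)
best_case_traffic = relative_fetch_traffic(0.30)  # "up to 30%" dynamic reduction

print(f"CoreMark at {MAX_FREQ_MHZ} MHz: {peak_score:.0f}")
print(f"relative fetch traffic (best case): {best_case_traffic:.2f}")
```

Because the 30% figure is an upper bound ("up to"), the second result is a best case rather than a typical value.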
The RXv2 processor delivers high computing performance in applications such as building
automation, medical devices, motor control, e-metering, and home appliances, which demand higher
memory capacity, frequency, and processing performance.
Chapter 4 PEACH: A Multicore Communication SoC with PCI Express I/F
The eight-core communication SoC, code-named “PEACH”, has four 4x PCI Express rev. 2.0 ports and
realizes a high-performance, power-aware, highly dependable network. The network uses PCI Express
not only for connecting peripheral devices but also as a communication link between computing nodes.
This approach opens up new possibilities for a wide range of communications. Recent trends in the
use of computing clusters point to a growing demand for high-compute-density environments in
application fields such as server appliances, including distributed Web servers. Distributed Web
servers need many server nodes and a low-latency, high-bandwidth network to operate a massive
number of Web services, including the distribution of high-definition movies. In these computing
clusters, power consumption and system cost have increased. It is therefore vital to downsize
computing clusters without losing high dependability, including fault tolerance.
To realize a high-performance, power-aware, and highly dependable network, we have proposed a
small computing cluster for embedded systems, called PEARL (PCI Express Adaptive and Reliable
Link).
Commodity network devices such as Gigabit Ethernet (GbE) and InfiniBand are not sufficient for
small computing clusters. InfiniBand is a switched-fabric communication link used in
high-performance computing and enterprise data centers. It achieves high reliability, but its power
consumption is relatively high. GbE competes with InfiniBand on cost and power consumption, but it
does not match InfiniBand’s transmission performance.
To achieve both high performance and low power consumption, PEARL uses PCI Express, a high-speed
serial I/O interface standard in PCs, not only for connecting peripheral devices but also as a
communication link between computing nodes. To implement PEARL, we have developed a communication
device called PEACH (PCI Express Adaptive Communication Hub), which acts as a switching device
(Figure 5). PEACH with four PCI Express ports realizes high-performance communication of
4 x 20 Gbps with a power efficiency of 0.04 W/Gbps. The power efficiency of InfiniBand 4X (a
commodity network device) is 0.083 W/Gbps. Thus, PEACH provides 51.5% better power efficiency than
InfiniBand 4X.
Figure 5. The communication link, PEARL, connects computing nodes with a PCI Express 
external cable.  
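The power-efficiency comparison between PEACH and InfiniBand 4X can be checked with simple arithmetic. The sketch below uses only the two rounded W/Gbps figures quoted in the text; the function name is illustrative. With these rounded inputs the reduction comes out at about 51.8%, close to the 51.5% stated in the text, and the small difference presumably reflects rounding in the quoted figures.

```python
def power_per_gbps_reduction_pct(new_w_per_gbps, old_w_per_gbps):
    """Percentage reduction in power per Gbps of `new` relative to `old`."""
    if old_w_per_gbps <= 0:
        raise ValueError("reference efficiency must be positive")
    return (old_w_per_gbps - new_w_per_gbps) / old_w_per_gbps * 100.0


PEACH_W_PER_GBPS = 0.04          # quoted for PEACH
INFINIBAND_4X_W_PER_GBPS = 0.083  # quoted for InfiniBand 4X

reduction = power_per_gbps_reduction_pct(PEACH_W_PER_GBPS, INFINIBAND_4X_W_PER_GBPS)
ratio = INFINIBAND_4X_W_PER_GBPS / PEACH_W_PER_GBPS

print(f"power per Gbps reduced by about {reduction:.1f}%")
print(f"InfiniBand 4X uses about {ratio:.2f}x the power per Gbps")
```

Expressed as a ratio, the same figures mean that InfiniBand 4X needs roughly twice the power per Gbps of PEACH.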
Chapter 5 A Heterogeneous Multicore SoC for Secure Multimedia Applications
 
Digital content protection standards such as DTCP-IP, Windows Media DRM (Janus), and Broadcast
Flag have been established. Nevertheless, vulnerabilities arise in which an encryption key can be
disclosed or code can easily be modified to access data without authorization.
A secure accounting system must process decoding and payment atomically. In a conventional system,
the decryption and decoding operations are performed separately on different chips. When the
encrypted content is delivered, it is decrypted and restored to its original plain data format
using the decryption key. Subsequently, the video data is decoded, and images and audio are sent to
the audio/video output.
However, in such a system the decryption key and the decrypted content are at risk of being stolen.
Because the decryption software is executed on non-secure hardware, the decryption key and the
decrypted content could be disclosed without authorization.
To realize a secure system, the best solution is to integrate all components in one chip. However,
this is difficult to achieve at a reasonable cost with current silicon-process technology.
To solve these security and cost problems, we have developed a multicore SoC with SiP technology
and an evaluation system.
 
The proposed concept of the secure media system consists of the following. 
1. Atomic operation of payment and viewing 
2. Multicore SoC and SiP for faster communication and decryption 
3. Hardware / software virtualization for strong security 
1) Atomic operation of payment and viewing
The problem with a conventional system is that payment, decryption, and image processing are
themselves large, monolithic side-attack targets. Performing these processes atomically eliminates
the problems of payment omission and of copyright infringement through illegal copying of data. In
addition, the multicore SoC with SiP provides both tamper resistance and high performance because
all communication routes are wired in the chip.
2) Multicore SoC, DRAM, and Flash memory in one package (SiP) for faster communication and
decryption
Faster communication with external devices and faster decryption are indispensable when dealing
with digital content, including motion video formats such as MPEG. A multifunction motion video
decoder is integrated on the heterogeneous multicore SoC to be compatible with MPEG-2/H.264/VC-1 on
DTV (digital television) and DVD (digital video disc). A symmetric-key cryptography accelerator for
decrypting multimedia content and a public-key encryption IP for payment and user confirmation are
also integrated.
3) Hardware / software virtualization for strong security
To achieve a secure system, the multicore hypervisor virtualizes hardware resources, and the
operating system (OS) and applications are prohibited from accessing hardware resources directly.
To isolate the secure media block from the application block effectively, we set up a software
firewall between the two blocks (Figure 6).
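The idea of a hypervisor that mediates every hardware access can be sketched as a small policy check. This is a conceptual illustration only, not the thesis's hypervisor: the class, the domain names, the device names, and the policy table are all hypothetical.

```python
class AccessDenied(Exception):
    """Raised when a domain requests a device it is not allowed to use."""


class Hypervisor:
    """Toy mediator: guests never touch devices directly, only via request()."""

    def __init__(self, policy):
        # policy maps a domain name to the set of devices it may use
        self._policy = {domain: set(devices) for domain, devices in policy.items()}

    def request(self, domain, device, operation):
        allowed = self._policy.get(domain, set())
        if device not in allowed:
            raise AccessDenied(f"{domain} may not access {device}")
        # A real hypervisor would perform the operation on the guest's behalf.
        return f"{domain}:{operation}@{device}"


# Hypothetical policy: only the secure media block reaches the crypto hardware.
POLICY = {
    "secure_media": {"crypto_accelerator", "video_decoder"},
    "application": {"network", "display"},
}

hv = Hypervisor(POLICY)
print(hv.request("secure_media", "crypto_accelerator", "decrypt"))

try:
    hv.request("application", "crypto_accelerator", "read_key")
except AccessDenied as err:
    print("blocked:", err)
```

The policy table plays the role of the software firewall described above: the application domain has no path to the decryption hardware, so a compromised application cannot read the key through it.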
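2. Dissertation submitter
(1) Affiliation: Division of Electronics and Information Science (電子情報科学専攻)
(2) Name: Otani Sugako (大谷 寿賀子)
3. Summary of examination results (600 to 650 characters)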
This dissertation summarizes research results on the Micro Controller Unit (MCU), one of the
important components of IoT. Specifically, it first proposes a new architecture that reduces MCU
power consumption. A CPU and an instruction fetch mechanism that efficiently handle the on-chip
memory, which accounts for most of the power consumption, achieve higher energy efficiency than
existing technology. Next, as a network technology that is important for IoT, it proposes a
communication SoC equipped with PCI Express and eight processor cores, and demonstrates that the
SoC achieves about twice the power efficiency of existing technology. Finally, as a further
important IoT technology, it improves tamper resistance by integrating a heterogeneous multicore
SoC into a SiP, and proposes a secure system that protects billing and content in cooperation with
software virtualization.
As described above, this research provides important findings on MCU architecture and multicore
technology, and its practical value is very high. Accordingly, this dissertation is judged to be
worthy of the degree of Doctor (Engineering).
4. Examination result
(1) Judgment (circle one): Pass (circled) / Fail
(2) Degree conferred: Doctor (Engineering)
