On-chip Network Based Multiprocessors Design and Its Key Technology Research by 罗闳訚
 
University code: 10384                                       Classification No.:    Security level:










Doctoral Dissertation


On-chip Network Based Multiprocessors Design and Its Key Technology Research








Major: Circuits and Systems
Thesis submission date: May 2014
Thesis defense date: July 2014




Chair of the defense committee:
Reviewers:
 
 


































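In addition, this dissertation is a research result of the (                            ) project (group),
supported by funding from the (               ) project (group) or by the laboratory's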










































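(     ) 1. This is a classified dissertation reviewed and approved by the Xiamen University
Secrecy Committee. It is to be declassified on (year)    (month)    (day), after which the above authorization applies.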







                             Declarant (signature):




















requirements. Meanwhile, the number of caches serving as private CPU buffers also grows with the number of CPU cores






































Large-scale chip-multiprocessors contain many CPU cores that work together on the
same task. Shared buses scale poorly in performance, and P2P links scale poorly in
area and energy, so as the number of on-chip CPU cores increases, conventional bus
and P2P connections become unsuitable for large-scale chip-multiprocessors. Moreover,
because chip-multiprocessors use caches as private data buffers, the number of caches
also grows with the number of on-chip CPU cores, and the conventional cache likewise
becomes unsuitable for large-scale chip-multiprocessors. To address these key technical
problems, this thesis studies low-power cache technology, on-chip network technology,
and reconfigurable on-chip network technology. The main research content and key
innovations are as follows:
(1) A power consumption model for the memory subsystem of a microprocessor
is proposed. Based on this model, the dynamic relocation cache (DR cache) is
proposed. The DR cache uses a trace-based storage scheme to store instructions.
For every incoming instruction, the DR cache assigns a new trace address according
to the instruction execution sequence, and then maps the trace address to the
compiling address through the dynamic address mapping module. Furthermore, a
“first in, first replaced” strategy ensures that the trace-based storage scheme
remains in use even when the cache memory is full.
(2) The hybrid circuit-switched (HCS) on-chip network is proposed. The AMBA
(Advanced Microcontroller Bus Architecture) protocol is mapped onto the on-chip
protocol through an AMBA wrapper, which makes the HCS network AMBA compatible.
The HCS network is composed of bufferless switches, pipeline channels, and network
interfaces. Packets are transferred using a hybrid transmission scheme. If a
message consists of a single packet, it is transmitted by packet switching.
Conversely, if a message contains multiple packets, the transmission














(3) An on-chip network with regular reconfigurable topology (RROCN) is
proposed. Through the constructive algorithm and reconfiguration scheme, the
RROCN can dynamically construct any regular topology within the constraints of the
maximum topology. Then, by using a modified XY routing algorithm with a
self-adaptive feature, packets can correctly reach their destinations in any
reconfigured topology.
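The modified, self-adaptive XY routing algorithm of item (3) is not specified in this abstract. As background only, the sketch below shows conventional dimension-ordered XY routing on a fixed 2D mesh, which is the baseline such a variant would start from. The function name `xy_route` and the (x, y) node coordinates are illustrative assumptions, not taken from the thesis.

```python
def xy_route(src, dst):
    """Return the hop-by-hop path from src to dst on a 2D mesh.

    Baseline XY routing (an assumption for illustration, not the thesis's
    modified algorithm): move along X until the column matches, then along Y.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path
```

Because every packet finishes all X hops before taking any Y hop, this baseline is deterministic and deadlock-free on a mesh. It also assumes that every mesh link exists, which is presumably why a reconfigurable topology calls for a modified variant.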
Finally, based on the technologies and schemes proposed in this thesis, a 64-core
chip-multiprocessor with an on-chip network is designed and then implemented on a
Xilinx Virtex 7 FPGA.
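The trace-based storage scheme described for the DR cache in item (1) can be pictured with a small software model. This is a minimal sketch under stated assumptions: the class name `DRCacheModel`, the one-instruction-per-slot layout, and the wrap-around counter used for first-in, first-replaced eviction are all illustrative choices, not the hardware design of the thesis.

```python
class DRCacheModel:
    """Toy model: instructions are stored in execution order (trace order)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.next_trace = 0              # next trace address to assign
        self.mapping = {}                # compiling address -> trace address
        self.slots = [None] * num_slots  # trace address -> compiling address

    def fetch(self, compiling_addr):
        """Return (trace_addr, hit) for one instruction fetch."""
        if compiling_addr in self.mapping:
            return self.mapping[compiling_addr], True
        # Miss: assign the next trace address in execution order.
        trace_addr = self.next_trace
        evicted = self.slots[trace_addr]
        if evicted is not None:
            # Cache full: the oldest stored entry is the first to be replaced.
            del self.mapping[evicted]
        self.slots[trace_addr] = compiling_addr
        self.mapping[compiling_addr] = trace_addr
        self.next_trace = (trace_addr + 1) % self.num_slots
        return trace_addr, False
```

In this model the dictionary plays the role of the dynamic address mapping module, and the wrapping counter keeps new instructions in trace order even after the storage has filled.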
 















Contents
Chapter 1 Introduction
1.1 Overview of multi-core microprocessors
1.2 Key technologies and research progress
1.2.1 Low-power cache technology
1.2.2 On-chip network technology
1.2.3 Reconfigurable on-chip network technology
1.3 Key problems and research content
1.4 Organization of this thesis
Chapter 2 Basic concepts and models
2.1 Microprocessors and their memory subsystems
2.1.1 Scalar microprocessors
2.1.2 Superscalar microprocessors
2.1.3 Multi-core microprocessors
2.1.4 Memory subsystem
2.2 Introduction to computer networks
2.2.1 Layered network model
2.2.2 Circuit-switched transmission
2.2.3 Packet-switched transmission
2.3 Related mathematical models
2.3.1 Linear programming
2.3.2 Gaussian integers and network modeling
2.4 Chapter summary
Chapter 3 Dynamic relocation cache
3.1 Introduction
3.2 Power model
3.3 Hardware design scheme
3.3.1 Architecture
3.3.2 Dynamic relocation strategy
3.3.3 Address mapping module
3.3.4 Power analysis
3.4 Experimental results and analysis
3.4.1 Simulation environment
3.4.2 Runtime performance
3.4.3 Area and power














3.5 Chapter summary
Chapter 4 Hybrid circuit-switched on-chip network
4.1 Introduction
4.2 Mapping the AMBA protocol onto the on-chip network
4.2.1 System-on-chip and its communication protocol
4.2.2 Mapping scheme
4.3 Hardware design scheme
4.3.1 Architecture
4.3.2 Pipeline channel and flow control scheme
4.3.3 Switch and hybrid transmission scheme
4.4 Experimental results and analysis
4.4.1 Simulation environment
4.4.2 Physical implementation results
4.4.3 Network latency
4.4.4 Maximum throughput
4.4.5 Throughput efficiency
4.5 Chapter summary
Chapter 5 Reconfigurable on-chip network based on regular topology
5.1 Introduction
5.2 Mathematical model of the reconfigurable network
5.3 Hardware design scheme
5.3.1 Architecture and reconfigurable topology
5.3.2 Hardware circuit and reconfiguration strategy
5.3.3 Network construction algorithm
5.3.4 Adaptive routing algorithm
5.4 Experimental results and analysis
5.4.1 Simulation environment
5.4.2 Physical implementation results
5.4.3 Network latency
5.4.4 Maximum throughput
5.4.5 Throughput efficiency
5.5 Chapter summary
Chapter 6 64-core microprocessor based on an on-chip network and its FPGA design
6.1 Introduction
6.2 Hardware design scheme
6.3 Memory consistency














6.5 Chapter summary
Chapter 7 Summary and outlook
7.1 Summary of work
7.2 Future research directions
References
Appendix: Research output during the combined master's and doctoral program














Table of Contents
Chapter 1 Introduction
1.1 Background of multiprocessors
1.2 Key techniques and research progress
1.2.1 Low-power cache
1.2.2 On-chip network
1.2.3 Reconfigurable on-chip network
1.3 Key problems and synopsis of our work
1.4 Main work of this thesis
Chapter 2 Basic definitions and models
2.1 Microprocessor and memory subsystem
2.1.1 Scalar microprocessor
2.1.2 Superscalar microprocessor
2.1.3 Multi-core microprocessors
2.1.4 Memory subsystem
2.2 Computer networks
2.2.1 Hierarchical network models
2.2.2 Circuit switching
2.2.3 Packet switching
2.3 Related mathematical models
2.3.1 Linear programming
2.3.2 Gaussian integer and network modeling
2.4 Conclusions
Chapter 3 Dynamic relocation cache
3.1 Introduction
3.2 Power model
3.3 Hardware design scheme
3.3.1 Architecture
3.3.2 Dynamic relocation scheme
3.3.3 Address mapping module
3.3.4 Power consumption analysis
3.4 Experimental results and analysis
3.4.1 Simulation environment
3.4.2 Runtime performance














3.4.2 Power breakdown
3.5 Conclusions
Chapter 4 Hybrid circuit-switched on-chip network
4.1 Introduction
4.2 Mapping of AMBA protocol onto on-chip network
4.2.1 System-on-chip and communication protocol
4.2.2 Mapping scheme
4.3 Hardware design scheme
4.3.1 Architecture
4.3.2 Pipeline channel and flow control scheme
4.3.3 Switch and hybrid transmission scheme
4.4 Experimental results and analysis
4.4.1 Simulation environment
4.4.2 Implementation results
4.4.3 Latency
4.4.4 Maximum throughput
4.4.5 Throughput efficiency
4.5 Conclusions
Chapter 5 On-chip network with regular reconfigurable topology
5.1 Introduction
5.2 Mathematical model
5.3 Hardware design scheme
5.3.1 Architecture and reconfigurable topology
5.3.2 Reconfiguration scheme
5.3.3 Network constructive algorithm
5.3.4 Self-adaptive routing algorithm
5.4 Experimental results and analysis
5.4.1 Simulation environment
5.4.2 Implementation results
5.4.3 Latency
5.4.4 Maximum throughput
5.4.5 Throughput efficiency
5.5 Conclusions
Chapter 6 64-core NOC and the FPGA implementation
6.1 Introduction
6.2 Hardware design scheme














6.4 Experimental results and analysis
6.5 Conclusions
Chapter 7 Summary and future work
7.1 Summary
7.2 Future work
References
Appendix: Publications during the combined master's and doctoral program






























































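CMPs integrate multiple CPU cores on a single chip. Each CPU can run programs and
access peripherals relatively independently, and the cores exchange messages quickly
through the on-chip communication structure. The Hydra research group at Stanford
University was among the earliest groups to study CMPs, and developed the Hydra
processor in 1996 [12]. The Hydra processor integrates four MIPS CPU cores. Each core
has private level-1 instruction and data caches, and the cores share a level-2 cache
over a bus. CMPs subsequently drew the attention of commercial companies: SUN, IBM,
and others introduced the CMP concept into microprocessor designs for high-performance
commercial servers. Among these, IBM, in cooperation with Sony and Toshiba, launched in 2001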




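series microprocessors [14] and AMD's Athlon series microprocessors [15] both use a
crossbar as the communication structure.
When the number of CPU cores in a chip multiprocessor grows beyond a certain point
(e.g., 64 cores or more), neither a shared bus nor a crossbar can meet the system's
communication bandwidth demand under reasonable area and power constraints [16-17].
Large-scale CMPs (such as Intel SCC [18] and TILEPro64 [19]) therefore need a more
flexible and efficient interconnect to meet inter-core communication demands. On the
other hand, large-scale CMPs integrate a large number of CPU cores, and the caches in
those cores consume a large amount of energy [20-22]. Under the low-power requirements
of embedded systems and the packaging and heat-dissipation limits of chips, large-scale CMPs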


















Degree papers are held in the “Xiamen University Electronic Theses and Dissertations Database”.
Full texts are available in the following ways:
1. If your library is a CALIS member library, please log on to http://etd.calis.edu.cn/ and submit
a request online, or consult the interlibrary loan department of your library.
2. Users of non-CALIS member libraries should e-mail etd@xmu.edu.cn for delivery details.
