Pipelined Processor Synthesis from Micro-operation Level Design Specification Descriptions, by Itoh, Makiko et al.
Osaka University
Title Pipelined Processor Synthesis from Micro-operation Level Specification
Author(s) Itoh, Makiko
Citation
Issue Date
Text Version ETD
URL http://hdl.handle.net/11094/410
DOI
Rights
Pipelined Processor Synthesis from 
Micro-operation Level Specification 
Makiko Itoh 
January, 2001
Pipelined Processor Synthesis from
Micro-operation Level Specification
Doctoral Dissertation
by
Makiko Itoh
Department of Informatics and Mathematical Science
Graduate School of Engineering Science
Osaka University
Contents
1 Introduction
1.1 Background
1.2 ASIP Development
1.3 Objective
1.4 Approach
1.5 Contribution of the Research
1.6 Organization of the Thesis
2 Related Work
2.1 HW/SW co-design in early years
2.2 Recent ASIP Development System
2.2.1 Prepared Processor based Approach
2.2.2 Software Development Tool Generation Systems
2.2.3 Processor Synthesis Systems
2.3 Problems of Existing Processor Descriptions
3 PEAS-III: Processor Design Environment
3.1 Characteristics of Modern Processor Architecture
3.2 Design Methodology
3.2.1 Flexible Hardware Model
3.3 Micro-operation Level Processor Specification
3.3.1 Design Goal and Architecture Parameter Setting
3.3.2 Resource Declarations
3.3.3 Instruction Format Definitions
3.3.4 Interrupt Condition Definitions
3.3.5 Interface Definitions
3.3.6 Micro Operation Descriptions
4 Processor Model
4.1 Processor Class
4.2 Processor organization
4.3 Datapath Model
4.4 Controller Model
4.4.1 Instruction Decoder
4.4.2 Pipeline Stage Controller
4.4.3 Interrupt Controller
5 Processor Synthesis
5.1 Datapath Synthesis
5.1.1 DFG generation
5.1.2 Basic Datapath Synthesis
5.1.3 Signal Conflicts Resolution
5.1.4 Pipelining
5.2 Controller Synthesis
5.2.1 Control Signal Extraction
5.2.2 Interlock Condition Extraction
5.2.3 Branch Condition Extraction
5.2.4 Instruction Decoder Synthesis
5.2.5 Stage Controller Synthesis
5.2.6 Interrupt Controller Synthesis
6 Experiments
6.1 Objective of the Experiments
6.2 Basic RISC Processor
6.3 PEAS-I Processor Core
6.4 Embedded RISC Controller
6.5 Pipeline Stage Tuning
6.5.1 Changing the Number of Pipeline Stages
6.5.2 Clock Frequency Improvement
6.6 Design Space Exploration for DSP Application
6.6.1 Customization of PEAS R3K
6.6.2 Pipeline Stage Tuning for Derivative Processors
6.6.3 Results of Design Space Exploration for DSP Applications
7 Discussion
7.1 Design Space
7.2 Design Time and Design Space Exploration Time
7.2.1 Design Time for New Processors
7.2.2 Design Time for Derivative Processors
7.3 Design Quality
8 Conclusions and Future Work
8.1 Conclusion
8.2 Future Work
8.2.1 Design Space Expansion
8.2.2 Design Exploration Time Reduction
8.2.3 Improvement of the Design Quality
A Grammar of Micro-operation Level Processor Specification
A.1 Organization of Micro-operation Level Specification
A.2 Architecture Parameter
A.2.1 Design Goal
A.2.2 Pipeline Processor Parameters
A.3 Interface Definition
A.4 Instruction Format Definition
A.4.1 Instruction Type Definition
A.4.2 Instruction Definition
A.5 Resource Declaration
A.6 Interrupt Definition
A.7 Micro-operation Description
A.7.1 Variable
A.7.2 Constant
A.7.3 Storage
A.7.4 Operands
A.7.5 Function
A.7.6 Assignment statement
A.7.7 If-statement
A.7.8 Decode statement
B Processor Specification of PEAS R3K
C Synthesis Result of PEAS R3K Processor
C.1 VHDL Description of PEAS R3K Datapath
C.2 VHDL Description of PEAS R3K Controller
Abstract 
In the embedded systems area, application specific instruction set processors (ASIPs)
provide better performance, lower power consumption and a smaller chip area than
general purpose processors. However, the design time of ASIPs becomes longer with
the growth of the design scale. A processor design method at a higher abstraction
level than the traditional register transfer level (RTL) is required. Processor
design at RTL requires a long design time because the designer has to design the
datapath and controller structures while considering the assignment of registers,
functional units, interconnects among them, and the organization of the finite state
machine of the controller. Designing processor organization at RTL from an
instruction set architecture level processor specification is an error-prone and
time-consuming task. In addition, modifying the processor specification requires a
long time for re-design of the datapath and controller at RT level. Therefore,
comparing several design candidates for a specific application in a short design
time is difficult.
In this thesis, a micro-operation level pipelined processor specification and a
processor synthesis method from a micro-operation level processor description are
proposed to improve the design productivity of ASIP development. An abstraction
level higher than RTL makes ASIP design and design modification easier. The ease of
specification and modification of processor architecture enables architectural
exploration of a large design space in a short design time. The designer only
specifies clock based instruction behavior in micro-operation level specifications.
The datapath and controller of the processor are synthesized from the behavioral
description of instructions.
The design space of micro-operation level processor specification is large enough
for practical straightforward pipelined processors. Exploration of a larger design
space enables the designer to select a more suitable architecture for the target
application. The target architecture of micro-operation level processor
specification includes the following features: user-defined pipeline organization
in terms of the number of pipeline stages, the number of delayed branch slots and
the role of each pipeline stage; clock based behavioral representation of
instructions and interrupts; utilization of parameterized hardware modules; and
user-defined instruction format, processor interface ports and external interrupt
conditions.
Processor synthesis from the micro-operation level processor specification includes
datapath synthesis and controller synthesis. The synthesis of datapath and
controller allows the designer to concentrate on instruction set design and evaluate
various architecture candidates in a short time. In datapath synthesis, data flow
graph generation from micro-operation description, signal conflicts resolution and
insertion of pipeline registers are performed. In controller synthesis, the
instruction decoder, pipeline control logic such as pipeline stall and pipeline
flush, and external interrupt control are synthesized.
The effectiveness and feasibility of the proposed processor synthesis method were
evaluated experimentally. The examples in the experiments are a MIPS R3000
compatible processor, DLX, the PEAS-I core, a simple RISC controller, and a
customized MIPS R3000 processor for DSP application. Processor design time and
design modification time were drastically reduced compared with those of
conventional RT level manual design. Processor synthesis time was about two minutes
for a processor with 52 instructions. The design space of practical processors was
explored at an architecture level in a short design time. Regarding design quality,
the clock frequencies of synthesized processors and manually designed processors
are almost the same. The area of synthesized processors is about 20% larger than
that of manually designed processors. Though the area is inferior to manual design,
the advantage of effective design space exploration has an impact on the total
design quality. The effectiveness of the micro-operation level processor
specification and processor synthesis for architectural design space exploration is
confirmed.
The proposed processor synthesis method enables the designer to explore a large 
design space at an architectural level. By the architectural exploration of a large 
design space, design productivity for application specific processors is drastically 
improved. 
Acknowledgments
I would like to express my deepest gratitude to my supervisor Prof. Masaharu Imai,
Osaka University, for introducing me to this research area and guiding this work,
for providing all the facilities to carry it out, and for continuous support, help
and encouragement.
I would also like to express my thanks to Prof. Ken-ichi Taniguchi and Prof.
Teruo Higashino for helpful suggestions and comments in writing this thesis. I
wish to express my thanks to the professors and staff of the Department of
Informatics and Mathematical Science, Graduate School of Engineering Science, Osaka
University, for their guidance, especially the late Prof. Seishi Nishikawa, and the
late Prof. Mamoru Fujii, and Prof. Masaru Sudo, Prof. Nobuki Tokura, Prof. Akihiro
Hashimoto, Prof. Toru Kikuno, Prof. Hideo Miyahara, Prof. Toshinobu Kashiwabara,
Prof. Toru Fujiwara, Prof. Katsuro Inoue, Prof. Kenichi Hagihara, Prof.
Masayuki Murata, Prof. Toshimitsu Masuzawa, Prof. Tadahiro Kitahashi and Prof.
Shinichi Tamura.
The author is extremely thankful to Prof. Yoshinori Takeuchi, Prof. Akira Kitajima
from Osaka University, Prof. Jun Sato from Tsuruoka National College of
Technology and Prof. Akichika Shiomi from Shizuoka University for their continuous
support, help and encouragement, and many thanks to all members of the
PEAS project for their kind assistance, especially, Dr. N. N. Binh, Mr. Yoshimichi
Homma, Prof. Takumi Nakano, Prof. Tsutomu Kimura, Mr. Nobuyuki Hikichi from
Software Research Associates, Inc., and to the members of the VLSI System Design
Laboratory at Osaka University, especially, Ms. Akiko Fujii, Ms. Ranko Morimoto,
Mr. Takafumi Morifuji, Mr. Norimasa Ohtsuki, Mr. Shigeaki Higaki, Mr. Shinsuke
Kobayashi, Mr. Yoshiharu Watanabe, Mr. Tomohide Maeda and Mr. Naoki Morita.
The author would like to express her special thanks to Dr. Tokinori Kozawa,
Mr. Michiaki Muraoka and Dr. Eiji Masuda from STARC (Semiconductor Technology
Academic Research Center), Dr. Masao Hotta, Mr. Mitsuho Seki, Mr. Yoshio
Takamine from Hitachi Limited, and Mr. Masanobu Mizuno from Matsushita Electric
Industry Corporation.
This research was supported in part by STARC. 
Chapter 1 
Introduction 
1.1 Background 
With advancements in semiconductor technology, chip complexity, that is, the number
of transistors on a silicon chip, is doubling every three years. In the near future,
it is estimated that circuits of over 10 million transistors will be realized on a
silicon chip of only 1 cm². Owing to this technological innovation, System-on-a-Chip
(SoC) devices, which integrate compound functionalities on a single chip, tend to be
widely used in electronic equipment [1]. Figure 1.1 shows a typical organization of
an SoC. An SoC usually consists of a combination of the following components:
processors such as a CPU core, a digital signal processor (DSP) and specific
processors; ASICs such as signal processing hardware, control hardware and other
specific hardware; memories such as flash and DRAM; and analog circuits. LSI design
is moving from the individual design of microprocessors and application-specific
integrated circuits (ASICs) to whole-system design on a chip.
In SoC design, design productivity is a key issue. The SEMATECH roadmap [2]
indicates a growing productivity gap between the transistors available and
those that can be designed in microprocessors. Transistor density grows by
58% per year. On the other hand, design productivity grows by only 21% per
year. In addition, the rapidly changing technological environment shrinks product
life cycles and shortens time-to-market.
Figure 1.1: Typical Organization of System-On-a-Chip. 
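The productivity gap quoted in Section 1.1 can be made concrete by compounding the two growth rates year by year. The following is only an illustrative calculation and is not part of the original thesis; the function name `gap_after` and the assumption that both rates stay constant and start from the same baseline are introduced here for the example.

```python
def gap_after(years, density_growth=0.58, productivity_growth=0.21):
    """Ratio of available transistors to designable transistors after `years`.

    Assumes (hypothetically) that both quantities start equal and that the
    annual growth rates quoted in the text stay constant.
    """
    return ((1 + density_growth) / (1 + productivity_growth)) ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years: gap factor {gap_after(years):.2f}")
```

Under these assumptions the gap widens by roughly 30% per year and reaches a factor of about 3.8 after five years.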
1.2 ASIP Development 
Focusing on the design of application specific instruction set processors (ASIPs)
that are integrated into the SoC, a Hardware/Software co-design [3] environment with
architectural design space exploration is considered to be key to design
productivity improvement. For ASIP design, it is important to explore suitable
processor architectures for the target application. The HW/SW co-design environment
enables the designer to design and evaluate the processor while considering the
target application, and suggests the direction for design improvement. Using the
HW/SW co-design environment, the designer is able to design and evaluate various
architecture candidates at instruction set architecture level easily. As a
consequence, the designer is able to choose the most suitable architecture for the
target application in a short design time.
Figure 1.2 shows one HW/SW co-design framework for effective design space
exploration. The designer specifies the processor architecture with an entry system
at instruction set architecture level. Processor components such as registers,
memory access units, functional units and so on are instantiated from a module
library. The module library provides instances at various abstraction levels. A
processor synthesizer generates a simulation model and a synthesizable model of the
designed processor. The processor synthesizer receives the instances used in the
processor from the database manager of the library. Software development tools such
as a compiler and an assembler are also generated from the same processor
description that is used for processor synthesis. The synthesized instruction set
simulation model and SW development tools enable co-verification and performance
evaluation of the designed processor. A generated RT level processor model is used
to estimate area, clock frequency and power consumption. Estimation and verification
results suggest the direction for improvement of the processor design. Using a HW/SW
co-design framework like this, exploration of a large design space becomes possible
because the turn-around time of ASIP design is drastically reduced.
Figure 1.2: HW/SW Co-design Framework
The following techniques are required to implement a HW/SW co-design framework:
instruction set level processor specification, processor synthesis and software
development tool synthesis methods from the same processor description, and fast
estimation of the designed processor. The processor synthesis method of a HW/SW
co-design system often limits the design space of the system. In recent research,
several HW/SW co-design methods and processor synthesis methods have been proposed,
but their design space is very small in regard to pipeline organization.
1.3 Objective 
The aim of this research is to investigate a processor synthesis method for
exploration of a large design space for ASIPs. Because processor synthesis methods
usually limit the design space of the HW/SW co-design environment, processor
synthesis methods should support a wide variety of architecture candidates for
ASIPs to allow architectural design exploration.
To explore a large design space, two requirements must be satisfied: a short
turn-around time for evaluation of various candidates, and a large design space.
Even if the design space is large enough, it cannot be fully explored in the
restricted design time if the turn-around time for the design is too long. There is
a tradeoff between the ease of processor specification and the design space.
An appropriate abstraction level for processor specification should be considered.
1.4 Approach 
Considering the tradeoff between the ease of processor specification and the
design space of the specification language, a micro-operation level processor
specification for processor synthesis and a processor synthesis method for
micro-operation level specification are proposed in this thesis.
Micro-operation level processor specification is based on a clock based behavioral
description of instructions. Because the abstraction level of the processor
specification is higher than the RT level, the design time and design modification
time of ASIPs are drastically reduced. Despite the ease of specification, the design
space of micro-operation level processor specification enables the designer to
specify practical straightforward pipelined processors. The designer can specify the
pipeline organization, hardware module configuration and external interrupts. From
these points, the micro-operation level is appropriate for straightforward pipelined
processors in terms of the ease of design and the design space.
In micro-operation level processor design, the datapath structure and controller
are synthesized from the behavioral description of instructions and the hardware
module configuration. The designer is freed from tedious, error-prone datapath and
controller design. Therefore the designer can design various ASIPs in a short
design time.
The target processor architecture is a straightforward pipelined architecture that
includes the basic functionality of embedded microprocessors [4, 5, 6, 7, 8, 9] such
as multi-cycle operation, delayed branch and external interrupts. Micro-operation
level processor specification includes: the number of pipeline stages and the number
of delayed branch slots; utilization of parameterized hardware modules; user-defined
instruction format, processor interface ports and external interrupt conditions; and
clock-based behavioral representation of instructions and interrupts. Operations of
each pipeline stage are specified by the designer with micro-operation descriptions
of instructions. The pipeline depth, the role of each pipeline stage and the
hardware modules have an impact on clock frequency and area. The number of delayed
branch slots affects code size and execution cycles. Therefore, flexibility in the
processor architecture, such as the number of pipeline stages and delayed branch
slots and the role of pipeline stages, and in the configuration of hardware modules,
allows exploration of a large design space.
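The way these parameters span a design space can be illustrated with a short enumeration. This is a minimal sketch that does not appear in the thesis: the parameter ranges, the module names and the function `enumerate_candidates` are all hypothetical and serve only to show how a few independent choices multiply into many architecture candidates.

```python
from itertools import product

# Hypothetical parameter ranges; the thesis does not prescribe these values.
PIPELINE_STAGES = (3, 4, 5, 6)
DELAYED_BRANCH_SLOTS = (0, 1, 2)
MULTIPLIER_MODULES = ("array", "sequential")


def enumerate_candidates():
    """Yield every combination of the architecture parameters as a dict."""
    for stages, slots, multiplier in product(
        PIPELINE_STAGES, DELAYED_BRANCH_SLOTS, MULTIPLIER_MODULES
    ):
        yield {"stages": stages, "delay_slots": slots, "multiplier": multiplier}


if __name__ == "__main__":
    candidates = list(enumerate_candidates())
    print(len(candidates), "candidates, for example", candidates[0])
```

Each candidate would then have to be specified, synthesized and evaluated, which is why the turn-around time per candidate matters as much as the size of the space.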
For the processor synthesis method from a micro-operation level processor
specification, datapath and controller synthesis is required for user-defined
pipeline organization in terms of the number of pipeline stages, the number of
delayed branch slots and the role of each pipeline stage. The controller synthesis
includes pipeline control logic synthesis for pipeline hazards, interrupt controller
synthesis, and instruction decoder synthesis. Structural hazards are caused by
multi-cycle operations and resource conflicts among multiple stages. Hence
generation of the hazard detection logic and pipeline interlock logic is required.
To deal with the specified number of delayed branch slots, generation of branch
control and pipeline flush control logic is also required.
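The notion of a structural hazard caused by a multi-cycle operation can be sketched with a simplified reservation-table check. The sketch below is an assumption-laden illustration rather than the interlock extraction method of this thesis: the resource names, the per-instruction usage tables and the functions `conflicts` and `stall_cycles` are hypothetical.

```python
def conflicts(first, second, distance):
    """True if `second`, issued `distance` cycles after `first`,
    needs some resource in the same cycle as `first`.

    Each instruction maps a resource name to the set of cycles
    (relative to its own issue) in which it occupies that resource.
    """
    for resource, cycles in first.items():
        later = {c + distance for c in second.get(resource, ())}
        if cycles & later:
            return True
    return False


def stall_cycles(first, second):
    """Stall cycles needed before `second` can follow `first`."""
    distance = 1
    while conflicts(first, second, distance):
        distance += 1
    return distance - 1


# Hypothetical usage tables: ADD uses the ALU for one cycle,
# MUL occupies a multi-cycle multiplier for three cycles.
ADD = {"IF": {0}, "ID": {1}, "ALU": {2}, "WB": {3}}
MUL = {"IF": {0}, "ID": {1}, "MUL": {2, 3, 4}, "WB": {5}}

if __name__ == "__main__":
    print("ADD after ADD:", stall_cycles(ADD, ADD))
    print("MUL after MUL:", stall_cycles(MUL, MUL))
```

In hardware, the condition computed by `conflicts` would correspond to the hazard detection logic, and the resulting delay to the pipeline interlock that holds the later instruction.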
In this thesis, to deal with user-defined pipeline organization, a flexible
pipelined processor model is proposed. The model has flexibility regarding the
number of pipeline stages and pipeline control logic. The model consists of the
datapath and controller of each pipeline stage, an instruction decoder and an
external interrupt controller. The pipeline control mechanism using the model for
pipeline interlock and pipeline flush is discussed. The processor model and pipeline
control mechanism support the processor synthesis from the micro-operation level
processor specification.
Finally, the processor synthesis method based on the processor model is proposed.
Processor synthesis from the micro-operation level processor specification includes
datapath structure synthesis and controller synthesis. Synthesis of datapath and
controller allows the designer to concentrate on instruction set design and evaluate
various architecture candidates in a short time. In datapath synthesis, data flow
graph generation from micro-operation description, signal conflicts resolution and
insertion of pipeline registers are performed. In controller synthesis, the
instruction decoder, pipeline control logic such as pipeline stall and pipeline
flush, and external interrupt control are synthesized.
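Signal conflicts resolution can be pictured as follows: when the micro-operations of different instructions drive the same input port from different sources, a selector has to be placed in front of that port. The code below is a minimal sketch under that assumption and is not the algorithm of Chapter 5; the port names and the function `insert_selectors` are hypothetical.

```python
from collections import defaultdict


def insert_selectors(transfers):
    """Report the input ports that need a selector.

    `transfers` is an iterable of (source, destination_port) pairs
    collected from the micro-operations of all instructions.
    Returns {destination_port: [sources, ...]} for every port that is
    driven by more than one distinct source.
    """
    drivers = defaultdict(list)
    for source, dest in transfers:
        if source not in drivers[dest]:
            drivers[dest].append(source)
    return {dest: srcs for dest, srcs in drivers.items() if len(srcs) > 1}


if __name__ == "__main__":
    # Hypothetical transfers: a register-register add and an add-immediate.
    transfers = [
        ("GPR.out0", "ALU.a"),
        ("GPR.out1", "ALU.b"),
        ("GPR.out0", "ALU.a"),
        ("IMM", "ALU.b"),
    ]
    print(insert_selectors(transfers))
```

Here only `ALU.b` has two distinct drivers, so only that port would receive a selector; the select signal would then become one of the control signals generated by the controller.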
1.5 Contribution of the Research 
The effectiveness of architectural design space exploration using the proposed
processor design method and synthesis method is shown by the experimental results.
Design and design modification time is reduced compared with RT level processor
design. The processor design space was successfully explored at an architecture
level in a short design time. Processor synthesis time was about two minutes for a
processor with 52 instructions.
Regarding design quality, the clock frequencies of synthesized processors and
manually designed processors are almost the same. The area of synthesized processors
is about 20% larger than that of manually designed processors. Though the area is
inferior to manual design, the advantage of effective design space exploration has
an impact on the total design quality.
Consequently, the effectiveness of the micro-operation level processor specification
and processor synthesis for architectural design space exploration is confirmed.
By the architectural exploration of a large design space, design productivity for
application specific processors is improved drastically.
1.6 Organization of the Thesis 
This thesis is organized as follows. 
In Chapter 2, existing HW/SW co-design environments, customizable processor
cores and processor synthesis methods are reviewed. Problems of existing methods
are discussed.
Chapter 3 describes micro-operation level processor specification and the processor
design environment PEAS-III. To determine parameters and user-definable parts
of target processors, the characteristics of processor architecture are classified
and evaluated in view of their impact on the performance and area of the processor.
In Chapter 4, the pipelined processor model for processor synthesis is illustrated.
The model consists of a datapath, an instruction decoder, a pipeline controller and
an external interrupt controller. The conditions of pipeline interlock and pipeline
flush are considered. The pipeline controller mechanism using these conditions is
explained.
Chapter 5 is devoted to the processor synthesis method. The datapath and
controller synthesis methods are described. The datapath synthesis includes data
flow graph generation, signal conflicts resolution, and pipeline register insertion.
The controller synthesis includes instruction decoder synthesis, pipeline control
logic synthesis and interrupt controller synthesis.
In Chapter 6, the effectiveness of the method is evaluated through several
experiments and architectural design space exploration is demonstrated. The examples
in the experiments are a MIPS R3000 compatible processor, DLX, a simple RISC
controller, the PEAS-I core, and a customized MIPS R3000 processor for DSP
application. Processor design time was drastically reduced compared with that
of conventional RT level manual design in HDL. The processor design space was
successfully explored at an architecture level in a short design time.
Chapter 7 presents the discussion of this thesis. The design space, design
productivity and design quality of the proposed processor synthesis method are
evaluated. The directions for further expansion of the design space, reduction of
turn-around time and improvement of the design quality are discussed.
The last chapter discusses the research results and concludes with future work. 
Chapter 2 
Related Work 
In this chapter, related work for application specific instruction set processors (ASIPs) 
design is reviewed. 
2.1 HW/SW co-design in early years
HW/SW co-design systems in early years are closely connected to the base processor
of the system. These systems adopt parameterized processor cores. However, the
datapath structure and pipeline organization are largely fixed. PEAS-I [10],
Satsuki [11] and ARC [12] are classified into this approach. In these systems, the
given configurable architecture is tuned to a specific application by changing some
architectural parameters such as the bit width of hardware functional blocks,
register file size, memory size, etc. The superset of instructions that can be
executed on the processor architecture adopted by the system is restricted. These
systems do not allow user-defined extension instructions, so they cannot always
fully satisfy the demands of diverse applications. User-defined extension
instructions are required to gain high performance.
2.2 Recent ASIP Development System 
In recent research, several ASIP development systems, which permit the target
processor to be equipped with user-defined application specific instructions, have
been proposed. These systems use their own processor description languages to
describe the target processor's instruction set and hardware structure. From the
processor description, a code generator, an instruction set simulator and an HDL
description of the target processor are generated.
ISPS [13] was a common processor description for code generation, simulation and
processor synthesis in the 1980s. While it incorporates a rich set of control
mechanisms to describe parallelism and synchronization of processes, the
synchronization mechanisms are inadequate to model pipeline operations and hazards
for modern pipelined processors.
The other processor description based ASIP design systems for pipelined processors
are classified into three types.
1. Adding several dedicated instructions to already designed processors. This
approach includes FLEXWARE [14], Xtensa [15], Trimaran [16], CASTLE [17]
and MetaCore [18].
2. Software development tool generation and performance evaluation systems for
their own processor specification languages. This approach includes the ISDL [19]
based system, the Expression [20] based system and the LISA [21] based system.
3. RT level processor HDL description synthesis systems for their own processor
specification languages. This approach includes the MIMOLA [22] based system,
the nML [23] based system, the AIDL [24] based system, and [25].
2.2.1 Prepared Processor based Approach 
Processor descriptions in the first approach describe the instruction set and a
portion of the datapath structure. In this approach, the pipeline organizations are
fixed, so that modification of pipeline control is not allowed.
In FLEXWARE [14], user-defined instructions can be described by combining generic
instructions. The generic instructions are supported by the instruction set
simulator model of FLEXWARE in VHDL. The designer can specify execution cycles for
each instruction, but cannot specify pipeline organization. While FLEXWARE supports
the retargetable code generator CodeSyn and the instruction set simulator Insulin,
it does not support processor synthesis.
Xtensa [15] uses a customizable processor core. Xtensa permits some user-defined
instructions using the Tensilica Instruction Extension Language (TIE). While Xtensa
supports both processor synthesis and software development tool generation,
user-defined instructions must be executed within a restricted number of cycles.
The designer can only describe the behavior of the instructions and the structure
of the "execution" stage, and cannot change the number of pipeline stages or the
other pipeline stages.
Trimaran uses the processor description language MDes [26], which describes both
the behavior and the structure of the target processor. Trimaran allows only
restricted retargetability of the simulator, within the HPL-PD [27] processor family.
CASTLE [17] specifies the target processor's datapath as a block diagram and
generates a VHDL description of the processor. The target architecture of CASTLE is
VLIW. The features of CASTLE include: instantiation of VHDL descriptions for
functional units from a module library, automatic input signal conflict resolution
by selector insertion, and generation of the VLIW control word for the specified
datapath. However, CASTLE assumes a basic VLIW architecture and cannot change
pipeline stages.
MetaCore [18] is an application specific DSP development system. MetaCore
prepares basic and extended instruction sets, and additional user-defined
instructions are permitted. A net-list level description of the datapath structure
and a behavioral description of instructions are given as the specification of a
target processor. From these descriptions, software development tools and an HDL
description of the target processor are synthesized. However, additional execution
units can be specified only for the "execution" stage. Additional execution units
for other stages and changes to the number of pipeline stages are not permitted.
2.2.2 Software Development Tool Generation Systems 
Processor descriptions in the second approach describe an instruction set and the
structure of the datapath. The designer can define the pipeline structure of the
target processor in terms of the number of pipeline stages and the operations in
each pipeline stage.
ISDL [19] [28] is one such approach that describes an instruction set and
datapath structure. In ISDL, constraints of pipeline execution are explicitly
specified through illegal operation groupings. This is tedious for complex
architectures like DSPs that permit operation parallelism.
EXPRESSION [20] specifies an instruction set and datapath structure. A pipeline
description provides a mechanism to specify the order of pipeline stages. Accurate
reservation tables can be generated from the description. While EXPRESSION supports
cycle-accurate instruction set simulation by SIMPLESS [29], processor synthesis is
not supported.
LISA [21] [30] describes the datapath structure and an operation-level description
of the pipeline. LISA describes the activation relationship among pipeline stages,
pipeline stalls and pipeline flushes. However, LISA is used for retargetable
simulators [31]; processor synthesis is not supported either. Furthermore, the
description of pipeline control makes it tedious to design and to modify branch
instructions and multi-cycle operations.
2.2.3 Processor Synthesis Systems 
In the last approach, both behavior and datapath structure of the target processor 
are described. Synthesizable processor HDL descriptions are generated. 
MIMOLA [22] describes the behavior and structure of the target processor and
generates an RT level processor description. However, pipeline control is not
supported since MIMOLA is a micro-code based approach.
nML [23] describes the behavior of instructions and the datapath structure. From an
nML description, an instruction set simulator is generated [32]. nML is used by the
retargetable code generation environment CHESS [33] to describe DSPs and ASIPs. The
processor synthesis tool "Go" has also been developed for nML processor
descriptions. However, nML does not directly support complex pipeline control such
as pipeline interlock.
AIDL [24] specifies the operations of each pipeline stage and the timing relations
and cause/effect relations among pipeline stages. Using AIDL, various kinds of
processors can be represented, including processors with out-of-order completion.
However, modification of the design is difficult for complicated architectures
because the designer has to consider various kinds of dependency in the
inter-instruction behavior.
Hamabe, et al. [25] proposed a description of clock based instruction behavior 
12 
and pipeline stage information includes the correspondence of hardware units to the 
stage that contains their operations. However, designers must describe instruction 
behaviors considering with pipeline registers. Furthermore 、 pipeline control is not 
directly described. 
2.3 Problems of Existing Processor Descriptions 
Existing processor development systems have some problems. 
1. Existing processor development systems need both structural and behavioral
descriptions of the target processor in order to generate the processor. Describing
a datapath structure consumes design time. Furthermore, for design space
exploration it is tedious to describe the datapath structure while keeping the
behavioral and structural descriptions consistent.
2. Most systems do not support specification of pipeline organization. The
pipeline model of such languages is restricted: the designer cannot change
the number of pipeline stages or the role of each stage. Several systems support
pipeline control synthesis, but an explicit definition of the pipeline control
is needed. Defining pipeline control is an error-prone task and takes a
long design time.
For more effective architectural design space exploration, synthesis of the datapath
from a behavioral description of instructions and pipeline control logic synthesis
for a user-defined pipeline organization are required. The ability to deal
with a user-defined pipeline organization is essential to evaluate various pipelined
processor architectures. Datapath synthesis and pipeline control logic synthesis for
user-defined pipeline organizations and instructions can reduce the design time and
design modification time drastically. Consequently, a large design space for ASIPs can
be explored in a short design time.
Chapter 3 
PEAS-III: Processor Design Environment
This chapter describes the micro-operation level processor specification and the
application specific instruction set processor (ASIP) design environment PEAS-III, which
is based on that specification. First, characteristics of processor architecture
are classified. Then, their impact on the performance and cost of the
processor is evaluated in order to decide the flexibility of the micro-operation level
processor specification.
3.1 Characteristics of Modern Processor Architecture
Architectural characteristics of modern processors are classified into the following
points:
• instruction set architecture: The instruction set architecture is an interface
between software and hardware. The instruction set is influenced by many other
architectural features described below.
• configuration of functional units: The performance of a functional unit affects
the execution time of the application program. The hardware cost of a functional unit
affects the total chip area. The functionality of the units and the connectivity among
them, in other words the “datapath structure,” restricts the instruction set. The
number of functional units determines how many operations are executed at the
same time.
• storage units' organization: Storage units' organization includes location of 
operands, the number of operands, size of register-file and memory, memory 
hierarchy and so on. 
The operands can be located in accumulators, special registers, general-purpose
registers and memories. When operands are located in accumulators or
special registers, their location is implicitly designated by the instruction.
Using implicit operands, the designer can reduce the instruction word length.
However, the load and store overhead from memory or registers to accumulators or
special registers lengthens the execution time. On the other hand, the locations of
operands are explicitly declared in an instruction when operands are located
in a general-purpose register or in a memory.
Furthermore, the processor architecture is classified into register-register
architecture, register-memory architecture and memory-memory architecture
according to whether operands are located in general-purpose registers or memory.
Addressing modes for operands affect various fields such as instruction bit width,
execution cycles, the number of address generation units and memory access
units, pipeline organization and structural hazards.
In general, register-register architecture and Harvard architecture are preferred
for the design of general purpose RISC processors. Complex memory architecture
and memory-accumulator architecture are often preferred for data intensive
digital signal processor design. For ASIP design, a memory organization suitable
for the target applications must be decided.
• pipeline organization and pipeline hazard resolution policy: Clock frequency
and pipeline hazard occurrence are influenced by the pipeline organization in terms
of the number of pipeline stages and the role of each pipeline stage. A deep
pipeline makes the clock frequency high, but its hardware cost also increases.
The operations scheduled in each pipeline stage decide the clock frequency of the
processor. Specifying the operations of each pipeline stage also decides the clock
frequency, the area, and the conditions under which pipeline hazards occur and their
penalties.
The penalty of pipeline hazards increases the execution time of the application
program. Several techniques to decrease the penalty of pipeline hazards have been proposed.
Data forwarding and re-order buffers reduce data hazards. Delayed branch, branch
prediction and non-overhead loops reduce the penalty of control hazards. Additional
functional units that divide the operations of a conflicting resource
resolve structural hazards. The selection of those techniques is a trade-off between
performance and hardware cost.
• instruction issue and completion policy: The policies of instruction issue and
completion are classified into in-order and out-of-order. Complex issue and
completion mechanisms make processor performance high, but the hardware cost
becomes high, too.
• exception and interrupts: The manner of exception and interrupt handling has some
variations, especially for architectures with out-of-order instruction completion.
One of the exception mechanisms is to use a history file or future file to keep
the original register values. Another approach is to store the status of each pipeline
stage in detail and let the interrupt handling routine recover the pipeline
status. Yet another is a technique that stops instruction issue while it is
uncertain whether all the executing instructions will complete without causing an
exception.
These characteristics are not orthogonal and influence each other. The designer
has to decide the processor architecture in consideration of these architectural
characteristics and the features of the target applications. To overcome the difficulty of
architecture exploration, a pipeline stage level processor design system is indispensable.
PEAS-III is proposed as one such pipeline stage level processor design system.
For architectural design space exploration in consideration of the target
application, the micro-operation level processor specification and the design system
PEAS-III are proposed [34, 35]. PEAS-III enables the designer to do architectural design space
exploration in a short design time. The designer can try various architecture
candidates including the following architecture variations: configuration of hardware modules,
specification of application specific instructions which include multi-cycle operations,
user-defined external interrupts, the number of branch delay slots, and the number
of pipeline stages.
Figure 3.1 shows the organization of PEAS-III. The designer enters the processor
specification using the GUI, “Architecture Design Entry System,” and the processor
synthesis system generates a micro-operation level simulation model and an RT level processor
description for logic synthesis in VHDL [36]. The designer selects resources from the
flexible hardware model database (FHM-DB) [37] and the processor synthesis
system receives HDL descriptions of the selected resources from FHM-DBMS. Estimation
is also performed at each design step: the architecture design phase and the micro-operation
specification phase. The estimation system also accesses FHM-DBMS to get
estimation results of the selected resources. This thesis describes the architecture level
processor specification and processor synthesis.
Figure 3.1: PEAS-III System. 
3.2 Design Methodology 
Figure 3.2 shows the design flow of PEAS-III. With PEAS-III, a processor is designed
step by step. Firstly, the design goal and processor architecture type are set.
Secondly, the outline of the processor is specified. The specification in the second step
includes declarations of the resources used in the processor, definitions of
instruction formats and conditions of external interrupts, and definitions of interface
ports. In the resource declaration, hardware modules are selected with appropriate
parameters from the parameterized hardware library FHM-DB. The designer can
specify an application specific interface between the processor core and other modules on
the SoC by specifying the external interrupt conditions and specific processor interface
ports. Then, the area, clock frequency and power consumption of the designed processor
are estimated in the first cut estimation. When the estimation results do not satisfy
the design goal, the designer changes architecture parameters, resources, instruction
formats and so on to satisfy the design constraints.
Figure 3.2: PEAS-III Design Flow.
After the estimation results satisfy the design goal, the clock based micro-operation
description of instructions and interrupts is defined. A simulation model and a
synthesizable model of the processor are generated from the processor description. The
functionality of the designed processor can be validated using the generated
simulation model. The simulation model consists of behavior level instances in VHDL.
The simulation model can also be used for evaluation of the execution cycles of
application programs, and for cycle based co-verification. The area, clock frequency and
power consumption of the designed processor are evaluated from the synthesized
datapath and controller. When the estimation results do not satisfy the design goal, the
designer improves the processor design by re-scheduling operations of instructions
to the pipeline stages or by changing the number of pipeline stages. Re-scheduling may
improve the clock frequency, and changing the number of pipeline stages may improve the
area and clock frequency.
The description and modification time of a micro-operation level processor specification
is shorter than that of other existing processor descriptions for synthesis because the
datapath and pipeline control logic are automatically generated. To generate the datapath of
the designed processor, the “Processor Synthesis System” inserts selectors for signal conflicts
and pipeline registers for pipeline execution. The pipeline hazard detection and
pipeline control logic for pipeline interlock and pipeline flush are also synthesized.
The designer can concentrate on instruction set design.
3.2.1 Flexible Hardware Model
For architectural design space exploration, effective design reuse of hardware
modules and frequent cut and try of them are required. For that purpose, the flexible
hardware model (FHM) [38] is utilized. An FHM is parameterized with various characteristics
such as bit width, algorithm of the operation, etc., and various design instances
can be generated according to the given parameter values. Since instances can be
generated with various combinations of parameter values, the designer is able to
evaluate many kinds of resources only by changing the parameter values of an FHM.
Several instances of different abstraction levels can be generated from an FHM.
The processor synthesis system uses behavioral level instances to synthesize the
micro-operation level simulation model and gate level instances to generate the RT level
processor HDL description for logic synthesis. An FHM provides estimation results of
instances for various combinations of parameter values. The estimation results of
FHMs are also used for estimation of the designed processor.
Figure 3.3: Flexible Hardware Model Browser View.
Figure 3.3 shows an FHM browser. FHMs in FHM-DB are displayed in the
left box. FHM parameters are shown in the upper-central box and the designer
can select candidates of parameter values from the pull down menu on the right.
The functionality of the selected FHM is shown in the central box. Estimation results
of the FHM with the selected parameters are shown at the bottom of the window. The
FHM “alu” has two parameters, “bit_width” and “algorithm.” “32” and “carry
look ahead (cla)” are selected as their values, respectively.
3.3 Micro-operation Level Processor Specification 
The micro-operation level processor description consists of six major parts as follows:
1. Design Goal and Architecture Parameter Setting
2. Resource Declarations
3. Instruction Format Definitions
4. Interrupt Condition Definitions
5. Interface Definitions
6. Micro-operation Descriptions of instructions and interrupts
In this section, details of each part are described.
3.3.1 Design Goal and Architecture Parameter Setting 
Figure 3.4 shows a portion of the design goal and architecture parameter setting window.
In this step, the designer specifies the design goal for area, clock frequency, execution
cycle count and power consumption. Then, architecture parameters for pipelined
processors are specified.
Figure 3.4: Architecture Parameter Setting Window.
Currently, the number of pipeline stages and the number of delayed branch slots are
supported. Pipeline interlock logic for multi-cycle operations is synthesized.
Pipeline interlock logic for data hazards, register bypass and memory bypass are not
synthesized. These parameters are prepared for future extension of PEAS-III.
Figure 3.5 shows a portion of the processor description that is output from the architecture
entry system (GUI). In the example, the number of pipeline stages is five and the delayed
branch architecture is selected. The number of delayed branch slots is set to
'1'. This indicates that the synthesized processor executes the one instruction that follows
a branch instruction whether the branch is taken or not.
Abstract_level_architecture{
  Pipeline_architecture{
    Number_of_stages{"5"},
    Delayed_branch{"Yes"},
    Number_of_exec_delayed_slot{number{"1"}}}
Figure 3.5: Example of Architecture Parameter Settings.
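The nested name{...} syntax of the processor description can be read mechanically. The following Python sketch is an illustration only and is not part of PEAS-III. It assumes that a description is a comma-separated list of name{...} blocks whose leaves are quoted strings; the function names `tokenize`, `parse` and `to_dict` are hypothetical, and blocks that mix strings with nested blocks are not handled.

```python
import re

# One token: a quoted string, a bare name, or one of the characters { } ,
TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|[A-Za-z_][\w\-]*|[{},])')


def tokenize(text):
    """Split a description into tokens; raise on unexpected characters."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_items(tokens, pos):
    """Parse items up to the matching '}' and return (items, next_position)."""
    items = []
    while tokens[pos] != "}":
        if tokens[pos] == ",":
            pos += 1
            continue
        name = tokens[pos].strip('"')
        if tokens[pos + 1] == "{":
            children, pos = parse_items(tokens, pos + 2)
            items.append((name, children))
        else:
            items.append(name)  # a leaf value such as "5"
            pos += 1
    return items, pos + 1


def parse(text):
    """Parse a balanced description into nested (name, children) tuples."""
    tokens = tokenize(text) + ["}"]  # sentinel closing the implicit top level
    items, _ = parse_items(tokens, 0)
    return items


def to_dict(items):
    """Collapse name{"value"} leaves into plain key/value pairs."""
    result = {}
    for name, children in items:
        if all(isinstance(child, str) for child in children):
            result[name] = children[0] if len(children) == 1 else children
        else:
            result[name] = to_dict(children)
    return result


# A balanced fragment modeled on Fig. 3.5 (two of its fields only).
SAMPLE = '''
Pipeline_architecture{
  Number_of_stages{"5"},
  Delayed_branch{"Yes"}}
'''
settings = to_dict(parse(SAMPLE))
```

With this fragment, `settings["Pipeline_architecture"]["Number_of_stages"]` holds the string "5", which a tool could convert to an integer before instantiating pipeline stages.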
3.3.2 Resource Declarations 
Figure 3.6 shows a resource declaration window. Flexible hardware models are
selected from FHM-DB, and instance names and parameter values for them are
specified. Abstraction levels of resources are specified for the micro-operation level simulation
model and for the RT level synthesizable model, respectively. For generating the simulation
model, “Behavior” is preferable to “RT” and “Gate.” On the
other hand, the “Gate” level is frequently used for synthesizable model generation.
Figure 3.7 shows a portion of a resource declaration description. The processor
synthesis system instantiates HDL descriptions of the declared resources from the resource
declarations.
In the example shown in Fig. 3.7, the instruction register “IR” is declared. “IR”
is a positive edge triggered register and its bit width is “32.” A “Behavior” level
instance is used for micro-operation level simulation model generation and a “Gate”
level instance is used for logic synthesizable model generation.
Figure 3.6: Resource Declaration Window. 
Resource{
  "IR"{
    class{"register"},
    classpath{""},
    parameter{
      abstraction_level{
        for_simulation{"Behavior"},
        for_synthesis{"Gate"}},
      bit_width{"32"},
      edge_trigger{"positive"}}}
Figure 3.7: Example of Resource Declarations.
3.3.3 Instruction Format Definitions
Figure 3.8 shows an instruction format definition window. Bit fields, field type, field
name, and binary value are defined for each instruction type. The field type is
selected among “op-code,” “operand” and “reserved.” “op-code” means operation
code and “reserved” indicates that the field is reserved for future extension.
An operation code value is specified when the value is constant for all instructions
that belong to that type, and the value for a reserved field is also specified.
Then, for each instruction, the instruction type is selected among the defined instruction
types and the operation code value is decided.
Figure 3.8: Instruction Format Definition Window.
Instruction_type{
  "R1type"{
    "OP-code"{"binary"{"000000"},width{"31","26"}},
    "Operand"{"name"{"rs"},width{"25","21"}},
    "Operand"{"name"{"rt"},width{"20","16"}},
    "Operand"{"name"{"rd"},width{"15","11"}},
    "Reserved"{"binary"{"00000"},width{"10","6"}},
    "OP-code"{"name"{"rfunct"},width{"5","0"}}}}
Instruction{
  "ADD"{type{"R1type"},
    "OP-code"{"binary"{"000000"},width{"31","26"}},
    "Operand"{"name"{"rs"},width{"25","21"}},
    "Operand"{"name"{"rt"},width{"20","16"}},
    "Operand"{"name"{"rd"},width{"15","11"}},
    "Reserved"{"binary"{"00000"},width{"10","6"}},
    "OP-code"{"binary"{"100000"},width{"5","0"}}
Figure 3.9: Example of Instruction Format Definitions. 
In micro-operation descriptions, a bit field of the instruction is referred to by the field
name that is defined in the instruction format definition phase. Modifications of the
instruction format, such as varying the instruction bit width, re-ordering instruction
fields, or changing the operation code, do not require modification of the
micro-operation description of instructions. As long as the bit width, name and role of a field
are not changed, there is no need to modify the micro-operation description. The
instruction code definition is used to generate the instruction decoder, which is described in
Section 4.4.1 and Section 5.2.4.
In the example shown in Fig. 3.9, an instruction type “R1type” and an instruction
“ADD” which belongs to “R1type” are defined. The instruction type “R1type” has
six instruction fields. The range of the first field is from “31” to “26.” The type of
the first field is “OP-code” and its value is the constant “000000.” The second and the
third fields indicate the register addresses of the source operands and the fourth field indicates
the destination register address. The fifth field is reserved for future extension. The last
field is an operation code for R1type instructions. The operation code for “ADD”
is “100000.”
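The field layout described above can be checked with a few lines of code. The following Python sketch is only an illustration of the bit positions given in Fig. 3.9; the helper names `encode` and `decode`, the field label `opcode`, and the register numbers used in the example are assumptions made here, not part of the PEAS-III description.

```python
# Field layout of the "R1type" format in Fig. 3.9: (name, msb, lsb).
R1TYPE_FIELDS = [
    ("opcode", 31, 26),
    ("rs", 25, 21),
    ("rt", 20, 16),
    ("rd", 15, 11),
    ("reserved", 10, 6),
    ("rfunct", 5, 0),
]


def encode(fields, values):
    """Pack named field values into one instruction word."""
    word = 0
    for name, msb, lsb in fields:
        width = msb - lsb + 1
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lsb
    return word


def decode(fields, word):
    """Extract every named field from an instruction word."""
    return {
        name: (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)
        for name, msb, lsb in fields
    }


# "ADD": constant op-code 000000, reserved 00000, rfunct 100000.
# The register numbers 1, 2 and 3 are arbitrary example operands.
add_word = encode(
    R1TYPE_FIELDS,
    {"opcode": 0b000000, "rs": 1, "rt": 2, "rd": 3,
     "reserved": 0b00000, "rfunct": 0b100000},
)
add_fields = decode(R1TYPE_FIELDS, add_word)
```

Because the fields are addressed by name, re-ordering the tuples in `R1TYPE_FIELDS` changes the encoding but not the code that uses `add_fields["rs"]`, which mirrors the independence between instruction format and micro-operation description noted in the text.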
3.3.4 Interrupt Condition Definitions 
Figure 3.10 shows an interrupt condition definition window. Interrupt definitions
include interrupt conditions and the number of execution cycles of the interrupt. In
the example of interrupt “int0,” the processor receives interrupt “int0” when the external
input port “INT” receives '1', and needs one cycle to process the interrupt “int0.”
Figure 3.10: Interrupt Condition Definition Window.
3.3.5 Interface Definitions 
Figure 3.11 shows an interface definition window. In an interface definition, an
entity name, and the input and output ports of the target processor are defined. The port
name, direction, type and attribute of the processor interface ports are also defined. For
a standard processor, a memory interface port, clock port, reset port and external
interrupt ports are usually defined. Furthermore, special purpose interface ports can
be declared.
Figure 3.11: Interface Definition Window.
Figure 3.12 shows a portion of an interface definition description. In the example,
the clock port “clk,” whose type is “std_logic,” is defined. “std_logic” is a bit type that
is generally used in VHDL.
3.3.6 Micro Operation Descriptions 
Figure 3.13 shows a micro-operation description window. In the micro-operation
description phase, the designer defines clock based instruction behavior and
interrupt behavior. In the micro-operation description of interrupts, operations of the
processor, such as setting specific values to special registers and jumping to the
interrupt handler routine, are described. A micro-operation consists of three kinds of
statements: (i) operations which are executed by resources, e.g. arithmetic and logic
operation, read, register write, (ii) data transfers between resources, and (iii)
conditional execution of (i) and (ii).
Port_declaration{
  entity_name{"CPU"},
  Port{
    "clk"{
      direction{"in"},
      signal_type{"std_logic"},
      signal_attribute{"clock"}},
    "instAB"{
      direction{"out"},
      signal_type{"std_logic_vector(31 downto 0)"},
      signal_attribute{"instruction_memory_address_bus"}}}}
Figure 3.12: Example of Interface Definitions. 
Figure 3.13: Micro-operation Description Window.
MOD{
  mnemonic{
    "BEQ"{
      clk(1){"IR := IMEM[PC];
              PC.inc();
              $pc := PC;"},
      clk(2){"DECODE(IR);
              $rt := GPR.read1(rt);
              $rs := GPR.read0(rs);
              $imm := EXT0.sign(offset);"},
      clk(3){"$offset := $imm(29 downto 0) & \"00\";
              $target := ADD0.add($pc, $offset);
              $flag := ALU0.cmp($rs, $rt);
              if($flag(2)='1') then PC := $target; end if;"},
      clk(4){""},
      clk(5){""}}}}
Figure 3.14: Micro-operation Description of instruction BEQ.
Figure 3.14 shows a description extracted from Figure 3.13. In the example,
the micro-operation description of the instruction “Branch on Equal (BEQ)” is
given. The instruction “BEQ” jumps to “PC + offset * 4” when the register values
of “rs” and “rt” are the same. Capitalized identifiers, such as “IR” and “ALU0,”
denote resources declared in the resource declaration phase. The symbol “:=” denotes
assignment. Identifiers which begin with '$' are temporary variables. An identifier
surrounded by the symbols “[” and “]” specifies an address in a memory or register file. The
expression “DECODE(IR)” in the second stage denotes that an instruction code is
decoded in the second stage, where “IR” is an instruction register. The expression
“$flag := ALU0.cmp($rs, $rt)” in the third stage denotes that the values stored in “$rs”
and “$rt” are compared using the resource “ALU0” and the result will be stored in “$flag.”
The “if” statement in the third stage is an example of conditional execution.
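The data flow of the BEQ micro-operations can be traced in software. The Python sketch below is a hedged illustration and not a model generated by PEAS-III: it assumes a 16-bit “offset” field, 32-bit values, and that bit 2 of the comparison flag signals equality. The function names are hypothetical, and `pc_latched` stands for whatever value the description holds in “$pc.”

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Model of EXT0.sign: extend a 16-bit field to 32 bits (width assumed)."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value


def beq(pc_latched, rs_value, rt_value, offset_field):
    """Return the new PC if the branch is taken, or None if PC is not written."""
    # clk(2): $imm := EXT0.sign(offset)
    imm = sign_extend16(offset_field)
    # clk(3): $offset := $imm(29 downto 0) & "00", i.e. multiply by four
    offset = ((imm & 0x3FFFFFFF) << 2) & MASK32
    # clk(3): $target := ADD0.add($pc, $offset)
    target = (pc_latched + offset) & MASK32
    # clk(3): $flag := ALU0.cmp($rs, $rt); equality is assumed to set bit 2
    equal = rs_value == rt_value
    # clk(3): if ($flag(2) = '1') then PC := $target
    return target if equal else None


taken = beq(0x100, 5, 5, 3)           # equal operands, forward offset
not_taken = beq(0x100, 5, 6, 3)       # different operands
backward = beq(0x100, 7, 7, 0xFFFF)   # offset field of -1
```

The backward case shows why the sign extension happens before the shift: the 16-bit pattern 0xFFFF becomes a displacement of minus four bytes.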
Definition and modification of a micro-operation description are easy because the
designer does not need to take care of selectors, pipeline registers and pipeline
control logic. PEAS-III generates an HDL description of an ASIP from the user-defined
micro-operations of instructions and interrupts by inserting selectors and pipeline registers
automatically, and by generating control logic for pipeline interlock and pipeline flush.
Exception{
  "reset"{
    Condition{"rst = '1'"},
    Type{"External"},
    Cycles{"1"},
    MOD{
      clk(1){"PC.reset(); GPR.reset();
              EPC.reset(); HI.reset();
              LO.reset(); IR.reset();"}}},
  "int0"{
    Condition{"int = '1' and intn = \"00\""},
    Type{"External"},
    Cycles{"1"},
    MOD{
      clk(1){"EPC := PC;
              PC := \"0000000000000000000000010000000\";"}}}}
Figure 3.15: Example of Interrupt Definitions. 
Figure 3.15 shows an example of an interrupt definition description. The defined
interrupt conditions and the micro-operation descriptions of the interrupts are combined in the
description. In the example, the processor detects the interrupt “int0” when input
port “int0” receives '1'; the value of the program counter (PC) is then stored in the exception
program counter (EPC) and PC is updated to “0x800080.”
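The effect of such an interrupt definition on the processor state can be sketched as follows. This Python fragment is an assumption-laden illustration, not PEAS-III output: the state is a plain dictionary, `take_interrupt` is a hypothetical helper, and the handler address used in the example is an arbitrary placeholder rather than the value from the thesis.

```python
def take_interrupt(state, condition, handler_address):
    """Apply 'EPC := PC; PC := handler' when the interrupt condition holds.

    state: dictionary holding at least "PC" and "EPC".
    condition: function that inspects the state and returns True or False.
    Returns a new state; the input dictionary is left unchanged.
    """
    if not condition(state):
        return dict(state)
    new_state = dict(state)
    new_state["EPC"] = state["PC"]      # save the interrupted address
    new_state["PC"] = handler_address   # jump to the handler routine
    return new_state


def int_is_high(state):
    """Interrupt condition: the external input port reads '1'."""
    return state["int"] == 1


HANDLER = 0x80  # placeholder address, chosen only for this example

before = {"PC": 0x1234, "EPC": 0, "int": 1}
after = take_interrupt(before, int_is_high, HANDLER)
idle = take_interrupt({"PC": 0x1234, "EPC": 0, "int": 0}, int_is_high, HANDLER)
```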
Chapter 4 
Processor Model 
In this chapter, the processor model for processor synthesis is described. In Section 4.1,
limitations of the target processor are discussed. In Section 4.2, requirements of the
processor model and the proposed processor organization are described. In
Section 4.3, the organization of the datapath and controller is described. The processor
control mechanism, which includes pipeline interlock and pipeline flush, is demonstrated.
4.1 Processor Class 
Features of the target architecture of processor synthesis include:
• single phase straightforward pipelined processor. PEAS-III assumes a pipeline
architecture, but the number of pipeline stages and the operations assigned to each
pipeline stage are flexible. Each pipeline stage proceeds synchronously with the
positive edge of a clock.
• delayed branch with predict-not-taken policy. The designer can specify the
number of delayed branch slots. The processor executes the specified
number of succeeding instructions whether the branch is taken or not, and nullifies the other
fetched instructions when the branch is taken.
• multi-cycle operation. PEAS-III is able to deal with multi-cycle units such as
sequential multipliers, memory access units and so on. The processor synthesized
by PEAS-III stalls succeeding instructions until the multi-cycle operation is
completed.
• in-order instruction issue and in-order completion. Out-of-order completion
and out-of-order instruction issue are not supported.
• external interrupt. User-defined external interrupts are supported. On the
other hand, internal exceptions are not supported.
• flexible addressing modes, storage organization. The designer is able to design
addressing modes freely in the micro-operation description of instructions. Multi
port memory and multiple memories can be used.
• single word instruction. The width of the instruction word is a user-defined constant.
Multi-word instructions are not directly supported.
The designer can specify data forwarding in the micro-operation description of
instructions. Data hazard detection and data forwarding logic are not automatically
generated from the micro-operation description of instructions.
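The delayed branch policy listed above can be illustrated with a small sketch. The Python function below is an assumption made for illustration, not the synthesized control logic: it takes the instructions that have already been fetched behind a branch at the moment the branch resolves, and splits them into those that still execute (the delay slots) and those that are nullified. The name `resolve_branch` and the instruction labels are hypothetical.

```python
def resolve_branch(fetched_after_branch, delay_slots, taken):
    """Return (executed, nullified) for instructions fetched behind a branch.

    fetched_after_branch: instructions already fetched when the branch
    resolves, in program order.
    delay_slots: the number of delayed branch slots set by the designer.
    taken: whether the branch condition holds.
    """
    if not taken:
        # Predict-not-taken: the sequential instructions were the right ones.
        return list(fetched_after_branch), []
    executed = list(fetched_after_branch[:delay_slots])
    nullified = list(fetched_after_branch[delay_slots:])
    return executed, nullified


# A branch that resolves after two further instructions have been fetched,
# with one delayed branch slot.
executed, nullified = resolve_branch(["i1", "i2"], delay_slots=1, taken=True)
fallthrough = resolve_branch(["i1", "i2"], delay_slots=1, taken=False)
```

How many instructions sit behind the branch when it resolves depends on the stage in which the designer places the write to PC, which is why the function takes the fetched list as an input rather than computing it.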
4.2 Processor organization
Since the number of pipeline stages is parameterized and the micro-operations of each
stage are defined by the designer, a flexible processor model is required.
Figure 4.1 shows an example of a pipelined processor organization [39]. This
processor consists of five stages: instruction fetch (IF), instruction decode and operand
fetch (ID), execution (EXE), memory access (MEM) and register write back (WB).
In general, operations in a pipeline stage complete in one clock cycle and
store the result to pipeline registers. The operation results are referred from the
next stage at the next clock cycle.
Figure 4.1: Example of Datapath and Controller of Pipelined Processor.
To deal with flexibility in the pipeline depth of the target processor, the datapath and
controller are divided into pipeline stages as in Fig. 4.2. The specified number of datapath and
controller sets, one for each pipeline stage, are arranged and connected together. A set of
datapath and controller is added or deleted when the number of pipeline stages is
changed.
Figure 4.2: Example of Pipelined Processor Divided into Pipeline Stages.
Figure 4.3 shows a processor model for a five stage pipelined processor. The model
consists of five sets of datapath and pipeline stage controller, an instruction decoder and
an interrupt controller. The instruction decoder is arranged in the instruction decode stage
indicated by the keyword “DECODE” in the micro-operation description. The term “stage
controller” is used to indicate a controller arranged in each pipeline stage. The stage
controller sends control signals to resources in the datapath and manages pipeline
flush and interlock. The stage controllers and the interrupt controller communicate
with each other. The stage controller determines the pipeline stall and the next state
from the outputs of the controllers of the next and previous stages. Since the load of pipeline
control logic is distributed among the stage controllers, controller synthesis is simplified.
Figure 4.3: Processor Model.
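The idea that each stage controller derives its stall decision from its neighbors can be sketched in a few lines. The following Python code is a common interlock scheme written under assumptions made here; it is not the controller logic synthesized by PEAS-III. A stage stalls when it is still busy with a multi-cycle operation or when the next stage is stalled, and a stage that advances while its predecessor is stalled receives an empty slot (a bubble).

```python
def stall_signals(busy):
    """Compute stall[k] for each stage from per-stage busy flags.

    busy[k] is True when stage k needs another cycle, for example because a
    multi-cycle operation has not completed. Stage 0 is the fetch stage.
    """
    stall = [False] * len(busy)
    next_stalled = False  # the last stage has no successor to wait for
    for k in reversed(range(len(busy))):
        stall[k] = busy[k] or next_stalled
        next_stalled = stall[k]
    return stall


def bubble_signals(stall):
    """A stage gets a bubble when it advances but its predecessor is stalled."""
    return [k > 0 and stall[k - 1] and not stall[k] for k in range(len(stall))]


# Five stages; the third stage (index 2) is executing a multi-cycle operation.
stall = stall_signals([False, False, True, False, False])
bubbles = bubble_signals(stall)
```

Each entry of `stall` depends only on the stage itself and on its successor, which corresponds to the distributed control described in the text: no central controller needs to see all stages at once.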
The rest of this chapter describes the datapath model of pipeline stages, the instruction
decoder, the stage controllers and the interrupt controller. Section 4.3 describes the
datapath model and Section 4.4 describes the controller model. The organization of the stage
controller is described. Pipeline interlock and pipeline flush using the proposed stage
controller are demonstrated. In Section 4.4.3, the organization of the interrupt controller
and how interrupts are handled are described.
4.3 Datapath Model 
The datapath model is illustrated in Fig. 4.4. The datapath model consists of
resources, selectors, pipeline registers and connections among them. From the
micro-operations that are described by the designer, the datapath and controller are
implemented using this model. Resource operations in micro-operations are executed by
resources, and assignments are implemented as connections between resources.
Selectors are used to resolve signal conflicts. Operation results are transferred to the
next stage via pipeline registers.
4.4 Controller Model 
The controller consists of three major parts: the instruction decoder, the stage controllers, and the interrupt controller.
Figure 4.4: Datapath Model. 
4.4.1 Instruction Decoder 
There are two approaches to instruction decoding, shown in Fig. 4.5. One is to decode the instruction in the instruction decode stage. The other is to pass the instruction code from pipeline stage to pipeline stage and to decode it in each stage. The former leads to a shorter critical path per pipeline stage than the latter, because the latter adds instruction decode delay to every pipeline stage. The latter, however, makes the decoding logic simple. In this thesis, the former type of instruction decoder is adopted in order to generate high-speed processors.

The instruction decoder in this thesis identifies which instruction is fetched and generates two types of control signals in the instruction decode stage: control signals for resources and instruction identification signals for stage controllers. The latter are used to judge whether the instruction executing in a pipeline stage belongs to a certain set of instructions or not. The generated signals are transferred to the stage controllers step by step, synchronously with pipeline execution. The behavior of the stage controller and the usage of the instruction decode result are explained in Section 4.4.2.

Figure 4.5: Example of Two Types of Instruction Decoder.
4.4.2 Pipeline Stage Controller 
The stage controller generates control signals for resources, pipeline registers, and selectors. The controller assigns control signals to resources to execute the described micro-operations. The controller also manages the pipeline registers so that they normally transfer data to the next stage and keep the operation results in the case of pipeline interlock. In addition, the stage controller regulates pipeline execution by means of pipeline interlock and pipeline flush. The controller stalls the pipeline to wait for the completion of multi-cycle operations and the resolution of resource conflicts. The controller flushes the pipeline by nullifying executing instructions when a branch is taken.

The control model of the stage controller is based on the pipeline control model published in [40]. In [40], pipeline controller synthesis for pipeline interlock from usage information of resources is discussed. In this thesis, a structural hazard detection method is proposed instead of usage information of resources. Furthermore, the pipeline controller is extended to pipeline flush and suspension of instruction fetch.

The controller model is common to all pipeline stages. The decision of the next state and the generation of control signals are distributed to each pipeline stage. Distributed control logic makes the controller organization and the synthesis method simple.
Suppose $n$ is the number of pipeline stages and $k$ ($1 \le k \le n$) is the stage number. The controller of each stage $k$ is represented by a finite state machine

\[ M_k = (q_k, I_k, O_k, \delta_k, \rho_k, nop) \]

and a datapath control signal generator. Each item of $M_k$ is defined as follows:

state variable: $q_k \in \{nop, exec\}$
input signals: $I_k \triangleq \{branch, lock_k, go_{k-1}, go_{k+1}, valid_{k-1}, valid_{k+1}\}$
output signals: $O_k \triangleq \{valid_k, go_k\}$
next-state function:
\[ \delta_k(q_k, branch, lock_k, go_{k-1}, go_{k+1}, valid_{k-1}, valid_{k+1}) \triangleq \begin{cases} exec & \text{when } (\overline{branch} + \overline{cancel(k)}) \cdot (valid_{k-1} \cdot go_{k-1} + (q_k = exec) \cdot (lock_k + \overline{\overline{valid_{k+1}} + go_{k+1}})) \\ nop & \text{otherwise} \end{cases} \]
output functions: $\rho_k \triangleq \{\rho_{valid_k}, \rho_{go_k}\}$
\[ \rho_{valid_k}(q_k) \triangleq (q_k = exec) \]
\[ \rho_{go_k}(q_k, lock_k, valid_{k+1}, go_{k+1}) \triangleq (q_k = exec) \cdot \overline{lock_k} \cdot (\overline{valid_{k+1}} + go_{k+1}) \]
initial status: $nop$

The status variable $q_k$ indicates whether an executable instruction exists in the $k$-th stage or not. When $q_k = exec$, an instruction exists in the $k$-th stage. The value of $q_k$ becomes $nop$ when, for example, the pipeline is stalled and no valid instruction is moved to the $k$-th stage, or a pipeline flush is executed. $q_k = nop$ means there is "no operation" in the $k$-th stage. The initial value of $q_k$ is $nop$.

Values of input signals are specified as follows:

\[ branch = \begin{cases} true & \text{when branch is taken} \\ false & \text{when branch is not taken} \end{cases} \]
\[ lock_k = \begin{cases} true & \text{when an instruction in the } k\text{-th stage causes pipeline interlock} \\ false & \text{otherwise} \end{cases} \]
\[ go_k = \begin{cases} true & \text{when an instruction in the } k\text{-th stage is transferred to the next stage} \\ false & \text{when an instruction in the } k\text{-th stage stays} \end{cases} \]
\[ valid_k = \begin{cases} true & \text{when a valid instruction exists in the } k\text{-th stage} \\ false & \text{when no instruction exists in the } k\text{-th stage} \end{cases} \]

The values of $go_{n+1}$ and $valid_{n+1}$ are defined as $go_{n+1} = true$ and $valid_{n+1} = false$. The input signal $go_0$ is an output signal of the interrupt controller.

The next-state function $\delta_k$ outputs $exec$ if and only if both of the following conditions are satisfied.
• The branch is not taken, or the $k$-th stage does not need to nullify its instruction when the branch is taken.
• An instruction in the $(k-1)$-th stage will arrive, or the current instruction in the $k$-th stage stays.

The function $cancel(k)$ holds if and only if the $k$-th stage has to nullify its current instruction when a branch is taken. The details of $cancel(k)$ are described in the following section.

The interrupt controller normally outputs $true$ for $go_0$. However, it outputs $false$ when an interrupt occurs and suspension of instruction fetch is required. When $go_0 = false$ and $go_1 = true$, the next state $q_1$ becomes $nop$, and the operations of the first stage are stopped. In other words, if the instruction in the first stage does not stay, execution of the first stage is stopped at the next clock.

The output signal $go_k$ becomes $false$ if and only if at least one of the following conditions is satisfied.
• The $k$-th stage causes pipeline interlock.
• An instruction in the $(k+1)$-th stage does not move to the $(k+2)$-th stage.

When the $k$-th stage causes pipeline interlock because of multi-cycle operations or resource conflicts, $go_k$ becomes $false$ and the succeeding instructions in the $t$-th stages ($1 \le t \le k-1$) are also stalled.

Control signals to datapath resources are generated from the output signals $O_k$ of the stage controller $M_k$, the results of the instruction decoder and the output signal of the interrupt controller. Normally, the stage controller outputs the control signals for the described micro-operations of the instruction executing in the $k$-th stage. When the pipeline is stalled ($go_k = false$), the controller outputs control signals that hold the status of the resources.

Pipeline hazards are classified as follows:
• structural hazard, which is caused by multi-cycle operations and resource conflicts,
• and control hazard, which is caused by branch.

For the structural hazard, the pipeline is interlocked until the multi-cycle operations are completed and the resource conflicts are resolved. For the control hazard, some instructions in the pipeline stages are flushed when a branch is taken. In the following sections, the pipeline control mechanism and the control logic of $lock_k$, $branch$ and the function $cancel(k)$ are described. The pipeline interlock signal $lock_k$ is described as follows:

\[ lock_k = lock\_m_k + lock\_r_k \]

$lock\_m_k$ is a pipeline interlock signal for multi-cycle operations and $lock\_r_k$ is a pipeline interlock signal for resource conflicts.
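The behaviour of the stage controllers can be exercised with a small behavioural model. The following Python sketch is an illustration only, not the synthesized logic. It assumes that $go_k$ is true when stage $k$ holds an instruction, is not locked, and the next stage is either empty or moving, and that the "stays" term of $\delta_k$ is the complement of that condition. The function names `outputs` and `next_states`, the list-based indexing, and the constant `valid0` (standing for "instruction fetch always supplies an instruction") are assumptions made for this example.

```python
NOP, EXEC = "nop", "exec"

def outputs(q, lock):
    """Compute valid_k and go_k for k = 1..n.

    q and lock are lists indexed 1..n (index 0 is padding).
    Boundary values: go[n+1] = True, valid[n+1] = False.
    """
    n = len(q) - 1
    valid = [False] * (n + 2)
    go = [False] * (n + 2)
    go[n + 1] = True
    for k in range(n, 0, -1):            # go_k depends on stage k+1
        valid[k] = q[k] == EXEC
        go[k] = valid[k] and not lock[k] and (not valid[k + 1] or go[k + 1])
    return valid, go

def next_states(q, lock, branch, cancel, go0=True, valid0=True):
    """One clock step of the next-state function for every stage."""
    n = len(q) - 1
    valid, go = outputs(q, lock)
    valid[0], go[0] = valid0, go0        # go_0 comes from the interrupt controller
    nxt = [None] * (n + 1)
    for k in range(1, n + 1):
        arrives = valid[k - 1] and go[k - 1]
        stays = q[k] == EXEC and (lock[k] or (valid[k + 1] and not go[k + 1]))
        keep = (not branch) or (not cancel[k])
        nxt[k] = EXEC if keep and (arrives or stays) else NOP
    return nxt
```

With five stages all holding instructions and `lock[3]` set, stages 1 to 3 keep their instructions, stage 4 becomes `nop` and stage 5 receives the instruction from stage 4.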
Pipeline Interlock caused by Multi-cycle Operations
When a multi-cycle operation is executed in the $k$-th stage, instruction transfer from stage $j$ ($1 \le j < k$) to stage $j + 1$ is suspended in order to stall the succeeding instructions.
(Timing chart: rows are times T-1 to T+m+1, columns are the pipeline stages; instruction D executes an m-cycle operation in the 3rd stage; arrows mark whether an instruction is transferred to the next stage.)
Figure 4.6: Example of Multi-cycle Operation.
Figure 4.6 shows an example of pipeline interlock caused by a multi-cycle operation. Suppose instruction D executes an $m$-cycle operation in the third stage from time $T$. The instructions in the first, second and third stages are not transferred to the next stage while the multi-cycle operation is executed. The state of the fourth stage becomes "no-operation" because the instruction in the third stage is not transferred. At time $T + m$, the multi-cycle operation is completed, and the instructions in the first, second and third stages are then transferred to the next stage at time $T + m + 1$.

In the case of a pipeline stall, the $j$-th stage controller assigns control signals to storage resources to disable write-back while instruction transfer is suspended.
(Timing chart of the signals CLK, Start, Fin, Sig_To_Module and Sig_From_Module.)
Figure 4.7: Timing Interface Between Controller and Multi-cycle Resource.
Figure 4.7 shows the timing interface between the controller and multi-cycle resources. The controller makes the start signal "Start" active for one cycle, and the resource then starts its operation. After the multi-cycle operation is finished, the resource outputs the result and makes the flag "Fin" active to signal the completion of the operation. When multiple multi-cycle operations are executed in the same stage by the same instruction, the stage controller stalls the pipeline until all multi-cycle operations are finished. The operation results and completion flags must be kept until the other multi-cycle operations are finished. Because the resources keep their operation results and flag values until the next operation starts, no additional structure for saving the results and flags is required. The interface information, which includes the start signal input port, the flag output port, and their active values, can be obtained from FHM-DB.
Suppose $U_{k,r,op} = \{(exp, inst) \mid exp \in Exp, inst \in I\}$ is a set of pairs of a conditional expression $exp$ and an instruction $inst$, which represent the execution conditions of operation $op$ of resource $r$ in the $k$-th stage. In other words, operation $op$ of resource $r$ in the $k$-th stage is executed if and only if, for one of the pairs, the executing instruction is $inst$ and condition $exp$ holds. The control logic of $lock\_m_k$ for multi-cycle operations is represented as follows:

\[ lock\_m_k = \bigvee_{r \in R} \; \bigvee_{op_m \in OP_m} \; \bigvee_{(exp, inst) \in U_{k,r,op_m}} (inst_k = inst) \cdot exp \cdot \overline{fin_{op_m}} \qquad (4.1) \]

where
$R$: a set of resources
$OP_m$: a set of multi-cycle operations in the $k$-th stage
$inst_k$: the name of the instruction executing in the $k$-th stage
\[ fin_{op_m} = \begin{cases} true & \text{after multi-cycle operation } op_m \text{ is completed} \\ false & \text{while multi-cycle operation } op_m \text{ is executing} \end{cases} \]

Equation (4.1) means that $lock\_m_k$ holds if and only if at least one multi-cycle operation is not completed. $lock\_m_k$ becomes false after all the multi-cycle operations are completed.
The start signal of a multi-cycle operation is activated in the first cycle, and is then negated from the second cycle until the start of the next multi-cycle operation. In the example, the control signal for the multiplier is activated at time $T$ and then negated at time $T + 1$. Suppose $V_{active}$ is the active value of the control signal $start$. The control logic of $start$ is as follows:

\[ start_r = \begin{cases} V_{active} & \text{when } flag_k \cdot \bigvee_{op_m \in OP_m} \bigvee_{(exp, inst) \in U_{k,r,op_m}} (inst_k = inst) \cdot exp \\ \overline{V_{active}} & \text{otherwise} \end{cases} \qquad (4.2) \]

\[ flag_k = go_{k-1} \qquad (4.3) \]

$flag_k$ is a register which indicates whether the current cycle is the first cycle of a multi-cycle operation or not. The value of $flag_k$ becomes true when a new instruction is transferred to the $k$-th stage and becomes false when the executing instruction stays in the $k$-th stage.
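As a minimal sketch of Equations (4.1) and (4.2), the two signals can be evaluated with plain Boolean code. The data layout is an assumption for illustration: `uses` maps a hypothetical `(resource, operation)` key to the list of `(exp, inst)` pairs of $U_{k,r,op_m}$, `cond` maps an expression name to its current truth value, and `fin` maps the same key to the completion flag. It also assumes that the lock term uses the complement of the completion flag, as the prose after Equation (4.1) states.

```python
def lock_m(inst_k, uses, cond, fin):
    """Eq. (4.1): true while any selected multi-cycle operation is unfinished."""
    return any(
        inst_k == inst and cond[exp] and not fin[key]
        for key, pairs in uses.items()
        for exp, inst in pairs
    )

def start_active(flag_k, inst_k, pairs, cond):
    """Eq. (4.2): drive start to its active value only in the first cycle.

    flag_k is the register of Eq. (4.3); pairs is U_{k,r,op_m} for one operation.
    """
    return flag_k and any(inst_k == inst and cond[exp] for exp, inst in pairs)
```

For a hypothetical multiplier `("MUL0", "mul")` used by instruction `"MUL"`, `lock_m` stays true until `fin` is set, while `start_active` is true only when `flag_k` is true.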
Pipeline Interlock caused by Resource Conflict 
When a resource conflict occurs between stage $k$ and stage $j$ ($k < j$), the $k$-th stage is stalled until the operation of the $j$-th stage completes.

Figure 4.8 shows an example of a resource conflict. The example processor shares a single memory for data and instructions. The first stage is the instruction fetch stage and the fourth stage is the memory access stage. Suppose instruction C is a memory access instruction. The first stage is stalled at time $T$. After instruction C completes its memory access and moves to the fifth stage, the memory access of the first stage is executed at time $T + 1$.

(Timing chart: rows are times T-1 to T+2, columns are the 1st to 5th stages; instruction C accesses memory in the 4th stage; arrows mark whether an instruction is transferred to the next stage.)
Figure 4.8: Example of Resource Conflict.
Suppose $V_{r,k} = \{inst \mid inst \in I\}$ is the set of instructions that use resource $r$ in the $k$-th stage. In other words, resource $r$ is accessed by instruction $inst \in V_{r,k}$ in the $k$-th stage. Suppose $n$ is the number of pipeline stages. The control logic of $lock\_r_k$ for resource conflicts is represented as follows:

\[ lock\_r_k = \bigvee_{r \in R} \left( \left( \bigvee_{k < j \le n} \; \bigvee_{i_j \in V_{r,j}} (inst_j = i_j) \cdot valid_j \right) \cdot \left( \bigvee_{i_k \in V_{r,k}} (inst_k = i_k) \cdot valid_k \right) \right) \qquad (4.4) \]

Equation (4.4) means that $lock\_r_k$ holds if and only if at least one resource $r$ is accessed from the $k$-th stage and from at least one stage $j$ where $k < j \le n$.
Control signals for conflicted resources are generated by multiple stage controllers. Suppose $ctrl_r$ is a control signal for resource $r$ and $ctrl_{r,k}$ is the control signal generated by the stage controller of the $k$-th stage. The control signal is selected as follows:

\[ ctrl_r = \bigvee_{1 \le k \le n} ctrl_{r,k} \cdot sel_{r,k} \qquad (4.5) \]

\[ sel_{r,k} = \left( \bigvee_{i_k \in V_{r,k}} (inst_k = i_k) \cdot valid_k \right) \cdot \overline{ \bigvee_{k < j \le n} \; \bigvee_{i_j \in V_{r,j}} ((inst_j = i_j) \cdot valid_j) } \qquad (4.6) \]

Equation (4.6) means that the control signal $ctrl_{r,k}$ from the $k$-th stage controller is selected when $sel_{r,k} = true$. $sel_{r,k}$ becomes $true$ when resource $r$ is accessed from stage $k$ and is not accessed from any stage $j$ ($k < j \le n$). Figure 4.9 shows a block diagram of the interlock signal generation logic represented in Equation (4.4) and the control signal selection represented in Equation (4.6).

Figure 4.9: Example of Control Signal Selection for Conflicted Resource.
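The interlock condition of Equation (4.4) and the selection condition of Equation (4.6) share the same building block, namely "the valid instruction in stage $k$ uses resource $r$". The following Python sketch shows both under assumed data structures: `users[(r, k)]` stands for $V_{r,k}$, and `inst` and `valid` are dictionaries indexed by stage number. The helper name `accesses` and the resource and instruction names in the usage note are hypothetical.

```python
def accesses(r, k, inst, valid, users):
    """True if the valid instruction in stage k uses resource r."""
    return valid[k] and inst[k] in users.get((r, k), set())

def lock_r(k, n, resources, inst, valid, users):
    """Eq. (4.4): stage k and some later stage j (k < j <= n) both access r."""
    return any(
        accesses(r, k, inst, valid, users)
        and any(accesses(r, j, inst, valid, users) for j in range(k + 1, n + 1))
        for r in resources
    )

def sel(r, k, n, inst, valid, users):
    """Eq. (4.6): stage k accesses r and no later stage does."""
    return accesses(r, k, inst, valid, users) and not any(
        accesses(r, j, inst, valid, users) for j in range(k + 1, n + 1)
    )
```

For a shared memory `"MEM"` used by every fetch in stage 1 and by a load in stage 4, `lock_r(1, ...)` is true and `sel("MEM", 4, ...)` is true while the load occupies stage 4, so the later stage wins the resource.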
Pipeline Flush 
Branch control is based on a predict-not-taken policy and delayed branch. In the PEAS-III system, the number of delayed branch slots $d$ is parameterized. The processor executes the succeeding $d$ instructions whether the branch is taken or not, and flushes the pipeline by nullifying the other fetched instructions. When $d = 0$, the processor has a pure predict-not-taken architecture. When a branch is taken at stage $b$, the controller of stage $k$ ($1 < k \le b - d$) nullifies the transferred instruction and makes its state "no-operation" at the next clock cycle.

(Timing chart: rows are times T-1 to T+2, columns are the 1st to 5th stages; the number of delayed branch slots is d = 1 and the branch stage number is b = 3.)
Figure 4.10: Example of Branch.
In the example shown in Fig. 4.10, the branch stage $b$ is the third stage and the number of delayed branch slots $d$ is one. In this example, the branch is taken at time $T$. Instruction E, which succeeds the branch instruction D, continues to be executed, and instruction F, which succeeds instruction E, is canceled by the stage controller of the second stage at time $T + 1$.

The function $cancel(k)$ is as follows:

\[ cancel(k) = \begin{cases} true & \text{when } 1 < k \le b - d \\ false & \text{otherwise} \end{cases} \qquad (4.7) \]

Suppose $Br = \{(exp, inst) \mid exp \in Exp, inst \in I\}$ is a set of pairs of a conditional expression $exp$ and an instruction $inst$, which represent branch conditions. The pair $(exp, inst) \in Br$ represents that the branch is taken when the instruction executing in the $b$-th stage is $inst$ and the conditional expression $exp$ holds. The logic of the control signal $branch$ is represented as follows:

\[ branch = valid_b \cdot \left( \bigvee_{(exp, inst) \in Br} (inst_b = inst) \cdot exp \right) \qquad (4.8) \]
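Equations (4.7) and (4.8) translate directly into code. The sketch below is illustrative only: `br` stands for the set $Br$ as a list of `(exp, inst)` pairs, and `cond` is an assumed dictionary giving the current truth value of each conditional expression by name.

```python
def cancel(k, b, d):
    """Eq. (4.7): stage k nullifies its incoming instruction on a taken branch."""
    return 1 < k <= b - d

def branch(valid_b, inst_b, br, cond):
    """Eq. (4.8): a valid instruction in stage b matches a branch condition."""
    return valid_b and any(inst_b == inst and cond[exp] for exp, inst in br)
```

With `b = 3` and `d = 1`, as in Fig. 4.10, only stage 2 satisfies `cancel`; with `d = 0`, stages 2 and 3 do.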
Limitations of the proposed branch control method are as follows:
• Branch stage $b$ must be unique.
• Instructions that change the statuses of resources (register writes and so on) in the $k$-th ($1 \le k < b - d$) stage should not be scheduled within slots $d + 1$ to $b$ after a branch instruction. If such instructions are scheduled within slots $d + 1$ to $b$ after a branch instruction, they change the statuses before the branch is resolved. Restoring mechanisms such as buffers would be needed to cancel the effects of the canceled instructions completely. Since the proposed method does not synthesize such a mechanism, instructions that change machine statuses in early stages must not be scheduled within slots $d + 1$ to $b$ after a branch instruction.
• An instruction that executes a multi-cycle operation in the $j$-th ($b - d \le j < b$) stage must be scheduled at least $d$ instructions after a branch. When the multi-cycle instruction is scheduled within $d$ instructions of a branch instruction, some stages between the branch stage and the stage that executes the multi-cycle operation become empty. The empty stages push out the instructions in the delayed branch slots, and the pushed-out instructions are flushed by the controller.
4.4.3 Interrupt Controller 
The interrupt controller suspends instruction fetch and executes the described interrupt operations. The interrupt controller consists of the following finite state machine $M_{intr}$ and a control signal generator.

\[ M_{intr} = (q_{intr}, I_{intr}, O_{intr}, \delta_{intr}, \rho_{intr}) \]

Each item of $M_{intr}$ is defined as follows:

status variable: $q_{intr} \in \{intr, exe, wait\}$
input signals: $I_{intr} \triangleq \{interrupt, restart, complete\}$
output signals: $O_{intr} \triangleq \{go_0, int\}$
next-state function:
\[ \delta_{intr}(q_{intr}, interrupt, restart, complete) \triangleq \begin{cases} intr & \text{when } (q_{intr} = wait) \cdot complete \\ exe & \text{when } (q_{intr} = intr) \cdot restart \\ wait & \text{when } (q_{intr} = exe) \cdot interrupt \\ q_{intr} & \text{otherwise} \end{cases} \]
output functions: $\rho_{intr} \triangleq \{\rho_{valid_0}, \rho_{int}\}$
\[ \rho_{valid_0}(q_{intr}) \triangleq (q_{intr} = exe) \]
\[ \rho_{int}(q_{intr}) \triangleq (q_{intr} = intr) \]
initial status: $intr$

The states "intr," "exe" and "wait" of $q_{intr}$ are the execution state of interrupts, the execution state of instructions, and the waiting state for the completion of all already fetched instructions, respectively. The initial state of $q_{intr}$ is "intr," because the processor has to begin with the reset interrupt.

The input signal $interrupt$ indicates that the processor has received an interrupt. The input signal $complete$ indicates that execution of all fetched instructions is completed. The $restart$ signal indicates that interrupt handling is completed and instruction fetch can be started.

When an external interrupt occurs, the state of the controller changes from "exe" to "wait." Then, the controller suspends instruction fetch by forcing $go_0$ to false. This makes the state of the first stage "no-operation." After all fetched instructions are completed, the states of all stages become "no-operation." Then, the state of the controller becomes "intr." The control logic of the $complete$ signal is given by the following equation.

\[ complete = \overline{ \bigvee_{1 \le k \le n} valid_k } \qquad (4.9) \]

The controller then begins to execute the interrupt operations described in the micro-operation description of interrupts. When the interrupt is completed, the state of the controller becomes "exe" and the output signal $go_0$ becomes true in order to execute the first stage of the pipeline and to restart instruction fetch.
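The state transitions of the interrupt controller can be traced with a few lines of code. This Python sketch is only an illustration of the next-state function and of the $complete$ signal. It assumes that $complete$ is true exactly when no stage holds a valid instruction, as the surrounding prose describes, and the function names are hypothetical.

```python
def next_intr_state(q, interrupt, restart, complete):
    """Next-state function of the interrupt controller."""
    if q == "wait" and complete:
        return "intr"       # all fetched instructions have drained
    if q == "intr" and restart:
        return "exe"        # interrupt handling finished, resume fetch
    if q == "exe" and interrupt:
        return "wait"       # stop fetching, let the pipeline drain
    return q

def complete_signal(valid):
    """Eq. (4.9) as assumed here: true when no pipeline stage is valid."""
    return not any(valid)
```

Starting from the initial state `"intr"` (reset interrupt), the sequence restart, interrupt, complete visits `"exe"`, `"wait"` and `"intr"` again.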
The following items of the interrupt controller depend on the processor specification description and have to be synthesized:

1. logic of $restart$,
Suppose $Intr$ is the set of defined interrupts, $S_i$ is the defined execution cycle count of interrupt $i$, and $Cnt$ is a counter which counts the execution steps from the time the status variable $q_{intr}$ becomes $intr$. The control logic for the signal $restart$ is represented as follows.

\[ restart = \bigvee_{i \in Intr} (S_i > Cnt) \qquad (4.10) \]

2. logic of $interrupt$,

\[ interrupt = \bigvee_{i \in Intr} (\text{specified condition of interrupt } i) \qquad (4.11) \]

3. and datapath control signal generator.
Chapter 5 
Processor Synthesis 
In this chapter, the processor synthesis method is explained. The processor synthesis method consists of two major parts: datapath synthesis and controller synthesis. The datapath synthesis method is described first, and then the controller synthesis method is described.
5.1 Datapath Synthesis 
In datapath synthesis, data-flow graphs are first generated from the micro-operation descriptions of instructions and interrupts. Then, techniques from the high-level synthesis area [41] are utilized for datapath synthesis. Since the designer performs micro-operation scheduling to the pipeline stages and resource allocation in the micro-operation descriptions, only interconnection generation and pipeline register insertion are performed in datapath synthesis.

Figure 5.1 shows the datapath synthesis flow. Data-flow graphs (DFGs) of instructions and interrupts are generated from the micro-operation descriptions (MODs). Then, the DFGs of instructions are merged together to obtain the required data flows and their conditions. The DFGs of interrupts are also merged together. To resolve signal conflicts, selectors are inserted into both merged DFGs of instructions and interrupts. For pipeline execution, pipeline registers are inserted into the DFG of instructions. Finally, the DFGs of instructions and interrupts are merged and the signal conflicts between them are resolved. The DFG that represents the datapath of the designed processor is thus synthesized. Each generation step is described in detail in the following sections.

Figure 5.1: Datapath Synthesis Flow.
5.1.1 DFG generation 
By analyzing the micro-operation description of each instruction $inst$, a data-flow graph is generated. The data-flow graph is represented by $G_{inst} = (R_{inst}, V_{inst}, E_{inst})$, where $R_{inst}$ is a set of resources, $V_{inst}$ is a set of all resource ports, and $E_{inst}$ is a set of connections between ports of the resources. $(s, d) \in E_{inst}$ represents a data transfer from port $s \in V_{inst}$ to port $d \in V_{inst}$ that is specified by a micro-operation description. $cond_{e,inst}$ represents the conditional expression for the data transfer $e \in E_{inst}$ of instruction $inst$. If the data transfer is written inside an if-statement of the MOD, the conditional expression $exp$ of the if-statement is extracted as $cond_{e,inst}$. If the data transfer is not written inside an if-statement, $cond_{e,inst}$ of the transfer $e$ becomes '1'.
Operation        Argument   Port
adder.add        input0     a
                 input1     b
                 output0    result
                 control    cin <- 0
register.read    output0    q
register.write   input0     d
                 control    enb <- 1
Figure 5.2: Interface Information for Resources.
To obtain the input and output ports of the resource operations described in the MOD, the interface information of resources is used. This information consists of the correspondence between the input/output arguments of a resource operation and port names. It also includes the control signals required to execute the operation. This information is registered for each model in FHM-DB. Examples of registered interfaces for an adder and a register are shown in Fig. 5.2. In Fig. 5.2, the first argument of operation "add" is connected to the adder's input port "a" and the second to "b." The operation result is output from port "result." The controller has to provide control signal '0' to port "cin" to execute "a+b."

An example of the extraction of connections is shown in Fig. 5.3. "RZ," "RX" and "RY" in Fig. 5.3 denote registers and "ADD0" denotes an adder. From the interface information shown in Fig. 5.2, connections $e_0$, $e_1$ and $e_2$ are extracted, where $e_0$, $e_1$ and $e_2$ denote the data transfer from port q of resource "RX" to port a of resource "ADD0," from port q of resource "RY" to port b of resource "ADD0," and from port result of resource "ADD0" to port d of resource "RZ," respectively.

micro-operation:
  "RZ := ADD0.add(RX, RY);"
connections:
  e0 = (RX.q, ADD0.a)
  e1 = (RY.q, ADD0.b)
  e2 = (ADD0.result, RZ.d)
Figure 5.3: Connection Extraction.
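The extraction in Fig. 5.3 can be reproduced with a small lookup over the interface information of Fig. 5.2. The sketch below is an illustration under assumptions: the dictionary layout `INTERFACE`, the resource-kind table `KIND` and the function `extract_connections` are hypothetical, and the sketch only handles the case where the destination and all arguments are registers.

```python
# Interface information in the style of Fig. 5.2 (layout is an assumption).
INTERFACE = {
    ("adder", "add"): {"inputs": ["a", "b"], "output": "result", "control": {"cin": 0}},
    ("register", "read"): {"inputs": [], "output": "q", "control": {}},
    ("register", "write"): {"inputs": ["d"], "output": None, "control": {"enb": 1}},
}
# Kind of each declared resource (hypothetical resource declarations).
KIND = {"ADD0": "adder", "RX": "register", "RY": "register", "RZ": "register"}

def extract_connections(dest, unit, op, args):
    """Return the connections for the micro-operation `dest := unit.op(args...)`."""
    op_if = INTERFACE[(KIND[unit], op)]
    read_port = INTERFACE[("register", "read")]["output"]
    write_port = INTERFACE[("register", "write")]["inputs"][0]
    # One edge per argument: register read port -> operation input port.
    edges = [(f"{a}.{read_port}", f"{unit}.{p}") for a, p in zip(args, op_if["inputs"])]
    # One edge for the result: operation output port -> register write port.
    edges.append((f"{unit}.{op_if['output']}", f"{dest}.{write_port}"))
    return edges
```

Calling `extract_connections("RZ", "ADD0", "add", ["RX", "RY"])` yields the three connections listed in Fig. 5.3.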
5.1.2 Basic Datapath Synthesis 
After the analysis of the micro-operations, the data-flow graphs of the instructions are merged into a data-flow graph $G = (R, V, E)$, which represents the basic datapath of the processor.

\[ R = \bigcup_{inst \in I} R_{inst} \qquad (5.1) \]
\[ V = \bigcup_{inst \in I} V_{inst} \qquad (5.2) \]
\[ E = \bigcup_{inst \in I} E_{inst} \qquad (5.3) \]

where $I$ is the set of all instructions. $Cond_e$ for each data transfer $e \in E$ is determined as follows:

\[ Cond_e = \{(cond_{e,inst}, inst) \mid inst \in I\} \qquad (5.4) \]

$(exp, inst) \in Cond_e$ denotes that the data transfer $e \in E$ is executed when the executing instruction is $inst$ and condition $exp$ holds.
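Equations (5.1) to (5.4) amount to set unions plus a per-edge collection of conditions. The following Python sketch shows one way to write this. The input layout (a dictionary from instruction name to a tuple of resource set, port set, edge set and condition map) is an assumption, and the sketch records a condition for an edge only for the instructions whose DFG actually contains that edge.

```python
def merge_dfgs(dfgs):
    """Merge per-instruction DFGs into G = (R, V, E) and collect Cond_e.

    dfgs: dict inst -> (R_inst, V_inst, E_inst, cond), where cond maps an
    edge to its conditional expression; edges outside any if-statement
    default to '1'.
    """
    R, V, E, Cond = set(), set(), set(), {}
    for inst, (r, v, e, cond) in dfgs.items():
        R |= r
        V |= v
        E |= e
        for edge in e:
            Cond.setdefault(edge, set()).add((cond.get(edge, "1"), inst))
    return R, V, E, Cond
```

An edge shared by two instructions ends up once in `E` and with two `(exp, inst)` pairs in `Cond`.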
5.1.3 Signal Conflicts Resolution 
When the same destination port $d$ is shared by multiple connections in $E$, the input signals of port $d$ conflict. This section presents a selector insertion procedure which resolves input signal conflicts. The basic selector insertion algorithm is introduced first, and then an improvement of the algorithm is described.

Suppose that $stage_{src}(e)$ is the number of the stage in which port $s$ of data transfer $e = (s, d)$ outputs data, $stage_{dst}(e)$ is the number of the stage in which port $d$ inputs data, and $width(p)$ is the bit width of port $p$. Instructions are executed correctly if the selectors are inserted at any stage from $stage_{src}(e)$ to $stage_{dst}(e)$. To reduce pipeline registers, selectors are inserted at each stage from $stage_{src}(e)$ to $stage_{dst}(e)$. Furthermore, when a destination port $d$ inputs data from different ports in multiple stages, selectors are first inserted for each stage $stage_{dst}(e)$ to resolve the signal conflicts within a stage. Then, a selector is inserted to resolve the inter-stage signal conflicts.
Figure 5.4: Example of Selector Insertion.
In the selector insertion example shown in Fig. 5.4, the operation results of ALU, SFT and DMEM are selected by selectors "sel" in the third, the fourth and the fifth stage, respectively. Because selectors are inserted in each stage, data transfers over pipeline stage boundaries are reduced. Another example, shown in Fig. 5.5, is a case of signal conflict across stages. The example is a non-Harvard architecture, and the memory access unit "MEM" is accessed from both the first stage and the fourth stage. First, the signal conflict in the fourth stage is resolved, and then the signal conflict between the first stage and the fourth stage, that is, between the data transfer from PC and the data transfer from the inserted selector, is resolved.

Figure 5.5: Example of Selector Insertion for Inter-stage Signal Conflicts.

Outlines of the selector insertion procedure are shown in Fig. 5.6 and Fig. 5.7. Fig. 5.6 shows the intra-stage signal conflict resolution and Fig. 5.7 shows the inter-stage signal conflict resolution. For each destination port $d$, a set $X_d$ of the stage numbers in which port $d$ receives data is calculated. For each member $j \in X_d$, selectors to resolve the signal conflicts in the $j$-th stage are inserted. First, the set $E_{d,j}$ of data transfers that send data to port $d$ in the $j$-th stage is calculated. Then, the minimum stage number $min$ among the data output stages of all data transfers $e \in E_{d,j}$ is searched. Selectors are inserted at each stage $k$ from the minimum stage $min$ to stage $j$.

input:  G = (R, V, E)
output: G = (R, V, E)
 1 foreach (d ∈ V) loop
 2   X_d := {stage_dst(e) | e = (s, d) ∈ E}
 3   foreach (j ∈ X_d) loop
 4     E_{d,j} := {e | e = (s, d) ∈ E, stage_dst(e) = j}
 5     min := minimum({stage_src(e) | e ∈ E_{d,j}})
 6     for k := min to j loop
 7       E_{d,j,k} := {e | e ∈ E_{d,j}, stage_src(e) ≤ k}
 8       if (|E_{d,j,k}| > 1) then
 9         insert_selector(|E_{d,j,k}|, width(d))
10         i := 0
11         e_out := (s_sel, d)
12         stage_src(e_out) := k
13         stage_dst(e_out) := j
14         foreach (e' = (s', d) ∈ E_{d,j,k}) loop
15           e_i := (s', d_sel_i)
16           Cond_{e_i} := Cond_{e'}
17           stage_src(e_i) := stage_src(e')
18           stage_dst(e_i) := k
19           Cond_{e_out} := Cond_{e_out} ∪ Cond_{e'}
20           c_i := (p_sel, v_sel_i)
21           Cond_{c_i} := Cond_{e'}
22           stage(c_i) := k
23           C := C ∪ {c_i}
24           E := E ∪ {e_i} − {e'}
25           i := i + 1
26         end loop
27         E := E ∪ {e_out}
28       end if
29     end loop
30   end loop
31 end loop
Figure 5.6: Selector Insertion Procedure. insert_selector(x, y) is a function that inserts a selector with x inputs of y bits.

For each stage $k$, a set $E_{d,j,k}$ is calculated. $E_{d,j,k}$ is the set of data transfers $e \in E_{d,j}$ whose output stage is less than or equal to $k$, that is, the data transfers from the $k$-th stage or from a stage before the $k$-th stage. When the number of data transfers in $E_{d,j,k}$ is more than one, a selector is instantiated and inserted in the $k$-th stage. The selector used here has as many input ports as there are data transfers in $E_{d,j,k}$, and its bit width is equal to the bit width of the input data of port $d$.

With the selector insertion, the data transfers $e \in E_{d,j,k}$ have to be modified. Each data transfer $e' = (s', d) \in E_{d,j,k}$ is deleted from the connection set $E$. A new data transfer $e_i = (s', d_{sel_i})$ is added to $E$; $e_i$ is a data transfer from port $s'$ to the $i$-th input port $d_{sel_i}$ of the selector. The condition of $e_i$ is equal to the condition of the deleted data transfer $(s', d)$. The data input stage number of $e_i$ is equal to $k$ and its output stage is equal to that of the deleted data transfer. The control signal value $c_i = (p_{sel}, v_{sel_i})$ is added to $C$. $p_{sel}$ is the control input port of the selector and $v_{sel_i}$ is the value that selects the $i$-th input. The condition $Cond_{c_i}$ is equal to the condition $Cond_{e_i}$. Addition of the selector control signal is described in Section 5.2.1.
In addition, $e_{out}$, a data transfer from the selector output port $s_{sel}$ to port $d$, is added to $E$. Its data output stage number is equal to $k$ and its input stage is equal to $j$. The condition of the data transfer $(s_{sel}, d)$ is the union of the conditions of the connections in $E_{d,j,k}$.

After the intra-stage signal conflicts are resolved, selector insertion for inter-stage signal conflict resolution is executed. The procedure is shown in Fig. 5.7. If the number of stages in $X_d$ is more than one, a selector is inserted over the stages. The selector used here has as many input ports as there are stages in $X_d$, and its bit width is equal to the bit width of the input data of port $d$. Each data transfer $e_{d,j} = (s_{d,j}, d)$ in the $j$-th stage is deleted from the connection set $E$. A new data transfer $e_i = (s_{d,j}, d_{sel_i})$ is added to $E$; $e_i$ represents the data transfer from port $s_{d,j}$ to the $i$-th input port $d_{sel_i}$ of the selector. The condition of $e_i$ is equal to the condition of the deleted data transfer $e_{d,j}$. The data input stage number and the data output stage number of $e_i$ are both equal to $j$. Then, the control signal value $c_i = (p_{sel}, v_{sel_i})$ is added to $C$. $p_{sel}$ is the control input port of the selector and $v_{sel_i}$ is the value that selects the $i$-th input. The condition $Cond_{c_i}$ is equal to the condition $Cond_{e_i}$. Addition of the selector control signal is described in Section 5.2.1.

input:  G = (R, V, E)
output: G = (R, V, E)
 1 foreach (d ∈ V) loop
 2   X_d := {stage_dst(e) | e = (s, d) ∈ E}
 3   if (|X_d| > 1) then
 4     insert_selector(|X_d|, width(d))
 5     i := 0
 6     e_out := (s_sel, d)
 7     foreach (j ∈ X_d) loop
 8       e_{d,j} := (s_{d,j}, d) such that e_{d,j} ∈ E and stage_dst(e_{d,j}) = j
 9       e_i := (s_{d,j}, d_sel_i)
10       Cond_{e_i} := Cond_{e_{d,j}}
11       stage_src(e_i) := stage_dst(e_i) := j
12       Cond_{e_out} := Cond_{e_out} ∪ Cond_{e_{d,j}}
13       c_i := (p_sel, v_sel_i)
14       stage(c_i) := j
15       Cond_{c_i} := Cond_{e_{d,j}}
16       C := C ∪ {c_i}
17       E := E ∪ {e_i} − {e_{d,j}}
18       i := i + 1
19     end loop
20     E := E ∪ {e_out}
21   end if
22 end loop
Figure 5.7: Selector Insertion Procedure for Inter-stage Signal Conflicts.
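The core rewriting step shared by both procedures (replace the conflicting edges into a port by edges into a selector, then add one edge from the selector output to the port) can be sketched in a few lines. The following Python code is a deliberately simplified illustration, not the procedure of Fig. 5.6 or Fig. 5.7: it ignores stage numbers, conditions and control signals, and the naming scheme for selector ports (`sel_<port>.in<i>`, `sel_<port>.out`) is an assumption.

```python
def insert_selectors(edges):
    """Insert one selector in front of every destination port with a conflict.

    edges: set of (src_port, dst_port). Returns (new_edges, selectors), where
    selectors maps a selector name to its number of inputs.
    """
    by_dst = {}
    for s, d in sorted(edges):
        by_dst.setdefault(d, []).append(s)
    new_edges, selectors = set(), {}
    for d, sources in by_dst.items():
        if len(sources) == 1:                     # no conflict on this port
            new_edges.add((sources[0], d))
            continue
        sel = f"sel_{d}"
        selectors[sel] = len(sources)
        for i, s in enumerate(sources):
            new_edges.add((s, f"{sel}.in{i}"))    # e_i = (s', d_sel_i)
        new_edges.add((f"{sel}.out", d))          # e_out = (s_sel, d)
    return new_edges, selectors
```

Two sources driving the same register input are rerouted through a two-input selector, while unshared ports keep their original connection.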
Improved selector insertion algorithm 
Figure 5.8: Original DFG for 8elector 8haring Example. 
The algorithm shown in Fig. 5.6 and Fig. 5.7 inserts redundant selectors. In the
example data-flow graph shown in Fig. 5.8, some input ports receive the same sets of
input signals. Resources "FWURS" and "FWURT" are data forwarding units.
They receive data from "ALU" and "SFT" in the third stage, and from "ALU," "SFT"
and "DMEM" in the fourth and fifth stages. Because different input
ports of "FWURS" and "FWURT" are used in each pipeline stage, there are no
inter-stage signal conflicts. The conditions of the data transfers from each functional unit
to the forwarding units and to the general-purpose registers (GPR) are the same. Figure 5.9
illustrates the selector insertion results. The selectors in the third stage always output
the same results, and so do the selectors in the fourth stage. Therefore, the selector
insertion algorithm needs to be improved to reduce the number of selectors.

Figure 5.9: Selector Insertion Result without Sharing.

Before line nine of Fig. 5.6, a procedure is added that searches for a selector $r_{sel}$ that has
common input signals and select conditions. If the selector $r_{sel}$ exists, the data transfers
$E_{d,j,k}$ are deleted from $E$ and a data transfer from the output port of selector $r_{sel}$ to the
port $d$ is added. Suppose $E_{r_{sel}}$ is the set of edges which represent data transfers to the
selector input ports. The condition to use the inserted selector $r_{sel}$ is described as follows:
\[
\forall e = (s, d) \in E_{d,j,k},\; \exists e_{sel} = (s, d_{sel}) \in E_{r_{sel}} :\; Cond_{e_{sel}} = Cond_{e} \tag{5.5}
\]
The condition in Equation (5.5) becomes true if and only if, for every data transfer
$e$ of $E_{d,j,k}$, a data transfer $e_{sel}$ exists such that $e$ and $e_{sel}$ have the same source
port and the same condition of data transfer.
Figure 5.10 shows the selector insertion result of the improved algorithm. The redundant
selectors are reduced to one for each stage.
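The sharing test of Equation (5.5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Transfer` record, the function name `can_share`, and the port and condition strings are hypothetical and do not come from the PEAS-III implementation; conditions are compared as opaque strings.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    """A data transfer e = (s, d) with its condition Cond_e (hypothetical record)."""
    src: str   # source port s
    dst: str   # destination port d
    cond: str  # condition of the transfer, treated as an opaque expression


def can_share(transfers_to_d, selector_inputs):
    """Return True if every transfer into port d has a transfer into the
    candidate selector with the same source port and the same condition."""
    existing = {(e.src, e.cond) for e in selector_inputs}
    return all((e.src, e.cond) in existing for e in transfers_to_d)


# Illustrative use: a forwarding unit input fed by ALU and SFT, and a selector
# that was already inserted for another port with the same inputs.
to_fwurs = [Transfer("ALU.out", "FWURS.in0", "c_alu"),
            Transfer("SFT.out", "FWURS.in0", "c_sft")]
selector = [Transfer("ALU.out", "SEL0.in0", "c_alu"),
            Transfer("SFT.out", "SEL0.in1", "c_sft")]
share = can_share(to_fwurs, selector)
```

When `can_share` holds, the transfers into the port would be replaced by a single transfer from the selector output, as the text describes.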
5.1.4 Pipelining 
When data are transferred over a pipeline stage boundary, a pipeline register is
required to transfer operation results to the next stage. A data transfer $e \in E$
which satisfies $stage_{src}(e) < stage_{dst}(e)$ transfers data over a pipeline
stage boundary, so pipeline registers are required at each stage boundary from
$stage_{src}(e)$ to $stage_{dst}(e)$. In the pipeline register insertion example in Fig. 5.11,
pipeline registers are inserted at each pipeline stage boundary.

Figure 5.10: Selector Insertion Result with Sharing.

Figure 5.11: Example of Pipeline Register Insertion.

 1 E_reg := {e | e ∈ E, stage_src(e) < stage_dst(e)};
 2 if E_reg ≠ ∅ then
 3   e := (s, d) ∈ E_reg;
 4   instantiate width(s) bit register;
 5   e_in := (s, d_reg);
 6   stage_src(e_in) := stage_src(e);
 7   stage_dst(e_in) := stage_src(e);
 8   E' := {e' | e' = (s, d') ∈ E_reg, stage_src(e) = stage_src(e')};
 9   E := E ∪ {e_in};
10   foreach e' := (s, d') ∈ E' loop
11     e_out := (s_reg, d');
12     stage_src(e_out) := stage_src(e) + 1;
13     stage_dst(e_out) := stage_dst(e');
14     E := E ∪ {e_out};
15     E := E − {e'};
16   end loop;
17   goto 1;
18 end if;
Figure 5.12: Pipelining Procedure.
The pipelining procedure is illustrated in Fig. 5.12. First, a set of data transfers
$E_{reg} \subseteq E$ is calculated. Every data transfer $e \in E_{reg}$ satisfies
$stage_{src}(e) < stage_{dst}(e)$. When $E_{reg}$ is not empty,
some data transfers cross a pipeline stage boundary. One data transfer $e = (s, d)$
of $E_{reg}$ is selected arbitrarily, and a $width(s)$-bit pipeline register is inserted
between the stage $stage_{src}(e)$ and the stage $stage_{src}(e) + 1$. After the register
insertion, a connection $e_{in}$ from port $s$ to $d_{reg}$ is added to $E$, where $d_{reg}$ and $s_{reg}$ are
an input port and an output port of the inserted pipeline register, respectively. Because
the register is written in the stage $stage_{src}(e)$, $stage_{dst}(e_{in})$ is equal to $stage_{src}(e)$.
For every data transfer $e' = (s, d')$ which transfers data from port $s$ and satisfies
$stage_{src}(e') = stage_{src}(e)$, a connection $e_{out} = (s_{reg}, d')$ is added to $E$. Because the
register is read one stage after it is written, the stage number $stage_{src}(e_{out})$ becomes
$stage_{dst}(e_{in}) + 1$. The original connection $e' = (s, d')$ is deleted from $E$. The pipeline
register insertion procedure is repeated from line one until no data transfer over a
pipeline stage boundary remains in $E$.
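The procedure can be sketched as a small Python function. This is a minimal sketch under assumptions: the `Edge` record, the register naming scheme (`preg0`, `preg1`, with `.d` and `.q` ports) and the deterministic choice of the next edge are illustrative and are not taken from the PEAS-III implementation; register widths are omitted.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Edge:
    """A data transfer (src port, dst port) with its source and destination stages."""
    src: str
    dst: str
    stage_src: int
    stage_dst: int


def insert_pipeline_registers(edges):
    """Insert registers until no edge crosses a pipeline stage boundary."""
    edges = set(edges)
    ids = count()
    registers = []
    while True:
        # E_reg: transfers that cross at least one stage boundary.
        crossing = sorted((e for e in edges if e.stage_src < e.stage_dst),
                          key=lambda e: (e.stage_src, e.src, e.dst))
        if not crossing:
            return edges, registers
        e = crossing[0]  # the text allows an arbitrary choice; this one is deterministic
        reg = f"preg{next(ids)}"
        registers.append(reg)
        # e_in: the register is written in the source stage.
        edges.add(Edge(e.src, f"{reg}.d", e.stage_src, e.stage_src))
        # E': all crossing transfers from the same port in the same stage.
        group = [x for x in crossing
                 if x.src == e.src and x.stage_src == e.stage_src]
        for x in group:
            edges.remove(x)
            # e_out: the register is read one stage after it is written.
            edges.add(Edge(f"{reg}.q", x.dst, e.stage_src + 1, x.stage_dst))


# Illustrative use: an ALU result consumed in stage 4 and stage 5.
example = [Edge("ALU.out", "GPR.w", 3, 5), Edge("ALU.out", "FWU.in", 3, 4)]
pipelined, inserted = insert_pipeline_registers(example)
```

Because every replacement edge starts one stage later than the edge it replaces, the loop terminates once all transfers stay within a single stage.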
5.2 Controller Synthesis 
The controller synthesis is based on the controller model described in Section 4.4.
The control logic that depends on the processor specification is synthesized from the
processor specification, mainly from the micro-operation descriptions.
The controller synthesis procedure consists of six parts:
1. Control Signal Extraction from micro-operation descriptions,
2. Interlock Condition Extraction,
3. Branch Condition Extraction,
4. Instruction Decoder Synthesis,
5. Stage Controller Synthesis,
6. and Interrupt Controller Synthesis.
Each synthesis procedure is described in detail in the following sections.
5.2.1 Control Signal Extraction 
Control signals for the declared resources that execute the described micro-operations
are extracted in this step. By analyzing the micro-operation description of each instruction
$inst$, a set of control signal assignments $C_{inst}$ is generated. A control signal
assignment $c \in C_{inst}$ is a tuple $(p, v)$ where $p$ denotes a control input port of a resource and
$v$ denotes the value assigned to the port $p$. An expression $cond_{c,inst}$ represents the condition
for the control signal assignment $c$ for instruction $inst$. The extraction
of a control signal $c$ from a micro-operation description is explained using the example
shown in Fig. 5.13. Control signal values for the cin port of "ADD0" and the
enb port of register "RZ" are derived as in Fig. 5.2. As a consequence, control signal
assignments $c_0$ and $c_1$ are extracted.
micro-operation:
    "RZ := ADD0.add(RX, RY);"
control signals:
    c_0 = (ADD0.cin, 0)
    c_1 = (RZ.enb, 1)
Figure 5.13: Control signal extraction.
After the analysis of the micro-operations, the sets of control signals are merged into $C$.
$C$ is obtained as follows:
\[
C = \bigcup_{inst \in I} C_{inst} \tag{5.6}
\]
where $I$ is the set of all instructions. $Cond_c$ is determined as follows:
\[
Cond_c = \{ (cond_{c,inst}, inst) \mid inst \in I \}. \tag{5.7}
\]
$(exp, inst) \in Cond_{(p,v)}$ denotes that the value $v$ is assigned to port $p$ when the executing
instruction is $inst$ and the condition $exp$ is satisfied. The stage controller assigns the
value $v$ to $p$ when all of the following conditions are satisfied: (1) the status variable
$q_k$ is '1', (2) the executing instruction of the stage is $inst$, and (3) the expression
$exp$ holds. The stage controller assigns the control signal value for "no-operation" when
one of the conditions above does not hold. The "no-operation" value $v_0$ means that the resource
of the port $p$ does not change its status while the port $p$ is receiving the value $v_0$.
For example, the "no-operation" value for a register write enable port is its negative
value.
When selectors are inserted in datapath synthesis, control signals for the selectors
are also added to $C$, as described in Fig. 5.6 and Fig. 5.7.
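The merge of per-instruction control signal sets and the no-operation fallback can be illustrated with a short Python sketch. The data layout, the function names `merge_control_signals` and `port_value`, and the instruction and port names are assumptions made for this example only; conditions are opaque strings evaluated by a caller-supplied predicate.

```python
from collections import defaultdict


def merge_control_signals(per_inst):
    """per_inst maps inst -> {(port, value): condition expression}.

    Returns the merged set C and, for each c in C, the set Cond_c of
    (condition, instruction) pairs.
    """
    C = set()
    cond = defaultdict(set)
    for inst, assignments in per_inst.items():
        for c, exp in assignments.items():
            C.add(c)                  # union of the per-instruction sets
            cond[c].add((exp, inst))  # remember when c applies
    return C, dict(cond)


def port_value(port, no_op, C, cond, q_k, inst, holds):
    """Value driven on `port`: v if the stage is active (q_k), the executing
    instruction matches and the condition holds; otherwise the no-op value."""
    for (p, v) in sorted(C):
        if p != port or not q_k:
            continue
        if any(i == inst and holds(exp) for exp, i in cond[(p, v)]):
            return v
    return no_op


# Illustrative use with two hypothetical instructions sharing one adder.
per_inst = {
    "ADD": {("ADD0.cin", 0): "1", ("RZ.enb", 1): "1"},
    "SUB": {("ADD0.cin", 1): "1", ("RZ.enb", 1): "1"},
}
C, cond = merge_control_signals(per_inst)
always = lambda exp: exp == "1"  # the condition "1" always holds
```

For a register write enable port, passing `no_op=0` reproduces the case mentioned in the text where the no-operation value is the negated enable.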
5.2.2 Interlock Condition Extraction 
To synthesize the pipeline interlock control signal $lock_k$, the conditions of multi-cycle
operations and resource conflicts are extracted. In this section, condition extraction
for multi-cycle operations is described first, followed by that for resource conflicts.
Pipeline interlock logic for multi-cycle operations is synthesized from
Equation (4.1) in Section 4.4.1. Execution conditions of resource operations
$U_{k,r,op} = \{ (exp, inst) \mid exp \in Exp,\ inst \in I \}$ are extracted from the micro-operation
descriptions of instructions, where $exp$ is a conditional expression and $inst$ indicates an
executing instruction. If operation $op$ of resource $r$ occurs in the micro-operation
description of instruction $inst$ in the $k$-th stage, the execution condition $(exp, inst)$ is
added to $U_{k,r,op}$. A completion flag $fin_{op_m} = (p, v)$ is defined for each operation $op_m$ in
FHM-DB. $fin_{op_m} = (p, v)$ denotes that the output signal of the port $p$ becomes $v$
after the operation $op_m$ is finished. From the extracted execution conditions $U_{k,r,op}$ and
the completion flag expressions $fin_{op_m}$ received from FHM-DB, the interlock control logic for
multi-cycle operations, $multi\_lock_k$, is synthesized.
Suppose $Fin$ is the set of completion flags of all described multi-cycle operations.
\[
Fin = \bigcup_{m} \{ fin_{op_m} \} \tag{5.8}
\]
$OP_{fin}$ is the set of multi-cycle operations which have the same completion flag $fin$.
\[
OP_{fin} = \{ op_m \mid fin_{op_m} = fin \} \tag{5.9}
\]
Then $EXP_{k,fin}$ and $I_{k,fin,exp}$ are calculated by the following equations.
\[
EXP_{k,fin} = \{ exp \mid (exp, inst) \in U_{k,r,op},\ op \in OP_{fin} \} \tag{5.10}
\]
\[
I_{k,fin,exp} = \{ inst \mid (exp, inst) \in U_{k,r,op},\ op \in OP_{fin} \} \tag{5.11}
\]
$EXP_{k,fin}$ is the set of execution conditions of operations $op \in OP_{fin}$. $I_{k,fin,exp}$ is the set of
instructions which execute a multi-cycle operation $op \in OP_{fin}$ when $exp$ holds.
Suppose $output(p)$ indicates the output value of port $p$. Using Equations (5.8), (5.10)
and (5.11), $multi\_lock_k$ is represented as follows:
\[
multi\_lock_k = \bigvee_{fin=(p,v) \in Fin} \Bigl( \bigvee_{exp \in EXP_{k,fin}} \bigl( (inst_k \in I_{k,fin,exp}) \cdot exp \cdot valid_k \bigr) \cdot (output(p) \neq v) \Bigr) \tag{5.12}
\]
Signal $multi\_lock_k$ becomes '1' when at least one of the multi-cycle operations is not
completed. The signal $multi\_lock_k$ becomes '0' when the values of all completion flag
output ports $p$ have become $v$.
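A behavioral reading of the multi-cycle interlock signal can be sketched in Python as follows. This is a simulation-style sketch, not the synthesized logic: the function name, the dictionary layout for the condition and instruction sets, and the resource name `MUL0` are hypothetical.

```python
def multi_lock(fin_flags, exp_sets, inst_sets, inst_k, valid_k, holds, output):
    """Return True while a multi-cycle operation used by stage k is unfinished.

    fin_flags            : iterable of completion flags (p, v)
    exp_sets[fin]        : execution conditions for operations with that flag
    inst_sets[(fin, exp)]: instructions executing such an operation under exp
    holds(exp)           : evaluates a condition expression
    output(p)            : current output value of port p
    """
    for fin in fin_flags:
        p, v = fin
        in_use = any(
            inst_k in inst_sets[(fin, exp)] and holds(exp) and valid_k
            for exp in exp_sets.get(fin, ())
        )
        if in_use and output(p) != v:
            return True
    return False


# Illustrative use: a sequential multiplier whose "fin" port becomes 1 when done.
flag = ("MUL0.fin", 1)
fin_flags = [flag]
exp_sets = {flag: {"1"}}
inst_sets = {(flag, "1"): {"MULT"}}
busy = multi_lock(fin_flags, exp_sets, inst_sets, "MULT", True,
                  lambda exp: True, lambda p: 0)
```

The stage is locked only while the instruction in the stage actually uses the operation and the completion port has not yet reached its finished value.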
Pipeline interlock detection logic for resource conflicts is synthesized from
Equation (4.4) in Section 4.4.1. A set $V_{r,k}$ is calculated from the set $C$ and the functions
$resource(p)$ and $stage(c)$. The function $resource(p)$ returns the resource $r$ of port $p$, and
$stage(c)$ returns the number of the stage in which the control signal is assigned to port $p$.
Suppose $C_r = \{ c \mid c = (p, v) \in C,\ resource(p) = r \}$ and
$C_{r,k} = \{ c \mid c \in C_r,\ stage(c) = k \}$. Using $C_{r,k}$, the set $V_{r,k}$ is calculated. From $V_{r,k}$ and
Equation (4.4), the lock signal for resource conflicts, $res\_conflict_k$, is synthesized. $V_{r,k}$ is also
used to synthesize the control signal selection for resource $r$.
\[
V_{r,k} = \{ inst \mid (exp, inst) \in Cond_{c_{r,k}},\ c_{r,k} \in C_{r,k} \} \tag{5.13}
\]
\[
res\_conflict_k = \bigvee_{r \in R} \Bigl( \bigvee_{k < j \le n} \bigl( (inst_j \in V_{r,j}) \cdot valid_j \bigr) \cdot (inst_k \in V_{r,k}) \cdot valid_k \Bigr) \tag{5.14}
\]
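The resource conflict check can likewise be sketched behaviorally in Python. The function name, the dictionary keyed by (resource, stage) and the instruction names are assumptions for illustration; stages are numbered from 1 to n as in the text.

```python
def res_conflict(k, n, V, inst, valid):
    """Return True if stage k and a later stage j (k < j <= n) both need
    the same resource in the current cycle.

    V[(r, s)] : instructions that use resource r in stage s
    inst[s]   : instruction currently in stage s
    valid[s]  : whether stage s holds a valid instruction
    """
    resources = {r for (r, _) in V}
    for r in resources:
        uses_k = valid[k] and inst[k] in V.get((r, k), set())
        used_later = any(valid[j] and inst[j] in V.get((r, j), set())
                         for j in range(k + 1, n + 1))
        if uses_k and used_later:
            return True
    return False


# Illustrative use: a data memory accessed in stage 2 by a hypothetical "LDX"
# instruction and in stage 4 by "LW".
V = {("DMEM", 2): {"LDX"}, ("DMEM", 4): {"LW"}}
inst = {1: "ADD", 2: "LDX", 3: "ADD", 4: "LW", 5: "ADD"}
valid = {s: True for s in range(1, 6)}
conflict_in_stage_2 = res_conflict(2, 5, V, inst, valid)
```

Only the earlier stage is reported as conflicting, which matches the index range k < j in the equation: the later stage proceeds while the earlier one waits.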
5.2.3 Branch Condition Extraction 
By analyzing the micro-operation descriptions of branch instructions, the branch stage
number $b$ and the branch condition $Br = \{ (exp, inst) \mid exp \in Exp,\ inst \in I \}$
are extracted. The conditional expression $exp$ is the condition under which the
program counter PC is written. Suppose $EXP_{Br}$ is the set of conditional expressions
$exp$ for branches and $I_{Br,exp}$ is the set of instructions which execute a branch when
condition $exp$ holds.
The sets $EXP_{Br}$ and $I_{Br,exp}$ are calculated as follows:
\[
EXP_{Br} = \{ exp \mid (exp, inst) \in Br \} \tag{5.15}
\]
\[
I_{Br,exp} = \{ inst \mid (exp, inst) \in Br \} \tag{5.16}
\]
From Equation (4.8) in Section 4.4.2, the control logic of $branch$ is calculated as
follows:
\[
branch = valid_b \cdot \Bigl( \bigvee_{exp \in EXP_{Br}} (inst_b \in I_{Br,exp}) \cdot exp \Bigr) \tag{5.17}
\]
5.2.4 Instruction Decoder Synthesis
The instruction decoder takes an instruction word as input and generates two types of signals
based on the model described in Section 4.4.1. In this section, resource control signal
generation is described first, and then the instruction identification signal is described.
The control signals that are independent of datapath status signals are selected.
Then the instruction decoder logic for the selected control signals is generated. A set
of instructions $I_{c,exp,k}$ which assign the value $v$ to control port $p$ in the $k$-th stage
when condition $exp$ holds is calculated as follows:
\[
I_{c,exp,k} = \{ inst \mid c \in C,\ (exp, inst) \in Cond_c,\ stage(c) = k \} \tag{5.18}
\]
$I_{c,1,k}$ is the set of instructions which assign the control signal represented by $c$
independently of datapath status. A set of control signal assignments $C_{p,k}$ for the port $p$
in the $k$-th stage is selected as follows:
\[
C_{p,k} = \{ c \mid c = (p, v) \in C,\ stage(c) = k \} \tag{5.19}
\]
$f_{p,k}(inst)$ is an output function of the instruction decoder which generates the control
signal for the port $p$ in the $k$-th stage. The decoded result $f_{p,k}(inst)$ is sent to each
pipeline stage step by step via pipeline registers. Suppose $z_{j,f_{p,k}(inst)}$ is the pipeline
register for $f_{p,k}(inst)$ between the $(j-1)$-th stage and the $j$-th stage, and $d$ is the instruction
decode stage. The value of $z_{k,f_{p,k}(inst)}$ is assigned to the port $p$ in the $k$-th stage
when $k$ is greater than the instruction decode stage $d$. The decoded result $f_{p,k}$ is directly
assigned to the port $p$ when $k$ is equal to $d$. $f_{p,k}$ and $z_{j,f_{p,k}(inst)}$ are represented as
follows, where $z^{+}$ denotes the next value of a register:
\[
f_{p,k}(inst) = \bigvee_{c=(p,v) \in C_{p,k}} v \cdot (inst \in I_{c,1,k}) \; + \; v_0 \cdot \bigwedge_{c \in C_{p,k}} (inst \notin I_{c,1,k}) \tag{5.20}
\]
\[
z^{+}_{j+1,f_{p,k}(inst)} = z_{j,f_{p,k}(inst)} \cdot go_j + z_{j+1,f_{p,k}(inst)} \cdot \overline{go_j} \quad (d < j < k) \tag{5.21}
\]
\[
z^{+}_{d+1,f_{p,k}(inst)} = f_{p,k}(inst). \tag{5.22}
\]
Function $f_{p,k}(inst)$ returns the value $v$ when the fetched instruction $inst$ is a member
of $I_{c,1,k}$. If the fetched instruction $inst$ is not a member of $I_{c,1,k}$, $f_{p,k}(inst)$ returns the
"no-operation" value $v_0$.
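The behavior of the decoder output function for one port and one stage can be sketched as a lookup. This is a minimal sketch assuming that each instruction belongs to at most one of the instruction sets for the port; the function name, the mapping layout and the example port values are hypothetical.

```python
def decode_port_value(inst, assignments, no_op):
    """Return the value assigned unconditionally to one control port in one
    stage for instruction `inst`, or the no-operation value.

    assignments maps a value v to the set of instructions that assign v
    to the port independently of datapath status.
    """
    for v, insts in assignments.items():
        if inst in insts:
            return v
    return no_op


# Illustrative use: a hypothetical ALU mode port, with 0 as the no-op value.
alu_mode = {1: {"ADD", "ADDU"}, 2: {"SUB", "SUBU"}}
mode_for_subu = decode_port_value("SUBU", alu_mode, no_op=0)
mode_for_lw = decode_port_value("LW", alu_mode, no_op=0)
```

In hardware the same function is a sum of products over the instruction sets; the lookup form is only convenient for checking a specification by simulation.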
The other type of instruction decoding indicates whether the fetched instruction
is a member of a certain set of instructions or not. It is used to generate
the following control signals: the interlock control signal $lock_k$, the branch detection signal
$branch$, and resource control signals which depend on datapath status. The decoded
result $m(I, inst)$ is also sent to each pipeline stage step by step via pipeline
registers. Suppose $z_{j,m(I,inst)}$ is the pipeline register for $m(I, inst)$ between the
$(j-1)$-th stage and the $j$-th stage. Suppose $inst$ is the fetched instruction and $I$ is
a set of instructions. The function of the latter type of instruction decoding is described as
follows, where $z^{+}$ denotes the next value of a register:
\[
m(I, inst) = \begin{cases} 1 & \text{when } inst \in I \\ 0 & \text{otherwise} \end{cases} \tag{5.23}
\]
\[
z^{+}_{j+1,m(I,inst)} = z_{j,m(I,inst)} \cdot go_j + z_{j+1,m(I,inst)} \cdot \overline{go_j} \quad (d < j < k) \tag{5.24}
\]
\[
z^{+}_{d+1,m(I,inst)} = m(I, inst). \tag{5.25}
\]
The stage controllers take as input the decoded results $S_{k,f_{p,k}(inst)}$ and $S_{k,m(I,inst)}$ that are
shown in the following equations.
\[
S_{k,f_{p,k}(inst)} = \begin{cases} z_{k,f_{p,k}(inst)} & (d < k) \\ f_{p,k}(inst) & (d = k) \end{cases} \tag{5.26}
\]
\[
S_{k,m(I,inst)} = \begin{cases} z_{k,m(I,inst)} & (d < k) \\ m(I, inst) & (d = k) \end{cases} \tag{5.27}
\]
The controller of the $k$-th stage uses the output signal of the pipeline register $z_{k,*}$ when
$k$ is greater than $d$, which represents the instruction decode stage number. The stage
controller of the instruction decode stage uses the decoded result directly.
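The propagation of a decoded result through the pipeline registers can be sketched as one clock step in Python. This is a sketch under assumptions: a register is taken to hold its value when the corresponding `go` signal is 0, the register after the decode stage is loaded on every step, and the function names and dictionary layout are invented for illustration.

```python
def step_decode_pipeline(z, decoded, go, d, k):
    """Advance the decoded-result registers by one clock.

    z[j]    : register between stage j-1 and stage j, for d < j <= k
    decoded : result produced in the decode stage d during this cycle
    go[j]   : True if stage j advances, False if it is stalled
    """
    nxt = dict(z)
    nxt[d + 1] = decoded                          # register after the decode stage
    for j in range(d + 1, k):                     # d < j < k
        nxt[j + 1] = z[j] if go[j] else z[j + 1]  # shift on go, otherwise hold
    return nxt


def stage_input(z, decoded, d, k):
    """Decoded result seen by the controller of stage k (k >= d)."""
    return decoded if k == d else z[k]


# Illustrative use: decode in stage 2, control port used in stage 4.
state = {3: "a", 4: "b"}
advanced = step_decode_pipeline(state, "c", {3: True}, d=2, k=4)
stalled = step_decode_pipeline(state, "c", {3: False}, d=2, k=4)
```

The two helper functions correspond to the register update and to the choice between the register output and the direct decoder output described in the text.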
5.2.5 Stage Controller Synthesis
The synthesis of the stage controller, which is based on the model in Section 4.4.2, is described.
The following items of the stage controller depend on the architectural level processor description
and are synthesized, while the items of the finite state machine $M_k$ are generated from the
model:
1. interlock detection signal $lock_k$.
\[
lock_k = multi\_lock_k + res\_conflict_k \tag{5.28}
\]
$lock_k$ is the logical sum of $multi\_lock_k$ and $res\_conflict_k$. $multi\_lock_k$ and
$res\_conflict_k$ are defined in Equations (5.12) and (5.14), respectively. Using the result of
instruction decoding shown in Equation (5.23), $multi\_lock_k$ and $res\_conflict_k$
are represented as follows:
\[
multi\_lock_k = \bigvee_{fin=(p,v) \in Fin} \Bigl( \bigvee_{exp \in EXP_{k,fin}} \bigl( S_{k,m(I_{k,fin,exp},inst)} \cdot exp \cdot valid_k \bigr) \cdot (output(p) \neq v) \Bigr) \tag{5.29}
\]
\[
res\_conflict_k = \bigvee_{r \in R} \Bigl( \bigvee_{k < j \le n} \bigl( S_{j,m(V_{r,j},inst)} \cdot valid_j \bigr) \cdot S_{k,m(V_{r,k},inst)} \cdot valid_k \Bigr) \tag{5.30}
\]
2. branch detection signal $branch$, which is defined in Equation (5.17). Using the
result of instruction decoding shown in Equation (5.23), $branch$ is represented
as follows:
\[
branch = valid_b \cdot \Bigl( \bigvee_{exp \in EXP_{Br}} S_{b,m(I_{Br,exp},inst)} \cdot exp \Bigr) \tag{5.31}
\]
3. function $cancel(k)$, which is generated from Equation (4.7) in Section 4.4.2 and
the extracted branch stage number $b$.
4. and control signal generation functions.
Control Signal Generation Functions
Control signal generation functions are classified into three types:
1. control signals for resources in the $k$-th stage,
2. control signals for multi-cycle operations,
3. control signals for the resources which are accessed from multiple stages.
The control signal of the port $p$ in the $k$-th stage is generated from $C_{p,k}$ and $I_{c,exp,k}$, shown
in Equations (5.19) and (5.18), and $EXP_{c,k}$. $EXP_{c,k}$ is the set of expressions $exp$ for
conditional control signal assignments of $c$.
\[
EXP_{c,k} = \{ exp \mid c \in C,\ (exp, inst) \in Cond_c,\ stage(c) = k \} \tag{5.32}
\]
Referring to Equations (5.19), (5.18) and (5.32), the control signal $S_{p,k}$ for the port
$p$ in the $k$-th stage is represented as follows:
\[
\begin{aligned}
S_{p,k} = {} & \Bigl( \bigvee_{c \in C_{p,k}} \; \bigvee_{\substack{exp \in EXP_{c,k} \\ exp \neq 1}} v \cdot exp \cdot S_{k,m(I_{c,exp,k},inst)} \cdot go_k \Bigr) \\
& + S_{k,f_{p,k}(inst)} \cdot go_k \cdot \Bigl( \bigwedge_{c \in C_{p,k}} \; \bigwedge_{\substack{exp \in EXP_{c,k} \\ exp \neq 1}} \bigl( \overline{exp} + \overline{S_{k,m(I_{c,exp,k},inst)}} \bigr) \Bigr) \\
& + \overline{go_k} \cdot v_0
\end{aligned} \tag{5.33}
\]
The control signal $S_{p,k}$ becomes the "no-operation" value $v_0$ when the $k$-th stage is
stalled ($go_k = 0$). The control signal $S_{p,k}$ becomes the value $v$ if condition $exp$ holds and
the executing instruction is a member of the set of instructions $I_{c,exp,k}$. If no condition
$exp$ of a conditional signal assignment holds, the result of the instruction decoder
$f_{p,k}(inst)$, which is described in Equation (5.20) in Section 5.2.4, is assigned to the
port $p$.
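The three cases of the control signal for one port in one stage (stalled, conditional assignment active, fall back to the decoder result) can be sketched as follows. This is a behavioral sketch with hypothetical names; it assumes that at most one conditional assignment is active at a time.

```python
def control_signal(conditional, decoded_value, no_op, go_k, holds, is_member):
    """Value driven on one control port in stage k.

    conditional     : list of (v, exp, insts) for conditional assignments,
                      where insts is the instruction set for (c, exp, k)
    decoded_value   : unconditional decoder result delivered to stage k
    go_k            : False when stage k is stalled
    holds(exp)      : evaluates a datapath condition
    is_member(insts): True if the instruction in stage k is in insts
    """
    if not go_k:
        return no_op                      # stalled stage: no-operation value
    for v, exp, insts in conditional:
        if holds(exp) and is_member(insts):
            return v                      # an active conditional assignment
    return decoded_value                  # otherwise the decoder result


# Illustrative use: a PC load enable that a hypothetical BEQ sets when "zero" holds.
conditional = [(1, "zero", frozenset({"BEQ"}))]
in_stage = "BEQ"
member = lambda insts: in_stage in insts
taken = control_signal(conditional, 0, 0, True, lambda e: True, member)
not_taken = control_signal(conditional, 0, 0, True, lambda e: False, member)
```

The order of the checks mirrors the three terms of the equation: the stall term, the conditional term and the decoder term are mutually exclusive.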
Control signals for multi-cycle operations and conflicted resources are discussed
in Section 4.4.2.
Because the control signal $S_{p,k}$ takes the active value of the start signal when a multi-cycle
operation should be executed, the control signal for a multi-cycle operation, $S_{p,k,start}$, is
described as follows:
Sp ,k ,start = f lα9 ・ S刊十 flα9 ・苛EEZE (5.34) 
Using the result of the instruction decoder, the resource usage condition $V_{r,k}$, and the control
signal $S_{p,k}$ of port $p$ in the $k$-th stage, the control signal $S_p$ for a port $p$ which is accessed
from multiple stages is described as follows:
\[
S_p = \bigvee_{1 \le k \le n} S_{p,k} \cdot sel_{r,k} \tag{5.35}
\]
\[
sel_{r,k} = S_{k,m(V_{r,k},inst)} \cdot valid_k \cdot \overline{\bigvee_{k < j \le n} S_{j,m(V_{r,j},inst)} \cdot valid_j} \tag{5.36}
\]
5.2.6 Interrupt Controller Synthesis
From an interrupt definition, the state machine shown in Section 4.4.3 is synthesized.
The condition of the state transition from "exe" to "wait" is the logical sum of the defined
interrupt conditions. The synthesis method of the datapath and the control signal values
to execute the described interrupt operation is the same as that of instructions.
\[
interrupt = \bigvee_{i \in I_{intr}} (\text{condition of defined interrupt } i) \tag{5.37}
\]
\[
restart = (Cnt \ge S_i) \tag{5.38}
\]
(5.39) 
where
  $S_i$: execution cycle count for interrupt $i$
  $Cnt$: counter for interrupts
When one of the conditions for the specified interrupts holds, the signal
$interrupt$ becomes '1' and an interrupt is detected. The counter $Cnt$ counts the number
of execution cycles of interrupts.
If one of the interrupt conditions holds, the signal $restart$ turns to '1'
and detects interrupts. The counter $S$ is used for interrupt control while the status
variable keeps $q_{intr} = 0$.
Suppose $C_{intr}$ is the set of control signal assignments for interrupt $intr$. Suppose
$cycle(c)$ is a function which returns the step in which control signal assignment $c$ is
executed in the micro-operation description of interrupt $intr$. The control signals for
interrupts are defined as follows.
\[
\begin{aligned}
S_{p,int} = {} & \bigvee_{c=(p,v) \in C_{intr}} v \cdot exp \cdot (intr = int\_name) \cdot (Cnt = cycle(c)) \\
& + \bigwedge_{c=(p,v) \in C_{intr}} \overline{exp \cdot (intr = int\_name) \cdot (Cnt = cycle(c))} \cdot v_0
\end{aligned} \tag{5.40}
\]
If the executing interrupt is $intr$, the number of steps since interrupt processing
started is equal to $cycle(c)$, and condition $exp$ holds, the control value $v$ is assigned
to port $p$.
Chapter 6
Experiments
In this chapter, the effectiveness of the proposed processor design method and
synthesis method is evaluated through experiments.
6.1 Objective of the Experiments 
In this chapter, five kinds of experiments are described.
1. An existing RISC processor, to evaluate the variety of instructions that can be designed
and synthesized by the proposed method.
2. The PEAS-I processor core, to compare the design quality of a synthesized processor with
that of a manually designed one.
3. An embedded RISC controller, to compare the conventional design method
and the proposed processor synthesis method in terms of design time and design
quality.
4. Pipeline depth tuning, to evaluate the modification time and the
effectiveness of adjusting the number of pipeline stages and re-scheduling
operations to the pipeline stages.
5. Architectural design space exploration for an FIR filter, to evaluate
the design time of new instruction specifications and the range of the explored
design space.
To evaluate the effectiveness of the proposed processor synthesis method, the
processor synthesis time, the design productivity, the quality of the synthesized processor,
and the size of the explored design space are examined. Processor synthesis time is
evaluated using a prototype processor synthesizer implementing the proposed synthesis method
on a Pentium II processor. Design productivity is evaluated in terms of design time
and the amount of description, for both new processor
design and architectural design space exploration. The quality of the synthesized
processor is evaluated in terms of area and clock frequency.
6.2 Basic RISC Processor 
In the first experiment, the ease of designing a new processor and a derivative
processor is examined. First, a MIPS R3000 [4] [42] compatible processor, PEAS
R3K, was designed. Then, it was modified into DLX [43] to evaluate the ease
of design with the micro-operation level processor specification.
In the first step, a subset of the MIPS R3000 instruction set was implemented. The
number of implemented instructions is 52 out of the 74 instructions
of the MIPS R3000. Coprocessor instructions and interrupt instructions were not
implemented in this experiment. The time required for design was about eight hours.
The time required for synthesis was about two minutes.
Table 6.1: Results of Synthesis for PEAS R3K
component                    #    Total Area (gates)    Frequency (MHz)
user specified resources    17    45759.22              157.48
registers                   20     7064.67              769.23
selectors                   10     2046.08              471.70
controller                   1     2347.18              200.00
sum of the above            46    57217.17              157.48
processor                    1    59818.34              125.63
using Design Compiler (0.5 μm CMOS library)
The results of synthesis are summarized in Table 6.1. The column "#" denotes
the number of components in the processor. "Total Area" indicates the component
area including wiring area. "Frequency" means the maximal frequency of the
corresponding component. "User specified resources" are the resources that are explicitly
declared by the designer. "Registers," "selectors," and "controller" are resources
automatically introduced by the generator. "Sum of the above" is simply the
sum of all values above in the column. "Processor" is the synthesis result as a
processor.
From these experimental results, it is confirmed that the automatically generated
parts do not much affect the area and performance of the processor. The area of the
generated part is about 15% of the whole processor. The frequency of the introduced
resources, including the controller, is relatively high; hence they do not individually
contain the critical path. The critical path of PEAS R3K was zero-flag generation by
the ALU followed by the PC update. This path is synthesized from the micro-operation description
in the third stage of some branch instructions such as "Branch on Equal (BEQ)." The
micro-operation description of BEQ is shown in Fig. 3.14 in Section 3.3.6.
Table 6.2: Results of Synthesis for PEAS DLX
component                    #    Total Area (gates)    Frequency (MHz)
user specified resources    14    45758.01              157.48
registers                   23     7545.82              769.23
selectors                   15     1960.82              628.93
controller                   1     1790.78              200.80
sum of the above            53    57055.39              157.48
processor                    1    48469.03              116.28
using Design Compiler (0.5 μm CMOS library)
In the second step of this experiment, a subset of DLX (called PEAS DLX) was
implemented based on PEAS R3K. The number of implemented instructions is 51
out of all 91 instructions. As in the case of PEAS R3K, coprocessor instructions
and interrupt instructions were not implemented in this experiment. The reuse ratio
for the DLX design from the description of PEAS R3K is 59%, since both architectures
have many similar instructions. The time required for modification was about 3 hours.
Table 6.2 shows the logic synthesis results of DLX.
The amount of description for both PEAS R3K and PEAS DLX is shown in
Table 6.3. The amount of description for the micro-operation level processor specification
is less than about one sixth of that of the corresponding generated HDL
description. It is clear that the proposed processor synthesis method reduces the designer's
load.
Table 6.3: Comparison of the Amount of Descriptions for PEAS R3K and PEAS DLX
PEAS R3K 
PEAS DLX 824 
6.3 PEAS-I Processor Core 
The PEAS-I core is a processor generated by the PEAS-I system [10]. The PEAS-I system
can generate an optimal processor for a given application program from a predefined
instruction set. The predefined instruction set consists of a primitive instruction set and
optional instructions. The primitive instruction set contains basic instructions that
most processors have. Instructions in the primitive instruction set can be
categorized into arithmetic instructions, data transfer instructions, and execution sequence
control instructions. In this experiment, the existing design and a new one designed
with PEAS-III's micro-operation level processor specification are compared. First,
a PEAS-I core with the primitive instruction set was designed with PEAS-III. The
instruction set contains 85 instructions. Then, this processor was extended by
adding multiply instructions.
The result of the first step is shown in Table 6.4. The column "original"
corresponds to the original design, and the column "with PEAS-III"
corresponds to the design with PEAS-III. The workload for the design with
PEAS-III is about one third of that for the original one. The maximum delays of the two
designs are almost the same, and the area of the design with PEAS-III is 20 % larger
than that of the original design.
Table 6.4: Result of PEAS-I Core Design
                                original    with PEAS-III
work load (hour)                96          32 (*)
lines in the HDL description    6431        7194 (1038 for MOD)
maximum delay (ns)              9.80        9.74
# of gates                      22,247      26,970
(*) includes learning about the system and improvement of MODs.
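As a quick check of the ratios quoted for Table 6.4, the following worked calculation uses only the values listed in the table (workload in hours, gate counts, and maximum delays):

\[
\frac{32}{96} \approx 0.33, \qquad \frac{26970}{22247} \approx 1.21, \qquad \frac{9.74}{9.80} \approx 0.99
\]

The workload ratio matches the stated "about one third," the gate count is roughly 21% higher, which is consistent with the approximately 20% increase stated in the text, and the maximum delays differ by less than 1%.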
Next in this experiment, this processor was extended with additional multiply
instructions with signed/unsigned operations using PEAS-III. The result of the logic synthesis is
shown in Table 6.5. To implement the multiply instructions, one of several functional units
for the multiply operation can be selected. Using the proposed method, this selection
is done by specifying the parameters for the multiplier in the resource declarations.
Pipeline interlock logic is automatically generated, and the designer has no need to
design the pipeline control logic.
Table 6.5: Delay and Size of PEAS-I Core with Multiply Operation
Design            delay (ns)    Area (gate)
under 100MHz
  SR              17.93         49567.8
  SL               9.77         49946.5
  CR               9.78         67905.8
  CL               9.72         75089.6
under 200MHz
  SR              17.69         50784.4
  SL               7.68         51828.9
  CR               6.93         69351.8
  CL               6.07         76577.2
S: sequential circuit implementation, C: combinational circuit implementation; R: using
ripple carry adder, L: using carry-lookahead adder.
6.4 Embedded RISC Controller 
This experiment compares designs produced with the conventional method
and designs produced with the proposed processor synthesis method used in PEAS-III. The
original controller, which is used for image processing, was designed by manual RT-level
description. A compatible controller was designed with PEAS-III in this experiment.
This RISC controller has a Harvard architecture. The instruction width is 24 bits.
The number of instructions is 54. The controller consists of a three-stage pipeline. It
has a synchronous interrupt facility.
An undergraduate student designed this controller with PEAS-III. He had no
experience of processor design with PEAS-III at the beginning of this experiment.
The design proceeded in the following way.
1. He learned the usage of PEAS-III.
2. He designed the controller with 32 bits for instruction width.
3. He modified the design to fit 24 bits for instruction width.
The time required for learning PEAS-III was about seven hours. The learning
included reading manuals and trying out a design with a sample processor attached to
PEAS-III.
In the first design, he used a 32-bit instruction width for ease of code
assignment, because the code assignment of the original instruction set was not
given. He implemented all 54 instructions. The workload for this work is shown in
the column "first design" of Table 6.6. The total required time is 58 hours. Though
he was not familiar with PEAS-III, he designed a processor in a few days. The
designed controller has various addressing modes, special registers and register
files. Because the complex addressing modes make the micro-operation descriptions
difficult, the design time became longer than for the other processors in the experiments.

Table 6.6: Work Load for Designing a RISC Controller
works                                         first design (hour)    modification (hour)
selecting resources                           3
determining instruction set architecture     12                     8
writing micro-operation description          40                     2
modifying errors                              2                     2
total                                        58                    13

In the second design, he modified the first design with respect to the instruction
width. The main work was the modification of the instruction format. While some
trivial modifications were required, most of the micro-operation description
was reused.
Examples of instruction code assignment for both the 24-bit and the 32-bit version are shown in
Fig. 6.1. In this example, the code assignments of the ADDU (add unsigned) instruction
are shown. Fields with names like "opr1" in Fig. 6.1 are referred to in the micro-operation
description.

addu (24-bit)
| 1001 | opr1 | opr2 | opr3 | opr4 | 0000 |
addu (32-bit)
 31    26 25  22 21  18 17  14 13  10 9          0
| 000000 | opr1 | opr2 | opr3 | opr4 | 0000000000 |
Figure 6.1: Difference of Instruction Code of ADDU between 24-bit and 32-bit in
RISC controllers.

An example of a portion of a micro-operation description is shown in Fig. 6.2.
In this example, the micro-operation description of the ADDU instruction is shown.
It consists of the behavior of each stage. At stage 2, the values of the operands are
referred to using the names "opr2" and "opr3." As shown in this example, the
instruction codes can be modified without modifying the micro-operation
description.
The column "modification" of Table 6.6 shows the required time for this
work.

stage 1  IR := IMEM[PC];
         PC.inc();
stage 2  DECODE(IR);
         $sr1 := freg.read0(opr2);
         $sr2 := freg.read1(opr3);
stage 3  ($result, $aluflg) := ALU.addu($sr1, $sr2, '0');
         aluflg := $aluflg(2) & $aluflg(3);
         freg.write0(opr1, $result);
Figure 6.2: Micro-operation Description of ADDU in RISC Controllers.

The design quality in terms of area and available clock frequency is also
examined in this experiment. The generated HDL description of the 32-bit version of the RISC
controller and the HDL description of the real RISC controller were synthesized under
the same conditions. Table 6.7 shows the result. Two target frequencies, 50 MHz and
108 MHz, were set up for logic synthesis. Given proper constraints for logic synthesis,
both controllers achieved these frequencies. Note that the original controller
has several instructions that were added to the original instruction set as extensions,
and they were not implemented in the controller designed with PEAS-III. Though a
rough comparison of the area values is not fully justified, there seems to be no
remarkable difference.

Table 6.7: Comparison of the Design with PEAS-III and with Conventional Method
for a RISC Controller
[Table body garbled in the source. Recoverable entries: columns "PEAS-III" and
"conventional method" with the value pairs 32 / 24 and 58 / 420, and the values
12.7k, 12.9k, 14.3k and 14.6k, whose assignment to columns is not recoverable.]
(using CMOS 0.25 μm library)

6.5 Pipeline Stage Tuning
In this experiment the number of pipeline stages of PEAS R3K-5 was varied from
three to five. Then, the design improvement for clock frequency is described.
The improvement includes changing the micro-operation scheduling to the pipeline
stages.
PEAS R3K-5 is an extended version of PEAS R3K for data forwarding. The design
time for the forwarding extension was about half an hour.
6.5.1 Changing the Number of Pipeline Stages
Changing the number of pipeline stages may change the critical path and
the number of pipeline registers. In other words, both performance and hardware
cost can be improved by a proper choice of the number of pipeline stages and of the
micro-operation scheduling to the pipeline stages.
Hardware cost is approximately linear in the number of pipeline stages, because
the number of pipeline registers increases in proportion to the number
of pipeline stages.
On the other hand, the maximal frequency is more complicated. If the operations in the
critical path can be divided into different stages by increasing the number of pipeline
stages, the length of the critical path can be reduced. However, if the operations in the
critical path cannot be divided into different stages, the length of the critical path
cannot be reduced.
The critical path of PEAS R3K-5 was the path from a pipeline register to the program
counter (PC) through the ALU and the stage controller in the third stage. The ALU compares
operands stored in pipeline registers and outputs the zero flag; then the stage controller
decides whether the branch is executed or not and sends the control signal for the PC to update
its value.
To change the number of pipeline stages from five to four, the micro-operations in
the fourth stage and the fifth stage were merged (PEAS R3K-4) because there
was no critical path in these stages. The critical path of PEAS R3K-4 was the
same as that of PEAS R3K-5. To change the number of pipeline stages from four to three,
the arithmetic and logic operations and the address calculation operations in the third stage
were merged into the previous stage, and the branch operation was merged into the next stage
(PEAS R3K-3). Because a sequence of operations such as address
calculation by the ALU followed by memory access needs a longer time than
other operations, scheduling these two operations into different stages is preferable.
To keep the branch stage the same as in PEAS R3K-5 and PEAS R3K-4, the branch operation
was scheduled to the third stage. Figure 6.3 shows the scheduling results of PEAS R3K-4
and PEAS R3K-3.

Figure 6.3: Scheduling Result of Micro-operations to the Pipeline Stages.
(panels: PEAS R3K-5, PEAS R3K-4, PEAS R3K-3)
Table 6.8: Comparison of the Designs with Different Numbers of Pipeline Stages. 
# of stages | freq. (MHz) | # of gates (k gates) 
3           | 95.0        | 57.4 
4           | 121.1       | 60.4 
5           | 119.9       | 62.3 
(using Design Compiler, 0.5 μm CMOS library) 
The number of gates in Table 6.8 is approximately linear in the number of 
pipeline stages. The difference in clock frequency between the four-stage and 
five-stage processors is caused by differences in the decoder logic and in the 
automatically inserted selectors. 
Each modification for changing the number of pipeline stages took less than 
20 minutes. In the micro-operation level processor specification, changing the 
number of pipeline stages only requires rewriting the micro-operation descriptions. 
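The linearity claim can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original experiments: it fits a least-squares line to the three gate counts of Table 6.8 and prints the residuals. The helper name `least_squares_line` is hypothetical.

```python
# Gate counts from Table 6.8 (k gates), keyed by number of pipeline stages.
gates = {3: 57.4, 4: 60.4, 5: 62.3}

def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares_line(sorted(gates.items()))
# Deviation of each measured gate count from the fitted line.
residuals = {s: g - (slope * s + intercept) for s, g in gates.items()}

print(f"about {slope:.2f} k gates per added stage")
for stages, r in sorted(residuals.items()):
    print(f"{stages} stages: residual {r:+.2f} k gates")
```

With these three data points the fitted slope is about 2.45 k gates per stage, and every residual stays below 0.4 k gates, which is consistent with the "approximately linear" reading of the table.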
6.5.2 Clock Frequency Improvement 
For the improvement of clock frequency, there were two ideas for changing the 
scheduling of operations to the pipeline stages. 
(a) One was to move the branch stage to the next stage and divide the critical 
path into two stages: comparison by the ALU with zero-flag generation, and 
conditional program counter update. This modification increased the branch 
penalty. As a result, the number of execution cycles increased. 
(b) The other was to add a dedicated comparator to shorten the delay time of 
comparison and zero-flag generation. 
Table 6.9 shows the design modification results for (a) and (b). Because the 
comparison and branch operations were already divided into different stages in the 
design of PEAS R3K-3, the designs of R3K-3 (original) and R3K-3 (a) were the same. 
Table 6.9: Design Quality of Clock Frequency Improvement. 
# of   | original                 | (a)                      | (b) 
stages | freq. (MHz) | # of gates | freq. (MHz) | # of gates | freq. (MHz) | # of gates 
       |             | (k gates)  |             | (k gates)  |             | (k gates) 
3      | 95.0        | 57.4       | 95.0        | 57.4       | 100.7       | 57.3 
4      | 121.1       | 60.4       | 144.3       | 62.8       | 140.4       | 61.1 
5      | 119.9       | 62.3       | 141.2       | 64.5       | 131.8       | 62.2 
(using Design Compiler, 0.5 μm CMOS library) 
From the results shown in Table 6.9, it is confirmed that both modifications (a) 
and (b) improved the clock frequency. Dividing the branch stage and the comparison 
stage (a) made the clock frequency higher than adding a dedicated comparator (b). 
However, considering the increased branch penalty of processor (a), whether the 
execution time of processor (a) is less than that of (b) depends on the application 
program. If an application program includes many branch instructions that are 
taken frequently, the execution cycles of (a) become much larger than those of (b), 
and the execution time of (a) becomes larger than that of (b). 
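The dependence on the application can be made concrete with a small calculation. The sketch below assumes that execution time is simply cycles divided by clock frequency, and it uses the four-stage cycle counts and frequencies reported for the FIR filter program in Table 6.12; the function and variable names are hypothetical.

```python
def execution_time_us(cycles, freq_mhz):
    """Execution time in microseconds: cycles divided by clock frequency in MHz."""
    return cycles / freq_mhz

# Four-stage "original" model: (cycles per output, clock frequency in MHz),
# copied from Table 6.12.
design_a = (24063, 144.3)  # branch stage moved: more cycles, higher clock
design_b = (23932, 140.4)  # dedicated comparator: same cycles as the original

time_a = execution_time_us(*design_a)
time_b = execution_time_us(*design_b)

# (a) stays faster than (b) only while its cycle count is below this bound.
break_even_cycles_a = design_b[0] * design_a[1] / design_b[1]

print(f"(a): {time_a:.1f} us, (b): {time_b:.1f} us")
print(f"(a) wins while it needs fewer than {break_even_cycles_a:.0f} cycles")
```

For this program, (a) needs only 131 extra cycles and remains faster than (b); a program with more frequently taken branches would push the cycle count of (a) toward the break-even bound.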
On the other hand, the area of (a) increased because additional pipeline 
registers were required. The area of (b) also increased because of the additional 
comparator. 
The design modification time for (a) and (b) was only two or three minutes for 
each modification. For the design of (a), the micro-operation descriptions of the 
branch and jump instructions were modified by moving the branch operation to the 
next stage. For the design of (b), a resource declaration for the dedicated 
comparator was added. Moreover, the micro-operation descriptions of the branch and 
jump instructions were modified by replacing the comparison resource, the ALU, 
with the added comparator. 
6.6 Design Space Exploration for DSP Application 
An FIR filter is one of the applications in the digital signal processing area. In 
the second experiment, modules to calculate the following equation are designed as 
ASIPs: 
$$ y[n] = \sum_{i=0}^{M} a_i \times x[n-i] \qquad (6.1) $$ 
where a_i, x, and y are complex numbers. 
The specification of the filter module is as follows. The data size of an 
input/output value is 32 bits. It consists of two 16-bit parts. The higher 16 bits 
correspond to the real part of the complex number and the lower 16 bits correspond 
to the imaginary part. Both parts are in fixed-point representation. Input data are 
provided to the filter module at specified intervals. Output data must be produced 
before the next data input. The result of the calculation is rounded with 
round-to-nearest. 
An algorithm of this filter is shown in Fig. 6.4. This is a straightforward 
implementation of Equation (6.1). Variables ar and aj correspond to the real part 
and the imaginary part, respectively, of the coefficients a_i in Equation (6.1). 
Variables xr, xj, yr, and yj follow the same manner. 
A program of the filter was coded for the PEAS R3K processor in assembly language. 
The code size is 163 lines. 

    initialize ar and aj; 
    while (1) { 
        retrieve xr[0] and xj[0] from input; 
        yr = 0; 
        yj = 0; 
        for (i = M; i >= 0; i--) { 
            yr += ar[M - i] * xr[i] - aj[M - i] * xj[i]; 
            yj += ar[M - i] * xj[i] + aj[M - i] * xr[i]; 
            xr[i] = xr[i - 1]; 
            xj[i] = xj[i - 1]; 
        } 
        output yr and yj; 
    } 
Figure 6.4: Pseudo Code of an FIR Filter. 

6.6.1 Customization of PEAS R3K 
To improve performance, three types of new instructions are added. As another 
architectural design space exploration, the effect of changing the number of 
pipeline stages is examined. 
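As an illustration of Equation (6.1) and the split real/imaginary representation used in Fig. 6.4, the following Python sketch computes one output sample per input sample. It is a sketch under stated assumptions rather than the assembly program of the thesis: the function name `fir_step` is hypothetical, the coefficient a_i is paired with the i-th newest sample as in Equation (6.1), and plain integers are used without fixed-point rounding.

```python
def fir_step(ar, aj, xr, xj, in_r, in_j):
    """Shift one complex sample into the delay line and return one output.

    ar/aj hold the real/imaginary coefficient parts a_0..a_M;
    xr/xj hold the delay line, with index 0 the newest sample.
    """
    # Shift the delay line by one position and store the new sample at index 0.
    xr[1:] = xr[:-1]
    xj[1:] = xj[:-1]
    xr[0], xj[0] = in_r, in_j

    yr = yj = 0
    for i in range(len(ar)):
        # (ar + j*aj) * (xr + j*xj), accumulated part by part.
        yr += ar[i] * xr[i] - aj[i] * xj[i]
        yj += ar[i] * xj[i] + aj[i] * xr[i]
    return yr, yj


# Two-tap example: a_0 = 1, a_1 = 2 + 1j, inputs 1 + 1j and then 2.
ar, aj = [1, 2], [0, 1]
xr, xj = [0, 0], [0, 0]
first = fir_step(ar, aj, xr, xj, 1, 1)
second = fir_step(ar, aj, xr, xj, 2, 0)
print(first, second)
```

Each tap costs four multiplications and four additions, which is the work that a complex multiply-accumulate instruction collapses into a single operation.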
Adding New Instructions 
Complex MAC. Complex MAC type instructions consist of the complex MAC operation 
and related operations such as initialization of the complex MAC operation. The 
instruction 'cmult' performs multiplication, accumulation, and rounding. By 
introducing instructions related to the complex MAC operation, a drastic 
improvement in the execution cycles of the application is expected. 
To implement the Complex MAC type instructions, a complex MAC module was 
designed. A block diagram of the module is shown in Fig. 6.5. This MAC unit 
calculates complex multiplication and addition, in other words the real part and 
the imaginary part, at once. It also includes a round-to-nearest rounding function. 

[Block diagram; signals: a, b, start, load, fin, result] 
Figure 6.5: Block Diagram of a Complex MAC Unit. 

To add Complex MAC type instructions to PEAS R3K, instruction definitions 
and micro-operation descriptions were added by the designer. The micro-operation 
description of the 'cmult' instruction is shown in Fig. 6.6. This description 
specifies that the pipeline performs instruction fetch at stage1, decoding of the 
fetched instruction at stage2, and execution of the complex MAC operation with the 
complex MAC module 'CMAC0' at stage3. As shown in this example, multi-cycle 
operations do not need any supplemental description compared to single-cycle 
operations, since the proposed processor synthesis method detects multi-cycle 
operations and generates a controller that handles them. 
    stage1  IR := IMEM[PC]; PC.inc(); 
    stage2  DECODE(IR); 
            $rs := GPR.read0(rs); 
            $rt := GPR.read1(rt); 
    stage3  ($result, $flag) := CMAC0.mac($rs, $rt); 
    stage4 
    stage5 
Figure 6.6: Micro-operation Description of cmult (Complex MAC). 
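The arithmetic behind a complex MAC step with round-to-nearest can be sketched in software as follows. This is an illustrative model, not the hardware module of the thesis: the function names are hypothetical, the 16-bit parts are assumed to be signed Q15 fixed-point values (the fraction width is not stated in the text), and ties are assumed to round toward positive infinity.

```python
FRAC_BITS = 15  # assumed Q15 format; the fraction width is not given in the text

def unpack(word):
    """Split a 32-bit word into signed 16-bit (real, imaginary) parts."""
    def signed16(v):
        return v - 0x10000 if v & 0x8000 else v
    # Upper 16 bits: real part; lower 16 bits: imaginary part.
    return signed16((word >> 16) & 0xFFFF), signed16(word & 0xFFFF)

def cmac(acc, a_word, b_word):
    """Add the complex product a*b to the accumulator (kept at full precision)."""
    ar, aj = unpack(a_word)
    br, bj = unpack(b_word)
    acc_r, acc_j = acc
    return acc_r + ar * br - aj * bj, acc_j + ar * bj + aj * br

def round_nearest(value):
    """Drop FRAC_BITS fraction bits, rounding to nearest (ties toward +infinity)."""
    return (value + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


# (0.5 + 0.5j) * (0.5 - 0.5j) = 0.5 + 0j, expressed in Q15.
acc = cmac((0, 0), 0x40004000, 0x4000C000)
result = (round_nearest(acc[0]), round_nearest(acc[1]))
print(result)
```

Keeping the accumulator at full precision and rounding once at the end avoids accumulating a rounding error per tap, which is one plausible reason to place the rounding function inside the MAC unit.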
Modulo Addressing. Modulo addressing is an addressing mode for calculating 
addresses of queues. In the algorithm in Fig. 6.4, the buffer x for preceding 
inputs causes load/store overhead. With modulo addressing, when a load/store 
instruction is executed, the next address is also calculated by the same 
instruction. By introducing instructions related to modulo addressing, some 
improvement in the execution cycles of the application is expected. 
Since these instructions require no special resources, the designer only added 
instruction definitions and micro-operation descriptions to introduce these 
instructions. 
Loop. The loop instruction is a kind of branch instruction. It performs the 
decrement of a counter and the branch as a single instruction. Though the 
improvement in the number of execution cycles is at most one instruction per 
iteration, a relatively large improvement can be expected for iterations with a 
short basic block. 
To implement the loop instruction, the designer added instruction definitions and 
micro-operation descriptions for these instructions. 
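The address update performed by a modulo-addressing load can be sketched as follows. This is a minimal illustration with hypothetical names (`modulo_next`, `load_with_modulo_update`) and a Python list standing in for data memory; it is not the micro-operation description used in the thesis.

```python
def modulo_next(addr, step, base, length):
    """Next address inside the circular buffer [base, base + length)."""
    return base + (addr - base + step) % length

def load_with_modulo_update(memory, addr, step, base, length):
    """Return the loaded value together with the post-updated address."""
    return memory[addr], modulo_next(addr, step, base, length)


memory = list(range(100, 108))   # stand-in data memory, addresses 0..7
base, length = 2, 4              # circular buffer occupies addresses 2..5
addr = 4
visited = []
for _ in range(5):
    value, addr = load_with_modulo_update(memory, addr, 1, base, length)
    visited.append(value)
print(visited, addr)
```

Because the wrap-around is folded into the load itself, the sample buffer no longer has to be shifted element by element as in Fig. 6.4; only the start address moves.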
6.6.2 Pipeline Stage Tuning for Derivative Processors 
The number of pipeline stages of the derivative processors, to which the 
instructions described in the previous section were added, was varied from three to 
five. The micro-operation re-scheduling described in Section 6.5 was also applied 
to improve clock frequency. 
6.6.3 Results of Design Space Exploration for DSP Applications 
Results of logic synthesis for each modification are shown in this section. Design 
times for each modification are also shown. 
Results of Adding Instructions 
Five derivative processors have been designed. Let M denote the processor 
including complex MAC instructions, L the processor including Loop instructions, 
and A the processor including modulo addressing instructions. The results of logic 
synthesis, i.e., the number of gates and the maximal clock frequency, and the 
number of execution cycles for calculating a single output value for M = 128 are 
summarized in Table 6.10. 
Table 6.10: Design Quality for Each Processor 
processor | max frequency (MHz) | # of gates (k gates) | # of execution cycles 
original  | 119.9               | 80.0                 | 23932 
M         | 101.8               | 71.4                 | 3893 
L         | 104.4               | 64.7                 | 23805 
MA        | 100.0               | 92.0                 | 3507 
ML        | 102.7               | 73.6                 | 3766 
MAL       | 100.7               | 94.8                 | 3509 
(using Design Compiler, 0.5 μm CMOS library) 
M : including complex MAC instructions 
L : including Loop instructions 
A : including Modulo Addressing 
In Table 6.10, the number of execution cycles is drastically reduced by introducing 
CMAC type instructions. In this case, the maximal frequency of the processor 
decreases by approximately 30%. 
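Because cycle count and clock frequency move in opposite directions, the time per output sample is the more direct figure of merit. The sketch below derives it from the values of Table 6.10, assuming that time equals cycles divided by the maximal frequency; the names used are hypothetical.

```python
# (max frequency in MHz, execution cycles per output) from Table 6.10.
processors = {
    "original": (119.9, 23932),
    "M": (101.8, 3893),
    "L": (104.4, 23805),
    "MA": (100.0, 3507),
    "ML": (102.7, 3766),
    "MAL": (100.7, 3509),
}

def time_per_output_us(freq_mhz, cycles):
    """Time to produce one output sample, in microseconds."""
    return cycles / freq_mhz

times = {name: time_per_output_us(f, c) for name, (f, c) in processors.items()}
speedup = {name: times["original"] / t for name, t in times.items()}
fastest = min(times, key=times.get)

for name in processors:
    print(f"{name:8s} {times[name]:7.1f} us  speedup {speedup[name]:.2f}")
```

Under this assumption the MAL processor is the fastest, at roughly 5.7 times the speed of the original, while the L processor alone is slower in time than the original because its small cycle saving does not compensate for the lower clock frequency.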
Table 6.11: Design Time for Each Instruction Type 
instructions          | time (hour) 
original instructions | 8.5 
CMAC                  | 0.8 
Mod. Addr.            | 0.5 
Loop                  | 0.8 
The design time for each instruction type is shown in Table 6.11. The original 
PEAS R3K processor was designed in about eight hours. Adding new instructions 
required less than one hour for every type of instruction in this experiment. 
Furthermore, a processor with any combination of the already designed instructions 
can be easily synthesized by PEAS-III. 
Results of Pipeline Tuning 
Results of logic synthesis for each design are summarized in Table 6.12. The 
column "model" denotes the variation of instruction set addition shown in 
Table 6.10. The column "type" denotes the variation of clock frequency improvement 
shown in Section 6.5.2. 
As mentioned in Section 6.5, the number of gates in Table 6.12 is also 
approximately linear in the number of pipeline stages. The clock frequency of the 
three-stage derivatives is about 40% lower than that of the four-stage and 
five-stage derivatives. The clock frequencies of the four-stage and five-stage 
derivatives are almost the same. 
Table 6.12: Design Quality for Changing the Number of Pipeline Stages 
         |      | # of   | 3 stages          | 4 stages          | 5 stages 
model    | type | exec.  | freq.  | area     | freq.  | area     | freq.  | area 
         |      | cycles | (MHz)  | (k gates)| (MHz)  | (k gates)| (MHz)  | (k gates) 
original | orig | 23932  | 95.0   | 57.4     | 121.1  | 60.4     | 119.9  | 62.4 
         | a    | 24063  | 95.0   | 57.4     | 144.3  | 62.8     | 141.2  | 64.5 
         | b    | 23932  | 100.7  | 57.3     | 140.4  | 61.2     | 131.8  | 62.2 
ML       | orig | 3766   | 66.3   | 70.9     | 101.9  | 72.6     | 102.7  | 73.6 
         | a    | 3895   | 66.3   | 70.9     | 102.8  | 75.4     | 100.6  | 75.3 
         | b    | 3766   | 65.8   | 71.4     | 102.6  | 73.2     | 104.0  | 73.8 
MA       | orig | 3507   | 73.4   | 85.9     | 101.9  | 89.1     | 100.0  | 92.0 
         | a    | 3636   | 73.4   | 85.9     | 98.3   | 91.8     | 98.9   | 93.9 
         | b    | 3507   | 72.2   | 86.5     | 102.0  | 89.2     | 101.5  | 92.1 
MAL      | orig | 3509   | 65.3   | 89.3     | 98.0   | 92.4     | 100.7  | 94.8 
         | a    | 3638   | 65.3   | 89.3     | 104.8  | 94.2     | 102.7  | 96.3 
         | b    | 3509   | 65.1   | 89.3     | 103.8  | 92.7     | 101.2  | 93.9 
(using Design Compiler, 0.5 μm CMOS library) 
The relationship between area and execution time for the FIR filtering application 
is plotted in Fig. 6.7. The trade-off between area and execution time is plotted in 
Fig. 6.8. Under various design constraints, various architecture candidates can be 
selected in terms of the number of pipeline stages, the extension instruction set, 
and so on. 
The design time of the derivative processors in terms of pipeline tuning was within 
an hour per model. The total design time of pipeline tuning for the four models in 
Table 6.12 was four hours. The total design time of all derivatives, which is the 
sum of the design time of the new instructions and the design time of pipeline 
tuning, was about six hours. 
[Scatter plot; horizontal axis: Area (K gate); vertical axis: execution time] 
Figure 6.7: Area and Execution Time for all Derivatives. 

[Scatter plot; horizontal axis: Area (K gate); vertical axis: execution time; 
points are labeled by model, number of pipeline stages, and type, for example ML4b] 
Figure 6.8: Trade-off of Area and Execution Time. 
Chapter 7 
Discussion 
In this chapter, the effectiveness of the proposed processor design and synthesis 
method is discussed with the results of experiments. The effectiveness is discussed 
at the following points: 
• largeness of explored design space, 
• design and design exploration time, 
• and design quality. 
7.1 Design Space 
The proposed synthesis method supports a portion of the architectural 
characteristics shown in Chapter 3. The supported items are as follows: 
• hardware module configuration, 
• storage unit organization, 
• pipeline organization, which includes the number of pipeline stages and the 
assignment of micro-operations to the pipeline stages, 
• structural hazard detection and pipeline interlock control synthesis, 
• predict-not-taken based delayed branch, 
• and external interrupts. 
From the experimental results, the effectiveness of design space exploration with 
these architectural variations was shown. 
In the experiments, the design space was explored in terms of the following points: 
• hardware module configuration. Hardware configuration includes changing 
resource parameters and adding new resources. Changing resource parameters 
for the PEAS-I core is shown in Section 6.3. Addition of a new resource for 
clock frequency improvement is shown in Section 6.5.2. 
• instruction bit width. The instruction bit width of the embedded RISC 
controller was changed from 32 bits to 24 bits. 
• the number of pipeline stages. The number of pipeline stages of the PEAS R3K 
processor and of the derivative processors for DSP applications was changed. 
This is shown in Section 6.5 and Section 6.6.2. 
• operation scheduling to the pipeline stages. Changing the stage of the branch 
operation for PEAS R3K and the DSP derivative processors is shown in 
Section 6.5 and Section 6.6.2. 
• organization of storage units. Complex addressing modes for special registers 
and multiple register files were designed. Complex addressing modes are 
shown in Section 6.4. 
Furthermore, the proposed synthesis method has the potential to design complex 
memory architectures, such as memory-memory architectures, non-Harvard 
architectures, multiple-port memories, and so on. Synthesis of structural hazard 
detection and pipeline interlock logic makes it possible to design such processors. 
The proposed synthesis method can deal with a much larger design space than 
existing systems based on prepared processors. The design space is enlarged in 
terms of instruction bit width, user-defined pipeline organization (the number of 
pipeline stages, the number of delayed branch slots, the role of each pipeline 
stage, and multi-cycle operations), storage unit organization, and user-defined 
external interrupts. 
For further expansion of the design space, extensions for out-of-order instruction 
issue and out-of-order completion, VLIW architectures, and internal exceptions 
are required. When the target processor uses a functional unit that has a long 
latency to calculate its result, a pipeline organization with out-of-order 
completion is effective. A processor with out-of-order completion can execute 
succeeding instructions while executing instructions that have a long latency. For 
applications that require high performance, processors with out-of-order 
instruction issue and VLIW processors are suitable because they execute multiple 
instructions at the same time. 
An extension of the branch mechanism is also required. The synthesis method cannot 
deal with the non-overhead loop instructions that are popular in DSP applications, 
because the branch architecture is fixed to predict-not-taken based delayed branch. 
7.2 Design Time and Design Space Exploration Time 
7.2.1 Design Time for New Processors 
With a processor specification at a higher abstraction level than the RT level, the 
design time of ASIPs is drastically reduced. A processor description at a higher 
abstraction level contributes to the ease of the design. 
The experiments showed the reduction of the design time. The design time for an 
ASIP with the proposed method was about three to seven times shorter than that of 
conventional RT level design. For comparison, the processor description language 
AIDL [24] needs 37 hours to design 23 instructions of the PA-RISC processor. AIDL 
includes complex specification descriptions for complicated processors. From these 
results, the micro-operation level processor description is effective for 
shortening the design time of straightforward pipelined processors. 
7.2.2 Design Time for Derivative Processors 
The proposed micro-operation level processor specification also reduces design 
exploration time compared with an RT level processor specification. Synthesis of 
the datapath structure and the controller reduces their design and design 
modification time and enables the designer to change the architecture in a short 
time. 
The experiments showed the turnaround time for derivative processor designs. The 
derivative processor designs include the following modifications: 
• changing resource parameters. The designer changes parameters in order to 
evaluate various hardware modules that have the same functionality and 
different design quality in terms of area, clock frequency, and execution 
cycles. This modification needs only a few seconds per parameter. 
• addition of application-specific user-defined instructions. The designer 
defines the instruction format and writes the micro-operation description of 
the new instructions. This modification takes ten to fifteen minutes per 
instruction for the DSP instructions shown in Section 6.6. 
• addition of new resources. The designer declares additional resources to 
improve the performance of the design. This takes only a few minutes. 
• changing the number of pipeline stages and changing operation scheduling 
to pipeline stages. The number of pipeline stages is decreased to reduce the 
area. On the other hand, the number of pipeline stages is increased to reduce 
the delay time of the critical path. The designer also changes the stage of 
micro-operations to reduce the delay time of the critical path. Changing the 
number of pipeline stages and the operation scheduling requires re-scheduling 
of the micro-operations to the pipeline stages. This change takes less than a 
minute per instruction. In the experiments, pipeline tuning took 20 minutes 
for the PEAS R3K processor, which has 52 instructions. 
A large design space has been successfully explored, and the trade-off of the 
design was found. The designs of 12 derivatives were tried in a day. 
Though the design and design modification time is very short, the evaluation and 
validation time for the designed processor makes the turnaround time long. 
Effective and rapid estimation and critical path analysis for synthesized 
processors are required. 
For more efficient support of design space exploration, optimization of resource 
selection, instruction format decision, the number of pipeline stages, and 
micro-operation assignment to pipeline stages is required. 
7.3 Design Quality 
Comparing the design quality of synthesized processors and manually designed 
processors, their clock frequencies are almost the same. The area of the 
synthesized processors is about 20% larger than that of the manually designed 
processors. Though the area is inferior to manual design, the advantage of 
effective design space exploration has an impact on the total design quality, so 
the disadvantage in area does not matter much. 
To improve the design quality of synthesized processors, optimization of selector 
and pipeline register insertion is required. To reduce the number of pipeline 
registers, pipeline register sharing could be effective. However, sharing registers 
needs additional selectors. On the other hand, when the critical path includes 
automatically inserted selectors, moving a selector to the previous or the next 
pipeline stage, if possible, shortens the critical path. Pipeline register sharing 
and optimization of the selector insertion stage, based on rapid and accurate RT 
level estimation, would improve the design quality. 
Chapter 8 
Conclusions and Future Work 
In this thesis a micro-operation level processor specification and processor synthesis 
method is proposed for the architectural design space exploration of ASIPs. 
8.1 Conclusion 
In this thesis, a micro-operation level processor specification for architectural 
design space exploration has been discussed. The specification includes a 
parameterized pipeline structure in terms of the number of pipeline stages and the 
roles of the pipeline stages. Furthermore, the designer does not need to describe 
complex mechanisms such as pipeline control logic, or to design the datapath 
structure. The ease of design and design modification and the flexibility of the 
processor enable architectural exploration of a large design space in a short 
design time. 
For processor synthesis, a processor model has been examined. To deal with 
flexibility in the pipeline depth of the target processor, the datapath and the 
controller are divided into pipeline stages. The sequence of datapath and 
controller models of the pipeline stages forms the pipelined processor. The 
organization of each pipeline stage controller, the instruction decoder, and the 
external interrupt controller is discussed. The pipeline hazard detection and 
control mechanism, which includes pipeline interlock and pipeline flush, is 
formalized. The pipeline control model and the pipeline hazard detection mechanism 
are used to synthesize pipeline control logic from micro-operation level processor 
specifications. 
A processor synthesis method from the micro-operation level processor 
specification is proposed. Each part of the datapath synthesis, such as data flow 
graph generation, signal conflict resolution, and pipeline register insertion, is 
described. Instruction decoder synthesis, pipeline control logic synthesis, and 
interrupt controller synthesis are also described. The synthesis method supports 
user-defined pipeline organization in terms of the number of pipeline stages and 
the number of delayed branch slots, multi-cycle operations, structural hazard 
resolution, and external interrupts. This wide support for pipelined processors 
enables exploration of a large design space for ASIPs. 
Through the five kinds of experiments, the effectiveness of the micro-operation 
level processor design is confirmed. The design time and the architectural design 
space exploration time are reduced while keeping flexibility in the pipeline 
organization, hardware configuration, and so on. A large design space has been 
successfully explored. The trade-off of the design was found by trying 12 
derivative processors, and the designs of these 12 derivatives were tried in a day. 
In this thesis, a micro-operation level processor specification and a processor 
synthesis method for architectural design space exploration for ASIPs are proposed. 
It is confirmed that, using the proposed method, large design spaces are easily 
explored in terms of the number of pipeline stages and delayed branch slots, 
hardware module configuration, user-defined instructions, interface ports and 
external interrupts, and operation scheduling to pipeline stages. 
8.2 Future Work 
Future work for further architectural design space exploration includes the 
expansion of the design space and the reduction of design exploration time. 
Improvement of the design quality of synthesized processors is also a future goal. 
The following sections describe future directions for each of these items. 
8.2.1 Design Space Expansion 
To enlarge the design space, support for the following characteristics described in 
Section 3.1 is required: 
• branch prediction mechanism and non-overhead loop , 
• out-of-order completion, 
• out-of-order instruction issue, 
• and other interrupt and exception. 
Instruction fetch module synthesis is required to extend the branch mechanism, 
because branch control is closely related to instruction fetch. Parameterization of 
the instruction fetch module and the branch architecture, and consideration of 
their synthesis method, would enable the system to deal with various branch 
architectures. The parameters are expected to include instruction bit width, 
increment step, the choice of predict-taken, predict-not-taken, or dynamic branch 
prediction with branch-prediction buffers or branch-target buffers, and the 
parameters of the buffers. 
Extending the processor model to multiple pipeline sequences enables synthesis of 
super-scalar and VLIW type processors. However, this extension makes the hazard 
detection and resolution algorithm more complex. For out-of-order instruction 
issue, reservation station synthesis is also required. 
To support precise exceptions, restrictions on the instruction specification and 
exception handling should be discussed. An extension for instruction flush and a 
mechanism for restoring the processor status on exceptions are also required. 
8.2.2 Design Exploration Time Reduction 
For further reduction of the design and design modification time, optimization 
of the micro-operation level processor specification is required. The targets of 
the optimization include instruction format assignment, resource parameter 
selection, the number of pipeline stages, and micro-operation assignment to 
pipeline stages. 
The critical path of each pipeline stage can be reduced by changing the pipeline 
stage assignment of micro-operations and the hardware module parameters. 
Data flow graph generation from the micro-operation description and ASAP (as soon 
as possible) based scheduling under design constraints enable optimization of the 
micro-operation assignment to the pipeline stages. At the same time, the parameter 
selection of resources should be done, because the design qualities of the 
resources affect the delay time of the critical path. For the scheduling and 
resource parameter selection, fast and accurate estimation from the micro-operation 
level processor specification is also required. 
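The idea of ASAP based scheduling under a delay constraint can be sketched as follows. This is a hypothetical illustration of the general technique, not an algorithm given in the thesis: each operation is placed in the earliest pipeline stage in which it fits within an assumed clock period, the operation names and delays are invented for the example, and multi-cycle operations are not handled.

```python
def asap_stages(ops, deps, delay, clock_period):
    """Assign each operation to the earliest pipeline stage (1-based).

    ops must be listed in topological order; deps maps an operation to the
    operations whose results it consumes. Operations chained within one stage
    must fit into clock_period; otherwise the consumer moves to the next stage.
    """
    stage, finish = {}, {}
    for op in ops:
        if delay[op] > clock_period:
            raise ValueError(f"{op} does not fit into one clock period")
        s, start = 1, 0.0
        for p in deps.get(op, []):
            if stage[p] > s:
                # A later-stage producer fixes both the stage and the start time.
                s, start = stage[p], finish[p]
            elif stage[p] == s:
                start = max(start, finish[p])
        if start + delay[op] > clock_period:
            s, start = s + 1, 0.0
        stage[op], finish[op] = s, start + delay[op]
    return stage


# Invented delays (ns) for a branch-like chain of micro-operations.
ops = ["fetch", "decode", "read", "alu", "branch"]
deps = {"decode": ["fetch"], "read": ["decode"], "alu": ["read"], "branch": ["alu"]}
delay = {"fetch": 8, "decode": 4, "read": 4, "alu": 6, "branch": 3}

relaxed = asap_stages(ops, deps, delay, clock_period=10)
tight = asap_stages(ops, deps, delay, clock_period=8)
print(relaxed)
print(tight)
```

With the tighter clock period the branch operation no longer fits behind the ALU in the same stage and moves to the next one, which mirrors modification (a) of Section 6.5.2: a shorter critical path is bought with one more stage before the branch resolves.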
8.2.3 Improvement of the Design Quality 
The design quality of the synthesized processor is slightly inferior to that of a 
manual design using an RT level HDL description. To improve the design quality, 
optimization of selector and pipeline register insertion that takes into account 
the trade-off between clock frequency and hardware cost is required, as discussed 
in Section 7.3. 
Bibliography 
[1] Electronic Industries Association of Japan (EIAJ), EDA Technology Roadmap 
Toward 2002 - Cyber-Giga-Chip Design Technology, Apr. 1998. 
[2] International SEMATECH, International Technology Roadmap for Semiconductors 
1999 Edition, 1999. 
[3] J. Staunstrup and W. Wolf, eds., Hardware/Software Co-Design: Principles 
and Practice. Kluwer Academic Publishers, 1997. 
[4] G. Kane, MIPS RISC Architecture. New Jersey: Prentice-Hall, Inc., 1988. 
[5] Hitachi Ltd., SuperH™ RISC Engine SH7020 and SH7021 Hardware Manual, 
rev. 3.0 ed., Nov. 1999. 
[6] Hitachi Ltd., SuperH™ RISC Engine SH7604 Hardware Manual, rev. 3.0 ed., 
Nov. 1999. 
[7] S. Furber, ARM System Architecture. Addison Wesley Longman, 1996. 
[8] Advanced RISC Machines Ltd., ARM7TDMI Data Sheet, Aug. 1995. 
[9] S. B. Furber, VLSI RISC Architecture and Organization. Marcel Dekker Inc., 
1989. 
[10] J. Sato, A. Y. Alomary, Y. Honma, T. Nakata, A. Shiomi, N. Hikichi, and 
M. Imai, "PEAS-I: A Hardware/Software Codesign System for ASIP Development," 
IEICE Transactions on Fundamentals of Electronics, Communications 
and Computer Sciences, vol. E77-A, pp. 483-491, Mar. 1994. 
[11] B. Shackleford, M. Yasuda, E. Okushi, H. Koizumi, H. Tomiyama, and 
H. Yasuura, "The Satsuki Integrated Processor Synthesis and Compiler Generation 
System," in Proc. of the Synthesis and System Integration of Mixed Technologies 
(SASIMI '96), (Fukuoka, Japan), pp. 135-142, Nov. 1996. 
[12] M. Small, An Intro to The First User Definable Processor. ARC Cores Ltd., 
Nov. 1998. 
[13] M. R. Barbacci and D. P. Siewiorek, The Design and Analysis of Instruction 
Set Processors. McGraw-Hill, 1982. 
[14] P. G. Paulin, C. Liem, T. C. May, and S. Sutarwala, Code Generation for 
Embedded Processors, ch. 4. Kluwer Academic Publishers, 1995. 
[15] Tensilica, Inc., Application Specific Microprocessor Solutions: Data Sheet for 
Xtensa V1, 1998. 
[16] Hewlett Packard Laboratories Compiler and Architecture Group, New York 
University ReaCT-ILP Group, University of Illinois IMPACT Group, Trimaran 
User Manual - An Infrastructure for Compiler Research in Instruction Level 
Parallelism -, 1998. 
[17] R. Camposano and J. Wilberg, "Embedded System Design," Design Automation 
for Embedded Systems, vol. 1, Jan. 1996. 
[18] J.-H. Yang, B.-W. Kim, S.-J. Nam, Y.-S. Kwon, D.-K. Lee, J.-Y. Lee, C.-S. 
Hwang, Y.-H. Lee, and C.-M. Kyung, "MetaCore: An Application Specific 
Programmable DSP Development System," IEEE Transactions on VLSI Systems, 
vol. 8, pp. 174-183, Apr. 2000. 
[19] G. Hadjiyiannis, S. Hanono, and S. Devadas, "ISDL: An instruction set 
description language for retargetability," in Proc. of 34th Design Automation 
Conference (DAC '97), June 1997. 
[20] A. Halambi, P. Grun, V. Ganesh, A. Khare, N. Dutt, and A. Nicolau, 
"EXPRESSION: A Language for Architecture Exploration through 
Compiler/Simulator Retargetability," in Proc. of Design Automation & Test in 
Europe (DATE '99), (Munich, Germany), pp. 485-490, Mar. 1999. 
[21] S. Pees, A. Hoffmann, V. Zivojnovic, and H. Meyr, "LISA - Machine Description 
Language for Cycle-Accurate Models of Programmable DSP Architectures," in 
Proc. of 36th Design Automation Conference (DAC '99), (New Orleans), June 
1999. 
[22] P. Marwedel, "The MIMOLA Design System: Tools for the Design of Digital 
Processors," in Proc. of the 21st Design Automation Conference (DAC '84), 
pp. 587-593, 1984. 
[23] A. Fauth, J. V. Praet, and M. Freericks, "Describing Instruction Sets Using 
nML (Extended Version)," tech. rep., Technische Universität Berlin and IMEC, 
Berlin (Germany)/Leuven (Belgium). 
[24] T. Morimoto, K. Saito, H. Nakamura, T. Boku, and K. Nakazawa, "Advanced 
Processor Design Using Hardware Description Language AIDL," in 
Proc. of Asia and South Pacific Design Automation Conference (ASP-DAC 
'97), (Chiba, Japan), pp. 387-390, Jan. 1997. 
[25] M. Hamabe, A. Nose, N. Togawa, M. Yanagisawa, and T. Ohtsuki, "A Generation 
System for Hardware Description of Pipelined Processors," in Tech. Report 
of IEICE, VLD97-11, pp. 33-40, 1997. (in Japanese). 
[26] J. C. Gyllenhaal, W. W. Hwu, and B. R. Rau, "HMDES Version 2 Specification," 
Tech. Rep. IMPACT-96-03, University of Illinois, Urbana IL., 1996. 
IMPACT Technical report. 
[27] V. Kathail, M. Schlansker, and B. Rau, "HPL PlayDoh Architecture 
Specification: Version 1.0," Tech. Rep. HPL-93-80, HP Laboratories, 1994. 
[28] G. Hadjiyiannis, ISDL: Instruction Set Description Language, User's Manual, 
1998. 
[29] N. Savoi, A. Halambi, P. Grun, N. Dutt, and A. Nicolau, "VSAT: A Visual 
Specification and Analysis Tool for System-On-Chip Exploration," in Proc. of 
25th EUROMICRO Conference, Sept. 1999. 
[30] Institute for Integrated Signal Processing Systems, ISS-RWTH Aachen, LISA 
User's Guide (Version 2.0), Oct. 2000. 
[31] V. Zivojnovic, S. Pees, C. Schlager, M. Willems, R. Schoenen, and H. Meyr, 
"LISA - Machine Description Language and Generic Machine Model," in 
International Conference on Signal Processing Applications and Technology 
(ICSPAT), (Boston), Oct. 1996. 
[32] M. Freericks, "The nML machine description formalism," tech. rep., Universität 
Berlin, Fachbereich Informatik, Berlin, 1991. 
[33] D. Lanneer, J. V. Praet, A. Kifli, K. Schoofs, W. Geurts, F. Thoen, and 
G. Goossens, "CHESS: Retargetable Code Generation for Embedded DSP 
Processors," in Code Generation for Embedded Processors, pp. 85-102, Kluwer 
Academic Publishers, 1995. 
[34] K. Kataoka, A. Shiomi, M. Imai, Y. Aoyama, J. Sato, and N. Hikichi, 
"Observations on the Implementation of a Codesign Workbench PEAS-III for ASIP 
Design - Classification and Parameterization of CPU Architectures -," in IPSJ 
Technical Report, DA 78-20, pp. 121-126, Dec. 1995. (in Japanese). 
[35] 
M. Itoh, S. Higaki, J. Sato, A. Shiomi, Y. Takeuchi, A. Kitajima, and M. Ima白1 ，
“"PEAS-III: An AS釘IP Design Env吋ironmer瓜l式t ，" in IEEE International Co]erence 
on Co仰uterDesign: VLSI in Computers & Processors ρCCD 2000), (Austin) , 
pp. 43(}-436 う Sept. 2000. 
[36] D. L.Perry, VHDL. McGraw-Hill, Inc ・ ， second ed. , 1994. 
[37] T. Morifuji, Y. Take吋lÌ ， J. Sato, and M. Imai,“Flexible Hardware Model: 
Implementation and E百éctiveness，" in Proc. of the Synthesis αnd System 1 nteｭ
grlαtioη Mixed Techinologies (SASIMI '9η ， pp. 83-89, Dec. 1997. 
110 
[38J M. Imai, A. Shiomi, Y. Takeuchi, and J. Sato, "HardwarejSoftware Codesign 
in the Deep Submicron Era," in Proc. 0] Internαtionαl Workshop 0η Logic αηd 
Architecb川 Synthesis ρWLAS '96), pp. 236- 248, Dec. 1996 
[39J D. A. Patterson and J. 1. Hennessy, Computer Orgαnizatioη & Design: The 
hαァdωαre/Softωαre Interjαce . Morgan Kaufmann Publishers, Inc ・ ぅ 1994. 
[40J S. Kowatari , H. Iwashita, T. Nakataヲ and F. Hiro民“Synthesizable Design 
Generation for Pipeline Control1ers ," in Tech. Report 0] IEICE, VLD94-41 , 
pp. 17-24, July 1994. (in japanese). 
[41J D. Gajski, N. Dutt , A. Wu, and S. Lin, High-level synthesぽ introducr的on to 
chip αnd system design. Kluwer Academic Publishers, 1992. 
[42J C. Price, MIPS IV Instruction Set Revis 
[μ4必叫3司J J. 1. Henne白ssy and D. A. Pa抗tt印erson ， Comput柁er Ar陀chiばt柁ectur，陀e: A Q 切Tη吋山tit“'tv
Approαch. California: Morgan Kaufmann Publishers, Inc. , 2nd ed ・， 1990. 
111 
Appendix A
Grammar of Micro-operation Level Processor Specification
A.1 Organization of Micro-operation Level Specification
Micro-operation level specification is a text file divided into eight parts:
1. Version 
2. Design Goal and Architecture Parameter 
3. Interface Definition 
4. Instruction Type Definition 
5. Instruction Definition 
6. Resource Declaration 
7. Interrupt Definition 
8. Micro-operation Description 
General grammar of micro-operation level specification is as follows:
KEY := [a-zA-Z][a-zA-Z0-9]*
STRING := "([^"]|'\"')*"
<design> := <item>
<item> := <keyword> ['{' <item> '}'] {',' <item>}
<keyword> := (KEY | STRING)
"KEY" includes "clk(n)" as an exception, where n takes an integer value.
Each part has its own syntax and keywords. Each part begins with the keywords: Version,
Port_declaration, Instruction_type, Instruction, Resource, Exception, and
MOT. Keywords and syntax for each part are described in the following sections.
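The general grammar above is small enough to be read by a recursive-descent parser. The following Python sketch illustrates one possible reading of it and is not part of PEAS-III. It assumes that underscores are accepted inside KEY (keywords such as Port_declaration need them, although the printed pattern omits them), and the names used in the usage line ("PC", "register", "1.0") are hypothetical.

```python
import re

# One token: the clk(n) exception, a KEY, a quoted STRING, or punctuation.
# Accepting '_' inside KEY is an assumption; the printed pattern omits it.
TOKEN = re.compile(r'\s*(clk\(\d+\)|[A-Za-z][A-Za-z0-9_]*|"(?:[^"\\]|\\")*"|[{},])')
PUNCTUATION = {"{", "}", ","}


def tokenize(text):
    """Split a specification text into keywords, strings and punctuation."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_items(tokens, i=0):
    """<item> := <keyword> ['{' <item> '}'] {',' <item>}.

    Returns a list of (keyword, children) pairs and the next token index.
    """
    nodes = []
    while True:
        if i >= len(tokens) or tokens[i] in PUNCTUATION:
            raise ValueError("keyword expected")
        keyword = tokens[i].strip('"')
        i += 1
        children = []
        if i < len(tokens) and tokens[i] == "{":
            children, i = parse_items(tokens, i + 1)
            if i >= len(tokens) or tokens[i] != "}":
                raise ValueError("'}' expected")
            i += 1
        nodes.append((keyword, children))
        if i < len(tokens) and tokens[i] == ",":
            i += 1  # a comma introduces the next sibling item
            continue
        return nodes, i


def parse_design(text):
    """<design> := <item>; the whole input must be consumed."""
    tokens = tokenize(text)
    nodes, i = parse_items(tokens)
    if i != len(tokens):
        raise ValueError("trailing input")
    return nodes


tree = parse_design('Design{Version{"1.0"},Resource{"PC"{class{"register"}}}}')
```

Because the general grammar requires an item inside every pair of braces, this sketch rejects an empty block; a part whose own grammar allows an empty list would need a relaxed rule.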
<Specification> := 'Design' '{'
    <Version> ','
    <Architecture_Parameter> ','
    <Interface_Declaration> ','
    <Inst_Type_Definition> ','
    <Instruction_Definition> ','
    <Resource_Declaration> ','
    <Interrupt_Definition> ','
    <Micro_Op_Description>
'}'
A.2 Architecture Parameter
In the architecture parameter part, design goal and architecture parameters are specified.
<Architecture_Parameter> := 'Abstract_level_architecture' '{'
    <Fhm_Work_Name> ','
    <Design_Goal> ','
    <Processor_Type> ','
    <Pipeline_Arch_Parameter>
'}'
<Fhm_Work_Name> := 'Fhm_workname' '{' STRING '}'
<Processor_Type> := 'CPU_type' '{'
    ('"Non-pipeline"' | '"Pipeline"' | '"VLIW"')
'}'
STRING := "([^"]|'\"')*"
Architecture parameter includes login name of FHM-DB, design goal, processor type
and architecture parameters for pipeline processor. Candidates of processor type are
"Non-pipeline", "Pipeline" and "VLIW". Processor types "Non-pipeline" and "VLIW"
are prepared for future extension. Currently, "Pipeline" architecture is supported.
A.2.1 Design Goal
<Design_Constraint> := 'construction' '{'
    <Goal_Frequency> ','
    <Goal_Area> ','
    <Goal_Delay> ','
    <Goal_Power_Static> ','
    <Goal_Power_Dynamic>
'}'
<Goal_Frequency> := 'Goal_frequency' '{' VALUE '}'
<Goal_Area> := 'Goal_area' '{' VALUE '}'
<Goal_Delay> := 'Goal_delay' '{' VALUE '}'
<Goal_Power_Static> := 'Goal_power_S' '{' VALUE '}'
<Goal_Power_Dynamic> := 'Goal_power_D' '{' VALUE '}'
VALUE := "[0-9]*[\.[0-9]*]"
Design goals are given for clock frequency, chip area, maximum delay, static power consumption
and dynamic power consumption. Their values are represented in decimal or integer.
A.2.2 Pipeline Processor Parameters
Architecture parameters for pipeline processor include the number of stages, the number
of common stages, the number of phases per stages, synthesis of pipeline interlock logic,
data bypassing logic, and branch control logic.
<Pipeline_Arch_Parameter> := 'Pipeline_architecture' '{'
    <Number_of_Stages>
    <Number_of_Common_Stages>
    <Number_of_Phases_Per_Stage>
    <Multi_Cycle_Interlock>
    <Data_Hazard_Interlock>
    <Register_Bypass>
    <Memory_Bypass>
    <Delayed_Branch>
    <Delayed_Branch_Slot>
'}'
<Number_of_Stages> := 'Number_of_stages' '{' INT '}'
<Number_of_Common_Stages> := 'Number_of_common_stages' '{' INT '}'
<Number_of_Phases_Per_Stage> := 'Number_of_phases_per_stages' '{' INT '}'
<Multi_Cycle_Interlock> := 'Multi_cycle_interlock' '{' USAGE '}'
<Data_Hazard_Interlock> := 'Data_hazard_interlock' '{' USAGE '}'
<Register_Bypass> := 'Register_bypass' '{' USAGE '}'
<Memory_Bypass> := 'Memory_bypass' '{' USAGE '}'
<Delayed_Branch> := 'Delayed_branch' '{' YESNO '}'
<Delayed_Branch_Slot> := 'Number_of_exec_delayed_Slot' '{'
    'number' '{' INT '}'
'}'
INT := "[0-9]+"
USAGE := ( "Use" | "Unuse" )
YESNO := ( "Yes" | "No" )
The number of pipeline stages takes an integer value. The parameter value of delayed
branch indicates whether the processor adopts delayed branch architecture or not. If the
parameter value of delayed branch is "Yes", the number of delayed branch slots should be
specified. The number of delayed branch slots must be less than the number of the branch
stage.
The following parameters are prepared for future extension: the number of common
stages, the number of phases per stages, synthesis of pipeline interlock logic and bypassing
logic. Pipeline interlock logic for the resolution of structural hazards is automatically
synthesized regardless of the parameter value.
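The rules stated for the pipeline processor parameters of A.2.2 (integer stage count, "Use"/"Unuse" and "Yes"/"No" values, and the limit on delayed branch slots) can be checked mechanically. The following Python sketch assumes that the parameter values have already been read into a dictionary keyed by the grammar keywords, and that `branch_stage` is the number of the stage in which branches are resolved; both are assumptions of this illustration, not features of PEAS-III.

```python
USAGE_KEYS = ("Multi_cycle_interlock", "Data_hazard_interlock",
              "Register_bypass", "Memory_bypass")


def check_pipeline_parameters(params, branch_stage):
    """Return a list of violated rules; an empty list means the values are consistent."""
    errors = []
    stages = params.get("Number_of_stages")
    if not isinstance(stages, int) or stages < 1:
        errors.append("Number_of_stages must be a positive integer")
    for key in USAGE_KEYS:
        if params.get(key) not in ("Use", "Unuse"):
            errors.append(f"{key} must be Use or Unuse")
    delayed = params.get("Delayed_branch")
    if delayed not in ("Yes", "No"):
        errors.append("Delayed_branch must be Yes or No")
    elif delayed == "Yes":
        # A slot count is required only when delayed branch is adopted.
        slots = params.get("Number_of_exec_delayed_Slot")
        if slots is None:
            errors.append("delayed branch slot count is required")
        elif slots >= branch_stage:
            errors.append("delayed branch slot count must be less than the branch stage")
    return errors


example = {
    "Number_of_stages": 5,
    "Multi_cycle_interlock": "Use",
    "Data_hazard_interlock": "Use",
    "Register_bypass": "Unuse",
    "Memory_bypass": "Unuse",
    "Delayed_branch": "Yes",
    "Number_of_exec_delayed_Slot": 1,
}
problems = check_pipeline_parameters(example, branch_stage=2)
```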
A.3 Interface Definition
In the interface definition part, entity name, and input and output ports of target processor
are defined.
<Interface_Definition> := <Entity_Name> ',' <Port_Declarations>
<Entity_Name> := 'Entity_name' '{' STRING '}'
<Port_Declarations> := 'Port' '{'
    <Port_Declaration> {',' <Port_Declaration>}
'}'
<Port_Declaration> := <Port_Name> '{'
    <Port_Direction> ','
    <Port_Type> ','
    <Port_Attribute> ','
'}'
<Port_Direction> := 'direction' '{' <inout_direction> '}'
<inout_direction> := ('"in"' | '"out"' | '"inout"')
<Port_Type> := 'signal_type' '{' STRING '}'
<Port_Attribute> := 'signal_attribute' '{' STRING '}'
STRING := "([^"]|'\"')*"
Interface definition part consists of two parts, entity name definition and interface port
declarations. For each interface port, port name, direction, type and attribute are defined.
Port direction is selected among "in", "out" and "inout". Port type is specified
in VHDL standard logic style. If the bit width of the port is one, the port type becomes
"std_logic". If the bit width of it is more than one, the port type becomes standard logic
vector type.
A.4 Instruction Format Definition
Instruction format definition consists of two parts, instruction type definition and
instruction definition.
A.4.1 Instruction Type Definition
Instruction type definition part consists of a list of instruction type definitions.
In the instruction type definition part, bit fields, field type, field name, and binary
value of it are defined for each instruction type.
<Inst_Type_Definition> := 'Instruction_type' '{'
    'sub_field_name' '{'
        'NO_VLIW' '{' <Field_Width> ','
            'type' '{'
                <Inst_Type> {',' <Inst_Type>}
            '}'
        '}'
    '}'
'}'
<Field_Width> := 'width' '{' INT ',' INT '}'
INT := "[0-9]+"
<Inst_Type> := <Inst_Type_Name> '{' <Field> {',' <Field>} '}'
<Inst_Type_Name> := STRING
<Field> := <Field_Type> '{'
    <Field_Value> ','
    <Field_Width>
'}'
<Field_Type> := ('"OP-code"' | '"Operand"' | '"Reserved"')
<Field_Value> := <Field_Contents_Type> '{' <Field_Value> '}'
<Field_Contents_Type> := ('"name"' | '"binary"')
<Field_Value> := (STRING | BINARY)
BINARY := "[01]+"
STRING := "([^"]|'\"')*"
For each instruction type, instruction type name is defined. Then, instruction fields
of the instruction type are defined. For each instruction field, field type and contents of the
field are defined. Field type is selected among "OP-code", "Operand" and "Reserved".
"OP-code" means operation code and "Reserved" indicates that the field is reserved for
extension in the future.
If the type of instruction field is "OP-code" and the operation code for that field is
common to all the instructions belonging to the instruction type, the field contents type
becomes "binary" and field value is specified in binary representation. If the operation
code is variable for the instructions that belong to the instruction type, the field contents
type becomes "name" and field name of it is defined. The operation code of the field
is specified in instruction definition part for each instruction. For the "Operand" field,
field contents type becomes "name" and field name is defined. For the "Reserved" field,
field contents type becomes "binary" and field value is specified.
A.4.2 Instruction Definition
For each instruction, instruction type is selected among defined instruction types and
operation code value is decided.
<Instruction_Definitions> := 'Instruction' '{'
    'NO_VLIW' '{'
        <Instruction> {',' <Instruction>}
    '}'
'}'
<Instruction> := <Instruction_Name> '{'
    <Field> {',' <Field>}
'}'
<Instruction_Name> := STRING
STRING := "([^"]|'\"')*"
Operation code for each instruction is assigned. The syntax of instruction definition
is common to instruction type definition part.
A.5 Resource Declaration
In the resource declaration part, hardware modules are selected with appropriate
parameters from parameterized hardware library FHM-DB.
<Resource_Declarations> := 'Resource' '{'
    <Resource> {',' <Resource>}
'}'
<Resource> := <Resource_Name> '{'
    <FHM_Model_Name> ','
    <Class_Path> ','
    <Parameters>
'}'
<Resource_Name> := STRING
<FHM_Model_Name> := 'class' '{' STRING '}'
<Class_Path> := 'classpath' '{' STRING '}'
STRING := "([^"]|'\"')*"
Resource declaration part consists of a list of resources. For each resource, resource
name, FHM model name and its parameter values are specified. The class path field is
prepared for future extension.
<Parameters> := 'parameter' '{'
    <Abstraction_Level>
    {',' <FHM_Parameter>}
'}'
<Abstraction_Level> := 'abstraction_level' '{'
    'for_simulation' '{' <Level> '}' ','
    'for_synthesis' '{' <Level> '}'
'}'
<Level> := ( '"Behavior"' | '"RT"' | '"Gate"' )
<FHM_Parameter> := <Parameter_Name> '{' <Parameter_Value> '}'
<Parameter_Name> := NAME
<Parameter_Value> := STRING
NAME := [a-zA-Z][a-zA-Z0-9]*
STRING := "([^"]|'\"')*"
Resource parameter includes abstraction level for synthesized simulation model and
logic synthesizable model. Abstraction level must be selected among "Behavior", "RT"
and "Gate". After the abstraction level, parameters which are specific to the model are
specified.
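As an illustration of the resource grammar, the following Python sketch renders one resource entry as text. The resource name "GPR", the model name "registerfile", the class path "FHM" and the parameter `bit_width` are hypothetical values chosen for the example; the real names depend on the contents of FHM-DB.

```python
LEVELS = ("Behavior", "RT", "Gate")


def resource_entry(name, fhm_class, classpath, sim_level, syn_level, **fhm_params):
    """Render one <Resource> entry following the grammar of the resource declaration part."""
    for level in (sim_level, syn_level):
        if level not in LEVELS:
            raise ValueError(f"unknown abstraction level: {level}")
    # The abstraction level comes first, then the model-specific parameters.
    parts = [f'abstraction_level{{for_simulation{{"{sim_level}"}},'
             f'for_synthesis{{"{syn_level}"}}}}']
    parts += [f'{key}{{"{value}"}}' for key, value in fhm_params.items()]
    return (f'"{name}"{{class{{"{fhm_class}"}},classpath{{"{classpath}"}},'
            f'parameter{{{",".join(parts)}}}}}')


entry = resource_entry("GPR", "registerfile", "FHM", "Behavior", "RT", bit_width="32")
```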
A.6 Interrupt Definition 
In the interrupt definition part, interrupt condition definitions and micro-operation
description of interrupts are combined.
<Interrupt_Definition> := 'Exception' '{'
    [ <Interrupt> { ',' <Interrupt> } ]
'}'
<Interrupt> := <Interrupt_Name> '{'
    <Interrupt_Condition> ','
    <Interrupt_Type> ','
    <Interrupt_Cycles> ','
    <Behavior_of_Interrupt> ','
    <Assertion_of_Interrupt> ','
    <Comment_for_Interrupt> ','
    <MOD_of_Interrupt>
'}'
Interrupt definition includes interrupt name, condition, type, execution cycle count,
behavior, assertion, comments and micro-operation description. Interrupt type, behavior
and assertion are prepared for future extension.
<Interrupt_Condition> := 'Condition' '{' <Condition> '}'
<Interrupt_Type> := 'Type' '{' <Interrupt_Types> '}'
<Interrupt_Types> := ( '"Internal"' | '"External"' )
<Interrupt_Cycles> := 'Cycles' '{' INT '}'
<Behavior_of_Interrupt> := 'Behavior' '{' STRING '}'
<Assertion_of_Interrupt> := 'Assert' '{' STRING '}'
<Comment_for_Interrupt> := 'Comment' '{' STRING '}'
<MOD_of_Interrupt> := 'MOD' '{' <MOD> '}'
INT := "[0-9]+"
Execution cycle count for the interrupt is defined. In the micro-operation description of
interrupts, interrupt handling operations of the processor, such as setting specific values to
special registers and jumping to the interrupt handler routine, are described. The syntax
of interrupt condition <Condition> and micro-operation description <MOD> is explained
in Appendix A.7.
A.7 Micro-operation Description 
Micro-operation description is used to describe clock based behavior of instructions and
interrupts. Micro-operation description of an n-clock instruction (or interrupt) is described
as follows:
clk(1){"<Micro-Op>"},
clk(2){"<Micro-Op>"},
...
clk(k){"<Micro-Op>"},
...
clk(n){"<Micro-Op>"}
Behavior of the instruction (or interrupt) in the k-th clock is described with "clk(k)".
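A tool that reads such a description first has to separate it into its per-clock parts. The following Python sketch shows one way to do this with a regular expression, and checks that the clock numbers run from 1 to n without gaps. The micro-operations in the usage lines (resources IMAU, PC and IR) are hypothetical.

```python
import re

# clk(k){"..."}: the clock number and the quoted micro-operation text.
CLOCK = re.compile(r'clk\((\d+)\)\s*\{\s*"((?:[^"\\]|\\.)*)"\s*\}')


def split_clocks(description):
    """Return the micro-operation texts of clk(1) .. clk(n) in order."""
    steps = [(int(number), body) for number, body in CLOCK.findall(description)]
    numbers = [number for number, _ in steps]
    if numbers != list(range(1, len(steps) + 1)):
        raise ValueError("clock numbers must run 1, 2, ..., n")
    return [body for _, body in steps]


mod = 'clk(1){"IR := IMAU.read(PC);"},\nclk(2){"DECODE(IR);"}'
per_clock = split_clocks(mod)
```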
Micro-operation of each clock consists of the following elements:
• Variable,
• Constant,
• Storage,
• Operand,
• Function,
• Assignment statement,
• If-statement,
• Decode designation.
<Micro-Op> := <Assignment_Statement> ';'
            | <Function> ';'
            | <If_Statement> ';'
            | <Decode_Statement> ';'
A.7.1 Variable
<Variable> := VAR
VAR := $[a-zA-Z][a-zA-Z0-9]*
Variables are declared implicitly in assignment statements. The scope of a variable is the
instruction in which the variable is described. Assignment to a variable is allowed only
once. The right value of the assignment can be referred to in the same stage and in the following
stages. The variable is implemented with a net when it is referred to in the same
stage. The variable is implemented with a pipeline register when it is referred to in
the following stage.
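The implementation rule for variables can be stated as a small function. The following Python sketch assumes that stages are numbered with integers and that the stage of the assignment and the stages of all references are already known; the function name and its interface are illustrative only.

```python
def variable_implementation(assigned_stage, referenced_stages):
    """Return the set of hardware elements needed to implement one variable.

    A reference in the stage of the assignment needs a net; a reference in a
    later stage needs a pipeline register.
    """
    kinds = set()
    for stage in referenced_stages:
        if stage < assigned_stage:
            raise ValueError("a variable cannot be referred to before it is assigned")
        kinds.add("net" if stage == assigned_stage else "pipeline register")
    return kinds


same_stage = variable_implementation(2, [2])
later_stage = variable_implementation(2, [3])
```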
A.7.2 Constant
<Constant> := SingleBit | BitVector
SingleBit := \'0\' | \'1\'
BitVector := "[01]+"
Constants in binary expression are used in the micro-operation description. A single bit
constant is quoted by single quotes. A constant of plural bits is quoted by double quotes.
Constant values are referred to in assignment statements, conditional expressions of if-statements,
resource operations and indexes of addressed storage.
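The two constant forms can be told apart by their quotes. The following Python sketch returns the bit width and the unsigned value of a constant token; it assumes the token is given without any escaping backslashes, which is a simplification of this illustration.

```python
import re

SINGLE_BIT = re.compile(r"'([01])'")   # single bit: '0' or '1'
BIT_VECTOR = re.compile(r'"([01]+)"')  # plural bits: "0110"


def parse_constant(token):
    """Return (bit width, unsigned value) for a micro-operation constant."""
    match = SINGLE_BIT.fullmatch(token) or BIT_VECTOR.fullmatch(token)
    if match is None:
        raise ValueError(f"not a binary constant: {token!r}")
    bits = match.group(1)
    return len(bits), int(bits, 2)


single = parse_constant("'1'")
vector = parse_constant('"0110"')
```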
A.7.3 Storage
<Storage> := <Resource_Name> [<Address>]
<Resource_Name> := NAME
<Address> := <Right_Part>
NAME := [a-zA-Z][a-zA-Z0-9]*
Registers, memory access units, register files and so on declared in the resource
declaration part are referred to as storage units. For an addressed storage unit, a location is
referred to by an index. An index is quoted by "[ ]". The contents value of a storage unit
is referred to when it appears in a parameter of a resource operation, the right part of an assignment
statement, or the conditional expression of an if-statement. The contents value of a storage unit
is changed to the assigned value when it appears in the left part of an assignment statement. The value
of the storage unit is replaced synchronously with the transition timing of the stage.
A.7.4 Operands
<Operand> := <Field_Name>
<Field_Name> := NAME
NAME := [a-zA-Z][a-zA-Z0-9]*
A certain bit field of the instruction register is referred to by an operand field name, which
is specified in the instruction format definition part. The field name and bit field of an operand
are defined in the instruction type definition part.
A.7.5 Function
<Function> := <Resource_Name> '.' <Function_Name> '(' <Parameter_List> ')'
<Resource_Name> := NAME
<Function_Name> := NAME
<Parameter_List> := [ <Parameter> { ',' <Parameter> } ]
<Parameter> := <Right_Part>
NAME := [a-zA-Z][a-zA-Z0-9]*
Functions indicate operations of resources. Input data of the operation are described
as parameters. Output results of the operation are assigned to variables and storage
units with an assignment statement. If the function has no output, the statement consists of
the function expression only.
A.7.6 Assignment statement
<Assignment_Statement> := <Left_Parts> ':=' <Right_Part>
<Left_Parts> := <Left_Part> | '(' <Left_Part> { ',' <Left_Part> } ')'
<Left_Part> := <Storage> | <Variable>
<Right_Part> := <Term> {'&' <Term>}
<Term> := <Storage> [<Bit_Select>] | <Variable> [<Bit_Select>] |
          <Constant> | <Function> | <Operand>
In an assignment statement, right part values are assigned to the storages and variables
that appear in the left part. The right part includes storage units, variables, operands, constants and
functions, and concatenations of them.
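The shape of an assignment statement (one or more left parts, ':=', and terms concatenated with '&') can be separated with a split that ignores separators nested inside parentheses or index brackets. The following Python sketch is only an illustration; the statements in the usage lines and the resource names ALU, GPR and DIV are hypothetical.

```python
def split_top_level(text, separator):
    """Split text at a one-character separator, ignoring nested (...) and [...]."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == separator and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_assignment(statement):
    """Return (left parts, right-part terms) of '<Left_Parts> := <Right_Part>;'."""
    left, right = statement.rstrip(";").split(":=", 1)
    left = left.strip()
    if left.startswith("(") and left.endswith(")"):
        left = left[1:-1]  # several left parts are listed in parentheses
    return split_top_level(left, ","), split_top_level(right, "&")


single_target = parse_assignment('$result := "0000" & ALU.add(GPR[rs], $imm);')
two_targets = parse_assignment('($q, $r) := DIV.div($a, $b);')
```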
A.7.7 If-statement
<If_Statement> := 'if' '(' <Condition> ')' <Then_Phase>
                  [<Else_Phase>] 'end if'
<Condition> := <Equation> | <Expression1>
<Expression1> := <Expression2> { '||' <Expression2> }
<Expression2> := <Expression3> { '&&' <Expression3> }
<Expression3> := '(' <Condition> ')' | 'not' <Expression3>
<Equation> := <Right_Part> '=' <Right_Part>
<Then_Phase> := 'then' <Micro-Op>
<Else_Phase> := 'else' <Micro-Op>
If the condition holds, then-phase is executed. If the condition does not hold,
else-phase is executed. The condition of the if-statement is represented in boolean expression of
equations, which compare values of variables, constants, storages, operands and functions.
The order of priority of boolean operators is 'not', '&&' and '||'.
A.7.8 Decode statement
<Decode_Statement> := 'DECODE' '(' <Instruction_Register> ')'
<Instruction_Register> := <Storage>
Decode statement indicates instruction decode stage and instruction register.
Instruction decode stage is the stage where the decode statement is described. The storage unit
in the decode statement indicates instruction register.
Appendix B
Processor Specification of PEAS R3K
Instruction_type{
sub_field_name{NO_VLIW{width{"31","16"},type{"Btype"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"OP-code"{"name"{"bfunct"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},
"Jtype"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"target"},width{"25","0"}}
},
"Rtype"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Operand"{"name"{"shamt"},width{"10","6"}},
"OP-code"{"name"{"rfunct"},width{"5","0"}}
},
"R1type"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"name"{"rfunct"},width{"5","0"}}
},
"Itype"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"immediate"},width{"15","0"}}
},
"LStype"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"base"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},
"R2type"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Reserved"{"binary"{"0000000000"},width{"15","6"}},
"OP-code"{"name"{"rfunct"},width{"5","0"}}
},
"MFtype"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Reserved"{"binary"{"0000000000"},width{"25","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"name"{"mffunct"},width{"5","0"}}
},
"MTtype"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Reserved"{"binary"{"000000000000000"},width{"20","6"}},
"OP-code"{"name"{"mtfunct"},width{"5","0"}}
},
"B1type"{
"OP-code"{"name"{"opecode"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
}
}}}},
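The field definitions in this listing determine how an instruction word is assembled: each field occupies the bits given by its width pair, and holds either a fixed binary value or an operand value looked up by name. The following Python sketch illustrates this with the ADDU layout that appears in the instruction definitions of this appendix. The tuple representation and the register numbers are assumptions of this example.

```python
def encode(fields, operands):
    """Pack fields given as (contents type, value, msb, lsb) into one instruction word."""
    word = 0
    for kind, value, msb, lsb in fields:
        width = msb - lsb + 1
        if kind == "binary":
            if len(value) != width:
                raise ValueError(f"{value!r} does not fill bits {msb}..{lsb}")
            bits = int(value, 2)
        else:
            # Contents type "name": the value is supplied per operand name.
            bits = operands[value]
            if not 0 <= bits < (1 << width):
                raise ValueError(f"operand {value} does not fit in {width} bits")
        word |= bits << lsb
    return word


# ADDU: R1type fields with the fixed operation codes filled in.
ADDU = [
    ("binary", "000000", 31, 26),
    ("name", "rs", 25, 21),
    ("name", "rt", 20, 16),
    ("name", "rd", 15, 11),
    ("binary", "00000", 10, 6),
    ("binary", "100001", 5, 0),
]
word = encode(ADDU, {"rs": 1, "rt": 2, "rd": 3})
```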
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"immediate"},width{"15","0"}}
},"ADDIU"{type{"Itype"},"OP-code"{"binary"{"001001"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"immediate"},width{"15","0"}}
},"ADDU"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"binary"{"100001"},width{"5","0"}}
},"ANDI"{type{"Itype"},"OP-code"{"binary"{"001100"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"immediate"},width{"15","0"}}
},"BGEZ"{type{"Btype"},"OP-code"{"binary"{"000001"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"OP-code"{"binary"{"00001"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"BGEZAL"{type{"Btype"},"OP-code"{"binary"{"000001"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"OP-code"{"binary"{"10001"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"BGTZ"{type{"Btype"},"OP-code"{"binary"{"000111"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"OP-code"{"binary"{"00000"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"BLEZ"{type{"Btype"},"OP-code"{"binary"{"000110"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"OP-code"{"binary"{"00000"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"BLTZ"{type{"Btype"},"OP-code"{"binary"{"000001"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"OP-code"{"binary"{"00000"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"BLTZAL"{type{"Btype"},"OP-code"{"binary"{"000001"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"OP-code"{"binary"{"10000"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"IAND"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"binary"{"100100"},width{"5","0"}}
},"INOR"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"binary"{"100111"},width{"5","0"}}
},"IOR"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"binary"{"100101"},width{"5","0"}}
},"ISUB"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"binary"{"100010"},width{"5","0"}}
},"IXOR"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"binary"{"100110"},width{"5","0"}}
},"J"{type{"Jtype"},"OP-code"{"binary"{"000010"},width{"31","26"}},
"Operand"{"name"{"target"},width{"25","0"}}
},"JAL"{type{"Jtype"},"OP-code"{"binary"{"000011"},width{"31","26"}},
"Operand"{"name"{"target"},width{"25","0"}}
},"JALR"{type{"Rtype"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Operand"{"name"{"shamt"},width{"10","6"}},
"OP-code"{"binary"{"001001"},width{"5","0"}}
},"JR"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"binary"{"001000"},width{"5","0"}}
},"LB"{type{"LStype"},"OP-code"{"binary"{"100000"},width{"31","26"}},
"Operand"{"name"{"base"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"LBU"{type{"LStype"},"OP-code"{"binary"{"100100"},width{"31","26"}},
"Operand"{"name"{"base"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"LH"{type{"LStype"},"OP-code"{"binary"{"100001"},width{"31","26"}},
"Operand"{"name"{"base"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"LHU"{type{"LStype"},"OP-code"{"binary"{"100101"},width{"31","26"}},
"Operand"{"name"{"base"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"LUI"{type{"Itype"},"OP-code"{"binary"{"001111"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"immediate"},width{"15","0"}}
},"LW"{type{"LStype"},"OP-code"{"binary"{"100011"},width{"31","26"}},
"Operand"{"name"{"base"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"ORI"{type{"Itype"},"OP-code"{"binary"{"001101"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"immediate"},width{"15","0"}}
},"SB"{type{"LStype"},"OP-code"{"binary"{"101000"},width{"31","26"}},
"Operand"{"name"{"base"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"SH"{type{"LStype"},"OP-code"{"binary"{"101001"},width{"31","26"}},
"Operand"{"name"{"base"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"offset"},width{"15","0"}}
},"SLL"{type{"Rtype"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Operand"{"name"{"shamt"},width{"10","6"}},
"OP-code"{"binary"{"000000"},width{"5","0"}}
},"SLLV"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
"Reserved"{"binary"{"00000"},width{"10","6"}},
"OP-code"{"binary"{"000100"},width{"5","0"}}
},"SLT"{type{"R1type"},"OP-code"{"binary"{"000000"},width{"31","26"}},
"Operand"{"name"{"rs"},width{"25","21"}},
"Operand"{"name"{"rt"},width{"20","16"}},
"Operand"{"name"{"rd"},width{"15","11"}},
}, "DIVU"{type{"R2type"}, "OP-code"{"binary"{"000000"}, width{"31", "26"}},
"Operand"{"name"{"rs"}, width{"25", "21"}},
"Operand"{"name"{"rt"}, width{"20", "16"}},
"Reserved"{"binary"{"0000000000"}, width{"15", "6"}},
"OP-code"{"binary"{"011011"}, width{"5", "0"}}
}, "MFHI"{type{"MFtype"}, "OP-code"{"binary"{"000000"}, width{"31", "26"}},
"Reserved"{"binary"{"0000000000"}, width{"25", "16"}},
"Operand"{"name"{"rd"}, width{"15", "11"}},
"Reserved"{"binary"{"00000"}, width{"10", "6"}},
"OP-code"{"binary"{"010000"}, width{"5", "0"}}
}, "MFLO"{type{"MFtype"}, "OP-code"{"binary"{"000000"}, width{"31", "26"}},
"Reserved"{"binary"{"0000000000"}, width{"25", "16"}},
"Operand"{"name"{"rd"}, width{"15", "11"}},
"Reserved"{"binary"{"00000"}, width{"10", "6"}},
"OP-code"{"binary"{"010010"}, width{"5", "0"}}
}, "MTHI"{type{"MTtype"}, "OP-code"{"binary"{"000000"}, width{"31", "26"}},
"Operand"{"name"{"rs"}, width{"25", "21"}},
"Reserved"{"binary"{"000000000000000"}, width{"20", "6"}},
"OP-code"{"binary"{"010001"}, width{"5", "0"}}
}, "MTLO"{type{"MTtype"}, "OP-code"{"binary"{"000000"}, width{"31", "26"}},
"Operand"{"name"{"rs"}, width{"25", "21"}},
"Reserved"{"binary"{"000000000000000"}, width{"20", "6"}},
"OP-code"{"binary"{"010011"}, width{"5", "0"}}
}, "BEQ"{type{"B1type"}, "OP-code"{"binary"{"000100"}, width{"31", "26"}},
"Operand"{"name"{"rs"}, width{"25", "21"}},
"Operand"{"name"{"rt"}, width{"20", "16"}},
"Operand"{"name"{"offset"}, width{"15", "0"}}
}, "BNE"{type{"B1type"}, "OP-code"{"binary"{"000101"}, width{"31", "26"}},
"Operand"{"name"{"rs"}, width{"25", "21"}},
"Operand"{"name"{"rt"}, width{"20", "16"}},
"Operand"{"name"{"offset"}, width{"15", "0"}}
}}},
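The width{"hi", "lo"} pairs in the instruction format definitions above name inclusive bit ranges of a 32-bit instruction word. As a minimal sketch of how such a definition can be read, the following Python fragment extracts the fields of a BEQ word. The names BEQ_FORMAT, extract_field and decode are hypothetical helpers introduced only for this illustration; they are not part of PEAS-III.

```python
# Hypothetical illustration: bit ranges copied from the "BEQ" format
# definition (OP-code 31..26, rs 25..21, rt 20..16, offset 15..0).
BEQ_FORMAT = {
    "opcode": (31, 26),
    "rs": (25, 21),
    "rt": (20, 16),
    "offset": (15, 0),
}


def extract_field(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def decode(word: int, fmt: dict) -> dict:
    """Split an instruction word into named fields according to fmt."""
    return {name: extract_field(word, hi, lo) for name, (hi, lo) in fmt.items()}


# Assemble a BEQ word with rs=1, rt=2, offset=3 and decode it again.
beq_word = (0b000100 << 26) | (1 << 21) | (2 << 16) | 3
beq_fields = decode(beq_word, BEQ_FORMAT)
```

The same two functions apply to any other format in the listing once its bit ranges are written down in the same way.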
Operation{NO一札1W{}} , 
Resource{"IR"{class{"register"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
edge_trigger{"positive"}}}
, "HI"{class{"register"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
edge_trigger{"positive"}}}
, "LO"{class{"register"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
edge_trigger{"positive"}}}
, "CSW"{class{"register"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
edge_trigger{"positive"}}}
, "GPR"{class{"registerfile"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
num_register{"32"},
num_read_port{"2"},
num_write_port{"1"}}}
, "ADD0"{class{"adder"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
algorithm{"cla"}}}
, "ALU0"{class{"alu"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
algorithm{"cla"}}}
, "DIV0"{class{"divider"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
algorithm{"seq"},
adder_algorithm{"cla"},
data_type{"two_complement"}}}
, "SFT0"{class{"barrelshifter"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"RT"}},
bit_width{"32"}}}
, "EXT0"{class{"extender"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"16"}}}
, "MUL0"{class{"multiplier"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
algorithm{"seq"},
adder_algorithm{"cla"},
data_type{"two_complement"}}}
, "PC"{class{"pcu"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"},
increment_step{"4"},
adder_algorithm{"cla"}}}
, "IMEM"{class{"imcu"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"}}}
, "DMEM"{class{"dmcu"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"32"}}}
, "NOT0"{class{"not"}, classpath{""},
parameter{
abstraction_level{for_simulation{"Behavior"}, for_synthesis{"Gate"}},
bit_width{"1"}}}
},
Exception{"reset"{Condition{"rst='1'"}, Type{"External"}, Cycles{"1"},
Behavior{"--reset behavior"}, Assert{""}, Comment{""},
MOD{clk(1){"PC.reset(); GPR.reset();
CSW.reset(); HI.reset();
LO.reset(); IR.reset();"}
}},
"init0"{Condition{"int = '1' and intn = \"000\""}, Type{"External"}, Cycles{"1"},
Behavior{"--Interrupt behavior"}, Assert{""}, Comment{""},
MOD{clk(1){"CSW := PC;"}
}}
},
MOT{mnemonic{"ADD"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := ALU0.add($rs, $rt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "ADDI"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(immediate);"},
clk(3){"($result, $flag) := ALU0.add($rs, $imm);"},
clk(4){""},
clk(5){"GPR[rt] := $result;"}
}
, "ADDIU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(immediate);"},
clk(3){"($result, $flag) := ALU0.add($rs, $imm);"},
clk(4){""},
clk(5){"GPR[rt] := $result;"}
}
, "ADDU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := ALU0.add($rs, $rt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "ANDI"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.zero(immediate);"},
clk(3){"($result, $flag) := ALU0.and($rs, $imm);"},
clk(4){""},
clk(5){"GPR[rt] := $result;"}
}
, "BGEZ"{clk(1){"IR := IMEM[PC];
PC.inc();
$pc := PC;"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(offset);"},
clk(3){"$offset := $imm(29 downto 0) & \"00\";
$target := ADD0.add($pc, $offset);
$flag := ALU0.cmpz($rs);
if ($rs(31) = '0') then PC := $target; end if;"},
clk(4){""},
clk(5){""}
}
, "BGEZAL"{clk(1){"IR := IMEM[PC];
PC.inc();
$pc := PC;"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(offset);"},
clk(3){"$offset := $imm(29 downto 0) & \"00\";
$target := ADD0.add($pc, $offset);
if ($rs(31) = '0') then PC := $target; end if;
$pc2 := PC;"},
clk(4){""},
clk(5){"GPR[\"11111\"] := $pc2;"}
}
, "BGTZ"{clk(1){"IR := IMEM[PC];
PC.inc();
$pc := PC;"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(offset);"},
clk(3){"$offset := $imm(29 downto 0) & \"00\";
$target := ADD0.add($pc, $offset);
$flag := ALU0.cmpz($rs);
if (($rs(31) = '0') & ($flag(2) = '0')) then PC := $target; end if;"},
clk(4){""},
clk(5){""}
}
, "BLEZ"{clk(1){"IR := IMEM[PC];
PC.inc();
$pc := PC;"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(offset);"},
clk(3){"$offset := $imm(29 downto 0) & \"00\";
$target := ADD0.add($pc, $offset);
$flag := ALU0.cmpz($rs);
if (($rs(31) = '1') | ($flag(2) = '1')) then PC := $target; end if;"},
clk(4){""},
clk(5){""}
}
, "BLTZ"{clk(1){"IR := IMEM[PC];
PC.inc();
$pc := PC;"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(offset);"},
clk(3){"$offset := $imm(29 downto 0) & \"00\";
$target := ADD0.add($pc, $offset);
if ($rs(31) = '1') then PC := $target; end if;"},
clk(4){""},
clk(5){""}
}
, "BLTZAL"{clk(1){"IR := IMEM[PC];
PC.inc();
$pc := PC;"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(offset);"},
clk(3){"$offset := $imm(29 downto 0) & \"00\";
$target := ADD0.add($pc, $offset);
if ($rs(31) = '1') then PC := $target; end if;
$pc2 := PC;"},
clk(4){""},
clk(5){"GPR[\"11111\"] := $pc2;"}
}
, "IAND"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := ALU0.and($rs, $rt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "INOR"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := ALU0.nor($rs, $rt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "IOR"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := ALU0.or($rs, $rt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "ISUB"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := ALU0.sub($rs, $rt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "IXOR"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := ALU0.xor($rs, $rt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "J"{clk(1){"$pc := PC;
IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$target := $pc(31 downto 28) & IR(25 downto 0) & \"00\";"},
clk(3){"PC := $target;"},
clk(4){""},
clk(5){""}
}
, "JAL"{clk(1){"$pc := PC;
IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$target := $pc(31 downto 28) & IR(25 downto 0) & \"00\";"},
clk(3){"PC := $target;
$pc2 := PC;"},
clk(4){""},
clk(5){"GPR[\"11111\"] := $pc2;"}
}
, "JALR"{clk(1){"$pc := PC;
IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);"},
clk(3){"PC := $rs;
$pc2 := PC;"},
clk(4){""},
clk(5){"GPR[\"11111\"] := $pc2;"}
}
, "JR"{clk(1){"$pc := PC;
IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);"},
clk(3){"PC := $rs;"},
clk(4){""},
clk(5){""}
}
, "LB"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$offset := EXT0.sign(offset);
$base := GPR.read0(base);"},
clk(3){"($target, $flag) := ALU0.add($base, $offset);"},
clk(4){"($data, $addr_err) := DMEM.lb($target);"},
clk(5){"GPR[rt] := $data;"}
}
, "LBU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$offset := EXT0.sign(offset);
$base := GPR.read0(base);"},
clk(3){"($target, $flag) := ALU0.add($base, $offset);"},
clk(4){"($data, $addr_err) := DMEM.lbu($target);"},
clk(5){"GPR[rt] := $data;"}
}
, "LH"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$offset := EXT0.sign(offset);
$base := GPR.read0(base);"},
clk(3){"($target, $flag) := ALU0.add($base, $offset);"},
clk(4){"($data, $addr_err) := DMEM.lh($target);"},
clk(5){"GPR[rt] := $data;"}
}
, "LHU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$offset := EXT0.sign(offset);
$base := GPR.read0(base);"},
clk(3){"($target, $flag) := ALU0.add($base, $offset);"},
clk(4){"($data, $addr_err) := DMEM.lhu($target);"},
clk(5){"GPR[rt] := $data;"}
}
, "LUI"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$imm := immediate & \"0000000000000000\";"},
clk(3){""},
clk(4){""},
clk(5){"GPR[rt] := $imm;"}
}
, "LW"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$offset := EXT0.sign(offset);
$base := GPR.read0(base);"},
clk(3){"($target, $flag) := ALU0.add($base, $offset);"},
clk(4){"($data, $addr_err) := DMEM.read($target);"},
clk(5){"GPR[rt] := $data;"}
}
, "ORI"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.zero(immediate);"},
clk(3){"($result, $flag) := ALU0.or($rs, $imm);"},
clk(4){""},
clk(5){"GPR[rt] := $result;"}
}
, "SB"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$offset := EXT0.sign(offset);
$base := GPR.read0(base);
$rt := GPR.read1(rt);"},
clk(3){"($target, $flag) := ALU0.add($base, $offset);"},
clk(4){"$addr_err := DMEM.sb($target, $rt);"},
clk(5){""}
}
, "SH"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$offset := EXT0.sign(offset);
$base := GPR.read0(base);
$rt := GPR.read1(rt);"},
clk(3){"($target, $flag) := ALU0.add($base, $offset);"},
clk(4){"$addr_err := DMEM.sh($target, $rt);"},
clk(5){""}
}
, "SLL"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rt := GPR.read1(rt);"},
clk(3){"$result := SFT0.sll($rt, shamt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SLLV"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$shamt := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"$result := SFT0.sll($rt, $shamt(4 downto 0));"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SLT"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"$flag := ALU0.cmp($rs, $rt);
$result := \"0000000000000000000000000000000\" & $flag(1);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SLTI"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(immediate);"},
clk(3){"$flag := ALU0.cmp($rs, $imm);
$result := \"0000000000000000000000000000000\" & $flag(1);"},
clk(4){""},
clk(5){"GPR[rt] := $result;"}
}
, "SLTIU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.sign(immediate);"},
clk(3){"$flag := ALU0.cmp($rs, $imm);
$result := \"0000000000000000000000000000000\" & NOT0.not($flag(3));"},
clk(4){""},
clk(5){"GPR[rt] := $result;"}
}
, "SLTU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"$flag := ALU0.cmpu($rs, $rt);
$result := \"0000000000000000000000000000000\" & NOT0.not($flag(3));"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SRA"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rt := GPR.read1(rt);"},
clk(3){"$result := SFT0.sra($rt, shamt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SRAV"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$shamt := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"$result := SFT0.sra($rt, $shamt(4 downto 0));"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SRL"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rt := GPR.read1(rt);"},
clk(3){"$result := SFT0.srl($rt, shamt);"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SRLV"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$shamt := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"$result := SFT0.srl($rt, $shamt(4 downto 0));"},
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SUBU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
clk(4){""},
clk(5){"GPR[rd] := $result;"}
}
, "SW"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$offset := EXT0.sign(offset);
$base := GPR.read0(base);
$rt := GPR.read1(rt);"},
clk(3){"($target, $flag) := ALU0.add($base, $offset);"},
clk(4){"$addr_err := DMEM.write($target, $rt);"},
clk(5){""}
}
, "XORI"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$imm := EXT0.zero(immediate);"},
clk(3){"($result, $flag) := ALU0.xor($rs, $imm);"},
clk(4){""},
clk(5){"GPR[rt] := $result;"}
}
, "MULT"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := MUL0.mul($rs, $rt);"},
clk(4){""},
clk(5){"HI := $result(63 downto 32);
LO := $result(31 downto 0);"}
}
, "MULTU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($result, $flag) := MUL0.mulu($rs, $rt);"},
clk(4){""},
clk(5){"HI := $result(63 downto 32);
LO := $result(31 downto 0);"}
}
, "DIV"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($q, $r, $flag) := DIV0.div($rs, $rt);"},
clk(4){""},
clk(5){"HI := $r; LO := $q;"}
}
, "DIVU"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);
$rt := GPR.read1(rt);"},
clk(3){"($q, $r, $flag) := DIV0.divu($rs, $rt);"},
clk(4){""},
clk(5){"HI := $r; LO := $q;"}
}
, "MFHI"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);"},
clk(3){"$hi := HI;"},
clk(4){""},
clk(5){"GPR[rd] := $hi;"}
}
, "MFLO"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);"},
clk(3){"$lo := LO;"},
clk(4){""},
clk(5){"GPR[rd] := $lo;"}
}
, "MTHI"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);"},
clk(3){""},
clk(4){""},
clk(5){"HI := $rs;"}
}
, "MTLO"{clk(1){"IR := IMEM[PC];
PC.inc();"},
clk(2){"DECODE(IR);
$rs := GPR.read0(rs);"},
clk(3){""},
clk(4){""},
clk(5){"LO := $rs;"}
}
, "BEQ"{clk(1){"IR := IMEM[PC];
PC.inc();
$pc := PC;"},
clk(2){"DECODE(IR);
$rt := GPR.read1(rt);
$rs := GPR.read0(rs);
$imm := EXT0.sign(offset);"},
clk(3){"$offset := $imm(29 downto 0) & \"00\";
$target := ADD0.add($pc, $offset);
$flag := ALU0.cmp($rs, $rt);
if ($flag(2) = '1') then PC := $target; end if;"},
clk(4){""},
clk(5){""}
}
, "BNE"{clk(1){"IR := IMEM[PC];
PC.inc();
$pc := PC;"},
clk(2){"DECODE(IR);
$rt := GPR.read1(rt);
$rs := GPR.read0(rs);
$imm := EXT0.sign(offset);"},
clk(3){"$offset := $imm(29 downto 0) & \"00\";
$target := ADD0.add($pc, $offset);
$flag := ALU0.cmp($rs, $rt);
if ($flag(2) = '0') then PC := $target; end if;"},
clk(4){""},
clk(5){""}
}
}}
}
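The branch and jump micro-operations in the listing above build their target addresses by bit slicing and concatenation: the branches sign-extend the 16-bit offset, keep bits 29 downto 0, append two zero bits and add the result to the latched $pc, while J and JAL concatenate $pc(31 downto 28), IR(25 downto 0) and "00". The following Python fragment is a minimal sketch of that arithmetic on 32-bit values. The function names are hypothetical and only mirror the expressions in the listing; the sketch makes no claim about when PEAS-III latches $pc relative to PC.inc().

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Model of the sign extension of a 16-bit field to 32 bits."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """$offset := $imm(29 downto 0) & "00"; $target := $pc + $offset."""
    imm = sign_extend16(offset16)
    offset = ((imm & 0x3FFFFFFF) << 2) & MASK32  # drop top 2 bits, append "00"
    return (pc + offset) & MASK32  # 32-bit wrap-around addition


def jump_target(pc: int, ir: int) -> int:
    """$target := $pc(31 downto 28) & IR(25 downto 0) & "00"."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


forward = branch_target(0x1004, 3)        # three words ahead of $pc
backward = branch_target(0x1004, 0xFFFF)  # offset -1: one word back
jump = jump_target(0x10000004, 0x08000010)
```

Keeping the addition modulo 2^32 matters for backward branches, where the shifted offset is a large unsigned value that wraps around to a subtraction.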
Appendix C
Synthesis Result of PEAS R3K Processor
C.1 VHDL Description of PEAS R3K Datapath
library IEEE;
use IEEE.std_logic_1164.all;
entity CPU is
  port (
    clk : in std_logic;
    intn : in std_logic_vector(2 downto 0);
    int : in std_logic;
    rst : in std_logic;
    instAB : out std_logic_vector(31 downto 0);
    instDB : in std_logic_vector(31 downto 0);
    dataAB : out std_logic_vector(31 downto 0);
    dataDB : inout std_logic_vector(31 downto 0);
    we : out std_logic_vector(3 downto 0));
end CPU;
architecture syn of CPU is
component cpu_ctrl
  port (
    instDB : in std_logic_vector(31 downto 0);
    rst : in std_logic;
    int : in std_logic;
    intn : in std_logic_vector(2 downto 0);
    clk : in std_logic;
    IR_data_out : in std_logic_vector(31 downto 0);
    MUL0_fin : in std_logic;
    DIV0_flag : in std_logic_vector(1 downto 0);
    CSW_enb : out std_logic;
    CSW_rst : out std_logic;
    reg39_enb : out std_logic;
    reg38_enb : out std_logic;
    reg37_enb : out std_logic;
    reg36_enb : out std_logic;
    reg35_enb : out std_logic;
    reg34_enb : out std_logic;
    reg33_enb : out std_logic;
    reg32_enb : out std_logic;
    reg31_enb : out std_logic;
    reg30_enb : out std_logic;
    reg29_enb : out std_logic;
    reg28_enb : out std_logic;
    reg27_enb : out std_logic;
    reg26_enb : out std_logic;
    reg25_enb : out std_logic;
    reg24_enb : out std_logic;
    reg23_enb : out std_logic;
    reg22_enb : out std_logic;
    reg21_enb : out std_logic;
    reg20_enb : out std_logic;
    sel19_ctrl : out std_logic_vector(1 downto 0);
    sel18_ctrl : out std_logic_vector(1 downto 0);
    sel17_ctrl : out std_logic_vector(0 downto 0);
    sel16_ctrl : out std_logic_vector(0 downto 0);
    sel15_ctrl : out std_logic_vector(0 downto 0);
    sel14_ctrl : out std_logic_vector(2 downto 0);
    sel13_ctrl : out std_logic_vector(0 downto 0);
    sel12_ctrl : out std_logic_vector(0 downto 0);
    sel11_ctrl : out std_logic_vector(0 downto 0);
    sel10_ctrl : out std_logic_vector(0 downto 0);
    DIV0_ctrl : out std_logic;
    LO_enb : out std_logic;
    LO_rst : out std_logic;
    HI_enb : out std_logic;
    HI_rst : out std_logic;
    MUL0_start : out std_logic;
    MUL0_ctrl : out std_logic;
    SFT0_mode : out std_logic_vector(1 downto 0);
    DMEM_ext_ctrl : out std_logic;
    DMEM_ac_ctrl : out std_logic_vector(1 downto 0);
    DMEM_req : out std_logic;
    DMEM_rw : out std_logic;
    EXT0_ctrl : out std_logic;
    ALU0_ctrl : out std_logic_vector(4 downto 0);
    ALU0_cin : out std_logic;
    GPR_w_enb0 : out std_logic;
    GPR_reset : out std_logic;
    IR_enb : out std_logic;
    IR_rst : out std_logic;
    PC_hold : out std_logic;
    PC_reset : out std_logic;
    PC_load : out std_logic;
    reg20_data_out : in std_logic_vector(31 downto 0);
    sys4_p0 : in std_logic;
    sys2_p0 : in std_logic;
    ALU0_flag : in std_logic_vector(3 downto 0));
end component;
component pcu_17
  generic (W : integer := 32;
           S : integer := 4);
  port (
    clk : in std_logic;
    load : in std_logic;
    reset : in std_logic;
    hold : in std_logic;
    data : in std_logic_vector(W-1 downto 0);
    q : out std_logic_vector(W-1 downto 0));
end component;
component imcu_18
  generic (W : integer := 32);
  port (
    addr : in std_logic_vector(W-1 downto 0);
    data : out std_logic_vector(W-1 downto 0);
    m_addr : out std_logic_vector(W-1 downto 0);
    m_data : in std_logic_vector(W-1 downto 0)
  );
end component;
component register_9
  generic (W : integer := 32);
  port (clk : in std_logic;
    rst : in std_logic;
    enb : in std_logic;
    data_in : in std_logic_vector(W-1 downto 0);
    data_out : out std_logic_vector(W-1 downto 0));
end component;
component registerfile_10
  generic (W : integer := 32);
  port (clock : in std_logic;
    reset : in std_logic;
    w_enb0 : in std_logic;
    w_sel0 : in std_logic_vector(4 downto 0);
    data_in0 : in std_logic_vector(W-1 downto 0);
    r_sel0 : in std_logic_vector(4 downto 0);
    r_sel1 : in std_logic_vector(4 downto 0);
    data_out0 : out std_logic_vector(W-1 downto 0);
    data_out1 : out std_logic_vector(W-1 downto 0));
end component;
component alu_12
  generic (W : integer := 32);
  port (a, b : in std_logic_vector(W-1 downto 0);
    cin : in std_logic;
    ctrl : in std_logic_vector(4 downto 0);
    result : out std_logic_vector(W-1 downto 0);
    flag : out std_logic_vector(3 downto 0));
end component;
component extender_15
  generic (W : integer := 16);
  port (data_in : in std_logic_vector(W-1 downto 0);
    ctrl : in std_logic;
    data_out : out std_logic_vector(2*W-1 downto 0));
end component;
component adder_11
  generic (W : integer := 32);
  port (a, b : in std_logic_vector(W-1 downto 0);
    cin : in std_logic;
    result : out std_logic_vector(W-1 downto 0);
    cout : out std_logic);
end component;
component dmcu_19
  port (rw : in std_logic;
    req : in std_logic;
    addr : in std_logic_vector(31 downto 0);
    i_data : out std_logic_vector(31 downto 0);
    o_data : in std_logic_vector(31 downto 0);
    ac_ctrl : in std_logic_vector(1 downto 0);
    ext_ctrl : in std_logic;
    addr_err : out std_logic;
    we : out std_logic_vector(3 downto 0);
    m_addr : out std_logic_vector(31 downto 0);
    m_data : inout std_logic_vector(31 downto 0));
end component;
component barrelshifter_14
  generic (W : integer := 32);
  port (data_in : in std_logic_vector(W-1 downto 0);
    mode : in std_logic_vector(1 downto 0);
    ctrl : in std_logic_vector(4 downto 0);
    data_out : out std_logic_vector(W-1 downto 0));
end component;
component not_20
  port (data_in : in std_logic;
    data_out : out std_logic);
end component;
component multiplier_16
  generic (W : integer := 32);
  port (clk : in std_logic;
    reset : in std_logic;
    a, b : in std_logic_vector(W-1 downto 0);
    ctrl : in std_logic;
    start : in std_logic;
    result : out std_logic_vector(2*W-1 downto 0);
    fin : out std_logic);
end component;
component divider_13
  generic (W : integer := 32);
  port (clk : in std_logic;
    a, b : in std_logic_vector(W-1 downto 0);
    ctrl : in std_logic;
    result0 : out std_logic_vector(W-1 downto 0);
    result1 : out std_logic_vector(W-1 downto 0);
    flag : out std_logic_vector(1 downto 0));
end component;
component selector_21
  generic (w : integer := 32;
    n : integer := 2;
    logn : integer := 1);
  port (data_in0 : in std_logic_vector(w-1 downto 0);
    data_in1 : in std_logic_vector(w-1 downto 0);
    ctrl : in std_logic_vector(logn-1 downto 0);
    data_out : out std_logic_vector(w-1 downto 0));
end component;
component selector_22
  generic (w : integer := 5;
    n : integer := 2;
    logn : integer := 1);
  port (data_in0 : in std_logic_vector(w-1 downto 0);
    data_in1 : in std_logic_vector(w-1 downto 0);
    ctrl : in std_logic_vector(logn-1 downto 0);
    data_out : out std_logic_vector(w-1 downto 0));
end component;
component selector_23
  generic (w : integer := 32;
    n : integer := 8;
    logn : integer := 3);
  port (data_in0 : in std_logic_vector(w-1 downto 0);
    data_in1 : in std_logic_vector(w-1 downto 0);
    data_in2 : in std_logic_vector(w-1 downto 0);
    data_in3 : in std_logic_vector(w-1 downto 0);
    data_in4 : in std_logic_vector(w-1 downto 0);
    data_in5 : in std_logic_vector(w-1 downto 0);
    data_in6 : in std_logic_vector(w-1 downto 0);
    data_in7 : in std_logic_vector(w-1 downto 0);
    ctrl : in std_logic_vector(logn-1 downto 0);
    data_out : out std_logic_vector(w-1 downto 0));
end component;
component selector_24
  generic (w : integer := 32;
    n : integer := 3;
    logn : integer := 2);
  port (data_in0 : in std_logic_vector(w-1 downto 0);
    data_in1 : in std_logic_vector(w-1 downto 0);
    data_in2 : in std_logic_vector(w-1 downto 0);
    ctrl : in std_logic_vector(logn-1 downto 0);
    data_out : out std_logic_vector(w-1 downto 0));
end component;
component pipereg_25
  generic (W : integer := 32);
  port (clk : in std_logic;
    rst : in std_logic;
    enb : in std_logic;
    data_in : in std_logic_vector(W-1 downto 0);
    data_out : out std_logic_vector(W-1 downto 0));
end component;
component pipereg_26
  generic (W : integer := 30);
  port (clk : in std_logic;
    rst : in std_logic;
    enb : in std_logic;
    data_in : in std_logic_vector(W-1 downto 0);
    data_out : out std_logic_vector(W-1 downto 0));
end component;
component pipereg_27
  generic (W : integer := 5);
  port (clk : in std_logic;
    rst : in std_logic;
    enb : in std_logic;
    data_in : in std_logic_vector(W-1 downto 0);
    data_out : out std_logic_vector(W-1 downto 0));
end component;
signal d_CSW_data_out : std_logic_vector(31 downto 0);
signal d_reg39_data_out : std_logic_vector(31 downto 0);
signal d_reg38_data_out : std_logic_vector(31 downto 0);
signal d_reg37_data_out : std_logic_vector(31 downto 0);
signal d_reg36_data_out : std_logic_vector(31 downto 0);
signal d_reg35_data_out : std_logic_vector(4 downto 0);
signal d_reg34_data_out : std_logic_vector(31 downto 0);
signal d_reg33_data_out : std_logic_vector(31 downto 0);
signal d_reg32_data_out : std_logic_vector(31 downto 0);
signal d_reg31_data_out : std_logic_vector(4 downto 0);
signal d_reg30_data_out : std_logic_vector(4 downto 0);
signal d_reg29_data_out : std_logic_vector(4 downto 0);
signal d_reg28_data_out : std_logic_vector(31 downto 0);
signal d_reg27_data_out : std_logic_vector(31 downto 0);
signal d_reg26_data_out : std_logic_vector(31 downto 0);
signal d_reg25_data_out : std_logic_vector(31 downto 0);
signal d_reg24_data_out : std_logic_vector(31 downto 0);
signal d_reg23_data_out : std_logic_vector(31 downto 0);
signal d_reg22_data_out : std_logic_vector(31 downto 0);
signal d_reg21_data_out : std_logic_vector(29 downto 0);
signal d_reg20_data_out : std_logic_vector(31 downto 0);
signal d_sel19_data_out : std_logic_vector(31 downto 0);
signal d_sel18_data_out : std_logic_vector(31 downto 0);
signal d_sel17_data_out : std_logic_vector(4 downto 0);
signal d_sel16_data_out : std_logic_vector(31 downto 0);
signal d_sel15_data_out : std_logic_vector(31 downto 0);
signal d_sel14_data_out : std_logic_vector(31 downto 0);
signal d_sel13_data_out : std_logic_vector(4 downto 0);
signal d_sel12_data_out : std_logic_vector(4 downto 0);
signal d_sel11_data_out : std_logic_vector(31 downto 0);
signal d_sel10_data_out : std_logic_vector(31 downto 0);
signal d_DIV0_flag : std_logic_vector(1 downto 0);
signal d_DIV0_result1 : std_logic_vector(31 downto 0);
signal d_DIV0_result0 : std_logic_vector(31 downto 0);
signal d_LO_data_out : std_logic_vector(31 downto 0);
signal d_HI_data_out : std_logic_vector(31 downto 0);
signal d_MUL0_fin : std_logic;
signal d_MUL0_result : std_logic_vector(63 downto 0);
signal d_sys10_p0 : std_logic_vector(31 downto 0);
signal d_NOT0_data_out : std_logic;
signal d_sys9_p0 : std_logic_vector(31 downto 0);
signal d_sys8_p0 : std_logic_vector(30 downto 0);
signal d_SFT0_data_out : std_logic_vector(31 downto 0);
signal d_sys7_p0 : std_logic_vector(31 downto 0);
signal d_sys6_p0 : std_logic_vector(15 downto 0);
signal d_DMEM_addr_err : std_logic;
signal d_DMEM_i_data : std_logic_vector(31 downto 0);
signal d_sys5_p0 : std_logic_vector(31 downto 0);
signal d_sys4_p0 : std_logic;
signal d_sys3_p0 : std_logic_vector(4 downto 0);
signal d_sys2_p0 : std_logic;
signal d_ADD0_cout : std_logic;
signal d_ADD0_result : std_logic_vector(31 downto 0);
signal d_sys1_p0 : std_logic_vector(31 downto 0);
signal d_sys0_p0 : std_logic_vector(1 downto 0);
signal d_EXT0_data_out : std_logic_vector(31 downto 0);
signal d_ALU0_flag : std_logic_vector(3 downto 0);
signal d_ALU0_result : std_logic_vector(31 downto 0);
signal d_GPR_data_out1 : std_logic_vector(31 downto 0);
signal d_GPR_data_out0 : std_logic_vector(31 downto 0);
signal d_IR_data_out : std_logic_vector(31 downto 0);
signal d_IMEM_data : std_logic_vector(31 downto 0);
signal d_PC_q : std_logic_vector(31 downto 0);
signal c_PC_load : std_logic;
signal c_PC_reset : std_logic;
reg36_enb => c_reg36_enb , 
reg35_enb => c_reg35_enb , 
reg34_enb => c_reg34_enb , 
reg33_enb => c_reg33_enb , 
reg32_enb => c_reg32_enb , 
reg31_enb => c_reg31_enb , 
reg30_enb => c_reg30_enb , 
reg29_enb => c_reg29_enb , 
reg28_enb => c_reg28_enb , 
reg27_enb => c_reg27_enb , 
reg26_enb => c_reg26_enb , 
reg25_enb => c_reg25_enb , 
reg24_enb => c_reg24_enb , 
reg23_enb => c_reg23_enb , 
reg22_enb => c_reg22_enb , 
reg21_enb => c_reg21_enb , 
reg20_enb => c_reg20_enb , 
sel19_ctrl => c_sel19_ctrl , 
sel18_ctrl => c_sel18_ctrl , 
sel17_ctrl => c_sel17_ctrl , 
sel16_ctrl => c_sel16_ctrl , 
sel15_ctrl => c_sel15_ctrl , 
sel14_ctrl => c_sel14_ctrl , 
sel13_ctrl => c_sel13_ctrl , 
sel12_ctrl => c_sel12_ctrl , 
sel11_ctrl => c_sel11_ctrl ,
sel10_ctrl => c_sel10_ctrl ,
DIVO_ctrl => c_DIVO_ctrl , 
LO_enb => c_LO_enb , 
LO_rst => c_LO_rst , 
HI_enb => c_HI_enb , 
HI_rst => c_HI_rst , 
MULO_start => c_MULO_start , 
MULO_ctrl => c_MULO_ctrl , 
SFTO_mode => c_SFTO_mode ,
DMEM_ext_ctrl => c_DMEM_ext_ctrl , 
DMEM_ac_ctrl => c_DMEM_ac_ctrl , 
DMEM_req => c_DMEM_req , 
DMEM_rw => c_DMEM_rw ,
EXTO_ctrl => c_EXTO_ctrl , 
ALUO_ctrl => c_ALUO_ctrl , 
ALUO_cin => c_ALUO_cin , 
GPR_w_enbO => c_GPR_w_enbO , 
GPR_reset => c_GPR_reset , 
IR_enb => c_IR_enb , 
IR_rst => c_IR_rst , 
PC_hold => c_PC_hold , 
PC_reset => c_PC_reset , 
PC_load => c_PC_load , 
reg20_data_out => d_reg20_data_out , 
sys4_pO => d_sys4_pO , 
sys2_pO => d_sys2_pO , 
ALUO_flag => d_ALUO_flag); 
PC : pcu_17 
port map( 
clk => clk , 
load => c_PC_load , 
reset => c_PC_reset , 
hold => c_PC_hold , 
data => d_sel11_data_out ,
q => d_PC_q);
IMEM : imcu_18
port map( 
addr => d_PC_q , 
data => d_IMEM_data , 
m_addr => instAB , 
m_data => instDB); 
IR : register_9 
port map( 
clk => clk ,
rst => c_IR_rst ,
enb => c_IR_enb ,
data_in => d_IMEM_data ,
data_out => d_IR_data_out);
GPR : registerfile_10
port map(
clock => clk ,
reset => c_GPR_reset ,
w_enbO => c_GPR_w_enbO ,
w_selO => d_sel13_data_out ,
data_inO => d_reg33_data_out ,
r_selO => d_IR_data_out(25 downto 21) ,
r_sel1 => d_IR_data_out(20 downto 16) ,
data_outO => d_GPR_data_outO ,
data_out1 => d_GPR_data_out1);
ALUO : alu_12
port map(
a => d_reg20_data_out ,
b => d_reg34_data_out ,
cin => c_ALUO_cin ,
ctrl => c_ALUO_ctrl ,
result => d_ALUO_result ,
flag => d_ALUO_flag);
EXTO : extender_15
port map(
data_in => d_IR_data_out(15 downto 0) ,
ctrl => c_EXTO_ctrl ,
data_out => d_EXTO_data_out);
ADDO : adder_11
port map(
a => d_reg23_data_out ,
b => d_sys1_pO ,
cin => d_sys2_pO ,
result => d_ADDO_result ,
cout => d_ADDO_cout);
DMEM : dmcu_19
port map(
rw => c_DMEM_rw ,
req => c_DMEM_req ,
addr => d_reg24_data_out ,
i_data => d_DMEM_i_data ,
o_data => d_reg27_data_out ,
ac_ctrl => c_DMEM_ac_ctrl ,
ext_ctrl => c_DMEM_ext_ctrl ,
addr_err => d_DMEM_addr_err ,
we => we ,
m_addr => dataAB ,
m_data => dataDB);
SFTO : barrelshifter_14
port map(
data_in => d_reg26_data_out ,
mode => c_SFTO_mode ,
ctrl => d_reg35_data_out ,
data_out => d_SFTO_data_out);
NOTO : not_20
port map(
data_in => d_ALUO_flag(3) ,
data_out => d_NOTO_data_out);
MULO : multiplier_16
port map(
clk => clk ,
reset => d_sys2_pO ,
a => d_reg20_data_out ,
b => d_reg26_data_out ,
ctrl => c_MULO_ctrl ,
start => c_MULO_start ,
result => d_MULO_result ,
fin => d_MULO_fin);
HI : register_9
port map(
clk => clk ,
rst => c_HI_rst ,
enb => c_HI_enb ,
data_in => d_reg37_data_out ,
data_out => d_HI_data_out);
LO : register_9
port map(
clk => clk ,
rst => c_LO_rst ,
enb => c_LO_enb ,
data_in => d_reg39_data_out ,
data_out => d_LO_data_out);
DIVO : divider_13
port map(
clk => clk ,
a => d_reg20_data_out ,
b => d_reg26_data_out ,
ctrl => c_DIVO_ctrl ,
resultO => d_DIVO_resultO ,
result1 => d_DIVO_result1 ,
flag => d_DIVO_flag);
sel10 : selector_21
port map(
data_inO => d_GPR_data_outO ,
data_in1 => d_sys5_pO ,
ctrl => c_sel10_ctrl ,
data_out => d_sel10_data_out);
sel11 : selector_21
port map(
data_inO => d_reg28_data_out ,
data_in1 => d_ADDO_result ,
ctrl => c_sel11_ctrl ,
data_out => d_sel11_data_out);
sel12 : selector_22
port map(
data_inO => d_IR_data_out(20 downto 16) ,
data_in1 => d_IR_data_out(15 downto 11) ,
ctrl => c_sel12_ctrl ,
data_out => d_sel12_data_out);
sel13 : selector_22
port map(
data_inO => d_reg31_data_out ,
data_in1 => d_sys3_pO ,
ctrl => c_sel13_ctrl ,
data_out => d_sel13_data_out);
sel14 : selector_23
port map(
data_inO => d_reg25_data_out ,
data_in1 => d_LO_data_out ,
data_in2 => d_HI_data_out ,
data_in3 => d_sys10_pO ,
data_in4 => d_sys9_pO ,
data_in5 => d_SFTO_data_out ,
data_in6 => d_PC_q ,
data_in7 => d_ALUO_result ,
ctrl => c_sel14_ctrl ,
data_out => d_sel14_data_out);
sel15 : selector_21
port map(
data_inO => d_reg32_data_out ,
data_in1 => d_DMEM_i_data ,
ctrl => c_sel15_ctrl ,
data_out => d_sel15_data_out);
sel16 : selector_21
port map(
data_inO => d_EXTO_data_out ,
data_in1 => d_GPR_data_out1 ,
ctrl => c_sel16_ctrl ,
data_out => d_sel16_data_out);
sel17 : selector_22
port map(
data_inO => d_GPR_data_outO(4 downto 0) ,
data_in1 => d_IR_data_out(10 downto 6) ,
ctrl => c_sel17_ctrl ,
data_out => d_sel17_data_out);
sel18 : selector_24
port map(
data_inO => d_reg20_data_out ,
data_in1 => d_DIVO_result1 ,
data_in2 => d_MULO_result(63 downto 32) ,
ctrl => c_sel18_ctrl ,
data_out => d_sel18_data_out);
sel19 : selector_24
port map(
data_inO => d_reg20_data_out ,
data_in1 => d_DIVO_resultO ,
data_in2 => d_MULO_result(31 downto 0) ,
ctrl => c_sel19_ctrl ,
data_out => d_sel19_data_out);
reg20 : pipereg_25 
port map(
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg20_enb , 
data_in => d_GPR_data_outO , 
data_out => d_reg20_data_out); 
reg21 : pipereg_26 
port map(
clk => clk ,
rst => d_sys2_pO ,
enb => c_reg21_enb ,
data_in => d_EXTO_data_out(29 downto 0) ,
data_out => d_reg21_data_out);
reg22 : pipereg_25
port map(
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg22_enb , 
data_in => d_PC_q , 
data_out => d_reg22_data_out); 
reg23 : pipereg_25 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg23_enb , 
data_in => d_reg22_data_out , 
data_out => d_reg23_data_out); 
reg24 : pipereg_25 
port map(
clk => clk ,
rst => d_sys2_pO ,
enb => c_reg24_enb ,
data_in => d_ALUO_result ,
data_out => d_reg24_data_out);
reg25 : pipereg_25
port map(
clk => clk ,
rst => d_sys2_pO ,
enb => c_reg25_enb ,
data_in => d_sys7_pO , 
data_out => d_reg25_data_out); 
reg26 : pipereg_25 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg26_enb , 
data_in => d_GPR_data_out1 , 
data_out => d_reg26_data_out); 
reg27 : pipereg_25 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg27_enb , 
data_in => d_reg26_data_out ,
data_out => d_reg27_data_out);
reg28 : pipereg_25
port map(
clk => clk ,
rst => d_sys2_pO ,
enb => c_reg28_enb ,
data_in => d_sel10_data_out ,
data_out => d_reg28_data_out);
reg29 : pipereg_27
port map(
clk => clk ,
rst => d_sys2_pO ,
enb => c_reg29_enb ,
data_in => d_sel12_data_out ,
data_out => d_reg29_data_out); 
reg30 : pipereg_27 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg30_enb , 
data_in => d_reg29_data_out , 
data_out => d_reg30_data_out); 
reg31 : pipereg_27 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg31_enb , 
data_in => d_reg30_data_out ,
data_out => d_reg31_data_out);
reg32 : pipereg_25
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg32_enb , 
data_in => d_sel14_data_out , 
data_out => d_reg32_data_out); 
reg33 : pipereg_25 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg33_enb ,
data_in => d_sel15_data_out , 
data_out => d_reg33_data_out); 
reg34 : pipereg_25 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg34_enb , 
data_in => d_sel16_data_out , 
data_out => d_reg34_data_out); 
reg35 : pipereg_27 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg35_enb , 
data_in => d_sel17_data_out , 
data_out => d_reg35_data_out); 
reg36 : pipereg_25 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg36_enb ,
data_in => d_sel18_data_out ,
data_out => d_reg36_data_out); 
reg37 : pipereg_25 
port map( 
clk => clk , 
rst => d_sys2_pO , 
enb => c_reg37_enb , 
data_in => d_reg36_data_out , 
data_out => d_reg37_data_out); 
reg38 : pipereg_25 
port map(
clk => clk ,
rst => d_sys2_pO ,
enb => c_reg38_enb ,
data_in => d_sel19_data_out ,
data_out => d_reg38_data_out);
reg39 : pipereg_25
port map(
clk => clk ,
rst => d_sys2_pO ,
enb => c_reg39_enb ,
data_in => d_reg38_data_out ,
data_out => d_reg39_data_out);
CSW : register_9
port map(
clk => clk ,
rst => c_CSW_rst ,
enb => c_CSW_enb ,
data_in => d_PC_q ,
data_out => d_CSW_data_out);
d_sysO_pO <= "00";
d_sys1_pO <= (d_reg21_data_out & d_sysO_pO);
d_sys2_pO <= '0';
d_sys3_pO <= "11111";
d_sys4_pO <= '1';
d_sys5_pO <= ((d_reg22_data_out(31 downto 28) & d_IR_data_out(25 downto 0)) & d_sysO_pO);
d_sys6_pO <= "0000000000000000";
d_sys7_pO <= (d_IR_data_out(15 downto 0) & d_sys6_pO);
d_sys8_pO <= "0000000000000000000000000000000";
d_sys9_pO <= (d_sys8_pO & d_ALUO_flag(1));
d_sys10_pO <= (d_sys8_pO & d_NOTO_data_out);
end syn;
reg35_enb : out std_logic; 
reg34_enb : out std_logic; 
reg33_enb : out std_logic; 
reg32_enb : out std_logic; 
reg31_enb : out std_logic; 
reg30_enb : out std_logic; 
reg29_enb : out std_logic; 
reg28_enb : out std_logic; 
reg27_enb : out std_logic; 
reg26_enb : out std_logic; 
reg25_enb : out std_logic; 
reg24_enb : out std_logic; 
reg23_enb : out std_logic; 
reg22_enb : out std_logic; 
reg21_enb : out std_logic; 
reg20_enb : out std_logic; 
sel19_ctrl : out std_logic_vector(1 downto 0);
sel18_ctrl : out std_logic_vector(1 downto 0);
sel17_ctrl : out std_logic_vector(0 downto 0);
sel16_ctrl : out std_logic_vector(0 downto 0);
sel15_ctrl : out std_logic_vector(0 downto 0);
sel14_ctrl : out std_logic_vector(2 downto 0);
sel13_ctrl : out std_logic_vector(0 downto 0);
sel12_ctrl : out std_logic_vector(0 downto 0);
sel11_ctrl : out std_logic_vector(0 downto 0);
sel10_ctrl : out std_logic_vector(0 downto 0);
DIVO_ctrl : out std_logic;
LO_enb : out std_logic;
LO_rst : out std_logic;
HI_enb : out std_logic;
HI_rst : out std_logic;
MULO_start : out std_logic;
MULO_ctrl : out std_logic;
SFTO_mode : out std_logic_vector(1 downto 0);
DMEM_ext_ctrl : out std_logic;
DMEM_ac_ctrl : out std_logic_vector(1 downto 0);
DMEM_req : out std_logic;
DMEM_rw : out std_logic;
EXTO_ctrl : out std_logic;
ALUO_ctrl : out std_logic_vector(4 downto 0);
ALUO_cin : out std_logic;
GPR_w_enbO : out std_logic;
GPR_reset : out std_logic;
IR_enb : out std_logic;
IR_rst : out std_logic;
PC_hold : out std_logic;
PC_reset : out std_logic;
PC_load : out std_logic;
reg20_data_out : in std_logic_vector(31 downto 0);
sys4_pO : in std_logic;
sys2_pO : in std_logic;
ALUO_flag : in std_logic_vector(3 downto 0));
end cpu_ctrl;
-- entity_end
C.2 VHDL Description of PEAS R3K Controller
-- entity_begin
entity cpu_ctrl is
port (
instDB : in std_logic_vector(31 downto 0);
rst : in std_logic;
int : in std_logic;
intn : in std_logic_vector(2 downto 0);
clk : in std_logic;
IR_data_out : in std_logic_vector(31 downto 0);
MULO_fin : in std_logic;
DIVO_flag : in std_logic_vector(1 downto 0);
CSW_enb : out std_logic; 
CSW_rst : out std_logic; 
reg39_enb : out std_logic; 
reg38_enb : out std_logic; 
reg37_enb : out std_logic; 
reg36_enb : out std_logic; 
architecture behavior of cpu_ctrl is
type Type_Itype is (I_ADD , I_ADDI , I_ADDIU , I_ADDU , I_ANDI , I_BGEZ , I_BGEZAL , I_BGTZ , I_BLEZ , I_BLTZ ,
I_BLTZAL , I_IAND , I_INOR , I_IOR , I_ISUB , I_IXOR , I_J , I_JAL , I_JALR , I_JR , I_LB , I_LBU , I_LH , I_LHU , I_LUI ,
I_LW , I_ORI , I_SB , I_SH , I_SLL , I_SLLV , I_SLT , I_SLTI , I_SLTIU , I_SLTU , I_SRA , I_SRAV , I_SRL , I_SRLV , I_SUBU ,
I_SW , I_XORI , I_MULT , I_MULTU , I_DIV , I_DIVU , I_MFHI , I_MFLO , I_MTHI , I_MTLO , I_BEQ , I_BNE , I_S_ERR);
type Type_Interruption is (INT_reset , INT_initO);
subtype Type_Intr_Count is integer range 0 to 2;
subtype Type_interrupt_state is integer range 0 to 2;
signal inst : Type_Itype;
signal go : std_logic_vector(0 to 5);
signal valid : std_logic_vector(1 to 5);
signal rreset : std_logic;
signal Interrupt_Step : Type_Intr_Count;
signal iinterrupt : std_logic;
signal interrupt_name : Type_Interruption;
signal interrupt_state : Type_interrupt_state;
signal next_multi_st_3 : std_logic;
signal multi_st_3 : std_logic;
signal lock_multi_3 : std_logic;
signal lock_3 : std_logic;
signal cw_2 : std_logic_vector(35 downto 0);
signal cw_3 : std_logic_vector(30 downto 0);
signal cw_4 : std_logic_vector(9 downto 0);
signal cw_5 : std_logic_vector(3 downto 0);
signal bbranch : std_logic;
signal lock_3_ctrl_DIVO_flag : std_logic;
signal lock_3_ctrl_MULO_fin : std_logic;
begin
go(0) <= '1' when (interrupt_state = 1) else '0';
go(1) <= valid(1) and (not valid(2) or go(2));
go(2) <= valid(2) and (not valid(3) or go(3));
go(3) <= valid(3) and (not valid(4) or go(4)) and not lock_3;
go(4) <= valid(4) and (not valid(5) or go(5));
go(5) <= valid(5);
CTRL: process(clk , rreset , bbranch)
begin
if(clk'event and clk = '1') then
if(rreset = '1') then
valid <= "00000";
elsif(bbranch = '1') then
valid <= go(0) & "0" & valid(2 to 4);
else
-- [The assignments to valid(1) through valid(5) are illegible in the source scan.]
end if;
end if; 
end if; 
end process INTERRUPT; 
lock_multi_3 <= '1' when ((lock_3_ctrl_DIVO_flag = '1') and (DIVO_flag(0) /= '1'))
or ((lock_3_ctrl_MULO_fin = '1') and (MULO_fin /= '1'))
else
'0';
next_multi_st_3 <= '0' when go(2) = '1' else
'1';
MULT_ST : process(clk)
begin
if(clk'event and clk = '1') then
if(rreset = '1') then
multi_st_3 <= '0';
else
multi_st_3 <= next_multi_st_3;
end if;
end if;
end process MULT_ST;
lock_3 <= lock_multi_3;
inst <=
I_ADD when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "100000") else
I_ADDI when (IR_data_out(31 downto 26) = "001000") else
I_ADDIU when (IR_data_out(31 downto 26) = "001001") else
I_ADDU when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "100001") else
I_ANDI when (IR_data_out(31 downto 26) = "001100") else
I_BGEZ when (IR_data_out(31 downto 26) = "000001") and (IR_data_out(20 downto 16) = "00001") else
I_BGEZAL when (IR_data_out(31 downto 26) = "000001") and (IR_data_out(20 downto 16) = "10001") else
I_BGTZ when (IR_data_out(31 downto 26) = "000111") and (IR_data_out(20 downto 16) = "00000") else
I_BLEZ when (IR_data_out(31 downto 26) = "000110") and (IR_data_out(20 downto 16) = "00000") else
I_BLTZ when (IR_data_out(31 downto 26) = "000001") and (IR_data_out(20 downto 16) = "00000") else
I_BLTZAL when (IR_data_out(31 downto 26) = "000001") and (IR_data_out(20 downto 16) = "10000") else
I_IAND when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "100100") else
I_INOR when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "100111") else
I_IOR when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "100101") else
I_ISUB when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "100010") else
I_IXOR when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "100110") else
I_J when (IR_data_out(31 downto 26) = "000010") else
I_JAL when (IR_data_out(31 downto 26) = "000011") else
I_JALR when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "001001") else
I_JR when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "001000") else
I_LB when (IR_data_out(31 downto 26) = "100000") else
I_LBU when (IR_data_out(31 downto 26) = "100100") else
I_LH when (IR_data_out(31 downto 26) = "100001") else
I_LHU when (IR_data_out(31 downto 26) = "100101") else
I_LUI when (IR_data_out(31 downto 26) = "001111") else
I_LW when (IR_data_out(31 downto 26) = "100011") else
I_ORI when (IR_data_out(31 downto 26) = "001101") else
I_SB when (IR_data_out(31 downto 26) = "101000") else
I_SH when (IR_data_out(31 downto 26) = "101001") else
I_SLL when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "000000") else
I_SLLV when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "000100") else
I_SLT when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "101010") else
I_SLTI when (IR_data_out(31 downto 26) = "001010") else
I_SLTIU when (IR_data_out(31 downto 26) = "001011") else
I_SLTU when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "101011") else
I_SRA when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "000011") else
I_SRAV when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "000111") else
I_SRL when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "000010") else
I_SRLV when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "000110") else
I_SUBU when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "100011") else
I_SW when (IR_data_out(31 downto 26) = "101011") else
I_XORI when (IR_data_out(31 downto 26) = "001110") else
I_MULT when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "011000") else
I_MULTU when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "011001") else
I_DIV when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "011010") else
I_DIVU when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "011011") else
I_MFHI when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "010000") else
end if; 
if(rreset = '0') then 
end if; 
end if; 
end process CTRL; 
INTERRUPT: process(clk)
begin
if (clk'event and clk = '1') then
if(rreset = '1') then
interrupt_state <= 0;
Interrupt_Step <= 1;
interrupt_name <= INT_reset;
elsif(interrupt_state = 0) then
if(Interrupt_Step = 1) and (interrupt_name = INT_reset) then
Interrupt_Step <= 0;
interrupt_state <= 1;
elsif(Interrupt_Step = 1) and (interrupt_name = INT_initO) then
Interrupt_Step <= 0;
interrupt_state <= 1;
else
Interrupt_Step <= Interrupt_Step + 1;
end if;
elsif(interrupt_state = 1) then
if (rst='1') or (int = '1' and intn = "000") then
interrupt_state <= 2;
end if;
if (rst='1') then
interrupt_name <= INT_reset;
elsif (int = '1' and intn = "000") then
interrupt_name <= INT_initO;
end if;
else
if(valid = "00000") then
Interrupt_Step <= 1;
interrupt_state <= 0;
I_MFLO when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "010010") else
I_MTHI when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "010001") else
I_MTLO when (IR_data_out(31 downto 26) = "000000") and (IR_data_out(5 downto 0) = "010011") else
I_BEQ when (IR_data_out(31 downto 26) = "000100") else
I_BNE when (IR_data_out(31 downto 26) = "000101") else
I_S_ERR;
cw_2(35) <= '1' when (inst = I_ADDI) or (inst = I_ADDIU) or (inst = I_BGEZ) or (inst = I_BGEZAL) or
(inst = I_BGTZ) or (inst = I_BLEZ) or (inst = I_BLTZ) or (inst = I_BLTZAL) or
(inst = I_LB) or (inst = I_LBU) or (inst = I_LH) or (inst = I_LHU) or
(inst = I_LW) or (inst = I_SB) or (inst = I_SH) or (inst = I_SLTI) or
(inst = I_SLTIU) or (inst = I_SW) or (inst = I_BEQ) or (inst = I_BNE)
else
'0';
cw_2(34) <= '1' when (inst = I_J) or (inst = I_JAL) else
'0';
cw_2(33) <= '1' when (inst = I_ADD) or (inst = I_ADDU) or (inst = I_IAND) or (inst = I_INOR) or
(inst = I_IOR) or (inst = I_ISUB) or (inst = I_IXOR) or (inst = I_SLL) or
(inst = I_SLLV) or (inst = I_SLT) or (inst = I_SLTU) or (inst = I_SRA) or
(inst = I_SRAV) or (inst = I_SRL) or (inst = I_SRLV) or (inst = I_SUBU) or
(inst = I_MFHI) or (inst = I_MFLO) else
'0';
cw_2(32) <= '1' when (inst = I_ADD) or (inst = I_ADDU) or (inst = I_IAND) or (inst = I_INOR) or
(inst = I_IOR) or (inst = I_ISUB) or (inst = I_IXOR) or (inst = I_SLT) or
(inst = I_SLTU) or (inst = I_SUBU) or (inst = I_BEQ) or (inst = I_BNE)
else
'0';
cw_2(31) <= '1' when (inst = I_SLL) or (inst = I_SRA) or (inst = I_SRL) else
'0';
cw_2(30) <= '1' when (inst = I_J) or (inst = I_JAL) or (inst = I_JALR) or (inst = I_JR)
else
'0';
cw_2(29) <= '1' when (inst = I_ISUB) or (inst = I_SLT) or (inst = I_SLTI) or (inst = I_SLTIU) or
(inst = I_SLTU) or (inst = I_SUBU) or (inst = I_BEQ) or (inst = I_BNE)
else
'0';
cw_2(28) <= '1' when (inst = I_IXOR) or (inst = I_XORI) or (inst = I_ADD) or (inst = I_ADDI) or
(inst = I_ADDIU) or (inst = I_ADDU) or (inst = I_LB) or (inst = I_LBU) or
(inst = I_LH) or (inst = I_LHU) or (inst = I_LW) or (inst = I_SB) or
(inst = I_SH) or (inst = I_SW) else
'0';
cw_2(27) <= '1' when (inst = I_SLTU) or (inst = I_ISUB) or (inst = I_SLT) or (inst = I_SLTI) or
(inst = I_SLTIU) or (inst = I_SUBU) or (inst = I_BEQ) or (inst = I_BNE) or
(inst = I_ANDI) or (inst = I_IAND) else
'0';
cw_2(26) <= '1' when (inst = I_ISUB) or (inst = I_SLT) or (inst = I_SLTI) or (inst = I_SLTIU) or
(inst = I_SUBU) or (inst = I_BEQ) or (inst = I_BNE) or (inst = I_INOR) or
(inst = I_BGEZ) or (inst = I_BGTZ) or (inst = I_BLEZ) or (inst = I_ADD) or
(inst = I_ADDI) or (inst = I_ADDIU) or (inst = I_ADDU) or (inst = I_LB) or
(inst = I_LBU) or (inst = I_LH) or (inst = I_LHU) or (inst = I_LW) or
(inst = I_SB) or (inst = I_SH) or (inst = I_SW) else
'0';
cw_2(25) <= '1' when (inst = I_IXOR) or (inst = I_XORI) or (inst = I_IOR) or (inst = I_ORI) or
(inst = I_INOR) or (inst = I_ANDI) or (inst = I_IAND) else
'0';
cw_2(24) <= '1' when (inst = I_SRA) or (inst = I_SRAV) else
'0';
cw_2(23) <= '1' when (inst = I_SRL) or (inst = I_SRLV) or (inst = I_SRA) or (inst = I_SRAV)
else
'0';
cw_2(22) <= '1' when (inst = I_MULT) else
'0';
cw_2(21) <= '1' when (inst = I_DIVU) else
'0';
cw_2(20) <= '1' when (inst = I_BGEZ) or (inst = I_BGEZAL) else
'0';
cw_2(19) <= '1' when (inst = I_BGTZ) else
'0';
cw_2(18) <= '1' when (inst = I_BLEZ) else
'0';
cw_2(17) <= '1' when (inst = I_BLTZ) or (inst = I_BLTZAL) else
'0';
cw_2(16) <= '1' when (inst = I_BEQ) else
'0';
cw_2(15) <= '1' when (inst = I_BNE) else
'0';
cw_2(14) <= '1' when (inst = I_ADD) or (inst = I_ADDI) or (inst = I_ADDIU) or (inst = I_ADDU) or
(inst = I_ANDI) or (inst = I_IAND) or (inst = I_INOR) or (inst = I_IOR) or
(inst = I_ISUB) or (inst = I_IXOR) or (inst = I_ORI) or (inst = I_SUBU) or
(inst = I_XORI) or (inst = I_SLL) or (inst = I_SLLV) or (inst = I_SRA) or
(inst = I_SRAV) or (inst = I_SRL) or (inst = I_SRLV) or (inst = I_SLTIU) or
(inst = I_SLTU) or (inst = I_MFLO) else
'0';
cw_2(13) <= '1' when (inst = I_ADD) or (inst = I_ADDI) or (inst = I_ADDIU) or (inst = I_ADDU) or
(inst = I_ANDI) or (inst = I_IAND) or (inst = I_INOR) or (inst = I_IOR) or
(inst = I_ISUB) or (inst = I_IXOR) or (inst = I_ORI) or (inst = I_SUBU) or
(inst = I_XORI) or (inst = I_BGEZAL) or (inst = I_BLTZAL) or (inst = I_JAL) or
(inst = I_JALR) or (inst = I_SLTIU) or (inst = I_SLTU) or (inst = I_MFHI)
else
'0';
cw_2(12) <= '1' when (inst = I_ADD) or (inst = I_ADDI) or (inst = I_ADDIU) or (inst = I_ADDU) or
(inst = I_ANDI) or (inst = I_IAND) or (inst = I_INOR) or (inst = I_IOR) or
(inst = I_ISUB) or (inst = I_IXOR) or (inst = I_ORI) or (inst = I_SUBU) or
(inst = I_XORI) or (inst = I_BGEZAL) or (inst = I_BLTZAL) or (inst = I_JAL) or
(inst = I_JALR) or (inst = I_SLL) or (inst = I_SLLV) or (inst = I_SRA) or
(inst = I_SRAV) or (inst = I_SRL) or (inst = I_SRLV) or (inst = I_SLT) or
(inst = I_SLTI) else
'0';
cw_2(11) <= '1' when (inst = I_DIV) or (inst = I_DIVU) else
'0';
cw_2(10) <= '1' when (inst = I_MULT) or (inst = I_MULTU) else
'0';
cw_2(9) <= '1' when (inst = I_SB) or (inst = I_SH) or (inst = I_SW) else
'0';
cw_2(8) <= '1' when (inst = I_LB) or (inst = I_LBU) or (inst = I_LH) or (inst = I_LHU) or
(inst = I_LW) or (inst = I_SB) or (inst = I_SH) or (inst = I_SW)
else
'0';
cw_2(7) <= '1' when (inst = I_LW) or (inst = I_SW) or (inst = I_LH) or (inst = I_LHU) or
(inst = I_SH) else
'0';
cw_2(6) <= '1' when (inst = I_LW) or (inst = I_SW) else
'0';
cw_2(5) <= '1' when (inst = I_LB) or (inst = I_LH) or (inst = I_SB) or (inst = I_SH)
else
'0';
cw_2(4) <= '1' when (inst = I_LB) or (inst = I_LBU) or (inst = I_LH) or (inst = I_LHU) or
(inst = I_LW) else
'0';
cw_2(3) <= '1' when (inst = I_ADD) or (inst = I_ADDI) or (inst = I_ADDIU) or (inst = I_ADDU) or
(inst = I_ANDI) or (inst = I_BGEZAL) or (inst = I_BLTZAL) or (inst = I_IAND) or
(inst = I_INOR) or (inst = I_IOR) or (inst = I_ISUB) or (inst = I_IXOR) or
(inst = I_JAL) or (inst = I_JALR) or (inst = I_LB) or (inst = I_LBU) or
(inst = I_LH) or (inst = I_LHU) or (inst = I_LUI) or (inst = I_LW) or
(inst = I_ORI) or (inst = I_SLL) or (inst = I_SLLV) or (inst = I_SLT) or
(inst = I_SLTI) or (inst = I_SLTIU) or (inst = I_SLTU) or (inst = I_SRA) or
(inst = I_SRAV) or (inst = I_SRL) or (inst = I_SRLV) or (inst = I_SUBU) or
(inst = I_XORI) or (inst = I_MFHI) or (inst = I_MFLO) else
'0';
cw_2(2) <= '1' when (inst = I_MULT) or (inst = I_MULTU) or (inst = I_DIV) or (inst = I_DIVU) or
(inst = I_MTHI) else
'0';
cw_2(1) <= '1' when (inst = I_MULT) or (inst = I_MULTU) or (inst = I_DIV) or (inst = I_DIVU) or
(inst = I_MTLO) else
'0';
cw_2(0) <= '1' when (inst = I_BGEZAL) or (inst = I_BLTZAL) or (inst = I_JAL) or (inst = I_JALR)
else
'0';
lock_3_ctrl_MULO_fin <= '0' when valid(3) = '0' else
cw_3(10);
lock_3_ctrl_DIVO_flag <= '0' when valid(3) = '0' else
cw_3(11);
bbranch <= '0' when go(3) = '0' else
'1' when (ALUO_flag(2) = sys2_pO) and cw_3(15) = '1' else
'1' when (ALUO_flag(2) = sys4_pO) and cw_3(16) = '1' else
'1' when (reg20_data_out(31) = sys4_pO) and cw_3(17) = '1' else
'1' when (reg20_data_out(31) = sys4_pO or ALUO_flag(2) = sys4_pO) and cw_3(18) = '1' else
'1' when (reg20_data_out(31) = sys2_pO and ALUO_flag(2) = sys2_pO) and cw_3(19) = '1' else
'1' when (reg20_data_out(31) = sys2_pO) and cw_3(20) = '1' else
cw_3(30);
PC_load <= '0' when go(3) = '0' else
'1' when (ALUO_flag(2) = sys2_pO) and cw_3(15) = '1' else
'1' when (ALUO_flag(2) = sys4_pO) and cw_3(16) = '1' else
'1' when (reg20_data_out(31) = sys4_pO) and cw_3(17) = '1' else
'1' when (reg20_data_out(31) = sys4_pO or ALUO_flag(2) = sys4_pO) and cw_3(18) = '1' else
'1' when (reg20_data_out(31) = sys2_pO and ALUO_flag(2) = sys2_pO) and cw_3(19) = '1' else
'1' when (reg20_data_out(31) = sys2_pO) and cw_3(20) = '1' else
cw_3(30);
PC_reset <= '1' when (Interrupt_Step = 1) and ((Interrupt_name = INT_reset)) else
'0';
PC_hold <= '1' when go(1) = '0' else
'0';
IR_rst <= '1' when (Interrupt_Step = 1) and ((Interrupt_name = INT_reset)) else
'0';
IR_enb <= '0' when go(1) = '0' else
'1';
GPR_reset <= '1' when (Interrupt_Step = 1) and ((Interrupt_name = INT_reset)) else
'0';
GPR_w_enbO <= '0' when go(5) = '0' else
cw_5(3);
ALUO_cin <= cw_3(29);
ALUO_ctrl <= cw_3(25) & cw_3(26) & '0' & cw_3(27) & cw_3(28);
EXTO_ctrl <= cw_2(35);
DMEM_rw <= '0' when go(4) = '0' else
cw_4(9);
DMEM_req <= '0' when go(4) = '0' else
cw_4(8);
DMEM_ac_ctrl <= cw_4(6) & cw_4(7);
DMEM_ext_ctrl <= cw_4(5);
SFTO_mode <= cw_3(23) & cw_3(24);
MULO_ctrl <= cw_3(22);
MULO_start <= '0' when multi_st_3 = '1' else
cw_3(10);
HI_rst <= '1' when (Interrupt_Step = 1) and ((Interrupt_name = INT_reset)) else
'0';
HI_enb <= '0' when go(5) = '0' else
cw_5(2);
LO_rst <= '1' when (Interrupt_Step = 1) and ((Interrupt_name = INT_reset)) else
'0';
LO_enb <= '0' when go(5) = '0' else
cw_5(1);
DIVO_ctrl <= cw_3(21);
sel10_ctrl <= cw_2(34 downto 34);
sel11_ctrl <= "1" when (reg20_data_out(31) = sys2_pO) and cw_3(20) = '1' else
"1" when (reg20_data_out(31) = sys2_pO and ALUO_flag(2) = sys2_pO) and cw_3(19) = '1' else
"1" when (reg20_data_out(31) = sys4_pO or ALUO_flag(2) = sys4_pO) and cw_3(18) = '1' else
"1" when (reg20_data_out(31) = sys4_pO) and cw_3(17) = '1' else
"1" when (ALUO_flag(2) = sys4_pO) and cw_3(16) = '1' else
"1" when (ALUO_flag(2) = sys2_pO) and cw_3(15) = '1' else
"0";
sel12_ctrl <= cw_2(33 downto 33);
sel13_ctrl <= cw_5(0 downto 0);
sel14_ctrl <= cw_3(12) & cw_3(13) & cw_3(14);
sel15_ctrl <= cw_4(4 downto 4);
sel16_ctrl <= cw_2(32 downto 32);
sel17_ctrl <= cw_2(31 downto 31);
sel18_ctrl <= cw_3(10) & cw_3(11);
sel19_ctrl <= cw_3(10) & cw_3(11);
reg20_enb <= '0' when go(2) = '0' else
'1';
reg21_enb <= '0' when go(2) = '0' else
'1';
reg22_enb <= '0' when go(1) = '0' else
'1';
reg23_enb <= '0' when go(2) = '0' else
'1';
reg24_enb <= '0' when go(3) = '0' else
'1';
reg25_enb <= '0' when go(2) = '0' else
'1';
reg26_enb <= '0' when go(2) = '0' else
'1';
reg27_enb <= '0' when go(3) = '0' else
'1';
reg28_enb <= '0' when go(2) = '0' else
'1';
reg29_enb <= '0' when go(2) = '0' else
'1';
reg30_enb <= '0' when go(3) = '0' else
'1';
reg31_enb <= '0' when go(4) = '0' else
'1';
reg32_enb <= '0' when go(3) = '0' else
'1';
reg33_enb <= '0' when go(4) = '0' else
'1';
reg34_enb <= '0' when go(2) = '0' else
'1';
reg35_enb <= '0' when go(2) = '0' else
'1';
reg36_enb <= '0' when go(3) = '0' else
'1';
reg37_enb <= '0' when go(4) = '0' else
'1';
reg38_enb <= '0' when go(3) = '0' else
'1';
reg39_enb <= '0' when go(4) = '0' else
'1';
CSW_rst <= '1' when (Interrupt_Step = 1) and ((Interrupt_name = INT_reset)) else
'0';
CSW_enb <= '1' when (Interrupt_Step = 1) and ((Interrupt_name = INT_initO)) else
'0';
rreset <= '1' when (rst='1') else '0';
PIPE_REG_CTRL: process(clk)
begin
if(clk'event and clk = '1') then
if(go(2) = '1') then
cw_3 <= cw_2(30 downto 0);
end if;
if(go(3) = '1') then
cw_4 <= cw_3(9 downto 0);
end if;
if(go(4) = '1') then
cw_5 <= cw_4(3 downto 0);
end if;
end if;
end process PIPE_REG_CTRL;
end behavior;
List of Major Publications of the Author 
Journal Papers 
[1] Makiko Itoh, Akichika Shiomi, Jun Sato, Yoshinori Takeuchi, Masaharu 
Imai: “Processor Generation Method for Pipelined Processors in Consideration 
with Pipeline Hazards," Transactions on Information Processing Society 
of Japan, Vol. 41, No. 4, pp. 851-862, Apr. 2000, (in Japanese). 
[2] Makiko Itoh, Yoshinori Takeuchi, Masaharu Imai and Akichika Shiomi: 
“Synthesizable HDL Generation for Pipelined Processors from a Micro-Operation 
Description," IEICE Transactions on Fundamentals of Electronics, 
Communications and Computer Sciences, Vol. E83-A, No. 3, pp. 394-400, Mar. 2000. 
International Conference Papers 
[3] Makiko Itoh, Shigeaki Higaki, Jun Sato, Akichika Shiomi, Yoshinori Takeuchi, 
Akira Kitajima, Masaharu Imai: “PEAS-III: An ASIP Design Environment," 
IEEE International Conference on Computer Design: VLSI in Computers & 
Processors, pp. 430-436, Sept. 2000. 
[4] Akira Kitajima, Makiko Itoh, Jun Sato, Akichika Shiomi, Yoshinori Takeuchi 
and Masaharu Imai: “Effectiveness of the ASIP Design System PEAS-III in 
Design of Pipelined Processors," Asia South Pacific Design Automation 
Conference 2001 (ASP-DAC 2001), Jan. 2001 (to appear). 
National Conference Papers 
[5] Makiko Itoh, Yoshinori Takeuchi, Masaharu Imai and Akichika Shiomi: “A 
Synthesizable HDL Generation Method for Pipelined Processor using 
micro-operations," Proc. of 12th Karuizawa Workshop on Circuits and Systems, pp. 
121-126, Apr. 1999, (in Japanese). 
[6] Makiko Itoh, Yoshinori Takeuchi, Ma[...]hiro Aoyama: 
“Processor Architecture Description Generation from a Behavioral 
Semantics Description of Instructions," Proc. of 11th Karuizawa 
Workshop on Circuits and Systems, pp. 475-480, Apr. 1998, (in Japanese). 
[7] Makiko Itoh, Shigeaki Higaki, Akichika Shiomi, Jun Sato, Yoshinori Takeuchi 
and Masaharu Imai: “A Synthesizable HDL Generation Method for Pipelined 
Processors in Consideration of Pipeline Hazard," Proc. of Design Automation 
Symposium, pp. 201-206, July 1999, (in Japanese). 
[8] Makiko Itoh, Yoshinori Takeuchi, Masa[...]: “[...] Instruction 
Set Processor Synthesis Method based on Behavioral Semantics 
Description," IEICE Technical Report, VLD 97-89, pp. 77-84, Oct. 1997, 
(in Japanese). 
[9] Makiko Itoh, Yoshin[...]hiro Aoyama: “Processor Design 
Methodology based on a Behavioral Semantics Description of 
Instructions," Proc. IEICE Fall Conf. '97, A-3-7, Sept. 1997, 
(in Japanese). 
[10] Shigeaki Higaki, Makiko Itoh, Jun Sato, Yoshinori Takeuchi and Masaharu 
Imai: “Proposal of an HDL Generation Method for Pipeline Processors with 
Out-of-order Completion," IPSJ SIG Notes, 99-SLDM-93, Vol. 99, No. 101, pp. 
71-78, Nov. 1999, (in Japanese). 

