



IMPLEMENTATION OF SIMPLESCALAR
PORTABLE INSTRUCTION SET
ARCHITECTURE (PISA) ON FPGA
By
ABDUL AZIM BIN ABDULLAH
FINAL PROJECT REPORT
Submitted to the Electrical & Electronics Engineering Programme
in Partial Fulfillment of the Requirements
for the Degree
Bachelor of Engineering (Hons)







Abdul Azim bin Abdullah, 2006
CERTIFICATION OF APPROVAL
Implementation of SimpleScalar
Portable Instruction Set Architecture (PISA) on FPGA
by
Abdul Azim bin Abdullah
A project dissertation submitted to the
Electrical & Electronics Engineering Programme
Universiti Teknologi PETRONAS
in partial fulfilment of the requirement for the
BACHELOR OF ENGINEERING (Hons)






Bandar Seri Iskandar, 31750 Tronoh





This is to certify that I am responsible for the work submitted in this project, that the
original work is my own except as specified in the references and
acknowledgements, and that the original work contained herein has not been
undertaken or done by unspecified sources or persons.
ABSTRACT
This report describes the current progress of the final year project entitled Implementation of
SimpleScalar Portable Instruction Set Architecture (PISA) on FPGA. The objectives of
this study are to learn computer system architecture, to sharpen skills in programming and
debugging, and to complete the course of study at Universiti Teknologi PETRONAS.
The Problem Statements explain why this study was conducted. Firstly,
few microprocessors currently on the market are reconfigurable. Secondly,
there is a need for a microprocessor design that can be used freely for academic
purposes. Thus, this study focuses on designing a microprocessor that is
reconfigurable, easily understood and freely available for academic purposes.
The Methodology describes how this project is carried out. There are three
main steps: 1) studying the SimpleScalar instruction set
architecture; 2) programming and simulating in the VHDL language; and 3)
implementing the SimpleScalar architecture in VHDL and on an FPGA.
The Discussion explains the project in detail. It covers
SimpleScalar's instruction format, registers and operation cycle, the software and
hardware used in the project, the SimpleScalar implementation in VHDL, and the VHDL
simulation.
Finally, the Conclusion closes the report, and the Recommendations describe
improvements that could be made to the project in the future.
ACKNOWLEDGEMENT
Alhamdulillah, after one year, this project has reached its end. A great deal of experience and
knowledge was gained throughout the period. Several individuals should
be praised and mentioned here. Without them, this project could not have been completed.
I would like to express my greatest gratitude to the Merciful God, Allah S.W.T., for His
blessings and mercy, which have helped and guided me during this project.
My utmost gratitude goes to my supervisor, Mr. Lo Hai Hiung. He gave me advice,
ideas and suggestions, and ensured that this project would be beneficial to both parties. He
supervised me from the first appointment and always kept track of my progress. I really
appreciate all the hard work and the time he spent, despite his heavy workload.
I would also like to thank my Computer System Architecture lecturers, Mr. Patrick
Sebastian and Dr. Yap Vooi Voon. They were very helpful in giving theoretical guidance
and hands-on experience regarding computer architecture. Aside from that, a special
thank you to all the FYP series lecturers, who gave me priceless advice on improving
my skills in researching, writing and presenting. Not to forget, thanks also to
Ms. Siti Hawa, who is always supportive.
Last but not least, a million thanks to my parents and my fellow colleagues for all their
cooperation and support. The encouragement from the people above will always be






1.1 Background Study
1.2 Problem Statements
1.3 Objectives
2. LITERATURE REVIEW
2.1 SimpleScalar Instruction Set Architecture
2.1.1 Instruction Set
2.1.2 Instruction Set Architecture





4.1 Instruction Format
4.2 Register
4.3 Operation Cycle
4.4 SimpleScalar's Operation Cycle
4.5 Software
4.5.1 Crimson Editor
4.5.2 C Compilers
4.5.3 GHDL
4.5.4 Altera Quartus II Web Edition
4.6 Hardware
4.6.1 DSP Development Kit Cyclone II






4.8 VHDL Simulation
4.8.1 Unsigned Addition
4.8.2 OR Operation







Figure 1: FPGA Workflow
Figure 2: Methodology Steps
Figure 3: Instruction Format
Figure 4: Instruction Set
Figure 5: Register
Figure 6: General Operation Cycle
Figure 7: SimpleScalar's Operation Cycle
Figure 8: SimpleScalar's Operation Cycle (C Language)
Figure 9: Crimson Editor
Figure 10: Borland C
Figure 11: GHDL
Figure 12: Altera Quartus II Web Edition
Figure 13: Basic Design Flow
Figure 14: Altera Cyclone II EP2C35 FPGA
Figure 15: SimpleScalar's Operation Cycle (VHDL Implementation)
Figure 16: SimpleScalar Instruction (Register & Immediate Format)
Figure 17: SimpleScalar Opcodes (Register & Immediate Format)
Figure 18: SimpleScalar Immediate Fields (Register Format)
Figure 19: SimpleScalar Immediate Fields (Immediate Format)
Figure 20: Register Selection
Figure 21: Register-Memory
Figure 22: 32-bit Full Adder
Figure 23: Read Cycle Timing Waveforms
Figure 24: Write Cycle Timing Waveforms
Figure 25: Unsigned Addition Operation
Figure 26: OR Operation






Definitions of SimpleScalar Architecture Registers
Quartus II Web Edition Device Support









Modern processors are incredibly complex marvels of engineering that are becoming
increasingly hard to evaluate. The SimpleScalar tool set performs fast, flexible and accurate
simulation of modern processors that implement the SimpleScalar architecture.
According to D. Burger [1], SimpleScalar simulators can emulate the Alpha, PISA, ARM
and x86 instruction sets. The tool set includes a machine definition infrastructure that
permits most architectural details to be separated from simulator implementations. All of
the simulators distributed with the current release of SimpleScalar can run programs from
any of the above listed instruction sets. Complex instruction set emulation (e.g., x86) can
be implemented with or without microcode, making the SimpleScalar tools particularly
useful for modeling CISC instruction sets.
The advantages of this tool set are flexibility, portability, extensibility and performance. The
tool set is portable, requiring only that the GNU tools be installed on the host
system. It has been used on multiple platforms such as Linux/x86, Win NT and
SPARC Solaris. The tool set is easily extensible. The instruction set is designed to
support easy annotation of instructions, without requiring a retargeted compiler for
incremental changes. The instruction definition method, along with the ported GNU tools,
makes new simulators easy to write and the old ones even simpler to extend. Finally, the
simulators have been aggressively tuned for performance and can run codes approaching
"real" sizes in tractable amounts of time. [1]
In this project, I will design a Portable Instruction Set Architecture (PISA)
microprocessor in VHSIC Hardware Description Language (VHDL) and implement it on
an FPGA.
The PISA instruction set is a simple MIPS-like instruction set maintained primarily for
instructional use. A GNU GCC-based cross-compiler and pre-built libraries are also
available for this target. The PISA target is particularly useful for computer engineering
instruction as the tools can be built on a wide range of host platforms, including
Linux/x86, Win2000, SPARC Solaris, and others. [1]
1.2. Problem Statements
Among current microprocessor designs, few are
reconfigurable. Here, "reconfigurable" means that the memory addressing and registers of the
microprocessor can be adjusted according to the designer's preferences. In nearly all
microprocessors currently available on the market, the functional units, memory addressing and
registers are fixed and cannot be reconfigured. Therefore, this project attempts to
design a microprocessor that is reconfigurable.
There are many microprocessor designs available today from Intel,
Motorola, SPARC and others. However, not all of them are easy to understand for
students who are just beginning to learn about computer systems. The concepts of a
computer system are usually learnt starting from the simplest form of
digital system, logic circuits, and progressing to the hardest part, the memory system.
SimpleScalar, which is based on MIPS, provides an easy and simple architecture for
study. In addition, it is free for academic purposes and open source for development. In
this project, the simplest form of the microprocessor will be designed.
From the studies made, it was found that SimpleScalar PISA can be implemented as a
microprocessor. Besides being free for non-commercial use, it is also reconfigurable and
portable across platforms. PISA, with its MIPS-like instruction set, is a good architecture for
study because it is easy to understand.
1.3. Objectives
To implement SimpleScalar PISA on an FPGA.
The SimpleScalar tool set is used to evaluate modern processors using the SimpleScalar
architecture. However, it is only available as software: the source code must be
compiled before it can be executed. To date, there is no hardware-based
implementation of SimpleScalar PISA. Therefore, in this project, I will implement
SimpleScalar PISA in hardware, on an FPGA.
To design and program circuits using the VHDL language.
My interest is programming and I have learnt many languages such as C, C++, HTML,
Visual Basic, MATLAB and PHP. I also have experience in microcontroller
programming. However, VHDL is one language I have not yet managed
to learn. Therefore, this project helps me to gain new knowledge and experience
in programming digital circuits using VHDL.
To apply and relate computer system architecture.
In the computer system subject, I have learnt digital logic gates, the full adder, basic
computer architecture, register design and memory design. Through this project, I hope
to apply and relate the concepts learnt in that subject.
CHAPTER 2: LITERATURE REVIEW
This project can be divided into two major parts: the SimpleScalar Instruction
Set Architecture, and its implementation in VHDL and on an FPGA.
2.1. SimpleScalar Instruction Set Architecture
2.1.1. Instruction Set
An instruction set is the collection of all operations possible in a machine's language. There
are many types of instructions in a computer system, such as arithmetic instructions, data
movement instructions, control or branch instructions and many more.
Arithmetic instructions accept one or more operands and produce a result.
They may also set a flag, for example to indicate that the result of the operation was a negative
number. Data movement instructions move data within the machine and to or from
input/output devices. Control or branch instructions affect the order in which
instructions are performed, that is, they control the flow of the executing program, much as goto,
for, and function calls do in C. [2]
Every instruction must contain encodings within it to specify the following four things,
either explicitly or implicitly:
1. Which operation to perform.
2. Where to find the operand or operands, if there are operands.
3. Where to put the result, if there is a result.
4. Where to find the next instruction.
Source: John L. Hennessy & David A. Patterson, "Computer Architecture: A Quantitative Approach" [2]




4. Floating point Instruction
Source: Doug Burger, Todd M. Austin, "The SimpleScalar Tool Set, Version 2.0" [1]
(Refer to Appendix 1: List of SimpleScalar Instruction Set for more details)
2.1.2. Instruction Set Architecture
An instruction set architecture is the collection of instructions and resources. It includes the
instruction set, the machine's memory and all of the programmer-accessible registers in
the CPU and elsewhere in the machine. [3]
The SimpleScalar architecture can be divided into four parts:
• Instruction set principles.
• Memory hierarchy and register design.
• 5 stages of pipelining.
• Level 1 and level 2 cache.
Source: Doug Burger, Todd M. Austin, "The SimpleScalar Tool Set, Version 2.0" [1]
2.2. Implementation on VHDL and FPGA
2.2.1. VHDL
VHDL is an acronym for VHSIC Hardware Description Language. VHSIC is another
acronym, which stands for Very High Speed Integrated Circuits.
In digital design, VHDL is used for the documentation, verification and
synthesis of large digital systems. It allows a system to be described once, in the same code,
for all three purposes, thus saving a lot of effort. [6]
Three different approaches are used to describe hardware in VHDL: the
structural, data flow and behavioral methods of hardware description. In the beginning,
the design behaviour is described (modeled) and verified (simulated). Using
synthesis tools, the design is then translated into real hardware (gates and wires). At
this point, it is mapped onto a programmable logic device such as a CPLD or FPGA. [6]
The VHDL standards are developed by the IEEE (Institute of Electrical and Electronics
Engineers). Currently, two standards are widely used: VHDL'87 (STD
1076-1987) and VHDL'93 (adopted in 1994). [6]
2.2.2. FPGA
FPGA is an acronym which stands for Field Programmable Gate Array. The term
"Field Programmable" refers to the ability to change the operation of the device, while
"Gate Array" refers to the matrix of logic cells surrounded by a periphery of I/O cells.
Simply put, FPGAs are programmable digital logic chips which can be programmed to perform
digital functions. [7]
FPGAs come in a wide variety of sizes and with many different combinations of internal and
external features from different manufacturers. Although they differ in many
respects, they have one thing in common: they are composed of programmable logic blocks. Each of
these blocks contains registers and logic elements, and the blocks are arranged in a grid and tied
together using programmable interconnections. [7]
In a typical FPGA, the logic blocks that make up the bulk of the device are based on
lookup tables (of perhaps four or five binary inputs) combined with one or two single-bit
registers and additional logic elements such as clock enables and multiplexers. These
basic structures may be replicated many thousands of times to create a large
programmable hardware fabric. [7]
In more complex FPGAs these general-purpose logic blocks are combined with higher-
level arithmetic and control structures, such as multipliers and counters, in support of
common types of applications such as signal processing. In addition, specialized logic
blocks are found at the periphery of the devices that provide programmable input and
output capabilities. [7]




Download design onto FPGA
Run the FPGA
Figure 1: FPGA Workflow
The first step is to describe the logic function to be developed, either by drawing a schematic or
by writing a program that describes the function.
Then, compile the design. The logic function is compiled using the software
provided by the FPGA vendor (e.g. Xilinx ISE, Altera Quartus, Active VHDL, etc.).
This creates a binary file that can be downloaded into the FPGA.
The next step is to download the design onto the FPGA. Connect a cable from the computer to
the FPGA and download the binary file to it.
Finally, run the FPGA. If successful, the FPGA will behave according to the logic
function. If not, repeat the steps to rework the design.
Source: fpga4fun.com, What are FPGAs? [20]
CHAPTER 3: METHODOLOGY
Studying the Simplescalar Learning VHDL programming







Figure 2: Methodology Steps
Figure 2 above shows the steps taken in implementing this project.
The first part is to study the SimpleScalar Instruction Set Architecture. This involves
understanding the source code provided, which instructions are to be used, how
memory addressing and registers are set up, and more. Besides that, I will also
simulate the microprocessor using the tools provided in order to understand
how it works.
In parallel with the SimpleScalar architecture studies, I will learn the VHDL
programming language. This requires understanding digital system design
concepts, writing source code and doing programming exercises from
books. The software used for VHDL programming is Altera Quartus II.
Then, I will implement the SimpleScalar microprocessor in VHDL. This step
requires me to translate the given source code into VHDL, and it involves a lot of
programming and debugging.
The final step of this project is to implement the SimpleScalar microprocessor,
designed in VHDL, on the FPGA. This step requires a lot of programming,
debugging and hardware troubleshooting.
The schedule of this project during Final Year Project I and II can be referred to Planning




The format of an instruction is usually depicted by a rectangular box symbolizing the bits
of the instruction, as they appear in memory words or in a control register. The bits are
divided into groups or parts called fields. Each field is assigned a specific item, such as
the operation code, a constant value, or a register file address. The various fields specify











16-opcode 8-rs 8-rt 8-rd 8-ru/shamt
32 31




Figure 3: Instruction Format
Source: Doug Burger, Todd M. Austin, "The SimpleScalar Tool Set, Version 2.0" [1]
The three instruction formats for SimpleScalar are illustrated in Figure 3.
The SimpleScalar architecture is derived from the MIPS-IV instruction set architecture.
Therefore, its instruction set closely resembles that of MIPS-IV. All instructions are 64 bits in length.
The instructions can be divided into three formats: register, immediate and jump. [1]
The register format is used for computational instructions. The immediate format
supports the inclusion of a 16-bit constant. The jump format supports specification of
24-bit jump targets. The register fields are all 8 bits, to support extension of the architected
registers to 256 integer and floating point registers. Each instruction format has a
fixed-location, 16-bit opcode field that facilitates fast instruction decoding. [1]
With 8 bits, 2^8 = 256 registers can be addressed, numbered from 00000000 to 11111111.
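The field layout described above can be sketched in C as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical, and the bit positions assume that the 16-bit opcode sits in the low half of the upper 32-bit word and that the lower word holds rs, rt, rd and ru/shamt as four 8-bit fields from most to least significant, as the field order in Figure 3 suggests.

```c
#include <stdint.h>

/* Hypothetical container for the fields of one register-format instruction. */
struct reg_fields {
    uint32_t opcode; /* 16-bit opcode field */
    uint32_t rs;     /* 8-bit source register number */
    uint32_t rt;     /* 8-bit source register number */
    uint32_t rd;     /* 8-bit destination register number */
    uint32_t shamt;  /* 8-bit ru/shamt field */
};

/* Assumed layout: opcode in the low 16 bits of the upper word;
   rs | rt | rd | ru/shamt, 8 bits each, in the lower word. */
static struct reg_fields decode_register_format(uint32_t upper, uint32_t lower)
{
    struct reg_fields f;
    f.opcode = upper & 0xFFFFu;
    f.rs     = (lower >> 24) & 0xFFu;
    f.rt     = (lower >> 16) & 0xFFu;
    f.rd     = (lower >> 8) & 0xFFu;
    f.shamt  = lower & 0xFFu;
    return f;
}
```

For example, decoding the word pair 0x00000055 and 0x01020A01 (the bit patterns stored at mem20 and mem21 in the fetch.vhd listing of this report) under these assumptions gives rs = 1, rt = 2, rd = 10 and shamt = 1.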
The bits are divided into groups or parts called fields. Each field is assigned a specific
item, such as operation code, a constant value or a register file address.
The operation code of an instruction, often shortened to "opcode", is a group of bits in the
instruction format that determines which operation is to be carried out by the processor.
Instructions are distinguished from one another by their opcodes. For example, the opcode for the
ADD instruction is 0x40 (01000000 in binary), while the opcode for the SUB instruction is 0x44
(01000100). In SimpleScalar, opcodes are written in hexadecimal, and the opcodes of all
instructions are 8 bits wide. The opcode field of the instruction format, however, is 16 bits.
Therefore, the remaining 8 bits must be filled using either zero fill or sign extension. In this
architecture, zero fill is used. [1]
The constant value is the immediate value carried in the instruction. In SimpleScalar, the
16-bit immediate field supports unsigned values from 0 to 65535 (2^16 - 1). [1]
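The difference between zero fill and sign extension of a 16-bit field can be shown with a short C sketch. The function names are hypothetical and the code is not taken from the SimpleScalar sources; it only illustrates why a 16-bit field read as unsigned covers 0 to 65535 (2^16 = 65536 distinct values), while the same bits read as signed cover -32768 to 32767.

```c
#include <stdint.h>

/* Zero fill: keep the low 16 bits, upper bits become 0. */
static uint32_t zero_extend16(uint32_t word)
{
    return word & 0xFFFFu;
}

/* Sign extension: copy bit 15 into all upper bits. */
static int32_t sign_extend16(uint32_t word)
{
    uint32_t low = word & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}
```

With the bit pattern 0xFFFF in the field, zero fill yields 65535 and sign extension yields -1; for small positive values such as 7 the two agree.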
For the full list of instructions, please refer to Appendix 3: SimpleScalar Instructions.
The file pisa.def defines all aspects of the SimpleScalar instruction set architecture. Each
instruction in the architecture has a DEFINST macro call. Here, shows example on



















Figure 4: Instruction Set
Source: Todd M. Austin, "SimpleScalar Hacker's Guide" [8]
Figure 4 shows how an instruction is defined in pisa.def. The instruction is the
ADD arithmetic operation. The operation involves:
1. Reading the values of general purpose registers RS and
RT.
2. Performing the operation, i.e. adding the values of general purpose registers RS and
RT.
3. Writing (storing) the result in general purpose register RD.
The opcode of this instruction is 0x40 in hexadecimal or 01000000 in binary. Different
operations use different opcodes. Since the instruction is an arithmetic operation on
integers, its functional unit requirement is IntALU. The definition also uses a
helper function, which is available to assist in the construction of instruction expressions.
The OVER(GPR(RS), GPR(RT)) helper performs overflow checking: it checks whether
the result of the operation overflows. If an overflow has occurred,
DECLARE_FAULT(md_fault_overflow) is called. [8]
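The kind of check that an overflow helper performs before a signed 32-bit addition can be sketched as below. This is an assumption-laden illustration, not the actual OVER macro from the SimpleScalar sources: the names add_overflows and checked_add are hypothetical, and a false return value stands in for raising the overflow fault.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if a + b does not fit in a signed 32-bit integer.
   The test is done without performing the overflowing addition. */
static bool add_overflows(int32_t a, int32_t b)
{
    if (a > 0 && b > 0)
        return a > INT32_MAX - b;   /* positive overflow */
    if (a < 0 && b < 0)
        return a < INT32_MIN - b;   /* negative overflow */
    return false;                   /* mixed signs cannot overflow */
}

/* Stores a + b in *sum and returns true, or returns false on overflow
   (the point at which a simulator would declare a fault). */
static bool checked_add(int32_t a, int32_t b, int32_t *sum)
{
    if (add_overflows(a, b))
        return false;
    *sum = a + b;
    return true;
}
```

Checking the operands before adding, rather than inspecting the result afterwards, avoids relying on signed overflow, which is undefined behaviour in C.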
4.2. Register
This module implements the SimpleScalar architected register state, which includes
integer and floating point registers and miscellaneous registers. The architected register
state is as follows:
Integer Register File:
(aka general-purpose registers, GPRs)
Floating point Register File:
single-precision: double-precision:
[Register diagram from regs.h; surviving labels: $f0, $f1 (for double), FCC, Mult/Div HI val]




Source: SimpleScalar Source Code (regs.h) [10]
The floating point register file can be viewed as either 32 single-precision (32-bit IEEE
format) floating point values $f0 to $f31, or as 16 double-precision (64-bit IEEE format)
floating point values $f0 to $f30 (the even-numbered registers). [10]
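The two views of the floating point register file can be sketched with a C union. The type and function names here are hypothetical, and the sketch assumes a host with 4-byte float and 8-byte IEEE double, so that 32 singles and 16 doubles occupy the same 128 bytes and double number k overlaps singles 2k and 2k+1.

```c
#include <stdint.h>
#include <string.h>

/* One storage area, three views of it. */
union fp_regfile {
    float    f[32];   /* single-precision view */
    double   d[16];   /* double-precision view */
    uint32_t raw[32]; /* raw 32-bit words, for inspection */
};

/* Clear every register in the file. */
static void fp_clear(union fp_regfile *regs)
{
    memset(regs, 0, sizeof *regs);
}

/* Index of the first single-precision register overlapped by double k. */
static int first_single_of_double(int k)
{
    return 2 * k;
}
```

Writing 1.0 to d[3], for instance, changes only raw[6] and raw[7]; the neighbouring single-precision registers are untouched.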
The table below shows the definitions of the SimpleScalar architecture registers.
Hardware Name   Software Name   Description
$0              $zero           zero-valued source/sink
$1              $at             reserved by assembler
$2-$3           $v0-$v1         fn return result regs
$4-$7           $a0-$a3         fn argument value regs
$8-$15          $t0-$t7         temp regs, caller saved
$16-$23         $s0-$s7         saved regs, callee saved
$24-$25         $t8-$t9         temp regs, caller saved
$26-$27         $k0-$k1         reserved by OS
$28             $gp             global pointer
$29             $sp             stack pointer
$30             $s8             saved regs, callee saved
$31             $ra             return address reg
$hi             $hi             high result register
$lo             $lo             low result register
$f0-$f31        $f0-$f31        floating point registers
$fcc            $fcc            floating point condition code
Table 1: Definitions of SimpleScalar architecture registers
Source: Doug Burger, Todd M. Austin, "The SimpleScalar Tool Set, Version 2.0" [1]
These are the registers defined in the SimpleScalar architecture, with their hardware names,
software names and descriptions. Note that the registers used by SimpleScalar are the same as
those of the MIPS IV ISA. [1]
4.3. Operation Cycle
The basic operation cycle of a computer is controlled by a control unit and consists of the
following steps:
Step 1: Fetch the instruction from memory into a control register
Step 2: Decode the instruction
Step 3: Locate the operands used by the instruction
Step 4: Fetch operands from memory (if necessary)
Step 5: Execute the operation in processor register
Step 6: Store the results in the proper locations
Step 7: Repeat Step 1 with next instruction
Figure 6: General Operation Cycle
Source: M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals" [9]
There is a register in the computer called the Program Counter (PC) that keeps track of
the instructions in the program stored in the memory. The PC holds the address of the
instruction to be executed next and is incremented by one each time a word is read from
the program in memory. The decoding done in Step 2 determines the operation to be
performed and the addressing mode of the instruction. The operands in Step 3 are located
from the addressing mode and the address field of the instruction. The computer executes
the instruction, stores the results and returns to Step 1 to fetch the next instruction in
sequence. [9]
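The general operation cycle can be sketched as a small interpreter loop in C. Everything in this sketch is hypothetical and chosen only for illustration: a toy machine with four registers, a 32-bit instruction word laid out as opcode (bits 31-24), rd (bits 23-16), rs (bits 15-8) and rt or an 8-bit constant (bits 7-0), and three made-up opcodes. It is not the SimpleScalar encoding.

```c
#include <stdint.h>
#include <stddef.h>

/* Made-up opcodes for the toy machine. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2 };

struct toy_cpu {
    uint32_t pc;     /* index of the next instruction word */
    uint32_t reg[4]; /* register file */
};

static void toy_run(struct toy_cpu *cpu, const uint32_t *program, size_t len)
{
    while (cpu->pc < len) {
        uint32_t inst = program[cpu->pc]; /* Step 1: fetch */
        cpu->pc += 1;                     /* PC advances by one word */

        uint32_t op = inst >> 24;         /* Step 2: decode */
        uint32_t rd = (inst >> 16) & 0x3u;
        uint32_t rs = (inst >> 8) & 0x3u; /* Step 3: locate operands */
        uint32_t lo = inst & 0xFFu;

        switch (op) {                     /* Steps 5 and 6: execute, store */
        case OP_LOADI:
            cpu->reg[rd] = lo;
            break;
        case OP_ADD:
            cpu->reg[rd] = cpu->reg[rs] + cpu->reg[lo & 0x3u];
            break;
        default:                          /* OP_HALT or unknown opcode */
            return;
        }
    }                                     /* Step 7: repeat */
}
```

Running the four-word program LOADI r0, 5; LOADI r1, 7; ADD r2, r0, r1; HALT leaves 12 in r2, with the PC pointing one word past the HALT instruction.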
4.4. SimpleScalar's Operation Cycle
I-Cache D-Cache
Figure 7: SimpleScalar's Operation Cycle
Source: Doug Burger, Todd M. Austin, "The SimpleScalar Tool Set, Version 2.0" [1]
Figure 7 shows the operation cycle of SimpleScalar processors. The concept of
SimpleScalar's operation cycle is similar to the general operation cycle
discussed before. The only difference is the terminology used for the Dispatch,
Scheduler and Writeback processes; their purpose is the same as in the general
operation cycle. There are 6 cycles of SimpleScalar processors, which are Fetch,











Figure 8: SimpleScalar's Operation Cycle (C Language)
Source: SimpleScalar Source Code (sim-outorder.c) [10]
Figure 8 shows the C language implementation of SimpleScalar's operation cycle. In
sim-outorder.c, this operation cycle is implemented as a pipeline whose stages are executed in
reverse order, from Commit to Fetch. According to Doug Burger, this eliminates this/next
state synchronization and relaxation problems. [1]
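Why calling the stages in reverse order helps can be seen in a small C sketch. It is a hypothetical three-stage pipeline, not the sim-outorder.c code: each latch holds the value passed from one stage to the next, and 0 means "empty". Because the last stage runs first, every stage reads the value its predecessor produced in the previous cycle before that predecessor overwrites it, so no separate "this state" and "next state" copies are needed.

```c
#include <stdint.h>

/* Hypothetical pipeline state; 0 in a latch means "empty". */
struct pipe {
    uint32_t fetch_to_exec;  /* latch between fetch and execute */
    uint32_t exec_to_commit; /* latch between execute and commit */
    uint32_t committed;      /* last value that left the pipeline */
    uint32_t next_input;     /* next value the fetch stage will bring in */
};

static void commit_stage(struct pipe *p)
{
    if (p->exec_to_commit != 0)
        p->committed = p->exec_to_commit;
    p->exec_to_commit = 0;
}

static void exec_stage(struct pipe *p)
{
    p->exec_to_commit = p->fetch_to_exec;
    p->fetch_to_exec = 0;
}

static void fetch_stage(struct pipe *p)
{
    p->fetch_to_exec = p->next_input++;
}

/* One clock cycle: stages are called from last to first. */
static void cycle(struct pipe *p)
{
    commit_stage(p);
    exec_stage(p);
    fetch_stage(p);
}
```

Starting with next_input = 1, the first value is committed only in the third cycle, that is, each value spends one cycle in each stage. If the stages were called from first to last with the same latches, a value would pass through all three stages within a single call to cycle.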
4.5. Software
To complete this project, I used several software tools for development:
editors, compilers, synthesizers and simulators. The development
platform of this project is the Windows XP operating system.
4.5.1. Crimson Editor
Figure 9: Crimson Editor
Crimson Editor is a professional source code editor for the Windows platform. It can be
downloaded free from the Internet at http://www.crimsoneditor.com/.
This software supports many programming languages such as HTML, C/C++, Perl, Java
and even VHDL. One feature of this editor is that it enables syntax highlighting of all




Figure 10: Borland C
I have used two C compilers, Borland C and Microsoft Visual Studio. Both
programs can be used to edit, view and compile C source code. However, I will not be
using them to compile the SimpleScalar source code. Rather, these
programs are used to check and test the SimpleScalar source code. This involves





GHDL is one of the software tools I used in this final year project. GHDL is a VHDL
simulator that uses GCC technology and implements the VHDL language according to
the VHDL 1987 (IEEE 1076-1987) and VHDL 1993 (IEEE 1076-1993) standards. With
GHDL, programs and designs written in VHDL can be compiled into executable
files. With the binary files created by compilation, the design can be simulated. [12]
GHDL is an open source project and is free under the GNU General Public License. Under
this license, the software can be redistributed and modified. It is free from
the restrictions and license issues that arise with commercial simulators. Currently,
two processors have been successfully compiled and run using GHDL: the DLX
processor and the LEON1 SPARC processor. [12]
However, it has a disadvantage compared with commercial simulator software: the designs
cannot be synthesized. GHDL cannot translate a design into a netlist, so the design cannot be
transferred onto an FPGA with it. [12]
4.5.4. Altera Quartus II Web Edition




Figure 12: Altera Quartus II Web Edition
The main software for VHDL development is Quartus II Web Edition. This software
can be obtained free from the Altera site, http://www.altera.com/. A license is required, and it
can be requested free of charge at that website. This software supports Cyclone II of















Table 2: Quartus II Web Edition Device Support








Figure 13: Basic Design Flow
Source: Altera, Quartus II Software Basic Design Flow [13]
Figure 13 shows the basic design flow for the Quartus II software. Users can set up a
project and compile a design by following these steps. Altera defines 6 stages for developing
a VHDL design. [13]
The first stage is creating a new project. This stage involves declaring the entity or
component, the design files and libraries used in the project, and the device family and
package used by the project. Next is making assignments. This stage requires specifying
the global maximum operating frequency requirement (fMAX), paths that should not be reported
in timing analysis reports, and other settings. [13]
The next step is to compile the design and analyze the results. Before the project can be
simulated and implemented, it must be verified. Here, the syntax of each entity,
component and architecture is checked. After the design is compiled, a report
summary of the compilation results is shown automatically. This report shows all the
place-and-route result details and is linked to many other software features. [13]
If the results are not satisfactory and need to be improved,
the settings can be changed using the assignment settings (Assignment Editor) or by
changing the timing requirements in the Timing Wizard. Then, the design is compiled again
and the results are analyzed. By default, the software automatically assigns pins to the
top-level I/O signals. This can also be done manually using the Assignment Editor. [13]
4.6. Hardware
4.6.1. DSP Development Kit Cyclone II
Figure 14: Altera Cyclone II EP2C35 FPGA
Figure 14 shows the hardware on which I will implement the SimpleScalar
processor. The hardware is the Cyclone II EP2C35 FPGA. An overview is summarized below:
• Logic Elements: 33,216
• M4K RAM Blocks (4 kbits + 512 Parity Bits): 105
• Total RAM Bits: 483,840
• Embedded 18x18 Multipliers: 35
• PLLs: 4
• Maximum User I/O Pins: 475
• Differential Channels: 205
Source: Cyclone II FPGA Family Overview [13]
4.7. SimpleScalar in VHDL
In this section, I will describe the VHDL implementation of the SimpleScalar processor. In
the implementation, the processor is divided into five cycles, which are Fetch, Decode,
Control, Execute and Memory.
Fetch → Decode → Control → Execute → Memory
Figure 15: SimpleScalar's Operation Cycle (VHDL Implementation)
4.7.1. Fetch
In the Fetch cycle, the stored instructions are bring out from the memory and send to the
bus line. Then, these instructions will be decoded in the Decode cycle.
constant mem0 : bit_vector(31 downto 0) := B"00000000000000000000000000000000"; -- no operation
constant mem1 : bit_vector(31 downto 0) := B"00000000000000000000000000000000";
constant mem2 : bit_vector(31 downto 0) := B"00000000000000000000000001010000"; -- load opcodes
constant mem3 : bit_vector(31 downto 0) := B"00000001000000010000000000000111";
constant mem20: bit_vector(31 downto 0) := B"00000000000000000000000001010101"; -- shift left
constant mem21: bit_vector(31 downto 0) := B"00000001000000100000101000000001";
constant mem22: bit_vector(31 downto 0) := B"00000000000000000000000001010111"; -- shift right
constant mem23: bit_vector(31 downto 0) := B"00000001000000100000101000000001";
Source: fetch.vhd [Appendix IV]
In the source code above, mem0 to mem23 represent the stored instructions. Each
instruction word is 32 bits wide. For simplicity, each instruction word is stored at a
fixed memory location: the word 00000000000000000000000000000000 is stored at
mem0, the word 00000001000000010000000000000111 is stored at mem3, and
so on.
A SimpleScalar instruction is divided into two parts, as defined in the C source code:
the SimpleScalar opcode and the SimpleScalar unsigned immediate fields.
Each part is an unsigned word and is 32 bits wide.
typedef struct {
word_t a; /* simplescalar opcode (must be unsigned) */
word_t b; /* simplescalar unsigned immediate fields */
} md_inst_t;
Source: SimpleScalar Source Code (pisa.h) [10]
Register & Immediate format
Bits 63 to 32: SimpleScalar opcodes | Bits 31 to 0: SimpleScalar unsigned immediate fields
Figure 16: SimpleScalar Instruction (Register & Immediate Format)
Source: SimpleScalar Tools Set [1]
The SimpleScalar unsigned immediate fields occupy bits 0 to 31 of a SimpleScalar
instruction and the SimpleScalar opcodes occupy bits 32 to 63. The
instructions are fetched from the memory according to the program counter by using
the macro MD_FETCH_INST (see the source code below).
#define MD_FETCH_INST(INST, MEM, PC) \
{ inst.a = MEM_READ_WORD(mem, (PC)); \
inst.b = MEM_READ_WORD(mem, (PC) + sizeof(word_t)); }
MD_FETCH_INST(inst, mem, regs.regs_NPC);

















The source code above shows how the instructions are fetched in VHDL.
When the PC is X"00", the instruction word at mem0 is sent to insta (as the SimpleScalar
opcode) and the instruction word at mem1 is sent to instb (as the SimpleScalar immediate
fields).
entity fetch is
port(insta : out bit_vector(31 downto 0); -- inst.a
instb : out bit_vector(31 downto 0); -- inst.b
PCin : in bit_vector(7 downto 0); -- PC
clock, reset: in std_logic);
end fetch;
Source: fetch.vhd [Appendix IV]
In fetch.vhd, there are three inputs, which are the PC (program counter), clock and reset,
and two outputs, which are insta and instb. insta carries the SimpleScalar opcodes and instb
carries the SimpleScalar immediate fields. These outputs are the inputs to the Decode cycle.
(Refer to Appendix IV: fetch.vhd for the source code)
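As a rough illustration of the fetch step described above, the following C sketch models the instruction store as an array of 32-bit words and returns the pair of words selected by the PC. The names imem, inst_t and fetch are hypothetical and are not taken from fetch.vhd. The mapping from PC to word index (PC divided by 2) is an assumption inferred from the listing in Appendix IV, where PC X"20" selects mem16. Only the first four stored words are reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* One instruction as two 32-bit words, mirroring md_inst_t in pisa.h. */
typedef struct {
    uint32_t a; /* opcode word (insta) */
    uint32_t b; /* immediate-fields word (instb) */
} inst_t;

/* Hypothetical instruction store: element i corresponds to constant mem<i>. */
static const uint32_t imem[4] = {
    0x00000000u, /* mem0: no operation, opcode word */
    0x00000000u, /* mem1: no operation, fields word */
    0x00000050u, /* mem2: load, opcode word */
    0x01010007u  /* mem3: load, fields word */
};

/* Assumption: word index = PC / 2, so PC 0x00 selects mem0/mem1 and
 * PC 0x04 selects mem2/mem3. Out-of-range PCs return an all-zero pair. */
static inst_t fetch(const uint32_t *mem, size_t n_words, uint8_t pc)
{
    inst_t inst = {0u, 0u};
    size_t i = (size_t)pc / 2u;
    if (i + 1u < n_words) {
        inst.a = mem[i];
        inst.b = mem[i + 1u];
    }
    return inst;
}
```

For example, fetch(imem, 4, 0x04) returns the load instruction, with a equal to 0x00000050 and b equal to 0x01010007.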
4.7.2. Decode
In the Decode cycle, the fetched instructions are split into specific fields, which are
OP (opcode), RS (register source #1), RT (register source #2), RD (register destination)
and IMM (immediate value). The implementation of the Decode cycle in C can be
seen as follows:
/* returns the opcode field value of SimpleScalar instruction INST */
#define MD_OPFIELD(INST) (INST.a & 0xff)
#define MD_SET_OPCODE(OP, INST) ((OP) = ((INST).a & 0xff))
/* integer register specifiers */
#undef RS /* defined in /usr/include/sys/syscall.h on HPUX boxes */
#define RS (inst.b >> 24) /* reg source #1 */
#define RT ((inst.b >> 16) & 0xff) /* reg source #2 */
#define RD ((inst.b >> 8) & 0xff) /* reg dest */
Source: SimpleScalar Source Code (pisa.h) [10]











Figure 17: SimpleScalar Opcodes (Register & Immediate Format)
Source: SimpleScalar Tools Set [1]
Figure 17 shows the SimpleScalar opcodes for the Register and Immediate formats. The
SimpleScalar opcode word has two fields, which are annote and opcode. The
annote field allows new instructions to be added to
the current instruction set. The length of this field is 16 bits. The opcode field
defines the SimpleScalar operation code. In other words, the operation that an
instruction executes depends on this field. For example, in SimpleScalar,
the addition of two registers is performed when the opcode is 0x40. If the
fetched instruction has the opcode 0x40 in this field, the addition is executed.












8-rs 8-rt 8-rd 8-ru/shamt
SimpleScalar immediate fields
Figure 18: SimpleScalar Immediate Fields (Register Format)
Immediate format
SimpleScalar immediate fields
Figure 19: SimpleScalar Immediate Fields (Immediate Format)
Figures 18 and 19 show the SimpleScalar immediate fields. The Register format has
4 fields and the Immediate format has only 3 fields. In the Register format, there
are an 8-bit register source #1 (RS), an 8-bit register source #2 (RT), an 8-bit register
destination (RD) and an 8-bit shift amount (RU/SHAMT). In the Immediate format, there are
an 8-bit register source (RS), a 16-bit immediate value (IMM) and an 8-bit register
destination (RT).
(Refer to Appendix IV: decode.vhd for the source code)
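The field layout above can be sketched in C as follows. This is only an illustration: the struct fields_t and the function decode are hypothetical names, and the bit positions follow the slices used in decode.vhd (annote and opcode from the opcode word; rs, rt, rd and ru/shamt from the immediate-fields word). Taking the 16-bit immediate from the low half of the second word is an assumption, since the corresponding assignment is not reproduced in this report.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of one instruction. */
typedef struct {
    uint16_t annote; /* opcode word, bits 31..16 */
    uint16_t opcode; /* opcode word, bits 15..0 */
    uint8_t  rs;     /* fields word, bits 31..24 */
    uint8_t  rt;     /* fields word, bits 23..16 */
    uint8_t  rd;     /* fields word, bits 15..8 */
    uint8_t  shamt;  /* fields word, bits 7..0 (ru/shamt) */
    uint16_t imm;    /* fields word, bits 15..0 (assumed position) */
} fields_t;

/* Split the two 32-bit instruction words (a: opcode word, b: fields word). */
static fields_t decode(uint32_t a, uint32_t b)
{
    fields_t f;
    f.annote = (uint16_t)(a >> 16);
    f.opcode = (uint16_t)(a & 0xffffu);
    f.rs     = (uint8_t)(b >> 24);
    f.rt     = (uint8_t)((b >> 16) & 0xffu);
    f.rd     = (uint8_t)((b >> 8) & 0xffu);
    f.shamt  = (uint8_t)(b & 0xffu);
    f.imm    = (uint16_t)(b & 0xffffu);
    return f;
}
```

Applied to the shift-left instruction stored at mem20 and mem21, decode(0x00000055, 0x01020A01) gives opcode 0x55, rs 1, rt 2, rd 10 and shamt 1.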
4.7.3. Control
The purpose of the Control cycle is to control the movement of data between the registers
and the memory.
ra_bus : out bit_vector(31 downto 0);




For the register design, there are two outputs, which are reg_wrt and reg_dst. reg_wrt
controls writing to the registers, while reg_dst is required to select





Figure 20: Register Selection
wra_bus <= rt when reg_dst='1' else rs;
Source: decode.vhd [Appendix IV]
Figure 20 shows the register selection. The purpose is to select which register, either RS








reg0wr <= '1' when ((wra_bus = "00000000") and (reg_wrt='1')) else '0';
reg1wr <= '1' when ((wra_bus = "00000001") and (reg_wrt='1')) else '0';
reg2wr <= '1' when ((wra_bus = "00000010") and (reg_wrt='1')) else '0';
muxreg(0) <= reg(0) when reg0wr='0' else wrd_bus;
muxreg(1) <= reg(1) when reg1wr='0' else wrd_bus;
muxreg(2) <= reg(2) when reg2wr='0' else wrd_bus;
Source: decode.vhd [Appendix IV]
This design selects the register output between the intermediate register and the
memory. The intermediate register is the register currently in use and is labelled reg(n),
where n is between 0 and 31.
(Refer to Appendix IV: control.vhd for the source code)
4.7.4. Execute
In the Execute cycle, the operation of an instruction is carried out. The operation to be
executed is selected by the opcode field. In this section, only integer
instructions are implemented.
Integer instructions
The arithmetic operations are unsigned addition, unsigned subtraction and unsigned
multiplication, all without overflow checking. The logical operations are AND, OR,
XOR and NOR. The other operations are shift left logical and shift
right logical. Here, unsigned addition is explained.
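To make the opcode-based selection concrete, the following C sketch dispatches on the opcode and applies the matching integer operation. It is a minimal sketch and not the execute.vhd design: the function alu and the enum names are hypothetical, and the opcode values are read from the comments in the fetch.vhd listing (Appendix IV). Returning only the low 32 bits of the product, shifting the first operand, and returning 0 for an unknown opcode are assumptions made for the illustration.

```c
#include <stdint.h>

/* Opcode values as they appear in the fetch.vhd instruction constants. */
enum {
    OP_ADDU  = 0x42, /* add unsigned */
    OP_SUBU  = 0x45, /* subtract unsigned */
    OP_MULTU = 0x47, /* multiply unsigned */
    OP_AND   = 0x4E,
    OP_OR    = 0x50,
    OP_XOR   = 0x52,
    OP_NOR   = 0x54,
    OP_SLL   = 0x55, /* shift left logical */
    OP_SRL   = 0x57  /* shift right logical */
};

/* Hypothetical ALU: selects the operation from the opcode field.
 * No overflow checking is done, matching the unsigned instructions. */
static uint32_t alu(uint16_t opcode, uint32_t x, uint32_t y, uint8_t shamt)
{
    switch (opcode) {
    case OP_ADDU:  return x + y;
    case OP_SUBU:  return x - y;
    case OP_MULTU: return x * y;            /* low 32 bits only (assumption) */
    case OP_AND:   return x & y;
    case OP_OR:    return x | y;
    case OP_XOR:   return x ^ y;
    case OP_NOR:   return ~(x | y);
    case OP_SLL:   return x << (shamt & 31u);
    case OP_SRL:   return x >> (shamt & 31u);
    default:       return 0u;               /* unknown opcode (assumption) */
    }
}
```

With the register values used in the simulation, alu(OP_OR, 0x00001010, 0x0000100F, 0) evaluates to 0x0000101F and alu(OP_SRL, 0x0000100F, 0, 1) evaluates to 0x00000807.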
A full adder is a combinational circuit that performs the arithmetic addition of three
input bits and produces two outputs. Two of the inputs are the bits to be added, while the
third input is the carry bit from the previous adder (if any). The three inputs are denoted
by A, B and Cin. The two outputs are denoted by S and Cout. [9]
A  B  Cin | Cout  S
0  0  0   |  0    0
0  0  1   |  0    1
0  1  0   |  0    1
0  1  1   |  1    0
1  0  0   |  0    1
1  0  1   |  1    0
1  1  0   |  1    0
1  1  1   |  1    1
Table 3: Full Adder Truth Table
Source: M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals" [9]
The simplified sum-of-products functions of the two outputs are:
$$S = \bar{A}\bar{B}C_{in} + \bar{A}B\bar{C}_{in} + A\bar{B}\bar{C}_{in} + ABC_{in}$$
$$C_{out} = AB + BC_{in} + AC_{in}$$
Source: M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals" [9]
This implementation requires seven AND gates and two OR gates. However, the functions
can be simplified further and expressed as:
$$S = (A \oplus B) \oplus C_{in}$$
$$C_{out} = AB + C_{in}(A \oplus B)$$
Source: M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals" [9]
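As a quick check that the sum-of-products form and the XOR form describe the same full adder, the C sketch below evaluates both for all eight input combinations. The function names are hypothetical, and each input is assumed to be a single bit with value 0 or 1.

```c
/* Sum-of-products form of the full adder outputs (inputs are 0 or 1). */
static unsigned sum_sop(unsigned a, unsigned b, unsigned cin)
{
    unsigned na = a ^ 1u, nb = b ^ 1u, nc = cin ^ 1u; /* complements */
    return (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin);
}

static unsigned carry_sop(unsigned a, unsigned b, unsigned cin)
{
    return (a & b) | (b & cin) | (a & cin);
}

/* XOR-based form of the same two outputs. */
static unsigned sum_xor(unsigned a, unsigned b, unsigned cin)
{
    return (a ^ b) ^ cin;
}

static unsigned carry_xor(unsigned a, unsigned b, unsigned cin)
{
    return (a & b) | (cin & (a ^ b));
}

/* Returns 1 if both forms agree on every row of the truth table. */
static int forms_agree(void)
{
    for (unsigned row = 0; row < 8u; row++) {
        unsigned a = (row >> 2) & 1u, b = (row >> 1) & 1u, cin = row & 1u;
        if (sum_sop(a, b, cin) != sum_xor(a, b, cin) ||
            carry_sop(a, b, cin) != carry_xor(a, b, cin)) {
            return 0;
        }
    }
    return 1;
}
```

Calling forms_agree() returns 1, which confirms that the two forms match on all rows of the truth table.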
[Figure: chain of 32 full adders with inputs A31..A0 and B31..B0 and outputs S31..S0 and Cout]
Figure 22: 32-bit Full Adder
Figure 22 shows the 32-bit full adder that is implemented.
result(index) := op1(index) xor op2(index) xor carry;
carry := (op1(index) and op2(index)) or (carry and (op1(index) xor op2(index)));
Source: execute.vhd [Appendix IV]
(Refer to Appendix IV: execute.vhd for the source code)
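The bit-by-bit addition performed by the two VHDL statements above can be sketched in C as a 32-bit ripple-carry adder. This is an illustrative model only: the function name ripple_add32 and the optional carry_out pointer are hypothetical, and the loop runs from bit 0 upwards so that each stage uses the carry of the previous one.

```c
#include <stdint.h>

/* Adds x and y one bit at a time, as a chain of 32 full adders.
 * If carry_out is not a null pointer, it receives the final carry. */
static uint32_t ripple_add32(uint32_t x, uint32_t y, unsigned *carry_out)
{
    uint32_t result = 0u;
    unsigned carry = 0u;
    for (int i = 0; i < 32; i++) {
        unsigned a = (x >> i) & 1u;
        unsigned b = (y >> i) & 1u;
        unsigned s = a ^ b ^ carry;               /* sum bit */
        carry = (a & b) | (carry & (a ^ b));      /* carry to the next stage */
        result |= (uint32_t)s << i;
    }
    if (carry_out) {
        *carry_out = carry;
    }
    return result;
}
```

For example, adding 0x00001010 and 0x0000100F gives 0x0000201F with a final carry of 0, while adding 0xFFFFFFFF and 1 gives 0 with a final carry of 1.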
4.7.5. Memory
In the Memory cycle, the operations that the memory performs are writing and reading.
Writing transfers data into the memory to be stored. Reading retrieves
stored data from the memory.
rd_bus is the data read from the memory, ra_bus is the read address from the bus line,
wd_bus is the data written to the memory and wa_bus is the write address to the bus line.
rd_bus acts as an output, while ra_bus, wd_bus and wa_bus act as inputs to memory.vhd.
mem_wrt, mem_red and mem_reg are inputs from the Control cycle.
rd_bus : out bit_vector(31 downto 0);
ra_bus : in bit_vector(31 downto 0);
wd_bus : in bit_vector(31 downto 0);
wa_bus : in bit_vector(31 downto 0);
mem_wrt : in std_logic;
mem_red : in std_logic;
mem_reg : in std_logic;
clock, reset : in std_logic;
Source: memory.vhd [Appendix IV]
In this project, the implementation of the SimpleScalar memory was not successful. It works
only in part: data can be read from the memory and stored into a given memory
location. However, during the Execute cycle, the data cannot be retrieved.
For example, the 32-bit data X"00001010" is stored at memory address
X"00000010". During execution, the data X"00001010" cannot be retrieved.
Further work is needed to investigate this error.
This section gives a general description of how instructions are read from
and stored into the memory.
Read




Figure 23: Read Cycle Timing Waveforms
Source: M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals" [9]
Figure 23 shows the read cycle timing waveforms of a general memory design. The steps
taken for a read operation are:
1. Apply the binary address of the desired word to the address lines.
2. Activate the Read input.
Source: M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals" [9]
Write
[Waveform signals: Clock (T1 to T4), Address, Memory Enable, Read/Write, Data Input]
Figure 24: Write Cycle Timing Waveform
Source: M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals" [9]
Figure 24 shows the write cycle timing waveforms of a general memory design. The steps
that must be taken for a write operation are:
1. Apply the binary address of the desired word to the address lines.
2. Apply the data bits that must be stored in memory to the data input lines.
3. Activate the Write input.
Source: M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals" [9]
(Refer to Appendix IV: memory.vhd for the source code)
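The read and write steps above can be sketched in C as a small word-addressed memory with separate read and write enables. This is a behavioural illustration under stated assumptions, not the memory.vhd design: the type memory_t, the functions mem_write and mem_read, the 256-word size and the use of the address as a word index are all hypothetical choices.

```c
#include <stdint.h>

#define MEM_WORDS 256u /* hypothetical memory size in 32-bit words */

typedef struct {
    uint32_t word[MEM_WORDS];
} memory_t;

/* Write: apply the address, apply the data, then activate the write input.
 * Returns 1 if the word was stored, 0 otherwise. */
static int mem_write(memory_t *m, uint32_t addr, uint32_t data, int write_en)
{
    if (!write_en || addr >= MEM_WORDS) {
        return 0;
    }
    m->word[addr] = data;
    return 1;
}

/* Read: apply the address, then activate the read input.
 * Returns 1 and fills *data_out if the read was performed, 0 otherwise. */
static int mem_read(const memory_t *m, uint32_t addr, int read_en,
                    uint32_t *data_out)
{
    if (!read_en || addr >= MEM_WORDS) {
        return 0;
    }
    *data_out = m->word[addr];
    return 1;
}
```

Using the example from this section, writing X"00001010" to address X"00000010" with the write input active and then reading the same address with the read input active should return the same value.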
4.8. VHDL Simulation
In the VHDL implementation, only integer instructions were implemented. They
are unsigned addition, unsigned subtraction, unsigned multiplication, AND,
OR, XOR, NOR, shift left logical and shift right logical. In
the VHDL simulation, the following assumptions were made:
1. Only functionality is tested.




Figure 25: Unsigned Addition Operation
Figure 25 shows the result of an addition operation. The operation does not require











Figure 26: OR Operation







The result, 0x0000101F, is stored in register RD.
4.8.3. Shift Right Logical
Figure 27: Shift Right Logical
Figure 27 shows the shift right logical operation. The operation is:
0x0000100F >> 1    00000000000000000001000000001111 >> 1
0x00000807         00000000000000000000100000000111
The result, 0x00000807, is stored in register RD.
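The two worked results in this section can be checked with a few lines of C. The helper to_binary32 is a hypothetical name introduced here only to print a value in the 32-bit binary form used above.

```c
#include <stdint.h>

/* Writes the 32-bit binary representation of v into out (33 chars with
 * the terminating null) and returns out. */
static char *to_binary32(uint32_t v, char out[33])
{
    for (int i = 0; i < 32; i++) {
        out[i] = ((v >> (31 - i)) & 1u) ? '1' : '0';
    }
    out[32] = '\0';
    return out;
}
```

With this helper, to_binary32(0x0000100F >> 1, buf) yields the string 00000000000000000000100000000111, which is 0x00000807, and the expression 0x00001010 | 0x0000100F evaluates to 0x0000101F.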
CHAPTER 5: CONCLUSION
Through this project, I hoped to fulfill the objectives described above. The design
works, but only in VHDL simulation, and the project was not completed within
the given timeframe. Two factors contributed to this: the pace of development and
problems with synthesis.
Most of the development time in this project was spent studying the C source code of
SimpleScalar and VHDL programming. Many exercises and examples in C
and VHDL were worked through before the project was started. Because little source code
was available on the Internet, the project had to be started from scratch. Thus, it took
longer than expected.
The other factor is that the VHDL code implemented was not
synthesizable. During the project, I found DLX source code, which is similar to the MIPS
architecture, and developed the SimpleScalar architecture on it. However, when I tried
to compile the source code, it was not synthesizable. The source code must be synthesized
before it can be downloaded onto the FPGA, so this delayed the project
schedule.
Only integer instructions were implemented. They are unsigned addition, unsigned
subtraction, unsigned multiplication, AND, OR, XOR,
NOR, shift left logical and shift right logical.
In this project, the implementation on the FPGA was unsuccessful. The code developed
could be downloaded onto the FPGA. However, when I tried to run it, the board
did not work as expected.
CHAPTER 6: RECOMMENDATIONS
Redesign the Control Module
In this project, I implemented the Control module, which sits between the Decode and
Execute modules. The purpose of this module is to control the data movement between
the registers and the memory. However, this module does not work perfectly.
Therefore, for future work, I recommend redesigning the Control module.
Implement Other Instructions
In this project, only integer instructions were implemented. Other instructions such as
control instructions, load and store instructions, and floating point instructions are not
implemented yet. In the future, I recommend implementing these other types of instructions.
Program on FPGA
In this project, programming the FPGA was unsuccessful. The current source code is
divided into 5 architectures for ease of use. Each module has its own purpose, as described
earlier. In the future, to program the FPGA, I recommend testing and programming each




[1] Doug Burger, Todd M. Austin, "The SimpleScalar Tool Set, Version 2.0"
[2] John L. Hennessy & David A. Patterson, "Computer Architecture: A Quantitative
Approach", 2003, Third Edition, Morgan Kaufmann Publishers.
[3] Vincent P. Heuring & Harry F. Jordan, "Computer Systems Design and Architecture",
2004, Second Edition, Pearson Prentice Hall.
[4] SimpleScalar LLC, http://www.simplescalar.com/
[5] SimpleScalar Tools Home Page, http://www.cs.wisc.edu/~mscalar/simplescalar.html
[6] VHDL Tutorial, http://www.gmvhdl.com/introduc.htm/
[7] FPGA as Computing Platform,
http://www.informit.com/articles/article.asp?p=382614&rl=1
[8] Todd M. Austin, "SimpleScalar Hacker's Guide"
[9] M. Morris Mano, Charles R. Kime, "Logic and Computer Design Fundamentals",
2004, Third Edition, Pearson Prentice Hall.
[10] SimpleScalar Source Code
[11] Homepage of Crimson Editor, http://www.crimsoneditor.com/
[12] GHDL home page, http://ghdl.free.fr/
[13] Altera, http://www.altera.com/
[14] Cyclone II FPGA Overview,
http://www.altera.com/products/devices/cyclone2/overview/cy2-overview.html
[15] Charles Price, "MIPS Instruction Set", Revision 3.2, September 2005
[16] Todd Austin, Eric Larson, Dan Ernst, "SimpleScalar: An Infrastructure for
Computer System Modeling", February 2002, IEEE Journal
[17] Douglas L. Perry, "VHDL Programming by Example", 2002, Fourth Edition,
McGraw Hill
[18] H.M. Deitel, P.J. Deitel, "C How To Program", 2001, Third Edition, Prentice Hall
[19] The Hamburg VHDL Archive, http://tams-www.informatik.uni-hamburg.de/vhdl/





List of SimpleScalar Instruction Set
Control
j - jump                               blez - branch <= 0
jal - jump and link                    bgtz - branch > 0
jr - jump register                     bltz - branch < 0
jalr - jump and link register          bgez - branch >= 0
beq - branch == 0                      bc1t - branch FCC TRUE
bne - branch != 0                      bc1f - branch FCC FALSE
Load/Store
lb - load byte                         l.d - load double-precision FP
lbu - load byte unsigned               sb - store byte
lh - load half (short)                 sbu - store byte unsigned
lhu - load half unsigned               sw - store word
lw - load word                         dsw - store double word
dlw - load double word                 s.s - store single-precision FP
l.s - load single-precision FP         s.d - store double-precision FP
Integer Arithmetic
add - integer add                      or - logical OR
addu - integer add unsigned            xor - logical XOR
sub - integer subtract                 nor - logical NOR
subu - integer subtract unsigned       sll - shift left logical
mult - integer multiply                srl - shift right logical
multu - integer multiply unsigned      sra - shift right arithmetic
div - integer divide                   slt - set less than
divu - integer divide unsigned         sltu - set less than unsigned
and - logical AND
Floating Point Arithmetic
add.s - single-precision (SP) add      abs.d - DP absolute value
add.d - double-precision (DP) add      neg.s - SP negation
sub.s - SP subtract                    neg.d - DP negation
sub.d - DP subtract                    sqrt.s - SP square root
mult.s - SP multiply                   sqrt.d - DP square root
mult.d - DP multiply                   cvt - int, single, double conversion
div.s - SP divide                      c.s - SP compare
div.d - DP divide                      c.d - DP compare





Final Year Project I (weeks 1 to 14)
FYP Titles
FYP Briefing
Interim Report First Draft
Installing the Programs Required
Learning the Programs Required
Learning the VHDL
Programming the VHDL
Final Year Project II
Interim Report First Draft

























































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































Date: December 09, 2006 SIMPS.vhd
-- Abdul Azim bin Abdullah




use ieee.std_logic_arith.all;
-- SIMPS entity
entity SIMPS is
port(clock, reset : in std_logic;
PC_bus : in bit_vector(7 downto 0);
inst_out : out bit_vector(31 downto 0));
end SIMPS;
-- SIMPS architecture
architecture structure of SIMPS is
component fetch
port(insta : out bit_vector(31 downto 0);
instb : out bit_vector(31 downto 0);
PCin : in bit_vector(7 downto 0);
clock, reset : in std_logic);
end component;
component decode
port(insta : in bit_vector(31 downto 0);
instb : in bit_vector(31 downto 0);
opcode : out bit_vector(15 downto 0);
rsdata_bus : out bit_vector(31 downto 0);
rtdata_bus : out bit_vector(31 downto 0);
rddata_bus : out bit_vector(31 downto 0);
extend : out bit_vector(31 downto 0);
wrd_bus : in bit_vector(31 downto 0);
reg_wrt : in std_logic;
reg_dst : in std_logic;
clock, reset : in std_logic);
end component;
component control
port(PCin : in bit_vector(7 downto 0);
ra_bus : out bit_vector(31 downto 0);














port(opcode : in bit_vector(15 downto 0);
extend : in bit_vector(31 downto 0);
rsdata_bus : in bit_vector(31 downto 0);
rtdata_bus : in bit_vector(31 downto 0);
rddata_bus : in bit_vector(31 downto 0);
data_bus : out bit_vector(31 downto 0);




















signal pc_in : bit_vector(7 downto 0);
signal insta_bus : bit_vector(31 downto 0);






: bit_vector(31 downto 0)
: bit_vector(31 downto 0)
: bit_vector(31 downto 0)


















: bit_vector(31 downto 0);









DE : decode




















mem_reg => mem_reg);
EX : execute


















(fetch.vhd)
Date: December 09, 2006 fetch.vhd
-- Abdul Azim bin Abdullah




use ieee.std_logic_arith.all;
-- fetch entity
entity fetch is
port(insta : out bit_vector(31 downto 0);
instb : out bit_vector(31 downto 0);
PCin : in bit_vector(7 downto 0);
clock, reset : in std_logic);
end fetch;
-- fetch architecture
architecture behaviour of fetch is
signal PC : bit_vector(7 downto 0);
constant mem0 : bit_vector(31 downto 0)
00000000"; -- no operation
constant mem1 : bit_vector(31 downto 0)
00000000";
constant mem2 : bit_vector(31 downto 0)
01010000"; -- load opcodes ( load $30,7
constant mem3 : bit_vector(31 downto 0)
00000111";
constant mem4 : bit_vector(31 downto 0)
01010000"; -- load opcodes ( load $20,8
constant mem5 : bit_vector(31 downto 0)
00001000";
constant mem6 : bit_vector(31 downto 0)
01000010"; -- add unsigned
constant mem7 : bit_vector(31 downto 0)
00000000";
constant mem8 : bit_vector(31 downto 0)
01000101"; -- subtract unsigned
constant mem9 : bit_vector(31 downto 0)
00000000";
constant mem10: bit_vector(31 downto 0)
01001110"; -- and
constant mem11: bit_vector(31 downto 0)
00000000";
constant mem12: bit_vector(31 downto 0)
01010000"; -- or
constant mem13: bit_vector(31 downto 0)
00000000";
constant mem14: bit_vector(31 downto 0)
01010010"; -- xor





















constant mem15: bit_vector(31 downto 0) := B"00000001000000100000101000000000";
constant mem16: bit_vector(31 downto 0) := B"00000000000000000000000001010100"; -- nor
constant mem17: bit_vector(31 downto 0) := B"00000001000000100000101000000000";
constant mem18: bit_vector(31 downto 0) := B"00000000000000000000000001000111"; -- multiply unsigned
constant mem19: bit_vector(31 downto 0) := B"00000001000000100000101000000000";
constant mem20: bit_vector(31 downto 0) := B"00000000000000000000000001010101"; -- shift left logical
constant mem21: bit_vector(31 downto 0) := B"00000001000000100000101000000001";
constant mem22: bit_vector(31 downto 0) := B"00000000000000000000000001010111"; -- shift right logical





wait until (clock'event) and (clock='1');
if reset='1' then
PC <= X"00";
































insta <= mem14;
instb <= mem15;
when X"20" =>
insta <= mem16;





















Date: December 09, 2006 decode.vhd
-- Abdul Azim bin Abdullah




use ieee.std_logic_arith.all;
-- decode entity
entity decode is
port(insta : in bit_vector(31 downto 0);
instb : in bit_vector(31 downto 0);
opcode : out bit_vector(15 downto 0);
rsdata_bus : out bit_vector(31 downto 0);
rtdata_bus : out bit_vector(31 downto 0);
rddata_bus : out bit_vector(31 downto 0);
extend : out bit_vector(31 downto 0);
wrd_bus : in bit_vector(31 downto 0);
reg_wrt : in std_logic;
reg_dst : in std_logic;







architecture behaviour of decode is
type reg_array is array (0 to 31) of bit_vector(31 downto 0);
signal annote : bit_vector(15 downto 0);
signal rs : bit_vector(7 downto 0);
signal rt : bit_vector(7 downto 0);
signal rd : bit_vector(7 downto 0);
signal ru : bit_vector(7 downto 0);
signal imm_v : bit_vector(15 downto 0);
signal reg : reg_array;
signal ireg : reg_array;
signal muxreg : reg_array;
signal reg0wr, reg1wr, reg2wr : std_logic;
signal wra_bus : bit_vector(7 downto 0);
begin
annote <= insta(31 downto 16);
opcode <= insta(15 downto 0);
rs <= instb(31 downto 24);
rt <= instb(23 downto 16);
rd <= instb(15 downto 8);
ru <= instb(7 downto 0);























































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































reg (8) when "



























































































wra_bus <= rt when reg_dst='1' else rs;
reg0wr <= '1' when ((wra_bus = "00000000") and (reg_wrt='1')) else '0';
reg1wr <= '1' when ((wra_bus = "00000001") and (reg_wrt='1')) else '0';
reg2wr <= '1' when ((wra_bus = "00000010") and (reg_wrt='1')) else '0';
muxreg(0) <= X"00000000" when reg0wr='0' else wrd_bus;
muxreg(1) <= X"00001010" when reg1wr='0' else wrd_bus;
muxreg(2) <= X"0000100F" when reg2wr='0' else wrd_bus;
extend(15 downto 0) <= imm_v;
extend(31 downto 16) <= X"FFFF" when imm_v(15)='1' else X"0000";
process
begin



















































































































































end behaviour;
(control.vhd)
Date: December 09, 2006 control.vhd Project: SIMPS
-- Abdul Azim bin Abdullah





-- control entity
entity control is
port(PCin : in bit_vector(7 downto 0);
ra_bus : out bit_vector(31 downto 0);
wa_bus : out bit_vector(31 downto 0);
reg_wrt : out std_logic;
reg_dst : out std_logic;
mem_wrt : out std_logic;
mem_red : out std_logic;
mem_reg : out std_logic);
end control;
-- control architecture






reg_wrt <= '0';
reg_dst <= '0';
mem_wrt <= '0';
mem_red <= '1';
mem_reg <= '0';
ra_bus <= X"00000001";
when X"08" =>
reg_wrt <= '0';
reg_dst <= '1';
mem_wrt <= '0';
mem_red <= '1';
mem_reg <= '0';
ra_bus <= X"00000010";
when X"0C" =>
reg_wrt <= '0';
reg_dst <= '1';
mem_wrt <= '0';
mem_red <= '0';
mem_reg <= '0';
ra_bus <= X"00000010";
when X"10" =>
reg_wrt <= '0';
reg_dst <= '1';
mem_wrt <= '0';












reg_wrt <= '0';
reg_dst <= '1';
mem_wrt <= '0';
mem_red <= '0';





mem_wrt <= '0';
mem_red <= '0';










reg_wrt <= '0';
reg_dst <= '1';
mem_wrt <= '0';




reg_wrt <= '0';
reg_dst <= '1';
mem_wrt <= '0';




reg_wrt <= '0';
reg_dst <= '1';
mem_wrt <= '0';






mem_wrt <= '0';







(execute.vhd)
Date: December 09, 2006 execute.vhd Project: SIMPS
-- Abdul Azim bin Abdullah
-- Universiti Teknologi PETRONAS
-- execute.vhd
-- bv_arithmetic package
package bv_arithmetic is
function "+" (bv1, bv2 : in bit_vector) return bit_vector;
function "-" (bv1, bv2 : in bit_vector) return bit_vector;
function "-" (bv : in bit_vector) return bit_vector;
function "*" (bv1, bv2 : in bit_vector) return bit_vector;
procedure bv_multu (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector;
  overflow : out boolean);
procedure bv_addu (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector;
  overflow : out boolean);
procedure bv_add (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector;
  overflow : out boolean);
procedure bv_addu (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector);
procedure bv_sub (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector;
  overflow : out boolean);
procedure bv_subu (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector);
procedure bv_and (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector);
procedure bv_or (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector);
procedure bv_xor (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector);
procedure bv_nor (bv1, bv2 : in bit_vector;
  bv_result : out bit_vector);
function bv_sll (bv : in bit_vector;
  shift_count : in natural) return bit_vector;
function bv_srl (bv : in bit_vector;
  shift_count : in natural) return bit_vector;
end bv_arithmetic;
-- bv_arithmetic package body










for index in result'reverse_range loop














:= op1(index) xor op2(index) xor carry_in;
:= (op1(index) and op2(index)) or (carry_in and









for index in result'reverse_range loop













:= carry_out;
:= op1(index) xor (not op2(index)) xor carry_in;
:= (op1(index) and (not op2(index))) or (carry_i
function "-" (bv : in bit_vector) return bit_vector is
constant zero
begin
return zero - bv;
end "-";









(bv1, bv2 : in bit_vector) return bit_vector is
negative_result : boolean;
: bit_vector(bv1'range) := bv1;





;opl(opl'left} = '1') xor (op2(op2'left'
'1M then
Page 2 of 9 Revision: SIMPS
ate: December 09,2006 execute.vhd Project: SIMPS
113 opl := - bvl;
i:; 4 end if;
-:":• if {op2{op2'left) = '1') then
;:*'. op2 := - bv2;
2 end if;
3": 7 bv_multu(opl, op2, result);
1.; j if (negative_result) then
V30 result := - result;
1 -;' j end if;
2 27 return result;
:/7 end "*";
procedure bv_multu (bv1, bv2 : in bit_vector;
                    bv_result : out bit_vector;
                    overflow : out boolean) is
  constant bv_length : natural := bv1'length;
  constant accum_length : natural := bv_length * 2;
  constant zero : bit_vector(accum_length-1 downto bv_length) := (others => '0');
  variable accum : bit_vector(accum_length-1 downto 0);
  variable addu_overflow : boolean;
  variable carry : bit;
begin
  accum(bv_length-1 downto 0) := bv1;
  accum(accum_length-1 downto bv_length) := zero;
  for count in 1 to bv_length loop
    if (accum(0) = '1') then
      bv_addu(accum(accum_length-1 downto bv_length), bv2,






    accum := carry & accum(accum_length-1 downto 1);
  end loop;
  bv_result := accum(bv_length-1 downto 0);
  overflow := accum(accum_length-1 downto bv_length) /= zero;
end bv_multu;
procedure bv_addu (bv1, bv2 : in bit_vector;
                   bv_result : out bit_vector;
                   overflow : out boolean) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);
  variable result : bit_vector(1 to bv_result'length);




  for index in result'reverse_range loop
    result(index) := op1(index) xor op2(index) xor carry;




overflow := carry = '1';
end bv_addu;
procedure bv_add (bv1, bv2 : in bit_vector;
                  bv_result : out bit_vector;
                  overflow : out boolean) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);
  variable result : bit_vector(1 to bv_result'length);
  variable carry_in : bit;

    carry_in := carry_out;
    result(index) := op1(index) xor op2(index) xor carry_in;
    carry_out := (op1(index) and op2(index)) or (carry_in and (op1(index) xor op2(index)));

  overflow := carry_out /= carry_in;
  -- overflow := true;
end bv_add;
procedure bv_addu (bv1, bv2 : in bit_vector;
                   bv_result : out bit_vector) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);
  variable result : bit_vector(1 to bv_result'length);

  for index in result'reverse_range loop
    result(index) := op1(index) xor op2(index) xor carry;

procedure bv_sub (bv1, bv2 : in bit_vector;
                  bv_result : out bit_vector;
                  overflow : out boolean) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);
  variable result : bit_vector(1 to bv_result'length);
  variable carry_in : bit;

  for index in result'reverse_range loop
    carry_in := carry_out;
    result(index) := op1(index) xor (not op2(index)) xor carry_in;
    carry_out := (op1(index) and (not op2(index))) or (carry_in and (op1(index) xor (not op2(index))));
  end loop;
  bv_result := result;
  overflow := carry_out /= carry_in;
end bv_sub;
procedure bv_subu (bv1, bv2 : in bit_vector;
                   bv_result : out bit_vector) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);
  variable result : bit_vector(1 to bv_result'length);




for index in result'reverse_range loop
    result(index) := op1(index) xor op2(index) xor borrow;





procedure bv_and (bv1, bv2 : in bit_vector;
                  bv_result : out bit_vector) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);




for index in result'reverse_range loop




procedure bv_or (bv1, bv2 : in bit_vector;
                 bv_result : out bit_vector) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);




for index in result'reverse_range loop




procedure bv_xor (bv1, bv2 : in bit_vector;
                  bv_result : out bit_vector) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);
  variable result : bit_vector(1 to bv_result'length);
begin
  op1 := bv1;
  op2 := bv2;
  for index in result'reverse_range loop




procedure bv_nor (bv1, bv2 : in bit_vector;
                  bv_result : out bit_vector) is
  variable op1 : bit_vector(1 to bv1'length);
  variable op2 : bit_vector(1 to bv2'length);




for index in result'reverse_range loop




function bv_sll (bv : in bit_vector;
                 shift_count : in natural) return bit_vector is
  constant bv_length : natural := bv'length;
  constant actual_shift_count : natural := shift_count mod bv_length;
  variable bv_norm : bit_vector(1 to bv_length);
  variable result : bit_vector(1 to bv_length) := (others => '0');
begin
  bv_norm := bv;
  result(1 to bv_length - actual_shift_count) := bv_norm(actual_shift_count + 1 to bv_length);
  return result;
end bv_sll;
function bv_srl (bv : in bit_vector;
                 shift_count : in natural) return bit_vector is
  constant bv_length : natural := bv'length;
  constant actual_shift_count : natural := shift_count mod bv_length;
  variable bv_norm : bit_vector(1 to bv_length);
  variable result : bit_vector(1 to bv_length) := (others => '0');
begin
  bv_norm := bv;










entity execute is
  port(opcode : in bit_vector(15 downto 0);
       extend : in bit_vector(31 downto 0);
       rsdata_bus : in bit_vector(31 downto 0);
       rtdata_bus : in bit_vector(31 downto 0);
       rddata_bus : in bit_vector(31 downto 0);
       data_bus : out bit_vector(31 downto 0);







architecture behaviour of execute is
  signal op_impl : bit_vector(15 downto 0);
  constant op_nop : bit_vector(15 downto 0) := X"0000";




  constant op_add : bit_vector(15 downto 0) := X"0040"; -- ADD rd, rs, rt // add signed (with overflow check) // rs + rt -> rd
  constant op_addu : bit_vector(15 downto 0) := X"0042"; -- ADDU rd, rs, rt // add unsigned (without overflow check) // rs + rt -> rd
  constant op_sub : bit_vector(15 downto 0) := X"0044"; -- SUB rd, rs, rt // sub signed (with underflow check)
  constant op_subu : bit_vector(15 downto 0) -- SUBU rd, rs, rt // sub unsigned (without underflow check) // rs - rt
  constant op_multu : bit_vector(15 downto 0) := X"0047"; -- MULTU rd, rs, rt
  -- logical operations
  constant op_and : bit_vector(15 downto 0) -- AND rd, rs, rt // rs & rt -> rd
  constant op_or : bit_vector(15 downto 0) -- OR rd, rs, rt // rs | rt -> rd
  constant op_xor : bit_vector(15 downto 0) -- XOR rd, rs, rt // rs ^ rt -> rd
  constant op_nor : bit_vector(15 downto 0) -- NOR rd, rs, rt // ~(rs | rt) -> rd






  constant op_sll : bit_vector(15 downto 0) := X"0055"; -- SLL rd, rt, shamt // rt << shamt -> rd
  constant op_srl : bit_vector(15 downto 0) := X"0057";




  variable rs_data : bit_vector(31 downto 0);










  variable rd_data : bit_vector(31 downto 0);
begin
  wait until (clock'event) and (clock='1');
  rs_data := rsdata_bus;
  rt_data := rtdata_bus;


































data_bus <= bv_sll(rt_data, 1);
    when op_srl =>






(memory.vhd)
Date: December 09, 2006 memory.vhd
-- Abdul Azim bin Abdullah






  port(rd_bus : out bit_vector(31 downto 0);
       wd_bus : in bit_vector(31 downto 0);
       ra_bus : in bit_vector(31 downto 0);
       wa_bus : in bit_vector(31 downto 0);
       mem_wrt : in std_logic;
       mem_red : in std_logic;
       mem_reg : in std_logic;
       clock, reset : in std_logic);
end memory;
architecture behaviour of memory is
  signal muxout : bit_vector(31 downto 0);
  signal address : bit_vector(31 downto 0);
  signal mem0, mem1, mem2 : bit_vector(31 downto 0);
  signal imem0, imem1, imem2 : bit_vector(31 downto 0);
  signal muxmem0, muxmem1, muxmem2 : bit_vector(31 downto 0);






  muxout <= mem0 when address=X"00000000" else
            mem1 when address=X"00000001" else
            mem2 when address=X"00000010";
  rd_bus <= muxout when mem_red='1' else
            X"FFFFFFFF";
  -- mem0wrt <= '1' when ((mem_wrt='1') and (wa_bus=X"00000000")) else '0';
  -- mem1wrt <= '1' when ((mem_wrt='1') and (wa_bus=X"00000001")) else '0';
  -- mem2wrt <= '1' when ((mem_wrt='1') and (wa_bus=X"00000010")) else '0';
  muxmem0 <= wd_bus when mem_wrt='1' else mem0;
  muxmem1 <= wd_bus when mem_wrt='1' else X"00000000";






    wait until (clock'event) and (clock='1');
    if (reset='1') then
      mem0 <= imem0;
      mem1 <= imem1;
      mem2 <= imem2;
    else
      mem0 <= muxmem0;
      mem1 <= muxmem1;
      mem2 <= muxmem2;
    end if;
  end process;
end behaviour;
