A High speed 16-bit RISC processor chip by Chen, Wan-Fu
Rochester Institute of Technology 
RIT Scholar Works 
Theses 
5-1-1994 
A High speed 16-bit RISC processor chip 
Wan-Fu Chen 
Recommended Citation 
Chen, Wan-Fu, "A High speed 16-bit RISC processor chip" (1994). Thesis. Rochester Institute of 
Technology. Accessed from 
This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in 
Theses by an authorized administrator of RIT Scholar Works. For more information, please contact 
ritscholarworks@rit.edu. 






Partial Fulfillment of the





Graduate Advisor - Kenneth W. Hsu, Associate Professor
Tony H. Chang, Professor
Roy S. Czernikowski, Professor and Department Head
Department of Computer Engineering
College of Engineering
Rochester Institute of Technology
Rochester, New York
May, 1994
This document was produced using Windows version 3.1, Microsoft Word
version 2.0 for Windows, and Windows Paintbrush. All circuits are built by
using Design Architecture software. The simulation of the circuits was done
by using Quicksim II and Accusim software, and the chip design was done by
using IC program.
The following names used

















Copyright (C) 1994 by Wan-Fu Chen
All rights reserved
THESIS RELEASE PERMISSION FORM
ROCHESTER INSTITUTE OF TECHNOLOGY
COLLEGE OF ENGINEERING
Title: A High Speed 16-bit RISC Processor
I, Wan-Fu Chen, hereby grant permission to the Wallace Memorial Library of
RIT to reproduce my thesis in whole or in part.
Signature: _
Date: _
ABSTRACT
The goal of this thesis is to design and simulate a high-speed 16-bit
processor chip using a RISC architecture. The high computing speed is
achieved by employing a more effective four-stage pipeline. The
processor executes every instruction in one clock cycle, and it incurs no
delay when it executes Jump, Conditional Jump, Call, and Return
instructions. Its computing speed is 4 times faster than that of the
Berkeley RISC II for an 8-MHz clock. The design includes the main
architectural features of the RISC: the 4-stage pipeline, the bank of
thirty-two 8-bit registers, the 16-bit address and data paths, the
internal timer, the input port, and the two output ports.
The chip is designed using 2 µm CMOS N-well two-metal-layer technology.
The processor runs at a clock rate of 16 MHz. The size of the chip is











Chapter 1 Introduction
1.1 16-bit Architecture
1.2 Superpipeline Architecture
1.3 System Interface
1.4 CPU Register Overview
1.5 CPU Instruction Set Overview























































Chapter 3 The CPU Pipeline
3.1 CPU Pipeline Operation
3.2 CPU Pipeline Stages
3.3 Limitation of the CPU Pipeline
Chapter 4 System Interface
4.1 16-bit Address Bus (A0-A15)
4.2 16-bit Data Bus (L0-L15)
4.3 Two 8-bit Output Ports (DO0-DO7, DO8-DO15)
4.4 8-bit Input Port (DI0-DI7)
4.5 Interrupt Control Pin (INT)
4.6 Reset Pin (RESET)
4.7 Clock Pin (CLKIN)
Chapter 5 CPU Registers
5.1 32 General Purpose Registers (R0-R31)
5.2 Program Counter (PC)
5.3 8-bit Timer
5.4 16 Stack Registers
5.5 Program Status Word (PSW) Register
5.6 Accumulator Register
5.7 Higher-order 8-bit Address Register for the Program Counter (HP)
Chapter 6 Memory
6.1 Read/Write RAM Circuit
6.2 Writing RAM Program
6.3 Reading RAM Program
Chapter 7 Test Program
7.1-7.4 Test Program 01-04
7.5-7.7 Test Program 05-07




7.12-7.14 Test Program 12-14
7.15-7.17 Test Program 15-17
7.18-7.20 Test Program 18-20
7.21-7.22 Test Program 21-22
Chapter 8 Logic Circuits
8.1 D flip-flop
8.2 16-bit Register I
8.3 16-bit Register II
8.4 8-bit Register I
8.5 8-bit Register II
8.6 6-bit Register
8.7 4-bit Register
8.8 Instruction Register
8.9 MEM32
8.10 Accumulator
8.11 PSW Circuit
8.12 IT Circuit
8.13 MEM16
8.14 JK flip-flop
8.15 16-bit Counter
8.16 8-bit Counter
8.17 Program Counter
8.18 Timer Circuit
8.19 Full Adder, XOR, AND, and OR Circuit
8.20 16-bit Adder and Subtracter
8.21 8-bit Adder and Subtracter
8.22 Stack Pointer
8.23 ALU
8.24 SUB-1
8.25 Control Unit
8.26 16-bit RISC Processor
Chapter 9 VLSI Design
9.1 VLSI Design
9.2 Delay Time of Each Circuit





Figure 1-1 Processor Internal Block Diagram
Figure 1-2 CPU Instruction Formats
Figure 2-1 CPU Instruction Formats
Figure 2-2 Clearing the extending bit program
Figure 2-3 Setting the extending bit program
Figure 3-1 Instruction Pipeline Stages
Figure 3-2 An example of pipeline
Figure 5-1 Program Status Word Register
Figure 6-1 RAM read/write Circuit Internal Block Diagram
Figure 6-2 Writing Timing Diagram
Figure 6-3 Reading Timing Diagram
Figure 7-1-1 - 7-2-1 Test Program 01-02
Figure 7-3-1 - 7-4-1 Test Program 03-04
Figure 7-5-1 - 7-7-1 Test Program 05-07
Figure 7-8-1 - 7-9-1 Test Program 08-09
Figure 7-10-1 Test Program 10
Figure 7-11-1 - 7-12-1 Test Program 11-12
Figure 7-13-1 - 7-14-1 Test Program 13-14
Figure 7-15-1 - 7-16-1 Test Program 15-16
Figure 7-17-1 - 7-18-1 Test Program 17-18
Figure 7-19-1 - 7-20-1 Test Program 19-20
Figure 7-21-1 - 7-22-1 Test Program 21-22
Figure 7-1-2 Simulation Result for Test Program 01
Figure 7-2-2 Simulation Result for Test Program 02
Figure 7-3-2 Simulation Result for Test Program 03
Figure 7-4-2 Simulation Result for Test Program 04
Figure 7-5-2 Simulation Result for Test Program 05
Figure 7-6-2 Simulation Result for Test Program 06
Figure 7-7-2 Simulation Result for Test Program 07
Figure 7-8-2 Simulation Result for Test Program 08
Figure 7-9-2 Simulation Result for Test Program 09
Figure 7-10-2 Simulation Result for Test Program 10
Figure 7-11-2 Simulation Result for Test Program 11
Figure 7-12-2 Simulation Result for Test Program 12
Figure 7-13-2 Simulation Result for Test Program 13
Figure 7-14-2 Simulation Result for Test Program 14
Figure 7-15-2 Simulation Result for Test Program 15
Figure 7-16-2 Simulation Result for Test Program 16
Figure 7-17-2 Simulation Result for Test Program 17
Figure 7-18-2 Simulation Result for Test Program 18
Figure 7-19-2 Simulation Result for Test Program 19
Figure 7-20-2 Simulation Result for Test Program 20
Figure 7-21-2 Simulation Result for Test Program 21
Figure 7-22-2 Simulation Result for Test Program 22
Figure 8-1 D flip-flop
Figure 8-2 16-bit Register I
Figure 8-3 16-bit Register II
Figure 8-4 8-bit Register I
Figure 8-5 8-bit Register II
Figure 8-6 6-bit Register
Figure 8-7 4-bit Register
Figure 8-8 Instruction Register
Figure 8-9 MEM32
Figure 8-10 Accumulator
Figure 8-11 PSW Register
Figure 8-12 INT and Timer Circuit
Figure 8-13 MEM16 (Stack)
Figure 8-14 JK flip-flop
Figure 8-15 16-bit Counter
Figure 8-16 8-bit Counter
Figure 8-17 Program Counter
Figure 8-18 Timer Circuit
Figure 8-19 Full Adder
Figure 8-20 XOR
Figure 8-21 AND
Figure 8-22 OR
Figure 8-23 16-bit Adder and Subtracter
Figure 8-24 8-bit Adder and Subtracter
Figure 8-25 Stack Pointer
Figure 8-26 ALU
Figure 8-27 SUB-1
Figure 8-28 Control Unit
Figure 8-29 EX1 Stage of the Control Unit
Figure 8-30 EX2 Stage of the Control Unit
Figure 8-31 EX3 Stage of the Control Unit
Figure 8-32 16-bit RISC Processor
Figure 8-33 Part I of the Processor
Figure 8-34 Part II of the Processor
Figure 8-35 Part III of the Processor
Figure 8-36 Part IV of the Processor
Figure 9-1 Inverter Circuit
Figure 9-2 Tri-State Inverter
Figure 9-3 Three-input NOR
Figure 9-4 Two-input NOR
Figure 9-5 Six-input NAND
Figure 9-6 Six-input NOR
Figure 9-7 Three-input NAND




Table 1-1 CPU Instruction Set: Computational Instructions
Table 1-2 CPU Instruction Set: Jump and Branch Instructions
Table 1-3 CPU Instruction Set: Transfer Instructions
Table 1-4 CPU Instruction Set: Load Instructions
Table 1-5 CPU Instruction Set: Special Instructions
Table 5-1 Status Register Fields














CISC Complex Instruction Set Computer
CMOS Complementary Metal-Oxide-Semiconductor
CPU Central Processing Unit




RISC Reduced Instruction Set Computer




The RISC (Reduced Instruction Set Computer) architecture is a
recent trend in computer design. It has far fewer instructions than
a CISC (Complex Instruction Set Computer), so a RISC design needs
much less area for control logic but tends to use many internal
registers [1]. Because the architecture uses pipelining and has a
simpler and smaller control system, its computing speed is generally
higher than that of a CISC. The purpose of this thesis is to
implement a high-speed 16-bit processor chip using a RISC architecture
and to increase the computing speed further by using more pipeline stages.
The chip can complete one instruction in each clock cycle. For example, if
the clock frequency is 1 MHz, the chip can execute one million
instructions per second. This chip's computing speed is 4 times faster than
the RISC II's for an 8-MHz clock.
This chapter describes the following:
the 16-bit architecture of the processor
the superpipeline design of the processor (described in detail in
Chapter 3)
an overview of the system interface (described in detail in Chapter 4)
an overview of the CPU registers (described in detail in Chapter 5)
an overview of the CPU instruction set (described in detail in
Chapter 2)
1.1 16-bit Architecture
The natural mode of operation for the processor is as a 16-bit
microprocessor; however, all registers are 8-bit registers. The processor
provides the following:




Figure 1-1 is an internal block diagram of the processor.
1.2 Superpipeline Architecture [2]
The processor exploits instruction parallelism by using a four-stage
superpipeline. Under normal circumstances, one instruction is executed
each clock cycle.
1.3 System Interface
The processor supports a 16-bit and an 8-bit system interface. The System
Interface includes:
a 16-bit address bus
a 16-bit data bus (input direction)
two 8-bit output ports
an 8-bit input port
an interrupt control pin
a reset pin
1.4 CPU Register Overview
The central processing unit (CPU) provides the following registers:








Figure 1-1 Processor Internal Block Diagram
a Program Counter (PC)
an 8-bit Timer
16 stack registers
a Program Status Word (PSW) register
an Accumulator register
a Higher-order 8-bit Register for the program counter (HP)
1.5 CPU Instruction Set Overview








Bits 15-10   Bits 9-8   Bits 7-0
OP           XX         immediate

Bits 15-10   Bits 9-0
OP           XXXXXXXXXX

Bits 15-10   Bits 9-5   Bits 4-0
OP           XX         register

XX: Don't care
Figure 1-2 CPU Instruction Formats
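As a rough illustration of the formats in Figure 1-2, the following Python sketch splits a 16-bit instruction word into its fields. The function name `decode_fields` and the dictionary keys are hypothetical and are not part of the thesis; only the bit positions (opcode in bits 15-10, immediate in bits 7-0, register number in bits 4-0) are taken from the figure.

```python
def decode_fields(word):
    """Split a 16-bit instruction word into the fields shown in Figure 1-2.

    The caller picks the field that matches the instruction's format;
    this sketch does not know which opcode uses which format.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    return {
        "opcode": (word >> 10) & 0x3F,  # bits 15-10, present in every format
        "immediate": word & 0xFF,       # bits 7-0, immediate format only
        "register": word & 0x1F,        # bits 4-0, register format only
    }


# 0201 is the code word listed for "ADDI A, 01H" in the thesis listings.
fields = decode_fields(0x0201)
print(fields)
```

Bits marked XX (don't care) are simply ignored by the masks, so two words that differ only in those bits decode to the same fields.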
Each format contains a number of different instructions, which are
described further in this chapter. Fields of the instruction formats are
described in Chapter 2.
The instruction set can be further divided into the following groupings:
Transfer instructions move data between the accumulator register
and registers. They are all register (R-type) instructions.
Computational instructions perform arithmetic, logical, and shift
operations on values in registers. They include register (R-type, in
which one operand is stored in a general register, the other is stored
in the accumulator register, and the result is stored in registers) and
immediate (I-type, in which one operand is an 8-bit immediate
value) formats.
Load instructions move data between memory and some special
registers (the accumulator register and the Timer). They are all
immediate (I-type) instructions.
Jump and Branch instructions change the control flow of a
program. Jumps are always made to a paged, absolute address
formed by combining an 8-bit immediate address with the
high-order 8-bit register (HP) (I-type).
Special instructions perform program calls and returns and change
the PSW (Program Status Word) register. These instructions are
always O-type.
Tables 1-1 through 1-5 list the CPU instructions.
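The paged jump address described above can be sketched in Python as follows. This is a minimal sketch that assumes "combining" means plain concatenation, with HP supplying address bits 15-8 and the immediate supplying bits 7-0; the function name `paged_address` is hypothetical.

```python
def paged_address(hp, immediate):
    """Form a 16-bit jump target from HP (high byte) and an 8-bit immediate.

    Assumption: HP holds address bits 15-8 and the immediate holds bits 7-0.
    """
    if not 0 <= hp <= 0xFF or not 0 <= immediate <= 0xFF:
        raise ValueError("HP and the immediate are both 8-bit values")
    return (hp << 8) | immediate


target = paged_address(0x12, 0x34)
print(f"{target:04X}")  # prints 1234
```

Under this reading, a jump can reach only the 256 locations of the page currently selected by HP, so a program would load HP first whenever the target lies in another page.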


























Subtract Immediate without affecting the
accumulator register




JC Branch on C flag True
JNC Branch on C flag False
JE Branch on Z flag True
JNE Branch on Z flag False
Table 1-3 CPU Instruction Set: Transfer Instructions
Opcode Description
LDA Load registers
STA Store the accumulator register
LDP Load PSW register into the accumulator
STP Load the accumulator into PSW register






LDAI Load Immediate into the accumulator
LDMI Load Immediate into the Timer
LDHI Load Immediate into the HP register
Table 1-5 CPU Instruction Set: Special Instructions
Opcode Description
CALL Program Call
RET Return from program call
RETI Return from Interrupt
RETM Return from Timer interrupt
NOP Nothing happens
OUT1 Load the accumulator into the output port I
OUT2 Load the accumulator into the output port II
IN Load the input port into the accumulator
HALT Stop CPU running
SET Set C, I, or M flags
CLR Clear C, I, or M flags
Chapter 2
CPU Instruction Set Details
This chapter provides a detailed description of the operation of each
instruction.











Bits 15-10   Bits 9-8   Bits 7-0
OP           XX         immediate

Bits 15-10   Bits 9-0
OP           XXXXXXXXXX

Bits 15-10   Bits 9-5   Bits 4-0
OP           XX         register

XX: Don't care
Figure 2-1 CPU Instruction Formats
ADDC Addc ADDC





The contents of a general register, the contents of the
accumulator, and the content of the C flag are added to form the
result. The result is placed into the accumulator. An overflow
occurs if the carries out of bits 6 and 7 differ (2's complement
overflow).
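The overflow rule stated above (the carries out of bits 6 and 7 differ) can be checked with a short Python sketch. The function name `addc` is hypothetical, and returning the carry out of bit 7 alongside the result is an assumption; the thesis text here does not state which flags the instruction updates.

```python
def addc(acc, operand, carry_in):
    """8-bit add with carry. Returns (result, carry_out, overflow).

    overflow follows the rule in the text: it is set when the carry out
    of bit 6 differs from the carry out of bit 7.
    """
    total = acc + operand + carry_in
    result = total & 0xFF
    carry_out = total >> 8  # carry out of bit 7
    # Carry out of bit 6: add only the low seven bits and look at bit 7.
    carry_6 = ((acc & 0x7F) + (operand & 0x7F) + carry_in) >> 7
    overflow = carry_6 != carry_out
    return result, carry_out, overflow


print(addc(0x7F, 0x01, 0))  # 127 + 1 does not fit in a signed byte
print(addc(0xFF, 0x01, 0))  # -1 + 1 = 0, carry but no overflow
```

The first call reports an overflow with no carry, and the second reports a carry with no overflow, which is why the two conditions need separate flags.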
Operation:




ADDI Add Immediate ADDI





The 8-bit immediate, the contents of the accumulator, and the
content of the C flag are added to form the result. The result is
placed into the accumulator. An overflow occurs if the carries
out of bits 6 and 7 differ (2's complement overflow).
Operation:










The contents of a general register and the content of the C flag
are subtracted from the contents of the accumulator to form the
result. The result is placed into the accumulator. An overflow
occurs if the carries out of bits 6 and 7 differ (2's complement
overflow).
Operation:




SUBI Subtract Immediate SUBI





The 8-bit immediate and the content of the C flag are subtracted
from the contents of the accumulator to form the result. The
result is placed into the accumulator. An overflow exception
occurs if the carries out of bits 6 and 7 differ (2's complement
overflow).
Operation:










The contents of a general register are combined with the contents
of the accumulator in a bit-wise logical AND operation. The
result is placed into the accumulator register.
Operation:






ANDI And Immediate ANDI





The 8-bit immediate is zero-extended and combined with the
contents of the accumulator in a bit-wise logical AND operation.
The result is placed into the accumulator.
Operation:












The contents of a general register are combined with the contents
of the accumulator in a bit-wise logical OR operation. The result
is placed into the accumulator register.
Operation:






ORI Or Immediate ORI
15 10 9 8 7 0
001000 XX immediate
Format:
ORI A, Immediate
Description:
The 8-bit immediate is zero-extended and combined with the
contents of the accumulator in a bit-wise logical OR operation.
The result is placed into the accumulator.
Operation:












The contents of a general register are combined with the contents of
the accumulator in a bit-wise logical XOR operation. The result is
placed into the accumulator register.
Operation:






XORI Xor Immediate XORI
15 10 9 8 7 0
001010 XX immediate
Format:
XORI A, Immediate
Description:
The 8-bit immediate is zero-extended and combined with the
contents of the accumulator in a bit-wise logical XOR operation.
The result is placed into the accumulator.
Operation:













The contents of the accumulator are shifted right by one bit,
inserting the value of the extending bit, which belongs to the
accumulator, into the highest-order bit. The value of the
lowest-order bit is shifted into the extending bit. The result is
placed in the accumulator. Figure 2-2 shows a program that clears the
extending bit, and Figure 2-3 shows a program that sets the
extending bit.
Operation:
A <- A shift right one bit
-> b7 -> b6 -> b5 -> b4 -> b3 -> b2 -> b1 -> b0 -> EX ->
EX: Extending bit in the accumulator
Flag affected:
Z, V, and S
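The rotation through the extending bit can be modeled in a few lines of Python. This is only a sketch of the data movement described above; the function name `rra` is hypothetical, and the Z, V, and S flag updates are not modeled.

```python
def rra(acc, ex):
    """Rotate the 8-bit accumulator right by one bit through EX.

    Returns (new_acc, new_ex): the old EX enters bit 7 and the old
    bit 0 becomes the new EX.
    """
    if not 0 <= acc <= 0xFF or ex not in (0, 1):
        raise ValueError("acc must be 8 bits wide and ex a single bit")
    new_ex = acc & 1
    new_acc = (ex << 7) | (acc >> 1)
    return new_acc, new_ex


# Mirrors the two programs of Figures 2-2 and 2-3: preset A, then rotate.
print(rra(0x00, 1))  # EX becomes 0 (cleared)
print(rra(0xFF, 0))  # EX becomes 1 (set)
```

Whatever EX held before, presetting the accumulator to 00H or FFH and rotating once leaves EX at a known value, which is the trick the two figure programs rely on.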
RLA Shift Left RLA






The contents of the accumulator are shifted left by one bit,
inserting the value of the extending bit, which belongs to the
accumulator, into the lowest-order bit. The value of the
highest-order bit is shifted into the extending bit. The result is
placed in the accumulator.
Operation:
A <- A shift left one bit
<- b7 <- b6 <- b5 <- b4 <- b3 <- b2 <- b1 <- b0 <- EX <-
EX: Extending bit in the accumulator
Flag affected:
S, V, and Z
JUMP Jump JUMP





The 8-bit immediate address is combined with the HP register
(high-order bits of the address). The program unconditionally











JC Branch on C flag True JC





The 8-bit immediate address is combined with the HP register
(high-order bits of the address). If the C flag is true, then the
program branches to this calculated address without any delay of
executing instructions. Otherwise, the program will continue













JE Branch on Z flag True JE





The 8-bit immediate address is combined with the HP register
(high-order bits of the address). If the Z flag is true, then the
program branches to this calculated address without any delay of
executing instructions. Otherwise, the program will continue





PC <- HP , Immediate
else




CMPI Compare Immediate CMPI





The 8-bit immediate is zero-extended and combined with the
contents of the accumulator in a subtraction operation. The result
will not be placed into the accumulator, but just affects the Z
flag. This instruction is useful when a user just wants to check if






C <- 0, and V <- 0
LDAI Load Immediate LDAI












LDA Load Register LDA











STA Load Accumulator STA















LDMI Load Immediate LDMI











OUT1 Output to Port I OUT1






The contents of the accumulator are loaded into the output port I.
Operation:




OUT2 Output to Port II OUT2






The contents of the accumulator are loaded into the output port II.
Operation:











The contents of the input port are loaded into the accumulator.
Operation:










The 8-bit immediate address is combined with the HP register
(high-order bits of the address). The program unconditionally
jumps to this calculated address without any delay of executing




Stack Pointer <- Stack Pointer + 1











This instruction pops the return address off the stack into the
program counter.
Operation:
PC <- The contents of the stack popped
























The processor stops executing instructions. The processor




JNC Branch on C flag False JNC






The 8-bit immediate address is combined with the HP register
(high-order bits of the address). If the C flag is not true, then the
program branches to this calculated address without any delay of






PC <- HP , Immediate
else




JNE Branch on Z flag False JNE





The 8-bit immediate address is combined with the HP register
(high-order bits of the address). If the Z flag is not true, then the
program branches to this calculated address without any delay of












LDHI Load Immediate LDHI





The 8-bit immediate is zero-extended and loaded into the HP






CLRI Clear I flag CLRI









































SETC Set C flag SETC
















































This instruction pops the return address off the stack into the
program counter and enables the Interrupt signal and the Timer
interrupt. This instruction must be used in the Interrupt program.
Operation:
PC <- The contents of the stack popped











This instruction pops the return address off the stack into the
program counter and enables the Interrupt signal and the Timer
interrupt. This instruction must be used in the Timer program.
Operation:
PC <- The contents of the stack popped




LDP Load PSW LDP



















The contents of the accumulator are loaded into the PSW.
Operation:




ADDRESS CODE LABEL LANGUAGE
0000 1100 LDAI #00H; Preset the accumulator with 00H
0001 0B00 RRA ; Rotate the accumulator right one bit





11FF LDAI #FFH; Preset the accumulator with FFH
0B00 RRA ; Rotate the accumulator right one bit
Figure 2-3 Setting the extending bit program
[Timing diagram: CLK; instruction words 1100, 0201, 0202; accumulator values 01, 03]
ADDRESS CODE LABEL LANGUAGE
0000 1100 LDAI #00H
0001 0201 ADDI A, 01H
0002 0202 ADDI A, 02H
CLK: 0 -> Fetch, 1 -> Output (PC, IR)
CLK: 1 -> Fetch, 0 -> Output (register C, register D,
register B, Accumulator)




This chapter describes the basic operation of the CPU pipeline and lists
the operations that each stage of the pipeline performs to execute an
instruction.
3.1 CPU Pipeline Operation
The CPU has a four-stage instruction pipeline; each stage takes one clock
cycle. Thus, the execution of each instruction takes at least four clock
cycles. Once the pipeline has been filled, four instructions are executed
simultaneously. Figure 3-1 shows the four stages of the instruction
pipeline. Figure 3-2 shows an example of how the pipeline works.
FE  EX1 EX2 EX3
    FE  EX1 EX2 EX3
        FE  EX1 EX2 EX3





Figure 3-1 Instruction Pipeline Stages
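The overlap shown in Figure 3-1 can be reproduced with a small Python sketch that lists which instruction occupies each stage in every clock cycle. It is an idealized model that assumes no stalls; the names `STAGES` and `pipeline_schedule` are hypothetical.

```python
STAGES = ("FE", "EX1", "EX2", "EX3")


def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    Instruction i enters FE in cycle i and moves one stage per cycle,
    so n instructions need n + 3 cycles in total.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        active = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth
            if 0 <= index < num_instructions:
                active[stage] = index
        schedule.append(active)
    return schedule


for cycle, active in enumerate(pipeline_schedule(3)):
    print(cycle, active)
```

After the first three fill cycles, one instruction leaves EX3 in every cycle, which is the sense in which the processor completes one instruction per clock even though each instruction spends four cycles in the pipeline.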
3.2 CPU Pipeline Stages
This section describes each of the four pipeline stages:
FE - Instruction Fetch
EX1 - First step of execution
EX2 - Second step of execution
EX3 - Third step of execution




During the FE stage, the following occurs:
The instruction in memory pointed to by the Program Counter is
loaded into the first buffer in the controller.
The Program Counter then points to the next instruction to be executed.
EX1 First step of execution
During the EX1 stage, one of the following occurs:
The Address Register is loaded with the number of the general register.
This operation is for the ADDC, SUBC, AND, OR, XOR, LDA, and
STA instructions.
Register C is loaded with the immediate. This operation is for the
ADDI, SUBI, ANDI, ORI, XORI, CMPI, and LDAI instructions.
The Program Counter is loaded with the immediate and the contents of
the HP register. This operation is for the JMP, JC, JE, JNC, JNE, and
CALL instructions.
The Timer is loaded with the immediate. This operation is for the LDMI
instruction.
The input port latches data from the input pins. This operation is for
the IN instruction.
The Program Counter is loaded with the contents of the Stack. This
operation is for the RET, RETI, and RETM instructions.
The HP register is loaded with the immediate. This operation is for the
LDHI instruction.
Nothing happens. This operation is for the RLA, RRA, OUT1, OUT2,
NOP, LDP, and STP instructions.
The CPU stops running. This operation is for the HALT instruction.
The PSW register is loaded with the immediate. This operation is for
the CLRI, SETI, CLRC, SETC, CLRM, and SETM instructions.
EX2 Second step of execution
During the EX2 stage, one of the following occurs:
Register B is loaded with the contents of the general register that
the Address Register points to. This operation is for the ADDC, SUBC,
AND, OR, and XOR instructions.
Register B is loaded with the contents of Register C. This
operation is for the ADDI, SUBI, ANDI, ORI, XORI, and CMPI
instructions.
Nothing happens. This operation is for the RRA, RLA, JMP, JC, JE,
STA, LDMI, OUT1, OUT2, IN, NOP, HALT, JNE, JNC, LDHI,
CLRI, SETI, CLRC, SETC, CLRM, SETM, LDP, and STP
instructions.
Register D is loaded with the contents of Register C. This
operation is for the LDAI instruction.
Register D is loaded with the contents of the general register that
the Address Register points to. This operation is for the LDA
instruction.




The Stack Pointer is incremented by one. This operation is for the RET,
RETI, and RETM instructions.
EX3 Third step of execution
During the EX3 stage, one of the following occurs:
The accumulator is loaded with the output of the ALU (arithmetic
logic unit). This operation is for the ADDC, ADDI, SUBC, SUBI,
AND, ANDI, OR, ORI, XOR, and XORI instructions.
The contents of the accumulator are shifted left by one bit. This
operation is for the RLA instruction.
The contents of the accumulator are shifted right by one bit. This
operation is for the RRA instruction.
Nothing happens. This operation is for the JMP, JC, JE, LDMI, CALL,
RET, RETI, RETM, NOP, HALT, JNC, JNE, LDHI, CLRI, SETI,
CLRC, SETC, SETM, and CLRM instructions.
The Z flag is changed. This operation is not only for the CMPI
instruction.
The accumulator is loaded with the contents of Register D. This
operation is for the LDAI and LDA instructions.
The general register that the Address Register points to is loaded with
the contents of the accumulator. This operation is for the STA
instruction.
The output port I is loaded with the contents of the accumulator.
This operation is for the OUT1 instruction.
The output port II is loaded with the contents of the accumulator.
This operation is for the OUT2 instruction.
The accumulator is loaded with the contents of the input port.
This operation is for the IN instruction.
The interrupt register is cleared. This operation is for the RETI
instruction.
The Timer interrupt register is cleared. This operation is for the RETM
instruction.
The PSW register is loaded with the contents of the accumulator.
This operation is for the LDP instruction.
The accumulator is loaded with the contents of the PSW register.
This operation is for the STP instruction.
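To make the stage-by-stage register transfers concrete, here is a minimal Python sketch that follows only LDAI and ADDI through EX1, EX2, and EX3 as listed above (Register C, then Register B or Register D, then the accumulator). The function name `trace_accumulator` is hypothetical. Two points are assumptions rather than statements from the thesis: the C flag starts at 0, and ADDI updates it with the carry out of bit 7.

```python
def trace_accumulator(program):
    """Return the accumulator value after each instruction's EX3 stage.

    program is a list of (mnemonic, immediate) pairs using only
    "LDAI" and "ADDI". Instruction i is in EX1 at cycle i + 1,
    EX2 at cycle i + 2, and EX3 at cycle i + 3.
    """
    for op, _ in program:
        if op not in ("LDAI", "ADDI"):
            raise ValueError(f"unsupported mnemonic in this sketch: {op}")
    acc = reg_b = reg_c = reg_d = 0
    carry = 0  # assumed initial value of the C flag
    history = []
    count = len(program)
    for cycle in range(count + 3):
        # Later stages run first so that each one sees the values
        # latched by the earlier stage in the previous cycle.
        i = cycle - 3  # instruction in EX3
        if 0 <= i < count:
            if program[i][0] == "ADDI":
                total = acc + reg_b + carry
                acc, carry = total & 0xFF, total >> 8
            else:  # LDAI
                acc = reg_d
            history.append(acc)
        i = cycle - 2  # instruction in EX2
        if 0 <= i < count:
            if program[i][0] == "ADDI":
                reg_b = reg_c
            else:  # LDAI
                reg_d = reg_c
        i = cycle - 1  # instruction in EX1
        if 0 <= i < count:
            reg_c = program[i][1]
    return history


print(trace_accumulator([("LDAI", 0x00), ("ADDI", 0x01), ("ADDI", 0x02)]))
```

For the three-instruction listing LDAI #00H, ADDI A, 01H, ADDI A, 02H, the accumulator takes the values 00H, 01H, and 03H on successive cycles, one new result per clock once the pipeline is full.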
3.3 Limitation of the CPU pipeline
This CPU pipeline executes one instruction per clock cycle even when it
executes JUMP, Conditional Jump, CALL, and Return instructions. It
incurs no delay in executing instructions; however, the following rules
must be obeyed:




2. When a user uses the CMPI instruction, the next two instructions
cannot be conditional jump instructions.
3. When a user uses the STP instruction, the next two instructions cannot be




This chapter describes the 16-bit and 8-bit system interfaces. The
System Interface includes:
a 16-bit address bus
a 16-bit data bus (unidirectional)
two 8-bit output ports
an 8-bit input port




4.1 16-bit Address Bus (A0-A15)
The 16-bit address bus selects the memory address to be fetched.
4.2 16-bit Data Bus (L0-L15)
The 16-bit data bus carries the 16-bit word stored at the memory location
selected by the address bus to the controller.
4.3 Two 8-bit Output Ports (DO0-DO7, DO8-DO15)
The OUT1 and OUT2 instructions transfer the contents of the
accumulator to these two 8-bit output ports, so a user can send the
contents of the accumulator or other data to external devices through
the output ports.
4.4 8-bit Input Port (DI0-DI7)
The IN instruction transfers the contents of the input port to the
accumulator, so a user can receive data from other devices through
this input port.
4.5 Interrupt Control Pin (INT) [3]
Sometimes it is necessary to interrupt the execution of the main program
to answer a request from an I/O device. For instance, an I/O device may
send an interrupt signal to the interrupt control pin to indicate that
data is ready for input. The computer temporarily stops what it is doing,
inputs the data, then returns to what it was doing. An interrupt signal
must be asserted for at least one clock cycle. The interrupt control pin
is active high, and when an interrupt is taken the PC is loaded with
0007H.
4.6 Reset Pin (RESET)
The reset pin carries the RESET signal. This signal may come from an
operator reset button or another source. When RESET is high, the CPU
resets the program counter (to 0000H) and the other registers. The CPU
remains in reset until RESET goes low.
4.7 Clock Pin (CLKIN)





This chapter describes the following:
32 general purpose registers
a Program Counter (PC) register
an 8-bit Timer
an 8-bit integer arithmetic logic unit (ALU)
16 stack registers
a Program Status Word (PSW) register
an Accumulator
a Higher-order 8-bit Address Register for the program counter (HP)
5.1 32 General Purpose Registers (R0--R31)
These general purpose registers are like a small on-chip RAM with
addressable memory locations. Control signals select the register for a
read or write operation. This means that the CPU can either load a
register from the 8-bit internal data bus or output the register to
another data bus.
5.2 Program Counter (PC) [3]
The program is stored at the beginning of the memory with the first
instruction at address 0000H, the second instruction at address 0001H,
the third at address 0002H, and so on. The program counter, which is part
of the control unit, counts from 0000H to FFFFH. Its job is to keep track
of the next instruction to be fetched and executed.
5.3 8-bit Timer
The 8-bit timer is an 8-bit counter. The CPU can preset the counter with
a chosen number, and the counter then increments by one at a time. When
the content of the counter reaches zero, the timer sends the timer
interrupt to the controller. If the M flag is high, the computer
temporarily stops what it is doing, services the timer interrupt, then
returns to what it was doing. This interrupt loads the PC with 0007H.
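The relationship between the preset value and the delay before the timer interrupt can be sketched in Python. This is a minimal sketch that assumes the counter is an 8-bit up-counter wrapping from FFH to 00H and that it advances once per tick; the text does not state the tick source, and the function name `ticks_until_timer_interrupt` is hypothetical.

```python
def ticks_until_timer_interrupt(preset):
    """Count increments until an 8-bit up-counter loaded with preset reads 00H.

    Assumption: the counter wraps from FFH to 00H, and the interrupt is
    raised when the count arrives at 00H after at least one increment.
    """
    if not 0 <= preset <= 0xFF:
        raise ValueError("the timer preset is an 8-bit value")
    counter = preset
    ticks = 0
    while True:
        counter = (counter + 1) & 0xFF
        ticks += 1
        if counter == 0:
            return ticks


print(ticks_until_timer_interrupt(0xF0))  # prints 16
```

Under these assumptions a program that wants an interrupt after n ticks (1 to 256) would load the timer with (256 - n) modulo 256.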
5.4 16 stack registers
The 16 stack registers are 16-bit registers. The CPU has a CALL
instruction that sends the program to a subroutine; an Interrupt or a
Timer interrupt does the same. Before the jump takes place, the program
counter is incremented and its value is saved in the stack register that
the stack pointer points to. The stack pointer is then incremented by
one. At the completion of a subroutine, the RET, RETM, or RETI
instruction loads the program counter with the return address, which
allows the computer to get back to the main program.
5.5 Program Status Word (PSW) Register
The Program Status Word (PSW) register is a read/write register that
contains the operating mode and the interrupt enable bits. Figure 5-1
shows the format of the PSW register, and Table 5-1 describes its fields.
Bit:   7 6 5 4 3 2 1 0
Field: X X V S I M Z C
Figure 5-1 Program Status Word Register
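The bit layout in Figure 5-1 maps directly onto a small lookup table. The Python sketch below unpacks and repacks a PSW byte; the names `PSW_BITS`, `decode_psw`, and `encode_psw` are hypothetical, and bits 7 and 6 (marked X) are simply ignored.

```python
# Bit positions taken from Figure 5-1 (bit 7 and bit 6 are unused).
PSW_BITS = {"V": 5, "S": 4, "I": 3, "M": 2, "Z": 1, "C": 0}


def decode_psw(value):
    """Return a dict of flag name -> 0 or 1 for an 8-bit PSW value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("the PSW is an 8-bit register")
    return {name: (value >> bit) & 1 for name, bit in PSW_BITS.items()}


def encode_psw(flags):
    """Pack a dict of flag name -> truthy/falsy into an 8-bit PSW value."""
    value = 0
    for name, bit in PSW_BITS.items():
        if flags.get(name):
            value |= 1 << bit
    return value


print(decode_psw(0b00001010))  # I and Z set, everything else clear
```

A round trip through `encode_psw(decode_psw(x))` returns `x` with the two unused top bits cleared.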
Table 5-1 Status Register Fields
Field Description
V Overflow flag. When a sum or difference lies outside the normal
range of the accumulator (-128 to 127), this flag is high.
S The S flag is set when the accumulator contents




I Interrupt mask: controls the enabling of the external
interrupt (INT). The interrupt is taken if the interrupt is
enabled and the corresponding bit is set in the
Interrupt field of the PSW register.
0->disabled
1->enabled
M Timer interrupt mask: controls the enabling of the
internal timer interrupt. The interrupt is taken if the
interrupt is enabled and the corresponding bit is set




0->the result of ALU is not zero
1->the result of ALU is zero
1->carry happens during computation




The accumulator is a buffer register that stores intermediate answers
during a computer run. Its two-state output goes directly to the ALU,
general registers, two output ports, and PSW register. Its input comes
from a data bus.
5.7 Higher-order 8-bit Address Register for the program counter (HP)
This register stores the higher-order address byte that is combined with
an 8-bit immediate address. The JUMP, JNE, JNC, JE, JC, and CALL instructions




The processor doesn't provide any instructions for reading or writing
RAM (Random-Access Memory) because such instructions would delay
instruction execution. This does not mean that a user cannot use this
processor to read or write RAM. The processor has two output ports and
one input port, so a user can use those I/O ports to access an external
RAM.
This chapter describes the following:
How to build a RAM read/write circuit
How to design a RAM writing program
How to design a RAM reading program
6.1 Read/Write RAM Circuit:
Figure 6-1 is a block diagram of the circuit. The circuit uses DO15 to
decide whether the value from the output ports is data or a RAM address.
Next, the circuit uses DO14 to determine whether the processor wants to
read or write. Finally, the processor can read RAM through the input port.
The signals are detailed below:
. DO15: 1->ADDRESS, 0->DATA
. DO14: 1->READ, 0->WRITE
. DO13-DO8: ADDRESS
. DO7-DO0: ADDRESS/DATA
. DI0-DI7: DATA (READ)
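The signal assignment above can be sketched in Python as a small decoder for the 16-bit word presented on DO15-DO0. This is only an illustrative model, not part of the thesis design; the function name decode_output_word and the dictionary keys are hypothetical.

```python
def decode_output_word(word: int) -> dict:
    """Split a 16-bit word on DO15..DO0 into the fields listed above."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return {
        "is_address": bool(word >> 15 & 1),  # DO15: 1 -> address, 0 -> data
        "is_read": bool(word >> 14 & 1),     # DO14: 1 -> read, 0 -> write
        "high_address": word >> 8 & 0x3F,    # DO13-DO8: address bits
        "low_byte": word & 0xFF,             # DO7-DO0: address or data
    }


# Example: DO15 and DO14 high, upper address bits 05H, lower byte 12H.
fields = decode_output_word(0xC512)
```

With DO15 and DO14 both high, the external circuit would treat the lower fields as an address for a read access.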
6.2 Writing RAM Program
Figure 6-2 is the RAM write timing diagram. To write data into RAM,
the program first sends the address to the RAM through the two output
ports. Second, the program sends the data to the RAM through output
port I. Third, the program sets the R/W signal low. Finally, the
processor writes the data into the external RAM. The following is the
writing program:
LDA H-address ;Load the accumulator with Hi-ADD
ORI C0H ;Set DO15 and R/W high
OUT2 A ;Output to the circuit
LDA L-address ;Load the accumulator with Lo-ADD





OUT1 A ;Output the data to the RAM
LDAI 00H ;Clear the R/W signal
OUT1 A
6.3 Reading RAM Program
Figure 6-3 is the RAM read timing diagram. To read data from RAM, the
program first sends the address to the RAM through the two output ports.
Second, the program sets the R/W signal high. Finally, the processor
reads data from the external RAM through the input port. The following
is the program:
LDA H-address ;Load the accumulator with Hi-ADD
ORI C0H ;Set DO15 and R/W high
OUT2 A ;Output to the circuit
LDA L-address ;Load the accumulator with Lo-ADD
OUT1 A ;Output to the circuit
NOP ;Delay one clock cycle time






























This chapter describes 22 test programs that check whether each
instruction works, and shows the simulation results to demonstrate that
the CPU pipeline executes one instruction per clock cycle even when it
executes JUMP, Condition Jump, CALL, and Return instructions, without
any delay in instruction execution. The following explains each signal in
the simulation results:
1. /CON(48): Clock signal
2. /DO0 ~ /DO7: Outputs of the output Register I.







10. /P(0): Output of the Timer.
7.1 Test Program 01
This program tests the 'NOP', 'LDA #DATA', 'OUT1 A', 'OUT2 A', 'STA
R5,#DATA', and 'JMP #DATA' instructions. First, the program loads 00H
into the accumulator and then transfers the value of the accumulator to
the two output registers. Second, the program loads FFH into the
accumulator and then transfers the value of the accumulator to the two
output registers. Finally, the program repeats these steps in a loop. As a
result, the two output registers produce a rectangular waveform.
Figure 7-1-1 shows the program. Figure 7-1-2 shows the simulation result.
7.2 Test Program 02
This program tests the 'RRA' instruction. First, the program loads 01H
into the accumulator. Second, the program rotates the accumulator right.
Third, the program loads the two output registers with the value of the
accumulator. Finally, the program repeats the second and third steps.
As a result, exactly one of the eight output bits is high at a time, moving
from b7 to b0. Figure 7-2-1 shows the program. Figure 7-2-2 shows the
simulation result.
7.3 Test Program 03
This program tests the 'RLA' instruction. First, the program loads 01H
into the accumulator. Second, the program rotates the accumulator left.
Third, the program loads the two output registers with the value of the
accumulator. Finally, the program repeats the second and third steps.
As a result, exactly one of the eight output bits is high at a time, moving
from b0 to b7. Figure 7-3-1 shows the program. Figure 7-3-2 shows the
simulation result.
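The walking-bit patterns described for Test Programs 02 and 03 can be reproduced with a short Python sketch. It assumes that RRA and RLA are plain 8-bit rotates that do not pass through the carry flag; the thesis text here does not state this, and the function names are hypothetical.

```python
def rotate_right(value: int) -> int:
    """Rotate an 8-bit value right by one bit (b0 wraps around to b7)."""
    value &= 0xFF
    return ((value >> 1) | (value << 7)) & 0xFF


def rotate_left(value: int) -> int:
    """Rotate an 8-bit value left by one bit (b7 wraps around to b0)."""
    value &= 0xFF
    return ((value << 1) | (value >> 7)) & 0xFF


def run_rotate_test(rotate, steps: int = 8) -> list:
    """Mimic the test loop: load 01H, then rotate and output repeatedly."""
    acc = 0x01
    outputs = []
    for _ in range(steps):
        acc = rotate(acc)
        outputs.append(acc)  # value seen on the output registers
    return outputs


right_pattern = run_rotate_test(rotate_right)
left_pattern = run_rotate_test(rotate_left)
```

Under this assumption the single high bit walks from b7 down to b0 for the right rotate, and from b1 up to b7 and back to b0 for the left rotate.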
7.4 Test Program 04
This program tests the 'CLR C' and 'ADDI A,#DATA' instructions. First,
the program loads 00H into the accumulator and then clears the flag C.
Second, the program adds one to the value of the accumulator and loads
the result into the accumulator. Third, the program loads the accumulator
into the two output registers. Finally, the program repeats each step
again. As a result, the contents of the output registers increase by one on
each pass. Figure 7-4-1 shows the program. Figure 7-4-2 shows the
simulation result.
7.5 Test Program 05
This program tests the 'SET C' and 'SUBI A,#DATA' instructions. First,
the program loads FFH into the accumulator and then sets the flag C.
Second, the program subtracts one from the value of the accumulator and
loads the result into the accumulator. Third, the program loads the
accumulator into the two output registers. Finally, the program repeats
each step again. As a result, the contents of the output registers decrease
by one on each pass. Figure 7-5-1 shows the program. Figure 7-5-2 shows
the simulation result.
7.6 Test Program 06
This program tests the 'XORI A,#DATA' instruction. First, the program
loads 00H into the accumulator. Second, the program XORs the value of
the accumulator with FFH and loads the result into the accumulator.
Third, the program loads the accumulator into the two output registers.
Finally, the program repeats each step again. As a result, the two output
registers produce a rectangular waveform. Figure 7-6-1 shows the
program. Figure 7-6-2 shows the simulation result.
7.7 Test Program 07
This program tests the 'ORI A,#DATA' and 'ANDI A,#DATA' instructions.
First, the program presets the accumulator with 00H and then loads the
accumulator into the output registers. Second, the program ORs the
accumulator with FFH and ANDs the accumulator with FEH. Third, the
program loads the accumulator into the output registers. Finally, the
program repeats each step again. As a result, the two output registers
produce a rectangular waveform, but b0 of the output registers is always
low. Figure 7-7-1 shows the program. Figure 7-7-2 shows the simulation
result.
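The masking arithmetic behind Test Program 07 can be checked with a few lines of Python. This is only a sketch of the byte operations; the function name is hypothetical and the code does not model the pipeline.

```python
def run_test_program_07() -> tuple:
    """Return the two values the loop writes to the output registers."""
    acc = 0x00          # LDAI #00H
    first = acc         # first OUT1/OUT2 pair
    acc |= 0xFF         # ORI A,#FFH sets every bit
    acc &= 0xFE         # ANDI A,#FEH clears b0 only
    second = acc        # second OUT1/OUT2 pair
    return first, second


low_phase, high_phase = run_test_program_07()
```

The outputs alternate between 00H and FEH, so b1 to b7 toggle while b0 stays low in both phases.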
7.8 Test Program 08
This program tests the 'IN A' instruction. First, the program loads the
accumulator with the input register. Second, the program loads the
accumulator into the output registers. Finally, the program repeats each
step again. As a result, the output registers reproduce the value that the
CPU fetched from the input port. Figure 7-8-1 shows the program. Figure
7-8-2 shows the simulation result.
7.9 Test Program 09
This program tests the 'SET C', 'SET I', 'CLR I', 'CLR C', and 'CLR M'
instructions. First, the program sets the flags I, C, and M. Second, the
program clears the flags I, C, and M. Finally, the program repeats each
step again. As a result, the flags I, C, and M produce a rectangular
waveform. Figure 7-9-1 shows the program. Figure 7-9-2 shows the
simulation result.
7.10 Test Program 10





First, the program presets the accumulator with 00H and then loads
register 0 with the accumulator. Second, the program presets the
accumulator with 01H and then loads register 16 with the accumulator.
Third, the program presets the accumulator with 02H and then loads
register 21 with the accumulator. Fourth, the program presets the
accumulator with 03H and then loads register 31 with the accumulator.
Fifth, the program loads the accumulator with register 0 and then loads
the output register I with the accumulator. Sixth, the program loads the
accumulator with register 16 and then loads the output register I with the
accumulator. Seventh, the program loads the accumulator with register 21
and then loads the output register I with the accumulator. Eighth, the
program loads the accumulator with register 31 and then loads the output
register I with the accumulator. Finally, the program repeats the fifth
through eighth steps. As a result, the output register I produces 00H,
01H, and 03H alternately. Figure 7-10-1 shows the program.
Figure 7-10-2 shows the simulation result.
7.11 Test Program 11
This program tests the 'ADDC A,REGISTER' instruction. First, the program
presets register 0 with 01H and the accumulator with 00H. Second, the
program clears the flag C. Third, the program adds register 0 to the
accumulator. Fourth, the program loads the output registers with the
accumulator. Finally, the program repeats the third and fourth steps. As
a result, the output registers increase by one on each pass. Figure 7-11-1
shows the program. Figure 7-11-2 shows the simulation result.
7.12 Test Program 12
This program tests the 'SUBC A,REGISTER' instruction. First, the program
presets register 0 with 01H and the accumulator with FFH. Second, the
program sets the flag C. Third, the program subtracts register 0 from the
accumulator. Fourth, the program loads the output registers with the
accumulator. Finally, the program repeats the third and fourth steps. As
a result, the output registers decrease by one on each pass. Figure 7-12-1
shows the program. Figure 7-12-2 shows the simulation result.
7.13 Test Program 13
This program tests the 'XOR A,REGISTER' instruction. First, the program
presets register 0 with FFH and the accumulator with 00H. Second, the
program XORs the accumulator with register 0. Third, the program loads
the output registers with the accumulator. Finally, the program repeats
the second and third steps. As a result, the output registers produce a
rectangular output. Figure 7-13-1 shows the program. Figure 7-13-2
shows the simulation result.
7.14 Test Program 14




instruction. First, the program presets register 0 with FFH, register 1
with FFH, and the accumulator with 00H. Second, the program loads the
output registers with the accumulator. Third, the program ORs the
accumulator with register 1. Fourth, the program loads the output
registers with the accumulator. Finally, the program repeats the
instructions from address 0004H to address 000CH. As a result, the
output registers produce a rectangular output, but b0 of the output
registers is always low. Figure 7-14-1 shows the program. Figure 7-14-2
shows the simulation result.
7.15 Test Program 15
This program tests the 'JE ADDRESS' and 'CMPI A,#DATA' instructions.
First, the accumulator is loaded with 66H. Second, the program compares
the accumulator with 66H. If the accumulator is equal to 66H, the
program presets the accumulator with 66H; otherwise, the program
presets the accumulator with FFH. Third, the program loads the output
register I with the accumulator. Finally, the program repeats the second
and third steps. As a result, the output register I produces FFH and 66H
alternately. Figure 7-15-1 shows the program. Figure 7-15-2 shows the
simulation result.
7.16 Test Program 16





First, the accumulator is loaded with 66H. Second, the program compares
the accumulator with 66H. If the accumulator is not equal to 66H, the
program presets the accumulator with 66H; otherwise, the program
presets the accumulator with FFH. Third, the program loads the output
register I with the accumulator. Finally, the program repeats the second
and third steps. As a result, the output register I produces FFH and 66H
alternately. Figure 7-16-1 shows the program. Figure 7-16-2 shows the
simulation result.
7.17 Test Program 17





First, the program sets the flag C. Second, if the flag C is high, the
program presets the accumulator with 66H; otherwise, the program
presets the accumulator with 55H. Third, the program loads the output
register I with the accumulator. Fourth, the program clears the flag C.
Fifth, if the flag C is low, the program presets the accumulator with 55H;
otherwise, the program presets the accumulator with FFH. Sixth, the
program loads the output register I with the accumulator. Finally, the
program repeats the first through sixth steps. As a result, the output
register I produces 66H and 55H alternately. Figure 7-17-1 shows
the program. Figure 7-17-2 shows the simulation result.
7.18 Test Program 18





program presets the accumulator with FFH. Second, the program loads the
output register I with the accumulator. Third, the program jumps to
address 0005H because of the CALL instruction. Fourth, the program
presets the accumulator with 00H. Fifth, the program loads the output
register I with the accumulator. Sixth, the program jumps back to address
0004H because of the RET instruction. Finally, the program repeats the
first through sixth steps. As a result, the output register I produces 00H
and FFH alternately. Figure 7-18-1 shows the program. Figure 7-18-2
shows the simulation result.






program sets the flag I. Second, the program presets the accumulator with
00H. Third, the program loads the output register I with the accumulator.
Finally, the program repeats the second and third steps. As a result, the
output register I produces 00H. It produces FFH for a short time when the
interrupt (INT) signal is high. Figure 7-19-1 shows the program.
Figure 7-19-2 shows the simulation result.
7.20 Test Program 20





program loads the timer with F0H and then sets the flag M. Second, the
program loads the accumulator with 00H. Third, the program loads the
output register I with the accumulator. Finally, the program repeats the
second and third steps. As a result, the output register I produces 00H.
It produces FFH for a short time after the timer reaches zero (after 240
clock cycles). Figure 7-20-1 shows the program. Figure 7-20-2 shows the
simulation result.
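The 240-cycle figure follows from the preload value, since F0H is 240 in decimal. The Python sketch below illustrates this under the assumption that the 8-bit timer counts down by one on every clock cycle; the function name is hypothetical.

```python
def cycles_until_timer_zero(preload: int) -> int:
    """Count clock cycles until an 8-bit down counter reaches zero."""
    count = preload & 0xFF
    cycles = 0
    while count != 0:
        count = (count - 1) & 0xFF  # assumed: one decrement per clock
        cycles += 1
    return cycles


timer_period = cycles_until_timer_zero(0xF0)
```

Reloading the timer with F0H inside the timer routine, as the test program does, would repeat the same period.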
7.21 Test Program 21
This program tests the 'HLT' instruction. First, the program sets the
flag I. Second, the program loads the accumulator with 00H. Third, the
program loads the output register I with the accumulator. Finally, the
program repeats the second and third steps. As a result, the program
stops the clock when the INT signal is high. Figure 7-21-1 shows the
program. Figure 7-21-2 shows the simulation result.





instructions. First, the program presets the accumulator with FFH.
Second, the program loads the PSW register with the accumulator. Third,
the program loads the accumulator with the PSW register. Fourth, the
program loads the output register I with the accumulator. Fifth, the
program presets the accumulator with 00H and then follows the second
through fourth steps. Finally, the program repeats the first through fifth
steps. Because b6 and b7 of the PSW register are always zero, the output
register I produces 00H and 3FH alternately. Figure 7-22-1 shows the














LOOP: LDAI #00H ; Load 00H into the accumulator
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
LDAI #FFH ; Load FFH into the accumulator
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
LDHI #00H ; Load 00H into the HP register
Figure 7-1-1 Test program 01
ADDRESS CODE LABEL LANGUAGE
0000 0000 NOP
0001 1101 LDAI #01H ; Load 01H into the accumulator
0002 0B00 LOOP: RRA ; Rotate the accumulator right
0003 1500 OUT1 A ; Load the accumulator into the output register I
0004 1600 OUT2 A ; Load the accumulator into the output register II
0005 1E00 LDHI #00H ; Load 00H into the HP register
0006 0D02 JMP LOOP ; Jump back to address 0002H







1101 LDAI #01H ; Load 01H into the accumulator







OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
LDHI #00H ; Load 00H into the HP register
JMP LOOP ; Jump back to address 0002H












LDAI #00H ; Load 00H into the accumulator
CLR C ; Clear the flag C
LOOP: ADDI A,#01H ; Add 1 to the accumulator
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
LDHI #00H ; Load 00H into the HP register
JMP LOOP ; Jump back to address 0003H











LDAI #FFH ; Load FFH into the accumulator
SET C ; Set the flag C
LOOP: SUBC A,#01H ; Subtract 1 from the accumulator
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
LDHI #00H ; Load 00H into the HP register
0007 0D03 JMP LOOP ; Jump back to address 0003H
Figure 7-5-1 Test Program 05
ADDRESS CODE LABEL LANGUAGE
0000 0000 NOP
0001 1100 LDAI #00H; Load 00H into the accumulator
0002 0AFF LOOP: XORI A,#FFH ; XOR the accumulator with FFH
0003 1500 OUT1 A ; Load the accumulator into the output register I
0004 1600 OUT2 A ; Load the accumulator into the output register II
0005 1E00 LDHI #00H ; Load 00H into the HP register
0006 0D03 JMP LOOP ; Jump back to address 0002H














LOOP: LDAI #00H ; Preset the accumulator with 00H
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
ORI A,#FFH ; OR the accumulator with FFH
ANDI A,#FEH ; AND the accumulator with FEH
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
LDHI #00H ; Load 00H into the HP register
JMP LOOP ; Jump back to address 0001H
Figure 7-7-1 Test Program 07
ADDRESS CODE LABEL LANGUAGE
0000 0000 NOP
0001 1700 LOOP: IN A ; Load the input register into the accumulator
0002 1500 OUT1 A ; Load the accumulator into the output register I
0003 1600 OUT2 A ; Load the accumulator into the output register II
0004 1E00 LDHI #00H ; Load 00H into the HP register
0005 0D03 JMP LOOP ; Jump back to address 0001H












LOOP: SET I ; Set the flag I
SET C ; Set the flag C
SET M ; Set the flag M
CLR I ; Clear the flag I
CLR C ; Clear the flag C
CLR M ; Clear the flag M
LDHI #00H ; Load 00H into the HP register
JMP LOOP ; Jump back to address 0001H







1100 LDAI #00H ; Preset the accumulator with 00H
1300 STA R0 ; Load the




















#01H ; Preset the accumulator with 01H
R10 ; Load the register 16 with the accumulator
#02H ; Preset the accumulator with 02H
R0 ; Load the register 21 with the accumulator
#03H ; Preset the accumulator with 03H
R0 ; Load the register 31 with the accumulator
R0 ; Load the accumulator with the register 0
A ; Load the output register I with the accumulator
R10 ; Load the accumulator with the register 16
A ; Load the output register I with the accumulator
R15 ; Load the accumulator with the register 21
A ; Load the output register I with the accumulator
R1F ; Load the accumulator with the register 31
A ; Load the output register I with the accumulator
#00H ; Load 00H into the HP register
LOOP ; Jump back to address 0009H








1101 LDAI #01H ; Preset the accumulator with 01H
1300 STA R0 ; Load the register 0 with the accumulator
1100 LDAI #00H ; Preset the accumulator with 00H
2100 CLR C ; Clear the flag C
1500 LOOP: OUT1 A ; Load the








OUT2 A ; Load the accumulator into the output register II
LDHI #00H ; Load 00H into the HP register
JMP LOOP ; Jump back to address 0004H












LDAI #01H ; Preset the accumulator with 01H
STA R0 ; Load the register 0 with the accumulator
LDAI #FFH ; Preset the accumulator with FFH
SET C ; Set the flag C
LOOP: SUBC A,R0 ; Subtract the register 0 from the accumulator
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
LDHI #00H ; Load 00H into the HP register
JMP LOOP ; Jump back to address 0004H










11FF LDAI #FFH ; Load the accumulator with FFH
1300 STA R0 ; Load the register 0 with the accumulator
1100 LDAI #00H ; Load the accumulator with 00H
0000 LOOP: NOP
0AFF XOR A,R0 ; XOR the accumulator with the register 0
1500 OUT1 A ; Load the accumulator into the output register I
1600 OUT2 A ; Load the






LDHI #00H ; Load 00H into the HP register
JMP LOOP ; Jump back to address 0003H
















LDAI #FFH ; Load the accumulator with FFH
STA R0 ; Load the register 0 with the accumulator
LDAI #FEH ; Load the accumulator with FEH
STA R1 ; Load the register 1 with the accumulator
LOOP: LDAI #00H ; Preset the accumulator with 00H
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
OR A,R0 ; OR the accumulator with the register 0
ANDI A,R1 ; AND the accumulator with the register 1
OUT1 A ; Load the accumulator into the output register I
OUT2 A ; Load the accumulator into the output register II
LDHI #00H ; Load 00H into the HP register
JMP LOOP ; Jump back to address 0004H







1E00 LDHI #00H ; Load the HP register with 00H
1166 LDAI #66H ; Preset the accumulator with 66H
1066 LOOP3: CMPI A,#66H ;











0F08 JE LOOP1 ; Produce a jump when the flag Z is high
1166 LDAI #66H ; Preset the accumulator with 66H
0D09 JMP LOOP2 ; Jump to address 0009H
11FF LOOP1: LDAI #FFH ; Preset the accumulator with FFH
1500 LOOP2: OUT1 A ; Load the output register I with the accumulator
0D02 JMP LOOP3 ; Jump back to address 0002H














LDHI #00H ; Load the HP register with 00H
LDAI #66H ; Preset the accumulator with 66H
LOOP3: CMPI A,#66H ; Compare the accumulator with 66H
NOP
NOP
JNE LOOP1 ; Produce a jump when the flag Z is low
LDAI #FFH ; Preset the accumulator with FFH
JMP LOOP2 ; Jump to address 0009H
LOOP1: LDAI #66H ; Preset the accumulator with 66H
LOOP2: OUT1 A ; Load the output register I with the accumulator
JMP LOOP3 ; Jump back to address 0002H
Figure 7-16-1 Test Program 16
ADDRESS CODE LABEL LANGUAGE
0000 1E00 LDHI #00H ;















LOOP5: SET C ; Set the flag C
JC LOOP1 ; Produce a jump when the flag C is high
LDAI #55H ; Preset the accumulator with 55H
JMP LOOP2 ; Jump to address 0006H
LOOP1: LDAI #66H ; Preset the accumulator with 66H
LOOP2: OUT1 A ; Load the output register I with the accumulator
CLR C ; Clear the flag C
JNC LOOP3 ; Produce a jump when the flag C is low
LDAI #55H ; Load the accumulator with 55H
JMP LOOP4 ; Jump to address 000CH
LOOP3: LDAI #FFH ; Preset the accumulator with 66H
LOOP4: OUT1 A ; Load the output register I with the accumulator
JMP LOOP5 ; Jump back to address 0001H











1E00 LDHI #00H ; Preset the HP register with 00H
11FF LOOP2: LDAI #FFH ; Preset the accumulator with FFH
1500 OUT1 A ; Load the output register I with the accumulator
1805 CALL LOOP1 ; Jump to address 0005H
0D01 JMP LOOP2 ; Jump back to address 0001H
1100 LOOP1: LDAI #00H ; Preset the accumulator with 00H
1500 OUT1 A ; Load the output register I with the accumulator
1900 RET ; Return to address 0004H
Figure 7-18-1 Test Program 18
ADDRESS CODE LABEL LANGUAGE
0000 1E00 LDHI #00H; Preset the HP register with 00H
0001 2001 SET I; Set the flag I
0002 1100 LOOP1: LDAI #00H ; Preset the accumulator with 00H
0003 1500 OUT1 A ; Load the output register I with the accumulator
0004 0D02 JMP LOOP1 ; Jump back to address 0002H
The following program is the interrupt routine:
0007 11FF LDAI #FFH ; Preset the accumulator with FFH
0008 1500 OUT1 A ; Load the output register I with the accumulator
0009 2500 RET I ; Return to the address that was interrupted













1E00 LDHI #00H ; Preset the HP register with 00H
14F0 LDMI #F0H ; Preset the timer with F0H
2401 SET M ; Set the flag M
1100 LOOP1: LDAI #00H ; Load the accumulator with 00H
1500 OUT1 A ; Load the output register I with the accumulator
0D03 JMP LOOP1 ; Jump back to address 0003H
The following is the timer program:
11FF LDAI #FFH ; Load the accumulator with FFH
1500 OUT1 A ; Load the output register I with the accumulator
14F0 LDMI #F0H ; Load the timer with F0H
2600 RETM ; Return to the address that was interrupted
Figure 7-20-1 Test Program 20
ADDRESS CODE LABEL LANGUAGE
0000 1E00 LDHI #00H ; Preset the HP register with 00H
0001 2001 SET I ; Set the flag I
0002 1100 LOOP1: LDAI #00H ; Preset the accumulator with 00H
0003 1500 OUT1 A ; Load the output register I with the accumulator
0004 0D02 JMP LOOP1 ; Jump back to address 0002H
The following is the interrupt program:
0007 1B00 HLT ; Stop the clock
Figure 7-21-1 Test Program 21
CODE LABEL LANGUAGE
LDHI #00H ; Preset the HP register with 00H
11FF LOOP: LDAI #FFH ; Preset the accumulator with FFH
STP ; Load the PSW register with the accumulator
LDP ; Load the accumulator with the PSW register
OUT1 A ; Load the output register I with the accumulator
LDAI #00H ; Preset the accumulator with 00H
STP ; Load the PSW register with the accumulator
LDP ; Load the accumulator with the PSW register
OUT1 A ; Load the output register I with the accumulator
JMP LOOP ; Jump back to address 0001H



























































































































































































































































































































































































































































































































































































































































































































































This chapter describes every logic circuit in this processor. All logic
circuits are level-clocked so they can alleviate clock skew and race
problems.
8.1 D flip-flop [3]
Figure 8-1 shows a D master-slave flip-flop. A master-slave flip-flop is a
combination of two clocked latches; the first is called the master, and the
second is the slave. While the clock is high, the master is active and the
slave is inactive. While the clock is low, the master is inactive and the
slave is active. Therefore, the D flip-flop fetches an input signal during
the negative half cycle of the clock and changes its output after the clock
goes high. The D flip-flop can also be controlled independently of the
clock by using the SET and CLEAR signals. When the SET signal is low,
the two latches go high immediately. When the CLEAR signal is low, the
two latches go low.
8.2 16-bit Register I
Figure 8-2 shows a 16-bit register. This register is for the MEM32 circuit.
There are 16 D flip-flops in this register. Each D flip-flop's output is
connected to a three-state inverter. While the ENC signal is high, the
register cannot be used to store or output data. While ENC is low and ENI
is high, the register stores data from the input signals (DI0-DI15) during
the negative half cycle of the clock. The register outputs its contents
when ENC and ENO are both low.
8.3 16-bit Register II
Figure 8-3 shows the other 16-bit register. This register is for the SUB-1
circuit. Its structure is almost the same as that of 16-bit register I, but it
has no ENC signal, and the outputs from DO10 to DO15 are not
controlled by the ENO signal.
8.4 8-bit Register I
Figure 8-4 shows an 8-bit register. This register is for the MEM32 circuit.
Its structure is the same as that of 16-bit register I, but it has only 8 D
flip-flops.
8.5 8-bit Register II
Figure 8-5 shows the other 8-bit register. This register is for the
ADDRESS REGISTER, REGISTER D, REGISTER B, REGISTER C,
INPUT REGISTER, two OUTPUT Registers, and the accumulator
circuits. Its structure is the same as that of 16-bit register II, but it has
only 8 D flip-flops.
8.6 6-bit Register
Figure 8-6 shows a 6-bit register. This register is for the CONTROL
UNIT circuit. Its structure is the same as that of 16-bit register II, but it
has only 6 D flip-flops.
8.7 4-bit Register
Figure 8-7 shows a 4-bit register. This register is for the STACK
POINTER circuit. Its structure is the same as that of the 6-bit register,
but it has only 4 D flip-flops.
8.8 Instruction Register
Figure 8-8 shows an instruction register. This register is for the
CONTROL UNIT circuit. Normally, this register fetches an instruction
from external memory. While the INT signal is high, the register fetches
39 instead of an instruction from external memory. While the TIMER
signal is high, the register fetches 40 instead of an instruction from
external memory.
8.9 MEM32
Figure 8-9 shows the MEM32 circuit. This memory circuit implements the
general registers. There are 32 8-bit registers in this circuit, so the CPU
has 32 general registers. The signals A0-A4 select the desired general
register.
8.10 Accumulator Circuit
Figure 8-10 shows the accumulator circuit. The following explains the
S0, S1, and S2 signals.
1. While S0 is high, the accumulator shifts right one bit.
2. While S1 is high, the accumulator shifts left one bit.
3. While S2 is high, the accumulator stores the data from the input
signal.
Therefore, only one of the S0, S1, and S2 signals can be high at once.
8.11 PSW Circuit
Figure 8-11 shows the PSW circuit. This circuit is for the PSW register. It
stores the status of each flag. The following explains some of the control
signals:
1. While the A signal or the B signal is high, the flag C can be changed.
2. While the B signal is high, the flags C, V, and S can be changed.
3. While the C signal is high, the flag Z can be changed.
4. While the D signal is high, the flag M can be changed.
5. While the E signal is high, the flag I can be changed.
6. While ENI is high, every flag is loaded from D0-D5. This
signal is for the STP instruction.
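Because only D0-D5 are loaded into the flags, a value written with STP and read back with LDP keeps just its lower six bits, which is why Test Program 22 reports 3FH after writing FFH. The Python sketch below illustrates this using the bit positions of Figure 5-1 (V, S, I, M, Z, C in b5 to b0); the names PSW_BITS, store_psw, and decode_psw are hypothetical.

```python
# Flag positions taken from Figure 5-1; b7 and b6 are unused.
PSW_BITS = {"V": 5, "S": 4, "I": 3, "M": 2, "Z": 1, "C": 0}
PSW_MASK = 0x3F  # only D0-D5 are stored


def store_psw(acc: int) -> int:
    """Model STP: keep the six flag bits, drop b7 and b6."""
    return acc & PSW_MASK


def decode_psw(psw: int) -> dict:
    """Return each flag as a boolean."""
    return {name: bool(psw >> bit & 1) for name, bit in PSW_BITS.items()}


readback = store_psw(0xFF)       # value LDP would return after STP of FFH
flags = decode_psw(readback)
```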
8.12 IT Circuit
Figure 8-12 shows an IT circuit. This circuit is for the CONTROL UNIT
circuit. Its functions are the following:
1. The INT and TIMER signals can be masked by the flags I and M.
2. The INT signal has priority over the TIMER signal.
The following explains some of the signals:
1. The INT signal comes from the external interrupt signal.
2. The I signal is the flag I.
3. The TIMER signal comes from the internal timer.
4. The M signal is the flag M.
If the INT signal and the TIMER signal go high at the same time, only the
INT signal is active.
Figure 8-13 shows the MEM16 circuit. This memory circuit is for the
stack registers. There are 16 16-bit registers in this circuit, so the CPU
has 16 stack registers. The signals A0-A3 select the desired stack
register.
8.14 JK flip-flop [3]
Figure 8-14 shows a JK master-slave flip-flop. Its structure is almost the
same as that of the D master-slave flip-flop, but it has feedback from its
output.
8.15 16-bit Counter [3]
Figure 8-15 shows the 16-bit counter circuit. This counter is for the
PROGRAM COUNTER. There are 16 JK flip-flops in this counter.
Normally, the contents of the counter increase by one every clock cycle.
When the LOAD signal is high, the counter is loaded with a value
(L0-L15).
8.16 8-bit Counter [3]
Figure 8-16 shows an 8-bit counter circuit. This counter is for the TIMER
circuit. Its structure is the same as that of the 16-bit counter, but it has
only 8 JK flip-flops.
8.17 Program Counter
Figure 8-17 shows the program counter circuit. The following explains
some of the control signals:
1. While S3 goes high, the counter is loaded with a value (L0-L15).
2. While S2 goes high, the counter is loaded with 0010H (Timer
interrupt).
3. While S0 goes high, the counter is loaded with 0007H (INT
interrupt).
Therefore, only one of the S0, S2, and S3 signals can go high at once.
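The load behavior of the program counter can be sketched in Python as a next-address function. The vectors 0007H and 0010H come from the list above; the function name, the keyword arguments, and the error raised for conflicting selects are illustrative assumptions.

```python
INT_VECTOR = 0x0007    # loaded when S0 is high
TIMER_VECTOR = 0x0010  # loaded when S2 is high


def next_pc(pc: int, s0: bool = False, s2: bool = False,
            s3: bool = False, load_value: int = 0) -> int:
    """Return the program counter value for the next clock cycle."""
    if sum([s0, s2, s3]) > 1:
        raise ValueError("only one of S0, S2, and S3 may be high")
    if s3:
        return load_value & 0xFFFF  # jump, call, or return target (L0-L15)
    if s2:
        return TIMER_VECTOR
    if s0:
        return INT_VECTOR
    return (pc + 1) & 0xFFFF        # normal operation: count up by one


after_int = next_pc(0x0003, s0=True)
```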
8.18 Timer Circuit
Figure 8-18 shows the Timer circuit. When the 8-bit counter counts to
zero, the Z0 signal will go high.
8.19 Full Adder, XOR, AND, and OR Circuits [3]
Figures 8-19, 8-20, 8-21, and 8-22 show a full adder, an XOR, an AND,
and an OR circuit. These circuits are for the ALU circuit.
8.20 16-bit Adder and Subtracter
Figure 8-23 shows a 16-bit adder and subtracter circuit. This circuit is for
the SUB-1 circuit. There are 16 adders in this circuit. When SUB-ADD
goes high, the circuit is a 16-bit subtracter. The circuit is a 16-bit adder
while SUB-ADD is low.
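A common way to build such a circuit from full adders is to invert the second operand and inject a carry-in of one when subtracting (two's complement). The Python sketch below models that scheme; it is an assumption about the internal structure, which the text above does not detail, and the function name add_sub is hypothetical.

```python
def add_sub(a: int, b: int, sub_add: bool, width: int = 16) -> tuple:
    """Return (result, carry_out) of an adder/subtracter of `width` bits."""
    mask = (1 << width) - 1
    b_eff = (b ^ mask) if sub_add else b   # invert B when subtracting
    carry_in = 1 if sub_add else 0         # +1 completes two's complement
    total = (a & mask) + (b_eff & mask) + carry_in
    return total & mask, total >> width


difference, carry = add_sub(5, 3, sub_add=True)
```

The same function with width=8 corresponds to the 8-bit version used by the ALU.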
8.21 8-bit Adder and Subtracter
Figure 8-24 shows an 8-bit adder and subtracter circuit. This circuit is for
the ALU circuit. Its structure is the same as that of the 16-bit adder and
subtracter, but it has only eight adders.
8.22 Stack Pointer
Figure 8-25 shows the stack pointer circuit. While the SUB-ADD signal is
high, the contents of this pointer decrease by one. In contrast, the stack
pointer increases by one when the SUB-ADD signal is low. The CIN
signal controls the output of the pointer. While CIN is high, the output
decreases by one. If CIN is low, the output does not change.
8.23 ALU
Figure 8-26 shows the ALU circuit. It contains an adder and subtracter, an
XOR, an OR, and an AND circuit, so the ALU can perform addition,
subtraction, AND, OR, and XOR operations. The control signals work as
follows:
1. When the S0, S1, and SUB-ADD signals are low, the ALU performs
addition.
2. When the S0 and S1 signals are low and the SUB-ADD signal is high, the
ALU performs subtraction.
3. When S0 is low and S1 is high, the ALU performs the OR operation.
4. When S0 is high and S1 is low, the ALU performs the AND operation.
5. When S0 and S1 are both high, the ALU performs the XOR operation.
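The five cases above amount to a small decode table, sketched below in Python. This is only a behavioral illustration: the function name alu is hypothetical, the 8-bit width follows from the 8-bit adder and subtracter used in the ALU, and the operand order of the subtraction (a minus b) and the assumption that SUB-ADD is ignored for the logic operations are not stated in the text.

```python
def alu(a, b, s0, s1, sub_add=0):
    """Behavioral model of the 8-bit ALU operation select.

    (S0, S1) = (0, 0): add (SUB-ADD = 0) or subtract (SUB-ADD = 1)
    (S0, S1) = (0, 1): OR
    (S0, S1) = (1, 0): AND
    (S0, S1) = (1, 1): XOR
    """
    mask = 0xFF  # results are truncated to 8 bits
    if (s0, s1) == (0, 0):
        return (a - b) & mask if sub_add else (a + b) & mask
    if (s0, s1) == (0, 1):
        return (a | b) & mask
    if (s0, s1) == (1, 0):
        return (a & b) & mask
    return (a ^ b) & mask
```

For example, alu(0xF0, 0x3C, 1, 1) selects XOR and gives 0xCC.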
8.24 SUB-1
Figure 8-27 shows the SUB-1 circuit. This circuit helps the stack store
the correct address when the CALL, INT, and TIMER instructions are
executed. When the CPU executes the CALL instruction, the SUB-1 signal
must go high in order to increase the output of the 16-bit register by
one. If the SUB-1 signal is low, the output does not change.
8.25 Control Unit
Figure 8-28 shows the Control Unit circuit. Three stages of the four-stage
pipeline are in this circuit: the EX1, EX2, and EX3 stages. During the EX1
stage, the Instruction Register loads an instruction from the external
memory, decodes the instruction, and then sends out the correct control
signals. During the EX2 stage, a 6-bit register loads the instruction from
the Instruction Register and then does the same job as in the EX1 stage.
Finally, during the EX3 stage, a 6-bit register loads the instruction from
the 6-bit register of the EX2 stage and again does the same job as in the
EX1 stage. It takes three clock cycles for an instruction to travel from
the external memory to the register in the EX3 stage. To show the circuit
clearly, it is divided into three parts, each representing one stage.
Figure 8-29 shows the EX1 stage circuit, Figure 8-30 shows the EX2 stage,
and Figure 8-31 shows the EX3 stage.
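The way an instruction moves through the three stage registers can be illustrated with a short simulation. The Python sketch below is a simplified model under stated assumptions: instructions are fetched sequentially (jumps, calls, and interrupts are not modeled), the function name run_pipeline is hypothetical, and the instruction names used in the example are placeholders.

```python
def run_pipeline(program, cycles):
    """Shift instructions through EX1 -> EX2 -> EX3, one stage per clock.

    Returns a list of (ex1, ex2, ex3) tuples, one per clock cycle;
    None marks an empty stage.
    """
    ex1 = ex2 = ex3 = None
    history = []
    for cycle in range(cycles):
        # All three registers are clocked together: EX3 takes the old EX2,
        # EX2 takes the old EX1, and EX1 loads the next instruction.
        ex3, ex2 = ex2, ex1
        ex1 = program[cycle] if cycle < len(program) else None
        history.append((ex1, ex2, ex3))
    return history


if __name__ == "__main__":
    for n, stages in enumerate(run_pipeline(["I1", "I2", "I3", "I4"], 5), 1):
        print(f"clock {n}: EX1={stages[0]} EX2={stages[1]} EX3={stages[2]}")
```

In this model the first instruction reaches the EX3 register on the third clock, in line with the three clock cycles stated above, and from then on one instruction enters EX3 on every clock.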
8.26 16-bit RISC Processor [3]
Figure 8-32 shows the 16-bit RISC processor circuit. The numbered symbols
are as follows:
M1. Stack Pointer
M2. 16 Stack registers (MEM16 circuit)
M3. Program Counter
M4. ROM for test program (not part of the circuit)
M7. Address Register
M8. General Registers (MEM32 circuit)
M9. Control Unit
M10. HP Register (8-bit Register II)
M11. Accumulator
M12. ALU
M13. Register B (8-bit Register II)
M14. Timer
M15. PSW Register
M16. Output Register I (8-bit Register II)
M17. Output Register II (8-bit Register II)
M18. Input Register (8-bit Register II)
M19. Register D (8-bit Register II)
M20. Register C (8-bit Register II)
M21. INT and Timer Circuit (IT circuit)
M22. SUB-1 circuit
The output pins are on the right side of each numbered symbol, and the
input pins are on the left side. There are four buses inside the circuit.
The following lists each bus and the symbols connected to it.
1. BUS A: Output data of M9, M7 input, M20 input, M14 input, M3 input,
M10 input and output, M2 output, M15 input.
2. BUS B: M8 output, M20 output, M19 input, M13 input.
3. BUS C: M19 output, M11 input, M12 output, M18 output, M15 output.
4. BUS D: M11 output, M8 input, M16 input, M17 input, M15 input.
BUS A is used during the EX1 stage, BUS B is used during the EX2 stage,
and BUS C and BUS D are used during the EX3 stage.
In order to see the circuit clearly, this circuit is divided into four parts that















































































































































































































































































































































































































































































































































































































































































































































































































































This chapter describes the VLSI design using CMOS technology. The delay
time of each circuit is calculated, and then the execution speed of this
CPU chip is computed. Finally, ways to speed up this CPU chip are
presented. The size of this chip is 10535 um x 14677 um. There are
24,982 transistors on this chip.
9.1 VLSI Design [4] [5]
All circuits in this processor are built from inverters, 2-input NAND
gates, 3-input NAND gates, 6-input NAND gates, 2-input NOR gates, 3-input
NOR gates, 6-input NOR gates, and tri-state inverters. These gates can
easily be extended to build more complex logic gates. Figure 9-1 shows the
inverter circuit. Figure 9-2 shows the tri-state inverter. Figure 9-3
shows the three-input NOR circuit. Figure 9-4 shows the two-input NOR
gate. Figure 9-5 shows the six-input NAND circuit. Figure 9-6 shows the
six-input NOR circuit. Figure 9-7 shows the three-input NAND circuit.
Figure 9-8 shows the two-input NAND circuit.
9.2 Delay Time of Each Circuit [5]
Table 9-1 lists the delay time of each circuit. During the IF stage, the
longest delay is 1.26 ns, in the Program Counter. During the EX1 stage,
the maximum delay is 18.19 ns, when the Stack stores the data from the
SUB-1 circuit. During the EX2 stage, the longest delay is 10.07 ns, when
data is transferred from the General Registers to Register D or
Register B. During the EX3 stage, the longest delay is 15.69 ns, when a
General Register is loaded with data from the accumulator. Therefore, the
half-cycle period of the clock must be greater than 18.19 ns. With a clock
period of 60 ns, the processor runs at about 16.67 MHz. Because every
instruction executes in one clock cycle, the performance is 16.67 million
instructions per second (MIPS).
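The arithmetic behind these figures can be checked with a few lines of Python. The sketch below uses only the stage delays quoted in the text; the names STAGE_DELAYS_NS and clock_check are hypothetical, and the rule applied (half of the clock period must exceed the longest stage delay) is the one stated above.

```python
# Longest delay in each pipeline stage, in nanoseconds (from the text).
STAGE_DELAYS_NS = {"IF": 1.26, "EX1": 18.19, "EX2": 10.07, "EX3": 15.69}


def clock_check(period_ns, stage_delays_ns=STAGE_DELAYS_NS):
    """Return (critical delay in ns, timing met?, clock frequency in MHz)."""
    critical_ns = max(stage_delays_ns.values())
    half_cycle_ns = period_ns / 2
    frequency_mhz = 1000.0 / period_ns  # 1 / (period in ns) expressed in MHz
    return critical_ns, half_cycle_ns > critical_ns, frequency_mhz


if __name__ == "__main__":
    critical, ok, mhz = clock_check(60)
    print(f"critical delay {critical} ns, timing met: {ok}, {mhz:.2f} MHz")
```

With a 60 ns period the half cycle is 30 ns, which exceeds the 18.19 ns critical delay, and the frequency is 1000 / 60, or about 16.67 MHz. At one instruction per clock cycle this corresponds to 16.67 MIPS.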
9.3 Speed up the Processor [4] [5]
The following are some ways to speed up the processor:
1. Using dynamic CMOS logic.
2. Using the look-ahead carry technique to speed up the 16-bit Adder and
Subtracter circuit and the 8-bit Adder and Subtracter circuit.
3. Adding some powerful drivers on the inputs of the Stack and General
Register circuits.
4. Reducing the length of the buses as much as possible.
After these improvements, the CPU could run at 40 MHz.
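To illustrate the look-ahead carry idea mentioned in item 2, the following Python sketch computes every carry directly from generate and propagate terms instead of letting it ripple through a chain of adders. It is a generic textbook formulation, not a circuit from this thesis; the function name cla_add and the default 4-bit width are assumptions made for the example.

```python
def cla_add(a, b, cin=0, width=4):
    """Add two `width`-bit numbers using look-ahead carry equations.

    g[i] = a[i] AND b[i]   (bit i generates a carry)
    p[i] = a[i] XOR b[i]   (bit i propagates an incoming carry)
    c[i+1] = g[i] OR (p[i] AND c[i]), expanded so that each carry
    depends only on the g and p terms and on cin.
    Returns (sum truncated to `width` bits, carry_out).
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    carries = [cin]
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[1]g[0] | p[i]...p[0]cin
        c = 0
        term = 1
        for j in range(i, -1, -1):
            c |= term & g[j]
            term &= p[j]
        c |= term & cin
        carries.append(c)
    result = 0
    for i in range(width):
        result |= (p[i] ^ carries[i]) << i
    return result, carries[width]
```

In hardware, each expanded carry equation is a two-level AND-OR function of the inputs, so the carries are available after a fixed number of gate delays rather than after one adder delay per bit. A software model such as this one shows only the logic, not the speed gain.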
Table 9-1 Delay Time of Each Circuit












4-bit adder and subtracter
8-bit adder and subtracter






























INT and Timer 1.91















































































































The processor has been designed and simulated successfully. The RISC
processor executes every instruction in one clock cycle, even when it
executes JUMP, Condition Jump, CALL, and Return instructions. This is the
advantage of this processor in comparison with other RISC processors.
The most difficult part of this thesis is the Control Unit design, because
the Control Unit includes three stages of the pipeline: EX1, EX2, and EX3.
A designer must know how to arrange an instruction into each stage without
delaying the execution of other instructions. According to the simulation
and the calculation, the processor can run at 16.67 MHz. The performance
is 16.67 million instructions per second (MIPS). The size of this chip is
10535 um x 14677 um, and there are 24,982 transistors on this chip, which
consumes 200 mW.
Finally, the following are some suggestions to speed up the processor
and reduce the size of this chip:
1. Use dynamic CMOS logic instead of static CMOS logic.
2. Use a smaller feature size, for example 0.8 um instead of 2 um. It will
shrink the size considerably.
3. Use the look-ahead carry technique to speed up the Adder and
Subtracter circuits.
4. Add drivers on the inputs of the Stack and General Register circuits.
5. Reduce the length of the buses as much as possible to increase the
bandwidth.
These suggested improvements may increase the CPU speed to 40 MHz.
References
[1] Daniel Tabak, Reduced Instruction Set Computer -RISC- Architecture,
Letchworth, Hertfordshire, England: Research Studies Press; New York:
Wiley, c1987.
[2] Daniel Tabak, RISC Systems, Taunton, Somerset, England: Research
Studies Press; New York: Wiley, c1990.
[3] Albert Paul Malvino, Ph.D., Digital Computer Electronics: An
Introduction to Microcomputers, Second Edition, McGraw-Hill, Inc., c1983.
[4] Amar Mukherjee, Introduction to NMOS and CMOS System Design, Gordon
Osbourne, c1986.
[5] John P. Uyemura, Circuit Design for CMOS VLSI, Kluwer Academic
Publishers, c1992.
