Clown: a Microprocessor Simulator for Operating System Studies by Zinoviev, Dmitry
arXiv:1207.5176v1 [cs.OH] 21 Jul 2012
San Jose, July 2004 Proceedings of Summer Computer Simulation Conference
Clown: a Microprocessor Simulator for Operating System Studies
Dmitry Zinoviev
Computer Science Department, Suffolk University
32 Derne St., Boston, MA, 02114 USA
Dmitry@mcs.suffolk.edu
Keywords: operating system, microprocessor, virtual
machine, assembly language
Abstract
In this paper, I present the design and implementation of Clown, a simulator of a microprocessor-based computer system specifically optimized for teaching operating system courses at the undergraduate or graduate level. The package includes the simulator itself, as well as a collection of basic I/O devices, an assembler, a linker, and a disk formatter. The simulator architecturally resembles mainstream microprocessors of the Intel 80386 family, but is much easier to learn and program. It is fast enough to be used as an emulator, in direct user-interaction mode.
A NEED FOR A SIMULATOR
An important part of the agenda of a college-level operating system course is to examine the interaction between an operating system and computer hardware. Assembly programming teaches students to think logically and to waste no byte and no CPU cycle. Knowing the hardware helps students understand the operation of such foundational mechanisms as memory protection, process dispatching, input/output, and file system organization. It also clarifies the motivations behind certain OS design decisions. Last, but not least, from a practical point of view, exposure to low-level programming prepares students for potential projects involving embedded systems and hand-held devices.
Elements of low-level assembly programming can also be found in computer architecture courses. Many universities (Suffolk University being one of them) continue to offer general assembly programming courses where students learn how to extract the ultimate performance from the computer hardware.
Traditionally, colleges have been using various RISC architectures (such as MIPS or RS/6000) or the Motorola 68k family as their primary hardware platforms. RISC cores are reasonably simple and regular. However, this practice seems to be rapidly giving way to the industrial mainstream Intel32 architecture. It should also be noted that, from the OS development point of view, RISC cores lack many important features, such as segmentation (for superior memory protection) and non-trap-based system call support.
On the other hand, Intel32 CISC architecture is hard
to learn. The instruction set is redundant, and the in-
struction format is highly irregular. This makes Intel32
system programming challenging, especially for under-
graduate students. A need clearly exists for a good
microprocessor simulator that could be used in an OS
course (and possibly in other related courses).
EXISTING SIMULATORS
Many microprocessor simulators have been devel-
oped, but most of them do not address the topic from
the OS study point of view.
Some of them simulate RISC or otherwise “inap-
propriate” targets (e.g., Ant-32 [4], MicSim [6], Micro-
processor Trainer Simulator [2], and various Intel 8085
simulators, such as described in [11]).
Other simulators are too detailed (such as VMware [12] and SID [8]). They simulate the computer hardware as closely as possible, thus defeating the whole purpose of using a simulator in an undergraduate-level class. On the other hand, many simulators designed for educational purposes are oversimplified (MSFB [1], also [5] and [2]). While adequate for an introductory computer hardware course, they fail to provide substantial mechanisms for building advanced operating systems.
To summarize, existing simulators are either optimized for industrial use or for a hardware-oriented course [7] rather than a “classic” OS course [9], or they intentionally hide the hardware from the upper OS layers.
A wish list for an OS-optimized simulator includes the following requirements:
• Rich support for OS concepts.
• Little or no support for application-specific features, such as string operations and a floating-point unit (to reduce complexity and learning time).
• Reasonably detailed simulation (so that the simulator could also be used in a computer architecture course).
• A collection of basic I/O devices, with a mechanism for adding more devices if needed.
• Fast execution, ideally fast enough for real-time emulation.
• A simple interface.
• A substantial set of development tools (such as an assembler, linker, disk editor, debugger, and C compiler).
To satisfy these requirements, I developed Clown, a new simulator of an Intel-style microprocessor and computer system specifically tuned to the needs of the courses mentioned above.
Figure 1. Clown system architecture. [Block diagram: the Clown CPU with its cache and RAM (up to 2 M frames × 1024 words) on the system bus; an implied bridge to the I/O bus; 256 I/O ports; and four devices (timer, DMA controller, hard disk drive, terminal) using IRQ channels 8h to Bh.]
CLOWN OVERVIEW
The Clown simulator suite is partially based on the Simple Hard Disk Emulator (SHaDE [13]), written at Suffolk University as a simple vehicle for teaching the low-level organization of file systems.
The system architecture of Clown is shown in Fig-
ure 1.
The simulator consists of a Clown CPU with a
single-level direct-mapped write-back cache (included to
simulate DMA transfers accurately), one bank of 32-bit
non-interleaved memory, 32-bit system and I/O buses
with an implied bridge (the bridge is not simulated, and
both buses are treated as one bus), 256 I/O ports (Intel
has 65,536 ports), 16 interrupt channels, and one DMA
channel (Intel typically has 7 DMA channels). Four basic
I/O devices are included in the standard configuration.
The architecture of the Clown CPU is shown in Figure 2.
The CPU has sixteen 32-bit general-purpose registers (Intel: 8 GPRs and two control registers, CR0 and CR3), eight 32-bit segment registers (a redundant number that could be reduced to six; Intel: 6 segment and 4 memory management registers), one 16-bit flag register, an instruction register, and a program counter. A 16-entry direct-mapped Translation Look-aside Buffer (TLB) contributes to the accuracy of DMA transfers (the Intel 80386 has a 32-entry 4-way set-associative TLB). There are no dedicated stack pointer, page table base, or page fault address registers; their functions are assigned to the general-purpose registers %R13 through %R15.
Figure 2. Clown CPU architecture; shaded registers and flags are not accessible from programs. [Diagram: IR and PC; flags I, O, S, Z, C, CPL, and IOPL; general-purpose registers %R0 through %R12, %R13 (%SP), %R14 (%PAGE), and %R15 (%FAR); segment registers %ISR, %GDT, %LDT, %CS, %SS, %DS, %ES, and %FS; a 16-entry TLB with linear address, physical address, W, and P fields.]
Clown supports only one data type: the signed 32-bit word (for comparison, Intel supports at least 12 data types [10]). This feature drastically simplifies system programming. On the other hand, it poses interesting challenges to compiler developers (such as type representations and conversions, and the implementation of floating-point arithmetic).
The Clown CPU supports both paging and segmen-
tation. Either memory organization mechanism can be
turned off (by disabling the page table or by declaring
all memory to be one big implicit segment).
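The interplay of paging and the 16-entry direct-mapped TLB described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not Clown's implementation: it assumes word addressing with 1024-word pages (matching the 1024-word frames of Figure 1), a page table modeled as a plain dictionary from page number to (frame, writable), and a TLB slot selected by the page number modulo 16. The names `DirectMappedTLB`, `translate`, and `PageFault` are hypothetical.

```python
# Hypothetical sketch. Assumptions: word addressing, 1024-word pages,
# a dict-based page table {page: (frame, writable)}.
PAGE_WORDS = 1024   # one page == one 1024-word frame (assumed)
TLB_SLOTS = 16      # Clown's TLB has 16 entries


class PageFault(Exception):
    """Hypothetical fault; on Clown the faulting address would go to %R15 (%FAR)."""


class DirectMappedTLB:
    def __init__(self):
        # Each slot caches (page, frame, writable) or is empty (None).
        self.slots = [None] * TLB_SLOTS

    def lookup(self, page):
        entry = self.slots[page % TLB_SLOTS]
        return entry if entry is not None and entry[0] == page else None

    def fill(self, page, frame, writable):
        # Direct-mapped: a page can live in exactly one slot, evicting its occupant.
        self.slots[page % TLB_SLOTS] = (page, frame, writable)


def translate(linear, page_table, tlb, write=False):
    """Map a linear word address to a physical word address."""
    page, offset = divmod(linear, PAGE_WORDS)
    entry = tlb.lookup(page)
    if entry is None:                      # TLB miss: consult the page table
        if page not in page_table:
            raise PageFault(linear)        # page not present
        frame, writable = page_table[page]
        tlb.fill(page, frame, writable)
        entry = (page, frame, writable)
    if write and not entry[2]:             # write to a read-only page
        raise PageFault(linear)
    return entry[1] * PAGE_WORDS + offset
```

Pages 0 and 16 map to the same slot in this model, so touching one evicts the other; that conflict is the price of the direct-mapped organization compared with the i386's set-associative TLB.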
In the Intel architecture, an interrupt vector (IV)
is always treated as an array of segment descriptors,
each identifying an entry into an interrupt service rou-
tine (ISR). In pure paging mode, it would be highly de-
sirable to have no segments whatsoever, including the
ISRs. This is accomplished by forcing all ISRs to be
8-word aligned, and treating the least significant bit of
an IV entry as a mode bit. When this bit is clear, the
IV entry is treated as a segment descriptor. If the bit
is set, the entry is treated as the direct address of the
entry point followed by two protection bits. While this
approach does not seem elegant enough, it nevertheless
allows the development of segment-free operating sys-
tems.
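The dual interpretation of interrupt vector entries can be sketched as a pair of encode/decode helpers. The text fixes only that the least significant bit is the mode bit and that a direct entry carries the ISR address and two protection bits. The exact layout used here (protection in bits 1 and 2, address in bits 3 and up, which works because 8-word alignment leaves the three low address bits zero) is an assumption, and the helper names are hypothetical.

```python
# Hypothetical sketch of an IV entry layout (assumed, not taken from Clown):
#   bit 0      mode bit (1 = direct entry point, 0 = segment descriptor)
#   bits 1-2   protection bits
#   bits 3-31  ISR entry point (8-word aligned, so its low 3 bits are 0)
ISR_ALIGN = 8
ADDR_MASK = 0xFFFFFFF8


def encode_direct_entry(entry_point, protection):
    """Build a segment-free IV entry: address | protection << 1 | mode bit."""
    if entry_point % ISR_ALIGN:
        raise ValueError("ISR entry point must be 8-word aligned")
    if not 0 <= protection <= 0b11:
        raise ValueError("protection field is two bits wide")
    return entry_point | (protection << 1) | 1


def decode_iv_entry(entry):
    """Return ('segment', entry) or ('direct', entry_point, protection)."""
    if (entry & 1) == 0:
        # Mode bit clear: treat the entry as a segment descriptor.
        return ("segment", entry)
    return ("direct", entry & ADDR_MASK, (entry >> 1) & 0b11)
```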
Compared to the i386, Clown has significantly fewer instructions, which reduces the learning time (Table 1). Explicitly omitted are data conversion instructions, decimal arithmetic, address manipulation, string, and translation instructions, and high-level language support instructions.
A Clown instruction consists of either one or two
words. The second word, if present (recognized by the
MSB of the first word, or by the “x” prefix in the
mnemonics), is always the immediate operand.
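Because the MSB of the first word announces an immediate operand, a fetch unit can determine instruction length without decoding the opcode. The following Python sketch splits a flat word stream into instructions under that rule; the helper names are hypothetical and the sample word values in the usage are arbitrary placeholders, not real Clown encodings.

```python
# Hypothetical sketch: length decoding driven only by the MSB of the first word.
WORD_MASK = 0xFFFFFFFF
IMMEDIATE_FLAG = 1 << 31   # MSB set: a second (immediate) word follows


def instruction_length(first_word):
    return 2 if first_word & IMMEDIATE_FLAG else 1


def split_instructions(words):
    """Group 32-bit words into (first_word, immediate_or_None) pairs."""
    pc, out = 0, []
    while pc < len(words):
        first = words[pc] & WORD_MASK
        if instruction_length(first) == 2:
            if pc + 1 >= len(words):
                raise ValueError(f"truncated two-word instruction at word {pc}")
            out.append((first, words[pc + 1] & WORD_MASK))
        else:
            out.append((first, None))
        pc += instruction_length(first)
    return out
```

For example, `split_instructions([0x1, 0x80000002, 1000, 0x3])` yields three instructions, the middle one carrying the immediate 1000.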
The number of flags has also been minimized. There
are only 7 externally visible flags: Carry, Zero, Sign,
Overflow, Interrupts (enabled), and two I/O Privilege
Level flags (compared to 13 flags in the Intel 80386).
Clown runs a simple fetch-decode-execute loop. The
execution of each instruction takes exactly one Clown cy-
cle. External interrupts are reported and queued at the
end of a cycle. Nested interrupts are permitted, with
high-precedence interrupts preempting low-precedence
interrupts. This simulation model may change in the
future to better reflect modern pipelined architectures
and their impact on process context switches.
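The cycle model can be summarized in a short simulation: one instruction per cycle, device requests queued during the cycle, and a dispatch decision at the end of each cycle that lets a higher-precedence request preempt the ISR currently running. The sketch assumes that a lower IRQ number means higher precedence and that each ISR runs a fixed number of instructions before returning; both are illustrative choices, and `InterruptController` and `run` are hypothetical names.

```python
import heapq

# Hypothetical sketch. Assumptions: lower IRQ number = higher precedence;
# each ISR executes a fixed number of instructions before IRET.


class InterruptController:
    def __init__(self):
        self.pending = []      # min-heap of queued IRQ numbers
        self.in_service = []   # stack of IRQs whose ISRs are running (nesting)

    def report(self, irq):
        heapq.heappush(self.pending, irq)

    def end_of_cycle(self):
        """Dispatch the best pending IRQ if it outranks the one in service."""
        if self.pending and (not self.in_service or self.pending[0] < self.in_service[-1]):
            irq = heapq.heappop(self.pending)
            self.in_service.append(irq)
            return irq
        return None

    def iret(self):
        return self.in_service.pop()


def run(cycles, schedule, isr_length):
    """schedule: {cycle: [irq, ...]}; isr_length: {irq: instructions before IRET}."""
    ic, remaining, trace = InterruptController(), [], []
    for cycle in range(cycles):
        # Execute one instruction: the innermost ISR's, or the main program's.
        if remaining:
            remaining[-1] -= 1
            if remaining[-1] == 0:
                remaining.pop()
                trace.append(("leave", ic.iret(), cycle))
        # Devices report requests during the cycle; they are only queued here.
        for irq in schedule.get(cycle, []):
            ic.report(irq)
        # End of cycle: possibly enter (or nest) an ISR.
        irq = ic.end_of_cycle()
        if irq is not None:
            remaining.append(isr_length[irq])
            trace.append(("enter", irq, cycle))
    return trace
```

With IRQ 9 arriving at cycle 0 and IRQ 8 at cycle 1, the trace shows IRQ 8 nesting inside IRQ 9's handler; with the arrival order reversed, IRQ 9 waits until IRQ 8's handler returns.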
PERIPHERAL DEVICES
Currently, Clown has four peripheral devices: in-
terval timer, terminal, hard disk controller, and direct
memory access (DMA) controller. Each device has a
configurable I/O base and a configurable IRQ channel.
All devices can operate in both polling and interrupt
modes.
The interval timer works in both interval and single-shot modes. It generates an interrupt upon expiration,
and can be stopped at any time. The following assembly
code fragment programs the timer to expire (once) in
1000 cycles:
#include "config.h"
; reset timer
out 1, (IOBASE_TIMER + 0)
; set the counter and trigger timer
out 1000, (IOBASE_TIMER + 0)
; wait for an interrupt
hlt
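The single-shot and interval behaviors can be modeled as a countdown decremented once per Clown cycle. This Python sketch is hypothetical: `IntervalTimer` and its methods are illustrative names, and the model covers only the counting logic, not the port protocol used by the assembly fragment above.

```python
# Hypothetical countdown model of the interval timer (port protocol not modeled).
class IntervalTimer:
    def __init__(self):
        self.count = 0
        self.reload = 0
        self.periodic = False
        self.running = False

    def start(self, cycles, periodic=False):
        if cycles <= 0:
            raise ValueError("timer period must be positive")
        self.count = self.reload = cycles
        self.periodic = periodic
        self.running = True

    def stop(self):
        # The timer can be stopped at any time.
        self.running = False

    def tick(self):
        """Advance one Clown cycle; return True if the timer expires on this cycle."""
        if not self.running:
            return False
        self.count -= 1
        if self.count > 0:
            return False
        if self.periodic:
            self.count = self.reload   # interval mode: rearm automatically
        else:
            self.running = False       # single-shot mode: stop after one expiry
        return True
```

A single-shot start with 1000 cycles expires exactly once, on the 1000th tick, which mirrors the one-time expiry programmed by the fragment above.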
The terminal combines a keyboard and a sequen-
tially accessible (not memory-mapped) display. When
in the interrupt mode, it generates interrupts on
keystrokes. The terminal does not echo characters (echo-
ing is left to the programmer).
The hard disk controller carefully simulates the me-
chanical behavior of a relatively simple hard disk (includ-
ing track-to-track and maximum seek latency, and rota-
tional latency). The dynamic parameters of the disk are
run-time configurable. Inter-sector gaps make it possible
to optimize file systems for high-speed streaming opera-
tions. When in the interrupt mode, the controller gen-
erates interrupts on completion of seek, read, and write
operations. A one-block read-write buffer is prehistorically tiny, yet sufficient for studying the foundations of the disk I/O subsystem. At most one I/O request can be
pending at any time, so no disk scheduling is provided.
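The latency model can be illustrated with a simple access-time calculation: seek time, then rotational delay until the target sector reaches the head, then one sector's transfer time. The linear interpolation between track-to-track and maximum seek latency is an assumption (the paper does not state Clown's formula), inter-sector gaps are ignored, and all names and parameter values are hypothetical. Times are in Clown cycles.

```python
from dataclasses import dataclass

# Hypothetical disk timing sketch; the seek formula is an assumed linear model.


@dataclass
class DiskGeometry:
    tracks: int            # number of tracks
    sectors: int           # sectors per track
    track_to_track: float  # seek time between adjacent tracks (cycles)
    max_seek: float        # full-stroke seek time (cycles)
    revolution: float      # time of one full rotation (cycles)


def seek_time(g, from_track, to_track):
    d = abs(to_track - from_track)
    if d == 0:
        return 0.0
    if g.tracks <= 2:
        return g.track_to_track
    # Assumed: interpolate linearly from 1 track (track_to_track) to tracks-1 (max_seek).
    return g.track_to_track + (g.max_seek - g.track_to_track) * (d - 1) / (g.tracks - 2)


def rotational_delay(g, now, sector):
    per_sector = g.revolution / g.sectors
    under_head = (now % g.revolution) / per_sector   # fractional sector under the head
    return ((sector - under_head) % g.sectors) * per_sector


def access_time(g, now, from_track, to_track, sector):
    t = seek_time(g, from_track, to_track)
    t += rotational_delay(g, now + t, sector)        # wait for the sector after seeking
    return t + g.revolution / g.sectors              # plus transferring one sector
```

For a disk with 100 tracks, 10 sectors per track, a 10-cycle track-to-track seek, a 109-cycle full stroke, and 1000 cycles per revolution, reaching sector 3 on the adjacent track from time 0 costs 10 + 290 + 100 = 400 cycles.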
The DMA controller is the most intelligent peripheral device. Clown carefully simulates DMA transfers: data are transferred only when the bus is not used by the main CPU. The transfer unit is fixed at one disk block (one virtual memory page). In interrupt mode, the controller generates an interrupt on completion of the transfer, which happens after the completion of the respective disk read operation or before the completion of the respective disk write operation.
Table 1. Comparison of the Clown and i386 instruction sets
Group                i386   Clown
Data movement           8      13
Arithmetic             12      12
Shift / Rotate         12       8
Logical                 6      11
Bits and Bytes         39       8
Flag control           11       6
Processor control       4       3
Flow control           77      18
Memory protection       1       5
I/O                     4       3
Other                  25       0
Total                 200      87
The DMA controller works concurrently with the
rest of the Clown system. Its implementation as a part of
the main fetch-decode-execute loop would involve com-
plex serialization and synchronization issues. For in-
stance, one Clown out instruction triggers a disk-to-
memory transfer which takes a significant and uncertain
number of instructions to complete (due to seek and rotational latencies). Calling a read-sector function directly is not an option, because it would block the main loop until the transfer completes.
As a result, the controller is implemented using
µCVM (Microcontroller Virtual Machine) to enable true
concurrent execution of the main simulator and the sim-
ulator of the controller and also to make the controller
potentially reconfigurable. µCVM is a “micro-Clown”:
it has 8 general-purpose registers, a one-bit flag register,
and a program counter. The instruction set consists of
10 commands (see Table 2).
The main loop of the simulator first executes the next Clown instruction. If it is not a memory reference or an I/O instruction, then the next µCVM instruction is executed. Otherwise, the next µCVM instruction is executed only if it is not a memory reference or an I/O instruction.
Figure 3. µCVM system architecture. [Diagram: the µCVM, with registers R0 through R7, an EQ flag, a program counter, and a ROM, attached through a host device interface to the Clown bus.]
The main program of the µCVM, which controls both
disk-to-memory and memory-to-disk transfers, fits in
just 132 bytes of the controller memory.
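A minimal Python interpreter for the µCVM instruction set of Table 2, together with the bus-sharing rule of the main loop described above, might look as follows. It is a sketch, not the actual µCVM: instructions are pre-decoded tuples rather than binary words, the one-bit flag is assumed to be set by `xCMPI` on equality and tested by `JEQ`, `ST`/`LD` take the address register first (following the table's `[reg] reg` notation), memory and ports are plain dictionaries, and registers are assumed to be 32 bits wide. `MicroCVM` and `co_step` are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical µCVM sketch operating on pre-decoded (mnemonic, *operands) tuples.
BUS_OPS = {"OUT", "IN", "ST", "LD", "xOUTI"}   # I/O and memory (IOM) instructions


@dataclass
class MicroCVM:
    program: list                                        # decoded instructions
    memory: dict = field(default_factory=dict)           # Clown memory seen over the bus
    ports: dict = field(default_factory=dict)            # I/O ports
    regs: list = field(default_factory=lambda: [0] * 8)  # R0..R7
    flag: bool = False                                   # the one-bit flag register
    pc: int = 0
    halted: bool = False

    def next_uses_bus(self):
        """True if the next instruction is a memory reference or an I/O instruction."""
        return not self.halted and self.program[self.pc][0] in BUS_OPS

    def step(self):
        if self.halted:
            return
        op, *a = self.program[self.pc]
        self.pc += 1
        r = self.regs
        if op == "NOP":
            pass
        elif op == "JEQ":
            if self.flag:
                self.pc = a[0]
        elif op == "JMP":
            self.pc = a[0]
        elif op == "END":
            self.halted = True
        elif op == "xMOVI":
            r[a[0]] = a[1] & 0xFFFFFFFF
        elif op == "xADDI":
            r[a[0]] = (r[a[0]] + a[1]) & 0xFFFFFFFF
        elif op == "xCMPI":
            self.flag = r[a[0]] == a[1]          # assumed: flag = equality
        elif op == "OUT":
            self.ports[a[0]] = r[a[1]]
        elif op == "xOUTI":
            self.ports[a[0]] = a[1]
        elif op == "IN":
            r[a[1]] = self.ports.get(a[0], 0)
        elif op == "ST":                          # ST [addr_reg] value_reg
            self.memory[r[a[0]]] = r[a[1]]
        elif op == "LD":                          # LD [addr_reg] dest_reg
            r[a[1]] = self.memory.get(r[a[0]], 0)
        else:
            raise ValueError(f"reserved or unknown instruction {op!r}")


def co_step(clown_used_bus, ucvm):
    """Advance the µCVM after one Clown instruction, yielding the bus to the CPU."""
    if not clown_used_bus or not ucvm.next_uses_bus():
        ucvm.step()
```

Calling `co_step()` once per Clown instruction reproduces the stall-on-contention behavior: a µCVM `LD` or `ST` waits whenever the Clown instruction in the same iteration also used the bus, while register-only µCVM instructions always proceed.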
The code simulating the peripheral devices is organized as dynamically loaded libraries (one device per library). This organization turned out to be flawed: it did not significantly improve the reconfigurability of the simulator, yet it undermined its integrity. In future releases, all code pieces will be linked together.
ASSEMBLY LANGUAGE
The Clown assembly (cas) language uses a mixture of
“Intel-style” and “MIPS-style” syntax. It has 53 com-
mands and 90 modifications. The language allows deci-
mal, octal, and hex numbers (in prefix and postfix nota-
tion), and ASCII characters and strings. Because Clown
does not have a byte data type, characters and strings
are translated into words and arrays of words, one char-
acter per word. This cumbersome conversion leads to
“sparse” strings and poor memory utilization, but sig-
nificantly reduces the number of machine instructions.
Before assembling, the source code is run through a
standard C preprocessor (cpp).
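The number and string conventions can be illustrated with a small Python sketch of the two conversions. The concrete literal spellings (`0x1F` and leading-zero octal as prefix forms, `1Fh` and `17o`/`17q` as postfix forms) and the zero-word string terminator are assumptions chosen for illustration; the paper does not specify the exact cas syntax, and the helper names are hypothetical. The sketch also makes the memory cost of “sparse” strings concrete: each character occupies a full 32-bit word.

```python
# Hypothetical sketch of cas-like literal parsing and word-per-character strings.
def parse_number(token):
    """Parse a numeric literal in an assumed cas-like syntax."""
    t = token.strip().lower()
    if t.startswith("0x"):                  # prefix hex: 0x1f
        return int(t[2:], 16)
    if t.endswith("h"):                     # postfix hex: 1fh
        return int(t[:-1], 16)
    if t.endswith(("o", "q")):              # postfix octal: 17o, 17q
        return int(t[:-1], 8)
    if len(t) > 1 and t.startswith("0"):    # prefix octal: 017
        return int(t[1:], 8)
    return int(t, 10)                       # plain decimal


def encode_string(text, terminator=True):
    """One character per 32-bit word, optionally followed by an assumed 0 word."""
    words = [ord(ch) for ch in text]
    return words + [0] if terminator else words
```

With 4-byte words, the six-word encoding of "Hello" (five characters plus the terminator) occupies 24 bytes, four times what a byte-packed string with the same terminator would need.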
The cas assembler supports multiple segments (if needed) and global symbols. It can produce raw “bin” executable files, without headers and symbol tables, and structured multi-segment “exe” files, with symbol tables and provisions for further linking with other files of the same kind using the Clown linker (clink). Files in “exe” format are fully relocatable. The displacement of the entry point into a “bin” file can be specified at assembly time, and “bin” files can be used as directly loadable ROM/RAM images. The Clown simulator can load several executable images simultaneously (for instance, to simulate several processes without writing an OS loader).
So far, Clown does not include a run-time loader.
Loaders are heavily OS-dependent, and should be writ-
ten by OS developers.
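Relocatable “exe” images and global symbols can be illustrated with a toy relocator and linker. The module format below (a word list, a list of fixup offsets, exported symbol offsets, and import slots) is entirely hypothetical: it is not the actual “exe” layout or the behavior of clink. It only shows the general technique that a run-time loader written by OS developers would also need, namely adding the load base to every address-bearing word.

```python
# Hypothetical module format: {'code': [...], 'fixups': [...],
#                              'exports': {name: offset}, 'imports': {offset: name}}
WORD_MASK = 0xFFFFFFFF


def relocate(code, fixups, base):
    """Copy `code` assembled at address 0 so that it runs at `base`."""
    out = list(code)
    for off in fixups:              # each fixup names a word holding a local address
        out[off] = (out[off] + base) & WORD_MASK
    return out


def link(modules):
    """Concatenate modules, relocate them, and resolve imported global symbols."""
    symbols, placed, base = {}, [], 0
    for m in modules:                       # pass 1: assign load addresses
        for name, off in m["exports"].items():
            if name in symbols:
                raise ValueError(f"duplicate global symbol {name!r}")
            symbols[name] = base + off
        placed.append((m, base))
        base += len(m["code"])
    image = []
    for m, mbase in placed:                 # pass 2: relocate and patch imports
        words = relocate(m["code"], m["fixups"], mbase)
        for off, name in m["imports"].items():
            if name not in symbols:
                raise ValueError(f"undefined global symbol {name!r}")
            words[off] = symbols[name]
        image.extend(words)
    return image, symbols
```

Linking a four-word module that imports a symbol exported at offset 0 of a second module places that symbol at address 4 and patches the import slot accordingly. The same `relocate()` step, applied with a nonzero base, is what loading an image at an arbitrary address would involve.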
Table 2. µCVM Instruction Set
Opcode  Instruction      Words  Description
Arithmetic and control instructions (AC)
0h      NOP              1      Do nothing
1h      JEQ dest         1      Conditional jump
2h      JMP dest         1      Unconditional jump
3h      END              1      Stop the VM
4h      xMOVI reg val    2      Store a constant
5h      xADDI reg val    2      Add a constant
6h      xCMPI reg val    2      Compare
7h      (reserved)
I/O and memory instructions (IOM)
8h      OUT port reg     1      Output to the port
9h      IN port reg      1      Input from the port
Ah      ST [reg] reg     1      Store indirectly
Bh      LD [reg] reg     1      Load indirectly
Ch      xOUTI port val   2      Output a constant
Dh      (reserved)
Eh      (reserved)
Fh      (reserved)
FEASIBILITY AND PERFORMANCE EVALUATION
The Clown architecture is meant to be feasible in the sense that, should a need arise to implement it in an FPGA or directly in hardware, the implementation would not pose significant risks or challenges. This estimate is based on the author’s experience with the FLUX superconductor microprocessor design [3].
In the experiments, the Clown system simulated 4
million instructions per second (4 MIPS) on a 1.3 GHz
Pentium host CPU (native performance 2600 MIPS).
The following code was used for performance evaluation:
mov %r1, 10000000
again: dec %r1
jnz again
stop
This performance is roughly equivalent to Intel 8086
(4.77 MHz), which is reasonably good for real-time user
interaction.
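The benchmark numbers can be cross-checked with a few lines of arithmetic. Assuming, as stated earlier, that every Clown instruction takes exactly one cycle, the loop executes one `mov`, ten million `dec`/`jnz` pairs, and one `stop`; at 4 MIPS that is about five seconds of host time, and the 2600/4 ratio means roughly 650 host instructions per simulated instruction. The function and variable names are illustrative.

```python
# Back-of-the-envelope check of the benchmark (illustrative names).
def loop_instruction_count(iterations):
    # mov (1) + iterations x (dec + jnz) + stop (1)
    return 1 + 2 * iterations + 1


executed = loop_instruction_count(10_000_000)
CLOWN_MIPS = 4.0      # measured simulation speed
HOST_MIPS = 2600.0    # native speed of the 1.3 GHz Pentium host
seconds = executed / (CLOWN_MIPS * 1e6)
host_instructions_per_clown_instruction = HOST_MIPS / CLOWN_MIPS

print(f"{executed:,} instructions, ~{seconds:.1f} s, "
      f"~{host_instructions_per_clown_instruction:.0f} host instructions each")
# prints: 20,000,002 instructions, ~5.0 s, ~650 host instructions each
```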
CLOWN IN A CLASSROOM
The Clown system was “field tested” in an under-
graduate Operating Systems course during the Spring
2004 semester. The following eight assignments were
given throughout the semester to students who had
taken an introductory assembly language course based
on Intel 8086 architecture. For each assignment, the
typical length of the program code in non-commented
lines of code (NLOC) is given.
• kputs: display a character string using the terminal; 60 NLOCs
• boot: load the first sector of the first track using polling, and execute its contents; 25 NLOCs
• boot-dma: load the first sector of the first track using DMA, and execute its contents; 15 NLOCs
• int-timer: populate and test the interrupt vector (timer ISR); 60 NLOCs
• int-kbd: populate and test the interrupt vector (timer ISR and keyboard ISR, competing for a counter variable); 85 NLOCs
• page-table: populate and test the page table; 15 NLOCs
• page-fault: populate and test the interrupt vector (page fault handler); 25 NLOCs
• file: traverse a disk file organized as a linked list; 30 NLOCs
Out of 14 students taking the class, nine success-
fully completed all assignments, three completed 7 as-
signments, and the remaining two completed 5 assign-
ments.
CONCLUSION AND FUTURE WORK
The Clown system is a powerful, simple, fast, configurable,
and extensible microprocessor simulator which
can be used in various college-level courses, especially in
those dealing with operating systems.
Future work includes developing a debugger, a col-
lection of sample run-time loaders for multi-segment ex-
ecutable files, a C compiler, and a graphical user inter-
face. The simulator may be further optimized for speed.
Pipelining support needs to be added to provide more
realistic simulation if the package is to be used in a com-
puter architecture class. Networking, mouse, and graph-
ics mode support would enable many other uses of the
simulator (such as a vehicle in a PDA graphics study).
ACKNOWLEDGMENTS
I would like to thank Professors D. Stefanăscu,
D. Cohn, and A. Thomo of Suffolk University for their
encouraging support for the project, Mrs. P. Salla for
the partial implementation of the first SHaDE emulator,
and all students of my Spring 2004 Operating Systems
class for taking on the burden and pain of testing the
package.
BIOGRAPHY
Dr. D. Zinoviev received his Ph.D. in Computer Science from SUNY at Stony Brook in 1997. He then worked as a post-doc on the DARPA/NASA/NSA-sponsored Petaflops project of a hybrid-technology, multi-threaded hypercomputer. In 2000, he joined the Computer Science Department of Suffolk University as an Assistant Professor. His current research interests include simulation and modeling (network simulation, architectural simulation), operating systems, and software engineering (complexity metrics).
REFERENCES
[1] C. N. Bauers. A microprocessor simulator for beginners. Online at http://softwareforeducation.com/simulator.htm.
[2] C. W. Caldwell, D. L. Andrews, and S. S. Scott. A graphical microcomputer simulator for classroom use. In Proceedings of FIE, 1995.
[3] M. Dorojevets, P. Bunyk, and D. Zinoviev. FLUX project: Design of a 20-GHz 16-bit ultrapipelined processor prototype based on 1.75µm LTS RSFQ technology. IEEE Trans. on Appl. Supercond., 11(1):326–332, March 2000.
[4] D. Ellard, D. Holland, N. Murphy, and M. Seltzer. On the design of a new CPU architecture for pedagogical purposes. In Proceedings of the 9th Workshop on Computer Architecture Education, 2003.
[5] S. Hill. The functional simulation of a simple microprocessor. Technical Report 17-94*, University of Kent, Canterbury, UK, September 1994.
[6] J. Merz. MICSIM: Concept, developments, and applications of a PC microsimulation model for research and teaching. In K. G. Troitzsch, U. Mueller, G. N. Gilbert, and J. E. Doran, editors, Social Science Microsimulation, pages 33–65. Springer, 1996.
[7] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., 2nd edition, 1996.
[8] Red Hat, Inc. SID simulator user’s guide. Online at http://sources.redhat.com/sid/sid-guide/book1.html, 2001.
[9] A. Silberschatz, P. B. Galvin, and G. Gagne. Operating System Concepts. Wiley, 2002.
[10] B. E. Smith and M. T. Johnson. Programming the Intel 80386. Scott, Foresman and Co., 1987.
[11] Infotech Solutions. Microprocessor simulator 8085 for Windows. Online at http://insoluz.com/Micro/Micro.html, 1998.
[12] VMware, Inc. VMware Workstation. Online at http://vmware.com.
[13] D. Zinoviev and P. Salla. SHaDE: simple hard disk emulator. Unpublished, August 2002.