Abstract
Introduction
Processor architectures are becoming more and more complex. Pipelines, memory hierarchies, and superscalar features are implemented in very different ways. Besides that, a wide range of architecture classes is available (RISC processors, DSPs, VLIW processors, microcontrollers), each with its particular solutions for specific classes of applications. This situation imposes severe requirements on computational environments intended for analyzing or exploring the processor design space.
These environments may serve two very different goals: teaching environments, not only to understand concepts but also to experiment with alternative architectural solutions; and processor design environments, where processor design space exploration is associated with performance evaluation, for instance in the design of embedded electronic systems, where the most adequate processor must be found or designed for a given set of design requirements.
These two different goals have been mainly addressed by two distinct classes of environments. Teaching environments [1, 2, 3, 4] are usually restricted to a given processor architecture, where a particular processor configuration may be defined by selecting parameter options before the simulation starts. A rich user interface, specialized for the possible range of processor configurations, is usually supported. Some of these environments, indeed, offer a very rich set of alternative configurations [5], so that they apply to a wide range of processor architectures, but this flexibility restricts the development of specialized user interface support.
Processor design environments [7, 8], in turn, usually offer some specialized language, which gives great flexibility in the definition of the processor architecture. These environments are also aimed at a retargetable generation of supporting tools, such as simulators, compilers, and debuggers.
This paper presents SimPL, an object-oriented methodology for modeling processor architectures that supports features addressing the modeling needs of both teaching and processor design environments. It is based on the SIMOO simulation framework [9], which offers a rich environment for object-oriented modeling of discrete systems and is being applied to the design of electronic embedded systems [10]. SimPL allows a very flexible modeling of various processor architecture classes. The precise timing behavior of the processor instructions may be modeled, if desired, but more abstract models may also be developed. Object orientation supports easy reuse, extension, and modification of processor models. The user interface gives specialized support for the modeling methodology.
This paper is organized as follows. Section 2 discusses related work and puts SimPL in perspective. The SIMOO framework is briefly introduced in Section 3. The SimPL modeling methodology is presented in Section 4, and its particular strategy for modeling processor control is discussed in detail in Section 5. Section 6 illustrates the application of the modeling methodology to the well-known DLX processor [11]. Section 7 briefly presents the SimPL supporting environment. Finally, Section 8 concludes and discusses future work.
Comparison with related work
There are various simulation tools oriented to education in processor architecture and organization, such as PCSpin [1], WinDLX [2], DLXview [3], and ESCAPE [4]. These tools offer a pre-defined set of design alternatives that can be explored by an easy, mostly interactive setting of parameters. Visualization tools allow the user to analyze the impact of these alternatives on instruction execution and processor performance. These tools are dedicated to a specific base processor, which can be modified in a very limited way.
Other simulation tools are more intended for design exploration, offering a richer set of parameters and design alternatives. SimpleScalar [5] is a very well-known tool of this class. Its parameters allow the design of very different pipelined and superscalar architectures. This feature is enhanced by the fact that the simulator source code is open, so that design alternatives that were not initially considered can also be implemented. As a drawback, this great flexibility makes it difficult to build a specialized user interface for interaction with all possible architectural models.
On the other hand, tools that allow the specification of any conceivable processor architecture and organization are usually based on Architectural Description Languages (ADLs). Many of these languages automatically generate a retargetable tool suite, containing at least a simulator (either cycle-accurate or functional) and a compiler. Some of these languages, such as ISDL [6], are intended only for the description of the instruction set behavior, and cannot generate cycle-accurate simulators for current processors because they do not support pipeline specifications. Other languages, such as EXPRESSION [7] and LISA [8], allow the description of both behavior and structure.
In EXPRESSION, an RT-level netlist is described, together with the specification of instructions as lists of slots filled with operations corresponding to functional units. The RT-level netlist is composed of primitive components, such as functional units, storage elements, ports, and buses. A pipeline description specifies components in each pipeline stage, while a data-transfer path description specifies valid data transfers in the pipeline. The language also has several primitive constructions for the detailed description of the memory subsystem. Many language features are intended for the automatic generation of a compiler producing code of high quality.
LISA also supports the description of pipelined operations. The language definition, however, is mainly intended for the generation of a very fast, cycle-accurate simulator using compiled code (as opposed to conventional interpreted simulators).
It should be noted that, although languages such as LISA and EXPRESSION are powerful tools for the development of new processors, for instance in the context of embedded systems design, they may become cumbersome in a teaching environment, where the main requirements are easy (possibly interactive) modification of the processor configuration and good interaction resources.
SimPL is a processor design methodology (and not a language with some fixed set of constructs), which takes a different approach to processor modeling, so that it may serve both application domains well: teaching and processor design environments.
SimPL uses object-orientation as a basic paradigm for a flexible modeling of processor behavior. The methodology is based on a specialized class library developed on top of a general-purpose modeling and simulation framework. This library allows the definition of various processor architecture classes and supports an easy modeling of precise timing of pipelined processors, if necessary. Benefiting from classic object-oriented concepts (such as inheritance, hierarchy, and reuse), accessed through an interactive and visual user interface, the designer may easily develop new processor models from existing ones.
SimPL does not explicitly describe the processor structure. Instead, datapaths are implicitly defined by the control behavior. This feature enhances the reusability of a datapath block, since datapath elements are independent of each other. Modifying the instruction set and timing behavior of a processor may imply only very localized changes in certain elements of the control block, as shall be explained.
Since SimPL is built on top of a modeling and simulation framework, the processor modeling process automatically results in a simulator with built-in interactive resources for tracking and controlling experiments. On the other hand, at this moment the methodology does not generate other tools, such as compilers or assemblers, and the goal of the simulator code generation is not high performance, as in the case of LISA, for instance.
The SIMOO framework
SIMOO [9] is a general-purpose, object-oriented framework for multi-paradigm, multi-threaded, discrete simulation. Simulation entities in SIMOO are mapped to autonomous elements, which are objects with their own execution threads. Each object in a given system may be modeled by a different simulation paradigm: event- or process-oriented behavior, communication by ports or messages, active or passive reaction to messages, etc.
The static structure of the model is graphically specified by hierarchical class and instance diagrams, while the behavior of the objects is described in C++ and makes use of a specialized simulation library. From the specified diagrams and entity behavior, an executable model is automatically generated. It already contains default resources for visualization, interaction, and experiment control.
The class diagram defines various types of relationships between objects, such as knowledge (so that objects may exchange messages), aggregation, and creation (so that an object may create a new instance of another class in the model). Knowledge relationships, also called connections, may be dynamically created. The aggregation relationship implicitly defines a knowledge relationship between the aggregating object and all its aggregated sub-objects.
Visual interaction facilities are implemented by interface elements that belong to a domain that is completely independent of the model logic domain. The monitor is a special entity that receives copies of all messages sent to autonomous elements whose behavior is being followed. The monitor then redirects each of those messages to the interface element that has been associated with the corresponding autonomous element. The mapping between autonomous and interface elements is managed by the monitor, so that it may be dynamically modified during the experiments.
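The redirection performed by the monitor can be illustrated with a minimal C++ sketch. This is not SIMOO code: the class Monitor, its methods attach, detach, and notify, and the representation of messages as strings are assumptions made only for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface element: any callable that displays a message.
using InterfaceElement = std::function<void(const std::string& message)>;

// Hypothetical monitor: keeps the mapping between followed autonomous
// elements (identified by name here) and their interface elements.
class Monitor {
public:
    // Associate (or re-associate) an interface element with an element name.
    void attach(const std::string& element, InterfaceElement view) {
        views_[element] = std::move(view);
    }
    void detach(const std::string& element) { views_.erase(element); }

    // Receives a copy of a message sent to `element` and forwards it to the
    // associated interface element, if that element is being followed.
    void notify(const std::string& element, const std::string& message) const {
        auto it = views_.find(element);
        if (it != views_.end()) it->second(message);
    }

private:
    std::map<std::string, InterfaceElement> views_;
};

// Usage: the mapping is changed between two messages, as it could be
// during an experiment.
inline std::vector<std::string> monitor_demo() {
    std::vector<std::string> panel_a, panel_b;
    Monitor monitor;
    monitor.attach("PC", [&panel_a](const std::string& m) { panel_a.push_back(m); });
    monitor.notify("PC", "SET 4");
    monitor.attach("PC", [&panel_b](const std::string& m) { panel_b.push_back(m); });
    monitor.notify("PC", "SET 8");
    monitor.notify("ALU", "ADD");  // ALU is not followed, so nothing is shown
    assert(panel_a.size() == 1);
    return panel_b;
}
```

Because the model logic only ever talks to the monitor, interface elements can be exchanged without touching the simulated elements themselves.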
SimPL modeling methodology
The SimPL modeling methodology explicitly divides a processor into MAIN_CONTROL and DATAPATH blocks. MAIN_CONTROL aggregates two other types of control elements: INSTRUCTION_SET and EXECUTION. The datapath is an aggregation of functional elements. Furthermore, information elements are used for building a bridge between the control and datapath blocks.
The datapath block does not model physical connections between the functional elements, which are not aware of each other. Connections are implicitly defined by means of the processor behavior specified by the control block. This behavioral approach makes the architecture definition much more flexible. Functional elements may be more easily replaced or introduced, because their interconnections do not need to be modeled.
Each functional element contains various parameters and may execute different operations. Each operation is defined as a method of the class that corresponds to the element. Figure 1 shows the definition of a memory element in SIMOO. Besides the memory contents, it has two auxiliary internal variables (an input and an output variable). It offers four operations: read (put the memory contents into an output variable), write (put a new value into an input variable), contents (returns the value that was written to the output variable), and update (update the memory contents with the value written to the input variable). Its parameters are memory size (in words) and word size (in bits).
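Since Figure 1 is not reproduced here, the following C++ sketch suggests how a memory element with these four operations might look. It is an assumption-laden illustration, not the SimPL class: the name MemoryElement, the address arguments of read and write, and the use of 64-bit storage cells are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the memory element described above.
class MemoryElement {
public:
    // Parameters: memory size (in words) and word size (in bits).
    MemoryElement(std::size_t size_words, unsigned word_bits)
        : cells_(size_words, 0),
          mask_(word_bits >= 64 ? ~0ULL : ((1ULL << word_bits) - 1)) {}

    // read: copy the addressed word into the output variable.
    void read(std::size_t address) { output_ = cells_.at(address); }

    // write: store a new value (and its target address) in the input variable.
    // The memory contents are not changed yet.
    void write(std::size_t address, std::uint64_t value) {
        pending_address_ = address;
        input_ = value & mask_;
        write_pending_ = true;
    }

    // contents: return the value held in the output variable.
    std::uint64_t contents() const { return output_; }

    // update: commit the value held in the input variable to the memory.
    void update() {
        if (write_pending_) {
            cells_.at(pending_address_) = input_;
            write_pending_ = false;
        }
    }

private:
    std::vector<std::uint64_t> cells_;
    std::uint64_t mask_;
    std::uint64_t input_ = 0;
    std::uint64_t output_ = 0;
    std::size_t pending_address_ = 0;
    bool write_pending_ = false;
};

// Usage: a write is not visible to read() until update() commits it.
inline std::uint64_t memory_demo() {
    MemoryElement mem(16, 8);     // 16 words of 8 bits
    mem.write(3, 0x1FF);          // truncated to 0xFF by the 8-bit word size
    mem.read(3);
    assert(mem.contents() == 0);  // old contents are still visible
    mem.update();
    mem.read(3);
    return mem.contents();
}
```

Separating write from update in this way is one plausible reason for the two auxiliary variables: all operations of a step can read old values before any new value is committed.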
Control elements do not directly call methods of the functional elements. They communicate only with the aggregating DATAPATH element. This element then redirects the messages to its aggregated functional elements, by calling the corresponding methods.
With this approach, the class diagram that models the processor does not contain direct connection relationships between a control element and all functional elements it may control. If these direct relationships were necessary, the class diagram would be much more complex.
DECODE and INTERPRET are information elements. DECODE can be used to check the contents of the Instruction Register to obtain an instruction opcode, for instance. INTERPRET may be used to interpret an expression written in an assembly language, for instance in order to calculate a memory address. These information elements are linked to control and functional elements by means of the monitor object, so that they can inspect the contents of other elements without having to send them messages. 
Micro-operations
Control elements define the processor behavior by micro-operations, which indicate the execution of operations within functional or even other control elements. Micro-operations typically define interactions between functional elements. Suppose we need to transfer the contents of register REG to input Inp1 of the functional element ALU. A control element would execute the following micro-operation, which is implemented as a method that is local to this element:
ALU_SET (Inp1, REG_CONTENTS ( ) )
As a result of this method, a message is sent to the aggregating element DATAPATH. ALU_SET(par1,par2) is a micro-operation that requests DATAPATH to execute method SET of functional element ALU. Parameter par2 of ALU_SET is another micro-operation, REG_CONTENTS, which requests DATAPATH to execute method CONTENTS for reading the contents of functional element REG. These micro-operations implicitly create a path from the register to the ALU. Figure 2 illustrates the micro-operation REG_CONTENTS, implemented as a method that is local to a control element. This method is automatically created by the environment, from the element definition.
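The indirection through DATAPATH can be sketched in plain C++ as follows. This is only an illustration of the idea under stated assumptions: the interface FunctionalElement, the string-based access call, the ADD operation of the ALU, and the port name Inp2 are hypothetical, and SIMOO message passing is replaced by ordinary method calls.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical common interface of functional elements.
struct FunctionalElement {
    virtual ~FunctionalElement() = default;
    // Execute the named operation; returns a value when the operation has one.
    virtual int access(const std::string& method, const std::string& port, int value) = 0;
};

struct Register : FunctionalElement {
    int stored = 0;
    int access(const std::string& method, const std::string&, int value) override {
        if (method == "CONTENTS") return stored;
        if (method == "SET") { stored = value; return 0; }
        throw std::invalid_argument("Register: unknown method " + method);
    }
};

struct Alu : FunctionalElement {
    std::map<std::string, int> inputs;
    int access(const std::string& method, const std::string& port, int value) override {
        if (method == "SET") { inputs[port] = value; return 0; }
        if (method == "ADD") return inputs["Inp1"] + inputs["Inp2"];
        throw std::invalid_argument("Alu: unknown method " + method);
    }
};

// DATAPATH aggregates the functional elements and redirects requests to them.
class Datapath {
public:
    void add(const std::string& name, std::unique_ptr<FunctionalElement> element) {
        elements_[name] = std::move(element);
    }
    int access(const std::string& element, const std::string& method,
               const std::string& port = "", int value = 0) {
        return elements_.at(element)->access(method, port, value);
    }
private:
    std::map<std::string, std::unique_ptr<FunctionalElement>> elements_;
};

// A control element only knows DATAPATH; its micro-operations are local methods.
class ControlElement {
public:
    explicit ControlElement(Datapath& datapath) : datapath_(datapath) {}
    int REG_CONTENTS() { return datapath_.access("REG", "CONTENTS"); }
    void ALU_SET(const std::string& port, int value) { datapath_.access("ALU", "SET", port, value); }
private:
    Datapath& datapath_;
};

// Usage: the register-to-ALU path exists only through the control behavior.
inline int dispatch_demo() {
    Datapath datapath;
    datapath.add("REG", std::make_unique<Register>());
    datapath.add("ALU", std::make_unique<Alu>());
    datapath.access("REG", "SET", "", 7);
    ControlElement control(datapath);
    control.ALU_SET("Inp1", control.REG_CONTENTS());
    control.ALU_SET("Inp2", 5);
    return datapath.access("ALU", "ADD");
}
```

Note that Register and Alu never refer to each other, which mirrors the claim that functional elements can be replaced without remodeling interconnections.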
Defining processor control
Control elements, from types MAIN_CONTROL, INSTRUCTION_SET, and EXECUTION, interact with each other to define the overall processor control behavior.
INSTRUCTION_SET is an element that defines each processor instruction as a set of instruction steps. In turn, each instruction step is a set of micro-operations to be executed in parallel. As already explained, each micro-operation specifies operations in functional elements and implicitly defines datapaths between the functional elements. An instruction step is executed at once each time the instruction is fired. However, the INSTRUCTION_SET element does not enforce any timing relationship between the steps.
[Figure 2, code comment: Sends a message ACCESS to the Datapath. Its parameters are used by Datapath to define which functional element must be accessed and which method of this element must be called, as well as to check if a value must be returned to the calling control element.]
An INSTRUCTION_SET element may define complete instructions as well as instruction fragments, to be hierarchically used within other instructions or instruction fragments. Instruction fragments are also defined as sets of instruction steps. This feature may be used to model instruction fragments that are common to various instructions, such as a fetch operation or a memory access using a particular addressing mode.
Figure 3 illustrates the definition of an INSTRUCTION_SET element that implements an instruction fragment corresponding to a fetch operation in the DLX processor [11]. This fragment is defined as a single instruction step containing 5 micro-operations: read the instruction memory position addressed by the program counter (PC); add 4 to the PC; assign PC+4 to the PC; assign PC+4 to the NewPC field of the IF/ID pipeline register; and assign the contents read from memory to the IR field of the IF/ID pipeline register. These micro-operations are implemented as methods that are local to the INSTRUCTION_SET class, as shown in the methods window (upper left in Figure 3).
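The idea of an instruction step as a set of micro-operations executed in parallel can be sketched in C++ as below. The sketch is not the code of Figure 3: the types FetchState, MicroOp, and Step and the function fire are hypothetical, and the five micro-operations of the fetch fragment are collapsed into three for brevity. Parallelism is emulated by letting every micro-operation read the state as it was before the step and write into a separate next state.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical state touched by the fetch fragment.
struct FetchState {
    std::uint32_t pc = 0;            // program counter
    std::uint32_t if_id_new_pc = 0;  // NewPC field of the IF/ID register
    std::uint32_t if_id_ir = 0;      // IR field of the IF/ID register
};

using InstMem = std::map<std::uint32_t, std::uint32_t>;  // address -> instruction word

// A micro-operation reads the current state and writes the next state.
using MicroOp = std::function<void(const FetchState& current, FetchState& next)>;
using Step = std::vector<MicroOp>;

// Fire one instruction step: all micro-operations observe the same old state,
// and their effects become visible together when the step completes.
inline void fire(const Step& step, FetchState& state) {
    FetchState next = state;
    for (const auto& op : step) op(state, next);
    state = next;
}

// Fetch fragment as a single step. `mem` must outlive the returned step.
inline Step make_fetch_fragment(const InstMem& mem) {
    return {
        // read the instruction addressed by PC into IF/ID.IR
        [&mem](const FetchState& c, FetchState& n) { n.if_id_ir = mem.at(c.pc); },
        // PC <- PC + 4
        [](const FetchState& c, FetchState& n) { n.pc = c.pc + 4; },
        // IF/ID.NewPC <- PC + 4 (uses the old PC, not the one just assigned)
        [](const FetchState& c, FetchState& n) { n.if_id_new_pc = c.pc + 4; },
    };
}

// Usage: one firing of the fetch fragment.
inline FetchState fetch_demo() {
    InstMem mem{{0x100, 0xDEADBEEF}};
    FetchState state;
    state.pc = 0x100;
    fire(make_fetch_fragment(mem), state);
    return state;
}
```

With this two-phase scheme the order of the micro-operations inside a step does not matter, which is the property one would expect from a set executed in parallel.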
EXECUTION is the control element that orders the execution of all instruction steps, according to a selected execution mode. Depending on the mode, several firings may be necessary to complete the execution of all steps defined for an instruction. Each firing may be executed, for instance, at a consecutive clock cycle, but alternative timing schemes may be implemented. EXECUTION may use one of the following execution modes: a) single-cycle mode: all instruction steps of an instruction are executed in a single firing; b) multi-stepped mode: each firing executes a certain number of instruction steps; and c) multi-cycle mode: each firing executes a single instruction step.
Suppose an instruction is composed of 6 micro-operations that are grouped into 3 steps, and that an EXECUTION element fires this instruction using the multi-cycle mode. The code shown in Figure 4 is automatically generated for the method that implements the execution of the instruction.
The execution of the instructions or instruction fragments is ordered by the MAIN_CONTROL element, which is the topmost level of the control block hierarchy and defines the overall processor control, as well as a concrete timing behavior. The EXECUTION element makes micro-operations available that correspond to the step firings. These micro-operations may have one of the formats below:
EX (instruction, stage): for the multi-stepped and multi-cycle execution modes;
EX (instruction): for the single-cycle execution mode.
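The three execution modes and the two EX formats can be sketched in C++ as follows. This is not the code generated by the environment (Figure 4 is not reproduced here): the class Execution, the enum Mode, the 1-based numbering of stage, and the boolean return value of EX are assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

using Step = std::function<void()>;     // one instruction step
using Instruction = std::vector<Step>;  // ordered steps of an instruction

enum class Mode { SingleCycle, MultiStepped, MultiCycle };

// Hypothetical EXECUTION element.
class Execution {
public:
    explicit Execution(Mode mode, std::size_t steps_per_firing = 1)
        : mode_(mode),
          steps_per_firing_(std::max<std::size_t>(1, steps_per_firing)) {}

    // EX(instruction): single-cycle mode, all steps in one firing.
    void EX(const Instruction& inst) const {
        for (const auto& step : inst) step();
    }

    // EX(instruction, stage): firing number `stage` (assumed 1-based).
    // Multi-cycle mode executes one step per firing, multi-stepped mode
    // executes steps_per_firing steps. Returns true while steps remain.
    bool EX(const Instruction& inst, std::size_t stage) const {
        if (stage == 0) throw std::invalid_argument("stage is 1-based");
        const std::size_t n = (mode_ == Mode::MultiCycle) ? 1 : steps_per_firing_;
        const std::size_t begin = std::min((stage - 1) * n, inst.size());
        const std::size_t end = std::min(begin + n, inst.size());
        for (std::size_t i = begin; i < end; ++i) inst[i]();
        return end < inst.size();
    }

private:
    Mode mode_;
    std::size_t steps_per_firing_;
};

// Usage: count the firings needed to complete a 3-step instruction.
inline std::size_t firings_needed(Mode mode, std::size_t steps_per_firing) {
    int executed = 0;
    Instruction inst(3, [&executed] { ++executed; });
    Execution ex(mode, steps_per_firing);
    if (mode == Mode::SingleCycle) {
        ex.EX(inst);
        assert(executed == 3);
        return 1;
    }
    std::size_t stage = 1;
    while (ex.EX(inst, stage)) ++stage;
    assert(executed == 3);
    return stage;
}
```

In this sketch the instruction definition is the same in all three cases; only the EXECUTION element decides how the steps are distributed over firings.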
MAIN_CONTROL uses these micro-operations to implement the overall control, according to a given timing behavior. In a synchronous processor, firings will typically be synchronized with the clock cycles. Several cycles, however, may be associated with each firing, so that the latency of a functional element may be easily modeled. Two small examples illustrate possible processor models that are defined based on this methodology.
A single-cycle processor
Figure 5 shows some of the control and datapath elements for the design of a single-cycle, nonpipelined processor. The topmost level of the design hierarchy contains three elements: MAIN_CONTROL, DATAPATH, and an information element DECODE. MAIN_CONTROL aggregates an EXECUTION element, which in turn aggregates a single INSTRUCTION_SET element, containing the definition of all processor instructions. INSTRUCTION_SET is related to DATAPATH by a connection (a knowledge relationship, as introduced in Section 3). Since INSTRUCTION_SET is connected to DATAPATH, and DATAPATH aggregates the functional elements that implement the micro-operations, the messages corresponding to them are automatically redirected by DATAPATH to their correct destination elements.
Time is advanced by a single time unit by function Wait (1), which belongs to the basic SIMOO library for defining object behavior in a process-oriented approach. In Figure 6, the first Wait(1) is used to sequentialize EX (FETCH) and EX (DECODE () ), in order to avoid their parallel execution. The second Wait(1) is used to effectively advance the processor clock.
A microprogrammed processor
Figure 7 illustrates how a microprogrammed processor would be modeled with the SimPL methodology. Using a hierarchical approach, a first INSTRUCTION_SET element named INST_SET_1 would define instructions as sets of micro-instructions. For a given instruction, for instance, this element would invoke the execution of three micro-instructions:
EX(micro-instruction1); EX(micro-instruction2); EX(micro-instruction3);
As usual, each instruction defined in INST_SET_1 would be invoked by an EXECUTION element EXECUTION_1, according to some desired execution mode. Each of these micro-instructions would then be defined as a set of SimPL micro-operations (to be executed in the datapath block), by means of a second INSTRUCTION_SET element named INST_SET_2. The invocation of these micro-instructions would be controlled by a separate element EXECUTION_2.
Modeling the DLX datapath with the SimPL methodology is straightforward. The DATAPATH element aggregates all functional elements shown in Figure 8, including memories. The only exception concerns the multiplexers, which are not necessary, since connections are not explicitly modeled in SimPL.
The basic SimPL library contains classes that implement all datapath elements of the DLX processor. These classes need only be instantiated and parameterized. Exceptions are the zero-test and carry-propagation elements, for which special classes had to be defined from scratch with the SIMOO object-oriented modeling facilities.
After all datapath elements have been instantiated or created, they make a large set of micro-operations available to the control elements. Examples of micro-operations are:
int PC_CONTENTS ()
DATA_MEM_WRITE (address, value)
int INST_MEM_READ (address)
EX_REGISTER (subregister, operation)
The MAIN_CONTROL element contains an EXECUTION element. This element will be responsible for invoking micro-operations that correspond to each instruction, as defined in the INSTRUCTION_SET element, which is instantiated within EXECUTION.
INSTRUCTION_SET defines the four DLX instruction types in a hierarchical way, using the features of the SIMOO modeling environment, as illustrated in Figure 9. For each instruction, the INSTRUCTION_SET element defines three instruction steps. They correspond to the pipeline stages EX, MEM, and WB, which are distinct for each instruction. Each of these stages is defined as a set of micro-operations. The pipeline stages IF and ID, which are identical for all instructions, are modeled by two instruction fragments InstrFetch and InstrDecode and invoked directly from the MAIN_CONTROL element.
The EXECUTION element invokes instructions (in fact: invokes micro-operations that are made available by INSTRUCTION_SET) in a multi-cycle mode, where each processor cycle executes a single instruction step. Besides that, the EXECUTION element makes micro-operations of the form EX (par1, par2) available, where par1 represents an instruction and par2 an instruction step. These micro-operations are invoked by MAIN_CONTROL.
The overall control behavior and timing are defined within MAIN_CONTROL, as shown in Figure 10. The information element DECODE, as introduced in Section 5.1, monitors all updates to INSTRUCTION_REGISTER, a datapath element defined within the hierarchy of DATAPATH, and returns an instruction inst to be executed by micro-operation EX(op1=DECODE(),1). This information element is necessary because the control block, according to the SimPL approach, does not have a direct association with each functional element within the datapath block. The information element, using a SIMOO monitor, may obtain the information and pass it to the control block.
The movement of instructions through the pipeline is modeled by auxiliary variables op1 through op5 in Figure 10. All 5 pipeline stages are executed in parallel, and instructions move to the next stage also in parallel. The SIMOO function wait(1) advances the simulation clock and leads the processor to the next cycle.
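The movement of instructions through the five stages can be mimicked by a small C++ sketch. It does not reproduce Figure 10: the struct PipelineModel, the array op (standing in for op1 through op5), the use of instruction names as strings, and the absence of hazards are assumptions of this illustration, and the call to wait(1) is replaced by a simple cycle counter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pipeline model: op[0]..op[4] hold the instruction currently
// in stages IF, ID, EX, MEM, and WB. An empty string denotes a bubble.
struct PipelineModel {
    std::array<std::string, 5> op{};
    std::vector<std::string> retired;

    // One processor cycle.
    void cycle(const std::string& fetched) {
        // All instructions move to the next stage together. Shifting from
        // the last slot backwards keeps every old value intact until used.
        for (std::size_t i = 4; i > 0; --i) op[i] = op[i - 1];
        op[0] = fetched;
        // The stage firings would be invoked here, e.g. EX(op[2], 1) for the
        // EX stage. This sketch only records the instruction completing WB.
        if (!op[4].empty()) retired.push_back(op[4]);
    }
};

// Usage: run a program (non-empty instruction names) until all instructions
// have left the pipeline, and return the number of cycles used.
inline int run_pipeline(const std::vector<std::string>& program,
                        std::vector<std::string>& retired_out) {
    PipelineModel pipeline;
    int cycles = 0;
    std::size_t next = 0;
    while (pipeline.retired.size() < program.size()) {
        const std::string fetched = next < program.size() ? program[next++] : "";
        pipeline.cycle(fetched);
        ++cycles;  // stands in for wait(1)
    }
    retired_out = pipeline.retired;
    return cycles;
}
```

For a hazard-free program of n instructions this sketch needs n + 4 cycles, the usual figure for an ideal five-stage pipeline.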
Specialized information elements that monitor the state of the pipeline inform MAIN_CONTROL about the occurrence of branch and data dependencies. When one of these dependencies is detected, MAIN_CONTROL may invoke a method that implements a desired penalty procedure. The invocation of these information elements is not shown in Figure 10.
New instructions may be easily introduced, and existing instructions may be easily modified. This can be done by implementing these changes in the control element INSTRUCTION_SET. The new micro-operations that are thus created may be used by EXECUTION. If needed, new functional elements may be created within DATAPATH. The micro-operations that are made available by these new elements may be invoked from INSTRUCTION_SET.
The supporting environment
A dedicated environment, built on top of SIMOO, offers resources that give additional support to the SimPL modeling methodology. SIMOO already offers various modeling resources, as introduced in Section 3. SIMOO also builds a default user interface for each simulation model, including read and write access to all model variables and full control over the simulation execution.
To help the modeling process, this environment offers windows for each control element. These windows display all micro-operations that are available for the definition of the corresponding control element. As an example, the large set of micro-operations defined in the datapath block, which are available for the specification of instructions or instruction fragments, is shown in the window corresponding to INSTRUCTION_SET, so that the programmer is aware of their existence. The same happens for EXECUTION, so that the micro-operations that are made available by INSTRUCTION_SET can be browsed by the programmer.
As already mentioned in Section 5, the environment automatically generates the code of the method belonging to class INSTRUCTION_SET, which executes instruction steps according to the execution mode that has been chosen in the control element EXECUTION.
The SimPL supporting environment also gives the user the possibility of accessing and modifying values of parameters that have been defined for the various functional elements.
When a model is built and element parameters are defined for a given processor configuration, SIMOO classes are automatically generated. These classes may be edited by means of all object-oriented features of the basic SIMOO framework. By polymorphism and inheritance, for instance, elements may be specialized, the behavior of methods may be changed, or entirely new elements may be built. If these new classes follow the SimPL modeling paradigm, all resources and advantages of the SimPL environment become available.
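As a minimal sketch of such a specialization, the C++ fragment below derives a new functional element from an existing one and changes the behavior of one method. The classes Adder and SaturatingAdder are hypothetical examples and are not part of the SimPL library.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical library element: a 32-bit adder with wrap-around behavior.
class Adder {
public:
    virtual ~Adder() = default;
    virtual std::uint32_t add(std::uint32_t a, std::uint32_t b) const {
        return a + b;  // unsigned arithmetic wraps around on overflow
    }
};

// Specialized element: same interface, but the result saturates instead of
// wrapping, as is common in DSP datapaths.
class SaturatingAdder : public Adder {
public:
    std::uint32_t add(std::uint32_t a, std::uint32_t b) const override {
        const std::uint32_t sum = a + b;
        // A wrapped sum is smaller than either operand.
        return sum < a ? std::numeric_limits<std::uint32_t>::max() : sum;
    }
};

// Code that uses the element depends only on the base class, so either
// variant may be plugged in without further changes.
inline std::uint32_t use_adder(const Adder& element, std::uint32_t a, std::uint32_t b) {
    return element.add(a, b);
}
```

Code written against the base class keeps working when the specialized element is substituted, which is the kind of reuse the paragraph above refers to.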
Conclusions and future work
This paper presented the SimPL methodology for modeling the precise behavior of processor architectures. SimPL is built on top of SIMOO, a general-purpose framework for object-oriented modeling and simulation of discrete systems. Benefiting from object orientation, the methodology aims at a very flexible modeling approach. This feature is extremely useful for a fast exploration of the processor architecture design space and also for teaching purposes.
The methodology is based on the following main concepts:
• datapath and control are explicitly separated, and each of them is modeled as a hierarchy of elements;
• data paths between functional elements are implicitly defined by the control behavior, so that datapath connections are not explicitly modeled;
• a basic library of highly parameterized functional elements is already available;
• micro-operations are offered both by datapath and control elements, corresponding to the functions they may execute;
• the control behavior is modeled by three different control elements: INSTRUCTION_SET defines micro-operations that must be executed for each instruction; EXECUTION organizes micro-operations of the instructions into steps; and MAIN_CONTROL defines the overall timing by invoking instruction steps and associating them with clock cycles; and
• instructions may be hierarchically defined as combinations of instruction fragments.
The separation of the control behavior into three different types of elements, together with the hierarchical definition of instructions, allows an easy modeling of a large variety of micro-architectures and timing behaviors.
From a given processor model, the methodology makes it extremely easy to obtain a new one, with a different timing behavior, or with a different set of instructions. Full exploitation of object orientation greatly enhances the capabilities for deriving new processor models from old ones.
As a validation strategy, a complete model of the DLX processor has been developed. Other architectural classes, such as VLIW and DSP, will be modeled later on. The SimPL supporting environment is being implemented. Through this environment, classes and methods that are necessary for the methodology will be automatically generated. The environment is being linked to CSDSim [12] , which is an environment offering a specialized user interaction with a processor model. This environment emulates a classroom, following a client-server approach, where the server runs the processor model and clients implement the interaction of the instructor and students with the model.
The dynamic modification of processor models during experimentation will be implemented in the near future. With this feature, the user will be capable of changing the processor behavior dynamically, by assigning new values to element parameters and even replacing methods that specify the behavior of control or datapath elements.
