The memory system is one of the core topics in computer architecture and organization. An important problem in teaching this topic is how to help students connect their theoretical knowledge of memory system concepts with the practical problems facing the designer of various parts of a memory system. A common approach to tackling this problem is to organize practical exercises in the laboratory using a memory system simulator. The existing simulators mainly focus on subtle memory system issues, such as cache performance, latency, coherence and consistency models in multiprocessor systems, and do not offer a view of design details and internal activities within various parts of a memory system. This paper presents an originally developed memory system for education and its web-based simulator. The memory system includes the virtual memory and translation lookaside buffer, the cache memory and the interleaved main memory. The simulator facilitates a web-based clock-by-clock interactive simulation of the memory system, its visual presentation at the register-transfer level and navigation through parts of the system. The memory system and the simulator can be used for exercises in the laboratory and for self-learning from home.
INTRODUCTION
The memory system is one of the topics extensively covered in the third-year course in Advanced Computer Architecture and Organization at the Faculty of Electrical Engineering, University of Belgrade, attended by the students at the Department of Computer Engineering. The students acquire the knowledge necessary for taking such a course through the first-year course in Digital Circuits and the second-year course in Computer Architecture and Organization. The first-year course covers switching functions, combinational and sequential switching circuits, the analysis and synthesis of switching circuits and the standard modules such as multiplexers, decoders, arithmetic and logic units, registers, counters etc. The second-year course covers the basic concepts related to the most commonly found structures of a computer, which include the processor, the memory, the input/output subsystem and the bus. The course in Advanced Computer Architecture and Organization goes one step further, covering topics such as the architecture and organization of CISC and RISC processors, the organization of pipelined processors, the storage system, the interconnection networks and the memory system. These courses have been taught for a few years and cover the core topics in computer architecture and organization identified by the joint IEEE Computer Society and ACM Computer Engineering Task Force [1].
The memory system is treated as a hierarchical one, made up of the virtual memory and translation look-aside buffer (TLB), the cache memory and the main memory [2] . The virtual memory and TLB study introduces the notions of virtual and physical address spaces and the need for mapping addresses. Various techniques for mapping segment or page descriptors from the segment and/or page tables into the TLB and the replacement algorithms are presented with special attention being given to the separation of activities carried out by the operating system and the TLB. The considerations of the cache memory concentrate on the behaviour of programs, which display temporal and spatial locality, the techniques used to map main memory blocks into cache memory blocks, the cache memory replacement algorithms, the main memory updating policies and some other techniques that improve the performance of the cache memory. The main memory presentation deals with the techniques used to increase the memory bandwidth, such as memory interleaving.
The memory system is taught with the aim of achieving three equally important objectives. The first is to provide students with a complete theoretical treatment of all fundamental memory system concepts. The second is to help them use their knowledge of memory system concepts and digital circuits in solving practical problems facing the designer of various parts of a memory system. The third is to prepare students for the fourth-year courses in VLSI design, Distributed Shared Memory Systems and Performance Evaluation of Computer Systems, where subtle memory system issues are dealt with. An approach to achieving the stated objectives is to organize practical exercises in the laboratory using a software simulator of a memory system. However, this approach raises the question of the requirements that the simulator should meet. The basic one is that the simulator should cover all relevant aspects of the memory system. In addition, during a simulation run students should be able to see all design details and internal activities, which makes simulation at the register-transfer level another important requirement. This level of simulation in turn entails that the simulation should be clock-by-clock and interactive. Since in our case the emphasis is more on understanding the functioning of a memory system and less on its performance, the visual presentation of the simulation is given priority over the textual one. Finally, the current trend in the implementation of educational systems is that they are web-based.
The Computer Journal Vol. 48 No. 6, 2005 A Memory System for Education
There are several simulators of memory systems in the open literature. However, most of them are just parts of simulators of very complex computer systems where the memory system is not the main part of the simulator [3]. Among the simulators that focus primarily on the memory system, the most interesting ones are Dinero-Hase [4], Cachesim [5], LDA [6], Fast-Cache [7], JCachesim [8], CHESS (cache hierarchy estimator using scalable simulator) [9] and CACTI [10]. 'Dinero-Hase' is a cache memory simulator obtained by loading the Dinero module within the HASE simulator, which has two modes of operation. In animated mode, all simulation activity is visualized during simulation playback, but there is a trace file limit of 500 lines. In fast mode, the trace file limit is 10,000,000 lines, but there is no animation of simulation activity. 'Cachesim' is a web-based cache memory simulator that models the effectiveness of various cache memory algorithms and configurations using static trace files. The cache memory parameters and trace files should be submitted to the simulator to generate the simulation output. 'LDA' is a flexible memory hierarchy simulator, which uses the latency-of-data-access (LDA) model. The simulator supports shared memory multiprocessor systems with different cache coherence protocols.
It can be configured for systems with various cache architectures and run with the results from benchmarks, which measure the architectural properties of a memory hierarchy. 'Fast-Cache' provides a framework for implementing memory system simulators. It is a SPARC implementation of the active memory abstraction where memory references logically invoke a user-specified function depending upon the reference's type and accessed memory block state. Since the abstraction hides implementation details, implementations can be carefully tuned for particular platforms, permitting much more efficient on-the-fly simulation than the traditional trace-driven abstraction. 'JCachesim' is a Java-based tool used for experimenting with cache behaviour using simple assembly programs while varying cache features. It allows a user to observe the CPU and the cache activities during the execution of a program, evaluate the system performance, and analyse the reference locality and the distribution of memory accesses. 'CHESS' is a trace-driven simulator, which simulates the work of a hierarchy of cache memories and includes some timing simulation that explores latency of the system. The simulator offers some timing information and gives statistics for each cache alone as well as overall. The simulators are compared in Table 1. The first column gives the high (H), medium (M) and low (L) coverage of the memory system topics (TC), whereas the remaining ones give the availability of simulator features, such as the register-transfer level of the presentation (RT), the clock level simulation (CL), the user interactive flow (UI), the visual presentation (VP) and the web-based accessibility (WB). Dinero-Hase includes the tools needed to model all topics, whereas none of the other simulators deals with the virtual memory. However, none of them includes implementation details down to the register-transfer level representation. The clock level and user interactive simulation flow are available only with Dinero-Hase.
The visual presentation is provided with Dinero-Hase and JCachesim. Finally, only Dinero-Hase, Cachesim and JCachesim are web-based. As a result of a critical analysis of these simulators, we have concluded that each of them fulfills its own design objectives, but none of them fully meets the above-stated requirements.
At the Faculty of Electrical Engineering, University of Belgrade, significant effort has been invested in developing educational systems and simulators for various topics in the area of computer architecture and organization including the memory system [11]. The initial version of an originally developed memory system was designed and the standalone simulator was written in Visual Basic a few years ago [12]. Since then they have been successfully used for laboratory exercises. In the course of this period, some ideas on how to enhance the memory system and the simulator have emerged based on experience gained, both by the authors and the students working with them. In addition, the possibility of running the simulator as a web-based application has become a new but very important requirement. Having all the above-stated issues in mind, a decision was taken to redesign the memory system and develop completely new simulator software in the Java programming environment. The memory system, the simulator features, the simulator software and the use of the simulator for laboratory exercises are presented in the following sections.
MEMORY SYSTEM
The memory system is made up of the virtual memory and TLB, the cache memory and the main memory. The initial idea was to develop a hierarchical memory system where the process of accessing the TLB, the cache memory and the interleaved memory is realized as a whole and as a part of one of the educational systems. Although the realization of such a memory system was feasible, it was felt that such a memory system would be too complex and very difficult for students to use. Therefore, it was decided to split up the memory system into three separate functional parts, which are the virtual memory and TLB, the cache memory and the interleaved memory, and design each part separately. The theoretical aspects of the memory system and the implementation details of the designed memory system are given in the accompanying book [13] . Its basic characteristics are briefly explained below.
The virtual memory and the TLB are demonstrated with three types of virtual memory and three types of TLB. The TLB holds a certain number of descriptors of most recently accessed pages or segments and performs the virtual-to-physical address translation for requests coming from the processor. If the descriptor is in the TLB, the address translation is carried out according to the mapping technique implemented, and the physical address is returned to the processor. In the case of the TLBs for segmented and segment-paged virtual memories, the access and address violation checks are performed and if either or both checks fail, the TLB generates an interrupt. If the descriptor is not in the TLB, the appropriate segment and/or page table(s) in the main memory are accessed. When the segment or page is in the main memory, its descriptor is loaded into the appropriate entry in the TLB and the address translation is resumed as in the case explained above, when the descriptor is in the TLB. Otherwise, the TLB generates an interrupt. The TLB entry to be replaced with a new descriptor is selected according to either the first-in-first-out (FIFO) or the least recently used (LRU) algorithm for the TLBs with the associative or set-associative mapping. Both the part of the processor that generates virtual addresses using the table of requests and the main memory are simulated.
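The translation flow described above can be illustrated with a minimal Java sketch of a fully associative TLB for a paged virtual memory. It is an illustration under stated assumptions and not the design of the memory system presented here: the class name TlbSketch, the 4 KB page size, the map that stands in for the page table in the main memory and the exception that stands in for the interrupt are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a fully associative TLB for a paged virtual memory.
class TlbSketch {
    static final int PAGE_BITS = 12;                 // assumed 4 KB pages
    private final Map<Integer, Integer> pageTable;   // page -> frame; stands in for the page table in main memory
    private final LinkedHashMap<Integer, Integer> entries;
    int hits;
    int misses;

    // lru == true selects LRU replacement, lru == false selects FIFO replacement.
    TlbSketch(final int maxEntries, boolean lru, Map<Integer, Integer> pageTable) {
        this.pageTable = pageTable;
        // With accessOrder == true every get() moves the entry to the end (LRU);
        // with accessOrder == false the insertion order is kept (FIFO).
        this.entries = new LinkedHashMap<Integer, Integer>(16, 0.75f, lru) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > maxEntries;          // evict when the TLB is over capacity
            }
        };
    }

    int translate(int virtualAddress) {
        int page = virtualAddress >>> PAGE_BITS;
        int offset = virtualAddress & ((1 << PAGE_BITS) - 1);
        Integer frame = entries.get(page);
        if (frame != null) {
            hits++;                                  // descriptor found in the TLB
        } else {
            misses++;                                // descriptor must be fetched from the page table
            frame = pageTable.get(page);
            if (frame == null) {
                // Stand-in for the interrupt generated when the page is not in main memory.
                throw new IllegalStateException("page fault on page " + page);
            }
            entries.put(page, frame);                // may evict the FIFO or LRU victim
        }
        return (frame << PAGE_BITS) | offset;
    }
}
```

With a two-entry TLB and the page reference sequence 0, 1, 0, 2, 0, 1, the LRU variant keeps page 0 resident when page 2 is loaded, whereas the FIFO variant evicts it, so the two policies produce different hit counts for the same requests.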
The cache memory is demonstrated with three types of cache organizations: the direct, associative and set-associative mapping. The cache memory holds a certain number of the most recently accessed blocks from the main memory and performs a write, read, selective flush or complete flush operation for requests coming from the processor. When the block is cached, the operation is carried out according to the mapping technique implemented and, for the read operation, the data read are returned to the processor. A block that is not cached is loaded from the main memory and the activities are resumed as in the case explained above, when the block is cached. The cache memory block to be replaced is selected according to either the FIFO or the LRU algorithm for the cache memories with the associative and set-associative mapping. In addition, some other activities in the cache memory may occur depending on whether the write-back or store-through technique for updating the main memory is used. The cache memory with the direct mapping uses the write-back technique, and the cache memory block selected to be replaced is first returned to the main memory if dirty. The cache memory with the associative mapping uses the store-through technique and there is no need to return a dirty block. The cache memory with the set-associative mapping uses the write-back technique with buffering: if the cache memory block selected to be replaced is dirty, this block is first written only into a buffer block, then the new block is loaded from the main memory, and finally the dirty block is returned from the buffer block to the main memory. The processor and the main memory are realized in a similar way as for the virtual memory.
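The behaviour of the cache memory with the direct mapping and the write-back technique can be sketched in Java as follows. This is a simplified functional model, not the register-transfer level design of the memory system: the class name DirectMappedCacheSketch, the word-addressed memory, the block size of four words and the cache size of eight blocks are assumptions made only for the illustration.

```java
// Hypothetical sketch of a direct-mapped cache with the write-back policy.
class DirectMappedCacheSketch {
    static final int BLOCK_WORDS = 4;   // assumed block size in words
    static final int NUM_BLOCKS = 8;    // assumed number of cache blocks

    final int[] memory;                 // word-addressed main memory
    final int[][] data = new int[NUM_BLOCKS][BLOCK_WORDS];
    final int[] tag = new int[NUM_BLOCKS];
    final boolean[] valid = new boolean[NUM_BLOCKS];
    final boolean[] dirty = new boolean[NUM_BLOCKS];
    int writeBacks;                     // number of dirty blocks returned to memory

    DirectMappedCacheSketch(int[] memory) {
        this.memory = memory;
    }

    // Makes sure the block holding the address is cached and returns its cache index.
    private int ensureCached(int address) {
        int block = address / BLOCK_WORDS;
        int index = block % NUM_BLOCKS;          // direct mapping: one possible place per block
        int newTag = block / NUM_BLOCKS;
        if (!valid[index] || tag[index] != newTag) {
            if (valid[index] && dirty[index]) {
                // Write-back: the dirty victim is returned to main memory first.
                int victimBase = (tag[index] * NUM_BLOCKS + index) * BLOCK_WORDS;
                System.arraycopy(data[index], 0, memory, victimBase, BLOCK_WORDS);
                writeBacks++;
            }
            System.arraycopy(memory, block * BLOCK_WORDS, data[index], 0, BLOCK_WORDS);
            tag[index] = newTag;
            valid[index] = true;
            dirty[index] = false;
        }
        return index;
    }

    int read(int address) {
        int index = ensureCached(address);
        return data[index][address % BLOCK_WORDS];
    }

    void write(int address, int value) {
        int index = ensureCached(address);
        data[index][address % BLOCK_WORDS] = value;
        dirty[index] = true;                     // main memory is updated only on replacement
    }
}
```

In this sketch a write to address 0 leaves the main memory unchanged until a later access to address 32, which maps to the same cache block, forces the dirty block to be written back.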
The interleaved main memory is demonstrated with a system made up of 16 units, 16 memory modules, the bus and the arbiter. A unit includes all circuitry necessary to perform the appropriate bus cycle either as a master or a slave. When the unit is a master, it sends a bus request to the arbiter and performs a read or write request bus cycle when it gets the grant. These bus request cycles are generated by the unit, which simulates related parts of a processor or a direct memory access controller and is realized in a similar way as the processor for the virtual and cache memories. When the unit is a slave it accepts data in a data available bus cycle. A memory module includes all circuitry necessary to perform the appropriate bus cycle when the memory module is either a slave or a master. When the memory module is a slave, it accepts a write or read request from the bus and performs the write or read operation. When the read operation is completed, the memory module sends a bus request to the arbiter and performs a data available cycle when it gets the grant. The memory module is simulated in a similar way as the main memory for the virtual and cache memories. Five ways of interleaving memory modules can be specified: consecutive addresses in the same module, consecutive addresses in consecutive modules and three cases of mixtures of the first two. The bus is realized as a split transactions synchronous bus with the write request, the read request and the data available bus cycles. The arbiter supports the parallel arbitration between all units and memory modules.
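One possible reading of the five ways of interleaving is sketched below in Java. With 16 memory modules, four address bits select the module. The sketch assumes that the mixtures take some of these bits from the low-order end of the address and the rest from the high-order end; this bit split, the 16-bit address width and the class name InterleavingSketch are assumptions and are not taken from the design of the memory system.

```java
// Hypothetical sketch of module selection for 16 interleaved memory modules.
class InterleavingSketch {
    static final int MODULE_BITS = 4;    // 16 modules need 4 module-select bits
    static final int ADDRESS_BITS = 16;  // assumed address width

    // lowBits is the number of module-select bits taken from the low-order end (0..4):
    //   0 -> consecutive addresses in the same module,
    //   4 -> consecutive addresses in consecutive modules,
    //   1..3 -> assumed mixtures of the two.
    static int module(int address, int lowBits) {
        int highBits = MODULE_BITS - lowBits;
        int low = address & ((1 << lowBits) - 1);
        int high = (highBits == 0) ? 0 : address >>> (ADDRESS_BITS - highBits);
        return (high << lowBits) | low;
    }
}
```

For example, under these assumptions addresses 0 and 1 fall into modules 0 and 1 when lowBits is 4, and into the same module 0 when lowBits is 0.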
SIMULATOR
The simulator is run from a web browser. It includes various features to select the simulator, initialize the simulator and control the simulation. The Selection screen (Figure 1), which is the first screen that appears, makes it possible to select the simulator and move to the Initialization screen (Init). The initialization and simulation features of simulators for the virtual memories and TLBs, the cache memories and the interleaved memory are similar. Therefore, they are briefly presented using the cache memory with direct mapping, which has the simplest organization with respect to other parts of the memory system. The Initialization screen (Figure 2) facilitates both the interactive initialization, using the Processor, Memory and Registers tab panels, and the initialization from a file, using the File panel. The Processor panel allows a user to fill, update, delete and reorder requests in the table of requests of the processor (Figure 2), whereas the Memory and Registers panels make it possible to examine and/or fill the values of memory locations of the main memory (Figure 3) or the registers in the cache memory system respectively. The File panel facilitates the initialization of all of them by reading a web server file. There is also a possibility to go to either the Simulation screen (Sim) or return to the Selection screen (Exit).
The simulator visually presents parts of the cache memory and values of signals, simulates the behaviour of the cache memory and displays simulation results in a user-friendly manner. During a simulation run, four windows are present on the screen (Figure 4). The larger window in the upper part of the screen, named the Unit diagram window, contains a composition of combinational and sequential circuits of the selected part of the cache memory system, which is made up of three modules: the simulated processor (CPU), cache memory (CACHE) and main memory (MEM). The CPU module contains the request generator (req_gen) and cache interface (cache_int) units, and the CACHE module contains the processor interface (cpu_int), tag and data memory (tag_data), valid and dirty flags (flags), block replacement logic (repl) and memory interface (mem_int) units. The control unit (cont_unit) is common to all three modules. Since a limited number of combinational and sequential circuits can be placed on a screen, only one part of the cache memory system, such as one of the units of the CPU and CACHE modules, the MEM module or the control unit, can appear in the Unit diagram window at a time. For parts of the cache memory system that cannot completely fit on the screen, the vertical and horizontal scroll bars are used (Figure 5). A single signal is presented with a thin line in blue or red depending on whether the signal is inactive or active, respectively. Groups of signals are presented with thick green and grey lines, depending on whether they are in the state of high impedance or not, respectively. The values of signals on thick grey lines can be displayed by activating the line, whereas the number system can be selected from the pop-up menu. For each signal or group of signals coming from or going to another part of the cache memory system, there is a button, which makes it possible to navigate through the cache memory system.
The smaller window in the lower part of the screen contains the Structure, Command and Information windows. The Structure window includes a separate button for each part of the cache memory system that can be shown in the Unit diagram window. The button for the part shown in the Unit diagram window is shaded in red (cpu_int) and the others in grey (Figure 4). Additionally, they can be used to navigate through parts of the cache memory system. The Command window contains buttons that facilitate carrying out the simulation (Simul), following the status of the simulation (Miscel) and navigating (Navig). The Simul command buttons are used to activate the simulation for one clock period (Clk), for as many clock periods as are needed to perform a complete operation in the cache memory system (Opr) and to return the simulation to the beginning (Reset). The Miscel command button Show allows one to examine and set values in the table of requests of the simulated processor (Figure 2), the main memory locations (Figure 3) and the cache memory system registers (Figure 6) and see the timing diagrams of signals (Figure 7). In addition, the Save and Help buttons save the current state of simulation in a web server file and activate the help system where all details concerning the cache memory system and the simulator software are given (Figure 8). The Navig command buttons allow one to move back to the previous screen in the Unit diagram window (Back) and exit the simulation and go back to the Initialization screen (Init). The Information window gives the value of the step counter and the control signals generated for a particular clock period and a brief explanation of the actions that are going to take place during that clock period.
SIMULATOR SOFTWARE
The initial version of the simulator software, being written in Visual Basic as a standalone application, could be used only in the laboratory. However, positive experiences in using the simulator and emerging trends in distance learning imposed the need to make it web-based. Therefore, the new version of the simulator software had to be developed using a programming environment that meets the following requirements: the web-based accessibility, the presentation of screens with a large number of visual elements, the object-oriented nature of the implementation language, the minimal cost of the simulator software development and use, and the software and hardware independence of the client side.
The main candidates were ASP (Active Server Pages), PHP and the programming language Java. ASP is the most popular Microsoft development tool for Internet applications. However, it is tied to the Microsoft operating systems and its applications are executed with the Microsoft Information Server, the cost of which is a limiting factor. In addition, the presentation of screens with a large number of visual elements is rather difficult. PHP is a new generation development tool for Internet applications, which is free of charge. However, it is not suitable for the development of systems where the client side is very complex. Therefore, Java, which meets all the above-stated requirements, has been used for the developed system.
The realization of the simulator software requires solutions to the following problems: the visual presentation of combinational and sequential circuits, the drawing of lines for signals, the presentation of screens for parts of the cache memory system that appear in the Unit diagram window, the simulation of behaviour of sequential and combinational circuits, the drawing of timing diagrams of signals, the return of simulation to the beginning, the saving of the simulator state in a file, the initialization of the simulator from a file etc. The most interesting problems and the solutions adopted are briefly presented in the following paragraphs.
The visual presentation of combinational and sequential circuits requires a solution for the problem of drawing the usual graphical symbols for various types of components, such as registers, counters, multiplexers, decoders, flip-flops, AND, OR, XOR and NOT circuits etc. First a class for each type of component is created and then separate objects of this class are created for every place on the screen where this type of component appears. Within a class, a set of properties, which defines the behaviour of the component, is given and in each object of the class the desired values of the properties are defined. In this way, the position on the screen, size, direction etc. are defined for each component. This solution means that any modification in the graphical symbol of a component of a given type is carried out at the level of classes.
The drawing of lines for signals is done by drawing separately horizontal and vertical segments of lines, which are presented as objects of the Myline class. This class defines the possible properties of a line, such as the size, thickness, position, direction, value etc., references to other objects, methods for drawing and colouring the line etc., whereas the objects hold specific values for each of the properties. The class also keeps references to the objects of those components which are linked by the line. References are created at the moment when the line is drawn. The value of the property thickness is used by the drawing method to draw thin and thick lines. The value of the signal(s) is specified by the property value. Depending on the value, the colouring method draws thin lines in blue or red and thick lines in green or grey. In the case of a thick line, an object of the java.awt.Label class uses the property value to show the value of signals.
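The colouring rule for lines can be illustrated with the following Java sketch. It does not reproduce the Myline class: the class name SignalLineSketch, its fields and the LineColour enumeration, which stands in for java.awt.Color so that the sketch runs without a display, are hypothetical.

```java
// Hypothetical sketch of how a line object could derive its colour and label.
class SignalLineSketch {
    // Stand-in for java.awt.Color, used so that the sketch needs no graphics environment.
    enum LineColour { BLUE, RED, GREEN, GREY }

    final boolean thick;        // thick lines carry groups of signals
    boolean active;             // meaningful for thin lines (single signals)
    boolean highImpedance;      // meaningful for thick lines (groups of signals)
    long value;                 // current value of the signal or group of signals

    SignalLineSketch(boolean thick) {
        this.thick = thick;
    }

    // Thin lines: blue when inactive, red when active.
    // Thick lines: green in the state of high impedance, grey otherwise.
    LineColour colour() {
        if (thick) {
            return highImpedance ? LineColour.GREEN : LineColour.GREY;
        }
        return active ? LineColour.RED : LineColour.BLUE;
    }

    // Text shown next to a thick line in the selected number system (radix 2, 8, 10 or 16).
    String label(int radix) {
        return Long.toString(value, radix);
    }
}
```

Keeping the colour as a function of the current properties, rather than storing it, means that a redraw after each clock period always reflects the latest signal values.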
The solution that each element on the screen is realized as a separate object makes it possible to move components with the mouse. This feature is realized by implementing the event listener within the class which presents the simulator's window. The event listener recognizes the action and the object of the component being moved. As a result, new coordinates are passed to the object and the component is drawn at the new position. New coordinates are also passed to the line object which comes from or goes into the moved component and the line is drawn at the new position. It is also possible to change some parameters of the components, such as the size of registers, memory width etc. This is achieved by entering a new value of the appropriate property in the object of the component.
The presentation of screens for parts of the cache memory system that appear in the Unit diagram window requires solutions for problems of drawing and replacing screens in the window. To this end, a separate screen for each part of the cache memory system is represented with the screen object of the java.awt.Frame class. Since a screen contains a number of components and lines that connect them, objects of the graphical symbol classes for components and the line classes for lines are defined within screen objects. These screen objects are visually presented on the screen using the CardLayout manager. When a new screen is chosen to replace the current screen in the Unit diagram window, which is initiated by using one of the navigation features, the initialization of all objects defined within the screen object is carried out.
The simulation of behaviour of sequential and combinational circuits requires special care since sequential and combinational circuits are interconnected. Therefore, the problem to be solved is the order in which the behaviour of the sequential and combinational circuits should be simulated. The approach adopted follows their behaviour in real digital systems where, when a clock occurs, first sequential circuits change their values and then, during a clock period, combinational circuits change their values. Therefore, the simulation of sequential and combinational circuits is performed as follows. When a system reset occurs, the values of signals at the outputs of sequential circuits are first initialized and then the values of signals at the outputs of combinational circuits are calculated. When a clock occurs, the values of signals at the outputs of sequential circuits are first calculated, based on the values of signals at the outputs of appropriate combinational circuits, and then the values of signals at the outputs of combinational circuits are calculated, based on the values of signals at the outputs of appropriate sequential circuits.
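The evaluation order described above can be sketched in Java on a small example circuit. The circuit, which consists of a 2-bit counter register, a second register that holds the previous counter value, an incrementer and a decoder, and the class name TwoPhaseSimSketch are hypothetical and serve only to illustrate the two phases.

```java
// Hypothetical sketch of the two-phase (sequential, then combinational) simulation order.
class TwoPhaseSimSketch {
    // Outputs of sequential circuits.
    int counter;      // 2-bit counter register
    int delayed;      // register loaded with the counter output of the previous clock period

    // Outputs of combinational circuits.
    int next;         // incrementer output, connected to the input of the counter register
    boolean atMax;    // decoder output, active when the counter holds 3

    void reset() {
        counter = 0;              // first the outputs of sequential circuits are initialized
        delayed = 0;
        evalCombinational();      // then the outputs of combinational circuits are calculated
    }

    void clock() {
        // Phase 1: all registers take their new values from signals valid before the clock.
        int newCounter = next;
        int newDelayed = counter;
        counter = newCounter;
        delayed = newDelayed;
        // Phase 2: combinational outputs are recalculated from the new register outputs.
        evalCombinational();
    }

    private void evalCombinational() {
        next = (counter + 1) & 0b11;
        atMax = (counter == 3);
    }
}
```

The temporary variables in clock() ensure that every register is loaded from the values present before the clock, regardless of the order in which the registers are updated in the code.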
The drawing of timing diagrams of signals requires solutions for several problems: first the choice of data structure in which values of signals on each clock during the simulation are kept, second the selection of signals to be drawn and third the realization of drawing itself. The data structure should be a sparse matrix, since on a clock very few signals change their values. The operations used with matrices in this particular case are the operations of sequential insertion and reading, and not the operations of deletion and editing. Therefore, the vector rather than the linked list representation of sparse matrices is chosen. In addition, there is a need to provide the dynamic reallocation of the space, since the size of such a matrix grows during the simulation. Therefore, the values of signals during the simulation are kept in objects of the java.util.Vector class. A new element of the vector is added only for the clock when the value of the signal has changed. The element contains the value of the signal and the clock number when the value of the signal changes. Since it was decided to facilitate the drawing of timing diagrams of all signals, the values of all signals are kept during the simulation, which requires a separate object for each signal. It should be noted that, although a timing diagram can be drawn for any of the signals, the space on the screen is limited (Figure 7). The signal to be drawn at each place is selected with combo boxes, each of which lists all available signals. The drawing of timing diagrams is performed by drawing lines between two points which belong to the clocks when a change of the signal takes place and is done by using an object of the java.awt.Canvas class and its paint() method.
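The storage scheme for signal values can be sketched in Java as follows. The sketch keeps one java.util.Vector per signal and adds an element only when the value changes, as described above; the class name SignalHistorySketch, the representation of an element as a two-element array and the binary search used for reading are assumptions made for the illustration.

```java
import java.util.Vector;

// Hypothetical sketch of the per-signal history used for drawing timing diagrams.
class SignalHistorySketch {
    // Each element is {clock, value}; elements are appended in increasing clock order.
    private final Vector<int[]> changes = new Vector<>();

    // Called once per clock; stores an element only if the value has changed.
    void record(int clock, int value) {
        if (changes.isEmpty() || changes.lastElement()[1] != value) {
            changes.add(new int[] {clock, value});
        }
    }

    // Returns the value of the signal at the given clock.
    int valueAt(int clock) {
        int lo = 0;
        int hi = changes.size() - 1;
        int found = -1;
        while (lo <= hi) {                       // find the last change at or before the clock
            int mid = (lo + hi) >>> 1;
            if (changes.get(mid)[0] <= clock) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (found < 0) {
            throw new IllegalArgumentException("no value recorded at or before clock " + clock);
        }
        return changes.get(found)[1];
    }

    int storedElements() {
        return changes.size();
    }
}
```

A signal that is recorded on five consecutive clocks with the values 0, 0, 1, 1, 0 occupies only three elements, one for each change, and a drawing routine needs to visit only these elements to draw the diagram.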
The simulator software includes two data structures for keeping the state of the simulation. At the beginning, both data structures are initialized to the same values. During the simulation all accesses and changes are made only to one of them, whereas the other one remains unchanged. When the return of the simulation to the beginning should be made, the content from the unchanged data structure is copied to the one accessed and changed during the simulation.
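The use of two data structures for returning the simulation to the beginning can be sketched in Java as follows. The class name ResettableStateSketch and the representation of the simulator state as an array of integers are assumptions; the actual state of the simulator is richer.

```java
// Hypothetical sketch of the two data structures used to return the simulation to the beginning.
class ResettableStateSketch {
    private final int[] initial;   // remains unchanged during the simulation
    private final int[] working;   // accessed and changed during the simulation

    ResettableStateSketch(int[] initialValues) {
        // Both data structures are initialized to the same values.
        this.initial = initialValues.clone();
        this.working = initialValues.clone();
    }

    int get(int index) {
        return working[index];
    }

    void set(int index, int value) {
        working[index] = value;
    }

    // Copies the unchanged content back over the content changed during the simulation.
    void reset() {
        System.arraycopy(initial, 0, working, 0, initial.length);
    }
}
```

The cost of this approach is a second copy of the state in memory, in exchange for a reset that does not have to read the initialization file again.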
The realization of simulator features which make it possible to save the state of the simulator in a file and to initialize the simulator from a file requires solutions for the problems which stem from the fact that the simulator software is written in the form of Java applets. The applet is not allowed to read, write or delete files in the file system of the client local system where it is executed, or to establish a connection through the network with any system other than its server. Therefore, the state of the simulator can be saved in a file only on the system from which the applet was retrieved, which is the web server. The same applies to the file which is used to initialize the simulator.
The files are sent from the client to the server and retrieved from the server to the client by using servlets. Servlets are Java classes inherited from the HttpServlet class of the javax.servlet.http package. They are executed on the web server and are responsible for receiving a request in the form of an HttpServletRequest object, processing the request and returning the reply in the form of an HttpServletResponse object. To establish the connection, the client requests to read and write are made using objects of the java.net package and classes of the java.io package. The servlet itself is initialized when the web server is started and placed in the so-called servlet container, ready to process the requests. When the client requests the servlet via its Uniform Resource Locator, the servlet container intercepts the request and converts it into a request of the HttpServletRequest type, which is recognizable by the servlet. The appropriate method, doGet() or doPost(), is then executed. At the end, the HttpServletResponse object is formed with the replies that the servlet sends. The response is recognized on the client side by the web browser.
The initial idea was to execute each screen within its own applet. The advantage of this approach is the modularity of the simulator software. However, since signals are exchanged between screens, there is a need for the interapplet communication. This causes very intensive client-server communication, which slows down the simulation. Therefore, it was decided that all screens should be executed within one applet. With this approach only the start of simulation is slightly slow, whereas the rest of simulation is fast.
Warnings are displayed using objects of the Dialog class. When such an object is created, the parent object to which the warning belongs must be specified, and this parent must be an object of the Frame class. The adopted approach is to link the warning to the object that represents the web browser, whose class inherits from the Frame class. While the warning is displayed, a user can still activate the simulator features. To prevent this, the setEnabled(false) method of the screen object is called just before the warning is displayed, which inhibits any further action within the screen. The event listener of this object reacts to events within the dialog box and calls the setEnabled(true) method of the screen object when further user actions are to be permitted.
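The mechanism described above can be sketched with standard java.awt classes as follows. The class WarningSketch and its methods are hypothetical names introduced for illustration, and locating the Frame by walking up the component hierarchy is an assumed way of reaching the browser's frame, not necessarily the one used in the simulator.

```java
import java.awt.BorderLayout;
import java.awt.Button;
import java.awt.Component;
import java.awt.Dialog;
import java.awt.Frame;
import java.awt.Label;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;

/** Hypothetical helper; class and method names are assumptions. */
final class WarningSketch {

    /** Walks up the component hierarchy and returns the enclosing Frame, or null if there is none. */
    static Frame findOwnerFrame(Component component) {
        Component current = component;
        while (current != null && !(current instanceof Frame)) {
            current = current.getParent();
        }
        return (Frame) current;
    }

    /** Enables or disables all user actions within the screen. */
    static void setScreenLocked(Component screen, boolean locked) {
        screen.setEnabled(!locked);
    }

    /** Locks the screen and shows a non-modal warning owned by the given frame. */
    static Dialog showWarning(Frame owner, Component screen, String message) {
        Dialog dialog = new Dialog(owner, "Warning", false);
        dialog.setLayout(new BorderLayout());
        dialog.add(new Label(message), BorderLayout.CENTER);
        Button ok = new Button("OK");
        dialog.add(ok, BorderLayout.SOUTH);

        // Either way of dismissing the warning unlocks the screen again.
        ok.addActionListener(event -> dismiss(dialog, screen));
        dialog.addWindowListener(new WindowAdapter() {
            @Override
            public void windowClosing(WindowEvent event) {
                dismiss(dialog, screen);
            }
        });

        setScreenLocked(screen, true);  // inhibit user actions before the warning appears
        dialog.pack();
        dialog.setVisible(true);
        return dialog;
    }

    /** Closes the warning and permits further user actions. */
    static void dismiss(Dialog dialog, Component screen) {
        dialog.dispose();
        setScreenLocked(screen, false);
    }
}
```

A caller would typically write showWarning(findOwnerFrame(screen), screen, "...") from within the screen's own event handling code.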
MEMORY SYSTEM IN EDUCATION: AN EXAMPLE
Our students use the memory system both in the laboratory and from home. Its use in the laboratory is organized through seven exercises of 2 h each. The first three exercises deal with the TLB with the associative, direct and set-associative mappings for the segmented, paged and segment-paged organizations of the virtual memory, respectively. The next three exercises cover the cache memory with the associative, direct and set-associative mappings, the write-back and store-through memory updating policies, the FIFO and LRU block replacement algorithms, and the techniques for enhancing cache memory performance, such as write buffering, accessing the critical word first, bypassing and early processor start. The last exercise demonstrates parallel arbitration and the bus cycles on the split-transaction synchronous bus for each of the five possible ways of interleaving memory modules. The exercises are prepared with the aim of covering all the situations that might appear in the memory system and are accompanied by written instructions on how to carry out the simulation. The students are allowed to do an exercise in the laboratory only if they pass an entry test, which contains a number of questions relevant to the topic of the exercise. At the beginning of each exercise the students first initialize the simulator from a previously prepared file. Then they use the simulation features to carry out the simulation at the clock level, locate relevant parts of the memory system, examine the values of signals, registers and memory locations, and view the timing diagrams of signals (Section 3). During each exercise they also write reports, which they submit and defend at the end. Students must successfully complete and defend five of the seven laboratory exercises before they may take the exam. The students also use the memory system from home in order to prepare for the laboratory.
To this end, they are allowed to go through all exercises from home in the same way as they do in the laboratory. In addition, they are free to prepare their own exercises and carry out the simulation using the initialization and simulation features of the simulator (Section 3). Finally, the memory system can be used from home by those interested in self-learning. They can obtain the needed explanations of the topics of interest using the Help system of the simulator, carry out the simulation with the exercises used by students in the laboratory, or carry out the simulation with the simulator initialized according to their own wishes.
A typical use of the simulator is briefly explained below using part of the laboratory exercise for the cache memory with direct mapping. We create the exercise by filling in the table of processor requests and the relevant memory locations. To this end, the simulator is first selected (Figure 1) and then each request is separately prepared and entered in the table (Figure 2). If some of the requests are not entered correctly, or if the requests in the table do not create the desired simulation scenario, they can be updated or deleted using the appropriate buttons, or reordered by marking and dragging them from one place in the table to another. The relevant memory locations are initialized in a similar way (Figure 3). At the end, the desired contents of the table of requests and of the main memory are saved in a file (Section 3). During the simulation, these requests create a scenario typical of direct mapping, which is briefly discussed below. All the requests are deliberately chosen to address memory locations belonging to memory blocks that map into entry 2 of the cache memory. Since the cache memory is empty at the beginning and no entry is valid, request 0 gives a cache miss. Consequently, the appropriate block is first loaded from the main memory into entry 2 of the cache memory and the data are then sent to the processor. Requests 1 and 2 are for the same block and give cache hits. In the case of request 1, data are read from the cache memory, whereas in the case of request 2 data are written into the cache memory, causing this block to become dirty. Request 3 is chosen to demonstrate the main disadvantage of the cache memory with direct mapping. This read request is for a memory location belonging to another main memory block that also maps into entry 2 of the cache memory. Thus, although all other entries in the cache memory are free, this main memory block can be loaded only into entry 2 of the cache memory.
Since the block in entry 2 is dirty and the write-back algorithm for updating the main memory is used, this block is first written back to the main memory and then the new block is loaded from the main memory. Request 4 has an effect similar to that of request 3. However, this time the block in entry 2 of the cache memory is not dirty, so only the new block is loaded from the main memory. All other requests are dealt with in a similar way.
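The scenario described above can be reproduced with a minimal model of a direct-mapped, write-back cache. The sketch below is an illustration only and does not reflect the register-transfer level structure of the simulated system. The cache size of eight entries, the block numbers 2, 10 and 18 (all of which map into entry 2), the assumption that request 4 is a read, and the allocation of a block on a write miss are choices made for the example.

```java
/** Minimal direct-mapped, write-back cache model; sizes and names are assumptions. */
final class DirectMappedCacheSketch {
    private final int entries;
    private final boolean[] valid;
    private final boolean[] dirty;
    private final int[] tag;

    DirectMappedCacheSketch(int entries) {
        this.entries = entries;
        this.valid = new boolean[entries];
        this.dirty = new boolean[entries];
        this.tag = new int[entries];
    }

    /** Processes one request for the given main memory block and reports what happened. */
    String access(int block, boolean write) {
        int entry = block % entries;      // the only entry this block can occupy
        int blockTag = block / entries;   // distinguishes blocks sharing that entry
        String outcome;
        if (valid[entry] && tag[entry] == blockTag) {
            outcome = "hit";
        } else {
            // A dirty victim must be written back before the new block is loaded.
            outcome = (valid[entry] && dirty[entry]) ? "miss, write back, load" : "miss, load";
            valid[entry] = true;
            dirty[entry] = false;
            tag[entry] = blockTag;
        }
        if (write) {
            dirty[entry] = true;          // write-back policy: main memory is updated later
        }
        return outcome;
    }

    public static void main(String[] args) {
        DirectMappedCacheSketch cache = new DirectMappedCacheSketch(8);
        int[] blocks = {2, 2, 2, 10, 18};                       // all map into entry 2
        boolean[] writes = {false, false, true, false, false};  // only request 2 is a write
        for (int i = 0; i < blocks.length; i++) {
            System.out.println("request " + i + ": " + cache.access(blocks[i], writes[i]));
        }
    }
}
```

With these assumptions, requests 0 to 4 give a miss with a load, two hits, a miss with a write-back followed by a load, and a miss with a load only, which matches the sequence described in the text.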
The students first select the exercise (Figure 1), then carry out the initialization from a file (Figure 2) and finally start the simulation. During the simulation, the cache memory receives a certain number of requests from the simulated processor, generated on the basis of the contents of the table of requests (Figure 2). The first simulation screen is the processor interface unit screen (Figure 4). Here the students need to use Command button Show and examine the table of requests in the simulated processor (Figure 2).
The simulation itself is carried out clock-by-clock using Command button Clk (Figure 4). When a request is received, the students observe the values of the flip-flops and registers specifying the request (Figure 4). Then they have to move to the tag and data memory screen (Figure 9) using the tag_data button either in the Unit diagram window or in the Structure window (Section 3). Here they can see how the hit/miss signal H/M is generated in the TAG part. In the case of a hit, the DATA part is observed in order to see how the data are either read or written. If this is a read request, Command button Back can be used to return to the previous screen (Figure 4) and see how data are sent to the simulated processor. However, in the case of a miss they go to the valid and dirty flags unit screen (Figure 5), see the values of the valid and dirty flip-flops and conclude whether the block from the appropriate entry of the cache memory has to be returned to the main memory or only the new block has to be loaded from the main memory. These transfers are followed using the tag and data screen (Figure 9) and the memory unit interface screen (Figure 10). The effects of a block transfer can be verified using Command button Show. It provides facilities to examine, before and after the transfer, the values of relevant memory locations (Figure 3) and cache memory system registers (Figure 6) and to see the timing diagrams of signals (Figure 7). The simulation of the rest of the exercise is carried out in a similar way.
CONCLUSION
The originally developed memory system, the simulator features, the simulator software and the use of the simulator for laboratory exercises and self-learning have been presented in the paper. The initial version of the simulator software, written in Visual Basic a few years ago, could be used only in the laboratory. However, positive experiences in using the simulator and emerging trends in distance learning imposed the need to make it web-based.
The new simulator software has been developed in the Java programming language in the form of applets. This decision has brought a few very important advantages. The simulator software has been developed as a web-based application in a fairly natural way. Java offers facilities for the presentation of screens with a large number of visual elements. The object-oriented concept has made it easy to maintain and test the code. The Java environment is free, so that the development and use of the simulator software do not incur any additional expenses. Java is independent of the underlying software and hardware on both the client and server sides.
The memory system, which includes the virtual memory and TLB, the cache memory and the main memory, meets the design objectives and there are no plans for further enhancements. The simulator software has been developed and is currently being used by students, who use their personal accounts to access the simulator via the Internet. The simulator is executed within the Jakarta Tomcat Web server, the use of which is free of charge, and the Microsoft Internet Explorer and the Netscape Navigator browsers. The system also works with the Microsoft Information Server and the HP Web Server.
Evaluations of the benefits of using the simulator software for teaching various aspects of the memory system have been conducted. The students who used the system were better prepared and had a deeper understanding of basic concepts, so the time needed for revision could be reduced. Students have shown great interest in using the memory system, with an average of 18 accesses per day during the last few months. We believe that, as a direct benefit of the system being used for self-learning, the results in the last few exams have improved, with the average mark increasing from 7.89/10 to 8.23/10.
