4 research outputs found
A Simple Multi-Core Functional Cache Design Simulator
This paper presents a flexible multi-core cache memory simulator to design and evaluate memory hierarchies for general-purpose or embedded processors. The proposed simulator needs to work with Pin, which is an open-source dynamic instrumentation tool provided by Intel. The Pin intercepts the execution of instructions and generates a sequence code (traces) to feed into the simulator for any selected benchmark programs, such as SPEC2006, SPLASH2, or PARSEC. We have a plan to release this simulator as an open-source (like Pin) to support research and/or academic community for their simulation works. In addition, we expect more functions can be updated on top of this simulator to share by the research community
Recommended from our members
Evaluating the Scalability of SDF Single-chip Multiprocessor Architecture Using Automatically Parallelizing Code
Advances in integrated circuit technology continue to provide more and more transistors on a chip. Computer architects are faced with the challenge of finding the best way to translate these resources into high performance. The challenge in the design of next generation CPU (central processing unit) lies not on trying to use up the silicon area, but on finding smart ways to make use of the wealth of transistors now available. In addition, the next generation architecture should offer high throughout performance, scalability, modularity, and low energy consumption, instead of an architecture that is suitable for only one class of applications or users, or only emphasize faster clock rate. A program exhibits different types of parallelism: instruction level parallelism (ILP), thread level parallelism (TLP), or data level parallelism (DLP). Likewise, architectures can be designed to exploit one or more of these types of parallelism. It is generally not possible to design architectures that can take advantage of all three types of parallelism without using very complex hardware structures and complex compiler optimizations. We present the state-of-art architecture SDF (scheduled data flowed) which explores the TLP parallelism as much as that is supplied by that application. We implement a SDF single-chip multiprocessor constructed from simpler processors and execute the automatically parallelizing application on the single-chip multiprocessor. SDF has many desirable features such as high throughput, scalability, and low power consumption, which meet the requirements of the next generation of CPU design. Compared with superscalar, VLIW (very long instruction word), and SMT (simultaneous multithreading), the experiment results show that for application with very little parallelism SDF is comparable to other architectures, for applications with large amounts of parallelism SDF outperforms other architectures
A Study of a Simultaneous Multithreaded Processor Implementation
This paper describes an approach to the implementation and the operation of a Simultaneous Multithreaded processor. We propose an architecture which integrates a software mechanism to handle contexts, a rapid communication system, as well as a locking system to ensure mutual exclusion. We explain how the architecture manages the running threads as well as the software interface visible to the programmer. Finally, we provide a few indications on the e#ciency of such an architecture.