22 research outputs found

    Fast simulation techniques for microprocessor design space exploration

    Get PDF
    Designing a microprocessor is extremely time-consuming. Computer architects heavily rely on architectural simulators, e.g., to drive high-level design decisions during early stage design space exploration. The benefit of architectural simulators is that they yield relatively accurate performance results, are highly parameterizable and are very flexible to use. The downside, however, is that they are at least three or four orders of magnitude slower than real hardware execution. The current trend towards multicore processors exacerbates the problem; as the number of cores on a multicore processor increases, simulation speed has become a major concern in computer architecture research and development. In this dissertation, we propose and evaluate two simulation techniques that reduce the simulation time significantly: statistical simulation and interval simulation. Statistical simulation speeds up the simulation by reducing the number of dynamically executed instructions. First, we collect a number of program execution characteristics into a statistical profile. From this profile we can generate a synthetic trace that exhibits the same execution behavior but which has a much shorter trace length as compared to the original trace. Simulating this synthetic trace then yields a performance estimate. Interval simulation raises the level of abstraction in architectural simulation; it replaces the core-level cycle-accurate simulation model by a mechanistic analytical model. The analytical model builds on insights from interval analysis: miss events divide the smooth streaming of instructions into so called intervals. The model drives the timing by analyzing the type of the miss events and their latencies, instead of tracking the individual instructions as they propagate through the pipeline stages

    Interval simulation: raising the level of abstraction in architectural simulation

    Get PDF
    Detailed architectural simulators suffer from a long development cycle and extremely long evaluation times. This longstanding problem is further exacerbated in the multi-core processor era. Existing solutions address the simulation problem by either sampling the simulated instruction stream or by mapping the simulation models on FPGAs; these approaches achieve substantial simulation speedups while simulating performance in a cycle-accurate manner This paper proposes interval simulation which rakes a completely different approach: interval simulation raises the level of abstraction and replaces the core-level cycle-accurate simulation model by a mechanistic analytical model. The analytical model estimates core-level performance by analyzing intervals, or the timing between two miss events (branch mispredictions and TLB/cache misses); the miss events are determined through simulation of the memory hierarchy, cache coherence protocol, interconnection network and branch predictor By raising the level of abstraction, interval simulation reduces both development time and evaluation time. Our experimental results using the SPEC CPU2000 and PARSEC benchmark suites and the MS multi-core simulator show good accuracy up to eight cores (average error of 4.6% and max error of 11% for the multi-threaded full-system workloads), while achieving a one order of magnitude simulation speedup compared to cycle-accurate simulation. Moreover interval simulation is easy to implement: our implementation of the mechanistic analytical model incurs only one thousand lines of code. Its high accuracy, fast simulation speed and ease-of-use make interval simulation a useful complement to the architect's toolbox for exploring system-level and high-level micro-architecture trade-offs

    Statistical simulation for exploring the chip multiprocessor design space

    No full text

    Statistical Simulation of Chip Multiprocessors Running Multi-Program Workloads

    Get PDF
    This paper explores statistical simulation as a fast simulation technique for driving chip multiprocessor (CMP) design space exploration. The idea of statistical simulation is to measure a number of important program execution characteristics, generate a synthetic trace, and simulate that synthetic trace. The important benefit is that a synthetic trace is very small compared to real program traces. This paper advances statistical simulation by modeling shared resources, such as shared caches and off-chip bandwidth. This is done (i) by collecting cache set access probabilities and per-set LRU stack depth profiles, and (ii) by modeling a program’s time-varying execution behavior in the synthetic trace. The key benefit is that the statistical profile is independent of a given cache configuration and the amount of multiprocessing, which enables statistical simulation to model conflict behavior in shared caches when multiple programs are co-executing on a CMP. We demonstrate that statistical simulation is both accurate and fast with average IPC prediction errors of less than 5.5 % and simulation speedups of 40X to 70X compared to the detailed simulation of 100M-instruction traces. This makes statistical simulation a viable tool for CMP design space exploration.

    Chip multiprocessor design space exploration through statistical simulation

    No full text
    Abstract—Developing fast chip multiprocessor simulation techniques is a challenging problem. Solving this problem is especially valuable for design space exploration purposes during the early stages of the design cycle where a large number of design points need to be evaluated quickly. This paper studies statistical simulation as a fast simulation technique for chip multiprocessor (CMP) design space exploration. The idea of statistical simulation is to measure a number of program execution characteristics from a real program execution through profiling, to generate a synthetic trace from it, and simulate that synthetic trace as a proxy for the original program. The important benefit is that the synthetic trace is much shorter compared to a real program trace, which leads to substantial simulation speedups. This paper enhances state-of-the-art statistical simulation: 1) by modeling the memory address stream behavior in a more microarchitecture-independent way and 2) by modeling a program’s time-varying execution behavior. These two enhancements enable accurately modeling resource conflicts in shared resources as observed in the memory hierarchy of contemporary chip multiprocessors when multiple programs are coexecuting on the CMP. Our experimental evaluation using the SPEC CPU benchmarks demonstrates average prediction error of 7.3 percent across a range of CMP configurations while varying the number of cores and memory hierarchy configurations. Index Terms—Performance of systems (modeling techniques, simulation).
    corecore