23 research outputs found

    Fast approximately timed simulation

    In this paper we present a technique for fast approximately timed simulation of software within a virtual prototyping framework. Our method performs a static analysis of the program control flow graph to construct annotations of the simulated program, combined with dynamic performance information. The static analysis estimates execution time based on a target architecture model. The delays introduced by instruction fetch and data cache misses are evaluated dynamically. At the end of each block, static and dynamic information are combined with branch target prediction to compute the total execution time of the block. As a result, we can provide approximate performance estimates with a high simulation speed that remains usable for software developers.
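
    The per-block combination of static and dynamic information described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `BlockAnnotation` type, the `block_time` function, and all penalty values are hypothetical and do not come from the paper, and branch prediction is reduced to a single misprediction flag.

```python
from dataclasses import dataclass

# Hypothetical penalties in cycles; a real target architecture model would supply these.
ICACHE_MISS_PENALTY = 10
DCACHE_MISS_PENALTY = 20
BRANCH_MISPREDICT_PENALTY = 5


@dataclass
class BlockAnnotation:
    """Static annotation attached to one basic block (assumed layout)."""
    static_cycles: int  # cycle estimate produced by static analysis


def block_time(annotation: BlockAnnotation, icache_misses: int,
               dcache_misses: int, mispredicted: bool) -> int:
    """Combine the static estimate with dynamically observed penalties."""
    cycles = annotation.static_cycles
    cycles += icache_misses * ICACHE_MISS_PENALTY
    cycles += dcache_misses * DCACHE_MISS_PENALTY
    if mispredicted:
        cycles += BRANCH_MISPREDICT_PENALTY
    return cycles


if __name__ == "__main__":
    # One block: 12 static cycles, 1 instruction miss, 2 data misses, mispredicted exit.
    print(block_time(BlockAnnotation(static_cycles=12), 1, 2, True))
```

    The static part is computed once per block ahead of time, so only the miss and branch bookkeeping has to run during simulation, which is where an approach of this kind would gain its speed.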

    Statistical simulation: Adding efficiency to the computer designer's toolbox

    Technical Report: Feedback-Based Generation of Hardware Characteristics

    In large, complex, server-like computer systems it is difficult to characterise hardware usage in the early stages of system development. Often the applications running on the platform are not ready at the time of platform deployment, which postpones metrics measurement. In our study we seek answers to two questions: (1) Can we use a feedback-based control system to create a characteristics model of a real production system? (2) Can such a model be sufficiently accurate to detect characteristics changes instead of executing the production application? The model we have created runs a signalling application, similar to the production application, together with a PID regulator that generates L1 and L2 cache misses to the same extent as the production system. Our measurements indicate that we have managed to mimic a similar environment regarding cache characteristics. Additionally, we have applied the model to a software update for a production system and detected characteristics changes using the model. This has later been verified on the complete production system, which in this study is a large-scale telecommunication system with a substantial market share.
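
    A feedback loop of the kind this abstract describes might look like the sketch below. Everything here is an assumption for illustration: the `PID` class, the gains, and especially `toy_plant`, a made-up linear model that stands in for reading hardware cache-miss counters. The report itself does not specify any of these.

```python
class PID:
    """Textbook discrete PID controller (positional form)."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint: float, measured: float, dt: float = 1.0) -> float:
        error = setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def toy_plant(intensity: float) -> float:
    """Hypothetical system: cache misses per interval for a given load intensity.

    200 misses come from the signalling application alone; each unit of
    load-generator intensity adds 50 more. A real setup would read counters.
    """
    return 200.0 + 50.0 * intensity


def regulate(target_misses: float, steps: int = 100) -> float:
    """Drive the toy plant toward target_misses; return the last measurement."""
    pid = PID(kp=0.005, ki=0.005, kd=0.0)  # hypothetical gains tuned for toy_plant
    intensity = 0.0
    measured = toy_plant(intensity)
    for _ in range(steps):
        measured = toy_plant(intensity)
        # The load generator cannot run at negative intensity.
        intensity = max(0.0, pid.update(target_misses, measured))
    return measured


if __name__ == "__main__":
    print(regulate(1000.0))
```

    The integral term is what removes the steady-state offset here; with a proportional term alone the toy plant would settle below the target miss count.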

    Collecting signatures to model latency tolerance in high-level simulations of microthreaded cores

    Current many-core architectures are generally evaluated by detailed emulation with a cycle-accurate simulation of the execution time. However, this detailed simulation of the architecture makes the evaluation of large programs very slow. Since the focus in many-core architecture is shifting from the performance of the individual core to the overall behavior of the chip, high-level simulations are becoming necessary, which evaluate the same architecture at a less detailed level and allow the designer to make quick and reasonably accurate design decisions. We have developed a high-level simulator for the design space exploration of the Microgrid, which is a many-core architecture comprised of many fine-grained multi-threaded cores. This simulator allows us to investigate mapping and scheduling strategies of families (i.e. groups of threads) in developing an operating environment for the Microgrid. The previous method of evaluating the workload, which counted basic blocks, was inaccurate. The key problem is that with many concurrent threads the latency of certain instructions is hidden because of the multi-threaded nature of the core. This paper presents a technique to manage the execution time of different types of instructions with thread concurrency. We believe this technique achieves high accuracy in evaluating programs in the high-level simulator.

    SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling

    Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. We present the sampling microarchitecture simulation (SMARTS) framework as an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates. Analysis of 41 of the 45 possible SPEC2K benchmark/input combinations shows that CPI and energy per instruction (EPI) can be estimated to within 3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in micro-architectural state initialization introduces an additional uncertainty which we empirically bound to ~2% for the tested benchmarks. Our implementation of SMARTS achieves an actual average error of only 0.64% on CPI and 0.59% on EPI for the tested benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.
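
    The statistical idea behind a systematic sampling run can be sketched with standard sampling theory. The code below is not the SMARTS implementation; the function names are hypothetical, and it assumes the usual normal approximation, in which 99.7% confidence corresponds to z = 3 and the relative half-width of the interval is z times the coefficient of variation divided by the square root of the sample count.

```python
import math
import statistics


def systematic_sample(values, interval, offset=0):
    """Take every `interval`-th measurement unit, starting at `offset`."""
    return values[offset::interval]


def relative_ci(sample, z=3.0):
    """Return (mean, relative half-width of the confidence interval).

    z = 3.0 corresponds to roughly 99.7% confidence under a normal approximation.
    """
    mean = statistics.fmean(sample)
    cv = statistics.stdev(sample) / mean  # coefficient of variation
    return mean, z * cv / math.sqrt(len(sample))


def required_samples(cv, rel_error=0.03, z=3.0):
    """Smallest n for which z * cv / sqrt(n) <= rel_error."""
    return math.ceil((z * cv / rel_error) ** 2)


if __name__ == "__main__":
    # Synthetic per-unit CPI values, for illustration only.
    cpi_per_unit = [1.0 + 0.1 * (i % 7) for i in range(10_000)]
    sample = systematic_sample(cpi_per_unit, interval=50)
    mean, rel = relative_ci(sample)
    print(f"CPI ~ {mean:.3f} +/- {100 * rel:.2f}% at ~99.7% confidence")
```

    Because the required sample count grows with the square of the coefficient of variation, a workload with stable CPI needs far fewer detailed measurements than one with erratic behavior.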

    Simulation sampling with live-points

    Current simulation-sampling techniques construct accurate model state for each measurement by continuously warming large microarchitectural structures (e.g., caches and the branch predictor) while functionally simulating the billions of instructions between measurements. This approach, called functional warming, is the main performance bottleneck of simulation sampling and requires hours of runtime while the detailed simulation of the sample requires only minutes. Existing simulators can avoid functional simulation by jumping directly to particular instruction stream locations with architectural state checkpoints. To replace functional warming, these checkpoints must additionally provide microarchitectural model state that is accurate and reusable across experiments while meeting tight storage constraints. In this paper, we present a simulation-sampling framework that replaces functional warming with live-points without sacrificing accuracy. A live-point stores the bare minimum of functionally-warmed state for accurate simulation of a limited execution window while placing minimal restrictions on microarchitectural configuration. Live-points can be processed in random rather than program order, allowing simulation results and their statistical confidence to be reported while simulations are in progress. Our framework matches the accuracy of prior simulation-sampling techniques (i.e., ±3% error with 99.7% confidence), while estimating the performance of an 8-way out-of-order superscalar processor running SPEC CPU2000 in 91 seconds per benchmark, on average, using a 12 GB live-point library.
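
    Processing checkpoints in random order makes it possible to report a running estimate and stop as soon as the confidence target is met. The sketch below illustrates only that idea, using Welford's online mean and variance update; it is not the authors' framework. The function name, the `min_samples` floor, and the use of plain numbers in place of simulated live-points are all assumptions.

```python
import math
import random


def estimate_online(measurements, rel_error=0.03, z=3.0, min_samples=30, seed=0):
    """Consume measurements in random order; stop once the interval is tight enough.

    Returns (running mean, number of measurements consumed).
    """
    order = list(measurements)
    random.Random(seed).shuffle(order)  # random rather than program order

    n, mean, m2 = 0, 0.0, 0.0  # Welford accumulators
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples:
            std = math.sqrt(m2 / (n - 1))
            half_width = z * std / math.sqrt(n)
            # The estimate and its confidence could be reported here at every step.
            if half_width <= rel_error * abs(mean):
                break
    return mean, n


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic per-window CPI values, for illustration only.
    cpi = [rng.gauss(1.5, 0.2) for _ in range(5_000)]
    mean, used = estimate_online(cpi)
    print(f"CPI ~ {mean:.3f} after {used} of {len(cpi)} windows")
```

    Random ordering matters because a prefix of a shuffled sequence is itself an unbiased sample, whereas a prefix in program order would only cover the start of the benchmark.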

    Herramienta para la simulación y visualización de procesadores superscalares (Tool for the simulation and visualization of superscalar processors)

    Our project consisted of developing branch predictors for use in the SimpleScalar simulator. SimpleScalar is a powerful tool that allows a superscalar processor to be simulated from several viewpoints. One of these is branch prediction, which is fundamental to the correct operation and performance of a microprocessor. The simulator is divided into several modules, each of which is open source, so it can be modified to let researchers conveniently test whatever they are interested in. The code is written in C, and it is structured and modularized so that changes can be made with relative ease.

    Empirical and Statistical Application Modeling Using On-Chip Performance Monitors

    To analyze the performance of applications and architectures, both programmers and architects desire formal methods to explain anomalous behavior. To this end, we present various methods that utilize non-intrusive, performance-monitoring hardware only recently available on microprocessors to provide further explanations of observed behavior. All the methods attempt to characterize and explain the instruction-level parallelism achieved by codes on different architectures. We also present a prototype tool automating the analysis process to exploit the advantages of the empirical and statistical methods proposed. The empirical, statistical and hybrid methods are discussed and explained with case study results provided. The given methods further the wealth of tools available to programmers and architects for generally understanding the performance of scientific applications. Specifically, the models and tools presented provide new methods for evaluating and categorizing application performance. The empirical memory model serves to quantify the hierarchical memory performance of applications by inferring the incurred latencies of codes after the effects of latency-hiding techniques are realized. The instruction-level model and its extensions model on-chip performance analytically, giving insight into inherent performance bottlenecks in superscalar architectures. The statistical model and its hybrid extension provide other methods of categorizing codes via their statistical variations. The PTERA performance tool automates the use of performance counters for use by these methods across platforms, making the modeling process easier still. These unique methods provide alternatives for performance modeling and categorization not previously available, in an attempt to utilize the inherent modeling capabilities of performance monitors on commodity processors for scientific applications.