44 research outputs found

    Transfer of innovation - using research tools for engineering education

    Get PDF
    In this article, the authors show how, in the pursuit of research results, they can obtain excellent tools and data for engineering education. In particular, they describe one example of computer architecture in the Computer Engineering Degree Programme at the University of Cádiz in Spain. This research topic is of particular importance as it influences the execution of a range of the computer’s I/O operations, due to operations of peripherals and information devices, and determines processor performance. The simulator used in research and teaching in several engineering degree programmes at the University of Cádiz is also demonstrated in this article

    Application of compiler-assisted multiple instruction rollback recovery to speculative execution

    Get PDF
    Speculative execution is a method to increase instruction level parallelism which can be exploited by both super-scalar and VLIW architectures. The key to a successful general speculation strategy is a repair mechanism to handle mispredicted branches and accurate reporting of exceptions for speculated instructions. Multiple instruction rollback is a technique developed for recovery from transient processor failure. Many of the difficulties encountered during recovery from branch misprediction or from instruction re-execution due to exception in a speculative execution architecture are similar to those encountered during multiple instruction rollback. The applicability of a recently developed compiler-assisted multiple instruction rollback scheme to aid in speculative execution repair is investigated. Extensions to the compiler-assisted scheme to support branch and exception repair are presented along with performance measurements across ten application programs

    Fetch unit design for scalable simultaneous multithreading (ScSMT)

    Get PDF
    Continuous IC process enhancements make possible to integrate on a single chip the re-sources required for simultaneously executing multiple control flows or threads, exploiting different levels of thread-level parallelism: application-, function-, and loop-level. Scalable simultaneous multi-threading combines static and dynamic mechanisms to assemble a complexity-effective design that provides high instruction per cycle rates without sacrificing cycle time nor single-thread performance. This paper addresses the design of the fetch unit for a high-performance, scalable, simultaneous multithreaded processor. We present the detailed microarchitecture of a clustered and reconfigurable fetch unit based on an existing single-thread fetch unit. In order to minimize the occurrence of fetch hazards, the fetch unit dynamically adapts to the available thread-level parallelism and to the fetch characteristics of the active threads, working as a single shared unit or as two separate clusters. It combines static and dynamic methods in a complexity-efficient way. The design is supported by a simulation- based analysis of different instruction cache and branch target buffer configurations on the context of a multithreaded execution workload. Average reductions on the miss rates between 30% and 60% and peak reductions greater than 200% are obtained.Facultad de Informátic

    Fetch unit design for scalable simultaneous multithreading (ScSMT)

    Get PDF
    Continuous IC process enhancements make possible to integrate on a single chip the re-sources required for simultaneously executing multiple control flows or threads, exploiting different levels of thread-level parallelism: application-, function-, and loop-level. Scalable simultaneous multi-threading combines static and dynamic mechanisms to assemble a complexity-effective design that provides high instruction per cycle rates without sacrificing cycle time nor single-thread performance. This paper addresses the design of the fetch unit for a high-performance, scalable, simultaneous multithreaded processor. We present the detailed microarchitecture of a clustered and reconfigurable fetch unit based on an existing single-thread fetch unit. In order to minimize the occurrence of fetch hazards, the fetch unit dynamically adapts to the available thread-level parallelism and to the fetch characteristics of the active threads, working as a single shared unit or as two separate clusters. It combines static and dynamic methods in a complexity-efficient way. The design is supported by a simulation- based analysis of different instruction cache and branch target buffer configurations on the context of a multithreaded execution workload. Average reductions on the miss rates between 30% and 60% and peak reductions greater than 200% are obtained.Facultad de Informátic

    Avaliação de uma arquitetura SPARC com cache de desvio e barramento tipo Harvard

    Get PDF
    Variations in the SPARC architecture are studied in this paper, with particular emphasis to the use of a branch target cache and a Harvard bus. A simulator that works in a cycle per cycle basis has been developed to conduct performance measurements of some configurations. The results obtained are reported in this paper.Variações na arquitetura SPARC são estudadas neste artigo, com particular ênfase no uso de uma cache de desvio e de um barramento Harvard. Um simulador que funciona em um modo ciclo foi desenvolvido para realizar medidas de desempenho em várias configurações. Os resultados obtidos são apresentados neste artigo

    MARS: aRISC-based architecture for Lisp

    Get PDF
    [[abstract]]A RISC-based chip set architecture for Lisp is presented in this paper. This architecture contains an instruction fetch unit (IFU) and three processing units—integer processing unit (IPU), floating-point processing unit (FPU), and list processing unit (LPU). The IFU feeds instructions to the processing units and supports fast procedure call/return and branch, the IPU and FPU execute operations of different data type, and the LPU handles the Lisp runtime environment, dynamic type checking, and fast list access. In this architecture, the critical path of complex register file access and ALU operation is distributed into the LPU and IPU, and the tracing of a list can be done quickly by the non-delayed car or cdr instructions of the LPU. Performance simulation shows that this architecture would be about 6.2 times faster than SPUR and about 2.2 times faster than MIPS-X.[[booktype]]紙本[[booktype]]電子

    Alternative implementations of two-level adaptive branch prediction

    Get PDF
    As the issue rate and depth of pipelining of high performance Superscalar processors increase, the importance of an excellent branch predictor becomes more vital to delivering the potential performance of a wide-issue, deep pipelined microarchitecture. We propose a new dynamic branch predictor (Two-Level Adaptive Branch Prediction) that achieves substantially higher accuracy than any other scheme reported in the literature. The mechanism uses two levels of branch history information to make predictions, the history of the last L branches encountered, and the branch behavior for the last s occurrences of the specific pattern of these k branches. We have identified three variations of the Two-Level Adaptive Branch Prediction, depending on how finely we resolve the history information gathered. We compute the hardware costs of implementing each of the three variations, and use these costs in evaluating their relative effectiveness. We measure the branch prediction accuracy of the three variations of Two-Level Adaptive Branch Prediction, along with several other popular proposed dynamic and static prediction schemes, on the SPEC benchmarks. We show that the average prediction accuracy for TwoLevel Adaptive Branch Prediction is 97 percent, while the other known schemes achieve at most 94.4 percent average prediction accuracy. We measure the effectiveness of different prediction algorithms and different amounts of history and pattern information. We measure the costs of each variation to obtain the same prediction accuracy.

    Fetch unit design for scalable simultaneous multithreading (ScSMT)

    Get PDF
    Continuous IC process enhancements make possible to integrate on a single chip the re-sources required for simultaneously executing multiple control flows or threads, exploiting different levels of thread-level parallelism: application-, function-, and loop-level. Scalable simultaneous multi-threading combines static and dynamic mechanisms to assemble a complexity-effective design that provides high instruction per cycle rates without sacrificing cycle time nor single-thread performance. This paper addresses the design of the fetch unit for a high-performance, scalable, simultaneous multithreaded processor. We present the detailed microarchitecture of a clustered and reconfigurable fetch unit based on an existing single-thread fetch unit. In order to minimize the occurrence of fetch hazards, the fetch unit dynamically adapts to the available thread-level parallelism and to the fetch characteristics of the active threads, working as a single shared unit or as two separate clusters. It combines static and dynamic methods in a complexity-efficient way. The design is supported by a simulation- based analysis of different instruction cache and branch target buffer configurations on the context of a multithreaded execution workload. Average reductions on the miss rates between 30% and 60% and peak reductions greater than 200% are obtained.Facultad de Informátic
    corecore