8,252 research outputs found

    Micro-threading and FPGA implementation of a RISC microprocessor : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand

    Get PDF
    Appendix E removed due to copyright restrictions. Articles are available in the print copy held in the libraryThis thesis is the outcome of research in two areas of computer technology: microprocessor and multi-processor architectures (specifically from the perspective of how differently they tolerate highly-latent and non-deterministic events), and hardware design of complex digital systems containing both datapath and control (particularly microprocessors). This thesis starts by pointing out that in order to achieve high processing speeds, current popular superscalar microprocessors (e.g. Intel Pentiums, Digital Alpha, etc) rely heavily on the technique of speculating the outcome of instruction flow in order to predict the behaviour of non-deterministic computing operations (as in loading operands from high-latency memory into the processor). This is fine only if the speculation is correct. But, what if it isn't? If the speculation fails, this would mean that the processor has to abandon its current decision (which now proved to be the wrong one) for the instruction flow path taken and to start all over again with the other path (the actual correct one). This is a waste of valuable processing time and hardware resources and a reduction of performance when speculation fails. Therefore, these processors can achieve high performance only when the majority of speculations are successful (being able to predict the right path). In an attempt to overcome the above shortcomings, the first part of this thesis is an investigation of the novel vector micro-threading architecture as an alternative approach to the current superscalar-based speculative microprocessor designs. Micro-threading is based on the not-so-novel multithreading technique, which avoids speculation altogether and instead, starts running a different thread of instructions while waiting for the non-determinism to be resolved. This utilizes the chip resources more efficiently without waste of any processing power. The rest of this thesis focuses on the baseline RISC processor platform, the MIPS R2000, which is reviewed first then partially synthesized from the RTL (Register Transfer Level) description using VHDL and then simulated and tested. This is conducted in order for future research to build upon and add the micro-threading architectural add-ons and modifications. Keywords: Micro-threading, Latency Tolerance, FPGA Synthesis, RISC Architecture, MIPS R2000 processor, VHDL

    Transient fault behavior in a microprocessor: A case study

    Get PDF
    An experimental analysis is described which studies the susceptibility of a microprocessor based jet engine controller to upsets caused by current and voltage transients. A design automation environment which allows the run time injection of transients and the tracing from their impact device to the pin level is described. The resulting error data are categorized by the charge levels of the injected transients by location and by their potential to cause logic upsets, latched errors, and pin errors. The results show a 3 picoCouloumb threshold, below which the transients have little impact. An Arithmetic and Logic Unit transient is most likely to result in logic upsets and pin errors (i.e., impact the external environment). The transients in the countdown unit are potentially serious since they can result in latched errors, thus causing latent faults. Suggestions to protect the processor against these errors, by incorporating internal error detection and transient suppression techniques, are also made

    Interval simulation: raising the level of abstraction in architectural simulation

    Get PDF
    Detailed architectural simulators suffer from a long development cycle and extremely long evaluation times. This longstanding problem is further exacerbated in the multi-core processor era. Existing solutions address the simulation problem by either sampling the simulated instruction stream or by mapping the simulation models on FPGAs; these approaches achieve substantial simulation speedups while simulating performance in a cycle-accurate manner This paper proposes interval simulation which rakes a completely different approach: interval simulation raises the level of abstraction and replaces the core-level cycle-accurate simulation model by a mechanistic analytical model. The analytical model estimates core-level performance by analyzing intervals, or the timing between two miss events (branch mispredictions and TLB/cache misses); the miss events are determined through simulation of the memory hierarchy, cache coherence protocol, interconnection network and branch predictor By raising the level of abstraction, interval simulation reduces both development time and evaluation time. Our experimental results using the SPEC CPU2000 and PARSEC benchmark suites and the MS multi-core simulator show good accuracy up to eight cores (average error of 4.6% and max error of 11% for the multi-threaded full-system workloads), while achieving a one order of magnitude simulation speedup compared to cycle-accurate simulation. Moreover interval simulation is easy to implement: our implementation of the mechanistic analytical model incurs only one thousand lines of code. Its high accuracy, fast simulation speed and ease-of-use make interval simulation a useful complement to the architect's toolbox for exploring system-level and high-level micro-architecture trade-offs

    Software-Based Self-Test of Set-Associative Cache Memories

    Get PDF
    Embedded microprocessor cache memories suffer from limited observability and controllability creating problems during in-system tests. This paper presents a procedure to transform traditional march tests into software-based self-test programs for set-associative cache memories with LRU replacement. Among all the different cache blocks in a microprocessor, testing instruction caches represents a major challenge due to limitations in two areas: 1) test patterns which must be composed of valid instruction opcodes and 2) test result observability: the results can only be observed through the results of executed instructions. For these reasons, the proposed methodology will concentrate on the implementation of test programs for instruction caches. The main contribution of this work lies in the possibility of applying state-of-the-art memory test algorithms to embedded cache memories without introducing any hardware or performance overheads and guaranteeing the detection of typical faults arising in nanometer CMOS technologie

    Frontend frequency-voltage adaptation for optimal energy-delay/sup 2/

    Get PDF
    In this paper, we present a clustered, multiple-clock domain (CMCD) microarchitecture that combines the benefits of both clustering and globally asynchronous locally synchronous (GALS) designs. We also present a mechanism for dynamically adapting the frequency and voltage of the frontend of the CMCD with the goal to optimize the energy-delay/sup 2/ product (ED2P). Our mechanism has minimal hardware cost, is entirely self-adjustable, does not depend on any thresholds, and achieves results close to optimal. We evaluate it on 16 SPEC 2000 applications and report 17.5% ED2P reduction on average (80% of the upper bound).Peer ReviewedPostprint (published version
    • 

    corecore