
    Architectural Approaches For Gallium Arsenide Exploitation In High-Speed Computer Design

    Continued advances in the capability of Gallium Arsenide (GaAs) technology have finally drawn serious interest from computer system designers. The recent demonstration of very large scale integration (VLSI) laboratory designs incorporating very fast GaAs logic gates heralds a significant role for GaAs technology in high-speed computer design. In this thesis we investigate design approaches to best exploit this promising technology in high-performance computer systems. We find significant differences between GaAs and Silicon technologies which are of relevance for computer design. The advantage that GaAs enjoys over Silicon in faster transistor switching speed is countered by a lower transistor count capability for GaAs integrated circuits. In addition, inter-chip signal propagation speeds in GaAs systems do not experience the same speedup exhibited by GaAs transistors; thus, GaAs designs are penalized more severely by inter-chip communication. The relatively low density of GaAs chips and the high cost of communication between them are significant obstacles to the full exploitation of the fast transistors of GaAs technology. A fast GaAs processor may be excessively underutilized unless special consideration is given to its information (instructions and data) requirements. Desirable GaAs system design approaches encourage low hardware resource requirements, and either minimize the processor’s need for off-chip information, maximize the rate of off-chip information transfer, or overlap off-chip information transfer with useful computation. We show the impact that these considerations have on the design of the instruction format, arithmetic unit, memory system, and compiler for a GaAs computer system. Through a simulation study utilizing a set of widely-used benchmark programs, we investigate several candidate instruction pipelines and candidate instruction formats in a GaAs environment. We demonstrate the clear performance advantage of an instruction pipeline based upon a pipelined memory system over a typical Silicon-like pipeline. We also show the performance advantage of packed instruction formats over typical Silicon instruction formats, and present a packed format which performs better than the experimental packed Stanford MIPS format.
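    To make the packing argument above concrete, the following sketch estimates how many off-chip fetch cycles a packed instruction format saves. The packing densities and off-chip latency are assumed values for illustration only, not the thesis's actual format or measurements.

```python
# Illustrative only (assumed sizes and latencies, not the thesis's format):
# estimate off-chip instruction-fetch cycles for packed vs. unpacked encodings.

def fetch_cycles(num_instructions, ops_per_word, off_chip_latency=4):
    """Cycles spent fetching instructions over the off-chip interface.

    ops_per_word     -- instructions packed into one memory word
    off_chip_latency -- assumed cycles per off-chip access
    """
    words_needed = -(-num_instructions // ops_per_word)  # ceiling division
    return words_needed * off_chip_latency

n = 1000
unpacked = fetch_cycles(n, ops_per_word=1)  # one fixed-width instruction per word
packed = fetch_cycles(n, ops_per_word=2)    # hypothetical two-per-word packing

print(f"unpacked: {unpacked} cycles, packed: {packed} cycles, "
      f"fetch traffic cut by {100 * (unpacked - packed) / unpacked:.0f}%")
```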

    Exploiting Fine-Grain Concurrency Analytical Insights in Superscalar Processor Design

    This dissertation develops analytical models to provide insight into various design issues associated with superscalar-type processors, i.e., processors capable of executing multiple instructions per cycle. A survey of the existing machines and literature has been completed with a proposed classification of various approaches for exploiting fine-grain concurrency. Optimization of a single pipeline is discussed based on an analytical model. The model-predicted performance curves are found to be in close proximity to published results using simulation techniques. A model is also developed for comparing different branch strategies for single-pipeline processors in terms of their effectiveness in reducing branch delay. The additional instruction fetch traffic generated by certain branch strategies is also studied and is shown to be a useful criterion for choosing between equally well-performing strategies. Next, processors with multiple pipelines are modelled to study the tradeoffs associated with deeper pipelines versus multiple pipelines. The model developed can reveal the cause of a performance bottleneck: insufficient resources to exploit discovered parallelism, insufficient instruction stream parallelism, or insufficient scope of concurrency detection. The cost associated with speculative (i.e., beyond basic block) execution is examined via probability distributions that characterize the inherent parallelism in the instruction stream. The throughput prediction of the analytic model is shown, using a variety of benchmarks, to be close to the measured static throughput of the compiler output, under resource and scope constraints. Further experiments provide misprediction delay estimates for these benchmarks under scope constraints, assuming beyond-basic-block, out-of-order execution and run-time scheduling. These results were derived using traces generated by the Multiflow TRACE SCHEDULING™(*) compacting C and FORTRAN 77 compilers. A simplified extension to the model to include multiprocessors is also proposed. The extended model is used to analyze combined systems, such as superpipelined multiprocessors and superscalar multiprocessors, both with shared memory. It is shown that the number of pipelines (or processors) at which the maximum throughput is obtained is increasingly sensitive to the ratio of memory access time to network access delay, as memory access time increases. Further, as a function of inter-iteration dependency distance, optimum throughput is shown to vary nonlinearly, whereas the corresponding optimum number of processors varies linearly. The predictions from the analytical model agree with published results based on simulations. (*) TRACE SCHEDULING is a trademark of Multiflow Computer, Inc.
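    The single-pipeline branch-strategy comparison can be illustrated with a textbook-style throughput formula. The sketch below uses assumed branch frequencies and penalty values and is not the dissertation's actual model; it only shows how such a comparison is set up.

```python
# A textbook-style analytic sketch (assumed parameter values, not the
# dissertation's exact model): compare branch strategies by the average
# branch penalty they add to a single scalar pipeline.

def ipc(branch_freq, taken_frac, penalty_taken, penalty_not_taken):
    """Instructions per cycle with a given branch-handling strategy."""
    avg_penalty = branch_freq * (taken_frac * penalty_taken +
                                 (1 - taken_frac) * penalty_not_taken)
    return 1.0 / (1.0 + avg_penalty)

# (taken penalty, not-taken penalty) in cycles -- assumed illustrative values
strategies = {
    "stall on every branch": (3, 3),  # freeze the pipeline until resolution
    "predict not-taken":     (3, 0),  # pay only when the branch is taken
    "predict taken":         (1, 3),  # assumed 1-cycle target-fetch bubble
}

for name, (p_taken, p_not_taken) in strategies.items():
    rate = ipc(branch_freq=0.2, taken_frac=0.7,
               penalty_taken=p_taken, penalty_not_taken=p_not_taken)
    print(f"{name:24s} IPC = {rate:.2f}")
```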

    Context flow architecture


    Instruction prefetch strategies in a pipelined processor

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1983. Microfiche copy available in Archives and Engineering. Includes bibliographical references. By Hubert Rae McLellan, Jr., M.S.

    A VHDL model of a superscalar implementation of the DLX instruction set architecture

    The complexity of today's microprocessors demands that designers have an extensive knowledge of superscalar design techniques; this knowledge is difficult to acquire outside of a professional design team. Presently, there are a limited number of adequate resources available for the student, both in textual and model form. The limited number of options available emphasizes the need for more models and simulators, allowing students the opportunity to learn more about superscalar designs prior to entering the work force. This thesis details the design and implementation of a superscalar version of the DLX instruction set architecture in behavioral VHDL. The branch prediction strategy, instruction issue model, and hazard avoidance techniques are all issues critical to superscalar processor design and are studied in this thesis. Preliminary test results demonstrate that the performance advantage of the superscalar processor is applicable even to short test sequences. Initial findings have shown a performance improvement of 26% to 57% for instruction sequences under 150 instructions.
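    A rough way to see why even short sequences benefit from dual issue is to count how often adjacent instructions can be paired. The sketch below uses an assumed in-order two-wide issue rule and a hypothetical instruction sequence; it is not the thesis's VHDL model, and its numbers are illustrative only.

```python
# A rough sketch (assumed issue rules, not the thesis's VHDL model) of why
# even short sequences speed up under in-order dual issue: adjacent
# instructions pair unless the second reads the first one's destination.

def scalar_cycles(instrs):
    return len(instrs)  # one instruction per cycle

def dual_issue_cycles(instrs):
    cycles, i = 0, 0
    while i < len(instrs):
        dst = instrs[i][0]
        # Pair with the next instruction only if there is no RAW dependency.
        if i + 1 < len(instrs) and dst not in instrs[i + 1][1]:
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles

# Hypothetical short sequence: (destination register, {source registers})
seq = [("r1", {"r2"}), ("r3", {"r1"}), ("r5", {"r3"}), ("r6", {"r7"}),
       ("r8", {"r6"}), ("r9", {"r2"})]

s, d = scalar_cycles(seq), dual_issue_cycles(seq)
print(f"scalar: {s} cycles, dual issue: {d} cycles, "
      f"speedup {100 * (s - d) / d:.0f}%")
```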

    Efficient design space exploration of embedded microprocessors
