24 research outputs found

    The Effect of Code Expanding Optimizations on Instruction Cache Design

    Get PDF
    Coordinated Science Laboratory (formerly the Control Systems Laboratory); National Science Foundation / MIP-8809478; NCR; AMD 29K Advanced Processor Development Division; National Aeronautics and Space Administration / NASA NAG 1-613; N00014-91-J-128

    Synthesis of [B,Al]-EWT-Type Zeolite and Its Catalytic Properties

    No full text
    EWT zeolite is an ultra-large-pore zeolite with 10MR and 21MR channels; it has good thermal stability, a certain degree of acid strength, and promising applications in petroleum refining and petrochemical reactions. However, EWT zeolite has relatively few medium/strong acid sites, especially Brønsted acid sites, which makes it difficult to apply to acid-catalyzed reactions. In this work, the amount and distribution of acid sites were regulated by substituting boron and aluminum into the siliceous EWT framework. The physico-chemical properties of the samples were characterized by XRD, SEM, N2 adsorption-desorption, XRF, ICP, Py-IR, NH3-TPD and 11B, 27Al and 29Si MAS NMR. The results show that substantial amounts of boron and aluminum can be incorporated into the [B,Al]-EWT framework, increasing the density of medium and strong acid centers and giving higher total acidity and more Brønsted acid centers than the parent EWT zeolite. In the reaction of glycerol with cyclohexanone, the conversion over the substituted samples (U-90-08-10/U-90-H-HCl) is significantly higher than that over the EWT sample, approaching or exceeding that of Beta zeolite. The catalytic activity study revealed a direct correlation between the Brønsted acid site concentration and the activity of the catalyst. The U-90-08-10-H catalyst was also quite stable during the catalytic process. This work shows, for the first time, that extra-large-pore zeolites can be used in industrial acid-catalyzed conversion processes with excellent catalytic performance.

    Comparing Static And Dynamic Code Scheduling for Multiple-Instruction-Issue Processors

    No full text
    This paper examines two alternative approaches to supporting code scheduling for multiple-instruction-issue processors. One is to provide a set of non-trapping instructions so that the compiler can perform aggressive static code scheduling. Applying this approach to existing commercial architectures typically requires extending the instruction set. The other approach is to support out-of-order execution in the microarchitecture so that the hardware can perform aggressive dynamic code scheduling. This approach usually does not require modifying the instruction set but does require complex hardware support. In this paper, we analyze the performance of the two alternative approaches using a set of important non-numerical C benchmark programs. A distinguishing feature of the experiment is that the code for the dynamic approach has been optimized and scheduled as much as the architecture allows; the hardware is only responsible for the additional reordering that cannot be performed statically.
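
    As a hedged illustration of the contrast drawn above (mine, not the paper's), the C fragment below shows a guarded load that a static scheduler cannot legally hoist without non-trapping loads; the pseudo-assembly in the trailing comment assumes a hypothetical ISA with a silent ld.s instruction, while the dynamic approach would leave the code as written and rely on out-of-order issue.

/*
 * Hypothetical sketch contrasting the two scheduling approaches on a
 * guarded load.  With only trapping loads, the compiler cannot hoist the
 * dereference above the test; a non-trapping (silent) load would let it.
 */
int sum_if_valid(int *p, int bias)
{
    int t = 0;
    if (p != 0)      /* guard */
        t = *p;      /* this load may trap if hoisted above the guard */
    return t + bias;
}

/*
 * Static scheduling with an assumed non-trapping load (pseudo-assembly):
 *
 *     ld.s   r1, 0(p)      ; speculative load: never traps on a bad address
 *     cmp    p, 0
 *     cmov.z r1, 0         ; discard the speculative value if the guard fails
 *     add    r0, r1, bias
 *
 * Dynamic scheduling keeps the original in-order code and relies on the
 * out-of-order core to issue the load early once the branch is predicted.
 */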

    Scalar Program Performance on Multiple-Instruction-Issue Processors with a Limited Number of Registers

    No full text
    In this paper, the performance of multiple-instruction-issue processors with varying register file sizes is examined for a set of scalar programs. We make several important observations. First, multiple-instruction-issue processors can perform effectively without a large number of registers; in fact, the register files of many existing architectures (16-32 registers) are capable of sustaining a high instruction execution rate. Second, even with small register files (8-12 registers), substantial performance gains can be obtained by increasing the issue rate of the processor. In general, the percentage increase in performance achieved by raising the issue rate is relatively constant across register file sizes. Finally, code transformations designed for multiple-instruction-issue processors are found to be effective for all register file sizes; for small register files, however, the performance improvement is limited by the excessive spill code the transformations introduce.
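
    To make the last point concrete, here is a hypothetical illustration (not from the paper): unrolling a reduction for a multiple-issue machine creates independent temporaries that raise instruction-level parallelism, but each live temporary occupies a register, so with only 8-12 registers some of them would be spilled to memory, eating back part of the gain.

/* Hypothetical example: four independent accumulators expose ILP but
 * also raise register pressure (tail iterations omitted for brevity). */
double dot4(const double *a, const double *b, int n)
{
    double t0 = 0.0, t1 = 0.0, t2 = 0.0, t3 = 0.0;
    for (int i = 0; i + 3 < n; i += 4) {
        t0 += a[i]     * b[i];
        t1 += a[i + 1] * b[i + 1];
        t2 += a[i + 2] * b[i + 2];
        t3 += a[i + 3] * b[i + 3];
    }
    return t0 + t1 + t2 + t3;
}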

    Tolerating Data Access Latency with Register Preloading

    No full text
    By exploiting fine grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first level memory which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. However, conventional movement of load instructions is limited by data dependence analysis. This paper introduces a simple hardware scheme, referred to as preload register update, to allow the compiler to move load instructions even in the presence of inconclusive data dependence analysis results. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies. Keywords: data dependence analysis, load latency, register file, re..
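
    The toy C model below is a sketch of the mechanism as described above, not the paper's hardware design; the single-entry buffer and the preload()/store() helpers are illustrative names. It shows a load hoisted above a store the compiler could not disambiguate, with the buffer patching the destination register when the store turns out to alias.

#define NREGS 16

struct preload_entry { int valid; const int *addr; int reg; };

static int regs[NREGS];
static struct preload_entry plb;             /* single-entry preload buffer */

static void preload(int r, const int *addr)  /* the hoisted load */
{
    regs[r] = *addr;
    plb.valid = 1; plb.addr = addr; plb.reg = r;
}

static void store(int *addr, int value)      /* later, possibly aliasing store */
{
    *addr = value;
    if (plb.valid && plb.addr == addr)
        regs[plb.reg] = value;       /* keep the preloaded register coherent */
}

int example(int *p, int *q)
{
    preload(1, p);   /* load moved above the store despite a possible alias */
    store(q, 42);    /* if q == p, register 1 is silently corrected to 42   */
    return regs[1];  /* same value the original, unmoved load would return  */
}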

    Data Access Microarchitectures for Superscalar Processors with Compiler-Assisted Data Prefetching

    No full text
    The performance of superscalar processors is more sensitive to memory system delay than that of their single-issue predecessors. This paper examines alternative data access microarchitectures that effectively support compiler-assisted data prefetching in superscalar processors. In particular, a prefetch buffer is shown to be more effective than increasing the cache dimensions in solving the cache pollution problem. All in all, we show that a small data cache with compiler-assisted data prefetching can achieve a performance level close to that of an ideal cache.
    1 Introduction
    Superscalar processors can potentially deliver more than five times speedup over conventional single-issue processors [1]. With the total execution cycle count dramatically reduced, each cycle becomes more significant to overall performance. Because each data cache miss can introduce many extra execution cycles, a superscalar processor can easily lose the majority of its performance to the memory hierarchy. Out-of-..
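
    As a source-level sketch of compiler-assisted prefetching (assuming GCC/Clang's __builtin_prefetch as a stand-in for the compiler-inserted prefetch instructions; the 16-element distance is an arbitrary tuning choice), the loop below requests data ahead of its use. In the microarchitectures studied above, such prefetched lines would land in a small prefetch buffer beside the data cache rather than displacing demand-fetched lines.

long sum(const long *a, long n)
{
    long s = 0;
    for (long i = 0; i < n; i++) {
        /* hint the memory system to fetch a line a fixed distance ahead */
        __builtin_prefetch(&a[i + 16], 0 /* read */, 1 /* low locality */);
        s += a[i];
    }
    return s;
}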

    Profile-guided automatic inline expansion for C programs

    No full text
    This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modification. An automatic inter-file inliner that uses profile information has been implemented and integrated into an optimizing C compiler. The experimental results show that this inliner achieves significant speedups for production C programs.
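
    The sketch below is a hypothetical version of the core profile-guided decision (not the paper's actual heuristic); should_inline(), the 1% hotness threshold, and the code-growth budget are assumed parameters chosen for illustration.

#include <stddef.h>

struct call_site {
    long   count;        /* profiled execution count of this call site          */
    size_t callee_size;  /* estimated size of the callee body                   */
    int    recursive;    /* hazard: (directly) recursive calls are not expanded */
};

/* Inline a call site only if it is hot enough and fits the growth budget. */
int should_inline(const struct call_site *cs, long total_calls, size_t *budget)
{
    const double hot_fraction = 0.01;    /* assumed tuning knob */

    if (cs->recursive)
        return 0;                        /* hazard prevention */
    if ((double)cs->count < hot_fraction * (double)total_calls)
        return 0;                        /* too cold to be worth code growth */
    if (cs->callee_size > *budget)
        return 0;                        /* respect the growth budget */

    *budget -= cs->callee_size;          /* charge the expansion */
    return 1;
}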

    The Effect of Compiler Optimizations on Available Parallelism in Scalar Programs

    No full text
    In this paper we analyze the effect of compiler optimizations on fine-grain parallelism in scalar programs. We characterize three levels of optimization: classical, superscalar, and multiprocessor. We show that classical optimizations improve not only a program's efficiency but also its parallelism. Superscalar optimizations further improve the parallelism for moderately parallel machines. For highly parallel machines, however, they actually constrain available parallelism. The multiprocessor optimizations we consider are memory renaming and data migration.
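
    As an illustrative before/after pair (not taken from the paper), the fragment below shows one way an optimization exposes fine-grain parallelism: reusing a single temporary serializes two independent expressions through false dependences, while renaming gives each expression its own value so a multiple-issue machine can evaluate them concurrently.

void before(int *out, const int *a, const int *b)
{
    int t;
    t = a[0] + a[1];
    out[0] = t * 2;
    t = b[0] + b[1];         /* reusing t serializes the two computations */
    out[1] = t * 2;
}

void after(int *out, const int *a, const int *b)
{
    int t0 = a[0] + a[1];    /* renamed temporaries: no false dependences */
    int t1 = b[0] + b[1];    /* both sums can issue in the same cycle     */
    out[0] = t0 * 2;
    out[1] = t1 * 2;
}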