30 research outputs found

    Preliminary basic performance analysis of the Cedar multiprocessor memory system

    Get PDF
    Some preliminary basic results on the performance of the Cedar multiprocessor memory system are presented. Empirical results are presented and used to calibrate a memory system simulator which is then used to discuss the scalability of the system

    Direct instruction wakeup for out-of-order processors

    Get PDF
    Instruction queues consume a significant amount of power in high-performance processors, primarily due to instruction wakeup logic access to the queue structures. The wakeup logic delay is also a critical timing parameter. This paper proposes a new queue organization using a small number of successor pointers plus a small number of dynamically allocated full successor bit vectors for cases with a larger number of successors. The details of the new organization are described and it is shown to achieve the performance of CAM-based or full dependency matrix organizations using just one pointer per instruction plus eight full bit vectors. Only two full bit vectors are needed when two successor pointers are stored per instruction. Finally, a design and pre-layout of all critical structures in 70 nm technology was performed for the proposed organization as well as for a CAM-based baseline. The new design is shown to use 1/2 to 1/5th of the baseline instruction queue power, depending on queue size. It is also shown to use significantly less power than the full dependency matrix based design.Peer ReviewedPostprint (published version

    Decoupled Access DRAM Architecture

    No full text
    This paper discusses an approach to reducing memory latency in future systems. It focuses on systems where a single chip DRAM/processor will not be feasible even in 10 years, e.g. systems requiring a large memory and/or many CPU's. In such systems a solution needs to be found to DRAM latency and bandwidth as well as to inter-chip communication. Utilizing the projected advances in chip I/O bandwidth we propose to implement a decoupled access-execute processor where the access processor is placed in memory. Aprogram is compiledtorunasacomputational process and several access processes with the latter executing in the DRAM processors. Instruction set extensions are discussedto support this paradigm. Using multi-level branch prediction the access processor stays ahead of the execute processor and keeps the latter supplied with data. The system reduces latency by moving address computation to memory and thus avoiding sending address to memory by the computational processor. This and the fetchahead capabilities of the access processor arecombined with multiple DRAM "streaming" to improve performance. DRAM caching is assumedtobeused to assist in this as well

    Line size adaptivity analysis of parameterized loop nests for direct mapped data cache

    No full text

    A Simple Low-Energy Instruction Wakeup Mechanism

    No full text
    Instruction issue consumes a large amount of energy in out of order processors, largely in the wakeup logic. Proposed solutions to the problem require prediction or additional hardware complexity to reduce energy consumption and, in some cases, may have a negative impact on processor performance. This paper proposes a mechanism for instruction wakeup, which uses a multi-block instruction queue design. The blocks are turned off until the mechanism determines which blocks to access on wakeup using a simple successor tracking mechanism. The proposed approach is shown to require as little as 1.5 comparisons per committed instruction for SPEC2000 benchmarks

    A radiative transfer module for calculating photolysis rates and solar heating in climate models: Solar-J v7.5

    No full text
    Solar-J is a comprehensive radiative transfer model for the solar spectrum that addresses the needs of both solar heating and photochemistry in Earth system models. Solar-J is a spectral extension of Cloud-J, a standard in many chemical models that calculates photolysis rates in the 0.18-0.8 μm region. The Cloud-J core consists of an eight-stream scattering, plane-parallel radiative transfer solver with corrections for sphericity. Cloud-J uses cloud quadrature to accurately average over correlated cloud layers. It uses the scattering phase function of aerosols and clouds expanded to eighth order and thus avoids isotropic-equivalent approximations prevalent in most solar heating codes. The spectral extension from 0.8 to 12 μm enables calculation of both scattered and absorbed sunlight and thus aerosol direct radiative effects and heating rates throughout the Earth's atmosphere. The Solar-J extension adopts the correlated-k gas absorption bins, primarily water vapor, from the shortwave Rapid Radiative Transfer Model for general circulation model (GCM) applications (RRTMG-SW). Solar-J successfully matches RRTMG-SW's tropospheric heating profile in a clear-sky, aerosol-free, tropical atmosphere. We compare both codes in cloudy atmospheres with a liquid-water stratus cloud and an ice-crystal cirrus cloud. For the stratus cloud, both models use the same physical properties, and we find a systematic low bias of about 3 % in planetary albedo across all solar zenith angles caused by RRTMG-SW's two-stream scattering. Discrepancies with the cirrus cloud using any of RRTMG-SW's three different parameterizations are as large as about 20-40 % depending on the solar zenith angles and occur throughout the atmosphere. Effectively, Solar-J has combined the best components of RRTMG-SW and Cloud-J to build a high-fidelity module for the scattering and absorption of sunlight in the Earth's atmosphere, for which the three major components - wavelength integration, scattering, and averaging over cloud fields - all have comparably small errors. More accurate solutions with Solar-J come with increased computational costs, about 5 times that of RRTMG-SW for a single atmosphere. There are options for reduced costs or computational acceleration that would bring costs down while maintaining improved fidelity and balanced errors
    corecore