49 research outputs found

    Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs

    As the number of transistors integrated on a chip continues to increase, a growing challenge is accurately modeling performance in the early stages of processor design. Analytical models have been employed to rapidly search for higher-performance designs, and can provide insights that detailed simulators may not. This paper proposes techniques to predict the impact of pending cache hits, hardware prefetching, and realistic miss status holding register (MSHR) resources on superscalar performance in the presence of long-latency memory systems when employing hybrid analytical models that apply instruction trace analysis. Pending cache hits are secondary references to a cache block for which a request has already been initiated but has not yet completed. We find that pending hits resulting from spatial locality and the fine-grained selection of instruction profile window blocks used for analysis both have non-negligible influences on the accuracy of hybrid analytical models, and we subsequently propose techniques to account for their effects. We then introduce techniques to estimate the performance impact of data prefetching by modeling the timeliness of prefetches and to account for a limited number of MSHRs by restricting the size of profile window blocks. As with earlier hybrid analytical models, our approach is roughly two orders of magnitude faster than detailed simulations. When modeling pending hits for a processor with unlimited outstanding misses, we improve the accuracy of our baseline by a factor of 3.9, decreasing average error from 39.7% to 10.3%. When modeling a processor with data prefetching, a limited number of MSHRs, or both, the techniques result in an average error of 13.8%, 9.5%, and 17.8%, respectively.
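The definition of a pending hit given in this abstract can be illustrated with a minimal Python sketch. It is not the paper's model: the function name `classify_accesses`, the trace format, the fixed `miss_latency`, and the assumption of an unbounded cache with no evictions and unlimited outstanding misses are all hypothetical simplifications.

```python
def classify_accesses(trace, miss_latency):
    """Label each access as 'miss', 'pending_hit', or 'hit'.

    trace: list of (cycle, block_address) pairs sorted by cycle.
    Assumptions (not from the paper): the cache never evicts a block,
    every miss takes exactly miss_latency cycles, and the number of
    outstanding misses is unlimited.
    """
    ready_at = {}  # block address -> cycle at which its fill completes
    labels = []
    for cycle, block in trace:
        if block not in ready_at:
            # First reference: a request to memory is initiated.
            ready_at[block] = cycle + miss_latency
            labels.append("miss")
        elif cycle < ready_at[block]:
            # Request already initiated but not yet completed.
            labels.append("pending_hit")
        else:
            # The block has arrived in the cache.
            labels.append("hit")
    return labels


trace = [(0, "A"), (5, "A"), (200, "A"), (210, "B")]
print(classify_accesses(trace, miss_latency=100))
```

In this toy trace the second reference to block A arrives while the first request is still outstanding, so it is labeled a pending hit rather than an ordinary hit.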

    Complexity effective memory access scheduling for many-core accelerator architectures

    Modern DRAM systems rely on memory controllers that employ out-of-order scheduling to maximize row access locality and bank-level parallelism, which in turn maximizes DRAM bandwidth. This is especially important in graphics processing unit (GPU) architectures, where the large quantity of parallelism places a heavy demand on the memory system. The logic needed for out-of-order scheduling can be expensive in terms of area, especially when compared to an in-order scheduling approach. In this paper, we propose a complexity-effective solution to DRAM request scheduling which recovers most of the performance loss incurred by a naive in-order first-in first-out (FIFO) DRAM scheduler compared to an aggressive out-of-order DRAM scheduler. We observe that the memory request stream from individual GPU "shader cores" tends to have sufficient row access locality to maximize DRAM efficiency in most applications without significant reordering. However, the interconnection network across which memory requests are sent from the shader cores to the DRAM controller tends to finely interleave the numerous memory request streams in a way that destroys the row access locality of the resultant stream seen at the DRAM controller. To address this, we employ an interconnection network arbitration scheme that preserves the row access locality of individual memory request streams and, in doing so, achieves DRAM efficiency and system performance close to that achievable by using out-of-order memory request scheduling while doing so with a simpler design. We evaluate our interconnection network arbitration scheme using crossbar, mesh, and ring networks for a baseline architecture of 8 memory channels, each controlled by its own DRAM controller, and 28 shader cores (224 ALUs), supporting up to 1,792 in-flight memory requests. Our results show that our interconnect arbitration scheme coupled with a banked FIFO in-order scheduler obtains up to 91% of the performance obtainable with an out-of-order memory scheduler for a crossbar network with eight-entry DRAM controller queues.
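The observation that fine interleaving destroys row access locality can be shown with a toy Python sketch. It is only an illustration under stated assumptions, not the arbitration scheme from the paper: the function `row_hits`, the single-bank model, and the example streams `core_a` and `core_b` are hypothetical, and each request is reduced to the DRAM row it targets.

```python
def row_hits(requests):
    """Count row-buffer hits when requests are served strictly in order.

    requests: sequence of row identifiers, one per memory request.
    Assumption: a single bank that keeps the most recently accessed
    row open, so a request hits only if it targets that row.
    """
    open_row = None
    hits = 0
    for row in requests:
        if row == open_row:
            hits += 1
        open_row = row  # a miss closes the old row and opens this one
    return hits


# Two hypothetical shader cores, each with perfect row locality.
core_a = [0, 0, 0, 0]
core_b = [7, 7, 7, 7]

# Fine-grained interleaving in the network: 0, 7, 0, 7, ...
interleaved = [row for pair in zip(core_a, core_b) for row in pair]

# Arbitration that keeps each core's requests together.
grouped = core_a + core_b

print(row_hits(interleaved), row_hits(grouped))
```

With an in-order scheduler the interleaved stream never hits the open row, while the grouped stream hits on every request except the first to each row, which is the effect a locality-preserving arbiter aims to keep.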

    Single-step hydrogen production from NH3, CH4, and biogas in stacked proton ceramic reactors

    Proton ceramic reactors offer efficient extraction of hydrogen from ammonia, methane, and biogas by coupling endothermic reforming reactions with heat from electrochemical gas separation and compression. Preserving this efficiency in scale-up from cell to stack level poses challenges to the distribution of heat and gas flows and electric current throughout a robust functional design. Here, we demonstrate a 36-cell well-balanced reactor stack enabled by a new interconnect that achieves complete conversion of methane with more than 99% recovery to pressurized hydrogen, leaving a concentrated stream of carbon dioxide. Comparable cell performance was also achieved with ammonia, and the operation was confirmed at pressures exceeding 140 bars. The stacking of proton ceramic reactors into practical thermo-electrochemical devices demonstrates their potential in efficient hydrogen production. This work was supported by Norway's Ministry of Petroleum and Energy through the Gassnova project CLIMIT grant 618191 in partnership with Engie SA, Equinor, ExxonMobil, Saudi Aramco, Shell, and TotalEnergies and the Research Council of Norway NANO2021 project DynaPro grant 296548.

    Case based reasoning as a model for cognitive artificial intelligence.

    Cognitive Systems understand the world through learning and experience. Case Based Reasoning (CBR) systems naturally capture knowledge as experiences in memory and they are able to learn new experiences to retain in their memory. CBR's retrieve and reuse reasoning is also knowledge-rich because of its nearest neighbour retrieval and analogy-based adaptation of retrieved solutions. CBR is particularly suited to domains where there is no well-defined theory, because they have a memory of experiences of what happened, rather than why/how it happened. CBR's assumption that 'similar problems have similar solutions' enables it to understand the contexts for its experiences and the 'bigger picture' from clusters of cases, but also where its similarity assumption is challenged. Here we explore cognition and meta-cognition for CBR through self-reflection and introspection of both memory and retrieve and reuse reasoning. Our idea is to embed and exploit cognitive functionality such as insight, intuition and curiosity within CBR to drive robust, and even explainable, intelligence that will achieve problem-solving in challenging, complex, dynamic domains.
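The retrieve, reuse, and retain steps mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: the names `retrieve` and `solve`, the representation of a case as a (problem vector, solution) pair, the Euclidean distance measure, and the choice to reuse the retrieved solution without any adaptation are all assumptions made here.

```python
import math


def retrieve(case_base, query):
    """Return the stored case whose problem is nearest to the query.

    case_base: list of (problem, solution) pairs, where problem is a
    tuple of numbers. Euclidean distance is an assumed similarity measure.
    """
    return min(case_base, key=lambda case: math.dist(case[0], query))


def solve(case_base, query):
    """Reuse the nearest case's solution and retain the new experience."""
    _, solution = retrieve(case_base, query)
    # Reuse: copied unchanged here; a fuller system would adapt it.
    case_base.append((query, solution))  # Retain the new case in memory.
    return solution


case_base = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
answer = solve(case_base, (1.0, 2.0))
print(answer, len(case_base))
```

The sketch relies directly on the 'similar problems have similar solutions' assumption: the query is answered with the solution of its nearest stored case, and the case base grows with each solved problem.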