4,783 research outputs found

    Memory bank predictors

    Cache memories are commonly implemented as multiple memory banks to improve bandwidth and latency. Knowing early which data cache bank an instruction will access can improve performance in several ways. One scenario that is likely to become increasingly important is clustered microprocessors with a distributed cache. This work presents a study of different cache bank predictors. We show that effective bank predictors can be implemented at relatively low cost. For instance, a predictor of approximately 4 Kbytes is shown to achieve an average hit rate of 78% for SPECint2000 when used to predict accesses to an 8-bank cache memory in a contemporary superscalar processor. We also show how a predictor can be used to reduce the communication latency caused by memory accesses in a clustered microarchitecture with a distributed cache design. Peer Reviewed. Postprint (published version).
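
    The abstract does not describe the predictor designs it studies, so the following is only a minimal sketch of one plausible scheme: a table indexed by the instruction address (PC) that predicts the bank the same instruction accessed last time. The class name, table size, line size, and bank-mapping function are all assumptions made for illustration, not details from the paper.

```python
class LastBankPredictor:
    """Hypothetical PC-indexed table predicting the last bank an instruction accessed."""

    def __init__(self, num_banks=8, table_entries=1024, line_bytes=32):
        # All three parameters are illustrative assumptions.
        self.num_banks = num_banks
        self.table_entries = table_entries
        self.line_bytes = line_bytes
        self.table = [0] * table_entries  # last bank seen per table entry

    def bank_of(self, address):
        # Assumed mapping: consecutive cache lines are interleaved across banks.
        return (address // self.line_bytes) % self.num_banks

    def _index(self, pc):
        # Drop the low PC bits (assumed 4-byte instructions) and wrap into the table.
        return (pc >> 2) % self.table_entries

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, address):
        self.table[self._index(pc)] = self.bank_of(address)


def hit_rate(predictor, trace):
    """Fraction of (pc, address) accesses whose bank was predicted correctly."""
    hits = 0
    for pc, address in trace:
        if predictor.predict(pc) == predictor.bank_of(address):
            hits += 1
        predictor.update(pc, address)  # train after each resolved access
    return hits / len(trace) if trace else 0.0
```

    Feeding `hit_rate` a trace of `(pc, address)` pairs gives a rough sense of how often a given instruction keeps returning to the same bank; a real study would compare several predictor organizations under a fixed storage budget.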

    Microarchitectural techniques to reduce interconnect power in clustered processors

    Journal Article. The paper presents a preliminary evaluation of novel techniques that address a growing problem: power dissipation in on-chip interconnects. Recent studies have shown that around 50% of the dynamic power consumption in modern processors is within on-chip interconnects. The contribution of interconnect power to total chip power is expected to be higher in future communication-bound billion-transistor architectures. In this paper, we propose the design of a heterogeneous interconnect, where some wires are optimized for low latency and others are optimized for low power. We show that a large fraction of on-chip communications are latency insensitive. Effecting these non-critical transfers on low-power, long-latency interconnects can result in significant power savings without unduly affecting performance. Two primary techniques are evaluated in this paper: (i) a dynamic critical path predictor that identifies results that are not urgently consumed, and (ii) an address prediction mechanism that requires addresses to be transferred off the critical path for verification purposes. Our results demonstrate that 49% of all interconnect transfers can be effected on power-efficient wires, while incurring a performance penalty of only 2.5%.
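
    The steering idea can be illustrated with a small sketch. It assumes each transfer carries a slack estimate (cycles before its consumer needs the value), which stands in for the output of a criticality predictor; the `Transfer` type, the wire latencies, and the threshold rule are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    name: str
    slack_cycles: int  # assumed predictor output: cycles before the value is needed


def choose_wire(transfer, fast_latency=1, slow_latency=3):
    """Send a transfer over low-power wires only if its slack hides the extra delay."""
    extra_delay = slow_latency - fast_latency
    return "low_power" if transfer.slack_cycles >= extra_delay else "low_latency"


def low_power_fraction(transfers, fast_latency=1, slow_latency=3):
    """Fraction of transfers that the rule above steers to low-power wires."""
    if not transfers:
        return 0.0
    steered = sum(
        choose_wire(t, fast_latency, slow_latency) == "low_power" for t in transfers
    )
    return steered / len(transfers)
```

    Under these assumptions, a transfer is treated as non-critical whenever its slack covers the latency gap between the two wire types; a mispredicted slack would show up as the kind of performance penalty the abstract reports.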

    Critical dependence of morphodynamic models of fluvial and tidal systems on empirical downslope sediment transport

    The morphological development of fluvial and tidal systems is forecast more and more frequently by models in scientific and engineering studies for decision making regarding climate change mitigation, flood control, navigation and engineering works. However, many existing morphodynamic models predict unrealistically high channel incision, which is often dampened by increasing gravity-driven sediment transport on side slopes to values up to two orders of magnitude too high. Here we show that such arbitrary calibrations dramatically bias sediment dynamics, channel patterns, and rate of morphological change. For five different models bracketing a range of scales and environments, we found that it is impossible to calibrate a model on both sediment transport magnitude and morphology. Consequently, present calibration practice may cause an order-of-magnitude error in either morphology or morphological change. We show how model design can be optimized for different applications. We discuss the major implications for model interpretation and a critical knowledge gap.

    Gait Analysis of Horses for Lameness Detection with Radar Sensors

    This paper presents a preliminary investigation of the use of radar signatures to detect lameness in horses and assess its severity. Radar sensors in this context can provide attractive contactless sensing capabilities, as a complementary or alternative technology to the current techniques for lameness assessment using video-graphics and inertial sensors attached to the horses' bodies. The paper presents several examples of experimental data collected at the Weipers Centre Equine Hospital at the University of Glasgow, showing the micro-Doppler signatures of horses and preliminary results of their analysis.

    An evaluation of the TRIPS computer system

    The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine concurrency for high performance while tolerating emerging technology scaling challenges, such as increasing wire delays and power consumption. This paper evaluates how well TRIPS meets this goal through a detailed ISA and performance analysis. We compare performance, using cycle counts, to commercial processors. On SPEC CPU2000, the Intel Core 2 outperforms compiled TRIPS code in most cases, although TRIPS matches a Pentium 4. On simple benchmarks, compiled TRIPS code outperforms the Core 2 by 10% and hand-optimized TRIPS code outperforms it by a factor of 3. Compared to conventional ISAs, the block-atomic model provides a larger instruction window, increases concurrency at a cost of more instructions executed, and replaces register and memory accesses with more efficient direct instruction-to-instruction communication. Our analysis suggests ISA, microarchitecture, and compiler enhancements for addressing weaknesses in TRIPS and indicates that EDGE architectures have the potential to exploit greater concurrency in future technologies.
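
    To make the dataflow idea concrete, here is a toy sketch of a block whose instructions fire once both operands have arrived and forward results directly to their consumers instead of writing a register. The encoding (a dict of two-operand instructions with explicit target lists) is an invention for illustration and is not the TRIPS instruction format; it also assumes each operand slot is written exactly once.

```python
import operator
from collections import deque

# Hypothetical two-operand opcodes for the toy block.
OPS = {"add": operator.add, "mul": operator.mul}


def run_block(instructions, inputs):
    """Execute one block in dataflow order and return its outputs together.

    instructions: {inst_id: (opcode, [(target, slot), ...])}, where target is
                  another inst_id or the string "out" (slot is then an output name).
    inputs:       [(inst_id, slot, value), ...] operands injected into the block.
    """
    operands = {inst_id: {} for inst_id in instructions}
    outputs = {}
    pending = deque(inputs)
    while pending:
        target, slot, value = pending.popleft()
        if target == "out":
            outputs[slot] = value
            continue
        operands[target][slot] = value
        if len(operands[target]) == 2:  # both operands present: the instruction fires
            opcode, consumers = instructions[target]
            result = OPS[opcode](operands[target][0], operands[target][1])
            for consumer in consumers:
                pending.append((consumer[0], consumer[1], result))
    return outputs  # produced only after the whole block has run


# (1 + 2) + (3 * 4), with no named registers between instructions
block = {
    0: ("add", [(2, 0)]),
    1: ("mul", [(2, 1)]),
    2: ("add", [("out", "result")]),
}
block_outputs = run_block(block, [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)])
```

    Instructions 0 and 1 have no dependence on each other, so a dataflow machine could fire them concurrently; the queue here only serializes them for simplicity.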