
    An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing


    The current state of biomarker research for Friedreich's ataxia: a report from the 2018 FARA biomarker meeting

    The 2018 FARA Biomarker Meeting highlighted the current state of development of biomarkers for Friedreich's ataxia. A mass spectroscopy assay to sensitively measure mature frataxin (reduction of which is the root cause of disease) is being developed. Biomarkers to monitor neurological disease progression include imaging, electrophysiological measures and measures of nerve function, which may be measured in serum and/or through imaging-based technologies. Potential pharmacodynamic biomarkers include metabolic and protein biomarkers and markers of nerve damage. Cardiac imaging and serum biomarkers may reflect cardiac disease progression. Considerable progress has been made in the development of biomarkers for various contexts of use, but further work is needed in terms of larger longitudinal multisite studies and identification of novel biomarkers for additional use cases.

    Interconnect-aware coherence protocols for chip multiprocessors

    Improvements in semiconductor technology have made it possible to include multiple processor cores on a single die. Chip Multi-Processors (CMP) are an attractive choice for future billion transistor architectures due to their low design complexity, high clock frequency, and high throughput. In a typical CMP architecture, the L2 cache is shared by multiple cores and data coherence is maintained among private L1s. Coherence operations entail frequent communication over global on-chip wires. In future technologies, communication between different L1s will have a significant impact on overall processor performance and power consumption. On-chip wires can be designed to have different latency, bandwidth, and energy properties. Likewise, coherence protocol messages have different latency and bandwidth needs. We propose an interconnect composed of wires with varying latency, bandwidth, and energy characteristics, and advocate intelligently mapping coherence operations to the appropriate wires. In this paper, we present a comprehensive list of techniques that allow coherence protocols to exploit a heterogeneous interconnect and evaluate a subset of these techniques to show their performance and power-efficiency potential. Most of the proposed techniques can be implemented with a minimum complexity overhead.

    Quantum gases. Critical dynamics of spontaneous symmetry breaking in a homogeneous Bose gas.

    Kibble-Zurek theory models the dynamics of spontaneous symmetry breaking, which plays an important role in a wide variety of physical contexts, ranging from cosmology to superconductors. We explored these dynamics in a homogeneous system by thermally quenching an atomic gas with short-range interactions through the Bose-Einstein phase transition. Using homodyne matter-wave interferometry to measure first-order correlation functions, we verified the central quantitative prediction of the Kibble-Zurek theory, namely the homogeneous-system power-law scaling of the coherence length with the quench rate. Moreover, we directly confirmed its underlying hypothesis, the freezing of the correlation length near the transition. Our measurements agree with a beyond-mean-field theory and support the expectation that the dynamical critical exponent for this universality class is z = 3/2.
    We thank M. Robert-de-Saint-Vincent for experimental assistance; R. Fletcher for comments on the manuscript; and N. Cooper, J. Dalibard, G. Ferrari, B. Phillips, and W. Zwerger for insightful discussions. This work was supported by AFOSR, ARO, DARPA OLE, and EPSRC (grant no. EP/K003615/1). N.N. acknowledges support from Trinity College, Cambridge, and R.P.S. from the Royal Society.
    This is the accepted manuscript of a paper published in Science, 9 January 2015, Vol. 347, no. 6218, pp. 167-170. DOI: 10.1126/science.125867

    Design tradeoffs for simplicity and efficient verification in the Execution Migration Machine

    As transistor technology continues to scale, the architecture community has experienced exponential growth in design complexity and significantly increasing implementation and verification costs. Moreover, Moore's law has led to a ubiquitous trend of an increasing number of cores on a single chip. Often, these large-core-count chips provide a shared memory abstraction via directories and coherence protocols, which have become notoriously error-prone and difficult to verify because of subtle data races and state space explosion. Although a very simple hardware shared memory implementation can be achieved by simply not allowing ad-hoc data replication and relying on remote accesses for remotely cached data (i.e., requiring no directories or coherence protocols), such remote-access-based directoryless architectures cannot take advantage of any data locality, and therefore suffer in both performance and energy. Our recently taped-out 110-core shared-memory processor, the Execution Migration Machine (EM²), establishes a new design point. On the one hand, EM² supports shared memory but does not automatically replicate data, and thus preserves the simplicity of directoryless architectures. On the other hand, it significantly improves performance and energy over remote-access-only designs by exploiting data locality at remote cores via fast hardware-level thread migration. In this paper, we describe the design choices made in the EM² chip as well as our choice of design methodology, and discuss how they combine to achieve design simplicity and verification efficiency. Even though EM² is a fairly large design (110 cores using a total of 357 million transistors), the entire chip design and implementation process (RTL, verification, physical design, tapeout) took only 18 man-months.

    TLB-Based Temporality-Aware Classification in CMPs with Multilevel TLBs

    "© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works."
    Recent proposals are based on classifying memory accesses into private or shared in order to process private accesses more efficiently and reduce coherence overhead. The classification mechanisms previously proposed are either not able to adapt to the dynamic sharing behavior of the applications or require frequent broadcast messages. Additionally, most of these classification approaches assume single-level translation lookaside buffers (TLBs). However, deeper and more efficient TLB hierarchies, such as the ones implemented in current commodity processors, have not been appropriately explored. This paper analyzes accurate classification mechanisms in multilevel TLB hierarchies. In particular, we propose an efficient data classification strategy for systems with distributed shared last-level TLBs. Our approach classifies data accounting for temporal private accesses and constrains TLB-related traffic by issuing unicast messages on first-level TLB misses. When our classification is employed to deactivate coherence for private data in directory-based protocols, it improves the directory efficiency and, consequently, reduces coherence traffic to merely 53.0%, on average. Additionally, it avoids some of the overheads of previous classification approaches for purely private TLBs, improving average execution time by nearly 9% for large-scale systems.
    This work has been jointly supported by the MINECO and European Commission (FEDER funds) under the projects TIN2015-66972-C5-1-R and TIN2015-66972-C5-3-R and the Fundacion Seneca-Agencia de Ciencia y Tecnologia de la Region de Murcia under the project Jovenes Lideres en Investigacion 18956/JLI/13.
    Esteve Garcia, A.; Ros Bardisa, A.; Gómez Requena, ME.; Robles Martínez, A.; Duato Marín, JF. (2017). TLB-Based Temporality-Aware Classification in CMPs with Multilevel TLBs. IEEE Transactions on Parallel and Distributed Systems. 28(8):2401-2413. https://doi.org/10.1109/TPDS.2017.2658576