8 research outputs found

    Optimized Memory Access For Dynamically Scheduled High Level Synthesis

    Get PDF
    Dynamically-scheduled elastic circuits generated by High-Level Synthesis (HLS) tools are inherently out-of-order, following the flow of data rather than the evolution of an instruction pointer. Components of the circuit which access memory need to be connected to a Load-Store Queue (LSQ) that dynamically checks for memory dependencies, performs store ordering and forwarding, and allows unordered access to Random-Access Memory (RAM) whenever possible. While connecting every memory access (load/store) component to an LSQ ensures correctness of program execution, the hardware and power cost makes this solution unattractive. Statically ruling out dependencies allows circuits to access memory via lightweight components that use an arbitrator to handle RAM port sharing. Reducing the number of components using the LSQ allows the compiler to generate smaller queues which results in superlinear savings in hardware and power for the memory subsystem. This work describes additions to the Elastic Compiler (EC) that allow it to analyze algorithms expressed in LLVM-IR, an intermediate code representation, to rule out memory dependencies between load/store instructions and their underlying insights. These analyses leverage pointer analysis as well as array access patterns to narrow down the list of possibly dependent instructions. We also enhance the compiler to leverage our analyses and automatically generate relevant memory-access components for the circuit and to connect them to the relevant arbitrator or LSQ

    GluA3 subunits are required for appropriate assembly of AMPAR GluA2 and GluA4 subunits on cochlear afferent synapses and for presynaptic ribbon modiolar-pillar morphology

    Get PDF
    Cochlear sound encoding depends on α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid receptors (AMPARs), but reliance on specific pore-forming subunits is unknown. With 5-week-old male C57BL/6

    Maturation of heterogeneity in afferent synapse ultrastructure in the mouse cochlea

    Get PDF
    Auditory nerve fibers (ANFs) innervating the same inner hair cell (IHC) may have identical frequency tuning but different sound response properties. In cat and guinea pig, ANF response properties correlate with afferent synapse morphology and position on the IHC, suggesting a causal structure-function relationship. In mice, this relationship has not been fully characterized. Here we measured the emergence of synaptic morphological heterogeneities during maturation of the C57BL/6J mouse cochlea by comparing postnatal day 17 (p17, ∼3 days after hearing onset) with p34, when the mouse cochlea is mature. Using serial block face scanning electron microscopy and three-dimensional reconstruction we measured the size, shape, vesicle content, and position of 70 ribbon synapses from the mid-cochlea. Several features matured over late postnatal development. From p17 to p34, presynaptic densities (PDs) and post-synaptic densities (PSDs) became smaller on average (PDs: 0.75 to 0.33; PSDs: 0.58 to 0.31 μ

    Shrink It or Shed It! Minimize the Use of LSQs in Dataflow Designs

    No full text
    When applications have unpredictable memory accesses or irregular control flow, dataflow circuits overcome the limitations of statically scheduled high-level synthesis (HLS). If memory dependences cannot be determined at compile time, dataflow circuits rely on load-store queues (LSQs) to resolve the dependences dynamically, as the circuit runs. However, when employed on reconfigurable platforms, these LSQs are resource-expensive, slow, and power-consuming. In this work, we explore techniques for reducing the cost of the memory interface in dataflow designs. Apart from exploiting standard memory analysis techniques, we present a novel approach which relies on the topology of the control and dataflow graphs to infer memory order with the purpose of minimizing the LSQ size and complexity. On benchmarks obtained automatically from C code, we show that our approach results in significant area reductions, as well as increased performance, compared to naive solutions

    Rebooting Virtual Memory with Midgard

    No full text
    Computer systems designers are building cache hierarchies with higher capacity to capture the ever-increasing working sets of modern workloads. Cache hierarchies with higher capacity improve system performance but shift the performance bottleneck to address translation. We propose Midgard, an intermediate address space between the virtual and the physical address spaces, to mitigate address translation overheads without program-level changes. Midgard leverages the operating system concept of virtual memory areas (VMAs) to realize a single Midgard address space where VMAs of all processes can be uniquely mapped. The Midgard address space serves as the namespace for all data in a coherence domain and the cache hierarchy. Because real-world workloads use far fewer VMAs than pages to represent their virtual address space, virtual to Midgard translation is achieved with hardware structures that are much smaller than TLB hierarchies. Costlier Midgard to physical address translations are needed only on LLC misses, which become much less frequent with larger caches. As a consequence, Midgard shows that instead of amplifying address translation overheads, memory hierarchies with large caches can reduce address translation overheads. Our evaluation shows that Midgard achieves only 5% higher address translation overhead as compared to traditional TLB hierarchies for 4KB pages when using a 16MB aggregate LLC. Midgard also breaks even with traditional TLB hierarchies for 2MB pages when using a 256MB aggregate LLC. For cache hierarchies with higher capacity, Midgard's address translation overhead drops to near zero as secondary and tertiary data working sets fit in the LLC, while traditional TLBs suffer even higher degrees of address translation overhead

    SpecROP: Speculative Exploitation of ROP Chains

    No full text
    Speculative execution attacks, such as Spectre, reuse code from the victim’s binary to access and leak secret information during speculative execution. Every variant of the attack requires very particular code sequences, necessitating elaborate gadget-search campaigns. Often, victim programs contain few, or even zero, usable gadgets. Consequently, speculative attacks are sometimes demonstrated by injecting usable code sequences into the victim. So far, attacks search for monolithic gadgets, a single sequence of code which performs all the attack steps. We introduce SpecROP, a novel speculative execution attack technique, inspired by classic code reuse attacks like Return-Oriented Programming to tackle the rarity of code gadgets. The SpecROP attacker uses multiple, small gadgets chained by poisoning multiple control-flow instructions to perform the same computation as a monolithic gadget. A key difference to classic code reuse attacks is that control-flow transfers between gadgets use speculative targets compared to targets in memory or registers. We categorize SpecROP gadgets into generic classes and demonstrate the abundance of such gadgets in victim libraries. Further, we explore the practicality of influencing multiple control-flow instructions on modern processors, and demonstrate an attack which uses gadget chaining to increase the leakage potential of a Spectre variant, SMoTherSpectre
    corecore