140 research outputs found

    Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

    Memories that exploit three-dimensional (3D) stacking technology, integrating memory and logic dies in a single stack, are becoming popular. These memories, such as the Hybrid Memory Cube (HMC), use a network-on-chip (NoC) to connect their internal structures. This novel use of a NoC, in addition to aiding processing-in-memory capabilities, enables numerous benefits such as high bandwidth and memory-level parallelism. However, the implications of NoCs for the memory access latency and bandwidth of 3D-stacked memories have not been fully explored. This paper addresses this knowledge gap by (i) characterizing an HMC prototype on the AC-510 accelerator board and revealing its access latency behaviors, and (ii) investigating the implications of these behaviors for system and software design.

    Spoofing prevention via RF power profiling in wireless network-on-chip

    With increasing integration in SoCs, the Network-on-Chip (NoC) connecting cores and accelerators is of paramount importance for providing low-latency, high-throughput communication. Due to the scaling limits of electrical wires, especially over long multi-mm on-chip distances, alternative technologies such as Wireless NoCs (WNoCs) have shown promise. Since WNoCs can provide low-latency one-hop transfers across the entire chip, there has been a recent surge in research demonstrating their performance and energy benefits. However, little to no work has studied the additional security challenges unique to WNoCs. In this work, we study the potential threat of spoofing attacks in WNoCs due to malicious hardware trojans. We introduce Veritas, a drop-in solution aimed at detecting and correcting such spoofing attacks. To this end, our solution exploits the static propagation environment of WNoCs to associate each node with a power profile. We demonstrate that, with small area and power overheads, Veritas works well in a variety of settings.
    Peer Reviewed. Postprint (author's final draft).
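
    The power-profile idea above can be illustrated with a minimal sketch. It assumes that each receiver stores a calibrated expected received power (in dBm) for every source node, which is plausible only because the on-chip channel is static, and that "correcting" a spoofed packet means reattributing it to the node whose profile matches the measured power. The class name `PowerProfileChecker`, the `tolerance_db` threshold, and all dBm values are hypothetical; this is not the Veritas design itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PowerProfileChecker:
    """Hypothetical per-receiver spoof check based on received RF power."""

    # Expected received power (dBm) from each source node ID. Assumed to be
    # calibrated once, since the on-chip propagation environment is static.
    profile: Dict[int, float]
    # Hypothetical tolerance for measurement noise, in dB.
    tolerance_db: float = 1.0

    def check(self, claimed_src: int, measured_dbm: float) -> Tuple[bool, Optional[int]]:
        """Return (authentic, inferred_src); inferred_src is None if no profile matches."""
        expected = self.profile.get(claimed_src)
        if expected is not None and abs(measured_dbm - expected) <= self.tolerance_db:
            return True, claimed_src
        # Power does not match the claimed sender: flag a suspected spoof and
        # attribute the packet to the node whose profile is closest.
        best = min(self.profile, key=lambda n: abs(self.profile[n] - measured_dbm))
        if abs(self.profile[best] - measured_dbm) <= self.tolerance_db:
            return False, best
        return False, None
```

    For example, with profiles `{0: -40.0, 1: -47.5, 2: -55.0}`, a packet claiming to come from node 1 but arriving at -40.3 dBm is flagged and reattributed to node 0. Because different senders can produce similar received power at a given receiver, a real design would need profiles that keep nodes distinguishable; this sketch simply picks the nearest match.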

    FLAT: An Optimized Dataflow for Mitigating Attention Performance Bottlenecks

    Attention mechanisms form the backbone of state-of-the-art machine learning models for a variety of tasks. Deploying them on deep neural network (DNN) accelerators, however, is prohibitively challenging, especially for long sequences, as this work identifies. This is because operators in attention layers exhibit limited reuse opportunities and a memory footprint that grows quadratically with sequence length, leading to severe memory-boundedness. To address this, we introduce a new attention-tailored dataflow, termed FLAT, which identifies fusion opportunities within the attention layer and implements an on-chip memory-aware interleaved execution and tiling mechanism. FLAT increases the effective memory bandwidth by efficiently utilizing the high-bandwidth, low-capacity on-chip buffer, and thus achieves better run time and compute resource utilization. In our evaluation, FLAT achieves 1.94x and 1.76x speedups and 49% and 42% energy reductions compared to baseline execution on state-of-the-art edge and cloud accelerators, respectively.
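
    The fusion-and-tiling idea above can be illustrated with a small sketch in pure Python, using standard scaled dot-product attention. The unfused version materializes the full N x N logit matrix. The tiled version fuses the logit, softmax, and attend steps over a tile of query rows, so only `tile_rows` x N logits are live at once. Function names and the `tile_rows` parameter are hypothetical. This illustrates row-tiled operator fusion in general, not FLAT's actual accelerator dataflow or its interleaved execution schedule.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(probs, V):
    # Weighted sum of value rows for one query.
    d = len(V[0])
    return [sum(p * v[c] for p, v in zip(probs, V)) for c in range(d)]


def attention_unfused(Q, K, V):
    """Baseline: the full N x N logit matrix is materialized."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    logits = [[dot(q, k) * scale for k in K] for q in Q]
    out = [attend(softmax(row), V) for row in logits]
    return out, len(Q) * len(K)  # (output, logits held at once)


def attention_row_tiled(Q, K, V, tile_rows):
    """Fused sketch: logits -> softmax -> attend per tile of query rows."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out, peak = [], 0
    for start in range(0, len(Q), tile_rows):
        tile = Q[start:start + tile_rows]
        # Only this tile's logits (at most tile_rows x N) are live here.
        logits = [[dot(q, k) * scale for k in K] for q in tile]
        peak = max(peak, len(tile) * len(K))
        out.extend(attend(softmax(row), V) for row in logits)
    return out, peak
```

    Both functions compute the same output. With N = 4 and `tile_rows=2`, the tiled version holds at most 8 logits instead of 16, so its peak intermediate footprint grows linearly rather than quadratically in sequence length for a fixed tile size. On an accelerator, the tile size would be chosen to fit the on-chip buffer.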