18,414 research outputs found
Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim
NVDLA is an open-source deep neural network (DNN) accelerator which has
received a lot of attention by the community since its introduction by Nvidia.
It is a full-featured hardware IP and can serve as a good reference for
conducting research and development of SoCs with integrated accelerators.
However, an expensive FPGA board is required to do experiments with this IP in
a real SoC. Moreover, since NVDLA is clocked at a lower frequency on an FPGA,
it would be hard to do accurate performance analysis with such a setup. To
overcome these limitations, we integrate NVDLA into a real RISC-V SoC on the
Amazon cloud FPGA using FireSim, a cycle-exact FPGA-accelerated simulator. We
then evaluate the performance of NVDLA by running YOLOv3 object-detection
algorithm. Our results show that NVDLA can sustain 7.5 fps when running YOLOv3.
We further analyze the performance by showing that sharing the last-level cache
with NVDLA can result in up to 1.56x speedup. We then identify that sharing the
memory system with the accelerator can result in unpredictable execution time
for the real-time tasks running on this platform. We believe this is an
important issue that must be addressed in order for on-chip DNN accelerators to
be incorporated in real-time embedded systems.Comment: Presented at the 2nd Workshop on Energy Efficient Machine Learning
and Cognitive Computing for Embedded Applications (EMC2'19
A Detailed Analysis of Contemporary ARM and x86 Architectures
RISC vs. CISC wars raged in the 1980s when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Further, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important, and we seek to answer this question through a detailed measurement based study on real hardware running real applications. We analyze measurements on the ARM Cortex-A8 and Cortex-A9 and Intel Atom and Sandybridge i7 microprocessors over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors? performance and energy efficiency. We find that ARM and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
Memory consistency models (MCMs) which govern inter-module interactions in a
shared memory system, are a significant, yet often under-appreciated, aspect of
system design. MCMs are defined at the various layers of the hardware-software
stack, requiring thoroughly verified specifications, compilers, and
implementations at the interfaces between layers. Current verification
techniques evaluate segments of the system stack in isolation, such as proving
compiler mappings from a high-level language (HLL) to an ISA or proving
validity of a microarchitectural implementation of an ISA.
This paper makes a case for full-stack MCM verification and provides a
toolflow, TriCheck, capable of verifying that the HLL, compiler, ISA, and
implementation collectively uphold MCM requirements. The work showcases
TriCheck's ability to evaluate a proposed ISA MCM in order to ensure that each
layer and each mapping is correct and complete. Specifically, we apply TriCheck
to the open source RISC-V ISA, seeking to verify accurate, efficient, and legal
compilations from C11. We uncover under-specifications and potential
inefficiencies in the current RISC-V ISA documentation and identify possible
solutions for each. As an example, we find that a RISC-V-compliant
microarchitecture allows 144 outcomes forbidden by C11 to be observed out of
1,701 litmus tests examined. Overall, this paper demonstrates the necessity of
full-stack verification for detecting MCM-related bugs in the hardware-software
stack.Comment: Proceedings of the Twenty-Second International Conference on
Architectural Support for Programming Languages and Operating System
MGSim - Simulation tools for multi-core processor architectures
MGSim is an open source discrete event simulator for on-chip hardware
components, developed at the University of Amsterdam. It is intended to be a
research and teaching vehicle to study the fine-grained hardware/software
interactions on many-core and hardware multithreaded processors. It includes
support for core models with different instruction sets, a configurable
multi-core interconnect, multiple configurable cache and memory models, a
dedicated I/O subsystem, and comprehensive monitoring and interaction
facilities. The default model configuration shipped with MGSim implements
Microgrids, a many-core architecture with hardware concurrency management.
MGSim is furthermore written mostly in C++ and uses object classes to represent
chip components. It is optimized for architecture models that can be described
as process networks.Comment: 33 pages, 22 figures, 4 listings, 2 table
HardScope: Thwarting DOP with Hardware-assisted Run-time Scope Enforcement
Widespread use of memory unsafe programming languages (e.g., C and C++)
leaves many systems vulnerable to memory corruption attacks. A variety of
defenses have been proposed to mitigate attacks that exploit memory errors to
hijack the control flow of the code at run-time, e.g., (fine-grained)
randomization or Control Flow Integrity. However, recent work on data-oriented
programming (DOP) demonstrated highly expressive (Turing-complete) attacks,
even in the presence of these state-of-the-art defenses. Although multiple
real-world DOP attacks have been demonstrated, no efficient defenses are yet
available. We propose run-time scope enforcement (RSE), a novel approach
designed to efficiently mitigate all currently known DOP attacks by enforcing
compile-time memory safety constraints (e.g., variable visibility rules) at
run-time. We present HardScope, a proof-of-concept implementation of
hardware-assisted RSE for the new RISC-V open instruction set architecture. We
discuss our systematic empirical evaluation of HardScope which demonstrates
that it can mitigate all currently known DOP attacks, and has a real-world
performance overhead of 3.2% in embedded benchmarks
Recommended from our members
A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC-DC Converters in 28 nm FDSOI
This work demonstrates a RISC-V vector microprocessor implemented in 28 nm FDSOI with fully integrated simultaneous-switching switched-capacitor DC-DC (SC DC-DC) converters and adaptive clocking that generates four on-chip voltages between 0.45 and 1 V using only 1.0 V core and 1.8 V IO voltage inputs. The converters achieve high efficiency at the system level by switching simultaneously to avoid charge-sharing losses and by using an adaptive clock to maximize performance for the resulting voltage ripple. Details about the implementation of the DC-DC switches, DC-DC controller, and adaptive clock are provided, and the sources of conversion loss are analyzed based on measured results. This system pushes the capabilities of dynamic voltage scaling by enabling fast transitions (20 ns), simple packaging (no off-chip passives), low area overhead (16%), high conversion efficiency (80%-86%), and high energy efficiency (26.2 DP GFLOPS/W) for mobile devices
- …