Search CORE

45 research outputs found

HETSIM: Simulating Large-Scale Heterogeneous Systems using a Trace-driven, Synchronization and Dependency-Aware Framework

Author: Cole Murray
Dreslinski Ronald G.
Feng Siying
Franke Bjoern
Kaszyk Kuba
Mudge Trevor
O'Boyle Michael F P
Pal Subhankar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/11/2020
Field of study

Crossref

Edinburgh Research Explorer

Multicore Performance Prediction with MPET : Using Scalability Characteristics for Statistical Cross-Architecture Prediction

Author: Arndt Oliver Jakob
Blume Holger
Lüders Matthias
Riggers Christoph
Publication venue: Norwell, Mass. : Springer
Publication date: 01/01/2020
Field of study

Multicore processors serve as target platforms in a broad variety of applications ranging from high-performance computing to embedded mobile computing and automotive applications. But, the required parallel programming opens up a huge design space of parallelization strategies each with potential bottlenecks. Therefore, an early estimation of an application’s performance is a desirable development tool. However, out-of-order execution, superscalar instruction pipelines, as well as communication costs and (shared-) cache effects essentially influence the performance of parallel programs. While offering low modeling effort and good simulation speed, current approximate analytic models provide moderate prediction results so far. Virtual prototyping requires a time-consuming simulation, but produces better accuracy. Furthermore, even existing statistical methods often require detailed knowledge of the hardware for characterization. In this work, we present a concept called Multicore Performance Evaluation Tool (MPET) and its evaluation for a statistical approach for performance prediction based on abstract runtime parameters, which describe an application’s scalability behavior and can be extracted from profiles without user input. These scalability parameters not only include information on the interference of software demands and hardware capabilities, but indicate bottlenecks as well. Depending on the database setup, we achieve a competitive accuracy of 20% mean prediction error (11% median), which we also demonstrate in a case study

Institutionelles Repositorium der Leibniz Universität Hannover

Simulation Environment with Customized RISC-V Instructions for Logic-in-Memory Architectures

Author: Chen Kuan-Hsun
Coluccio Andrea
Lee Jenq Kuen
Lu Chen-Hua
Ottavi Marco
Riente Fabrizio
Su Jia-Hui
Vacca Marco
Publication venue
Publication date: 21/03/2023
Field of study

Nowadays, various memory-hungry applications like machine learning algorithms are knocking "the memory wall". Toward this, emerging memories featuring computational capacity are foreseen as a promising solution that performs data process inside the memory itself, so-called computation-in-memory, while eliminating the need for costly data movement. Recent research shows that utilizing the custom extension of RISC-V instruction set architecture to support computation-in-memory operations is effective. To evaluate the applicability of such methods further, this work enhances the standard GNU binary utilities to generate RISC-V executables with Logic-in-Memory (LiM) operations and develop a new gem5 simulation environment, which simulates the entire system (CPU, peripherals, etc.) in a cycle-accurate manner together with a user-defined LiM module integrated into the system. This work provides a modular testbed for the research community to evaluate potential LiM solutions and co-designs between hardware and software

arXiv.org e-Print Archive

University of Twente Research Information

Recommended from our members

Scalable Emulation of Heterogeneous Systems

Author: Garcia Cota Emilio
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

The breakdown of Dennard's transistor scaling has driven computing systems toward application-specific accelerators, which can provide orders-of-magnitude improvements in performance and energy efficiency over general-purpose processors. To enable the radical departures from conventional approaches that heterogeneous systems entail, research infrastructure must be able to model processors, memory and accelerators, as well as system-level changes---such as operating system or instruction set architecture (ISA) innovations---that might be needed to realize the accelerators' potential. Unfortunately, existing simulation tools that can support such system-level research are limited by the lack of fast, scalable machine emulators to drive execution. To fill this need, in this dissertation we first present a novel machine emulator design based on dynamic binary translation that makes the following improvements over the state of the art: it scales on multicore hosts while remaining memory efficient, correctly handles cross-ISA differences in atomic instruction semantics, leverages the host floating point (FP) unit to speed up FP emulation without sacrificing correctness, and can be efficiently instrumented to---among other possible uses---drive the execution of a full-system, cross-ISA simulator with support for accelerators. We then demonstrate the utility of machine emulation for studying heterogeneous systems by leveraging it to make two additional contributions. First, we quantify the trade-offs in different coupling models for on-chip accelerators. Second, we present a technique to reuse the private memories of on-chip accelerators when they are otherwise inactive to expand the system's last-level cache, thereby reducing the opportunity cost of the accelerators' integration

Columbia University Academic Commons