Understanding the Memory Consumption of the MiBench Embedded Benchmark
Complex embedded systems today commonly involve a mix of real-time and best-effort applications. The recent emergence of small low-cost commodity multi-core processors raises the possibility of running both kinds of applications on a single machine, with virtualization ensuring that the best-effort applications cannot steal CPU cycles from the real-time applications. Nevertheless, memory contention can introduce other sources of delay that can lead to missed deadlines. In this paper, we analyze the sources of memory consumption for the real-time applications found in the MiBench embedded benchmark suite.
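The per-application measurement this kind of study relies on can be approximated with the POSIX `getrusage` interface. A minimal sketch (assuming a Linux host, where `ru_maxrss` is reported in KiB; on macOS it is in bytes):

```python
import resource

def peak_rss_kib():
    """Return this process's peak resident set size (KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Touch a buffer comparable to a small benchmark working set,
# then observe how the peak grows.
before = peak_rss_kib()
buf = bytearray(8 * 1024 * 1024)      # 8 MiB working set
for i in range(0, len(buf), 4096):    # fault in every page
    buf[i] = 1
after = peak_rss_kib()
print(after >= before)                # peak RSS never decreases
```

A real analysis would attribute the peak to code, stack, heap, and shared libraries (e.g., via `/proc/<pid>/smaps`); the sketch only captures the aggregate.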
Lost in translation: Exposing hidden compiler optimization opportunities
Existing iterative compilation and machine-learning-based optimization
techniques have proven very successful in achieving better optimizations
than the standard optimization levels of a compiler. However, they were not
engineered to support the tuning of a compiler's optimizer as part of the
compiler's daily development cycle. In this paper, we first establish the
required properties which a technique must exhibit to enable such tuning. We
then introduce an enhancement to the classic nightly routine testing of
compilers which exhibits all the required properties, and thus, is capable of
driving the improvement and tuning of the compiler's common optimizer. This is
achieved by leveraging resource usage and compilation information collected
while systematically exploiting prefixes of the transformations applied at
standard optimization levels. Experimental evaluation using the LLVM v6.0.1
compiler demonstrated that the new approach was able to reveal hidden
cross-architecture and architecture-dependent potential optimizations on two
popular processors: the Intel i5-6300U and the Arm Cortex-A53-based Broadcom
BCM2837 used in the Raspberry Pi 3B+. As a case study, we demonstrate how the
insights from our approach enabled us to identify and remove a significant
shortcoming of the CFG simplification pass of the LLVM v6.0.1 compiler.
Comment: 31 pages, 7 figures, 2 tables. arXiv admin note: text overlap with
arXiv:1802.0984
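The core idea of systematically exploiting prefixes of a standard pass pipeline can be sketched as follows. The pass names below are an illustrative (not exhaustive) slice of an -O2-style order, and the compile-and-measure step is only indicated in a comment:

```python
# An illustrative slice of an -O2-style LLVM pass order (not exhaustive).
O2_PASSES = ["mem2reg", "instcombine", "simplifycfg", "gvn", "licm"]

def pipeline_prefixes(passes):
    """Yield every prefix of the pass sequence, shortest first."""
    for n in range(1, len(passes) + 1):
        yield passes[:n]

for prefix in pipeline_prefixes(O2_PASSES):
    flags = ",".join(prefix)
    # A real harness would compile and measure each variant, e.g.
    # (hypothetical invocation): opt -passes=<flags> input.ll -o out.ll
    print(flags)
```

Comparing resource usage across prefixes localizes which transformation in the sequence helps or hurts a given program, which is what exposes the "hidden" optimization opportunities.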
Detecting and Understanding Dynamically Dead Instructions for Contemporary Architectures
Instructions executed by the processor are dynamically dead if the values they produce are not used by the program. Executing such useless instructions can potentially slow down program execution and waste power. The goal of this work is to quantify and understand the occurrence of dynamically dead instructions (DDI) for programs compiled using modern compilers for the most relevant contemporary architectures. We expect our extensive study to highlight the issue of DDI and to play a critical role in the development of compiler and/or architectural techniques to avoid DDI execution at runtime. In this thesis, we introduce our novel GCC-based instrumentation and analysis framework to determine DDI during program execution. We present the ratio and characteristics of DDI in our benchmark programs. We find that programs compiled with GCC (with and without optimizations) execute a significant fraction of DDI on x86- and ARM-based machines. Additionally, the ample amount of predication employed by GCC results in a large fraction of the instructions executed on the ARM being dynamically dead. We observe that a handful of static instructions contribute a large majority of the overall DDI in standard benchmark programs. We also find that employing a small amount of static context information can significantly benefit the detection of DDI at run-time. Additionally, we describe the results of our manual study to analyze and categorize the DDI instances in our x86 benchmarks. We briefly outline compiler- and architecture-based techniques that can be used to eliminate each category of DDI in future programs. Overall, we believe that a close synergy between compiler and architecture techniques may be the most effective strategy to eliminate DDI and improve sequential program performance and energy efficiency on modern machines.
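The dead-value criterion the abstract describes can be illustrated on a linear trace: a write is dynamically dead if no later instruction reads it before the destination is overwritten, and the destination is not live at exit. A minimal sketch (the trace format and register names are invented for illustration):

```python
def find_dead(trace, live_out=()):
    """Return indices of dynamically dead instructions in a linear trace.

    Each trace entry is (dest, srcs): the register written and the
    registers read. A write is dead if it is overwritten before any
    read, or never read and not live at exit.
    """
    last_write = {}   # reg -> index of the pending (not-yet-read) write
    dead = []
    for i, (dest, srcs) in enumerate(trace):
        for s in srcs:
            last_write.pop(s, None)      # the pending write to s was used
        if dest in last_write:           # overwritten before any read
            dead.append(last_write[dest])
        last_write[dest] = i
    for reg, idx in last_write.items():  # never read, not live-out
        if reg not in live_out:
            dead.append(idx)
    return sorted(dead)

trace = [
    ("r1", ()),          # 0: r1 = const
    ("r2", ("r1",)),     # 1: r2 = f(r1)   -> instruction 0 was used
    ("r1", ()),          # 2: r1 = const
    ("r1", ()),          # 3: overwrites 2 before any read -> 2 is dead
    ("r3", ("r1",)),     # 4: reads r1 written at 3
]
print(find_dead(trace, live_out=("r2", "r3")))  # [2]
```

The thesis's GCC-based framework works at the level of real machine instructions and predication, but the kill-before-use rule is the same.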
CVA6 RISC-V Virtualization: Architecture, Microarchitecture, and Design Space Exploration
Virtualization is a key technology used in a wide range of applications, from
cloud computing to embedded systems. Over the last few years, mainstream
computer architectures were extended with hardware virtualization support,
giving rise to a set of virtualization technologies (e.g., Intel VT, Arm VE)
that are now proliferating in modern processors and SoCs. In this article, we
describe our work on hardware virtualization support in the RISC-V CVA6 core.
Our contribution is multifold and encompasses architecture, microarchitecture,
and design space exploration. In particular, we highlight the design of a set
of microarchitectural enhancements (i.e., G-Stage Translation Lookaside Buffer
(GTLB), L2 TLB) to alleviate the virtualization performance overhead. We also
perform a Design Space Exploration (DSE) and accompanying post-layout
simulations (based on 22nm FDX technology) to assess Performance, Power, and
Area (PPA). Further, we map design variants on an FPGA platform (Genesys 2) to
assess the functional performance-area trade-off. Based on the DSE, we select
an optimal design point for the CVA6 with hardware virtualization support. For
this optimal hardware configuration, we collected functional performance
results by running the MiBench benchmark on Linux atop Bao hypervisor for a
single-core configuration. We observed a performance speedup of up to 16%
(approx. 12.5% on average) compared with the virtualization-aware non-optimized
design, at the minimal cost of 0.78% in area and 0.33% in power. Finally, all
work described in this article is publicly available and open-sourced for the
community to further evaluate additional design configurations and software
stacks.
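The two-stage translation that the GTLB accelerates can be sketched with dictionary-based page tables. The single-level, 4 KiB-page layout below is a deliberate simplification of the RISC-V multi-level walk:

```python
PAGE = 4096  # 4 KiB pages for both stages (simplification)

def translate(addr, table):
    """One-level page lookup: virtual page -> physical page, or fault."""
    page, off = divmod(addr, PAGE)
    if page not in table:
        raise KeyError(f"page fault on page {page:#x}")
    return table[page] * PAGE + off

def two_stage(gva, s1, s2):
    """Guest VA -> guest PA (stage 1) -> host PA (G-stage).

    Real hardware must also G-stage-translate every stage-1 table
    walk, which is the overhead a dedicated GTLB aims to cut.
    """
    gpa = translate(gva, s1)   # guest's own page tables
    return translate(gpa, s2)  # hypervisor's G-stage tables

s1 = {0x10: 0x40}     # guest page 0x10 -> guest-physical page 0x40
s2 = {0x40: 0x200}    # guest-physical 0x40 -> host page 0x200
hpa = two_stage(0x10 * PAGE + 0x123, s1, s2)
print(hex(hpa))       # 0x200123
```

The nested lookup is why TLB capacity and organization dominate virtualization overhead: every miss multiplies the number of memory accesses needed per translation.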
Specific Read Only Data Management for Memory Hierarchy Optimization
The multiplication of the number of cores inside embedded systems has raised the pressure on the memory hierarchy. The cost of coherence protocols and the scalability of the memory hierarchy are nowadays major issues. In this paper, a specific data management for read-only data is investigated, because these data can be duplicated in several memories without being tracked. Based on analysis of standard benchmarks for embedded systems, we show that read-only data represent 62% of all the data used by applications and 18% of all the memory accesses. A specific data path for read-only data is then evaluated by using simulations. On the first level of the memory hierarchy, removing read-only data from the L1 cache and placing them in another read-only cache improves the data locality of the read-write data by 30% and decreases the total energy consumption of the first-level memory by 5%.
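The trace-based classification behind the 62%/18% figures can be sketched directly: an address is read-only if it is never the target of a write anywhere in the trace. The toy trace below is illustrative only:

```python
def read_only_stats(trace):
    """Fraction of read-only addresses and of accesses touching them.

    trace: iterable of (addr, is_write) pairs. An address counts as
    read-only if no entry in the trace ever writes it.
    """
    written = {a for a, w in trace if w}
    addrs = {a for a, _ in trace}
    ro_addrs = addrs - written
    ro_accesses = sum(1 for a, _ in trace if a in ro_addrs)
    return len(ro_addrs) / len(addrs), ro_accesses / len(trace)

# Addresses 0 and 2 are never written; address 1 is read then written.
trace = [(0, False), (0, False), (1, False), (1, True), (2, False)]
data_frac, access_frac = read_only_stats(trace)
print(data_frac, access_frac)  # 2/3 of addresses, 3/5 of accesses
```

Because such data need no coherence tracking, they can safely be replicated across per-core read-only caches, which is what the evaluated data path exploits.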
RT-Bench: an Extensible Benchmark Framework for the Analysis and Management of Real-Time Applications
Benchmarking is crucial for testing and validating any system, even more so
in real-time systems. Typical real-time applications adhere to well-understood
abstractions: they exhibit a periodic behavior, operate on a well-defined
working set, and strive for stable response times, avoiding unpredictable
factors such as page faults. Unfortunately, available benchmark suites fail to
reflect key characteristics of real-time applications. Practitioners and
researchers must resort either to benchmarking heavily approximated real-time
environments, or to re-engineering available benchmarks to add -- if possible --
the sought-after features. Additionally, the measuring and logging capabilities
provided by most benchmark suites are not tailored "out-of-the-box" to
real-time environments, and changing basic parameters such as the scheduling
policy often becomes a tiring and error-prone exercise.
In this paper, we present RT-Bench, an open-source framework adding standard
real-time features to virtually any existing benchmark. Furthermore, RT-Bench
provides an easy-to-use, unified command line interface to customize key
aspects of the real-time execution of a set of benchmarks. Our framework is
guided by four main criteria: 1) cohesive interface, 2) support for periodic
application behavior and deadline semantics, 3) controllable memory footprint,
and 4) extensibility and portability. We have integrated within the framework
applications from the widely used SD-VBS and IsolBench suites. We showcase a
set of use-cases that are representative of typical real-time system evaluation
scenarios and that can be easily conducted via RT-Bench.Comment: 11 pages, 12 figures; code available at
https://gitlab.com/rt-bench/rt-bench, documentation available at
https://rt-bench.gitlab.io/rt-bench
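The periodic-behavior-with-deadline-semantics abstraction (criterion 2 above) can be sketched as a release loop with absolute activation times, so that one overrun does not shift every later activation. This is a generic sketch, not RT-Bench's actual implementation:

```python
import time

def run_periodic(job, period_s, iterations):
    """Release `job` every period_s seconds; count deadline misses.

    The deadline is implicit (equal to the period). Absolute release
    times keep activations anchored to the original schedule.
    """
    misses, t0 = 0, time.monotonic()
    for k in range(iterations):
        release = t0 + k * period_s
        now = time.monotonic()
        if now < release:
            time.sleep(release - now)   # wait for this activation
        job()
        if time.monotonic() > release + period_s:
            misses += 1                 # finished after the deadline
    return misses

# A job well under the 10 ms period should rarely, if ever, miss.
misses = run_periodic(lambda: sum(range(1000)), 0.010, 20)
print(misses)
```

A production framework would additionally pin the scheduling policy (e.g., SCHED_FIFO), lock memory to avoid page faults, and log per-activation response times, which is what RT-Bench's unified interface automates.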