
    Combining Processor Virtualization and Split Compilation for Heterogeneous Multicore Embedded Systems

    Complex embedded systems have always been heterogeneous multicore systems. Because of the tight constraints on power, performance and cost, this situation is not likely to change any time soon. As a result, the software environments required to program those systems have become very complex too. We propose to apply instruction set virtualization and just-in-time compilation techniques to program heterogeneous multicore embedded systems, with several additional requirements: * the environment must be able to compile legacy C/C++ code to a target-independent intermediate representation; * the just-in-time (JIT) compiler must generate high-performance code; * the technology must be able to program the whole system, not just the host processor. The advantages of such an environment include, among others, much simpler software engineering, reduced maintenance costs and fewer legacy code problems. It also goes beyond mere binary compatibility by better exploiting the hardware platform. We also propose to combine processor virtualization with split compilation to improve the performance of the JIT compiler. Taking advantage of the two-step compilation process, we want to make it possible to run very aggressive optimizations online, even on a very constrained system.

    08441 Abstracts Collection -- Emerging Uses and Paradigms for Dynamic Binary Translation

    From 26.10.2008 to 31.10.2008, the Dagstuhl Seminar 08441 "Emerging Uses and Paradigms for Dynamic Binary Translation" was held in Schloss Dagstuhl – Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided where available.

    Combining Processor Virtualization and Component-Based Engineering in C for Heterogeneous Many-Core Platforms

    Embedded system design is driven by strong efficiency constraints in terms of performance, silicon area and power consumption, as well as by pressure on cost and time-to-market. Today this raises three tough problems: 1) many-core systems are becoming mainstream, yet there is still no satisfactory approach for distributing software applications across those platforms; 2) these systems still integrate heterogeneous processors for efficiency reasons, so programming them requires complex compilation environments; 3) hardware resources are precious, and low-level languages remain a must to exploit them precisely. These factors have a negative impact on the programmability of many-core platforms and limit our ability to address the challenges of the next decade. This paper devises a new programming approach that leverages processor virtualization and component-based software engineering to address these issues together. It presents a programming model based on C for developing fine-grained component-based applications, and a toolset that compiles them into a processor-independent bytecode representation that can be deployed on heterogeneous platforms. We also discuss the effectiveness of this approach and present some ideas that might play a key role in addressing the above challenges.

    Improving early design stage timing modeling in multicore based real-time systems

    This paper presents a modelling approach for the timing behavior of real-time embedded systems (RTES) in early design phases. The model focuses on multicore processors - accepted as the next computing platform for RTES - and in particular predicts the contention tasks suffer when accessing multicore on-chip shared resources. The model has the key properties of not requiring the application's source code or binary, and of offering high accuracy and low overhead. The former is of paramount importance in the common scenario in which several software suppliers work in parallel implementing different applications for a system integrator, subject to different intellectual property (IP) constraints. Our model helps reduce the risk of exceeding the assigned budgets for each application in late design stages, and its associated costs. This work has received funding from the European Space Agency under Project Reference AO=17722=13=NL=LvH, and has also been supported by the Spanish Ministry of Science and Innovation grant TIN2015-65316-P. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.

    Vector processor virtualization: distributed memory hierarchy and simultaneous multithreading

    Taking advantage of DLP (Data-Level Parallelism) is indispensable in most data-streaming and multimedia applications. Several architectures have been proposed to improve both the performance and the energy consumption of such applications. Superscalar and VLIW (Very Long Instruction Word) processors, along with SIMD (Single-Instruction Multiple-Data) and vector processor (VP) accelerators, are among the options available to designers for meeting their requirements. On the other hand, these choices turn out to be large resource and energy consumers, and they are not always used efficiently, due to data dependencies among instructions and the limited portion of vectorizable code in the applications that deploy them. This dissertation proposes an innovative architecture for a multithreaded VP which separates the path for performing data shuffles and memory-indexed accesses from the data path for executing other vector instructions that access the memory. This separation speeds up the most common memory access operations by avoiding extra delays and unnecessary stalls. In this multilane-based VP design, each vector lane uses its own private memory to avoid stalls during memory access instructions. More importantly, the proposed VP has an innovative multithreaded architecture which makes it highly suitable for concurrent sharing in multicore environments. To this end, the VP, which is developed in VHDL and prototyped on an FPGA (Field-Programmable Gate Array), serves as a coprocessor for one or more scalar cores in the various system architectures presented in the dissertation. In the first system architecture, the VP is allocated exclusively to a single scalar core. Benchmarking shows that the VP can achieve very high performance. The inclusion of distributed data shuffle engines across vector lanes has a spectacular impact on execution time, primarily for applications like the FFT (Fast Fourier Transform) that require large amounts of data shuffling.
In the second system architecture, a VP virtualization technique is presented which, when applied, enables the multithreaded VP to simultaneously execute many threads of various vector lengths. The threads compete for VP resources with the goal of improving aggregate VP utilization. This approach yields high VP utilization even when utilization by the individual threads is low. A vector register file (VRF) virtualization technique dynamically allocates physical vector registers to running threads. The technique is implemented for a multi-core processor embedded in an FPGA. Under dynamic thread creation, benchmarking demonstrates large VP speedups and drastic energy savings compared to the first system architecture. In the last system architecture, further improvements focus on VP virtualization relying exclusively on hardware. Moreover, a pipelined data shuffle network replaces the non-pipelined shuffle engines. The VP can then take advantage of identical instruction flows that may be present in different vector applications by running in a fused instruction mode that increases its utilization. A power dissipation model is introduced, along with two optimization policies that minimize either the consumed energy or the energy-runtime product for a given application. Benchmarking shows the positive impact of these optimizations.

    Infrastructures and Compilation Strategies for the Performance of Computing Systems

    This document presents our main contributions to the field of compilation, and more generally to the quest for performance of computing systems. It is structured by type of execution environment, from static compilation (execution of native code), to JIT compilation, and purely dynamic optimization. We also consider interpreters. In each chapter, we focus on the most relevant contributions. Chapter 2 describes our work on static compilation. It covers a long time frame (from PhD work in 1995-1998 to recent work on real-time systems and worst-case execution times at Inria in 2015) and various positions, both in academia and in industry. My research on JIT compilers started in the mid-2000s at STMicroelectronics, and is still ongoing. Chapter 3 covers the results we obtained on various aspects of JIT compilers: split compilation, interaction with real-time systems, and obfuscation. Chapter 4 reports on dynamic binary optimization, a research effort started more recently, in 2012. This considers the optimization of a native binary (without source code) while it runs, which raises significant challenges but also opportunities. Interpreters represent an alternative way to execute code. Instead of generating native code, an interpreter executes an infinite loop that continuously reads an instruction, decodes it and executes its semantics. Interpreters are much easier to develop than compilers, and they are also much more portable, often requiring only a simple recompilation. The price to pay is reduced performance. Chapter 5 presents some of our work related to interpreters. All this research often required significant software infrastructures for validation, from early prototypes to robust quasi-products, and from open-source to proprietary. We detail them in Chapter 6. The last chapter concludes and gives some perspectives.

    A Prior Study of Split Compilation and Approximate Floating-Point Computations

    From the perspective of future heterogeneous multicore processors, we studied several optimization processes specific to floating-point numbers during this internship. To adapt to this new generation of processors, we believe split compilation has an important role to play in extracting the maximum benefit from them. The purpose of our internship is to assess the potential tradeoff between accuracy and speedup of floating-point computations, as a prior study to promote the notion of split compilation. In many cases, we can optimize a target program more successfully if we can apply appropriate optimizations dynamically, at the right time. However, an online compiler cannot always apply aggressive optimizations, because it has to respect constraints such as memory resources or compilation time. To overcome these constraints, split compilation can combine statically analyzed information with timely dynamic information. For this reason, in the bibliographic part, we mainly studied several compilation methods and floating-point numbers, so as to move smoothly into our internship work.