
    Combining Processor Virtualization and Split Compilation for Heterogeneous Multicore Embedded Systems

    Complex embedded systems have always been heterogeneous multicore systems. Because of the tight constraints on power, performance and cost, this situation is not likely to change any time soon. As a result, the software environments required to program those systems have become very complex too. We propose to apply instruction set virtualization and just-in-time compilation techniques to program heterogeneous multicore embedded systems, with several additional requirements: * the environment must be able to compile legacy C/C++ code to a target-independent intermediate representation; * the just-in-time (JIT) compiler must generate high-performance code; * the technology must be able to program the whole system, not just the host processor. The advantages of such an environment include, among others, much simpler software engineering, reduced maintenance costs and fewer legacy code problems. It also goes beyond mere binary compatibility by better exploiting the hardware platform. We also propose to combine processor virtualization with split compilation to improve the performance of the JIT compiler. Taking advantage of the two-step compilation process, we want to make it possible to run very aggressive optimizations online, even on a very constrained system.

    08441 Abstracts Collection -- Emerging Uses and Paradigms for Dynamic Binary Translation

    From 26.10.2008 to 31.10.2008, the Dagstuhl Seminar 08441 "Emerging Uses and Paradigms for Dynamic Binary Translation" was held in Schloss Dagstuhl – Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided where available.

    Combining Processor Virtualization and Component-Based Engineering in C for Heterogeneous Many-Core Platforms

    Embedded system design is driven by strong efficiency constraints in terms of performance, silicon area and power consumption, as well as by pressure on cost and time-to-market. Today this raises three tough problems: 1) many-core systems are becoming mainstream, yet there is still no satisfactory approach for distributing software applications across those platforms; 2) these systems still integrate heterogeneous processors for efficiency reasons, so programming them requires complex compilation environments; 3) hardware resources are precious, and low-level languages remain a must to exploit them precisely. These factors have a negative impact on the programmability of many-core platforms and limit our ability to address the challenges of the next decade. This paper devises a new programming approach that leverages processor virtualization and component-based software engineering to address these issues together. It presents a programming model based on C for developing fine-grained component-based applications, and a toolset that compiles them into a processor-independent bytecode representation that can be deployed on heterogeneous platforms. We also discuss the effectiveness of this approach and present some ideas that might play a key role in addressing the above challenges.

    Improving early design stage timing modeling in multicore based real-time systems

    This paper presents a modelling approach for the timing behavior of real-time embedded systems (RTES) in early design phases. The model focuses on multicore processors - accepted as the next computing platform for RTES - and in particular predicts the contention tasks suffer when accessing multicore on-chip shared resources. The model has the key properties of not requiring the application's source code or binary, and of offering high accuracy and low overhead. The former is of paramount importance in the common scenario in which several software suppliers work in parallel implementing different applications for a system integrator, subject to different intellectual property (IP) constraints. Our model helps reduce the risk of exceeding the assigned budgets for each application in late design stages, and its associated costs. This work has received funding from the European Space Agency under Project Reference AO=17722=13=NL=LvH, and has also been supported by the Spanish Ministry of Science and Innovation grant TIN2015-65316-P. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.

    Vector processor virtualization: distributed memory hierarchy and simultaneous multithreading

    Taking advantage of DLP (Data-Level Parallelism) is indispensable in most data-streaming and multimedia applications. Several architectures have been proposed to improve both the performance and the energy consumption of such applications. Superscalar and VLIW (Very Long Instruction Word) processors, along with SIMD (Single-Instruction Multiple-Data) and vector processor (VP) accelerators, are among the options available to designers for meeting their requirements. On the other hand, these choices turn out to be large resource and energy consumers, and they are not always used efficiently, due to data dependencies among instructions and the limited portion of vectorizable code in the applications that deploy them. This dissertation proposes an innovative architecture for a multithreaded VP which separates the path for performing data shuffles and memory-indexed accesses from the data path for executing other vector instructions that access the memory. This separation speeds up the most common memory access operations by avoiding extra delays and unnecessary stalls. In this multilane-based VP design, each vector lane uses its own private memory to avoid stalls during memory access instructions. More importantly, the proposed VP has an innovative multithreaded architecture which makes it highly suitable for concurrent sharing in multicore environments. To this end, the VP, which is developed in VHDL and prototyped on an FPGA (Field-Programmable Gate Array), serves as a coprocessor for one or more scalar cores in the various system architectures presented in the dissertation. In the first system architecture, the VP is allocated exclusively to a single scalar core. Benchmarking shows that the VP can achieve very high performance. The inclusion of distributed data shuffle engines across vector lanes has a spectacular impact on execution time, primarily for applications like the FFT (Fast Fourier Transform) that require large amounts of data shuffling.
In the second system architecture, a VP virtualization technique is presented which, when applied, enables the multithreaded VP to simultaneously execute many threads of various vector lengths. The threads compete for VP resources with the goal of improving aggregate VP utilization. This approach yields high VP utilization even when utilization by the individual threads is low. A vector register file (VRF) virtualization technique dynamically allocates physical vector registers to running threads. The technique is implemented for a multi-core processor embedded in an FPGA. Under dynamic thread creation, benchmarking demonstrates large VP speedups and drastic energy savings compared to the first system architecture. In the last system architecture, further improvements focus on VP virtualization relying exclusively on hardware. Moreover, a pipelined data shuffle network replaces the non-pipelined shuffle engines. The VP can then take advantage of identical instruction flows that may be present in different vector applications by running in a fused instruction mode that increases its utilization. A power dissipation model is introduced, along with two optimization policies that minimize either the consumed energy or the energy-runtime product for a given application. Benchmarking shows the positive impact of these optimizations.

    Infrastructures and Compilation Strategies for the Performance of Computing Systems

    This document presents our main contributions to the field of compilation, and more generally to the quest for performance of computing systems. It is structured by type of execution environment, from static compilation (execution of native code), to JIT compilation, and purely dynamic optimization. We also consider interpreters. In each chapter, we focus on the most relevant contributions. Chapter 2 describes our work on static compilation. It covers a long time frame (from PhD work in 1995-1998 to recent work on real-time systems and worst-case execution times at Inria in 2015) and various positions, both in academia and in industry. My research on JIT compilers started in the mid-2000s at STMicroelectronics, and is still ongoing. Chapter 3 covers the results we obtained on various aspects of JIT compilers: split compilation, interaction with real-time systems, and obfuscation. Chapter 4 reports on dynamic binary optimization, a research effort started more recently, in 2012. This considers the optimization of a native binary (without source code) while it runs, which raises significant challenges but also opportunities. Interpreters represent an alternative way to execute code. Instead of generating native code, an interpreter executes an infinite loop that continuously reads an instruction, decodes it and executes its semantics. Interpreters are much easier to develop than compilers, and they are also much more portable, often requiring only a simple recompilation. The price to pay is reduced performance. Chapter 5 presents some of our work related to interpreters. All this research often required significant software infrastructures for validation, from early prototypes to robust quasi-products, and from open-source to proprietary. We detail them in Chapter 6. The last chapter concludes and gives some perspectives.

    A Prior Study of Split Compilation and Approximate Floating-Point Computations

    From the perspective of future heterogeneous multicore processors, we studied several optimization processes specific to floating-point numbers during this internship. To adapt to this new generation of processors, we believe split compilation has an important role to play in extracting the maximum benefit from them. The purpose of our internship is to assess the potential tradeoff between accuracy and speedup of floating-point computations, as a prior study to promote the notion of split compilation. In many cases, we can optimize a target program more successfully if we can apply appropriate optimizations dynamically, at the right time. However, an online compiler cannot always apply aggressive optimizations, because it has to respect constraints such as memory resources or compilation time. To overcome these constraints, split compilation can combine statically analyzed information with timely dynamic information. For this reason, in the bibliographic part, we mainly studied several compilation methods and floating-point numbers, so as to move smoothly into our internship work.