271 research outputs found

    Multimedia terminal system-on-chip design and simulation

    Get PDF
    This paper proposes a design approach based on integrated architectural and system-on-chip (SoC) simulations. The main idea is to have an efficient framework for the design and the evaluation of multimedia terminals, allowing a fast system simulation with a definable degree of accuracy. The design approach includes the simulation of very long instruction word (VLIW) digital signal processors (DSPs), the utilization of a device multiplexing the media streams, and the emulation of the real-time media acquisition. This methodology allows the evaluation of both the multimedia algorithm implementations and the hardware platform, giving feedback on the complete SoC including the interaction between modules and conflicts in accessing either the bus or shared resources. An instruction set architecture (ISA) simulator and an SoC simulation environment compose the integrated framework. In order to validate this approach, the evaluation of an audio-video multiprocessor terminal is presented, and the complete simulation test results are reported

    Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors

    Get PDF
    Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase instruction throughput of the device. Embedded processors that contain cache memories open an opportunity for the low-power research community to model the impact of cache energy consumption and throughput gains. For optimal cache memory configuration mathematical models have been proposed in the past. Most of these models are complex enough to be adapted for modern applications like run-time cache reconfiguration. This paper improves and validates previously proposed energy and throughput models for a data cache, which could be used for overhead analysis for various cache types with relatively small amount of inputs. These models analyze the energy and throughput of a data cache on an application basis, thus providing the hardware and software designer with the feedback vital to tune the cache or application for a given energy budget. The models are suitable for use at design time in the cache optimization process for embedded processors considering time and energy overhead or could be employed at runtime for reconfigurable architectures

    Elementary operations: a novel concept for source-level timing estimation

    Get PDF
    Early application timing estimation is essential in decision making during design space exploration of heterogeneous embedded systems in terms of hardware platform dimensioning and component selection. The decisions which have the impact on project duration and cost must be made before a platform prototype is available and software code is ready to be linked and thus timing estimation must be done using high-level models and simulators. Because of the ever increasing need to shorten the time to market, reducing the amount of time required to obtain the results is as important as achieving high estimation accuracy. In this paper, we propose a novel approach to source-level timing estimation with the aim to close the speed-accuracy gap by raising the level of abstraction and improving result reusability. We introduce a concept – elementary operations as distinct parts of source code which enable capturing platform behaviour without having the exact model of the processor pipeline, cache etc. We also present a timing estimation method which relies on elementary operations to craft hardware profiling benchmark and to build application and platform profiles. Experiments show an average estimation error of 5%, with maximum below 16%

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Full text link
    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

    A hybrid genetic algorithm for constrained hardware- software partitioning

    Get PDF
    In this article, we propose a novel partitioning method for hardware-software codesign based on a genetic algorithm that has been enhanced for this specific task. Given a high- level program and an area constraint, our software considers different granularities levels to discover the most interesting blocks to be implemented in ad hoc functional units that can then be used as new instructions in a Move processor. Various optimizations are conducted to obtain a clean, very fast (in the order of a few seconds) and efficient partitioning on programs ranging from a few to several hundreds of lines of code

    Compiled Low-Level Virtual Instruction Set Simulation and Profiling for Code Partitioning and ASIP-Synthesis

    Get PDF
    Abstract We present ongoing work and first results in static and detailed quantitative runtime analysis of LLVM byte code for the purpose of automatic procedural level partitioning and cosynthesis of complex software systems. Runtime behaviour is captured by reverse compilation of LLVM bytecode into augmented, self-profiling ANSI-C simulator programs retaining the LLVM instruction level. The actual global data flow is captured both in quantity and value range to guide function unit layout in the synthesis of application specific processors. Currently the implemented tool LLILA (Low Level Intermediate Language Analyzer) focuses on static code analysis on the inter-procedural data flow via e.g. function parameters and global variables to uncover a program's potential paths of data exchange

    HW/SW codesign techniques for dynamically reconfigurable architectures

    Full text link

    An Efficient Power Estimation Methodology for Complex RISC Processor-based Platforms

    Get PDF
    International audienceIn this contribution, we propose an efficient power estima- tion methodology for complex RISC processor-based plat- forms. In this methodology, the Functional Level Power Analysis (FLPA) is used to set up generic power models for the different parts of the system. Then, a simulation framework based on virtual platform is developed to evalu- ate accurately the activities used in the related power mod- els. The combination of the two parts above leads to a het- erogeneous power estimation that gives a better trade-off be- tween accuracy and speed. The usefulness and effectiveness of our proposed methodology is validated through ARM9 and ARM CortexA8 processor designed respectively around the OMAP5912 and OMAP3530 boards. This efficiency and the accuracy of our proposed methodology is evaluated by using a variety of basic programs to complete media bench- marks. Estimated power values are compared to real board measurements for the both ARM940T and ARM CortexA8 architectures. Our obtained power estimation results pro- vide less than 3% of error for ARM940T processor, 3.5% for ARM CortexA8 processor-based system and 1x faster compared to the state-of-the-art power estimation tools

    Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator

    Get PDF
    Abstract. Instruction set simulators (ISS) are vital tools for compiler and processor architecture design space exploration and verification. State-of-the-art simulators using just-in-time (JIT) dynamic binary translation (DBT) techniques are able to simulate complex embedded processors at speeds above 500 MIPS. However, these functional ISS do not provide microarchitectural observability. In contrast, low-level cycle-accurate ISS are too slow to simulate full-scale applications, forcing developers to revert to FPGA-based simulations. In this paper we demonstrate that it is possible to run ultra-high speed cycle-accurate instruction set simulations surpassing FPGA-based simulation speeds. We extend the JIT DBT engine of our ISS and augment JIT generated code with a verified cycle-accurate processor model. Our approach can model any microarchitectural configuration, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded processor implementing the ARCompact TM instruction set architecture (ISA). We achieve simulation speeds up to 88 MIPS on a standard x86 desktop computer for the industry standard EEMBC, COREMARK and BIOPERF benchmark suites.
    • 

    corecore