5 research outputs found

    Average-case analysis of power consumption in embedded systems

    Power efficiency is one of the most important constraints in the design of embedded systems, since such systems are generally driven by batteries with a limited energy budget or a restricted power supply. In every embedded system, there are one or more processor cores that run the software and interact with the other hardware components of the system. The power consumption of the processor core(s) has an important impact on the total power dissipated in the system. Hence, processor power optimization is crucial for satisfying the power consumption constraints and developing low-power embedded systems. A key aspect of research in processor power optimization and management is power estimation. Having a fast and accurate method for processor power estimation at design time helps the designer explore a large space of design possibilities and make optimal choices for developing a power-efficient processor. Likewise, understanding the processor power dissipation behaviour of a specific software application is key to choosing appropriate algorithms and writing power-efficient software. Simulation-based methods for measuring processor power achieve very high accuracy, but are available only late in the design process and are often quite slow. Therefore, the need has arisen for faster, higher-level power prediction methods that allow the system designer to explore many alternatives for developing power-efficient hardware and software. The aim of this thesis is to present fast, high-level power models for the prediction of processor power consumption. Power predictability in this work is achieved in two ways: first, by using a design method to develop power-predictable circuits; second, by analysing the power of the functions in the code that repeat during execution, then building the power model based on the average number of repetitions. In the first case, a design method called Asynchronous Charge Sharing Logic (ACSL) is used to implement the Arithmetic Logic Unit (ALU) of the 8051 microcontroller. ACSL circuits are power predictable because their power consumption is independent of the input data. Based on this property, a fast prediction method is presented that estimates the power of the ALU by analysing the software program and extracting the number of ALU-related instructions. This method achieves less than 1% error in power estimation and more than 100 times speedup in comparison to conventional simulation-based methods. In the second case, an average-case processor energy model is developed for the insertion sort algorithm based on the number of comparisons that take place during execution of the algorithm. The average number of comparisons is calculated using a high-level methodology called MOdular Quantitative Analysis (MOQA). The parameters of the energy model are measured for the LEON3 processor core, but the model is general and can be used for any processor. The model has been validated through power measurement experiments, and offers high accuracy and orders-of-magnitude speedup over simulation-based methods.
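
    As an illustration of the kind of average-case energy model described above, the sketch below (in C, with placeholder coefficients rather than the thesis's measured LEON3 values) scales the expected comparison count of insertion sort, approximately n(n-1)/4 on a uniformly random input, by an assumed per-comparison energy cost plus an assumed fixed per-run overhead. The function and constant names are illustrative assumptions, not the author's model code.

    #include <stdio.h>

    /* Minimal sketch of an average-case energy estimate for insertion sort:
     * expected comparisons (~ n*(n-1)/4, the expected number of inversions)
     * times an assumed per-comparison energy, plus an assumed fixed overhead.
     * All coefficients are placeholders, not measured values for any core. */
    static double avg_comparisons(unsigned n)
    {
        return (double)n * (n - 1) / 4.0;
    }

    static double estimated_energy_nj(unsigned n, double e_cmp_nj, double e_base_nj)
    {
        return e_base_nj + avg_comparisons(n) * e_cmp_nj;
    }

    int main(void)
    {
        const double e_cmp_nj  = 2.5;    /* placeholder energy per comparison (nJ) */
        const double e_base_nj = 120.0;  /* placeholder fixed overhead per run (nJ) */

        for (unsigned n = 16; n <= 256; n *= 2)
            printf("n=%3u  expected comparisons=%8.1f  estimated energy=%9.1f nJ\n",
                   n, avg_comparisons(n), estimated_energy_nj(n, e_cmp_nj, e_base_nj));
        return 0;
    }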

    High-level synthesis for FPGAs: From prototyping to deployment

    Escalating System-on-Chip design complexity is pushing the design community to raise the level of abstraction beyond RTL. Despite the unsuccessful adoption of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to an HLS methodology is happening now, especially for FPGA designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advances in core HLS algorithms, and a domain-specific approach. In this paper we use AutoESL's AutoPilot HLS tool, coupled with domain-specific system-level implementation platforms developed by Xilinx, as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparisons of HLS solutions against optimized manual designs. Index Terms: domain-specific design, field-programmable gate array (FPGA), high-level synthesis (HLS), quality of results (QoR).
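
    To make the C-to-FPGA flow concrete, here is a minimal sketch (not taken from the paper) of the style of C kernel such a tool can synthesize into hardware. The pipeline directive follows the syntax of Vivado HLS, the successor to AutoPilot, and is an assumption about tooling; an ordinary C compiler simply ignores the pragma.

    #define N 256

    /* Multiply-accumulate over two fixed-size vectors: a typical HLS kernel
     * where the loop is pipelined to produce one partial result per cycle. */
    void vec_mac(const int a[N], const int b[N], int *result)
    {
        int acc = 0;
    mac_loop:
        for (int i = 0; i < N; i++) {
    #pragma HLS PIPELINE II=1  /* assumed Vivado HLS syntax: initiation interval of one cycle */
            acc += a[i] * b[i];
        }
        *result = acc;
    }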

    Exploring abstract interfaces in system-on-chip integration

    Modern mobile devices are marvels of computation. They can encode high definition video, processing and compressing over 350MB/s of image data in real time. They have no trouble driving displays with as much resolution as a full laptop, and smartphone manufacturers boast of running games with console-quality graphics. Mobile devices pack all of this computational power into a 12\ handheld package by integrating a number of specialized hardware accelerators (IP) along with conventional CPUs and GPUs in a system on chip (SoC). Unfortunately, creating these specialized systems is becoming increasingly expensive. Since hardware accelerators come from a number of different sources and design cycles, different accelerator blocks will often have incompatible hardware interfaces. Therefore, a large portion of SoC design cost comes in the form of designers manually interfacing each accelerator into a system. This work includes everything from building custom logic to wire up a block, to developing the drivers and API needed to take advantage of the hardware. My research focuses on generating these interfaces, including the physical hardware used to tie IP blocks into a system and the associated software collateral. Leveraging recent trends such as High Level Synthesis and other hardware generator methodologies, I propose an IP interface abstraction and parameterization designed to describe the interface of most current IP blocks. By encoding this knowledge at a higher level of abstraction, I am able to construct and demonstrate a hardware generator that maps an interface protocol description into synthesizable register transfer language (RTL), and that can automatically create hardware bridges between different interconnect standards. To ease the integration of the next generation of IP blocks, blocks that are automatically generated from a user specification, I propose a set of interface primitives. When integrated into an IP generator, these primitives can automatically generate an interface that my interface system can tie to the rest of the system. I also demonstrate how the information stored in these primitives can be used to automatically generate a low-level software driver that manages access to the IP blocks. Finally, I show how the simulation environment provided with an IP generator can be used to provide a domain-appropriate application programming interface (API) to drive the software. Using an image signal processor generator as my platform, I demonstrate the construction of a map between the simulation software and the hardware driver that enables a full one-button flow from algorithm development to applications running on specialized hardware within a working system.
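
    As a hypothetical illustration of the idea of describing an IP block's interface as data that a generator can consume, the sketch below defines a small interface descriptor in C and a stand-in "generator" that only prints a summary; a real flow, as the abstract describes, would emit synthesizable RTL bridges and a low-level driver. The type, field, and register-address values are invented for illustration and are not taken from this thesis.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative (assumed) description of one streaming port on an IP block. */
    typedef enum { HANDSHAKE_VALID_READY, HANDSHAKE_REQ_ACK } handshake_t;

    typedef struct {
        const char *name;        /* port name on the IP block              */
        unsigned    data_width;  /* payload width in bits                  */
        handshake_t handshake;   /* flow-control protocol on this port     */
        uint32_t    reg_base;    /* base address of the control registers  */
    } ip_interface_desc;

    /* Stand-in for a generator pass: here it only prints a summary, whereas
     * a real generator would emit RTL glue logic and driver code. */
    static void describe(const ip_interface_desc *d)
    {
        printf("%s: %u-bit, %s handshake, regs @ 0x%08x\n",
               d->name, d->data_width,
               d->handshake == HANDSHAKE_VALID_READY ? "valid/ready" : "req/ack",
               (unsigned)d->reg_base);
    }

    int main(void)
    {
        ip_interface_desc isp_out = { "isp_pixels_out", 64, HANDSHAKE_VALID_READY, 0x43C00000u };
        describe(&isp_out);
        return 0;
    }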

    Execution platform modeling for system-level architecture performance analysis

    Today's embedded systems are designed for more complex and more computationally intensive applications than they were a decade ago. Most, if not every, embedded system designed today is a kind of parallel computing system, only called differently: a platform. A platform is essentially a heterogeneous system consisting of communicating processing units of different types and mostly distributed memory units. A platform can be anything from a multiprocessor comprising task-dedicated processors and a dedicated communication network, to a (semi-)programmable multiprocessor that can run parallel processes by means of both interleaving and overlapping. However, the specification, exploration and design of application multiprocessor system platforms from user requirements is still a painstaking process that takes too long and is too costly. Our answer to the above-mentioned issues is the Archer approach. It embodies application representations (Symbolic Programs, SP), a platform-based library of architecture components and their configurations (all-in-hardware, all-in-software, or hybrid multiprocessor, with a dedicated network, shared-bus, highway, burst-bus, or hybrid network), and a mapping methodology (managing the aforementioned representations while transforming application SPs into Archer architecture components) that we have been developing and experimenting with.