229 research outputs found

    Influence of Memory Hierarchies on Predictability for Time Constrained Embedded Software

    Full text link
    Safety-critical embedded systems having to meet real-time constraints are expected to be highly predictable in order to guarantee at design time that certain timing deadlines will always be met. This requirement usually prevents designers from utilizing caches due to their highly dynamic, thus hardly predictable behavior. The integration of scratchpad memories represents an alternative approach which allows the system to benefit from a performance gain comparable to that of caches while at the same time maintaining predictability. In this work, we compare the impact of scratchpad memories and caches on worst case execution time (WCET) analysis results. We show that caches, despite requiring complex techniques, can have a negative impact on the predicted WCET, while the estimated WCET for scratchpad memories scales with the achieved Performance gain at no extra analysis cost.Comment: Submitted on behalf of EDAA (http://www.edaa.com/

    An improved instruction-level power model for ARM11 microprocessor

    No full text
    The power and energy consumed by a chip has become the primary design constraint for embedded systems, which has led to a lot of work in hardware design techniques such as clock gating and power gating. The software can also affect the power usage of a chip, hence good software design can be used to reduce the power further. In this paper we present an instruction-level power model based on an ARM1176JZF-S processor to predict the power of software applications. Our model takes substantially less input data than existing high accuracy models and does not need to consider each instruction individually. We show that the power is related to both the distribution of instruction types and the operations per clock cycle (OPC) of the program. Our model does not need to consider the effect of two adjacent instructions, which saves a lot of calculation and measurements. Pipeline stall effects are also considered by OPC instead of cache miss, because there are a lot of other reasons that can cause the pipeline to stall. The model shows good performance with a maximum estimation error of -8.28\% and an average absolute estimation error is 4.88\% over six benchmarks. Finally, we prove that energy per operation (EPO) decreases with increasing operations per clock cycle, and we confirm the relationship empirically

    HW-SW Emulation Framework for Temperature-Aware Design in MPSoCs

    Get PDF
    New tendencies envisage Multi-Processor Systems-On-Chip (MPSoCs) as a promising solution for the consumer electronics market. MPSoCs are complex to design, as they must execute multiple applications (games, video), while meeting additional design constraints (energy consumption, time-to-market). Moreover, the rise of temperature in the die for MPSoCs can seriously affect their final performance and reliability. In this paper, we present a new hardware-software emulation framework that allows designers a complete exploration of the thermal behavior of final MPSoC designs early in the design flow. The proposed framework uses FPGA emulation as the key element to model the hardware components of the considered MPSoC platform at multi-megahertz speeds. It automatically extracts detailed system statistics that are used as input to our software thermal library running in a host computer. This library calculates at run-time the temperature of on-chip components, based on the collected statistics from the emulated system and the final floorplan of the MPSoC. This enables fast testing of various thermal management techniques. Our results show speed-ups of three orders of magnitude compared to cycle-accurate MPSoC simulator

    A prototype node for wireless vision sensor network applications development

    Get PDF
    This paper presents a prototype vision-enabled sensor node based on a commercial vision system of reduced size and power consumption. The wireless infrastructure for the deployment of a distributed smart camera network based on these nodes is provided by commercial motes. The smart camera, based on a low-power bio-inspired processing scheme, enables in-node image processing and vision tools. This permits to elaborate a lighter representation of the scene, keeping the relevant information in terms of detected elements, features and events, alleviating the data transmission through the network. Therefore by passing only the relevant information to the neighboring sensor nodes, distributed and collaborative vision is possible with the limited data rates available in commercial wireless sensor networks. Communication between the different components of the system is supported by the available UARTs and GPIOs. Several examples of in-node image processing and feature detection has been tested in the prototype, and information at different abstraction levels has been broadcasted to the network.Junta de AndalucĂ­a 2006-TIC-2352Ministerio de Ciencia e InnovaciĂłn TEC2009-1181

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Get PDF

    Life Cycle Aware Computing: Reusing Silicon Technology

    Get PDF
    Despite the high costs associated with processor manufacturing, the typical chip is used for only a fraction of its expected lifetime. Reusing processors would create a food chain of electronic devices that amortizes the energy required to build chips over several computing generations

    Loop Nest Splitting for WCET-Optimization and Predictability Improvement

    Get PDF
    This paper presents the influence of the loop nest splitting source code optimization on the worst-case execution time (WCET). Loop nest splitting minimizes the number of executed if-statements in loop nests of embedded multimedia applications. Especially loops and if-statements of high-level languages are an inherent source of unpredictability and loss of precision for WCET analysis. This is caused by the fact that it is difficult to obtain safe and tight worst-case estimates of an application\u27s flow of control through these high-level constructs. In addition, the corresponding control flow redirections expressed at the assembly level reduce predictability even more due to the complex pipeline and branch prediction behavior of modern embedded processors. The analysis techniques for loop nest splitting are based on precise mathematical models combined with genetic algorithms. On the one hand, these techniques achieve a significantly more homogeneous structure of the control flow. On the other hand, the precision of our analyses leads to the generation of very accurate high-level flow facts for loops and if-statements. The application of our implemented algorithms to three real-life multimedia benchmarks leads to average speed-ups by 25.0% - 30.1%, while WCET is reduced between 34.0% and 36.3%

    Canada’s Smallest Satellite: The Canadian Advanced Nanospace Experiment (CanX-1)

    Get PDF
    The Canadian Advanced Nanospace eXperiment (CanX) Program of the Space Flight Laboratory at the University of Toronto Institute for Aerospace Studies (UTIAS/SFL) is a Canadian first, allowing engineering researchers to test nano- and micro-scale devices rapidly and inexpensively in space. CanX is a “picosatellite” program for research and education, with graduate students leading the design, development, testing, and operations of Canada’s smallest satellites having a mass under 1 kg. The first UTIAS/SFL picosatellite, CanX-1, is scheduled for launch in early 2003 together with CubeSats from other university and industry developers. The objective of the CanX-1 mission is to verify the functionality of several novel electronic technologies in orbital space. This paper outlines the features, capabilities and performance of CanX-1, including horizon and star-tracking experiments using two CMOS imagers, active threeaxis magnetic stabilization, GPS-based position determination, and an ARM7 central processor

    Dynamically reconfigurable asynchronous processor

    Get PDF
    The main design requirements for today's mobile applications are: · high throughput performance. · high energy efficiency. · high programmability. Until now, the choice of platform has often been limited to Application-Specific Integrated Circuits (ASICs), due to their best-of-breed performance and power consumption. The economies of scale possible with these high-volume markets have traditionally been able to hide the high Non-Recurring Engineering (NRE) costs required for designing and fabricating new ASICs. However, with the NREs and design time escalating with each generation of mobile applications, this practice may be reaching its limit. Designers today are looking at programmable solutions, so that they can respond more rapidly to changes in the market and spread costs over several generations of mobile applications. However, there have been few feasible alternatives to ASICs: Digital Signals Processors (DSPs) and microprocessors cannot meet the throughput requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much area and power. Coarse-grained dynamically reconfigurable architectures offer better solutions for high throughput applications, when power and area considerations are taken into account. One promising example is the Reconfigurable Instruction Cell Array (RICA). RICA consists of an array of cells with an interconnect that can be dynamically reconfigured on every cycle. This allows quite complex datapaths to be rendered onto the fabric and executed in a single configuration - making these architectures particularly suitable to stream processing. Furthermore, RICA can be programmed from C, making it a good fit with existing design methodologies. However the RICA architecture has a drawback: poor scalability in terms of area and power. As the core gets bigger, the number of sequential elements in the array must be increased significantly to maintain the ability to achieve high throughputs through pipelining. As a result, a larger clock tree is required to synchronise the increased number of sequential elements. The clock tree therefore takes up a larger percentage of the area and power consumption of the core. This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor (DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA architecture, but uses asynchronous design techniques - methods of designing digital systems without clocks. The absence of a global clock signal makes DRAP more scalable in terms of power and area overhead than its synchronous counterpart. The DRAP architecture maintains most of the benefits of custom asynchronous design, whilst also providing programmability via conventional high-level languages. Results show that the DRAP processor delivers considerably lower power consumption when compared to a market-leading Very Long Instruction Word (VLIW) processor and a low-power ARM processor. For example, DRAP resulted in a reduction in power consumption of 20 times compared to the ARM7 processor, and 29 times compared to the TIC64x VLIW, when running the same benchmark capped to the same throughput and for the same process technology (0.13Όm). When compared to an equivalent RICA design, DRAP was up to 22% larger than RICA but resulted in a power reduction of up to 1.9 times. It was also capable of achieving up to 2.8 times higher throughputs than RICA for the same benchmarks

    A framework to experiment optimizations for real-time and embedded software

    Get PDF
    Typical constraints on embedded systems include code size limits, upper bounds on energy consumption and hard or soft deadlines. To meet these requirements, it may be necessary to improve the software by applying various kinds of transformations like compiler optimizations, specific mapping of code and data in the available memories, code compression, etc. However, a transformation that aims at improving the software with respect to a given criterion might engender side effects on other criteria and these effects must be carefully analyzed. For this purpose, we have developed a common framework that makes it possible to experiment various code transfor-mations and to evaluate their impact of various criteria. This work has been carried out within the French ANR MORE project.Comment: International Conference on Embedded Real Time Software and Systems (ERTS2), Toulouse : France (2010
    • 

    corecore