Influence of Memory Hierarchies on Predictability for Time Constrained Embedded Software

Author: Marwedel Peter
Wehmeyer Lars
Publication date: 25/10/2007

Safety-critical embedded systems that must meet real-time constraints are expected to be highly predictable, so that designers can guarantee at design time that certain timing deadlines will always be met. This requirement usually prevents designers from using caches, whose highly dynamic behavior is hard to predict. Scratchpad memories offer an alternative that allows the system to benefit from a performance gain comparable to that of caches while maintaining predictability. In this work, we compare the impact of scratchpad memories and caches on worst-case execution time (WCET) analysis results. We show that caches, despite requiring complex analysis techniques, can have a negative impact on the predicted WCET, while the estimated WCET for scratchpad memories scales with the achieved performance gain at no extra analysis cost. Comment: Submitted on behalf of EDAA (http://www.edaa.com/)

arXiv.org e-Print Archive

CiteSeerX
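
The scratchpad-versus-cache argument in the abstract above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the analysis used in the paper: the latency constants, the function names, and the idea of passing in a set of "guaranteed hits" are all hypothetical. It shows why a scratchpad gives a tight bound for free (every access has a statically known latency), while a safe cache analysis must charge a miss wherever it cannot prove a hit.

```python
# Hypothetical latencies in cycles; none of these values come from the paper.
SPM_LATENCY = 1
CACHE_HIT_LATENCY = 1
MAIN_MEMORY_LATENCY = 10


def wcet_bound_scratchpad(accesses, spm_objects):
    """Bound for a scratchpad: placement is fixed at compile time,
    so each access has one statically known latency."""
    return sum(
        SPM_LATENCY if obj in spm_objects else MAIN_MEMORY_LATENCY
        for obj in accesses
    )


def wcet_bound_cache(accesses, guaranteed_hits):
    """Bound for a cache: only accesses (by index) that the analysis
    has proven to hit are charged the hit latency; every other access
    is conservatively treated as a miss."""
    return sum(
        CACHE_HIT_LATENCY if i in guaranteed_hits else MAIN_MEMORY_LATENCY
        for i in range(len(accesses))
    )


accesses = ["a", "b", "a", "b", "c"]
spm_bound = wcet_bound_scratchpad(accesses, spm_objects={"a", "b"})
cache_bound = wcet_bound_cache(accesses, guaranteed_hits={2})
print(spm_bound, cache_bound)
```

In this toy trace the cache might well hit on the second access to "b" at run time, but because the (hypothetical) analysis could not prove it, the bound still charges a miss. That gap between actual and provable behavior is the overestimation the abstract refers to.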

An improved instruction-level power model for ARM11 microprocessor

Author: Wang Wei
Zwolinski Mark
Publication date: 23/01/2014

The power and energy consumed by a chip have become the primary design constraint for embedded systems, which has led to a lot of work on hardware design techniques such as clock gating and power gating. Software also affects the power usage of a chip, so good software design can reduce power further. In this paper we present an instruction-level power model based on an ARM1176JZF-S processor to predict the power of software applications. Our model takes substantially less input data than existing high-accuracy models and does not need to consider each instruction individually. We show that the power is related to both the distribution of instruction types and the operations per clock cycle (OPC) of the program. Our model does not need to consider the effect of two adjacent instructions, which saves a lot of calculation and measurement. Pipeline stall effects are captured through OPC rather than through cache misses, because many other events can also cause the pipeline to stall. The model shows good performance, with a maximum estimation error of -8.28% and an average absolute estimation error of 4.88% over six benchmarks. Finally, we prove that energy per operation (EPO) decreases with increasing operations per clock cycle, and we confirm the relationship empirically.

Southampton (e-Prints Soton)
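
The two quantities reported in the abstract above, estimation error and energy per operation, can be sketched as follows. The formulas here are assumptions chosen for illustration and are not taken from the paper: the error is assumed to be the signed relative deviation of predicted from measured power, EPO is assumed to be power divided by operations per second, and the affine power model `0.3 + 0.2 * opc` with a 100 MHz clock is purely hypothetical.

```python
def estimation_errors(measured, predicted):
    """Return (largest-magnitude signed error, mean absolute error), in percent.
    Assumed definition: 100 * (predicted - measured) / measured."""
    signed = [100.0 * (p - m) / m for m, p in zip(measured, predicted)]
    max_error = max(signed, key=abs)
    mean_abs_error = sum(abs(e) for e in signed) / len(signed)
    return max_error, mean_abs_error


def energy_per_operation(power_watts, clock_hz, opc):
    """Assumed definition: joules per operation = power / (clock * OPC)."""
    return power_watts / (clock_hz * opc)


def hypothetical_power(opc):
    """Illustrative affine power model in watts; the coefficients are made up."""
    return 0.3 + 0.2 * opc


CLOCK_HZ = 100e6  # hypothetical clock frequency
for opc in (0.25, 0.5, 1.0):
    epo = energy_per_operation(hypothetical_power(opc), CLOCK_HZ, opc)
    print(f"OPC={opc:.2f}  EPO={epo:.3e} J")
```

Under any power model with a fixed baseline plus a term that grows at most linearly in OPC, EPO falls as OPC rises, because the baseline energy is spread over more operations per cycle. That is consistent with the trend the abstract states, though the paper's own derivation may differ.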

HW-SW Emulation Framework for Temperature-Aware Design in MPSoCs

Author: Braun G.
Brooks D.
David Atienza
Floyd E. A.
Francesco Poletti
Giacomo Paci
Giovanni De Micheli
Jalabert A.
Jose M. Mendias
Luca Benini
Pablo G. Del Valle
Paci G.
Rohou E.
Roman Hermida
Skadron K.
Vandevelde B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2007

Recent trends point to Multi-Processor Systems-on-Chip (MPSoCs) as a promising solution for the consumer electronics market. MPSoCs are complex to design, as they must execute multiple applications (games, video) while meeting additional design constraints (energy consumption, time-to-market). Moreover, the rise of temperature in the die for MPSoCs can seriously affect their final performance and reliability. In this paper, we present a new hardware-software emulation framework that allows designers to explore the thermal behavior of final MPSoC designs thoroughly, early in the design flow. The proposed framework uses FPGA emulation as the key element to model the hardware components of the considered MPSoC platform at multi-megahertz speeds. It automatically extracts detailed system statistics that are used as input to our software thermal library running on a host computer. This library calculates at run-time the temperature of on-chip components, based on the collected statistics from the emulated system and the final floorplan of the MPSoC. This enables fast testing of various thermal management techniques. Our results show speed-ups of three orders of magnitude compared to cycle-accurate MPSoC simulators.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A prototype node for wireless vision sensor network applications development

Author: Bakkali M.
Carmona Galán Ricardo
Rodríguez Vázquez Ángel Benito
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

This paper presents a prototype vision-enabled sensor node based on a commercial vision system of reduced size and power consumption. The wireless infrastructure for the deployment of a distributed smart camera network based on these nodes is provided by commercial motes. The smart camera, based on a low-power bio-inspired processing scheme, enables in-node image processing and vision tools. This makes it possible to build a lighter representation of the scene that keeps the relevant information in terms of detected elements, features and events, reducing the data transmitted through the network. By passing only the relevant information to the neighboring sensor nodes, distributed and collaborative vision becomes possible with the limited data rates available in commercial wireless sensor networks. Communication between the different components of the system is supported by the available UARTs and GPIOs. Several examples of in-node image processing and feature detection have been tested in the prototype, and information at different abstraction levels has been broadcast to the network. Funding: Junta de Andalucía 2006-TIC-2352; Ministerio de Ciencia e Innovación TEC2009-1181

Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

Author: Khawam Sami
Publication venue: The University of Edinburgh
Publication date: 01/01/2006

Life Cycle Aware Computing: Reusing Silicon Technology

Author: Akella Venkatesh
Amirtharajah Rajeevan
Chong Frederic T.
Geyer Roland
Oliver John Y.
Publication venue: DigitalCommons@CalPoly
Publication date: 01/12/2007

Despite the high costs associated with processor manufacturing, the typical chip is used for only a fraction of its expected lifetime. Reusing processors would create a food chain of electronic devices that amortizes the energy required to build chips over several computing generations.

Loop Nest Splitting for WCET-Optimization and Predictability Improvement

Author: Compilers and WCET
Falk Heiko
Schwarzer Martin
Publication venue: OASIcs - OpenAccess Series in Informatics. 6th International Workshop on Worst-Case Execution Time Analysis (WCET'06)
Publication date: 01/01/2006

This paper presents the influence of the loop nest splitting source code optimization on the worst-case execution time (WCET). Loop nest splitting minimizes the number of executed if-statements in loop nests of embedded multimedia applications. Loops and if-statements of high-level languages in particular are an inherent source of unpredictability and loss of precision for WCET analysis, because it is difficult to obtain safe and tight worst-case estimates of an application's flow of control through these high-level constructs. In addition, the corresponding control flow redirections expressed at the assembly level reduce predictability even more due to the complex pipeline and branch prediction behavior of modern embedded processors. The analysis techniques for loop nest splitting are based on precise mathematical models combined with genetic algorithms. On the one hand, these techniques achieve a significantly more homogeneous structure of the control flow. On the other hand, the precision of our analyses leads to the generation of very accurate high-level flow facts for loops and if-statements. The application of our implemented algorithms to three real-life multimedia benchmarks leads to average speed-ups of 25.0% to 30.1%, while WCET is reduced by between 34.0% and 36.3%.

Dagstuhl Research Online Publication Server
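
The effect of loop nest splitting described in the abstract above can be shown on a small hand-made example. This is a minimal sketch, not the paper's algorithm: the loop bounds, the condition `x >= 10 or y >= 14`, and the hand-chosen splitting condition `x >= 10` are all hypothetical, whereas the paper derives such conditions automatically with mathematical models and genetic algorithms. The `checks` counter is added only to count executed if-statements.

```python
def original_nest(n, m):
    """Loop nest with an if-statement evaluated in every inner iteration."""
    total = checks = 0
    for x in range(n):
        for y in range(m):
            checks += 1
            if x >= 10 or y >= 14:
                total += 1
            else:
                total += 2
    return total, checks


def split_nest(n, m):
    """Same computation after splitting on x >= 10 (hand-chosen here).
    When the splitting condition holds, the inner condition is known
    to be true for every y, so the inner loop needs no if-statement."""
    total = checks = 0
    for x in range(n):
        checks += 1
        if x >= 10:  # splitting if-statement, evaluated once per outer iteration
            for y in range(m):
                total += 1
        else:
            for y in range(m):
                checks += 1
                if y >= 14:  # x >= 10 is known to be false in this branch
                    total += 1
                else:
                    total += 2
    return total, checks


print(original_nest(20, 20))
print(split_nest(20, 20))
```

Both versions compute the same result, but the split version executes fewer if-statements and has a more regular control flow in the `x >= 10` branch, which is the property the abstract credits for tighter WCET estimates.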

Canada’s Smallest Satellite: The Canadian Advanced Nanospace Experiment (CanX-1)

Author: Jeans Tiger
Stras Luke
Wells James
Publication venue: DigitalCommons@USU
Publication date: 14/08/2002

The Canadian Advanced Nanospace eXperiment (CanX) Program of the Space Flight Laboratory at the University of Toronto Institute for Aerospace Studies (UTIAS/SFL) is a Canadian first, allowing engineering researchers to test nano- and micro-scale devices rapidly and inexpensively in space. CanX is a “picosatellite” program for research and education, with graduate students leading the design, development, testing, and operations of Canada’s smallest satellites, each with a mass under 1 kg. The first UTIAS/SFL picosatellite, CanX-1, is scheduled for launch in early 2003 together with CubeSats from other university and industry developers. The objective of the CanX-1 mission is to verify the functionality of several novel electronic technologies in orbital space. This paper outlines the features, capabilities and performance of CanX-1, including horizon and star-tracking experiments using two CMOS imagers, active three-axis magnetic stabilization, GPS-based position determination, and an ARM7 central processor.

Dynamically reconfigurable asynchronous processor

Author: Fawaz Khodor Ahmad
Publication venue: The University of Edinburgh
Publication date: 25/06/2012

The main design requirements for today's mobile applications are high throughput performance, high energy efficiency, and high programmability. Until now, the choice of platform has often been limited to Application-Specific Integrated Circuits (ASICs), due to their best-of-breed performance and power consumption. The economies of scale possible with these high-volume markets have traditionally been able to hide the high Non-Recurring Engineering (NRE) costs required for designing and fabricating new ASICs. However, with the NREs and design time escalating with each generation of mobile applications, this practice may be reaching its limit. Designers today are looking at programmable solutions, so that they can respond more rapidly to changes in the market and spread costs over several generations of mobile applications. However, there have been few feasible alternatives to ASICs: Digital Signal Processors (DSPs) and microprocessors cannot meet the throughput requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much area and power. Coarse-grained dynamically reconfigurable architectures offer better solutions for high throughput applications, when power and area considerations are taken into account. One promising example is the Reconfigurable Instruction Cell Array (RICA). RICA consists of an array of cells with an interconnect that can be dynamically reconfigured on every cycle. This allows quite complex datapaths to be rendered onto the fabric and executed in a single configuration - making these architectures particularly suitable to stream processing. Furthermore, RICA can be programmed from C, making it a good fit with existing design methodologies. However, the RICA architecture has a drawback: poor scalability in terms of area and power. As the core gets bigger, the number of sequential elements in the array must be increased significantly to maintain the ability to achieve high throughputs through pipelining. As a result, a larger clock tree is required to synchronise the increased number of sequential elements. The clock tree therefore takes up a larger percentage of the area and power consumption of the core. This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor (DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA architecture, but uses asynchronous design techniques - methods of designing digital systems without clocks. The absence of a global clock signal makes DRAP more scalable in terms of power and area overhead than its synchronous counterpart. The DRAP architecture maintains most of the benefits of custom asynchronous design, whilst also providing programmability via conventional high-level languages. Results show that the DRAP processor delivers considerably lower power consumption when compared to a market-leading Very Long Instruction Word (VLIW) processor and a low-power ARM processor. For example, DRAP reduced power consumption by a factor of 20 compared to the ARM7 processor, and by a factor of 29 compared to the TI C64x VLIW, when running the same benchmark capped to the same throughput and for the same process technology (0.13 μm). When compared to an equivalent RICA design, DRAP was up to 22% larger than RICA but reduced power by a factor of up to 1.9. It was also capable of achieving up to 2.8 times higher throughputs than RICA for the same benchmarks.

A framework to experiment optimizations for real-time and embedded software

Author: Cassé Hugues
Heydemann Karine
Ozaktas Haluk
Ponroy Jonathan
Rochange Christine
Zendra Olivier
Publication date: 01/01/2010

Typical constraints on embedded systems include code size limits, upper bounds on energy consumption and hard or soft deadlines. To meet these requirements, it may be necessary to improve the software by applying various kinds of transformations such as compiler optimizations, specific mapping of code and data in the available memories, code compression, etc. However, a transformation that aims at improving the software with respect to a given criterion might have side effects on other criteria, and these effects must be carefully analyzed. For this purpose, we have developed a common framework that makes it possible to experiment with various code transformations and to evaluate their impact on various criteria. This work has been carried out within the French ANR MORE project. Comment: International Conference on Embedded Real Time Software and Systems (ERTS2), Toulouse : France (2010)

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server