Influence of Memory Hierarchies on Predictability for Time Constrained Embedded Software
Safety-critical embedded systems having to meet real-time constraints are
expected to be highly predictable in order to guarantee at design time that
certain timing deadlines will always be met. This requirement usually prevents
designers from utilizing caches due to their highly dynamic, thus hardly
predictable behavior. The integration of scratchpad memories represents an
alternative approach which allows the system to benefit from a performance gain
comparable to that of caches while at the same time maintaining predictability.
In this work, we compare the impact of scratchpad memories and caches on worst
case execution time (WCET) analysis results. We show that caches, despite
requiring complex techniques, can have a negative impact on the predicted WCET,
while the estimated WCET for scratchpad memories scales with the achieved
performance gain at no extra analysis cost. Comment: Submitted on behalf of EDAA (http://www.edaa.com/)
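The contrast drawn in this abstract can be illustrated with a minimal sketch. It is not the paper's analysis: the function names, the address window, and the latencies of 1 and 10 cycles are all hypothetical, and a real WCET analyzer also models pipelines and program paths. The sketch only shows that a static scratchpad mapping lets each access be classified from its address alone, whereas a cache without dedicated analysis forces every access to be charged the slow latency.

```python
def wcet_memory_bound(accesses, provably_fast, fast_cycles=1, slow_cycles=10):
    """Upper bound on memory cycles (hypothetical latencies).

    An access is charged the fast latency only when the analysis can
    prove that it is served by the fast memory.
    """
    return sum(fast_cycles if provably_fast(addr) else slow_cycles
               for addr in accesses)


# Hypothetical scratchpad address window, fixed at design time.
SPM_START, SPM_END = 0x100, 0x200


def in_scratchpad(addr):
    # The mapping is static, so the address alone decides the latency.
    return SPM_START <= addr < SPM_END


def cache_without_analysis(addr):
    # Without a cache analysis, no access can be proven to hit.
    return False


ACCESSES = [0x100, 0x104, 0x2000, 0x108]  # illustrative access trace
SPM_BOUND = wcet_memory_bound(ACCESSES, in_scratchpad)
CACHE_BOUND = wcet_memory_bound(ACCESSES, cache_without_analysis)
```

Tightening the cache bound would require an additional analysis that proves hits, which is the extra cost the abstract refers to.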
An improved instruction-level power model for ARM11 microprocessor
The power and energy consumed by a chip have become the primary design constraint for embedded systems, which has led to much work on hardware design techniques such as clock gating and power gating. Software also affects the power usage of a chip, so good software design can reduce power further. In this paper we present an instruction-level power model based on an ARM1176JZF-S processor to predict the power of software applications. Our model takes substantially less input data than existing high-accuracy models and does not need to consider each instruction individually. We show that the power is related to both the distribution of instruction types and the operations per clock cycle (OPC) of the program. Our model does not need to consider the effect of two adjacent instructions, which saves a lot of calculation and measurement. Pipeline stall effects are also captured through OPC rather than cache misses, because many other events can cause the pipeline to stall. The model shows good performance, with a maximum estimation error of -8.28% and an average absolute estimation error of 4.88% over six benchmarks. Finally, we prove that energy per operation (EPO) decreases with increasing operations per clock cycle, and we confirm the relationship empirically.
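The relationship between EPO and OPC can be sketched under a loud assumption: that power is a fixed base term plus a term proportional to OPC. This linear form, the function name, and the parameters are hypothetical and are not taken from the paper, whose model also depends on the distribution of instruction types.

```python
def energy_per_operation(opc, freq_hz, p_base_w, p_per_opc_w):
    """Energy per operation in joules under an ASSUMED linear power model.

    power = p_base_w + p_per_opc_w * opc   (hypothetical form)
    operations per second = opc * freq_hz
    """
    if opc <= 0 or freq_hz <= 0:
        raise ValueError("opc and freq_hz must be positive")
    power_w = p_base_w + p_per_opc_w * opc
    return power_w / (opc * freq_hz)


# Placeholder coefficients for illustration only, not measured values.
EPO_LOW_OPC = energy_per_operation(0.5, 100e6, 0.2, 0.1)
EPO_HIGH_OPC = energy_per_operation(1.0, 100e6, 0.2, 0.1)
```

Under this assumption EPO equals (p_base_w / opc + p_per_opc_w) / freq_hz, so any positive base power makes EPO fall as OPC rises: the fixed cost is spread over more operations per cycle.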
HW-SW Emulation Framework for Temperature-Aware Design in MPSoCs
Current trends point to Multi-Processor Systems-on-Chip (MPSoCs) as a promising solution for the consumer electronics market. MPSoCs are complex to design, as they must execute multiple applications (games, video) while meeting additional design constraints (energy consumption, time-to-market). Moreover, the rise of temperature in the die of an MPSoC can seriously affect its final performance and reliability. In this paper, we present a new hardware-software emulation framework that allows designers to fully explore the thermal behavior of final MPSoC designs early in the design flow. The proposed framework uses FPGA emulation as the key element to model the hardware components of the considered MPSoC platform at multi-megahertz speeds. It automatically extracts detailed system statistics that are used as input to our software thermal library running on a host computer. This library calculates the temperature of on-chip components at run time, based on the statistics collected from the emulated system and the final floorplan of the MPSoC. This enables fast testing of various thermal management techniques. Our results show speed-ups of three orders of magnitude compared to cycle-accurate MPSoC simulators.
A prototype node for wireless vision sensor network applications development
This paper presents a prototype vision-enabled sensor node based on a commercial vision system of reduced size and power consumption. The wireless infrastructure for the deployment of a distributed smart camera network based on these nodes is provided by commercial motes. The smart camera, based on a low-power bio-inspired processing scheme, enables in-node image processing and vision tools. This makes it possible to build a lighter representation of the scene that keeps the relevant information in terms of detected elements, features and events, reducing the data transmitted through the network. Therefore, by passing only the relevant information to the neighboring sensor nodes, distributed and collaborative vision is possible with the limited data rates available in commercial wireless sensor networks. Communication between the different components of the system is supported by the available UARTs and GPIOs. Several examples of in-node image processing and feature detection have been tested in the prototype, and information at different abstraction levels has been broadcast to the network. Junta de Andalucía 2006-TIC-2352; Ministerio de Ciencia e Innovación TEC2009-1181
Life Cycle Aware Computing: Reusing Silicon Technology
Despite the high costs associated with processor manufacturing, the typical chip is used for only a fraction of its expected lifetime. Reusing processors would create a food chain of electronic devices that amortizes the energy required to build chips over several computing generations.
Loop Nest Splitting for WCET-Optimization and Predictability Improvement
This paper presents the influence of the loop nest splitting source code optimization on the worst-case execution time (WCET). Loop nest splitting minimizes the number of executed if-statements in loop nests of embedded multimedia applications. Loops and if-statements in high-level languages, in particular, are an inherent source of unpredictability and loss of precision for WCET analysis. This is because it is difficult to obtain safe and tight worst-case estimates of an application's flow of control through these high-level constructs. In addition, the corresponding control flow redirections expressed at the assembly level reduce predictability even more due to the complex pipeline and branch prediction behavior of modern embedded processors.
The analysis techniques for loop nest splitting are based on precise mathematical models combined with genetic algorithms. On the one hand, these techniques achieve a significantly more homogeneous structure of the control flow. On the other hand, the precision of our analyses leads to the generation of very accurate high-level flow facts for loops and if-statements. The application of our implemented algorithms to three real-life multimedia benchmarks leads to average speed-ups of 25.0% to 30.1%, while WCET is reduced by between 34.0% and 36.3%.
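The effect of loop nest splitting can be sketched by hand on a toy loop nest. In this sketch the splitting condition `x >= 10` is chosen manually, whereas the paper derives such conditions automatically; the loop bounds, the loop body, and the function names are hypothetical. Both versions compute the same result, but the split version evaluates far fewer if-statements.

```python
def original_nest(n, m):
    """Loop nest whose if-statement is evaluated in every inner iteration."""
    total, ifs_executed = 0, 0
    for x in range(n):
        for y in range(m):
            ifs_executed += 1
            if x >= 10 or y >= 14:
                total += x + y
            else:
                total -= 1
    return total, ifs_executed


def split_nest(n, m):
    """Same computation after splitting on the outer index x."""
    total, ifs_executed = 0, 0
    for x in range(n):
        ifs_executed += 1  # splitting-if: once per outer iteration
        if x >= 10:
            # The original condition holds for every y: no inner if needed.
            for y in range(m):
                total += x + y
        else:
            # x < 10, so the original condition reduces to y >= 14.
            for y in range(m):
                ifs_executed += 1
                if y >= 14:
                    total += x + y
                else:
                    total -= 1
    return total, ifs_executed
```

For `n = 36` and `m = 49`, the original nest executes 1764 if-statements and the split nest 526. The split version also has a more regular control flow in the `x >= 10` region, which is the property that helps a WCET analyzer.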
Canada’s Smallest Satellite: The Canadian Advanced Nanospace Experiment (CanX-1)
The Canadian Advanced Nanospace eXperiment (CanX) Program of the Space Flight Laboratory at the University of Toronto Institute for Aerospace Studies (UTIAS/SFL) is a Canadian first, allowing engineering researchers to test nano- and micro-scale devices rapidly and inexpensively in space. CanX is a “picosatellite” program for research and education, with graduate students leading the design, development, testing, and operations of Canada’s smallest satellites having a mass under 1 kg. The first UTIAS/SFL picosatellite, CanX-1, is scheduled for launch in early 2003 together with CubeSats from other university and industry developers. The objective of the CanX-1 mission is to verify the functionality of several novel electronic technologies in orbital space. This paper outlines the features, capabilities and performance of CanX-1, including horizon and star-tracking experiments using two CMOS imagers, active three-axis magnetic stabilization, GPS-based position determination, and an ARM7 central processor.
Dynamically reconfigurable asynchronous processor
The main design requirements for today's mobile applications are:
· high throughput performance.
· high energy efficiency.
· high programmability.
Until now, the choice of platform has often been limited to Application-Specific
Integrated Circuits (ASICs), due to their best-of-breed performance and power
consumption. The economies of scale possible with these high-volume markets have
traditionally been able to hide the high Non-Recurring Engineering (NRE) costs
required for designing and fabricating new ASICs. However, with the NREs and
design time escalating with each generation of mobile applications, this practice may
be reaching its limit.
Designers today are looking at programmable solutions, so that they can respond
more rapidly to changes in the market and spread costs over several generations of
mobile applications. However, there have been few feasible alternatives to ASICs:
Digital Signal Processors (DSPs) and microprocessors cannot meet the throughput
requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much
area and power.
Coarse-grained dynamically reconfigurable architectures offer better solutions for
high throughput applications, when power and area considerations are taken into
account. One promising example is the Reconfigurable Instruction Cell Array
(RICA). RICA consists of an array of cells with an interconnect that can be
dynamically reconfigured on every cycle. This allows quite complex datapaths to be
rendered onto the fabric and executed in a single configuration - making these
architectures particularly suitable to stream processing. Furthermore, RICA can be
programmed from C, making it a good fit with existing design methodologies.
However, the RICA architecture has a drawback: poor scalability in terms of area and
power. As the core gets bigger, the number of sequential elements in the array must
be increased significantly to maintain the ability to achieve high throughputs through
pipelining. As a result, a larger clock tree is required to synchronise the increased
number of sequential elements. The clock tree therefore takes up a larger percentage
of the area and power consumption of the core.
This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor
(DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA
architecture, but uses asynchronous design techniques - methods of designing digital
systems without clocks. The absence of a global clock signal makes DRAP more
scalable in terms of power and area overhead than its synchronous counterpart.
The DRAP architecture maintains most of the benefits of custom asynchronous
design, whilst also providing programmability via conventional high-level languages.
Results show that the DRAP processor delivers considerably lower power
consumption when compared to a market-leading Very Long Instruction Word
(VLIW) processor and a low-power ARM processor. For example, DRAP resulted in
a reduction in power consumption of 20 times compared to the ARM7 processor, and
29 times compared to the TI C64x VLIW, when running the same benchmark capped
to the same throughput and for the same process technology (0.13 μm). When
compared to an equivalent RICA design, DRAP was up to 22% larger than RICA but
resulted in a power reduction of up to 1.9 times. It was also capable of achieving up
to 2.8 times higher throughputs than RICA for the same benchmarks.
A framework to experiment optimizations for real-time and embedded software
Typical constraints on embedded systems include code size limits, upper
bounds on energy consumption and hard or soft deadlines. To meet these
requirements, it may be necessary to improve the software by applying various
kinds of transformations like compiler optimizations, specific mapping of code
and data in the available memories, code compression, etc. However, a
transformation that aims at improving the software with respect to a given
criterion might engender side effects on other criteria and these effects must
be carefully analyzed. For this purpose, we have developed a common framework
that makes it possible to experiment with various code transformations and to
evaluate their impact on various criteria. This work has been carried out
within the French ANR MORE project. Comment: International Conference on Embedded Real Time Software and Systems
(ERTS2), Toulouse, France (2010)