Search CORE

23,370 research outputs found

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009
Field of study

The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

Optimizing the flash-RAM energy trade-off in deeply embedded systems

Author: Eder Kerstin
Hollis Simon
Pallister James
Publication venue
Publication date: 03/06/2014
Field of study

Deeply embedded systems often have the tightest constraints on energy consumption, requiring that they consume tiny amounts of current and run on batteries for years. However, they typically execute code directly from flash, instead of the more energy efficient RAM. We implement a novel compiler optimization that exploits the relative efficiency of RAM by statically moving carefully selected basic blocks from flash to RAM. Our technique uses integer linear programming, with an energy cost model to select a good set of basic blocks to place into RAM, without impacting stack or data storage. We evaluate our optimization on a common ARM microcontroller and succeed in reducing the average power consumption by up to 41% and reducing energy consumption by up to 22%, while increasing execution time. A case study is presented, where an application executes code then sleeps for a period of time. For this example we show that our optimization could allow the application to run on battery for up to 32% longer. We also show that for this scenario the total application energy can be reduced, even if the optimization increases the execution time of the code

arXiv.org e-Print Archive

Explore Bristol Research

A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones

Author: Benini Luca
Conti Francesco
Flamand Eric
Loquercio Antonio
Palossi Daniele
Scaramuzza Davide
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/08/2018
Field of study

Fully-autonomous miniaturized robots (e.g., drones), with artificial intelligence (AI) based visual navigation capabilities are extremely challenging drivers of Internet-of-Things edge intelligence capabilities. Visual navigation based on AI approaches, such as deep neural networks (DNNs) are becoming pervasive for standard-size drones, but are considered out of reach for nanodrones with size of a few cm

{}^\mathrm{2}

. In this work, we present the first (to the best of our knowledge) demonstration of a navigation engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based visual navigation. To achieve this goal we developed a complete methodology for parallel execution of complex DNNs directly on-bard of resource-constrained milliwatt-scale nodes. Our system is based on GAP8, a novel parallel ultra-low-power computing platform, and a 27 g commercial, open-source CrazyFlie 2.0 nano-quadrotor. As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in [1] to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average. Our navigation engine is flexible and can be used to span a wide performance range: at its peak performance corner it achieves 18 fps while still consuming on average just 3.5% of the power envelope of the deployed nano-aircraft.Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication in the IEEE Internet of Things Journal (IEEE IOTJ

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

ZORA

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

Recommended from our members

Design Space Exploration in Cyber-Physical Systems

Author: Amir Maral
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Cyber physical systems (CPS) integrate a variety of engineering areas such as control, mechanical and computer engineering in a holistic design effort. While interdependencies between the different disciplines are key attributes of CPS design science, little is known about the impact of design decisions of the cyber part on the overall system qualities. To investigate these interdependencies, this paper proposes a simulation-based Design Space Exploration (DSE) framework that considers detailed cyber system parameters such as cache size, bus width, and voltage levels in addition to physical and control parameters of the CPS. We propose an exploration algorithm that surfs the parameter configurations in the cyber physical sub-systems, in order to approximate the Pareto-optimal design points with regards to the trade-os among the design objectives, such as energy consumption and control stability. We apply the proposed framework to a network control system for an inverted-pendulum application. The presented holistic evaluation of the identified Pareto-points reveals the presence of non-trivial trade-os, which are imposed by the control, physical, and detailed cyber parameters. For instance the identified energy and control optimal design points comprise configurations with a wide range of CPU speeds, sample times and cache configuration following non-trivial zig-zag patterns. The proposed framework could identify and manage those trade-os and, as a result, is an imperative rst step to automate the search for superior CSP configurations

eScholarship - University of California